This repository has been archived by the owner on May 30, 2023. It is now read-only.

File download #10052

Closed
ariya opened this issue Mar 1, 2011 · 159 comments

@ariya
Owner

ariya commented Mar 1, 2011

[email protected] commented:

It would be good to accept (and save) 'Content-Disposition: attachment; filename=' content.

Disclaimer:
This issue was migrated on 2013-03-15 from the project's former issue tracker on Google Code, Issue #52.
🌟   40 people had starred this issue at the time of migration.

@ariya
Owner Author

ariya commented Mar 1, 2011

[email protected] commented:

This is again related to issue 41.

 
Metadata Updates

  • Milestone updated: FutureRelease (was: ---)

@ariya
Owner Author

ariya commented Apr 26, 2011

[email protected] commented:

Issue 92 has been merged into this issue.

@btheado

btheado commented Jun 23, 2011

[email protected] commented:

I'm trying to implement this functionality and not making much progress. Using the attached patch, I run:

$ bin/phantomjs examples/download.js

and get this output:

WebPage instantiated
WebPage instantiated
Download complete - fail

I added a cout of "WebPage instantiated" to verify that my debug messages work as expected. I also added a cout in my downloadRequested slot, but that one was never displayed. Can someone spot what I'm doing wrong, or let me know if I'm on completely the wrong track?

Here is where I found out about the downloadRequested signal: http://doc.qt.nokia.com/latest/qwebpage.html#downloadRequested

@btheado

btheado commented Jun 23, 2011

[email protected] commented:

Whoops, here is the patch file attachment without the ANSI color codes

@n1k0
Contributor

n1k0 commented Aug 16, 2011

[email protected] commented:

Any progress on this issue?

@ariya
Owner Author

ariya commented Aug 16, 2011

[email protected] commented:

No progress as of now.

@n1k0
Contributor

n1k0 commented Aug 16, 2011

[email protected] commented:

A friend of mine (http://svay.com/) just told me about a nice trick for working around this issue: use XHR within the page environment and base64 encoding to retrieve the file contents. It works rather well. For the record, you can find an example here: http://jsfiddle.net/3kUXy/
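For readers who cannot reach the fiddle, the trick can be sketched roughly as follows. This is a minimal sketch and not the code from the fiddle: the names `fetchAsBase64` and `runInPhantom`, the URLs, and the output path are assumptions made for illustration, and it relies on `atob` being available in the PhantomJS script context.

```javascript
// Hypothetical helper. It runs inside the page via page.evaluate(), so it
// must not reference anything defined outside its own body.
function fetchAsBase64(url) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, false); // synchronous, so evaluate() can return the data
  // Ask for the bytes to be passed through instead of decoded as text.
  xhr.overrideMimeType('text/plain; charset=x-user-defined');
  xhr.send(null);
  var raw = xhr.responseText;
  var bytes = '';
  for (var i = 0; i < raw.length; i++) {
    // Keep only the low byte of each character.
    bytes += String.fromCharCode(raw.charCodeAt(i) & 0xff);
  }
  return window.btoa(bytes);
}

// PhantomJS-only part; the URLs and the file name are placeholders.
function runInPhantom() {
  var fs = require('fs');
  var page = require('webpage').create();
  page.open('http://example.com/', function (status) {
    if (status !== 'success') {
      console.log('Page failed to load');
      phantom.exit(1);
      return;
    }
    var encoded = page.evaluate(fetchAsBase64, 'http://example.com/file.pdf');
    // Assumption: atob() exists in the PhantomJS script context.
    fs.write('file.pdf', atob(encoded), 'wb');
    phantom.exit();
  });
}

if (typeof phantom !== 'undefined') {
  runInPhantom();
}
```

The XHR is subject to the page's same-origin rules, and the whole file passes through memory as a string, so this suits small files best.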

@ariya
Owner Author

ariya commented Jul 27, 2012

[email protected] commented:

The URL to the file is not always known so XHR is not a general solution. For instance, if you are downloading a utility/bank/cc statement, you may have to click a link which will possibly execute some JS code and trigger another page load with a frame embedding the PDF. Or the statement comes in as an attachment.

What will it take to support the file download feature?

Requirement: Download files that come in embedded in the page/frame or as attachments. The URLs may or may not be known. Allow saving the files to the file system or "upload" them to a web server (so the server can save the files in a DB for instance).

@ariya
Owner Author

ariya commented Aug 11, 2012

[email protected] commented:

I've got an early but functional version of this at

https://github.com/woodwardjd/phantomjs/tree/add_download_capabilities

Example:

var page = require('webpage').create();

page.onUnsupportedContentReceived = function(data) {
  console.log('Got a download at url: ' + data.url);
  page.saveUnsupportedContent('some.file.path', data.id);
  phantom.exit();
};

page.open('http://some.pdf.url.com/some.pdf');

I call this "early but functional" because it works where I've tested it (Linux, PDF downloads), but it likely has a small memory leak, and I'm not 100% convinced the callback mechanism I used is ideal.

Comments desired.

@ariya
Owner Author

ariya commented Sep 1, 2012

[email protected] commented:

I've downloaded and built the branch above, but I can't seem to get the onUnsupportedContentReceived event to fire, and calling saveUnsupportedContent throws an undefined error. Are there special build steps required to enable it?

Thanks,
Robert

@ariya
Owner Author

ariya commented Sep 4, 2012

[email protected] commented:

No special build steps required, as far as I know. If saveUnsupportedContent is undefined, maybe you haven't built the version in the add_download_capabilities branch (git checkout add_download_capabilities after the git clone)? Just speculating.

@chbrown

chbrown commented Sep 4, 2012

[email protected] commented:

I second the XHR+base64 method. It takes another 50+ lines of code to send to page.evaluate(), and I have to de-base64 the content afterward, and that's basically how CasperJS does it (as far as I can tell from their code—they do a lot of weird (unnecessary, in my book) binding with window.utils in the page context).

I used this one (first answer):
http://stackoverflow.com/questions/7370943/retrieving-binary-file-content-using-javascript-base64-encode-it-and-reverse-de

It works great. Just be sure to try-catch the call to base64ArrayBuffer(), because new Uint8Array(arrayBuffer) may throw an error, and check xhr.getResponseHeader('content-type') == 'application/pdf' if you're doing PDF downloads like I was.
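Written out, the checks described above might look like the following. This is a sketch rather than the Stack Overflow answer's code: `bytesToBase64` stands in for that answer's `base64ArrayBuffer()`, and `fetchPdfAsBase64` and its `done(err, base64)` callback are hypothetical names. Both functions would need to exist in the page context (for example, loaded with `page.injectJs`) before they can be called there.

```javascript
var BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encode an array-like of byte values (0-255) as base64 text.
function bytesToBase64(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length; i += 3) {
    var has1 = i + 1 < bytes.length;
    var has2 = i + 2 < bytes.length;
    var b0 = bytes[i];
    var b1 = has1 ? bytes[i + 1] : 0;
    var b2 = has2 ? bytes[i + 2] : 0;
    out += BASE64_ALPHABET.charAt(b0 >> 2);
    out += BASE64_ALPHABET.charAt(((b0 & 3) << 4) | (b1 >> 4));
    out += has1 ? BASE64_ALPHABET.charAt(((b1 & 15) << 2) | (b2 >> 6)) : '=';
    out += has2 ? BASE64_ALPHABET.charAt(b2 & 63) : '=';
  }
  return out;
}

// Hypothetical page-context helper: calls done(err) or done(null, base64).
function fetchPdfAsBase64(url, done) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.responseType = 'arraybuffer';
  xhr.onload = function () {
    var type = (xhr.getResponseHeader('content-type') || '').toLowerCase();
    if (type.indexOf('application/pdf') !== 0) {
      done(new Error('Unexpected content type: ' + type));
      return;
    }
    var encoded;
    try {
      // Building the typed array may throw, so guard it.
      encoded = bytesToBase64(new Uint8Array(xhr.response));
    } catch (e) {
      done(e);
      return;
    }
    done(null, encoded);
  };
  xhr.onerror = function () {
    done(new Error('Request failed'));
  };
  xhr.send(null);
}
```

Because the request is asynchronous, a PhantomJS script would need some way to hand the result back from the page, for instance `window.callPhantom` paired with `page.onCallback`.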

@subelsky

subelsky commented Oct 4, 2012

[email protected] commented:

I need this as well. Can't use the XHR method because the inline attachments I need to scrape don't come with a URL I can hit.

@chbrown

chbrown commented Oct 4, 2012

[email protected] commented:

Wouldn't inline attachments be even more easily downloaded? For an image:
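var fs = require('fs');
// Note: this saves the src attribute text (a URL or data: URI), not decoded image bytes.
var content = page.evaluate(function() {
  return $('img#whatever').attr('src');
});
fs.write(yer_path, content, 'w');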
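The `src` attribute carries the image bytes only when it is a `data:` URI; otherwise it is just a URL. For the `data:` case, a minimal sketch of splitting a base64 URI into its parts could look like this, where `parseDataUri` is a hypothetical helper name and only base64-encoded URIs are handled:

```javascript
// Split a base64 data: URI into its MIME type and base64 payload.
// Returns null for anything else, such as a plain http:// URL.
function parseDataUri(uri) {
  var match = /^data:([^;,]*)(?:;[^;,]+)*;base64,([\s\S]*)$/.exec(uri);
  if (!match) {
    return null;
  }
  return { mimeType: match[1], base64: match[2] };
}

var parsed = parseDataUri('data:image/png;base64,iVBORw0KGgo=');
console.log(parsed.mimeType); // image/png
console.log(parseDataUri('http://example.com/a.png')); // null
```

The decoded payload could then be written out in binary mode; when the helper returns null, the value is a URL that still has to be fetched separately.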


Ariya, can you give some estimate of how long this feature (downloading a url) would take to implement? I'd love to get involved in PhantomJS development, but maybe this issue is a lot trickier than it sounds?

@subelsky

subelsky commented Oct 5, 2012

[email protected] commented:

Sorry, I didn't mean to write "inline". The file I need is not an image and is not part of the DOM. It gets sent as a result of a POST with the Content-Disposition header 'attachment;filename="report.csv"'
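In cases where the POST can be replayed from the page with XHR, the file name has to be read out of that response header. The sketch below covers only the header parsing. `filenameFromContentDisposition` is a hypothetical helper, and it handles just the plain `filename=` form, not the extended `filename*=` encoding.

```javascript
// Extract the file name from a Content-Disposition header value.
// Returns null when the header is missing or has no filename parameter.
function filenameFromContentDisposition(header) {
  if (!header) {
    return null;
  }
  // Accept both quoted ("report.csv") and bare (report.csv) values.
  var match = /filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(header);
  if (!match) {
    return null;
  }
  return match[1] !== undefined ? match[1] : match[2];
}

console.log(filenameFromContentDisposition('attachment;filename="report.csv"'));
// prints: report.csv
```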

@ariya
Owner Author

ariya commented Nov 21, 2012

[email protected] commented:

Hi there. I think the base64-encoding solution can only be a stop-gap solution.

  • Downloading big files will probably exhaust memory, and base64-encoding and decoding them will use up resources that would be better spent elsewhere. We therefore want the option to redirect a downloaded stream to a file.
  • We may have pages where we cannot control the loading of a file that is not supported (e.g. PDF)
  • We may want to save resources that have already been loaded as part of the page (e.g. images)

I think the optimal solution would be to add functionality to the onResourceReceived hook to allow setting up a "redirection" handler, and if such a handler is set, unsupported file formats should silently be downloaded. This handler could then have another onDownloadFinished hook to resume operation once the download is done.
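The redirection handler and onDownloadFinished hook proposed above are not an existing API. The existing `onResourceReceived` callback does expose response headers, though, which may be enough to at least detect such downloads. A rough sketch, assuming the response object carries a `headers` array of `{name, value}` pairs; `isAttachment` and `runInPhantom` are hypothetical names and the URL is a placeholder:

```javascript
// Return true when the response headers mark the body as an attachment.
function isAttachment(response) {
  var headers = response.headers || [];
  for (var i = 0; i < headers.length; i++) {
    var name = String(headers[i].name).toLowerCase();
    if (name === 'content-disposition' &&
        /^\s*attachment/i.test(String(headers[i].value))) {
      return true;
    }
  }
  return false;
}

// PhantomJS-only part: log attachments as their headers arrive.
function runInPhantom() {
  var page = require('webpage').create();
  page.onResourceReceived = function (response) {
    if (response.stage === 'start' && isAttachment(response)) {
      console.log('Attachment offered at: ' + response.url);
    }
  };
  page.open('http://example.com/report', function () {
    phantom.exit();
  });
}

if (typeof phantom !== 'undefined') {
  runInPhantom();
}
```

This only detects the download; saving the body is the part that still needs the support requested in this issue.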

@JamesMGreene
Collaborator

[email protected] commented:

 

 
Metadata Updates

  • Label(s) removed:
    • Type-Defect
  • Label(s) added:
    • Type-Enhancement
  • Status updated: Accepted

@subelsky

subelsky commented Apr 5, 2013

I'm interested in committing some of my company's resources to adding this feature. Is anyone already working on it? If so, could my company sponsor your work? If not, we can assign it to one of our own people. I just want to avoid duplicating anyone else's work.

@MichaelCation

I'm also interested in helping with this feature. We're trying to capture an Acrobat file that is sent as a result of a POST with the Content-Disposition header 'attachment;filename="file.pdf"'. Is anyone working on this? I don't want to duplicate effort. Ideally we want to access the functionality from CasperJS as well.

@maxcan

maxcan commented Jun 9, 2013

any progress on this?

@extempore

I'd love to see this fixed too. I saw @vitallium has a fork with download support, as well as a few other fixes.

https://github.com/Vitallium/phantomjs/tree/download-support

Anyone else able/available to help merge the new code? I wouldn't be doing anyone a favor if I messed with the C++ codebase. I wouldn't mind donating towards a bounty for this.

@vitallium
Collaborator

This feature is under development. When it's ready, it'll be merged into the master tree.
I can't say when this feature will be ready.

@FergusNelson

I'm also interested in this issue. Will we be able to render the pdf content as png / jpeg? Or is that altogether a different problem?

@chbrown

chbrown commented Jun 21, 2013

@FergusNelson that's a different problem, but much more easily solved using ghostscript, X11, ImageMagick, etc.

@subelsky

subelsky commented Jul 2, 2013

looks like @vitallium is pretty far along with an awesome solution in his download-support branch, described here: https://groups.google.com/forum/#!msg/phantomjs/JChUakj--24/epby47h3ZGAJ

@matthewlmcclure

I see that there are at least two attempts to address this issue on GitHub. @woodwardjd's add_download_capabilities branch, and @vitallium's download-support branch. Is one of those a more promising path forward than the other? What work is outstanding before it would be ready to merge upstream?

@0o-de-lally

@vitallium how close is this to being merged with the master?

@matthewlmcclure

I rebased @vitallium's download-support branch on a recent master HEAD.

I've been exercising it with a happy path test case, and it seems to be working fine.

@ariya and @vitallium,

I'd like to continue the work that @vitallium started if there's more to do.

What do you think blocks merging this upstream?

@taiar

taiar commented Dec 23, 2016

I built the linux-64 version of @SeNaP 's fork in this link: phantomjs.tar.gz

I'm trying to figure out how to use it with poltergeist (https://github.com/teampoltergeist/poltergeist) so I can download files with Capybara (https://github.com/teamcapybara/capybara). Any help is welcome.

@JohnBruce

+1 😜

@mynetx

mynetx commented Feb 16, 2017

Did someone manage to build a Mac version of @SeNaP ’s fork?

@AnaPana-zz

+1 💯

@chadyred

+1

@jomix

jomix commented May 5, 2017

I built the osx binary of @SeNaP 's fork.
https://github.com/jomix/phantomjs/raw/2.1/bin/phantomjs
SHA1 bbecc70411c8094e95b7b8c6f3a1403cc7edc1e3
Enjoy.

@rakeshnambiar

Has someone done it for Java-Selenium?

@ankitgr8

ankitgr8 commented May 9, 2017 via email

@rakeshnambiar

rakeshnambiar commented May 9, 2017

Hello @ankitgr8 ... I am looking for Windows.

Thanks

@musanas

musanas commented Jun 2, 2017

Hi @ankitgr8
I am also looking for the same. I would just like to know how to set the default download directory when using your PhantomJS exe to download a file from Java-Selenium.

Many Thanks
Musaffir

@ankitgr8

ankitgr8 commented Jun 7, 2017

Here's the link to download the exe if anyone needs it: https://github.com/ankitgr8/phantomjs2.0..
The above exe has the download capability.
It runs on 64-bit Windows.

To set the default download location, you can create the download JS at runtime and set the download directory at that time. Below is a sample of the JS code, which can be created using Java. Check for "downloadFileName".

// Assumes fos is an already-open Writer for the script file, and that
// webDriver, downloadFileName and downloadURL are defined elsewhere.
BufferedWriter bos = new BufferedWriter(fos);
bos.append("var page = require('webpage').create(); ");
// Copy the Selenium session cookies into PhantomJS.
for (Cookie ck : webDriver.manage().getCookies()) {
    bos.append(" phantom.addCookie({ name: '" + ck.getName() + "', value: '" + ck.getValue() + "', domain: '" + ck.getDomain() + "' }); ");
    bos.newLine();
}

// downloadFileName is the value returned from onFileDownload.
bos.append(" page.onFileDownload = function(status){console.log('onFileDownload(' + status + ')'); return '" + downloadFileName + "'; }");
bos.newLine();
bos.append(" page.onResourceReceived = function(status){console.log('onResourceReceived(' + status.stage + ')'); if(status.stage === 'end'){phantom.exit(1);}}");
bos.newLine();
bos.append(" page.onResourceRequested = function(status){console.log('onResourceRequested(' + status + ')'); }");
bos.newLine();
bos.append(" page.onFileDownloadError = function(status){console.log('onFileDownloadError(' + status + ')');phantom.exit(1);}");
bos.newLine();
bos.append(" page.onLoadStarted = function(status){console.log('onLoadStarted(' + status + ')');}");
bos.newLine();
bos.append(" page.onLoadFinished = function(status){console.log('onLoadFinished(' + status + ')');}");
bos.newLine();
bos.append(" page.open('" + downloadURL + "');");
bos.flush();
bos.close();
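Written out as a plain script, the handlers that the Java code above generates would look roughly like this. `onFileDownload` and `onFileDownloadError` come from the fork used in this comment, not from stock PhantomJS. The helper `attachDownloadHandlers`, its `log` and `exit` parameters, the URL, and the path are all names invented for this sketch.

```javascript
// Install the download-related handlers on a page object.
// log(message) and exit(code) are injected so the helper can be tried
// outside PhantomJS with a plain object standing in for the page.
function attachDownloadHandlers(page, downloadFileName, log, exit) {
  // Fork-specific: the returned string is the download file name.
  page.onFileDownload = function (status) {
    log('onFileDownload(' + status + ')');
    return downloadFileName;
  };
  page.onFileDownloadError = function (status) {
    log('onFileDownloadError(' + status + ')');
    exit(1);
  };
  // The generated script exits when a resource reaches the 'end' stage
  // (it passes 1 there; 0 is used here to signal success).
  page.onResourceReceived = function (response) {
    log('onResourceReceived(' + response.stage + ')');
    if (response.stage === 'end') {
      exit(0);
    }
  };
  page.onLoadFinished = function (status) {
    log('onLoadFinished(' + status + ')');
  };
  return page;
}

// PhantomJS-only part; the path and URL are placeholders.
function runInPhantom() {
  var page = require('webpage').create();
  attachDownloadHandlers(
    page,
    'C:/downloads/report.pdf',
    function (message) { console.log(message); },
    function (code) { phantom.exit(code); }
  );
  page.open('http://example.com/report.pdf');
}

if (typeof phantom !== 'undefined') {
  runInPhantom();
}
```

Cookie copying is left out here; in the Java version the cookie values are concatenated into the script unescaped, so a value containing a quote would break the generated file.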

@rakeshnambiar

@ankitgr8 .. Many thanks... Will come back to you in case I am facing any issue.

@rakeshnambiar

@ankitgr8 .. I am working on Windows 10. Do you know where the PhantomJS default download directory is?

@ankitgr8

Not sure what the path is, but you can print the path in your JS script to get the details.

@geotheory

geotheory commented Jun 16, 2017

If you hit this wall building on a MacBook (Xcode):

Xcode not set up properly. You may need to confirm the license
   agreement by running /usr/bin/xcodebuild without arguments.
ERROR: Failed to build PhantomJS! Configuration of Qt Base failed.

First double-check that you have the full Xcode (not just the Command Line Tools). If the build still fails, the following should work around it:

cd /Applications/Xcode.app/Contents/Developer/usr/bin/
sudo ln -s xcodebuild xcrun

@yogeshmsharma

I am also looking to set up a default download directory for PhantomJS, so that when I click on an element the file is downloaded to that directory. Please advise.

@ankitgr8

ankitgr8 commented Dec 4, 2017 via email

@yogeshmsharma

on both Windows as well as Linux.

@ghost ghost removed the In progress label Jan 10, 2018
@ghost ghost removed this from the FutureRelease milestone Jan 10, 2018
@ghost ghost removed 2.0 labels Jan 10, 2018
@ghost ghost removed New Feature labels Jan 29, 2018
@ariya
Owner Author

ariya commented Dec 25, 2019

Due to our very limited maintenance capacity (see #14541 for more details), we need to prioritize our development focus on other tasks. Therefore, this issue will be automatically closed. In the future, if we see the need to attend to this issue again, then it will be reopened.
Thank you for your contribution!

@ariya ariya closed this as completed Dec 25, 2019
@kriegaex

😱
