@giocomai
Created May 1, 2015 15:00
Download a webpage with PhantomJS from the command line. This makes it possible to wait for JavaScript to be processed before saving the page, which wget cannot do. Download SaveWebpage.js, then run from the terminal: phantomjs SaveWebpage.js URL nameOfSavedFile
var system = require('system');
var fs = require('fs');
var page = require('webpage').create();
// Usage: phantomjs SaveWebpage.js URL nameOfSavedFile
var url = system.args[1];
var destination = system.args[2];
// Stop waiting for any single resource after 10 s; this is the page opened below.
page.settings.resourceTimeout = 10000;

// setTimeout/setInterval wrapper: the workaround referenced in the comment below.
setTimeout(function () {
    setInterval(function () {
        page.open(url, function () {
            console.log(page.content);
            try {
                fs.write(destination, page.content, 'w');
            } catch (e) {
                console.log(e);
            }
            phantom.exit();
        });
    }, 20000); // milliseconds
}, 1);
giocomai commented May 1, 2015

This times out after 20 seconds (the 20000 passed to setInterval, expressed in milliseconds) and saves whatever part of the page it managed to read up to that point, in case something blocked it partway through. The structure is more complicated than it should need to be, because it includes the workaround suggested here to make sure the timeout actually works: ariya/phantomjs#10832 (comment)
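
For comparison, here is a minimal sketch of the "save whatever has loaded by the deadline" idea written as a plain helper. It is not the script above and it leaves out the workaround from the linked issue, so whether the timer fires reliably may depend on the PhantomJS version. The name `saveWithDeadline` and its parameters are hypothetical; `page` and `fs` are passed in so the helper can also be exercised with stand-in objects outside PhantomJS.

```javascript
// Hypothetical helper: open the page immediately and write its content once,
// either when the load callback fires or when the deadline passes.
function saveWithDeadline(page, fs, url, destination, timeoutMs, done) {
    var finished = false;

    function save(reason) {
        if (finished) { return; }   // write at most once
        finished = true;
        clearTimeout(timer);
        try {
            fs.write(destination, page.content, 'w');
        } catch (e) {
            console.log(e);
        }
        done(reason);
    }

    // Deadline path: save the partial content if loading takes too long.
    var timer = setTimeout(function () { save('timeout'); }, timeoutMs);

    // Normal path: save as soon as the load result is reported.
    page.open(url, function (status) { save(status); });
}

// Only under PhantomJS: wire the helper to the real modules.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    saveWithDeadline(require('webpage').create(), require('fs'),
        system.args[1], system.args[2], 20000,
        function () { phantom.exit(); });
}
```

Saved as a script, it would be invoked the same way: phantomjs SaveWebpage.js URL nameOfSavedFile. The `finished` flag is there so that a late load callback cannot overwrite the file after the deadline has already saved it.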
