@Norcoen
Forked from pmeinhardt/download-site.md
Created March 19, 2021 00:31
download an entire page (including css, js, images) for offline-reading, archiving… using wget

If you ever need to download an entire website, perhaps for offline viewing, wget can do the job. For example:

$ wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains website.org --no-parent  www.website.org/tutorials/html/

This command downloads the pages under www.website.org/tutorials/html/, together with the files needed to display them.

The options are:

  • --recursive: follow links and download the pages they point to (up to 5 levels deep by default)
  • --domains website.org: don't follow links outside website.org
  • --no-parent: don't follow links outside the directory tutorials/html/
  • --page-requisites: get all the elements needed to display each page (images, CSS and so on)
  • --html-extension: append .html to the names of downloaded HTML files that lack it (newer wget releases call this option --adjust-extension)
  • --convert-links: convert links so that they work locally, off-line
  • --restrict-file-names=windows: modify filenames so that they will work in Windows as well
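  • --no-clobber: don't overwrite files that already exist locally, which helps when an interrupted download is resumed. Recent wget versions may warn that this option is ignored when it is combined with --convert-links.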
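
To reuse the command for other sites, it can be wrapped in a small shell function. The following is only a sketch: `mirror_site` is a hypothetical helper name (not part of wget), and it assumes the domain and start URL are passed as the first and second argument. The options are the same ones listed above, one per line.

```shell
# Sketch: wrap the wget invocation above in a reusable function.
# mirror_site is a hypothetical name chosen for this example.
#   $1: domain to stay within, e.g. website.org
#   $2: start URL, e.g. www.website.org/tutorials/html/
mirror_site() {
  wget \
    --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --restrict-file-names=windows \
    --domains "$1" \
    --no-parent \
    "$2"
}

# Usage (commented out so that sourcing this file downloads nothing):
# mirror_site website.org www.website.org/tutorials/html/
```

Quoting "$1" and "$2" keeps URLs that contain special characters such as ? or & from being split or interpreted by the shell.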

Source: http://www.linuxjournal.com/content/downloading-entire-web-site-wget

Norcoen commented Mar 19, 2021

andre-integritas commented on 2 Aug 2016

I had to add "-e robots=off" for it to work on this specific site. Thanks!
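
For reference, wget honors a site's robots.txt during recursive downloads by default, and `-e robots=off` turns that behavior off. A minimal sketch of the variant described in this comment, assuming the same options as the original command; `mirror_site_no_robots` is a hypothetical name used only for this example:

```shell
# Sketch: the original command with robots.txt handling switched off.
# mirror_site_no_robots is a hypothetical name chosen for this example.
#   $1: domain to stay within, e.g. website.org
#   $2: start URL, e.g. www.website.org/tutorials/html/
mirror_site_no_robots() {
  wget \
    -e robots=off \
    --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --restrict-file-names=windows \
    --domains "$1" \
    --no-parent \
    "$2"
}

# Usage (commented out so that sourcing this file downloads nothing):
# mirror_site_no_robots website.org www.website.org/tutorials/html/
```

Since robots.txt is how a site asks crawlers to stay out of certain paths, this is best kept for sites you are allowed to mirror.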
