Download an entire website with wget, along with assets.
# One liner
wget --recursive --page-requisites --adjust-extension --span-hosts --convert-links --restrict-file-names=windows --domains yoursite.com --no-parent yoursite.com
# Explained: the flag descriptions are given as comments above the command,
# because inline "\ # comment" annotations would break the shell's line continuation.
# --recursive                     Download the whole site.
# --page-requisites               Get all assets/elements (CSS/JS/images).
# --adjust-extension              Save files with .html on the end.
# --span-hosts                    Include necessary assets from offsite as well.
# --convert-links                 Update links to still work in the static version.
# --restrict-file-names=windows   Modify filenames to work in Windows as well.
# --domains yoursite.com          Do not follow links outside this domain.
# --no-parent                     Don't follow links outside the directory you pass in.
wget \
    --recursive \
    --page-requisites \
    --adjust-extension \
    --span-hosts \
    --convert-links \
    --restrict-file-names=windows \
    --domains yoursite.com \
    --no-parent \
    yoursite.com/whatever/path # The URL to download
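As a usage sketch (example.org is a placeholder host): by default wget writes a recursive download into a directory named after the host, so you can open the saved copy from there:

# example.org is a placeholder; the mirror lands under ./example.org/
wget --recursive --page-requisites --adjust-extension --span-hosts \
     --convert-links --restrict-file-names=windows \
     --domains example.org --no-parent example.org
# then open the local copy in a browser, e.g.
# firefox example.org/index.html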
realowded commented Aug 29, 2018

sudo apt-get update

Celestine-Nelson commented Jul 8, 2019

Hello, good afternoon. I still don't know how to use it to download the entire website.

Veracious commented Aug 19, 2019

> Hello, good afternoon. I still don't know how to use it to download the entire website.

This is just using wget; look up how to use wget and you'll find tons of examples online.

Either way, you need to make sure you have wget installed already:
Debian/Ubuntu:
sudo apt-get install wget

CentOS/RHEL:
yum install wget

Here are some usage examples to download an entire site:
convert links for local viewing:
wget --mirror --convert-links --page-requisites --no-parent -P /path/to/download/to https://example-domain.com

without converting:
wget --mirror --page-requisites --no-parent -P /path/to/download/to https://example-domain.com

One more example to download an entire site with wget:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org

Explanation of the various flags:

--mirror – Makes (among other things) the download recursive.
--convert-links – Converts all the links (including to assets like CSS stylesheets) to relative, so the copy is suitable for offline viewing.
--adjust-extension – Adds suitable extensions to filenames (html or css) depending on their content type.
--page-requisites – Downloads things like CSS stylesheets and images required to properly display the page offline.
--no-parent – When recursing, does not ascend to the parent directory. It is useful for restricting the download to only a portion of the site.
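As a small sketch of --no-parent in practice (the URL is a placeholder): starting the download inside a subdirectory restricts the mirror to that section of the site:

# Mirror only the /docs/ section; --no-parent stops wget from
# climbing above /docs/ while recursing. Placeholder URL.
wget --mirror --convert-links --adjust-extension --page-requisites \
     --no-parent https://example.org/docs/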

Alternatively, the command above may be shortened:
wget -mkEpnp http://example.org
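Here -m is --mirror, -k is --convert-links, -E is --adjust-extension, -p is --page-requisites, and -np is --no-parent.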

If you still insist on running this script: it is a Bash script, so first set it as executable:
chmod u+x wget.sh

and then this to run the script:
./wget.sh

If you still can't run the script, edit it and add this as the first line:
#!/bin/sh

Also you need to specify the site in the script that you want to download. At this point you are really better off just using wget outright.
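For reference, here is a minimal sketch of what such a wget.sh could look like; taking the site as a command-line argument is an assumption on my part, since the gist itself hard-codes the URL:

#!/bin/bash
# Minimal sketch; passing the site as "$1" is an assumption,
# the original gist hard-codes yoursite.com instead.
set -eu
site="${1:?usage: ./wget.sh yoursite.com}"
wget --recursive --page-requisites --adjust-extension --span-hosts \
     --convert-links --restrict-file-names=windows \
     --domains "$site" --no-parent "$site"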

vasili111 commented Nov 17, 2019

@Veracious

  1. What about --span-hosts? Should I use it?
  2. Why use --mirror instead of --recursive?
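For what it's worth, the wget manual describes --mirror as shorthand for -r -N -l inf --no-remove-listing, and --span-hosts only matters when assets live on other hosts. A small sketch of the difference (URLs are placeholders):

# --mirror = -r -N -l inf --no-remove-listing: recursion plus timestamping,
# with no depth limit (plain --recursive defaults to a depth of 5).
wget --mirror https://example.org

# --span-hosts lets wget fetch requisites from other hosts (e.g. a CDN);
# pair it with --domains to keep the crawl bounded.
wget --recursive --page-requisites --span-hosts \
     --domains example.org,cdn.example.org https://example.org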