Download an entire website with wget, along with assets.
# One liner
wget --recursive --page-requisites --adjust-extension --span-hosts --convert-links --restrict-file-names=windows --domains yoursite.com --no-parent yoursite.com
# Explained
# (A comment cannot follow a trailing backslash, so the flags live in a bash array.)
args=(
  --recursive                    # Download the whole site.
  --page-requisites              # Get all assets/elements (CSS/JS/images).
  --adjust-extension             # Save files with .html (or .css) on the end, based on Content-Type.
  --span-hosts                   # Allow other hosts, still limited to the --domains list below.
  --convert-links                # Update links to still work in the static version.
  --restrict-file-names=windows  # Modify filenames to work in Windows as well.
  --domains yoursite.com         # Do not follow links outside this domain.
  --no-parent                    # Don't ascend above the directory you pass in.
)
wget "${args[@]}" yoursite.com/whatever/path  # The URL to download.
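In the explained command, --span-hosts is combined with --domains yoursite.com. As I read the wget manual, the --domains list still limits which hosts --span-hosts may visit, so assets served from a third-party CDN would probably need their host added to that comma-separated list. A minimal POSIX sh sketch for building the value; `url_host`, `domains_list` and `cdn.example.net` are hypothetical names made up for this example, and the URL parsing only handles simple URLs:

```shell
#!/bin/sh
# Hypothetical helpers (not part of wget or the original gist).

# Print the host of a simple URL, with or without a scheme and port:
#   https://yoursite.com:8080/whatever/path -> yoursite.com
#   yoursite.com/whatever/path              -> yoursite.com
url_host() {
  host=${1#*://}     # drop the scheme, if any
  host=${host%%/*}   # drop the path
  host=${host%%:*}   # drop a port, if any
  printf '%s\n' "$host"
}

# Join the site's host and any extra asset hosts into one --domains value.
domains_list() {
  list=$(url_host "$1")
  shift
  for extra in "$@"; do
    list="$list,$extra"
  done
  printf '%s\n' "$list"
}
```

For example, `--span-hosts --domains "$(domains_list https://yoursite.com/whatever/path cdn.example.net)"` expands to `--domains yoursite.com,cdn.example.net`.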
@realowded commented Aug 29, 2018

sudo apt-get update

@Celestine-Nelson commented Jul 8, 2019

Hello, good afternoon. Please, I still don't know how to use this to download the entire website.

@Veracious commented Aug 19, 2019

> hello good afternoon...please still don't know how to use it...to download the entire website

This just uses wget, so look up how to use wget; there are tons of examples online.

Either way, make sure wget is installed first:
Debian/Ubuntu:
sudo apt-get install wget

CentOS/RHEL:
sudo yum install wget
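To skip the install step when wget is already present, a small sketch can check with `command -v` and fall back to whichever of the two package managers above exists. The function name `ensure_wget` is made up for this example, and it assumes sudo is available:

```shell
#!/bin/sh
# Hypothetical helper: install wget only if it is not already on the PATH.
# Assumes sudo is available and the system uses apt-get or yum.
ensure_wget() {
  if command -v wget >/dev/null 2>&1; then
    return 0                                             # already installed
  elif command -v apt-get >/dev/null 2>&1; then
    sudo apt-get update && sudo apt-get install -y wget  # Debian/Ubuntu
  elif command -v yum >/dev/null 2>&1; then
    sudo yum install -y wget                             # CentOS/RHEL
  else
    echo "wget not found; install it with your package manager" >&2
    return 1
  fi
}
```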

Here are some usage examples to download an entire site:
Convert links for local viewing:
wget --mirror --convert-links --page-requisites --no-parent -P /path/to/download/to https://example-domain.com

Without converting links:
wget --mirror --page-requisites --no-parent -P /path/to/download/to https://example-domain.com
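The two commands above differ only in --convert-links, so a sketch can take that as a switch. `mirror_to` and the `DRY_RUN` variable are hypothetical names for this example; with `DRY_RUN` set, the function only prints the wget command instead of running it:

```shell
#!/bin/sh
# Hypothetical helper: mirror URL ($2) into directory ($1).
# Pass "no" as the third argument to skip link conversion.
# Set DRY_RUN=1 to print the wget command instead of running it.
mirror_to() {
  dest=$1
  url=$2
  convert=${3:-yes}
  set -- --mirror
  if [ "$convert" = yes ]; then
    set -- "$@" --convert-links        # rewrite links for local viewing
  fi
  set -- "$@" --page-requisites --no-parent -P "$dest"
  ${DRY_RUN:+echo} wget "$@" "$url"    # "echo wget ..." when DRY_RUN is set
}
```

For example, `DRY_RUN=1 mirror_to /path/to/download/to https://example-domain.com no` prints the second command above without downloading anything.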

One more example to download an entire site with wget:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org

Explanation of the various flags:

--mirror – Makes (among other things) the download recursive.
--convert-links – Converts all links (including links to things like CSS stylesheets) to relative ones, so the copy is suitable for offline viewing.
--adjust-extension – Adds suitable extensions to filenames (.html or .css) depending on their Content-Type.
--page-requisites – Downloads things like CSS stylesheets and images required to display the page properly offline.
--no-parent – When recursing, does not ascend to the parent directory. This is useful for restricting the download to only a portion of the site.

Alternatively, the command above may be shortened:
wget -mkEpnp http://example.org

If you still insist on running this as a script, note that it is a Bash script, so first make it executable:
chmod u+x wget.sh

Then run it with:
./wget.sh

If you still can't run it, add this as the very first line of the script:
#!/usr/bin/env bash

You also need to put the site you want to download into the script. At that point you are really better off just running wget directly.
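If you do want a script, one way around editing it for every site is to take the URL as an argument. A minimal sketch of such a wget.sh, using the flags from the example above; the argument check, the `main` function and the `DRY_RUN` variable are my own additions for illustration, not part of the gist:

```shell
#!/bin/sh
# wget.sh (sketch): mirror the site given as the only argument.
# Usage: ./wget.sh https://example.org/
# Set DRY_RUN=1 to print the wget command instead of running it.
main() {
  if [ $# -ne 1 ]; then
    echo "usage: $0 URL" >&2
    return 2
  fi
  ${DRY_RUN:+echo} wget --mirror --convert-links --adjust-extension \
    --page-requisites --no-parent "$1"
}

main "$@"   # the script exits with main's status
```

After `chmod u+x wget.sh`, run it as `./wget.sh https://example.org/`.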

@vasili111 commented Nov 17, 2019

@Veracious

  1. What about --span-hosts? Should I use it?
  2. Why use --mirror instead of --recursive?
@cdamken commented Feb 17, 2020

2: ‘--mirror’
Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to ‘-r -N -l inf --no-remove-listing’.

@YubinXie commented Apr 13, 2020

Thanks for the tips. After I download the website, every time I open the file, it links back to its original website. Any idea how to solve this? Thanks!

@tloudon commented May 2, 2020

@vasili111 commented May 2, 2020

@YubinXie

Maybe you need the --convert-links option?
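Per the wget manual, --convert-links only rewrites links once the whole download has finished, and links to files wget did not download are turned into absolute URLs, so some pages can still point back to the original site. A rough sketch for spotting such pages; `find_unconverted` is a hypothetical helper, and it assumes pages were saved with a .html extension (for example via --adjust-extension):

```shell
#!/bin/sh
# Hypothetical helper: list saved .html files under DIR ($1) that still
# contain the original site's address ($2), i.e. links left absolute.
find_unconverted() {
  find "$1" -type f -name '*.html' -exec grep -lF -- "$2" {} +
}
```

For example, `find_unconverted ./example.org https://example.org` lists the pages in the downloaded example.org directory that still reference the live site.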

@polly4you commented Sep 16, 2020

Hi! If I am wrong you can virtually shoot me, but I think the no-parent option has a typo: when I tried ----no-parent it was not recognized, but after some surgery I ended up with --no-parent and it worked. So if I am right, cool; if I am wrong, I am sorry:

YS: polly4you
