@manifestuk
Last active January 14, 2022 04:15
Gist for retrieving a full website using wget, because I always forget the options.
#
# Explanation:
# `--adjust-extension`
# Add `.html` file extension to any files of type `application/xhtml+xml` or `text/html` that lack it.
# Add `.css` file extension to any files of type `text/css`.
#
# `--convert-links`
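# After the download, rewrite links in the saved documents so they work for local viewing
# (links to files that were downloaded become relative).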
#
# `--level=inf` (`-l inf`)
# Descend an infinite number of levels.
#
# `--mirror` (`-m`)
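# Mirror the source: turn on recursion, infinite depth and timestamping, so that on later
# runs only files that are newer on the server are downloaded again.
# Equivalent to `-r -N -l inf --no-remove-listing`.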
#
# `--no-parent` (`-np`)
# Do not ascend to the parent directory.
#
# `--page-requisites` (`-p`)
# Download everything needed to display each page (images, stylesheets etc.).
#
# `--random-wait`
# Wait for (0.5 * `wait`) to (1.5 * `wait`) between requests.
#
# `--recursive` (`-r`)
# Recursively download the files.
#
# `--wait=1` (`-w 1`)
# Wait for 1 second between requests (randomised by `--random-wait`).
#
wget \
--adjust-extension \
--convert-links \
--level=inf \
--mirror \
--no-parent \
--page-requisites \
--random-wait \
--recursive \
--wait=1 \
http://example.com/
# The short version...
wget -E -k -l inf -m -np -p --random-wait -r -w 1 http://example.com/
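A possible way to avoid retyping the options is to wrap them in a small shell function. This is only a sketch: `build_mirror_args` and `mirror_site` are hypothetical names, not part of wget or of the original gist. It leaves out `--recursive` and `--level=inf` on the assumption that `--mirror` already turns both on, and it makes the wait time an optional argument.

```shell
# Hypothetical helper (not part of wget): print the mirroring options,
# one per line. $1 is the wait in seconds between requests (default 1).
build_mirror_args() {
  wait_secs="${1:-1}"
  printf '%s\n' \
    --adjust-extension \
    --convert-links \
    --mirror \
    --no-parent \
    --page-requisites \
    --random-wait \
    "--wait=${wait_secs}"
}

# Hypothetical wrapper. Usage: mirror_site URL [WAIT_SECONDS]
mirror_site() {
  url="$1"
  # Word splitting is intentional here: each printed line becomes one argument.
  # shellcheck disable=SC2046
  wget $(build_mirror_args "${2:-1}") "$url"
}
```

For example, `mirror_site http://example.com/ 2` would run the same download as above with a base wait of 2 seconds, and `build_mirror_args 2` on its own prints the options without contacting any server.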