There is no better utility than wget to recursively download interesting files from the depths of the internet. I will show you why that is the case. From: https://blog.sleeplessbeastie.eu/2017/02/06/how-to-download-files-recursively/
Simply download files recursively. Note that the default maximum depth is set to 5.
$ wget --recursive https://example.org/open-directory/
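Since the default maximum depth is 5, the command above behaves the same as passing --level 5 explicitly:
$ wget --recursive --level 5 https://example.org/open-directory/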
Download files recursively using a defined maximum recursion depth. It is important to remember that level 0 is equivalent to inf (infinite recursion).
$ wget --recursive --level 1 https://example.org/files/presentation/
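To remove the depth limit entirely, pass level 0 or its alias inf; the following should be equivalent:
$ wget --recursive --level inf https://example.org/files/presentation/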
Download files recursively and specify a directory prefix. If not specified, files are stored in the current directory by default.
$ wget --recursive --directory-prefix=/tmp/wget/ https://example.org/open-directory/
Download files recursively but do not ascend to the parent directory.
$ wget --recursive --no-parent https://example.org/files/presentation/
Download files recursively, do not ascend to the parent directory and set the User-Agent header field if you need to circumvent user-agent based filtering on the server.
$ wget --recursive --no-parent --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0" https://example.org/files/presentation/
Download files recursively, do not ascend to the parent directory and reject index.html files.
$ wget --recursive --no-parent --reject "index.html*" https://example.org/files/presentation/
Download files recursively, do not ascend to the parent directory and accept only PDF files.
$ wget --recursive --no-parent --accept "*.pdf" https://example.org/files/presentation/
Download files recursively but ignore robots.txt file as it sometimes gets in the way.
$ wget --recursive --execute robots=off https://example.org/
Download files recursively, do not ascend to the parent directory and wait around 10 seconds between requests (--random-wait varies the pause between 0.5 and 1.5 times the --wait value).
$ wget --recursive --no-parent --wait 10 --random-wait https://example.org/files/presentation/
Download files recursively but limit the retrieval rate to 250KB/s.
$ wget --recursive --limit-rate=250k https://example.org/files/
Download files recursively, do not ascend to the parent directory, accept only PDF and PNG files, but do not create any directories. Every downloaded file will be stored in the current directory.
$ wget --recursive --no-parent --accept "*.pdf,*.png" --no-directories https://example.org/files/presentation/
Download files recursively but do not create the example.org host-prefixed directory.
$ wget --recursive --no-host-directories https://example.org/files/
Download files recursively using a defined username and password.
$ wget --recursive --user="username" --password="password" https://example.org/
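If you would rather not put the password on the command line (where it lands in shell history), wget can prompt for it instead via --ask-password:
$ wget --recursive --user="username" --ask-password https://example.org/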
Download files recursively, do not ascend to the parent directory, do not create a host-prefixed directory and ignore two directory components. It will store only the first-presentation directory with the downloaded content.
$ wget --recursive --no-parent --no-host-directories --cut-dirs=2 https://example.org/files/presentation/first-presentation/
Download files recursively using only IPv4 or IPv6 addresses.
$ wget --recursive --inet4-only https://example.org/notes.html
$ wget --recursive --inet6-only https://example.org/notes.html
Continue a download started by a previous instance of wget (continue retrieval from an offset equal to the length of the local file).
$ wget --recursive --continue https://example.org/notes.html
Continue a download started by a previous instance of wget (skip files that already exist).
$ wget --recursive --no-clobber https://example.org/notes.html
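The options above combine freely. As a closing sketch, a polite recursive grab of only PDF files into /tmp/wget/, reusing the example.org layout from the earlier examples:
$ wget --recursive --no-parent --accept "*.pdf" --wait 10 --random-wait --limit-rate=250k --directory-prefix=/tmp/wget/ https://example.org/files/presentation/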