@BartlomiejSkwira
Last active March 1, 2018 08:16
Wget is a command-line utility that retrieves files over the HTTP, HTTPS, and FTP protocols. Since websites are served over HTTP or HTTPS and most web media files are reachable through those protocols or FTP, Wget is an excellent tool for ripping websites.
While Wget is typically used to download single files, it can be used to recursively download all pages and files that are found through an initial page:
wget -r -p https://www.makeuseof.com
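As a rough sketch of what the short flags mean, the same command can be spelled with wget's long option names. The `--level=2` and `--no-parent` options are not part of the command above; they are added here as an assumption about what you usually want when ripping a single site. `mirror_cmd` is a hypothetical helper that only prints the command so you can review it before running anything.

```shell
#!/bin/sh
# mirror_cmd is a hypothetical helper: it prints the wget command for a URL
# instead of running it, so nothing is downloaded until you decide to.
mirror_cmd() {
    # --recursive (-r):       follow links found on each downloaded page
    # --page-requisites (-p): also fetch the images, CSS, and scripts pages need
    # --level=2 (assumption): stop after two levels of links (wget's default is 5)
    # --no-parent (assumption): never climb above the starting directory
    printf 'wget --recursive --page-requisites --level=2 --no-parent %s\n' "$1"
}

mirror_cmd "https://www.makeuseof.com"
```

Once the printed command looks right, paste it into your terminal to start the download.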
However, some sites may detect and prevent what you’re trying to do because ripping a website can cost them a lot of bandwidth. To get around this, you can disguise yourself as a web browser with a user agent string:
wget -r -p -U Mozilla https://www.makeuseof.com
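A bare "Mozilla" may not satisfy every server, so you might send a full browser-style string instead. The sketch below assumes a Firefox-like string purely as an example, and `ua_args` is a hypothetical helper. It prints one argument per line to show that a quoted user agent stays a single argument even though it contains spaces; without the quotes, wget would treat the extra words as URLs.

```shell
#!/bin/sh
# Example browser-like user agent (an assumption; substitute your own browser's).
UA='Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0'

# ua_args is a hypothetical helper: it prints each argument wget would
# receive on its own line, without contacting any server.
ua_args() {
    printf '%s\n' wget -r -p --user-agent="$UA" "$1"
}

ua_args "https://www.makeuseof.com"
```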
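If you want to be polite, you should also limit your download speed (so you don’t hog the web server’s bandwidth) and pause between downloads (so you don’t overwhelm the web server with too many requests). Here, --wait=10 pauses 10 seconds between requests and --limit-rate=35K caps the transfer at about 35 kilobytes per second: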
wget -r -p -U Mozilla --wait=10 --limit-rate=35K https://www.makeuseof.com
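If you rip sites often, the polite options can be wrapped in a small function. This is only a sketch: `polite_mirror` and the `DRY_RUN` variable are hypothetical names, and `--random-wait` is an addition not in the command above (it makes wget vary the pause around the `--wait` value so the requests look less mechanical).

```shell
#!/bin/sh
# polite_mirror is a hypothetical wrapper around the command shown above.
# With DRY_RUN set, it prints the command instead of running it.
polite_mirror() {
    url=$1
    # Rebuild the positional parameters as the full wget command line.
    # --random-wait is an assumption added on top of the original flags.
    set -- wget -r -p -U Mozilla --wait=10 --random-wait --limit-rate=35K "$url"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Preview the command without downloading anything.
DRY_RUN=1 polite_mirror "https://www.makeuseof.com"
```

Drop the `DRY_RUN=1` prefix to actually start the download.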
Wget comes bundled with most Unix-based systems. On Mac, you can install Wget using a single Homebrew command: brew install wget (this requires Homebrew to be set up first). On Windows, you’ll need to use a ported version instead.
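Before running any of the commands above, you can check whether Wget is already on your system. A minimal sketch, where `have_cmd` is a hypothetical helper name:

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: it succeeds if the named command
# can be found by the shell, and fails otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd wget; then
    # Show the first line of wget's version banner.
    wget --version | head -n 1
else
    echo "wget not found; on a Mac with Homebrew, try: brew install wget"
fi
```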