GNU’s wget is a command-line tool to download files over HTTP(S) and FTP. While curl is great for sending custom requests, it lacks a recursive mode to download all the resources linked from a page or domain. This is where wget is most useful.
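In its simplest form, wget fetches a single URL and saves it to the current directory; the URL below is just a placeholder:
$ wget https://example.com/archive.zip
# Resume an interrupted download with -c
$ wget -c https://example.com/archive.zip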
1. Copy a whole site locally, including images, CSS and JS, converting links so they work offline:
$ wget -p -m -k fullweb.io
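The same command with long-form options, plus --no-parent to stop wget from climbing above the starting directory (a sketch; swap in your own URL):
$ wget --page-requisites --mirror --convert-links \
       --no-parent https://fullweb.io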
2. Check a list of URLs for broken (404) links, reading the list from a file with -i:
$ wget --spider -i your-url-list.txt
# Give it a local HTML page instead by adding -F
$ wget --spider -F -i your-webpage.html
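To crawl a live site and collect the broken links in one place, --spider can be combined with recursion and a log file (a sketch; example.com is a placeholder and the exact log wording varies between wget versions):
$ wget --spider -r -nv -o spider.log https://example.com
# Broken links are flagged in the log and summarized at the end
$ grep -i 'broken link' spider.log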
3. Accept (-A) or reject (-R) certain file types. Here, only the images from the site are kept:
$ wget -p -A png,jpg,jpeg,gif -R html,css,js wikipedia.org
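The same accept list also works with a recursive crawl, e.g. to collect every PDF up to two levels deep into a single directory (a sketch; the URL, depth and output directory are placeholders):
$ wget -r -l 2 -A pdf -nd -P pdfs/ https://example.com/docs/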