Found this resource on Created By Pete.
First, you'll need to make sure you have Wget installed; on OS X you can install it with Homebrew:
brew install wget
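A quick way to confirm the install worked is to ask Wget for its version number; any reasonably recent release supports the flags used below:

wget --version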
wget --spider -o ~/wget.log -e robots=off -w 1 -r -p http://www.example.com
Breakdown
--spider, this tells Wget not to download anything; since we only want a report, it will only make a HEAD request, not a GET.
-o ~/wget.log, log messages to the declared file, in this case a file called wget.log saved to your home directory; you can change this to a more convenient location and filename.
-e robots=off, this tells Wget to ignore the robots.txt file. Learn more about robots.txt.
-w 1, adds a 1-second wait between requests; this slows Wget down to a more consistent rate to minimise stress on the hosting server, so you don't get back any false positives.
-r, this means recursive, so Wget will keep following links deeper into your site until it can find no more. If you want to cap how deep it goes, see the variant after this list.
-p, get all page requisites, such as images, needed to display the HTML page, so we can find broken image links too.
http://www.example.com, finally, the website URL to start from.
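On a large site a full recursive crawl can take a while. As a sketch of one way to bound it (assuming a depth of 3 is enough to reach all your internal links), Wget's -l/--level flag limits how many levels deep the recursion goes, and -np/--no-parent stops it climbing above the starting directory:

wget --spider -o ~/wget.log -e robots=off -w 1 -r -l 3 -np -p http://www.example.com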
grep -B 2 '404' ~/wget.log
You'll get any references to pages that caused a 404 error. It won't show you the pages the links originated on, but it gives you a starting place. If you'd like to find other errors, you can replace 404 with 500, etc.
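If you want to scan for several error classes in one pass, one option (a sketch; the exact status text logged depends on what the server returns) is to grep with an extended regular expression instead:

grep -B 2 -E ' (404|500|502|503) ' ~/wget.log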
Here is the manual for Wget if you want to dig deeper.