Use wget to find broken links.

Found this resource on Created By Pete.

Set Up

First, you'll need to make sure you have Wget installed; on OS X you can install it with Homebrew.

brew install wget
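
You can confirm the install worked by checking the version:

wget --version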

Command

wget --spider -o ~/wget.log -e robots=off -w 1 -r -p http://www.example.com

Breakdown

  • --spider, this tells Wget not to download anything; since we only want a report, it issues a HEAD request rather than a GET wherever it can.
  • -o ~/wget.log, log messages to the declared file, in this case a file called wget.log saved to your home directory. You can change this to a more convenient location and filename.
  • -e robots=off, this tells Wget to ignore the robots.txt file. Learn more about robots.txt.
  • -w 1, adds a 1-second wait between requests. This slows Wget down to a more consistent rate, minimising stress on the hosting server so you don't get back any false positives.
  • -r, this means recursive, so Wget will keep following links deeper into your site until it can find no more; a depth-limited variation is sketched after this list.
  • -p, get all page requisites, such as images, needed to display the HTML page, so we can find broken image links too.
  • http://www.example.com, finally, the URL of the website to start from.
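
On a large site a full recursive crawl can take a long time. A minimal variation, if you only want to check a few levels deep: -l caps the recursion depth (the 3 here is just an example value) and --no-parent keeps Wget from climbing above the starting directory.

wget --spider -o ~/wget.log -e robots=off -w 1 -r -l 3 --no-parent -p http://www.example.com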

Reading the Log

grep -B 2 '404' ~/wget.log

This will show you any references to pages that returned a 404 error. It won't show you the pages the links originated on, but it gives you a starting point. If you'd like to find other errors, you can substitute 404 with 500 and so on.
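
The -B 2 flag prints two lines of leading context, because in Wget's log the requested URL appears a line or two before the status line rather than on it. To pull out just the failing URLs, you can extend the pipeline; a minimal sketch, assuming the usual log layout (it can vary slightly between Wget versions):

grep -B 2 '404' ~/wget.log | grep -o 'http[^ ]*' | sort -u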

Here is the manual for wget.
