@benjaminblack
Last active September 22, 2015 14:16
Counting unique 404s in Apache log files
Tl;dr:
gunzip --to-stdout logfile-*.gz | grep " 404 " | cut -d " " -f 7 | sort | uniq -c | sort --numeric-sort --reverse > unique-404s.txt
Start with a set of gzipped Apache log files matching a glob such as "logfile-*.gz".
First, decompress them all to stdout:
gunzip --to-stdout logfile-*.gz
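Reduce output to just 404s (the status code is the 6th item in the Common Log Format, but the 9th space-delimited field, because the timestamp and the request line contain spaces):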
| grep " 404 "
Extract the request path (the 7th space-delimited field):
| cut -d " " -f 7
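Note that the grep above matches " 404 " anywhere on the line, so in the combined log format a 200 response whose byte count happens to be 404 would be counted too. A minimal sketch of a stricter filter, assuming the Apache common or combined log format with no spaces inside the request path; the function names `paths_with_status` and `sample_log` and the sample lines are made up for illustration:

```shell
# Print the request path (field 7) of every line whose status code
# (field 9) equals the given code. Assumes Apache common/combined format.
paths_with_status() {
    awk -v code="$1" '$9 == code { print $7 }'
}

# Hypothetical sample lines, not taken from a real log.
# The second line is a 200 response that happens to be 404 bytes long.
sample_log() {
    cat <<'EOF'
10.0.0.1 - - [22/Sep/2015:14:16:00 +0000] "GET /missing HTTP/1.1" 404 209
10.0.0.2 - - [22/Sep/2015:14:16:01 +0000] "GET /index.html HTTP/1.1" 200 404 "-" "curl/7.43.0"
10.0.0.3 - - [22/Sep/2015:14:16:02 +0000] "GET /gone HTTP/1.1" 404 209
EOF
}

sample_log | paths_with_status 404
```

On this sample the awk filter prints only /missing and /gone, while grep " 404 " would also let the /index.html line through.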
Sort lexicographically, to group identical requests:
| sort
Collapse each run of identical lines into a single line, prefixed with the number of times it occurred:
| uniq -c
Sort again, numerically and in reverse order, so the most frequently requested missing paths come first:
| sort --numeric-sort --reverse
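The first sort matters because uniq only collapses adjacent duplicates. A small sketch of the counting stages on their own, using made-up paths and a hypothetical function name `count_desc`:

```shell
# Group identical lines, count each group, then list the largest counts first.
count_desc() {
    sort | uniq -c | sort --numeric-sort --reverse
}

# Hypothetical input: /a appears three times, /b twice, /c once.
printf '%s\n' /a /b /a /c /a /b | count_desc
```

This lists /a with a count of 3, then /b with 2, then /c with 1 (the exact padding before each count depends on the uniq implementation). Without the first sort, uniq -c would report every one of the six input lines separately with a count of 1, since no two identical lines are adjacent.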
And finally, redirect to a file:
> unique-404s.txt
All together then:
gunzip --to-stdout logfile-*.gz | grep " 404 " | cut -d " " -f 7 | sort | uniq -c | sort --numeric-sort --reverse > unique-404s.txt
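If the same report is needed for other status codes, the pipeline can be wrapped in a function. This is only a sketch under assumptions: the name `count_paths_by_status` is hypothetical, and it swaps the grep and cut stages for an awk test on the 9th whitespace-separated field (the status code in the common and combined log formats), which avoids matching a 404 that appears elsewhere on the line.

```shell
# Hypothetical helper: count request paths for one status code across
# gzipped Apache logs, most frequent first.
# Usage: count_paths_by_status CODE FILE.gz [FILE.gz ...]
count_paths_by_status() {
    code="$1"
    shift
    gunzip --to-stdout "$@" |
        awk -v code="$code" '$9 == code { print $7 }' |
        sort | uniq -c | sort --numeric-sort --reverse
}

# Example call (assumes logfile-*.gz exist in the current directory):
# count_paths_by_status 404 logfile-*.gz > unique-404s.txt
```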