@Boldewyn
Created August 28, 2013 09:27
Use bash-foo to print all Apache log 404s, ordered by how often they appeared
#!/bin/bash
LOGFOLDER=/var/log/apache2
FIELD=7
HTTP_CODE=404
LOG_BASENAME=access.log
cd "$LOGFOLDER"
# print all zipped logfiles (suppressing errors)
zcat "$LOG_BASENAME".*.gz 2>/dev/null | \
# print current logfile plus the previously zipped ones
cat "$LOG_BASENAME" - | \
# search for matching errors
grep 'HTTP/1\.." '"$HTTP_CODE" | \
# print the field containing the requested URL
awk '{ print $'"$FIELD"'; }' | \
# prepend each line with how often its URL has occurred so far (running count)
# @see <http://stackoverflow.com/a/15389597/113195>
awk '{ print ++count[$1] " " $0; }' | \
# sort by this number
sort -n | \
# and reverse order
tac | \
# remove every line whose URL has already appeared
awk '{if(order[$2]==0){order[$2]=1;print $0;}}' | \
# and sort again, so that the most frequent URLs land at the bottom
sort -n

The output looks something like this:

17 /missing.image
22 /moved-article
293 /
691 /wp-login
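
For comparison, the same kind of tally can probably be had with `sort | uniq -c`. This is only a minimal sketch, not the method the script uses: the helper name `count_404s` and the sample log lines are made up for illustration, and it assumes the combined log format, where the requested URL is field 7.

```shell
#!/bin/bash
# Hypothetical helper (not part of the gist): read access-log lines on stdin
# and print "<count> <url>" for every 404, least frequent first.
count_404s() {
    grep 'HTTP/1\.." 404' |
    awk '{ print $7 }' |
    sort |
    uniq -c |
    sort -n |
    # uniq -c pads the count with spaces; normalize to "<count> <url>"
    awk '{ print $1 " " $2 }'
}

# Made-up sample lines in Apache combined log format.
sample_log='127.0.0.1 - - [28/Aug/2013:09:00:00 +0200] "GET /wp-login HTTP/1.1" 404 209 "-" "curl"
127.0.0.1 - - [28/Aug/2013:09:00:01 +0200] "GET /missing.image HTTP/1.1" 404 209 "-" "curl"
127.0.0.1 - - [28/Aug/2013:09:00:02 +0200] "GET /wp-login HTTP/1.1" 404 209 "-" "curl"
127.0.0.1 - - [28/Aug/2013:09:00:03 +0200] "GET / HTTP/1.1" 200 512 "-" "curl"'

printf '%s\n' "$sample_log" | count_404s
```

On the sample lines this prints `1 /missing.image` followed by `2 /wp-login`; the request answered with 200 is filtered out by the `grep`.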
