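Shell one-liners to parse Apache access logs and extract a unique URL list with hit counts. The first command excludes the query string, the second keeps it, and the third writes the raw URL list without counting.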
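# Unique URLs with hit counts, query string stripped, most-requested first.
awk -F\" '{print $2}' access.log | awk '{print $2}' | sed '/^$/d' | sed 's/[?].*//' | sort | uniq -c | sort -rn > url_hits.txt
# Same as above, but query strings are kept, so /a?x=1 and /a?x=2 count separately.
awk -F\" '{print $2}' access.log | awk '{print $2}' | sed '/^$/d' | sort | uniq -c | sort -rn > url_with_paths_hits.txt
# Raw request URLs in log order, one per line, with no de-duplication or counting.
awk -F\" '{print $2}' access.log | awk '{print $2}' | sed '/^$/d' > raw_url_with_paths.txt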
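To see what the pipeline does on a small input, here is a minimal sketch that runs the same extraction against three made-up log lines. The sample log entries, the `sample_log` temp file, and the `count_urls` helper are all assumptions for illustration and are not part of the original gist. The helper also folds the two `awk` calls and both `sed` calls into one `awk` program, which is a substitution of mine rather than the original approach.

```shell
#!/bin/sh
# Hypothetical sample data: three requests in Apache combined log format.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html?ref=home HTTP/1.1" 200 2326 "-" "curl/8.0"
127.0.0.1 - - [10/Oct/2023:13:55:37 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/8.0"
127.0.0.1 - - [10/Oct/2023:13:55:38 +0000] "POST /login HTTP/1.1" 302 0 "-" "curl/8.0"
EOF

# count_urls FILE (hypothetical helper): print "count url" pairs, most hits first.
count_urls() {
    awk -F'"' '{
        split($2, req, " ")        # $2 is the request line: METHOD URL PROTOCOL
        url = req[2]               # second word of the request line is the URL
        sub(/[?].*/, "", url)      # drop the query string, if any
        if (url != "") print url   # skip malformed lines with no URL
    }' "$1" | sort | uniq -c | sort -rn
}

url_hits=$(count_urls "$sample_log")
printf '%s\n' "$url_hits"
rm -f "$sample_log"
```

With this input, `/index.html` is counted twice (the `?ref=home` variant collapses into it) and `/login` once. The width of the leading padding that `uniq -c` puts before each count differs between implementations, so anything that consumes the output should split on whitespace rather than rely on fixed columns.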