@ttlequals0
Created January 31, 2017 01:05
Scrape squid access logs for top requested sites.
awk '{gsub(/:443/," ",$7);print $7 }' \
  <(find '/var/log/squid3/' -type f -name 'access.log.*.gz' | xargs sudo zcat) \
  <(find '/var/log/squid3/' -type f -name 'access.log*' -not -name '*.gz' | xargs sudo cat) \
  | cut -d '/' -f 3 | sort | uniq -c | sort -nr
Output:
412451 c.signalsciences.net
206680 sigsci-agent-wafconf.s3.amazonaws.com
11483 us.archive.ubuntu.com
5659 api-62638203.duosecurity.com
3231 security.ubuntu.com
1904 archive.ubuntu.com
114 repo.mysql.com
74 api.rubygems.org
68 changelogs.ubuntu.com
55 pypi.python.org
44 repo.percona.com
24 packagecloud-repositories.s3.dualstack.us-west-1.amazonaws.com
24 apt.signalsciences.net
12 rubygems.org
5 pypi.python.org
4 nginx.org
4 github.com
3 google.com
2 mirror.cs.pitt.edu
2 mirror.cogentco.com
2 mirror.centos.org
2 facebook.com
1 www.google.com
1 www.espn.com
1 proxy.google.com
1 mirror.metrocast.net
1 mirrorlist.centos.org
1 download.opensuse.org
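
Note that `pypi.python.org` appears twice in the output above (55 and 5). One plausible cause: the `gsub` swaps `:443` for a space, so CONNECT targets such as `pypi.python.org:443` come out with a trailing space, and `cut` passes lines containing no `/` through unchanged. Those entries would then count separately from plain HTTP URLs for the same host. The following is a minimal sketch of a variant that normalizes both forms before counting. It assumes Squid's native log format, where field 7 is the URL or CONNECT target. The function name `top_hosts` is hypothetical and not part of the original gist.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original gist): read Squid native-format
# access-log lines on stdin and print request counts per host, highest first.
# Assumes field 7 is either a full URL or a CONNECT target (host:port).
top_hosts() {
  awk '{
    host = $7
    sub(/^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//, "", host)  # drop a leading scheme://
    sub(/\/.*$/, "", host)                          # drop the path, if any
    sub(/:[0-9]+$/, "", host)                       # drop a trailing :port
    if (host != "") print host
  }' | sort | uniq -c | sort -nr
}

# Example use with the same log locations as the command above:
#   { find /var/log/squid3/ -type f -name 'access.log.*.gz' | xargs sudo zcat
#     find /var/log/squid3/ -type f -name 'access.log*' -not -name '*.gz' | xargs sudo cat
#   } | top_hosts
```

Unlike the one-liner, this strips any numeric port rather than only `:443`, so hosts reached over non-standard ports are also merged into a single row.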