@enoch85
Created October 3, 2015 23:58
#!/bin/bash
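#
# Tech and Me 2015 - https://www.en0ch.se
# Counts unique visitor addresses in the nginx access log, ignoring known bots, spiders and crawlers.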
visitor="/var/log/nginx/access.log"
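# Scrape crawler names from the page's table cells, strip the <td> tags, drop short or ambiguous entries.
lynx -source http://www.reportsfromearth.com/1140/names-best-known-bots-spiders-crawlers-visiting-website-2014/ \
  | grep "<td>" \
  | grep -vE '("<td>"|><)' \
  | sed -e 's/<td>//g' -e 's|</td>||g' \
  | sort -u \
  | awk 'length >= 4' \
  | grep -ivE '(:|gif|wget|xget)' > .rr_tmp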
bnames=$(wc -l .rr_tmp | awk '{print $1}')
echo "found ${bnames} known crawlers"
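# -v inverts the match: keep only the log entries that match no crawler name.
grep -ivf .rr_tmp "${visitor}" > .rr1_tmp
rnames=$(wc -l .rr1_tmp | awk '{print $1}')
echo "found ${rnames} log entries not matching a known crawler"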
unames=$(awk '{print $1}' .rr1_tmp | sort -u | wc -l | awk '{print $1}')
echo "total ${unames} unique hits"
rm -f .rr_tmp .rr1_tmp
exit 0