@mathieu-aubin
Last active April 23, 2021 14:56
Grab User-Agents from access logs with typical structure
#!/bin/bash
zcat /var/log/*/*access*.gz | grep -Ev '(robots|humans).txt|favicon' | awk -F\" '($2 ~ "^GET /"){print $6}' | \
grep -Eiv "^-$|^$|Wordpress|^WOW$|wget|^curl.+|WhatsApp|Twitter|TOBBOT|GoogleBot|AdsBot|Baidu|Crawl|TurnitinBot|random|knowledge|smurl|thither|urlcheck|Traackr|Spider|^Z$|Zoom|chr\(|test|scrapy|ruby|SafeDNS|Research|Whatweb|semrush|seobility|slack|scan|yahoo|requests|reqwest|queue|serende|yakuza|zmeu|zoxh|xenu|semantic|siri|tagvisit|wapp|p40|PHP|cfnetwork|Pattern|python|seeker|scamadviser|\\x|pinterest|Pocket|thumbor|photon|null|okhttp|panscient|pa11y|OnalyticaBot|fetch|my_linux|powered|node\.js|newspaper|zgrab|gnowit|gzip|lighthouse|Datanyze|7777|MAUI|J2ME|seznam|proxy|detection|libwww|survey|hakai|nmap|go-http-client|PROBT|Nimbostratus|^Mozilla\/[[:digit:]].[[:digit:]][[:digit:]]?$|5.01682558|Mozilla\/5.0 Mozilla\/5.0|Zend|user-agent|symfony|poster|Writter|Rome|project25499|SMTB|^Mozilla [[:digit:]].[[:digit:]]|^Mozilla\/[[:digit:]].[[:digit:]]$|^Mozilla$|WebDav|Ahrefs|aiHitBot|CloudFlare|Dataprovider|DotBot|DuckDuckGo|ips-agent|Encrypt|MJ12|kubernetes|^Java|BUFF|http-?client|\.[[:digit:]]\(|bingbot|hello|fasthttp|8legs|b2b|parser|^Dalvik|akari|^asa$|Rift|Nakuma|^\'|Dispatch|LMAO|uptimebot|uptimerobot|archive\.org|vuhuvBot|tweezler|NET CLR|MAARJS|git\/|whisperapp|Jorgee|ibrowse|Monit\/|DomainStatsBot|facebook|ia_archiver|Microsoft |Mediatoolkit|HttpComponents|akka-http|appengine|bitlybot|Faraday|Magic|Less\.Browser|http-kit|goose|adbeat_bot|bidswitchbot|AHC|Kerrigan|^js$|^cisco$|GigablastOpenSource|^got\/|Intently|InetURL|\[en\] \(|\(en-us;\)|Windows$|Prlog|Qwantify|redditbot|bot\/|vkShare|WebDataStats|Pandalytics|yeti|Powermarks|coccocbot|preview|Google-|Genieo|EECS|OffByOne|Webster|Daum|jersey|jetty|\.$|^foo$|^dw$| bot |deeris|DangDang|com\.google\.|Crazy|InfoPath| \[|winhttp|Siteimprove|SoftPAE|000000|PUTNIK|MS Web|Hotbar|T312461|\(compatible |GStreamer|Lavf|; ;|Video Get|Streamium|Technisat|sentry\.io|RealMedia|re-re\.ru|no-ua|NativeHost|ANNUAIREFRANCAIS|^Motorola$|chicken|WAP-Browser|DDG-Android|en-us, en|WebSauger|^WISE$|WLMHttpTransport|Woodstone|unirest-java|12345|Video ?Saver|Video ?Cache|\(compatible;?\)|Grabber|SEOkicks|LinkChecker|BadooBot|GroupHigh|Apple-PubSub|AppleSyndication|Mediapartners-Google|Monit|ShortLinkTranslate|Analyzer|^robots$|ZEEF|scraper|^NCO$|print\(|Adblock|addthis\.com|Bidtellect|Blackboard|^codekeepers$|^contype$|Blocker|fuelbot|gigabot|Iframely|izabee|jb0mber|datenbutler|Buzzwords|adroll|^Dark$|go http|Dahua|^ApiTool$|^COMODO|PhantomJS|^Solstice|^ds9|chatlyio|houzzbot|evc-batch|G-i-g-a-b-o-t|UCBrowser|TopTenNews|^(PC|MOBILE|win7ie80|webconfs.com)$|WbSrch|MttHD|bayAgent|gvfs|SeMob|vilook|go(wiki|colly|lang|gs)|Polyco|mail\.ru|http\.rb|(eanbit|og-sit|ookretr|3ik|ffnode|8ball\.me|4ce\.ca|speedtest|tools update|2kc|^direct$|a-anal|outub|ajoo|gplas|pokup|adersdo|turn-|teq\.queb)" | \
sort | uniq
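
The awk stage is easier to follow on a single line. Below is a minimal sketch, assuming the logs use the common "combined" format, where the request is the first quoted string and the User-Agent is the third. Splitting on the double-quote character then puts the request in field 2 and the User-Agent in field 6. The sample line and the `extract_ua` helper name are made up for illustration and are not part of the gist.

```shell
#!/bin/bash
# Hypothetical sample line in the "combined" access-log format (assumed layout).
sample_line='203.0.113.7 - - [23/Apr/2021:14:56:00 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/88.0"'

# extract_ua (hypothetical name) reads log lines on stdin and prints the
# User-Agent of each GET request, using the same awk program as the script above.
# With '"' as the field separator:
#   $2 = request line, $4 = referer, $6 = User-Agent
extract_ua() {
  awk -F'"' '($2 ~ "^GET /"){print $6}'
}

# Print the User-Agent found in the sample line.
printf '%s\n' "$sample_line" | extract_ua
```

Lines whose request is not a GET (POST, HEAD, and so on) print nothing, so their User-Agents never reach the grep filter.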
@mathieu-aubin
Author

Can be run like so:

bash <(curl -sLk https://4ce.ca/grabua)