@hsiboy
Last active July 26, 2023 20:57
Bot-Buster™ - Tracks nefarious activity on website, and manages accordingly.

Bot-Buster™

Tracks nefarious activity on website, and manages accordingly.

It's probably a bot.

If the requesting entity:

  • declares its user-agent as being wget, curl, webcopier etc. - it's probably a bot.
  • requests details -> details -> details -> details ad nauseam - it's probably a bot.
  • requests the HTML, but not .css, .js or site furniture - it's probably a bot.
  • generates a large number of HTTP error codes of 400 or above (e.g. 401, 403, 404 and 500) - it's probably a bot.
  • originates from an unlikely human traffic source (e.g. Amazon AWS) - it's probably a bot.
  • sends no user-agent (or one matching a pattern of known bad ones) - it's probably a bot.
  • sends no cookie, and won't honor a set cookie - it's probably a bot.
  • sends no referrer, ever - it's probably a bot.
  • has sessions with a lot of hits - it's probably a bot.
  • makes requests with a missing referer - it's probably a bot.
  • makes requests with a missing sessionID - it's probably a bot.

Probable bots will be presented with a CAPTCHA-type page. Humans can confirm their cognisance; bots will be trapped.
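Several of the signals above only show up across a whole session rather than in a single request. The following is a minimal Python sketch of how such counters might be kept; the class name `SessionStats`, the asset suffix list and every threshold are assumptions made for illustration, not values taken from Bot-Buster itself.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed definition of "site furniture": static assets a real browser fetches.
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


@dataclass
class SessionStats:
    """Running counters for one client session (hypothetical structure)."""
    pages: int = 0
    assets: int = 0
    errors: int = 0
    referers_seen: int = 0

    def record(self, path: str, status: int, referer: Optional[str]) -> None:
        """Update the counters with one observed request."""
        if path.lower().endswith(ASSET_SUFFIXES):
            self.assets += 1
        else:
            self.pages += 1
        if status >= 400:
            self.errors += 1
        if referer:
            self.referers_seen += 1

    def reasons(self, max_pages: int = 50, max_error_ratio: float = 0.5) -> List[str]:
        """Return the heuristics this session trips (thresholds are guesses)."""
        total = self.pages + self.assets
        found = []
        if self.pages >= 5 and self.assets == 0:
            found.append("html without furniture")
        if total and self.errors / total > max_error_ratio:
            found.append("many HTTP errors")
        if total >= 5 and self.referers_seen == 0:
            found.append("no referer, ever")
        if self.pages > max_pages:
            found.append("too many pages in session")
        return found
```

A session that returns a non-empty list from `reasons()` would be a candidate for the CAPTCHA page; how many reasons are needed before challenging is a tuning decision this sketch leaves open.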

This will work at the top of the stack using the ZTM to "manage" the offender.

One more environment to consider: the corporate network.

Here you are likely to find many dozens or hundreds of users with the exact same OS, browser, plugins, fonts etc. Their IP addresses are likely to be the same too if the users are behind a corporate firewall.

JavaScript Detection:

window._phantom (or window.callPhantom, or navigator.onLine === false && navigator.plugins.length === 0) //phantomjs
window.__phantomas //PhantomJS-based web perf metrics + monitoring tool 
window.Buffer //nodejs
window.emit //couchjs
window.spawn  //rhino
window.webdriver //selenium
window.domAutomation (or window.domAutomationController) //chromium based automation driver
if (window.outerWidth === 0 && window.outerHeight === 0) { /* headless browser */ }

Create fingerprint, and store forever:

Current bitmap

X-Bot  X-BotBitMap       Threat
1      0000000000000001  No Cookie
2      0000000000000010  No Referer
4      0000000000000100  Bad User Agent
8      0000000000001000  Unlikely Human Traffic Source (AWS, Azure, etc)
16     0000000000010000  Known "Evasively Tricky" Source Country
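The current bitmap maps naturally onto a set of bit flags. As a rough Python sketch (the member names and the helpers `bot_headers` and `decode` are invented here; only the bit values and the two header names come from the table above):

```python
from enum import IntFlag
from typing import Dict, List


class Threat(IntFlag):
    """One bit per threat, using the values of the current bitmap."""
    NO_COOKIE = 1
    NO_REFERER = 2
    BAD_USER_AGENT = 4
    UNLIKELY_SOURCE = 8
    TRICKY_COUNTRY = 16


def bot_headers(threats: Threat) -> Dict[str, str]:
    """Render the combined flags as X-Bot (decimal) and X-BotBitMap (16 bits)."""
    value = int(threats)
    return {"X-Bot": str(value), "X-BotBitMap": format(value, "016b")}


def decode(x_bot: str) -> List[str]:
    """Turn an X-Bot value back into the names of the threats it encodes."""
    flags = Threat(int(x_bot))
    return [threat.name for threat in Threat if threat in flags]
```

For example, `bot_headers(Threat.NO_COOKIE | Threat.BAD_USER_AGENT)` gives `X-Bot: 5` and `X-BotBitMap: 0000000000000101`, and `decode("18")` names the No Referer and source-country bits.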

future bitmap:

X-Bot  X-BotBitMap       Threat
1      0000000000000001  No Cookie
2      0000000000000010  No Referrer
4      0000000000000100  User Agent Spoof (Headers don't match User-Agent String)
8      0000000000001000  Unlikely Human Traffic Source (AWS, Azure, etc)
16     0000000000010000  Known "Evasively Tricky" Source Country
32     0000000000100000  Unlikely Human Behaviour
64     0000000001000000  Browser Integrity (Not requesting furniture)
128    0000000010000000  Session Length Exceeded
256    0000000100000000  Pages Per Session Exceeded
512    0000001000000000  User Agent Spoof (Headers don't match User-Agent String)
1024   0000010000000000  Browser Integrity (Not requesting furniture)
2048   0000100000000000  Generates lots of errors (404s)
4096   0001000000000000  No JavaScript
8192   0010000000000000  JavaScript validation Failed
16384  0100000000000000  Fingerprint Validation Error
32768  1000000000000000  Known Automation (curl, wget, Selenium/WebDriver, PhantomJS)


  • Is it claiming to be HTTP/1.0? Then it shouldn't do HTTP/1.1 things
    • Expect: 100-continue
  • Is it claiming to be HTTP/1.1? Then it shouldn't do HTTP/1.0 things
    • Pragma: no-cache
    • with no Cache-Control header alongside it
  • Enforce RFC 2965 sec 3.3.5 (Cookie2) and 9 (HISTORICAL)
  • SQL injection
    • ;DECLARE%20@
    • SELECT
    • SLEEP
    • -- (that’s two dashes)
    • @@VERSION
    • VARCHAR
    • CHAR
    • EXEC
    • EXECUTE
    • DECLARE
    • CAST
  • Range: field exists and begins with 0, real user-agents do not start ranges at 0
  • Content-Range is a response header, not a request header
  • Via pinappleproxy || Via PCNETSERVER || Via Invisiware
  • keep-alive and close are mutually exclusive
  • Close shouldn't appear twice
  • Keep-Alive shouldn't appear twice either
  • “Proxy-Connection” does not exist and should never be seen in the wild
  • Referer, if it exists, must not be blank, and it must contain an absolute URL.
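A handful of the header checks in this list can be expressed as one small function. This is only a sketch in Python under assumptions: the function name `header_anomalies`, the message strings and the plain `dict` of headers are invented for illustration, and a dict cannot represent a header field that is sent twice, so repeats are only detected inside a single comma-separated `Connection` value.

```python
import re
from typing import Dict, List

BAD_VIA = ("pinappleproxy", "pcnetserver", "invisiware")


def header_anomalies(protocol: str, headers: Dict[str, str]) -> List[str]:
    """Return a description of each header rule from the list that is broken."""
    h = {name.lower(): value for name, value in headers.items()}
    found = []

    # An HTTP/1.0 client should not use the HTTP/1.1 "Expect: 100-continue".
    if protocol == "HTTP/1.0" and "100-continue" in h.get("expect", "").lower():
        found.append("HTTP/1.0 client sent Expect: 100-continue")

    # keep-alive and close are mutually exclusive, and neither should repeat.
    tokens = [t.strip().lower() for t in h.get("connection", "").split(",") if t.strip()]
    if "keep-alive" in tokens and "close" in tokens:
        found.append("Connection has both keep-alive and close")
    for token in ("keep-alive", "close"):
        if tokens.count(token) > 1:
            found.append("Connection repeats " + token)

    if "proxy-connection" in h:
        found.append("Proxy-Connection present")
    if "content-range" in h:
        found.append("Content-Range sent in a request")

    # Per the list above: a Range that starts at byte 0 is treated as suspect.
    if re.match(r"bytes=0\b", h.get("range", "").replace(" ", "").lower()):
        found.append("Range starts at 0")

    if any(name in h.get("via", "").lower() for name in BAD_VIA):
        found.append("Via names a known bad proxy")

    # Referer, when present, must be non-blank and look like an absolute URL.
    if "referer" in h and ":" not in h["referer"]:
        found.append("Referer is blank or not an absolute URL")

    return found
```

An empty result means none of these particular rules fired; it says nothing about the checks not covered here, such as the SQL injection strings.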
#!/pseudo/code

$ua = $headers['User-Agent'];

// Referer, if it exists, must contain a ":"
// While a relative URL is technically valid in Referer, all known legitimate user-agents send an absolute URL

if (isset($headers['Referer']) && strpos($headers['Referer'], ":") === FALSE) {
	return 400, "An invalid request was received from your browser. This may be caused by a malfunctioning proxy server or browser privacy software.";
}

// Analyze user agents claiming to be msnbot
// ($ua~"x" means "$ua contains x")
if ($ua~"bingbot" || $ua~"msnbot" || $ua~"MS Search") {
	CheckIp($headers['ip'], array("207.46.0.0/16", "65.52.0.0/14", "207.68.128.0/18", "207.68.192.0/20", "64.4.0.0/18", "157.54.0.0/15", "157.60.0.0/16", "157.56.0.0/14"));
}

// Analyze user agents claiming to be google
if ($ua~"Googlebot" || $ua~"Mediapartners-Google" || $ua~"Google Web Preview") {
	CheckIp($headers['ip'], array("66.249.64.0/19", "64.233.160.0/19", "72.14.192.0/18", "203.208.32.0/19", "74.125.0.0/16", "216.239.32.0/19", "209.85.128.0/17"));
	if ($headers['from'] == "googlebot(at)googlebot.com") {
		// google bot sends this
	}
}

// Analyze user agents claiming to be Yahoo
if ($ua~"Yahoo! Slurp" || $ua~"Yahoo! SearchMonkey") {
	CheckIp($headers['ip'], array("202.160.176.0/20", "67.195.0.0/16", "203.209.252.0/24", "72.30.0.0/16", "98.136.0.0/14", "74.6.0.0/16"));
}

if ($ua~"MSIE") {
	if ($ua~"Opera") {
		// Opera can claim to be MSIE; test that it sent an "Accept" header
		if ($headers['Accept']) { // looks like opera
			return "human"
		}
	} else {
		// MSIE does NOT send "Windows ME", "Windows XP", "Windows 2000" or "Win32" in the user agent
		if ($ua~"Windows ME" || $ua~"Windows XP" || $ua~"Windows 2000" || $ua~"Win32") {
			// this MSIE is a bot
			return "bot"
		}
	}
} elseif ($ua~"Konqueror") {
	// CafeKelsa appears to be a dev project at Yahoo which indexes job listings for
	// Yahoo! HotJobs. It announces itself as Konqueror, so we skip these checks.
	if (!($ua~"YahooSeeker/CafeKelsa") || !CheckIp($headers['ip'], "209.73.160.0/19")) {
		// if it's a real browser it will send an Accept header
		if ($headers['Accept']) { return "human" }
	}
} elseif ($ua~"Opera") {
	// if it's a real browser it will send an Accept header
	if ($headers['Accept']) { return "human" }
} elseif ($ua~"Safari") {
	// if it's a real browser it will send an Accept header
	if ($headers['Accept']) { return "human" }
} elseif ($ua~"Lynx") {
	// if it's a real browser it will send an Accept header
	if ($headers['Accept']) { return "human" }
} elseif (strpos($ua, "Mozilla") === 0) {
	// the user agent must start with "Mozilla"
	if (!($ua~"Google Desktop") && !($ua~"PLAYSTATION 3")) {
		// if it's a real browser it will send an Accept header
		if ($headers['Accept']) { return "human" }
	}
}
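The pseudocode leaves `CheckIp` undefined. One way it could work is a CIDR membership test, sketched here in Python with the standard `ipaddress` module. The helper names `ip_in_ranges` and `claims_google_but_is_not` are made up for this example, and the ranges are simply copied from the pseudocode above, so they may well be out of date.

```python
import ipaddress
from typing import Iterable

# Copied from the pseudocode; not verified against Google's current ranges.
GOOGLE_RANGES = [
    "66.249.64.0/19", "64.233.160.0/19", "72.14.192.0/18", "203.208.32.0/19",
    "74.125.0.0/16", "216.239.32.0/19", "209.85.128.0/17",
]
GOOGLE_NAMES = ("Googlebot", "Mediapartners-Google", "Google Web Preview")


def ip_in_ranges(ip: str, cidrs: Iterable[str]) -> bool:
    """True if ip falls inside at least one of the CIDR blocks."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)


def claims_google_but_is_not(user_agent: str, ip: str) -> bool:
    """True when the user agent names a Google crawler from a non-Google address."""
    claims = any(name in user_agent for name in GOOGLE_NAMES)
    return claims and not ip_in_ranges(ip, GOOGLE_RANGES)
```

The same two functions would cover the msnbot and Yahoo branches by swapping in their name and range lists.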


sub isBadUserAgent($ua) {
    $BadUserAgents = [

"8484 Boston Project",
"; Widows",
"AddThis.com robot tech.support@clearspring.com",
"BOT/0.1 (BOT for JCE)",
"Bichoo Spider",
"BotBuster Bad Behavior Test",
"COMODOspider/Nutch-1.0",
"CherryPicker",
"ClickTale bot",
"ContextAd Bot 1.0",
"DTS Agent",
"Diamond",
"Digger",
"Domnutch-Bot/Nutch-1.0 (Domnutch; http://www.Nutch.de/)	Nutch-1.0",
"Email Extractor",
"Email Siphon",
"EmailCollector",
"EmailSiphon",
"Flamingo_SearchEngine (+http://www.flamingosearch.com/bot)",
"FreeNutch/Nutch-1.2	Nutch-1.2",
"Fve Nutch Spider/Nutch-1.7",
"GMI sentiment crawler/Nutch-1.0 (GMI sentiment crawler; http://GMI.googlepages.com ; MyEmail)",
"Gecko/25",
"GeoHasher/Nutch-1.0 (GeoHasher Web Search Engine; geohasher.gotdns.org; geo_hasher at yahoo * com)",
"Google-HTTP-Java-Client/1.17.0-rc (gzip)",
"Halebot (Mozilla/5.0 compatible; Halebot/2.1; http://www.tacitknowledge.com/halebot/)",
"HttpProxy",
"ISC Systems iRc",
"Indy Library",
"Infoaxe./Nutch-0.9",
"Infoaxe./Nutch-1.0",
"Internet Explorer",
"Jakarta Commons",
"Java 1.",
"Java/1.",
"KSCrawler/Nutch-1.0 (http://www.kindsight.net/en/kscrawler; crawler@kindsight.net)",
"LWP",
"MJ12bot/v1.0.8",
"MSIE",
"Microsoft URL Control - 6.00.8862",
"Microsoft URL",
"Missigua",
"Movable Type",
"Mozilla/2",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)/Nutch-1.0",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; http://www.changedetection.com/bot.html )",
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)",
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0 ; Claritybot)",
"Mozilla/4.0(",
"Mozilla/4.0+(compatible;+",
"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0",
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com)",
"Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 (support.voilabot@orange-ftgroup.com)",
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; )  Firefox/1.5.0.11; 360Spider",
"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13", // seen from this ip 162.242.135.149
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/27.0.1453 Safari/537.36",
"Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111107 Ubuntu/10.04 (lucid) Firefox/3.6.24 Mozilla/3.5 (Google-HotelAdsVerifier)",
"Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)",
"Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)",
"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
"Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)",
"Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); +http://www.exabot.com/go/robot)",
"Mozilla/5.0 (compatible; Exabot/3.0; +http://www.exabot.com/go/robot)",
"Mozilla/5.0 (compatible; Ezooms/1.0; help@moz.com)",
"Mozilla/5.0 (compatible; Genieo/1.0 http://www.genieo.com/webfilter.html)",
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://import.io)",
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
"Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Exabot-Thumbnails)",
"Mozilla/5.0 (compatible; LinkChecker/8.3; +http://wummel.github.com/linkchecker/)",
"Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)",
"Mozilla/5.0 (compatible; MJ12bot/v1.4.4; http://www.majestic12.co.uk/bot.php?+)",
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; Selenium Bot)",
"Mozilla/5.0 (compatible; MojeekBot/0.6; http://www.mojeek.com/bot.html)",
"Mozilla/5.0 (compatible; SEOkicks-Robot; +http://www.seokicks.de/robot.html)",
"Mozilla/5.0 (compatible; SemrushBot/0.97; +http://www.semrush.com/bot.html)",
"Mozilla/5.0 (compatible; TweetmemeBot/3.0; +http://tweetmeme.com/)",
"Mozilla/5.0 (compatible; URLAppendBot/1.0; +http://www.profound.net/urlappendbot.html)",
"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
"Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)",
"Mozilla/5.0 (compatible; YoudaoBot/1.0; http://www.youdao.com/help/webmaster/spider/; )",
"Mozilla/5.0 (compatible; aiHitBot/2.8; +http://endb-consolidated.aihit.com/)",
"Mozilla/5.0 (compatible; archive.org_bot +http://www.archive.org/details/archive.org_bot)",
"Mozilla/5.0 (compatible; linkCheck)",
"Mozilla/5.0 (compatible; linkdexbot/2.0; +http://www.linkdex.com/about/bots/)",
"Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)",
"Mozilla/5.0 (compatible; special_archiver/3.1.1 +http://www.archive.org/details/archive.org_bot)",
"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A293 Safari/6531.22.7/Nutch-1.0",
"Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)",
"Mozilla/5.0+(compatible;+PiplBot;++http://www.pipl.com/bot/)",
"Murzillo compatible",
"NIS Nutch Spider/Nutch-1.7	Spider/Nutch-1.7",
"Nutch Experimental Crawler/Nutch-1.4	Experimental",
"Nutch12/Nutch-1.2	Nutch-1.2",
"NutchCVS",
"Nutscrape/",
"OmniExplorer",
"POE-Component-Client",
"Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)",
"PussyCat",
"PycURL",
"QuerySeekerSpider ( http://queryseeker.com/bot.html )",
"SMNutchSpider/Nutch-1.7",
"SapphireWebCrawler/Nutch-1.0-dev (Sapphire Web Crawler using Nutch; http://boston.lti.cs.cmu.edu/crawler/; mhoy@cs.cmu.edu)	http://boston.lti.cs.cmu.edu/crawler/",
"Shockwave Flash",
"ShowyouBot (http://showyou.com/crawler)",
"Slurp/Nutch-1.0-dev (Slurp Search Engineer; http://www.google.com/bot.html; nutch-agent@lucene.apache.org)",
"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
"Super Happy Fun",
"Test crawler Nutch/Nutch-1.0-dev (Nutch Test Project; changkuk@cmu.edu)	Nutch-1.0-dev",
"TrackBack/",
"Turing Machine",
"Twitterbot/1.0",
"User Agent:",
"User-Agent: Some-Agent/1.0",
"User-agent:",
"WIRE/0.22 (Linux; x86_64; Bot,Robot,Spider,Crawler)",
"WISEbot",
"WISEnutbot",
"WeSEE:Search/0.1 (Alpha, http://www.wesee.com/bot/)",
"WeSEE:Search/0.1 (Alpha, http://www.wesee.com/en/support/bot/)",
"WebSite-X Suite",
"WebaltBot",
"Windows NT 4.0;)",
"Windows NT 5.0;)",
"Windows NT 5.1;)",
"Windows XP 5",
"Winnie Poh",
"WordPress/4.0.1;",
"WordPress/4.01",
"Wordpress",
"Yahoo:LinkExpander:Slingstone",
"Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)",
"Zscho.de Crawler/Nutch-1.0-Zscho.de-semantic_patch (Zscho.de Crawler, collecting for machine learning; http://zscho.de/ )",
"a href=",
"adidxbot/2.0 (+http://search.msn.com/msnbot.htm)",
"adwords",
"autoemailspider",
"bitlybot",
"blogsearchbot-martin",
"compatible ; MSIE",
"compatible-",
"core-project/",
"ecollector",
"grub crawler",
"grub-client",
"hanzoweb",
"larbin@unspecified",
"libwww-perl",
"libwww-perl/5.805",
"msnbot-UDiscovery/2.0b (+http://search.msn.com/msnbot.htm)",
"msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)",
"msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
"nutch-1.3/Nutch-1.3	Nutch-1.3",
"nutch-1.4/Nutch-1.4	Nutch-1.4",
"psbot-image (+http://www.picsearch.com/bot.html)",
"psbot/0.1 (+http://www.picsearch.com/bot.html)",
"psycheclone",
"research-scan-bot/Nutch-1.0",
"rogerbot/1.0 (http://moz.com/help/pro/what-is-rogerbot-, rogerbot-crawler+shiny@moz.com)",
"spider",
"user",
"www.integromedb.org/Crawler",
""
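];
    foreach ($UserAgent in $BadUserAgents)
        // skip the empty pattern: it would match every user-agent in a substring test
        if ($UserAgent != "" && string.contains($ua, $UserAgent)) return 1;
    // an empty (missing) user-agent is itself bad
    if ($ua == "") return 1;
    return 0;
}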

sub isUsefulUserAgent($ua) {
    $UsefulUserAgents = [
"AdsBot-Google (+http://www.google.com/adsbot.html)",
"AdsBot-Google-Mobile (+http://www.google.com/mobile/adsbot.html) Mozilla (iPhone; U; CPU iPhone OS 3 0 like Mac OS X) AppleWebKit (KHTML, like Gecko) Mobile Safari",
"DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
"Feedfetcher-Google;+(+http://www.google.com/feedfetcher.html;",
"GoogleProducer;+(+http://goo.gl/7y4SX)",
"Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
"Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111107 Ubuntu/10.04 (lucid) Firefox/3.6.24 Mozilla/3.5 (Google-HotelAdsVerifier)",
"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
"Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
"SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
"facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
"ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"
];
    foreach ($UserAgent in $UsefulUserAgents)
        if (string.contains($ua, $UserAgent)) return 1;
    return 0;
}
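Some crawlers (Googlebot, Yahoo! Slurp, YandexBot and others) appear in both lists, so the order in which the two functions are consulted matters. The Python sketch below assumes the useful list wins and that matching is a case-insensitive substring test; both choices are assumptions, and the names `matches_any` and `classify` are invented for this example.

```python
from typing import Sequence


def matches_any(user_agent: str, patterns: Sequence[str]) -> bool:
    """Case-insensitive substring match; empty patterns are skipped."""
    ua = user_agent.lower()
    return any(pattern.lower() in ua for pattern in patterns if pattern)


def classify(user_agent: str, useful: Sequence[str], bad: Sequence[str]) -> str:
    """Return "useful", "bad" or "unknown" for a User-Agent string."""
    if not user_agent.strip():
        return "bad"  # a missing user-agent is treated as bad outright
    if matches_any(user_agent, useful):
        return "useful"  # assumed precedence: useful beats bad
    if matches_any(user_agent, bad):
        return "bad"
    return "unknown"
```

A "useful" result here only reflects what the client claims to be; it would still need the IP range check before being trusted.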