@dale3h
Created June 29, 2018 15:47
[fail2ban] Improved nginx-badbots
# Add to file: /etc/fail2ban/jail.local
[nginx-badbots]
enabled = true
port = http,https
filter = nginx-badbots
logpath = %(nginx_access_log)s
# Ban on the first matching request
maxretry = 1
# 604800 seconds = 7 days
findtime = 604800
bantime = 604800
# Create as file: /etc/fail2ban/filter.d/nginx-badbots.conf
[Definition]
badbots = 360Spider|404checker|404enemy|80legs|Abonti|Aboundex|Acunetix|ADmantX|AfD-Verbotsverfahren|AhrefsBot|AIBOT|AiHitBot|Aipbot|Alexibot|Alligator|AllSubmitter|AlphaBot|Anarchie|Apexoo|ASPSeek|Asterias|Attach|autoemailspider|BackDoorBot|Backlink-Ceck|backlink-check|BacklinkCrawler|BackStreet|BackWeb|Badass|Bandit|Barkrowler|BatchFTP|Battleztar Bazinga|BBBike|BDCbot|BDFetch|BetaBot|Bigfoot|Bitacle|Blackboard|Black Hole|BlackWidow|BLEXBot|Blow|BlowFish|Boardreader|Bolt|BotALot|Brandprotect|Brandwatch|Bubing|Buddy|BuiltBotTough|BuiltWith|Bullseye|BunnySlippers|BuzzSumo|Calculon|CATExplorador|CazoodleBot|CCBot|Cegbfeieh|CheeseBot|CherryPicker|ChinaClaw|Chlooe|Claritybot|Cliqzbot|Cloud mapping|coccocbot-web|Cogentbot|cognitiveseo|Collector|com\.plumanalytics|Copier|CopyRightCheck|Copyscape|Cosmos|Craftbot|crawler4j|crawler\.feedback|CrazyWebCrawler|Crescent|CSHttp|Curious|Custo|DatabaseDriverMysqli|DataCha0s|DBLBot|demandbase-bot|Demon|Deusu|Devil|Digincore|DigitalPebble|DIIbot|Dirbuster|Disco|Discobot|Discoverybot|DittoSpyder|DnyzBot|DomainAppender|DomainCrawler|DomainSigmaCrawler|DomainStatsBot|Dotbot|Download Wonder|Dragonfly|Drip|DTS Agent|EasyDL|Ebingbong|eCatch|ECCP/1\.0|Ecxi|EirGrabber|EMail Siphon|EMail Wolf|EroCrawler|evc-batch|Evil|Exabot|Express WebPictures|ExtLinksBot|Extractor|ExtractorPro|Extreme Picture Finder|EyeNetIE|Ezooms|FDM|FemtosearchBot|FHscan|Fimap|Firefox/7\.0|FlashGet|Flunky|Foobot|Freeuploader|FrontPage|Fyrebot|GalaxyBot|Genieo|GermCrawler|Getintent|GetRight|GetWeb|Gigablast|Gigabot|G-i-g-a-b-o-t|Go-Ahead-Got-It|Gotit|GoZilla|Go!Zilla|Grabber|GrabNet|Grafula|GrapeFX|GrapeshotCrawler|GridBot|GT\:\:WWW|Haansoft|HaosouSpider|Harvest|Havij|HEADMasterSEO|Heritrix|Hloader|HMView|HTMLparser|HTTP\:\:Lite|HTTrack|Humanlinks|HybridBot|Iblog|IDBot|Id-search|IlseBot|Image Fetch|Image Sucker|IndeedBot|Indy Library|InfoNaviRobot|InfoTekies|instabid|Intelliseek|InterGET|Internet Ninja|InternetSeer|internetVista monitor|ips-agent|Iria|IRLbot|Iskanie|IstellaBot|JamesBOT|Jbrofuzz|JennyBot|JetCar|JikeSpider|JOC Web Spider|Joomla|Jorgee|JustView|Jyxobot|Kenjin Spider|Keyword Density|Kozmosbot|Lanshanbot|Larbin|LeechFTP|LeechGet|LexiBot|Lftp|LibWeb|Libwhisker|Lightspeedsystems|Likse|Linkdexbot|LinkextractorPro|LinkpadBot|LinkScan|LinksManager|LinkWalker|LinqiaMetadataDownloaderBot|LinqiaRSSBot|LinqiaScrapeBot|Lipperhey|Litemage_walker|Lmspider|LNSpiderguy|Ltx71|lwp-request|LWP\:\:Simple|lwp-trivial|Magnet|Mag-Net|magpie-crawler|Mail\.RU_Bot|Majestic12|MarkMonitor|MarkWatch|Masscan|Mass Downloader|Mata Hari|MauiBot|Meanpathbot|mediawords|MegaIndex\.ru|Metauri|MFC_Tear_Sample|Microsoft Data Access|Microsoft URL Control|MIDown tool|MIIxpc|Mister PiX|MJ12bot|Mojeek|Morfeus Fucking Scanner|Mr\.4x3|MSFrontPage|MSIECrawler|Msrabot|MS Web Services Client Protocol|muhstik-scan|Musobot|Name Intelligence|Nameprotect|Navroad|NearSite|Needle|Nessus|NetAnts|Netcraft|netEstate NE Crawler|NetLyzer|NetMechanic|NetSpider|Nettrack|Net Vampire|Netvibes|NetZIP|NextGenSearchBot|Nibbler|NICErsPRO|Niki-bot|Nikto|NimbleCrawler|Ninja|Nmap|NPbot|Nutch|oBot|Octopus|Offline Explorer|Offline Navigator|Openfind|OpenLinkProfiler|Openvas|OrangeBot|OrangeSpider|OutclicksBot|OutfoxBot|PageAnalyzer|Page Analyzer|PageGrabber|page scorer|PageScorer|Panscient|Papa Foto|Pavuk|pcBrowser|PECL\:\:HTTP|PeoplePal|PHPCrawl|Picscout|Picsearch|PictureFinder|Pimonster|Pi-Monster|Pixray|PleaseCrawl|plumanalytics|Pockey|POE-Component-Client-HTTP|Probethenet|ProPowerBot|ProWebWalker|Psbot|Pump|PxBroker|PyCurl|QueryN Metasearch|Quick-Crawler|RankActive|RankActiveLinkBot|RankFlex|RankingBot|RankingBot2|Rankivabot|RankurBot|RealDownload|Reaper|RebelMouse|Recorder|RedesScrapy|ReGet|RepoMonkey|Ripper|RocketCrawler|Rogerbot|SalesIntelligent|SBIder|ScanAlert|Scanbot|scan\.lol|Scrapy|Screaming|ScreenerBot|Searchestate|SearchmetricsBot|Semrush|SemrushBot|SEOkicks|SEOlyticsCrawler|Seomoz|SEOprofiler|seoscanners|SEOstats|sexsearcher|Seznam|SeznamBot|Shodan|Siphon|SISTRIX|Sitebeam|SiteExplorer|Siteimprove|SiteLockSpider|SiteSnagger|SiteSucker|Site Sucker|Sitevigil|Slackbot-LinkExpanding|SlySearch|SmartDownload|SMTBot|Snake|Snapbot|Snoopy|SocialRankIOBot|Sogou web spider|Sosospider|Sottopop|SpaceBison|Spammen|SpankBot|Spanner|Spbot|Spinn3r|SputnikBot|Sqlmap|Sqlworm|Sqworm|Steeler|Stripper|Sucker|Sucuri|SuperBot|SuperHTTP|Surfbot|SurveyBot|Suzuran|Swiftbot|sysscan|Szukacz|T0PHackTeam|T8Abot|tAkeOut|Teleport|TeleportPro|Telesoft|Telesphoreo|Telesphorep|The Intraformant|TheNomad|TightTwatBot|Titan|Toata|Toweyabot|Trendiction|Trendictionbot|trendiction\.com|trendiction\.de|True_Robot|Turingos|Turnitin|TurnitinBot|TwengaBot|Twice|Typhoeus|UnisterBot|URLy\.Warning|URLy Warning|Vacuum|Vagabondo|VB Project|VCI|VeriCiteCrawler|VidibleScraper|Virusdie|VoidEYE|Voil|Voltron|Wallpapers/3\.0|WallpapersHD|WASALive-Bot|WBSearchBot|Webalta|WebAuto|Web Auto|WebBandit|WebCollage|Web Collage|WebCopier|WEBDAV|WebEnhancer|Web Enhancer|WebFetch|Web Fetch|WebFuck|Web Fuck|WebGo IS|WebImageCollector|WebLeacher|WebmasterWorldForumBot|webmeup-crawler|WebPix|Web Pix|WebReaper|WebSauger|Web Sauger|Webshag|WebsiteExtractor|WebsiteQuester|Website Quester|Webster|WebStripper|WebSucker|Web Sucker|WebWhacker|WebZIP|WeSEE|Whack|Whacker|Whatweb|Who\.is Bot|Widow|WinHTTrack|WiseGuys Robot|WISENutbot|Wonderbot|Woobot|Wotbox|Wprecon|WPScan|WWW-Collector-E|WWW-Mechanize|WWW\:\:Mechanize|WWWOFFLE|x09Mozilla|x22Mozilla|Xaldon_WebSpider|Xaldon WebSpider|Xenu|xpymep1\.exe|YoudaoBot|Zade|Zauba|zauba\.io|Zermelo|Zeus|zgrab|Zitebot|ZmEu|ZumBot|ZyBorg
failregex = (?i)<HOST> -.*"(GET|POST|HEAD).*HTTP.*(?:%(badbots)s).*"$
ignoreregex =
@mkressel

mkressel commented Apr 2, 2020

Thanks for this fail2ban filter. One issue I found though is that the regex you have may block legitimate requests if the URL itself has one of the bot names within it. E.g.: "https://somedomain.com/How-Do-I-Attach-A-File/"

The "Attach" would match one of the bots called "Attach" and the fail2ban would be triggered, even though the User-Agent string may be valid. I think it might be better to only scan the User-Agent string. Along those lines, I modified your regex for fail2ban to be the following:

failregex = (?i)<HOST> -.*"(GET|POST|HEAD) (.*?)" \d+ \d+ "(.*?)" ".*(?:%(badbots)s).*"$

This will then allow URLs to include the bot names, but will block requests with the bot name in the User-Agent string. The regex may need tweaking to work in all cases, but so far in my tests it works pretty well.
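The difference between the two patterns can be checked outside fail2ban with a minimal Python sketch (fail2ban filters are Python regular expressions). Everything below is an assumption for illustration: the three-entry stand-in for `badbots`, the simplified `\S+` substituted for fail2ban's `<HOST>` tag, the `build` helper, and the two made-up log lines in nginx's default combined format. In the first line the bot name appears only in the referer URL.

```python
import re

# Assumption: a tiny stand-in for the full badbots list.
BADBOTS = "Attach|AhrefsBot|WEBDAV"

# Assumption: fail2ban expands <HOST> to a far more thorough pattern;
# \S+ is enough for these sample lines.
HOST = r"(?P<host>\S+)"


def build(template: str) -> re.Pattern:
    """Expand the <HOST> tag and the %(badbots)s interpolation by hand."""
    return re.compile(
        template.replace("<HOST>", HOST).replace("%(badbots)s", BADBOTS)
    )


ORIGINAL = build(r'(?i)<HOST> -.*"(GET|POST|HEAD).*HTTP.*(?:%(badbots)s).*"$')
UA_ONLY = build(
    r'(?i)<HOST> -.*"(GET|POST|HEAD) (.*?)" \d+ \d+ "(.*?)" ".*(?:%(badbots)s).*"$'
)

# Made-up line: an ordinary browser whose referer URL contains "Attach".
REFERER_LINE = (
    '203.0.113.7 - - [02/Apr/2020:10:00:00 +0000] "GET /page HTTP/1.1" 200 512 '
    '"https://somedomain.com/How-Do-I-Attach-A-File/" "Mozilla/5.0 (X11; Linux x86_64)"'
)
# Made-up line: a crawler that names itself in the User-Agent.
BOT_LINE = (
    '198.51.100.9 - - [02/Apr/2020:10:00:01 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"'
)

for name, pattern in (("original", ORIGINAL), ("ua-only", UA_ONLY)):
    print(name, bool(pattern.search(REFERER_LINE)), bool(pattern.search(BOT_LINE)))
```

The original pattern flags both lines, while the User-Agent-only pattern flags just the crawler.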

@GwynethLlewelyn

Thank you to both. I knew I had this script somewhere, but an unfortunate upgrade wiped it off my disk, and I couldn't find a copy anywhere...

@mkressel

Glad you found it useful!

@ozgurkazancci

ozgurkazancci commented Aug 6, 2023

> The "Attach" would match one of the bots called "Attach" and the fail2ban would be triggered, even though the User-Agent string may be valid. I think it might be better to only scan the User-Agent string. Along those lines, I modified your regex for fail2ban to be the following:
>
> failregex = (?i)<HOST> -.*"(GET|POST|HEAD) (.*?)" \d+ \d+ "(.*?)" ".*(?:%(badbots)s).*"$
>
> This will then allow URLs to include the bot names, but will block requests with the bot name in the User-Agent string. The regex may need tweaking to work in all cases, but so far in my tests it works pretty well.

Thank you very much for this, @mkressel.
It seems the original poster didn't update the code according to your suggestion (no idea why), but yours helped me a lot! Do you have any further updates or improvements to that line?

Many thanks.

@mkressel

mkressel commented Aug 6, 2023

Happy it helped!

@brendan-pike

I had a case today where it blocked a legitimate user. I think this entry in the nginx log is probably what triggered it:
xx.xx.xx.xx - lno [05/Dec/2023:14:34:38 +1100] "PROPFIND /remote.php/dav/files/lnog HTTP/2.0" 207 603 "-" "Microsoft-WebDAV-MiniRedir/10.0.17763"
Is it possible the expression matched only part of the agent, i.e. Microsoft? Otherwise I have no idea why it was banned.

@mkressel

mkressel commented Dec 7, 2023

Yes, the regular expression is a substring search, so WebDAV will match Microsoft-WebDAV-MiniRedir. You would have to add an exclusion to the regexp to get it to ignore that user agent.
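As a rough illustration of the substring behaviour and of one possible exclusion, here is a small Python sketch. The three-entry alternation is an excerpt, not the full list; the negative lookbehind `(?<!Microsoft-)` and the `WEBDAV-scanner/1.0` agent are hypothetical and have not been tried inside fail2ban itself.

```python
import re

UA = "Microsoft-WebDAV-MiniRedir/10.0.17763"

# Excerpt of three list entries; (?i) lets WEBDAV match "WebDAV" anywhere in the string.
plain = re.compile(r"(?i)(?:Nikto|WEBDAV|Sqlmap)")

# Hypothetical exclusion: skip WEBDAV when it directly follows "Microsoft-".
guarded = re.compile(r"(?i)(?:Nikto|(?<!Microsoft-)WEBDAV|Sqlmap)")

print(bool(plain.search(UA)))                      # True: substring hit
print(bool(guarded.search(UA)))                    # False: excluded by the lookbehind
print(bool(guarded.search("WEBDAV-scanner/1.0")))  # True: other WebDAV agents still match
```

Note that the quoted log line is a PROPFIND request, which the `(GET|POST|HEAD)` group in both failregex variants does not cover, so the ban was presumably triggered by a different request from the same client.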

@brendan-pike

Thank you so much. Sadly my coding skills are quite limited; would you be willing to update the nginx-badbots.conf example to include an exclusion list? I've found this badbot jail highly effective, but it really needs an exemption list where you can add good bots :)
For example:

/var/log/nginx/www.mysite.com_access.log:54.36.149.246 - - [11/Dec/2023:07:04:15 +1100] "GET /Interview HTTP/2.0" 200 10287 "-" "Mozilla/5.0 (compatible; AhrefsSiteAudit/6.1; +http://ahrefs.com/robot/site-audit)"
/var/log/nginx/www.mysite.com_access.log:54.36.149.246 - - [11/Dec/2023:07:05:00 +1100] "GET /support.guy/page HTTP/2.0" 200 10257 "-" "Mozilla/5.0 (compatible; AhrefsSiteAudit/6.1; +http://ahrefs.com/robot/site-audit)"

I'm not even sure why Ahrefs is getting caught: I removed AhrefsBot from the list, but something is still catching it.
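One way to track down the responsible entry is to test each alternative in the list against the User-Agent on its own. The sketch below does that in Python; `matching_entries` is a hypothetical helper and `SAMPLE` is only a short excerpt of the gist's list (with AhrefsBot already removed). For this agent it reports `oBot`, which with `(?i)` matches inside the word "robot" in the URL `http://ahrefs.com/robot/site-audit`.

```python
import re


def matching_entries(user_agent: str, badbots: str) -> list[str]:
    """Return every |-separated alternative that matches, ignoring case."""
    return [
        entry
        for entry in badbots.split("|")
        if re.search(entry, user_agent, re.IGNORECASE)
    ]


# Excerpt of the list for illustration; paste the full badbots value to check everything.
SAMPLE = r"Nutch|oBot|Octopus|Semrush|WEBDAV"

UA = "Mozilla/5.0 (compatible; AhrefsSiteAudit/6.1; +http://ahrefs.com/robot/site-audit)"

print(matching_entries(UA, SAMPLE))  # ['oBot']
```

Running the same check against the complete list may turn up further entries; the excerpt only shows the technique.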

@mkressel

This hasn't been tested, but I would add a line like this (listing all the user agent strings you wish to ignore):

goodbots = AhrefsSiteAudit|Microsoft-WebDAV-MiniRedir|AnotherUserAgent

And then ignoreregex would be something like:

ignoreregex = (?i)<HOST> -.*"(GET|POST|HEAD).*HTTP.*(?:%(goodbots)s).*"$

Which is just the same thing as failregex, but with goodbots instead of badbots.

Caveat: I haven't tested this, so please let me know if this works for you.
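A quick way to try the idea before touching a live jail is to model it in Python. This is only a sketch under stated assumptions: `would_ban` models fail2ban as "a line counts only if failregex matches and ignoreregex does not", `\S+` stands in for the `<HOST>` tag, and the bot lists are short excerpts. The two log lines are the ones posted in this thread, without the file-name prefix.

```python
import re

BADBOTS = "AhrefsBot|oBot|WEBDAV"                           # excerpt, for illustration
GOODBOTS = "AhrefsSiteAudit|Microsoft-WebDAV-MiniRedir"
HOST = r"(?P<host>\S+)"                                     # simplified stand-in for <HOST>
TEMPLATE = r'(?i)<HOST> -.*"(GET|POST|HEAD).*HTTP.*(?:%(bots)s).*"$'


def compile_filter(bots: str) -> re.Pattern:
    """Fill the shared template with a |-separated list of names."""
    return re.compile(TEMPLATE.replace("<HOST>", HOST).replace("%(bots)s", bots))


FAILREGEX = compile_filter(BADBOTS)
IGNOREREGEX = compile_filter(GOODBOTS)


def would_ban(line: str) -> bool:
    """Assumed model: failregex must match and ignoreregex must not."""
    return bool(FAILREGEX.search(line)) and not IGNOREREGEX.search(line)


SITE_AUDIT = (
    '54.36.149.246 - - [11/Dec/2023:07:04:15 +1100] "GET /Interview HTTP/2.0" 200 10287 '
    '"-" "Mozilla/5.0 (compatible; AhrefsSiteAudit/6.1; +http://ahrefs.com/robot/site-audit)"'
)
AHREFS_BOT = (
    '54.36.149.63 - - [20/Dec/2023:13:17:49 +1100] "GET /robots.txt HTTP/2.0" 302 394 '
    '"-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"'
)

print(would_ban(SITE_AUDIT))  # False: matched by failregex, rescued by ignoreregex
print(would_ban(AHREFS_BOT))  # True: AhrefsBot is not in GOODBOTS
```

Because the ignore pattern is as broad as the fail pattern, a good-bot name anywhere after `HTTP` (for example in the referer) would also exempt a line.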

@brendan-pike

Thanks, I'll test on one of my less critical servers and report back soon.

@brendan-pike

I checked the logs today and found this:
2023-12-13 13:50:35,212 fail2ban.transmitter [92677]: WARNING Command ['status', 'nginx-badbots,'] has failed. Received UnknownJailException('nginx-badbots,')
Is there anything I can do to work out what this is about?

@GwynethLlewelyn

@brendan-pike I wonder, is there an extra comma after nginx-badbots?...
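One possible source of a jail name with a trailing comma, offered only as a guess: some script or monitoring tool splitting a comma-separated jail list on whitespace alone and then asking for the status of each piece. The Python sketch below shows the difference; the jail list string and the `split_jail_list` helper are hypothetical.

```python
def split_jail_list(jail_list: str) -> list[str]:
    """Split a comma-separated jail list and drop surrounding whitespace."""
    return [name.strip() for name in jail_list.split(",") if name.strip()]


# Hypothetical jail list as a tool might receive it.
JAILS = "nginx-badbots, sshd"

naive = JAILS.split()            # whitespace split keeps the comma on the first name
clean = split_jail_list(JAILS)

print(naive)  # ['nginx-badbots,', 'sshd']
print(clean)  # ['nginx-badbots', 'sshd']
```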

@brendan-pike

Checking back in again: no, there are no commas in there.

I'm still trialing this goodbots setup and will let you know how it progresses; so far it seems to be working.
The request below got banned, but I only had AhrefsSiteAudit in goodbots, so I've now added AhrefsBot as well to see if that prevents it.

/var/log/nginx/mysite.com_access.log:54.36.149.63 - - [20/Dec/2023:13:17:49 +1100] "GET /robots.txt HTTP/2.0" 302 394 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"
