Skip to content

Instantly share code, notes, and snippets.

@dale3h
Created June 29, 2018 15:47
Show Gist options
  • Save dale3h/660fe549df8232d1902f338e6d3b39ed to your computer and use it in GitHub Desktop.
Save dale3h/660fe549df8232d1902f338e6d3b39ed to your computer and use it in GitHub Desktop.
[fail2ban] Improved nginx-badbots
# Add to file: /etc/fail2ban/jail.local
[nginx-badbots]
enabled = true
port = http,https
filter = nginx-badbots
logpath = %(nginx_access_log)s
maxretry = 1
findtime = 604800
bantime = 604800
# Create as file: /etc/fail2ban/filter.d/nginx-badbots.conf
[Definition]
badbots = 360Spider|404checker|404enemy|80legs|Abonti|Aboundex|Acunetix|ADmantX|AfD-Verbotsverfahren|AhrefsBot|AIBOT|AiHitBot|Aipbot|Alexibot|Alligator|AllSubmitter|AlphaBot|Anarchie|Apexoo|ASPSeek|Asterias|Attach|autoemailspider|BackDoorBot|Backlink-Ceck|backlink-check|BacklinkCrawler|BackStreet|BackWeb|Badass|Bandit|Barkrowler|BatchFTP|Battleztar Bazinga|BBBike|BDCbot|BDFetch|BetaBot|Bigfoot|Bitacle|Blackboard|Black Hole|BlackWidow|BLEXBot|Blow|BlowFish|Boardreader|Bolt|BotALot|Brandprotect|Brandwatch|Bubing|Buddy|BuiltBotTough|BuiltWith|Bullseye|BunnySlippers|BuzzSumo|Calculon|CATExplorador|CazoodleBot|CCBot|Cegbfeieh|CheeseBot|CherryPicker|ChinaClaw|Chlooe|Claritybot|Cliqzbot|Cloud mapping|coccocbot-web|Cogentbot|cognitiveseo|Collector|com\.plumanalytics|Copier|CopyRightCheck|Copyscape|Cosmos|Craftbot|crawler4j|crawler\.feedback|CrazyWebCrawler|Crescent|CSHttp|Curious|Custo|DatabaseDriverMysqli|DataCha0s|DBLBot|demandbase-bot|Demon|Deusu|Devil|Digincore|DigitalPebble|DIIbot|Dirbuster|Disco|Discobot|Discoverybot|DittoSpyder|DnyzBot|DomainAppender|DomainCrawler|DomainSigmaCrawler|DomainStatsBot|Dotbot|Download Wonder|Dragonfly|Drip|DTS Agent|EasyDL|Ebingbong|eCatch|ECCP/1\.0|Ecxi|EirGrabber|EMail Siphon|EMail Wolf|EroCrawler|evc-batch|Evil|Exabot|Express WebPictures|ExtLinksBot|Extractor|ExtractorPro|Extreme Picture Finder|EyeNetIE|Ezooms|FDM|FemtosearchBot|FHscan|Fimap|Firefox/7\.0|FlashGet|Flunky|Foobot|Freeuploader|FrontPage|Fyrebot|GalaxyBot|Genieo|GermCrawler|Getintent|GetRight|GetWeb|Gigablast|Gigabot|G-i-g-a-b-o-t|Go-Ahead-Got-It|Gotit|GoZilla|Go!Zilla|Grabber|GrabNet|Grafula|GrapeFX|GrapeshotCrawler|GridBot|GT\:\:WWW|Haansoft|HaosouSpider|Harvest|Havij|HEADMasterSEO|Heritrix|Hloader|HMView|HTMLparser|HTTP\:\:Lite|HTTrack|Humanlinks|HybridBot|Iblog|IDBot|Id-search|IlseBot|Image Fetch|Image Sucker|IndeedBot|Indy Library|InfoNaviRobot|InfoTekies|instabid|Intelliseek|InterGET|Internet Ninja|InternetSeer|internetVista monitor|ips-agent|Iria|IRLbot|Iskanie|IstellaBot|JamesBOT|Jbrofuzz|JennyBot|JetCar|JikeSpider|JOC Web Spider|Joomla|Jorgee|JustView|Jyxobot|Kenjin Spider|Keyword Density|Kozmosbot|Lanshanbot|Larbin|LeechFTP|LeechGet|LexiBot|Lftp|LibWeb|Libwhisker|Lightspeedsystems|Likse|Linkdexbot|LinkextractorPro|LinkpadBot|LinkScan|LinksManager|LinkWalker|LinqiaMetadataDownloaderBot|LinqiaRSSBot|LinqiaScrapeBot|Lipperhey|Litemage_walker|Lmspider|LNSpiderguy|Ltx71|lwp-request|LWP\:\:Simple|lwp-trivial|Magnet|Mag-Net|magpie-crawler|Mail\.RU_Bot|Majestic12|MarkMonitor|MarkWatch|Masscan|Mass Downloader|Mata Hari|MauiBot|Meanpathbot|mediawords|MegaIndex\.ru|Metauri|MFC_Tear_Sample|Microsoft Data Access|Microsoft URL Control|MIDown tool|MIIxpc|Mister PiX|MJ12bot|Mojeek|Morfeus Fucking Scanner|Mr\.4x3|MSFrontPage|MSIECrawler|Msrabot|MS Web Services Client Protocol|muhstik-scan|Musobot|Name Intelligence|Nameprotect|Navroad|NearSite|Needle|Nessus|NetAnts|Netcraft|netEstate NE Crawler|NetLyzer|NetMechanic|NetSpider|Nettrack|Net Vampire|Netvibes|NetZIP|NextGenSearchBot|Nibbler|NICErsPRO|Niki-bot|Nikto|NimbleCrawler|Ninja|Nmap|NPbot|Nutch|oBot|Octopus|Offline Explorer|Offline Navigator|Openfind|OpenLinkProfiler|Openvas|OrangeBot|OrangeSpider|OutclicksBot|OutfoxBot|PageAnalyzer|Page Analyzer|PageGrabber|page scorer|PageScorer|Panscient|Papa Foto|Pavuk|pcBrowser|PECL\:\:HTTP|PeoplePal|PHPCrawl|Picscout|Picsearch|PictureFinder|Pimonster|Pi-Monster|Pixray|PleaseCrawl|plumanalytics|Pockey|POE-Component-Client-HTTP|Probethenet|ProPowerBot|ProWebWalker|Psbot|Pump|PxBroker|PyCurl|QueryN Metasearch|Quick-Crawler|RankActive|RankActiveLinkBot|RankFlex|RankingBot|RankingBot2|Rankivabot|RankurBot|RealDownload|Reaper|RebelMouse|Recorder|RedesScrapy|ReGet|RepoMonkey|Ripper|RocketCrawler|Rogerbot|SalesIntelligent|SBIder|ScanAlert|Scanbot|scan\.lol|Scrapy|Screaming|ScreenerBot|Searchestate|SearchmetricsBot|Semrush|SemrushBot|SEOkicks|SEOlyticsCrawler|Seomoz|SEOprofiler|seoscanners|SEOstats|sexsearcher|Seznam|SeznamBot|Shodan|Siphon|SISTRIX|Sitebeam|SiteExplorer|Siteimprove|SiteLockSpider|SiteSnagger|SiteSucker|Site Sucker|Sitevigil|Slackbot-LinkExpanding|SlySearch|SmartDownload|SMTBot|Snake|Snapbot|Snoopy|SocialRankIOBot|Sogou web spider|Sosospider|Sottopop|SpaceBison|Spammen|SpankBot|Spanner|Spbot|Spinn3r|SputnikBot|Sqlmap|Sqlworm|Sqworm|Steeler|Stripper|Sucker|Sucuri|SuperBot|SuperHTTP|Surfbot|SurveyBot|Suzuran|Swiftbot|sysscan|Szukacz|T0PHackTeam|T8Abot|tAkeOut|Teleport|TeleportPro|Telesoft|Telesphoreo|Telesphorep|The Intraformant|TheNomad|TightTwatBot|Titan|Toata|Toweyabot|Trendiction|Trendictionbot|trendiction\.com|trendiction\.de|True_Robot|Turingos|Turnitin|TurnitinBot|TwengaBot|Twice|Typhoeus|UnisterBot|URLy\.Warning|URLy Warning|Vacuum|Vagabondo|VB Project|VCI|VeriCiteCrawler|VidibleScraper|Virusdie|VoidEYE|Voil|Voltron|Wallpapers/3\.0|WallpapersHD|WASALive-Bot|WBSearchBot|Webalta|WebAuto|Web Auto|WebBandit|WebCollage|Web Collage|WebCopier|WEBDAV|WebEnhancer|Web Enhancer|WebFetch|Web Fetch|WebFuck|Web Fuck|WebGo IS|WebImageCollector|WebLeacher|WebmasterWorldForumBot|webmeup-crawler|WebPix|Web Pix|WebReaper|WebSauger|Web Sauger|Webshag|WebsiteExtractor|WebsiteQuester|Website Quester|Webster|WebStripper|WebSucker|Web Sucker|WebWhacker|WebZIP|WeSEE|Whack|Whacker|Whatweb|Who\.is Bot|Widow|WinHTTrack|WiseGuys Robot|WISENutbot|Wonderbot|Woobot|Wotbox|Wprecon|WPScan|WWW-Collector-E|WWW-Mechanize|WWW\:\:Mechanize|WWWOFFLE|x09Mozilla|x22Mozilla|Xaldon_WebSpider|Xaldon WebSpider|Xenu|xpymep1\.exe|YoudaoBot|Zade|Zauba|zauba\.io|Zermelo|Zeus|zgrab|Zitebot|ZmEu|ZumBot|ZyBorg
failregex = (?i)<HOST> -.*"(GET|POST|HEAD).*HTTP.*(?:%(badbots)s).*"$
ignoreregex =
@brendan-pike
Copy link

Thank you so much, my code skills are quite low sadly, would you be willing to update the nginx-badbots.conf example to include an exclussion list. I've found this badbot jail is highly effective but really needs an exemption list where you can add goodbots :)
For example:``````

/var/log/nginx/www.mysite.com_access.log:54.36.149.246 - - [11/Dec/2023:07:04:15 +1100] "GET /Interview HTTP/2.0" 200 10287 "-" "Mozilla/5.0 (compatible; AhrefsSiteAudit/6.1; +http://ahrefs.com/robot/site-audit)"
/var/log/nginx/www.mysite.com_access.log:54.36.149.246 - - [11/Dec/2023:07:05:00 +1100] "GET /support.guy/page HTTP/2.0" 200 10257 "-" "Mozilla/5.0 (compatible; AhrefsSiteAudit/6.1; +http://ahrefs.com/robot/site-audit)"

I'm not even sure why ahref is getting caught because I removed the AhrefsBot from the list but something is still capturing it.

@mkressel
Copy link

This hasn't been tested, but I would add a line like (adding all your user agent strings you wish to ignore):

goodbots = AhrefsSiteAudit|Microsoft-WebDAV-MiniRedir|AnotherUserAgent

And then ignoreregex would be something like:

ignoreregex = (?i)<HOST> -.*"(GET|POST|HEAD).*HTTP.*(?:%(goodbots)s).*"$

Which is just the same thing as failregex, but with goodbots instead of badbots.

Caveat: I haven't tested this, so please let me know if this works for you.

@brendan-pike
Copy link

Thanks, I'll test on one of my less critical servers and report back soon.

@brendan-pike
Copy link

I checked this logs today and found this,
2023-12-13 13:50:35,212 fail2ban.transmitter [92677]: WARNING Command ['status', 'nginx-badbots,'] has failed. Received UnknownJailException('nginx-badbots,')
Is there anything I can do to work out what this is about?

@GwynethLlewelyn
Copy link

@brendan-pike I wonder, is there an extra comma after nginx-badbots?...

@brendan-pike
Copy link

Checking back in again, no there is no commas in there.

I'm still trialing this goodbots setup and will let you know how it progresses, so far it seems to be working.
This got banned but I only had AhrefsSiteAudit in goobots, so I've added AhrefsBot as well now to see if that prevents it.

/var/log/nginx/mysite.com_access.log:54.36.149.63 - - [20/Dec/2023:13:17:49 +1100] "GET /robots.txt HTTP/2.0" 302 394 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment