# Add to file: /etc/fail2ban/jail.local
[nginx-badbots]
enabled = true
port = http,https
filter = nginx-badbots
logpath = %(nginx_access_log)s
maxretry = 1
findtime = 604800
bantime = 604800

# Create as file: /etc/fail2ban/filter.d/nginx-badbots.conf
[Definition]
badbots = 360Spider|404checker|404enemy|80legs|Abonti|Aboundex|Acunetix|ADmantX|AfD-Verbotsverfahren|AhrefsBot|AIBOT|AiHitBot|Aipbot|Alexibot|Alligator|AllSubmitter|AlphaBot|Anarchie|Apexoo|ASPSeek|Asterias|Attach|autoemailspider|BackDoorBot|Backlink-Ceck|backlink-check|BacklinkCrawler|BackStreet|BackWeb|Badass|Bandit|Barkrowler|BatchFTP|Battleztar Bazinga|BBBike|BDCbot|BDFetch|BetaBot|Bigfoot|Bitacle|Blackboard|Black Hole|BlackWidow|BLEXBot|Blow|BlowFish|Boardreader|Bolt|BotALot|Brandprotect|Brandwatch|Bubing|Buddy|BuiltBotTough|BuiltWith|Bullseye|BunnySlippers|BuzzSumo|Calculon|CATExplorador|CazoodleBot|CCBot|Cegbfeieh|CheeseBot|CherryPicker|ChinaClaw|Chlooe|Claritybot|Cliqzbot|Cloud mapping|coccocbot-web|Cogentbot|cognitiveseo|Collector|com\.plumanalytics|Copier|CopyRightCheck|Copyscape|Cosmos|Craftbot|crawler4j|crawler\.feedback|CrazyWebCrawler|Crescent|CSHttp|Curious|Custo|DatabaseDriverMysqli|DataCha0s|DBLBot|demandbase-bot|Demon|Deusu|Devil|Digincore|DigitalPebble|DIIbot|Dirbuster|Disco|Discobot|Discoverybot|DittoSpyder|DnyzBot|DomainAppender|DomainCrawler|DomainSigmaCrawler|DomainStatsBot|Dotbot|Download Wonder|Dragonfly|Drip|DTS Agent|EasyDL|Ebingbong|eCatch|ECCP/1\.0|Ecxi|EirGrabber|EMail Siphon|EMail Wolf|EroCrawler|evc-batch|Evil|Exabot|Express WebPictures|ExtLinksBot|Extractor|ExtractorPro|Extreme Picture Finder|EyeNetIE|Ezooms|FDM|FemtosearchBot|FHscan|Fimap|Firefox/7\.0|FlashGet|Flunky|Foobot|Freeuploader|FrontPage|Fyrebot|GalaxyBot|Genieo|GermCrawler|Getintent|GetRight|GetWeb|Gigablast|Gigabot|G-i-g-a-b-o-t|Go-Ahead-Got-It|Gotit|GoZilla|Go!Zilla|Grabber|GrabNet|Grafula|GrapeFX|GrapeshotCrawler|GridBot|GT\:\:WWW|Haansoft|HaosouSpider|Harvest|Havij|HEADMasterSEO|Heritrix|Hloader|HMView|HTMLparser|HTTP\:\:Lite|HTTrack|Humanlinks|HybridBot|Iblog|IDBot|Id-search|IlseBot|Image Fetch|Image Sucker|IndeedBot|Indy Library|InfoNaviRobot|InfoTekies|instabid|Intelliseek|InterGET|Internet Ninja|InternetSeer|internetVista monitor|ips-agent|Iria|IRLbot|Iskanie|IstellaBot|JamesBOT|Jbrofuzz|JennyBot|JetCar|JikeSpider|JOC Web Spider|Joomla|Jorgee|JustView|Jyxobot|Kenjin Spider|Keyword Density|Kozmosbot|Lanshanbot|Larbin|LeechFTP|LeechGet|LexiBot|Lftp|LibWeb|Libwhisker|Lightspeedsystems|Likse|Linkdexbot|LinkextractorPro|LinkpadBot|LinkScan|LinksManager|LinkWalker|LinqiaMetadataDownloaderBot|LinqiaRSSBot|LinqiaScrapeBot|Lipperhey|Litemage_walker|Lmspider|LNSpiderguy|Ltx71|lwp-request|LWP\:\:Simple|lwp-trivial|Magnet|Mag-Net|magpie-crawler|Mail\.RU_Bot|Majestic12|MarkMonitor|MarkWatch|Masscan|Mass Downloader|Mata Hari|MauiBot|Meanpathbot|mediawords|MegaIndex\.ru|Metauri|MFC_Tear_Sample|Microsoft Data Access|Microsoft URL Control|MIDown tool|MIIxpc|Mister PiX|MJ12bot|Mojeek|Morfeus Fucking Scanner|Mr\.4x3|MSFrontPage|MSIECrawler|Msrabot|MS Web Services Client Protocol|muhstik-scan|Musobot|Name Intelligence|Nameprotect|Navroad|NearSite|Needle|Nessus|NetAnts|Netcraft|netEstate NE Crawler|NetLyzer|NetMechanic|NetSpider|Nettrack|Net Vampire|Netvibes|NetZIP|NextGenSearchBot|Nibbler|NICErsPRO|Niki-bot|Nikto|NimbleCrawler|Ninja|Nmap|NPbot|Nutch|oBot|Octopus|Offline Explorer|Offline Navigator|Openfind|OpenLinkProfiler|Openvas|OrangeBot|OrangeSpider|OutclicksBot|OutfoxBot|PageAnalyzer|Page Analyzer|PageGrabber|page scorer|PageScorer|Panscient|Papa Foto|Pavuk|pcBrowser|PECL\:\:HTTP|PeoplePal|PHPCrawl|Picscout|Picsearch|PictureFinder|Pimonster|Pi-Monster|Pixray|PleaseCrawl|plumanalytics|Pockey|POE-Component-Client-HTTP|Probethenet|ProPowerBot|ProWebWalker|Psbot|Pump|PxBroker|PyCurl|QueryN Metasearch|Quick-Crawler|RankActive|RankActiveLinkBot|RankFlex|RankingBot|RankingBot2|Rankivabot|RankurBot|RealDownload|Reaper|RebelMouse|Recorder|RedesScrapy|ReGet|RepoMonkey|Ripper|RocketCrawler|Rogerbot|SalesIntelligent|SBIder|ScanAlert|Scanbot|scan\.lol|Scrapy|Screaming|ScreenerBot|Searchestate|SearchmetricsBot|Semrush|SemrushBot|SEOkicks|SEOlyticsCrawler|Seomoz|SEOprofiler|seoscanners|SEOstats|sexsearcher|Seznam|SeznamBot|Shodan|Siphon|SISTRIX|Sitebeam|SiteExplorer|Siteimprove|SiteLockSpider|SiteSnagger|SiteSucker|Site Sucker|Sitevigil|Slackbot-LinkExpanding|SlySearch|SmartDownload|SMTBot|Snake|Snapbot|Snoopy|SocialRankIOBot|Sogou web spider|Sosospider|Sottopop|SpaceBison|Spammen|SpankBot|Spanner|Spbot|Spinn3r|SputnikBot|Sqlmap|Sqlworm|Sqworm|Steeler|Stripper|Sucker|Sucuri|SuperBot|SuperHTTP|Surfbot|SurveyBot|Suzuran|Swiftbot|sysscan|Szukacz|T0PHackTeam|T8Abot|tAkeOut|Teleport|TeleportPro|Telesoft|Telesphoreo|Telesphorep|The Intraformant|TheNomad|TightTwatBot|Titan|Toata|Toweyabot|Trendiction|Trendictionbot|trendiction\.com|trendiction\.de|True_Robot|Turingos|Turnitin|TurnitinBot|TwengaBot|Twice|Typhoeus|UnisterBot|URLy\.Warning|URLy Warning|Vacuum|Vagabondo|VB Project|VCI|VeriCiteCrawler|VidibleScraper|Virusdie|VoidEYE|Voil|Voltron|Wallpapers/3\.0|WallpapersHD|WASALive-Bot|WBSearchBot|Webalta|WebAuto|Web Auto|WebBandit|WebCollage|Web Collage|WebCopier|WEBDAV|WebEnhancer|Web Enhancer|WebFetch|Web Fetch|WebFuck|Web Fuck|WebGo IS|WebImageCollector|WebLeacher|WebmasterWorldForumBot|webmeup-crawler|WebPix|Web Pix|WebReaper|WebSauger|Web Sauger|Webshag|WebsiteExtractor|WebsiteQuester|Website Quester|Webster|WebStripper|WebSucker|Web Sucker|WebWhacker|WebZIP|WeSEE|Whack|Whacker|Whatweb|Who\.is Bot|Widow|WinHTTrack|WiseGuys Robot|WISENutbot|Wonderbot|Woobot|Wotbox|Wprecon|WPScan|WWW-Collector-E|WWW-Mechanize|WWW\:\:Mechanize|WWWOFFLE|x09Mozilla|x22Mozilla|Xaldon_WebSpider|Xaldon WebSpider|Xenu|xpymep1\.exe|YoudaoBot|Zade|Zauba|zauba\.io|Zermelo|Zeus|zgrab|Zitebot|ZmEu|ZumBot|ZyBorg
failregex = (?i)<HOST> -.*"(GET|POST|HEAD).*HTTP.*(?:%(badbots)s).*"$
ignoreregex =
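To sanity-check the filter before enabling the jail, you can dry-run it against your access log with fail2ban's bundled `fail2ban-regex` tool. A minimal sketch, assuming the default Debian-style paths used above:

```
# Dry-run the filter: reports which log lines would match, without issuing any bans.
fail2ban-regex /var/log/nginx/access.log /etc/fail2ban/filter.d/nginx-badbots.conf

# After editing jail.local, reload fail2ban and confirm the jail came up.
sudo fail2ban-client reload
sudo fail2ban-client status nginx-badbots
```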
Thank you to both. I knew I had this script somewhere, but an unfortunate upgrade wiped it from my disk and I couldn't find a copy anywhere...
Glad you found it useful!
The "Attach" would match one of the bots called "Attach" and the fail2ban would be triggered, even though the User-Agent string may be valid. I think it might be better to only scan the User-Agent string. Along those lines, I modified your regex for fail2ban to be the following:
failregex = (?i)<HOST> -.*"(GET|POST|HEAD) (.*?)" \d+ \d+ "(.*?)" ".*(?:%(badbots)s).*"$
This will then allow URLs to include the bot names, but will block requests with the bot name in the User-Agent string. The regex may need tweaking to work in all cases, but so far in my tests it works pretty well.
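In case it helps anyone testing this: `fail2ban-regex` also accepts a single log line as its first argument, so you can replay a specific request through the filter. A hedged example with a made-up log line: with the modified failregex installed, a request whose referer URL contains "Attach" but whose User-Agent is an ordinary browser should no longer match (the original failregex would have matched it, since "Attach" appears after "HTTP" on the line):

```
# Hypothetical log line: bot name inside the referer URL, clean browser UA.
fail2ban-regex '203.0.113.7 - - [10/Jan/2024:12:00:00 +0000] "GET /page.html HTTP/1.1" 200 512 "https://somedomain.com/How-Do-I-Attach-A-File/" "Mozilla/5.0 (Windows NT 10.0)"' /etc/fail2ban/filter.d/nginx-badbots.conf
```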
Thank you very much for this, @mkressel
It seems the original poster didn't update the code according to your suggestion (no idea why), but yours helped me a lot! Do you have any further updates or improvements to that line?
Many thanks.
Happy it helped!
I had a case today where it blocked a legitimate user. In the nginx log, I think this is probably what triggered it:
xx.xx.xx.xx - lno [05/Dec/2023:14:34:38 +1100] "PROPFIND /remote.php/dav/files/lnog HTTP/2.0" 207 603 "-" "Microsoft-WebDAV-MiniRedir/10.0.17763"
Is it possible the expression matched only part of the agent, i.e. Microsoft? Otherwise I have no idea why it was banned.
Yes, the regular expression is a substring search, so WebDAV will match Microsoft-WebDAV-MiniRedir. You would have to add an exclusion to the regexp to get it to ignore that user agent.
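You can see that substring behaviour in isolation with a case-insensitive grep, which mirrors what the (?i) flag does inside the filter. The WEBDAV entry from the badbots list is found inside the otherwise legitimate user agent:

```
# -o prints the matched portion; -i makes the search case-insensitive, like (?i).
grep -oiE 'WEBDAV' <<< 'Microsoft-WebDAV-MiniRedir/10.0.17763'
# prints: WebDAV
```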
Thank you so much. My code skills are quite low sadly; would you be willing to update the nginx-badbots.conf example to include an exclusion list? I've found this badbot jail is highly effective, but it really needs an exemption list where you can add goodbots :)
For example:
/var/log/nginx/www.mysite.com_access.log:54.36.149.246 - - [11/Dec/2023:07:04:15 +1100] "GET /Interview HTTP/2.0" 200 10287 "-" "Mozilla/5.0 (compatible; AhrefsSiteAudit/6.1; +http://ahrefs.com/robot/site-audit)"
/var/log/nginx/www.mysite.com_access.log:54.36.149.246 - - [11/Dec/2023:07:05:00 +1100] "GET /support.guy/page HTTP/2.0" 200 10257 "-" "Mozilla/5.0 (compatible; AhrefsSiteAudit/6.1; +http://ahrefs.com/robot/site-audit)"
I'm not even sure why Ahrefs is getting caught, because I removed AhrefsBot from the list, but something is still capturing it.
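I haven't tested this against your setup, but one likely culprit is the oBot entry in the badbots list: with (?i) every entry is a substring match, and "obot" occurs inside the "+http://ahrefs.com/robot/site-audit" URL that Ahrefs embeds in its own User-Agent string, so removing AhrefsBot alone isn't enough. A quick check:

```
# Case-insensitively test a few badbots entries against the Ahrefs UA string;
# "oBot" matches the "obot" inside ".../robot/site-audit".
grep -oiE 'AhrefsBot|oBot|Bolt' <<< 'Mozilla/5.0 (compatible; AhrefsSiteAudit/6.1; +http://ahrefs.com/robot/site-audit)'
# prints: obot
```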
This hasn't been tested, but I would add a line like (adding all your user agent strings you wish to ignore):
goodbots = AhrefsSiteAudit|Microsoft-WebDAV-MiniRedir|AnotherUserAgent
And then ignoreregex would be something like:
ignoreregex = (?i)<HOST> -.*"(GET|POST|HEAD).*HTTP.*(?:%(goodbots)s).*"$
Which is just the same thing as failregex, but with `goodbots` instead of `badbots`.
Caveat: I haven't tested this, so please let me know if this works for you.
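Assembled, the [Definition] section would then look something like the sketch below, still untested. The badbots list is shortened here for readability (keep the full list from the filter above), and the goodbots entries are only examples:

```
[Definition]
# Shortened for readability; use the full badbots list from the filter above.
badbots = MJ12bot|SemrushBot|ZmEu
# User agents that should never trigger a ban (examples; add your own).
goodbots = AhrefsSiteAudit|Microsoft-WebDAV-MiniRedir

failregex = (?i)<HOST> -.*"(GET|POST|HEAD) (.*?)" \d+ \d+ "(.*?)" ".*(?:%(badbots)s).*"$

ignoreregex = (?i)<HOST> -.*"(GET|POST|HEAD).*HTTP.*(?:%(goodbots)s).*"$
```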
Thanks, I'll test on one of my less critical servers and report back soon.
I checked the logs today and found this:
2023-12-13 13:50:35,212 fail2ban.transmitter [92677]: WARNING Command ['status', 'nginx-badbots,'] has failed. Received UnknownJailException('nginx-badbots,')
Is there anything I can do to work out what this is about?
@brendan-pike I wonder, is there an extra comma after `nginx-badbots`?
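The error message itself shows the jail name arriving as 'nginx-badbots,' (with a trailing comma), which fail2ban then treats as an unknown jail. The plain invocation should work:

```
sudo fail2ban-client status nginx-badbots
```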
Checking back in again: no, there are no commas in there.
I'm still trialing this goodbots setup and will let you know how it progresses, so far it seems to be working.
This got banned, but I only had AhrefsSiteAudit in goodbots, so I've added AhrefsBot as well now to see if that prevents it.
/var/log/nginx/mysite.com_access.log:54.36.149.63 - - [20/Dec/2023:13:17:49 +1100] "GET /robots.txt HTTP/2.0" 302 394 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"
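If it helps while trialling, you can replay that exact logged line through the filter with `fail2ban-regex`. Assuming the filter file contains the goodbots ignoreregex from above and it behaves as intended, the line should now be counted under the ignoreregex total instead of being reported as a match:

```
fail2ban-regex '54.36.149.63 - - [20/Dec/2023:13:17:49 +1100] "GET /robots.txt HTTP/2.0" 302 394 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"' /etc/fail2ban/filter.d/nginx-badbots.conf
```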