NGINX to block bad bots. (add Twenga|TwengaBot if you want to exclude them too)
if ($http_user_agent ~* (360Spider|80legs.com|Abonti|AcoonBot|Acunetix|adbeat_bot|AddThis.com|adidxbot|ADmantX|AhrefsBot|AngloINFO|Antelope|Applebot|BaiduSpider|BeetleBot|billigerbot|binlar|bitlybot|BlackWidow|BLP_bbot|BoardReader|Bolt\ 0|BOT\ for\ JCE|Bot\ mailto\:craftbot@yahoo\.com|casper|CazoodleBot|CCBot|checkprivacy|ChinaClaw|chromeframe|Clerkbot|Cliqzbot|clshttp|CommonCrawler|comodo|CPython|crawler4j|Crawlera|CRAZYWEBCRAWLER|Curious|Curl|Custo|CWS_proxy|Default\ Browser\ 0|diavol|DigExt|Digincore|DIIbot|discobot|DISCo|DoCoMo|DotBot|Download\ Demon|DTS.Agent|EasouSpider|eCatch|ecxi|EirGrabber|Elmer|EmailCollector|EmailSiphon|EmailWolf|Exabot|ExaleadCloudView|ExpertSearchSpider|ExpertSearch|Express\ WebPictures|ExtractorPro|extract|EyeNetIE|Ezooms|F2S|FastSeek|feedfinder|FeedlyBot|FHscan|finbot|Flamingo_SearchEngine|FlappyBot|FlashGet|flicky|Flipboard|g00g1e|Genieo|genieo|GetRight|GetWeb\!|GigablastOpenSource|GozaikBot|Go\!Zilla|Go\-Ahead\-Got\-It|GrabNet|grab|Grafula|GrapeshotCrawler|GTB5|GT\:\:WWW|Guzzle|harvest|heritrix|HMView|HomePageBot|HTTP\:\:Lite|HTTrack|HubSpot|ia_archiver|icarus6|IDBot|id\-search|IlseBot|Image\ Stripper|Image\ Sucker|Indigonet|Indy\ Library|integromedb|InterGET|InternetSeer\.com|Internet\ Ninja|IRLbot|ISC\ Systems\ iRc\ Search\ 2\.1|jakarta|Java|JetCar|JobdiggerSpider|JOC\ Web\ Spider|Jooblebot|kanagawa|KINGSpider|kmccrew|larbin|LeechFTP|libwww|Lingewoud|LinkChecker|linkdexbot|LinksCrawler|LinksManager\.com_bot|linkwalker|LinqiaRSSBot|LivelapBot|ltx71|LubbersBot|lwp\-trivial|Mail.RU_Bot|masscan|Mass\ Downloader|maverick|Maxthon$|Mediatoolkitbot|MegaIndex|MegaIndex|megaindex|MFC_Tear_Sample|Microsoft\ URL\ Control|microsoft\.url|MIDown\ tool|miner|Missigua\ Locator|Mister\ PiX|mj12bot|Mozilla.*Indy|Mozilla.*NEWT|MSFrontPage|msnbot|Navroad|NearSite|NetAnts|netEstate|NetSpider|NetZIP|Net\ Vampire|NextGenSearchBot|nutch|Octopus|Offline\ Explorer|Offline\ Navigator|OpenindexSpider|OpenWebSpider|OrangeBot|Owlin|PageGrabber|PagesInventory|panopta|panscient\.com|Papa\ Foto|pavuk|pcBrowser|PECL\:\:HTTP|PeoplePal|Photon|PHPCrawl|planetwork|PleaseCrawl|PNAMAIN.EXE|PodcastPartyBot|prijsbest|proximic|psbot|purebot|pycurl|QuerySeekerSpider|R6_CommentReader|R6_FeedFetcher|RealDownload|ReGet|Riddler|Rippers\ 0|rogerbot|RSSingBot|rv\:1.9.1|RyzeCrawler|SafeSearch|SBIder|Scrapy|Scrapy|Screaming|SeaMonkey$|search.goo.ne.jp|SearchmetricsBot|search_robot|SemrushBot|Semrush|SentiBot|SEOkicks|SeznamBot|ShowyouBot|SightupBot|SISTRIX|sitecheck\.internetseer\.com|siteexplorer.info|SiteSnagger|skygrid|Slackbot|Slurp|SmartDownload|Snoopy|Sogou|Sosospider|spaumbot|Steeler|sucker|SuperBot|Superfeedr|SuperHTTP|SurdotlyBot|Surfbot|tAkeOut|Teleport\ Pro|TinEye-bot|TinEye|Toata\ dragostea\ mea\ pentru\ diavola|Toplistbot|trendictionbot|TurnitinBot|turnit|Twitterbot|URI\:\:Fetch|urllib|Vagabondo|Vagabondo|vikspider|VoidEYE|VoilaBot|WBSearchBot|webalta|WebAuto|WebBandit|WebCollage|WebCopier|WebFetch|WebGo\ IS|WebLeacher|WebReaper|WebSauger|Website\ eXtractor|Website\ Quester|WebStripper|WebWhacker|WebZIP|Web\ Image\ Collector|Web\ Sucker|Wells\ Search\ II|WEP\ Search|WeSEE|Wget|Widow|WinInet|woobot|woopingbot|worldwebheritage.org|Wotbox|WPScan|WWWOFFLE|WWW\-Mechanize|Xaldon\ WebSpider|XoviBot|yacybot|Yahoo|YandexBot|Yandex|YisouSpider|zermelo|Zeus|zh-CN|ZmEu|ZumBot|ZyBorg) ) {
return 410;
}

@philippeowagner philippeowagner commented Sep 19, 2016

Thanks for sharing this. It works like a charm, but I would suggest using HTTP 444 instead of 410.
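
For readers unfamiliar with it: 444 is a non-standard, nginx-specific code that makes nginx close the connection without sending any response, whereas 410 sends a normal "Gone" reply. A minimal sketch of the swap, using a deliberately shortened pattern (three names picked from the list above purely for illustration, not a recommended subset):

```nginx
# Illustrative excerpt only: the real pattern is the full list from the gist.
if ($http_user_agent ~* (AhrefsBot|mj12bot|SemrushBot)) {
    # 444 is nginx-specific: drop the connection without sending a response.
    return 444;
}
```

Either code stops the request early; the difference is only whether the client receives a reply.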

@kenguish kenguish commented Nov 25, 2016

Thanks. You might want to add libcurl and libwww-perl too.

@extensionsapp extensionsapp commented Jul 31, 2017

These are good search bots. Why are they on the list?
Yahoo|YandexBot|Yandex|Twitterbot

@imagina imagina commented Mar 21, 2018

We found another bad bot scanning our servers: trovitBot
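
Additions like the ones suggested in this thread can be kept out of the long pattern by using nginx's `map` directive. This is a different technique from the gist's single `if`, sketched here under assumptions: the variable name `$is_bad_bot` is hypothetical, and the three entries are just the names mentioned in the comments above.

```nginx
# In the http context: map the User-Agent header to a flag.
# $is_bad_bot is a hypothetical variable name chosen for this sketch.
map $http_user_agent $is_bad_bot {
    default        0;
    ~*trovitBot    1;  # "~*" marks a case-insensitive regular expression
    ~*libwww-perl  1;
    ~*libcurl      1;
}

# In a server or location context: reject flagged requests.
if ($is_bad_bot) {
    return 444;
}
```

Each new bot then becomes one extra line in the `map` block rather than an edit to the single long expression.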

@dmitryd dmitryd commented Feb 19, 2019

@extensionsapp Yandex tends to be too aggressive.

@precogtyrant precogtyrant commented Aug 21, 2019

Hello,
Thanks for the code. However, the list also contains Yahoo. Does this mean the Yahoo search engine's bot? I would rather not block that one ;)

@Vish-was Vish-was commented Sep 6, 2019

I am still getting "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)" in my access.log

@Vish-was Vish-was commented Sep 9, 2019

Hi. This won't work when placed in nginx.conf.
Also, I managed to turn away some bots via robots.txt:

User-agent: MJ12bot
User-agent: SemrushBot
User-agent: Yandex
User-agent: YandexBot
User-agent: UptimeRobot
User-agent: AhrefsBot
User-agent: GoogleBot
User-agent: BingBot
Disallow: /

but some are still showing up, like GoogleBot and BingBot.

@dmhendricks dmhendricks commented Nov 15, 2019

I am still getting "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)" in my access.log

access_log off;
return 444;

@qaisjp qaisjp commented Apr 29, 2020

I am still getting "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)" in my access.log

access_log off;
return 444;

If it says:

nginx: [emerg] "access_log" directive is not allowed here

Put the if block inside your location directive, as per https://nginx.org/en/docs/http/ngx_http_log_module.html#access_log:

Context: http, server, location, if in location, limit_except
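
Put together, the advice above might look like the following sketch. The `server_name`, `listen` port, and `root` path are placeholders, and the pattern is shortened to three names from the gist's list for readability:

```nginx
server {
    listen 80;
    server_name example.com;      # placeholder host name

    location / {
        # "if in location" is one of the contexts where access_log is allowed.
        if ($http_user_agent ~* (mj12bot|AhrefsBot|SemrushBot)) {
            access_log off;       # keep matching requests out of access.log
            return 444;           # close the connection without a response
        }

        root /var/www/html;       # placeholder for the normal request handling
    }
}
```

With the `if` block at server level instead, `nginx -t` would be expected to report the "directive is not allowed here" error quoted above.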
