@ai2ik
Created January 4, 2019 14:16
# Disallow the crawlers listed below from accessing the entire site.
# Block Yandex from crawling site
User-agent: Yandex
Disallow: /
# Block Yeti
User-agent: Yeti
Disallow: /
# Block NextGenSearchBot
User-agent: NextGenSearchBot
Disallow: /
# Block ia_archiver from crawling site
User-agent: ia_archiver
Disallow: /
# Block Baiduspider from crawling site
User-agent: Baiduspider
Disallow: /
# Block PicScout Crawler from crawling site
User-agent: PicScout
Disallow: /
# Block MJ12bot from crawling site
User-agent: MJ12bot
Disallow: /
# Block 008 from crawling site
User-agent: 008
Disallow: /
# Block AhrefsBot from crawling site
User-agent: AhrefsBot
Disallow: /
# Block CCBot Crawler from crawling site
User-agent: CCBot
Disallow: /
# Block TinEye from crawling site
User-agent: TinEye
Disallow: /
# Block Sogou Spider from crawling site
User-agent: Sogou Spider
Disallow: /
# Block Exabot from crawling site
User-agent: Exabot
Disallow: /
# Block Nutch from crawling site
User-agent: Nutch
Disallow: /
# Block Python-urllib
User-agent: Python-urllib
Disallow: /
# Block dotbot
User-agent: dotbot
Disallow: /
# Block SEOkicks
User-agent: SEOkicks-Robot
Disallow: /
# Block BlexBot
User-agent: BLEXBot
Disallow: /
# Block SISTRIX
User-agent: SISTRIX Crawler
Disallow: /
# Block Uptime robot
User-agent: UptimeRobot
Disallow: /
# Block Ezooms Robot
User-agent: Ezooms Robot
Disallow: /
# Block Perl LWP
User-agent: Perl LWP
Disallow: /
# Block netEstate NE Crawler (+http://www.website-datenbank.de/)
User-agent: netEstate NE Crawler (+http://www.website-datenbank.de/)
Disallow: /
# Block WiseGuys Robot
User-agent: WiseGuys Robot
Disallow: /
# Block Turnitin Robot
User-agent: Turnitin Robot
Disallow: /
# Block Heritrix
User-agent: Heritrix
Disallow: /
# Block pricepi
User-agent: pimonster
Disallow: /
User-agent: Pimonster
Disallow: /
User-agent: Pi-Monster
Disallow: /
# Block Eniro
User-agent: ECCP/1.0 (search@eniro.com)
Disallow: /
# Block Baidu
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
# Block Psbot
User-agent: Psbot
Disallow: /
# Block Youdao
User-agent: YoudaoBot
Disallow: /
# Block NaverBot
User-agent: NaverBot
User-agent: Yeti
Disallow: /
# Block Mediapartners-Google
User-agent: Mediapartners-Google
Disallow: /
# Block Googlebot-Image
User-agent: Googlebot-Image
Disallow: /
# Block ZBot
User-agent: ZBot
Disallow: /
# Block Vagabondo
User-agent: Vagabondo
Disallow: /
# Block LinkWalker
User-agent: LinkWalker
Disallow: /
# Block Xenu Link Sleuth
User-agent: Xenu Link Sleuth
Disallow: /
# Block SimplePie
User-agent: SimplePie
Disallow: /
# Block Wget
User-agent: Wget
Disallow: /
# Block Pixray-Seeker
User-agent: Pixray-Seeker
Disallow: /
# Block BoardReader
User-agent: BoardReader
Disallow: /
# Block Unknown Bot
User-agent: Unknown Bot
Disallow: /