@lyda
Last active December 10, 2015 16:39
The suggested changes to the Irish Times robots.txt file to prevent automated scraping.
# This is the modified Irish Times robots.txt file. @hlinehan would like to prevent automated scraping.
# I've made some changes to make that possible.
# Default rules for every crawler not named in a later group.
User-agent: *
Disallow: /dublin/
Disallow: /eurotimes/
Disallow: /survey/
Disallow: /promotion/
Disallow: /Storage/
Disallow: /ITImage/
Disallow: /newspaper/pdf/

# Google AdSense crawler.
User-agent: Mediapartners-Google*
Disallow: /Storage/
Disallow: /ITImage/
Disallow: /newspaper/tools/

# Google News crawler: blocked from the whole site.
User-agent: Googlebot-News
Disallow: /

# 80legs crawler.
User-agent: 008
Disallow: /blogs/

Sitemap: http://www.irishtimes.com/sitemap.xml
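
One way to sanity-check rules like these before deploying them is to feed them to a robots.txt parser and ask which URLs each crawler may fetch. The sketch below uses Python's standard `urllib.robotparser`. The `ExampleBot` agent name and the `/news/`, `/survey/results` and `/blogs/post` paths are made up for illustration, and other crawlers may interpret the file differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the file above, with comments left out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dublin/
Disallow: /eurotimes/
Disallow: /survey/
Disallow: /promotion/
Disallow: /Storage/
Disallow: /ITImage/
Disallow: /newspaper/pdf/

User-agent: Mediapartners-Google*
Disallow: /Storage/
Disallow: /ITImage/
Disallow: /newspaper/tools/

User-agent: Googlebot-News
Disallow: /

User-agent: 008
Disallow: /blogs/

Sitemap: http://www.irishtimes.com/sitemap.xml
"""

BASE = "http://www.irishtimes.com"


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser: RobotFileParser, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the parsed rules."""
    return parser.can_fetch(agent, BASE + path)


parser = build_parser(ROBOTS_TXT)

# Hypothetical agents and paths, chosen only to exercise each group.
for agent, path in [
    ("ExampleBot", "/news/"),
    ("ExampleBot", "/survey/results"),
    ("Googlebot-News", "/news/"),
    ("008", "/blogs/post"),
]:
    verdict = "allowed" if allowed(parser, agent, path) else "blocked"
    print(f"{agent:15} {path:17} {verdict}")
```

A robots.txt file is only advisory: it tells well-behaved crawlers what to skip, but it does not stop a scraper that ignores it.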