Magento 1.9.x - robots.txt
# robots.txt for Magento 1.9.x / v0.1 2018-05-23 / Simone Bussoni / inspired by Peeter Marvet
# based on:
# https://gist.github.com/petskratt/016c9dbf159a81b9d6aa
# http://inchoo.net/ecommerce/ultimate-magento-robots-txt-file-examples/
# http://www.byte.nl/blog/magento-robots-txt/
# https://astrio.net/blog/optimize-robots-txt-for-magento/
#
# comment and clone at https://gist.github.com/petskratt/016c9dbf159a81b9d6aa
# Keep in mind that, per the standard, robots.txt should NOT contain empty lines except between UA blocks!
#
# Sitemap (uncomment and change; add language/shop-specific sitemaps if running on multiple domains,
# keeping in mind a sitemap can only point to its own domain, so something like a sitemapindex.php is needed)
# Sitemap: http://example.com/sitemap.xml
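# e.g. a multi-store setup on a single domain might list one sitemap per store view
# (sitemap_en.xml / sitemap_it.xml are hypothetical file names):
# Sitemap: http://example.com/sitemap_en.xml
# Sitemap: http://example.com/sitemap_it.xml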
#
# Google Image Crawler setup - having a crawler-specific section makes it ignore the generic * block
User-agent: Googlebot-Image
Disallow:
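# (an empty Disallow value disallows nothing, so Googlebot-Image may fetch all images)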
#
# Yandex tends to be rather aggressive, so it may be worth keeping it at arm's length
User-agent: YandexBot
Crawl-delay: 20
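# (Crawl-delay: 20 asks for roughly 20 seconds between requests; note that Googlebot ignores this directive)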
#
# Crawlers Setup
User-agent: *
#
# Allow paging (unless the paginated URL carries more params, as disallowed below)
Allow: /*?p=
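# e.g. /category.html?p=2 stays crawlable, while /category.html?p=2&dir=asc is caught
# by the /*?p=*& rule near the end of this file (category.html is just an example path)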
#
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /magento/
#Disallow: /media/
Disallow: /media/captcha/
#Disallow: /media/catalog/
Disallow: /media/customer/
Disallow: /media/dhl/
Disallow: /media/downloadable/
Disallow: /media/import/
Disallow: /media/pdf/
Disallow: /media/sales/
Disallow: /media/tmp/
#Disallow: /media/wysiwyg/
Disallow: /media/xmlconnect/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
#Disallow: /skin/
Disallow: /stats/
Disallow: /var/
#
# Paths (if using a store code in the URL, rules must be prefixed with * as below, or copied for each store)
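# e.g. */catalogsearch/ matches both /catalogsearch/ and /en/catalogsearch/
# ("en" being an example store code in the URL)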
Disallow: */index.php/
Disallow: */catalog/product_compare/
Disallow: */catalog/category/view/
Disallow: */catalog/product/view/
Disallow: */catalog/product/gallery/
Disallow: */catalogsearch/
Disallow: */control/
Disallow: */contacts/
Disallow: */customer/
Disallow: */customize/
Disallow: */newsletter/
Disallow: */poll/
Disallow: */review/
Disallow: */sendfriend/
Disallow: */tag/
Disallow: */wishlist/
Disallow: */checkout/
Disallow: */onestepcheckout/
#
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
#
# Do not crawl sub-category pages that are sorted or filtered.
# This rule would be very broad and could do harm (SEO included), so it is left commented out.
# Disallow: /*?*
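# e.g. /*?* would also block the paginated URLs allowed above, such as /category.html?p=2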
#
# These are more specific, pick what you need - and do not forget to add your custom filters!
Disallow: /*?dir*
Disallow: /*?limit*
Disallow: /*?mode*
Disallow: /*?___from_store=*
Disallow: /*?___store=*
Disallow: /*?cat=*
Disallow: /*?q=*
Disallow: /*?price=*
Disallow: /*?availability=*
Disallow: /*?brand=*
Disallow: /*?min=*
Disallow: /*?max=*
Disallow: /*?formato=*
Disallow: /*?fragranza=*
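# (formato and fragranza above are shop-specific custom filters from the original shop,
# Italian for "format" and "fragrance" - replace them with your own filter parameters)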
#
# Paths that can be safely ignored (no clean URLs)
Disallow: /*?p=*&
Disallow: /*.php$
Disallow: /*?SID=
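# (/*?p=*& catches paginated URLs carrying extra parameters, the $ in /*.php$ anchors the
# end of the URL, and SID is the session id Magento can append to URLs)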
#
#