@petskratt
Last active March 5, 2023 16:03
Magento robots.txt
# robots.txt for Magento 1.9.x & 2.x / v1.7 2023-01-30 / peeter.marvet@vaimo.com
# - original version from 2015 for 1.9.x, but these rules are OK for M2 as well
# - edited in 2017 to add sample Disallow rules for filter query parameters + some wildcards
# - edited in 2018 to add query parameter blocking for Yandex, as a crawler with its own named User-agent block does not read *
# - edited in 2023 to remove unneeded stuff (license.txt, crawl-delay) and make all rules use */ prefix
# based on:
# https://inchoo.net/ecommerce/ultimate-magento-robots-txt-file-examples/
# https://www.hypernode.com/nl/blog/magento-robots-txt/
# https://astrio.net/blog/optimize-robots-txt-for-magento/
#
# comment and clone at https://gist.github.com/petskratt/016c9dbf159a81b9d6aa
# Keep in mind that the original robots.txt standard treats empty lines as record separators, so do NOT use them except between User-agent blocks!
#
# Sitemap (uncomment, change and add language/shop specific sitemaps; if running on multiple domains,
# keep in mind that a sitemap can only point to its own domain, so something like sitemapindex.php is needed)
# Sitemap: http://example.com/sitemap.xml
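# Example only (hypothetical file names, one Sitemap line per language/shop sitemap):
# Sitemap: https://example.com/sitemap_en.xml
# Sitemap: https://example.com/sitemap_et.xml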
#
# Crawlers Setup
User-agent: *
#
# Allow paging (unless paging inside a listing with more params, as disallowed below)
Allow: /*?p=
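# How this interacts with the paging Disallow further down (a sketch, assuming a crawler that
# applies the most specific, i.e. longest, matching rule, as RFC 9309 describes; URLs are hypothetical):
#   /shoes.html?p=2          matches only "Allow: /*?p="        -> crawled
#   /shoes.html?p=2&dir=asc  matches also "Disallow: /*?p=*&"   -> blocked, the longer rule wins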
#
# Directories (technical images only)
Disallow: /media/captcha/
Disallow: /media/customer/
Disallow: /media/dhl/
Disallow: /media/downloadable/
Disallow: /media/import/
Disallow: /media/pdf/
Disallow: /media/sales/
Disallow: /media/tmp/
Disallow: /media/xmlconnect/
#
# Paths (leading * so the rules work for both single- and multi-locale versions)
Disallow: */index.php/
Disallow: */catalog/product_compare/
Disallow: */catalog/category/view/
Disallow: */catalog/product/view/
Disallow: */catalog/product/gallery/
Disallow: */catalogsearch/
Disallow: */control/
Disallow: */customer/
Disallow: */customize/
Disallow: */newsletter/
Disallow: */poll/
Disallow: */review/
Disallow: */sendfriend/
Disallow: */tag/
Disallow: */wishlist/
Disallow: */checkout/
Disallow: */onestepcheckout/
#
# Do not crawl subcategory pages that are sorted or filtered.
# NB: Avoid wider rules like /*?* as these would also block assets with a timestamp/version parameter!
#
# These are more specific, pick what you need - and do not forget to add your custom filters!
Disallow: /*?dir*
Disallow: /*?limit*
Disallow: /*?mode*
Disallow: /*?___from_store=*
Disallow: /*?___store=*
Disallow: /*?cat=*
Disallow: /*?q=*
Disallow: /*?price=*
Disallow: /*?availability=*
Disallow: /*?brand=*
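# Examples only (hypothetical attribute codes "color" and "size", replace with your own filters):
# Disallow: /*?color=*
# Disallow: /*?size=*
# Note that a /*?name= pattern matches only when the parameter comes first in the query string;
# for a parameter that can also appear later, a pattern like this would be needed as well:
# Disallow: /*&color=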
#
# Paths that can be safely ignored (no clean URLs)
Disallow: /*?p=*&
Disallow: /*.php$
Disallow: /*?SID=
@simbus82
Great piece of code, thanks!

@hanhpv
hanhpv commented Jun 12, 2018

Thanks!

@secretagency
You the man!
