Last active March 5, 2023 16:03
Magento robots.txt
# robots.txt for Magento 1.9.x & 2.x / v1.7 2023-01-30 / peeter.marvet@vaimo.com
# - original version from 2015 for 1.9.x, but these rules are OK for M2 as well
# - edited in 2017 to add filter query parameter disallow samples + some wildcards
# - edited in 2018 to add query params blocking to Yandex as named User-agent does not read *
# - edited in 2023 to remove unneeded stuff (license.txt, crawl-delay) and make all rules use */ prefix
# based on:
# https://inchoo.net/ecommerce/ultimate-magento-robots-txt-file-examples/
# https://www.hypernode.com/nl/blog/magento-robots-txt/
# https://astrio.net/blog/optimize-robots-txt-for-magento/
#
# comment and clone at https://gist.github.com/petskratt/016c9dbf159a81b9d6aa
# Keep in mind that by standard robots.txt should NOT contain empty lines, except between UA blocks!
#
# Sitemap (uncomment and adjust; add language- or shop-specific sitemaps as needed. If running on multiple domains,
# keep in mind that a sitemap can only point to its own domain, so something like sitemapindex.php is needed)
# Sitemap: http://example.com/sitemap.xml
#
# Crawlers Setup
User-agent: *
#
# Allow paging (unless paging inside a listing with more params, as disallowed below)
Allow: /*?p=
#
# Directories (technical images only)
Disallow: /media/captcha/
Disallow: /media/customer/
Disallow: /media/dhl/
Disallow: /media/downloadable/
Disallow: /media/import/
Disallow: /media/pdf/
Disallow: /media/sales/
Disallow: /media/tmp/
Disallow: /media/xmlconnect/
#
# Paths (leading * to make work for single- and multilocale versions)
Disallow: */index.php/
Disallow: */catalog/product_compare/
Disallow: */catalog/category/view/
Disallow: */catalog/product/view/
Disallow: */catalog/product/gallery/
Disallow: */catalogsearch/
Disallow: */control/
Disallow: */customer/
Disallow: */customize/
Disallow: */newsletter/
Disallow: */poll/
Disallow: */review/
Disallow: */sendfriend/
Disallow: */tag/
Disallow: */wishlist/
Disallow: */checkout/
Disallow: */onestepcheckout/
#
# Do not crawl subcategory pages that are sorted or filtered.
# NB: Avoid wider rules like /*?* as these would also block assets with a timestamp/version as parameter!
#
# These are more specific, pick what you need - and do not forget to add your custom filters!
Disallow: /*?dir*
Disallow: /*?limit*
Disallow: /*?mode*
Disallow: /*?___from_store=*
Disallow: /*?___store=*
Disallow: /*?cat=*
Disallow: /*?q=*
Disallow: /*?price=*
Disallow: /*?availability=*
Disallow: /*?brand=*
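#
# Example only, for the "custom filters" note above. The attribute codes color and size are hypothetical;
# replace them with the filterable attributes your store actually exposes, then uncomment:
# Disallow: /*?color=*
# Disallow: /*?size=*
# NB: a pattern like /*?color= matches only when color is the first query parameter;
# a rule such as /*&color= would be needed to catch it in later positions.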
#
# Paths that can be safely ignored (no clean URLs)
Disallow: /*?p=*&
Disallow: /*.php$
Disallow: /*?SID=
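To sanity-check rules like the ones above before deploying them, a small matcher can help. What follows is a minimal sketch, not a reference implementation. It assumes Google-style matching: `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching pattern wins, and Allow wins ties. It also tolerates the leading `*` used in this file, which other parsers may treat differently. The names `pattern_to_regex`, `is_allowed` and `RULES`, and the sample paths, are made up for illustration.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes Google-style wildcards: '*' matches any run of characters and
    a trailing '$' anchors the match at the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, url_path):
    """Return True if url_path may be crawled under the given rules.

    rules is a list of ("allow" | "disallow", pattern) tuples for a single
    User-agent group. The longest matching pattern wins; on a tie, allow wins.
    """
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        # re.match anchors at the start of the path, as robots.txt rules do
        if pattern_to_regex(pattern).match(url_path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# A few rules copied from the file above
RULES = [
    ("allow", "/*?p="),
    ("disallow", "*/checkout/"),
    ("disallow", "/*?dir*"),
    ("disallow", "/*?p=*&"),
    ("disallow", "/*.php$"),
]

if __name__ == "__main__":
    samples = [
        "/shoes.html?p=2",
        "/shoes.html?p=2&dir=asc",
        "/en/checkout/cart/",
        "/index.php",
        "/shoes.html",
    ]
    for path in samples:
        verdict = "allowed" if is_allowed(RULES, path) else "blocked"
        print(path, "->", verdict)
```

Under these assumptions, plain paging (`?p=2`) stays crawlable, while paging combined with further parameters is caught by the longer `/*?p=*&` rule. Real crawlers can differ in how they resolve conflicts, so treat the result as a hint and confirm with the search engine's own robots.txt testing tool.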
Thanks!
You the man!
Great piece of code, thanks!