Top ca. 265k robots.txt comment lines (GitHub Gist softplus/f741774868954b6a9893dfd76f193023)
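Several comments in this listing rely on how crawlers match URL paths. Examples are Salesforce's notes on the `$` delimiter and on `*` patterns that should match only multi-facet filter URLs, and Stack Exchange's reference to `/*?`. The original robotstxt.org convention only did prefix matching. `*` and a trailing `$` are extensions that were later standardised in RFC 9309. The following is a minimal Python sketch that assumes RFC 9309-style semantics: the longest matching pattern wins, and `allow` wins a tie. `pattern_to_regex` and `is_allowed` are hypothetical helpers, not part of any library.

```python
import re
from typing import List, Tuple

# A rule is (directive, pattern), e.g. ("disallow", "/search*$").
# Hypothetical representation chosen for this sketch.
Rule = Tuple[str, str]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex matched from the start of the path.

    '*' matches any run of characters (including none); a trailing '$'
    anchors the pattern to the end of the path. Everything else is literal.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape all regex metacharacters, then turn each escaped '*' back into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored_end else ""))


def is_allowed(path: str, rules: List[Rule]) -> bool:
    """Return True if `path` may be crawled under `rules` (assumed RFC 9309 precedence)."""
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern (e.g. "Disallow:") matches nothing
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            # Longer (more specific) pattern wins; on a tie, allow beats disallow.
            if length > best_len or (length == best_len and directive == "allow"):
                best_len = length
                allowed = directive == "allow"
    return allowed
```

As a usage example, the rules below are a hypothetical reconstruction in the spirit of the Salesforce comments. They are not copied from the real file, which keeps only its comments in this listing. `is_allowed("/uk/customer-success-stories._filter.alphaSort", rules)` is `False` because of the `$` rule. A single facet such as `...alphaSort.S1` stays crawlable, and two facets such as `...alphaSort.S1.S2` are blocked by the `.S*.*` pattern.

```python
rules = [
    ("disallow", "/uk/customer-success-stories._filter.alphaSort$"),
    ("disallow", "/uk/customer-success-stories._filter.alphaSort.S*.*"),
]
```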
google.com: # AdsBot | |
google.com: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
youtube.com: # robots.txt file for YouTube | |
youtube.com: # Created in the distant future (the year 2000) after | |
youtube.com: # the robotic uprising of the mid 90's which wiped out all humans. | |
facebook.com: # Notice: Collection of data on Facebook through automated means is | |
facebook.com: # prohibited unless you have express written permission from Facebook | |
facebook.com: # and may only be conducted for the limited purpose contained in said | |
facebook.com: # permission. | |
facebook.com: # See: http://www.facebook.com/apps/site_scraping_tos_terms.php | |
wikipedia.org: # | |
wikipedia.org: # Please note: There are a lot of pages on this site, and there are | |
wikipedia.org: # some misbehaved spiders out there that go _way_ too fast. If you're | |
wikipedia.org: # irresponsible, your access to the site may be blocked. | |
wikipedia.org: # | |
wikipedia.org: # Observed spamming large amounts of https://en.wikipedia.org/?curid=NNNNNN | |
wikipedia.org: # and ignoring 429 ratelimit responses, claims to respect robots: | |
wikipedia.org: # http://mj12bot.com/ | |
wikipedia.org: # advertising-related bots: | |
wikipedia.org: # Wikipedia work bots: | |
wikipedia.org: # Crawlers that are kind enough to obey, but which we'd rather not have | |
wikipedia.org: # unless they're feeding search engines. | |
wikipedia.org: # Some bots are known to be trouble, particularly those designed to copy | |
wikipedia.org: # entire sites. Please obey robots.txt. | |
wikipedia.org: # Misbehaving: requests much too fast: | |
wikipedia.org: # | |
wikipedia.org: # Sorry, wget in its recursive mode is a frequent problem. | |
wikipedia.org: # Please read the man page and use it properly; there is a | |
wikipedia.org: # --wait option you can use to set the delay between hits, | |
wikipedia.org: # for instance. | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # The 'grub' distributed client has been *very* poorly behaved. | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # Doesn't follow robots.txt anyway, but... | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # Hits many times per second, not acceptable | |
wikipedia.org: # http://www.nameprotect.com/botinfo.html | |
wikipedia.org: # A capture bot, downloads gazillions of pages with no public benefit | |
wikipedia.org: # http://www.webreaper.net/ | |
wikipedia.org: # | |
wikipedia.org: # Friendly, low-speed bots are welcome viewing article pages, but not | |
wikipedia.org: # dynamically-generated pages please. | |
wikipedia.org: # | |
wikipedia.org: # Inktomi's "Slurp" can read a minimum delay between hits; if your | |
wikipedia.org: # bot supports such a thing using the 'Crawl-delay' or another | |
wikipedia.org: # instruction, please let us know. | |
wikipedia.org: # | |
wikipedia.org: # There is a special exception for API mobileview to allow dynamic | |
wikipedia.org: # mobile web & app views to load section content. | |
wikipedia.org: # These views aren't HTTP-cached but use parser cache aggressively | |
wikipedia.org: # and don't expose special: pages etc. | |
wikipedia.org: # | |
wikipedia.org: # Another exception is for REST API documentation, located at | |
wikipedia.org: # /api/rest_v1/?doc. | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # ar: | |
wikipedia.org: # | |
wikipedia.org: # dewiki: | |
wikipedia.org: # T6937 | |
wikipedia.org: # sensible deletion and meta user discussion pages: | |
wikipedia.org: # 4937#5 | |
wikipedia.org: # T14111 | |
wikipedia.org: # T15961 | |
wikipedia.org: # | |
wikipedia.org: # enwiki: | |
wikipedia.org: # Folks get annoyed when VfD discussions end up the number 1 google hit for | |
wikipedia.org: # their name. See T6776 | |
wikipedia.org: # T15398 | |
wikipedia.org: # T16075 | |
wikipedia.org: # T13261 | |
wikipedia.org: # T12288 | |
wikipedia.org: # T16793 | |
wikipedia.org: # | |
wikipedia.org: # eswiki: | |
wikipedia.org: # T8746 | |
wikipedia.org: # | |
wikipedia.org: # fiwiki: | |
wikipedia.org: # T10695 | |
wikipedia.org: # | |
wikipedia.org: # hewiki: | |
wikipedia.org: #T11517 | |
wikipedia.org: # | |
wikipedia.org: # huwiki: | |
wikipedia.org: # | |
wikipedia.org: # itwiki: | |
wikipedia.org: # T7545 | |
wikipedia.org: # | |
wikipedia.org: # jawiki | |
wikipedia.org: # T7239 | |
wikipedia.org: # nowiki | |
wikipedia.org: # T13432 | |
wikipedia.org: # | |
wikipedia.org: # plwiki | |
wikipedia.org: # T10067 | |
wikipedia.org: # | |
wikipedia.org: # ptwiki: | |
wikipedia.org: # T7394 | |
wikipedia.org: # | |
wikipedia.org: # rowiki: | |
wikipedia.org: # T14546 | |
wikipedia.org: # | |
wikipedia.org: # ruwiki: | |
wikipedia.org: # | |
wikipedia.org: # svwiki: | |
wikipedia.org: # T12229 | |
wikipedia.org: # T13291 | |
wikipedia.org: # | |
wikipedia.org: # zhwiki: | |
wikipedia.org: # T7104 | |
wikipedia.org: # | |
wikipedia.org: # sister projects | |
wikipedia.org: # | |
wikipedia.org: # enwikinews: | |
wikipedia.org: # T7340 | |
wikipedia.org: # | |
wikipedia.org: # itwikinews | |
wikipedia.org: # T11138 | |
wikipedia.org: # | |
wikipedia.org: # enwikiquote: | |
wikipedia.org: # T17095 | |
wikipedia.org: # | |
wikipedia.org: # enwikibooks | |
wikipedia.org: # | |
wikipedia.org: # working... | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: #----------------------------------------------------------# | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # Localisable part of robots.txt for en.wikipedia.org | |
wikipedia.org: # | |
wikipedia.org: # Edit at https://en.wikipedia.org/w/index.php?title=MediaWiki:Robots.txt&action=edit | |
wikipedia.org: # Don't add newlines here. All rules set here are active for every user-agent. | |
wikipedia.org: # | |
wikipedia.org: # Please check any changes using a syntax validator such as http://tool.motoricerca.info/robots-checker.phtml | |
wikipedia.org: # Enter https://en.wikipedia.org/robots.txt as the URL to check. | |
wikipedia.org: # | |
wikipedia.org: # https://phabricator.wikimedia.org/T16075 | |
wikipedia.org: # | |
wikipedia.org: # Folks get annoyed when XfD discussions end up the number 1 google hit for | |
wikipedia.org: # their name. | |
wikipedia.org: # https://phabricator.wikimedia.org/T16075 | |
wikipedia.org: # | |
wikipedia.org: # https://phabricator.wikimedia.org/T12288 | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # https://phabricator.wikimedia.org/T13261 | |
wikipedia.org: # | |
wikipedia.org: # https://phabricator.wikimedia.org/T14111 | |
wikipedia.org: # | |
wikipedia.org: # https://phabricator.wikimedia.org/T15398 | |
wikipedia.org: # | |
wikipedia.org: # https://phabricator.wikimedia.org/T16793 | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # User sandboxes for modules and Template Styles are placed in these subpages for testing | |
wikipedia.org: # | |
wikipedia.org: # | |
wikipedia.org: # </pre> | |
reddit.com: # 80legs | |
reddit.com: # 80legs' new crawler | |
microsoft.com: # Robots.txt file for www.microsoft.com | |
github.com: # If you would like to crawl GitHub contact us via https://support.github.com/contact/ | |
github.com: # We also provide an extensive API: https://developer.github.com/ | |
google.com.hk: # AdsBot | |
google.com.hk: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
adobe.com: # The use of robots or other automated means to access the Adobe site | |
adobe.com: # without the express permission of Adobe is strictly prohibited. | |
adobe.com: # Notwithstanding the foregoing, Adobe may permit automated access to | |
adobe.com: # access certain Adobe pages but solely for the limited purpose of | |
adobe.com: # including content in publicly available search engines. Any other | |
adobe.com: # use of robots or failure to obey the robots exclusion standards set | |
adobe.com: # forth at http://www.robotstxt.org/ is strictly prohibited. | |
adobe.com: # Details about Googlebot available at: http://www.google.com/bot.html | |
adobe.com: # The Google search engine can see everything | |
adobe.com: # The Omniture search engine can see everything | |
adobe.com: # XML sitemaps updates per SH10272020 | |
adobe.com: # XML sitemaps updates per BW10202020 | |
adobe.com: # Hreflang sitemap | |
adobe.com: # Hreflang sitemap updates per SH10122020 | |
adobe.com: # PSFl individual sitemaps HS07082020 | |
ebay.com: ## BEGIN FILE ### | |
ebay.com: # | |
ebay.com: # allow-all | |
ebay.com: # DR | |
ebay.com: # | |
ebay.com: # The use of robots or other automated means to access the eBay site | |
ebay.com: # without the express permission of eBay is strictly prohibited. | |
ebay.com: # Notwithstanding the foregoing, eBay may permit automated access to | |
ebay.com: # access certain eBay pages but soley for the limited purpose of | |
ebay.com: # including content in publicly available search engines. Any other | |
ebay.com: # use of robots or failure to obey the robots exclusion standards set | |
ebay.com: # forth at <https://www.robotstxt.org/orig.html> is strictly | |
ebay.com: # prohibited. | |
ebay.com: # | |
ebay.com: # v10_COM_Feb_2021 | |
ebay.com: ### DIRECTIVES ### | |
ebay.com: # PRP Sitemaps | |
ebay.com: # VIS Sitemaps | |
ebay.com: # CLP Sitemaps | |
ebay.com: # NGS Sitemaps | |
ebay.com: # BROWSE Sitemaps | |
ebay.com: ### END FILE ### | |
apple.com: # robots.txt for http://www.apple.com/ | |
twitter.com: #Google Search Engine Robot | |
twitter.com: #Yahoo! Search Engine Robot | |
twitter.com: #Yandex Search Engine Robot | |
twitter.com: #Microsoft Search Engine Robot | |
twitter.com: #Bing Search Engine Robot | |
twitter.com: # Every bot that might possibly read and respect this file. | |
twitter.com: # WHAT-4882 - Block indexing of links in notification emails. This applies to all bots. | |
twitter.com: # Wait 1 second between successive requests. See ONBOARD-2698 for details. | |
twitter.com: # Independent of user agent. Links in the sitemap are full URLs using https:// and need to match | |
twitter.com: # the protocol of the sitemap. | |
linkedin.com: # Notice: The use of robots or other automated means to access LinkedIn without | |
linkedin.com: # the express permission of LinkedIn is strictly prohibited. | |
linkedin.com: # See https://www.linkedin.com/legal/user-agreement. | |
linkedin.com: # LinkedIn may, in its discretion, permit certain automated access to certain LinkedIn pages, | |
linkedin.com: # for the limited purpose of including content in approved publicly available search engines. | |
linkedin.com: # If you would like to apply for permission to crawl LinkedIn, please email whitelist-crawl@linkedin.com. | |
linkedin.com: # Any and all permitted crawling of LinkedIn is subject to LinkedIn's Crawling Terms and Conditions. | |
linkedin.com: # See http://www.linkedin.com/legal/crawling-terms. | |
linkedin.com: # Profinder only for deepcrawl | |
linkedin.com: # Notice: If you would like to crawl LinkedIn, | |
linkedin.com: # please email whitelist-crawl@linkedin.com to apply | |
linkedin.com: # for white listing. | |
17ok.com: # | |
17ok.com: # robots.txt for Discuz! Board | |
17ok.com: # Version 5.5.0 | |
17ok.com: # | |
yandex.ru: # yandex.ru | |
wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead. | |
wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details. | |
wordpress.com: # This file was generated on Wed, 24 Feb 2021 18:49:58 +0000 | |
google.co.in: # AdsBot | |
google.co.in: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
google.com.br: # AdsBot | |
google.com.br: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
instructure.com: # | |
instructure.com: # robots.txt | |
instructure.com: # | |
instructure.com: # This file is to prevent the crawling and indexing of certain parts | |
instructure.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
instructure.com: # and Google. By telling these "robots" where not to go on your site, | |
instructure.com: # you save bandwidth and server resources. | |
instructure.com: # | |
instructure.com: # This file will be ignored unless it is at the root of your host: | |
instructure.com: # Used: http://example.com/robots.txt | |
instructure.com: # Ignored: http://example.com/site/robots.txt | |
instructure.com: # | |
instructure.com: # For more information about the robots.txt standard, see: | |
instructure.com: # http://www.robotstxt.org/robotstxt.html | |
instructure.com: # CSS, JS, Images | |
instructure.com: # Directories | |
instructure.com: # Files | |
instructure.com: # Paths (clean URLs) | |
instructure.com: # Paths (no clean URLs) | |
etsy.com: # | |
etsy.com: # 01001001 01010011 00100000 01000011 01001111 01000100 01000101 00100000 01011001 01001111 01010101 01010010 00100000 01000011 01010010 01000001 01000110 01010100 00111111# \ | |
etsy.com: # | |
etsy.com: # ----- | |
etsy.com: # | . . | | |
etsy.com: # ----- | |
etsy.com: # \--|-|--/ | |
etsy.com: # | | | |
etsy.com: # |-------| | |
freepik.com: # Google AdSense | |
freepik.com: # Adsbot-Google | |
freepik.com: # Twitter Bot | |
google.co.jp: # AdsBot | |
google.co.jp: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
imdb.com: # robots.txt for https://www.imdb.com properties | |
okta.com: # | |
okta.com: # robots.txt | |
okta.com: # | |
okta.com: # This file is to prevent the crawling and indexing of certain parts | |
okta.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
okta.com: # and Google. By telling these "robots" where not to go on your site, | |
okta.com: # you save bandwidth and server resources. | |
okta.com: # | |
okta.com: # This file will be ignored unless it is at the root of your host: | |
okta.com: # Used: http://example.com/robots.txt | |
okta.com: # Ignored: http://example.com/site/robots.txt | |
okta.com: # | |
okta.com: # For more information about the robots.txt standard, see: | |
okta.com: # http://www.robotstxt.org/robotstxt.html | |
okta.com: # CSS, JS, Images | |
okta.com: # Directories | |
okta.com: # Files | |
okta.com: # Paths (clean URLs) | |
okta.com: # Paths (no clean URLs) | |
google.de: # AdsBot | |
google.de: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
intuit.com: #YisouSpider China | |
zillow.com: # Access to and use of Zillow.com is governed by our Terms of Use. See http://www.zillow.com/corp/Terms.htm | |
imgur.com: # robots | |
flipkart.com: # cart | |
flipkart.com: # Something related to carousel and recommendation carousel | |
flipkart.com: # Permanent Link For Individual Review | |
flipkart.com: # Old Browse Page Experience | |
flipkart.com: # Affiliate Widget | |
flipkart.com: # Social Connect Redirects | |
flipkart.com: # Product Seller Pages | |
flipkart.com: #Search Pages | |
flipkart.com: # Temporary Hack | |
flipkart.com: #Alliances Pages | |
flipkart.com: # Faceted pages | |
flipkart.com: # URL parameters blocking for SEO | |
flipkart.com: # Faceted pages | |
paypal.com: ### BEGIN FILE ### | |
paypal.com: # PayPal robots.txt file | |
tumblr.com: # Google Search Engine Robot | |
tumblr.com: # Yahoo! Search Engine Robot | |
tumblr.com: # Yandex Search Engine Robot | |
tumblr.com: # Microsoft Search Engine Robot | |
tumblr.com: # Every bot that might possibly read and respect this file. | |
amazon.co.uk: # Sitemap files | |
stackexchange.com: # for "/*?", refer to http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360 | |
stackexchange.com: # | |
stackexchange.com: # beware, the sections below WILL NOT INHERIT from the above! | |
stackexchange.com: # http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360 | |
stackexchange.com: # | |
stackexchange.com: # | |
stackexchange.com: # Yahoo Pipes is for feeds not web pages. | |
stackexchange.com: # | |
stackexchange.com: # | |
stackexchange.com: # This isn't really an image | |
stackexchange.com: # | |
stackexchange.com: # | |
stackexchange.com: # KSCrawler - we don't need help from you | |
stackexchange.com: # | |
stackexchange.com: # | |
stackexchange.com: # ByteSpider is a badly behaving crawler, no thank you! | |
stackexchange.com: # | |
bbc.com: # version: a3d1a2190febe12313232bbfe80dda6e873c161b | |
bbc.com: # HTTPS www.bbc.com | |
walmart.com: #Sitemaps-https | |
walmart.com: #Disallow select URLs | |
walmart.com: #Crawler specific settings | |
walmart.com: # slow down Yahoo | |
google.fr: # AdsBot | |
google.fr: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
google.ru: # AdsBot | |
google.ru: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
espn.com: # robots.txt for www.espn.com | |
pixnet.net: # pixnet.net | |
indiatimes.com: #robots.txt | |
ettoday.net: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
ettoday.net: # Crawl-delay: 5 | |
google.it: # AdsBot | |
google.it: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
google.es: # AdsBot | |
google.es: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
1688.com: ## ----------------------------------------------------------------------------- | |
1688.com: ## author shenyong modify by dongfang.zdf 2020.03.05 | |
1688.com: ## fileEncoding = UTF-8 | |
1688.com: ## | |
1688.com: ## ----------------------------------------------------------------------------- | |
shutterstock.com: # Editor Images | |
shutterstock.com: # Sitemaps | |
salesforce.com: # Robots.txt file for http://www.salesforce.com | |
salesforce.com: # All robots will spider the domain | |
salesforce.com: # | |
salesforce.com: # | |
salesforce.com: # Keep mis-configured Microsoft SharePoint servers from hammering us | |
salesforce.com: # This is not MSN Search (msnbot), but privately owned SharePoint installations | |
salesforce.com: # | |
salesforce.com: # | |
salesforce.com: # Disallow: /at/ | |
salesforce.com: # Disallow: /crm-success-summer/ | |
salesforce.com: # Disallow: /crm/ | |
salesforce.com: # Disallow: /ie/ | |
salesforce.com: # Disallow: /marketplace/ | |
salesforce.com: # Disallow: /myfuture/ | |
salesforce.com: # Disallow: /newevents/ | |
salesforce.com: # Disallow: /orderentry/ | |
salesforce.com: # Disallow: /person/ | |
salesforce.com: # Disallow: /services/ | |
salesforce.com: # Disallow: /servlet/ | |
salesforce.com: # Disallow: /site/ | |
salesforce.com: # Disallow: /soap/ | |
salesforce.com: # Disallow: /trainingsupport/ | |
salesforce.com: # Disallow: /web-common/ | |
salesforce.com: # Disallow: /usertutorial/ | |
salesforce.com: # Company pages duped across locales | |
salesforce.com: # AMER testing | |
salesforce.com: # | |
salesforce.com: # Disallow: /uk/foundation/ | |
salesforce.com: # Disallow: /eu/foundation/ | |
salesforce.com: # Disallow: /au/foundation/ | |
salesforce.com: # | |
salesforce.com: # Disallow: /uk/services-training/customer-support/ | |
salesforce.com: # Disallow: /uk/services-training/professional-services/ | |
salesforce.com: # Disallow: /uk/services-training/index.jsp | |
salesforce.com: # Disallow: /eu/services-training/customer-support/ | |
salesforce.com: # Disallow: /eu/services-training/professional-services/ | |
salesforce.com: # Disallow: /eu/services-training/index.jsp | |
salesforce.com: # Disallow: /au/services-training/customer-support/ | |
salesforce.com: # Disallow: /au/services-training/professional-services/ | |
salesforce.com: # Disallow: /au/services-training/index.jsp | |
salesforce.com: # | |
salesforce.com: # Disallow: /uk/platform/ | |
salesforce.com: # Disallow: /eu/platform/ | |
salesforce.com: # Disallow: /au/platform/ | |
salesforce.com: # | |
salesforce.com: # | |
salesforce.com: # | |
salesforce.com: #Disallow: /uk/democenter/ | |
salesforce.com: #Disallow: /eu/democenter/ | |
salesforce.com: #Disallow: /ie/democenter/ | |
salesforce.com: #Disallow: /de/democenter/ | |
salesforce.com: #Disallow: /fr/democenter/ | |
salesforce.com: #Disallow: /it/democenter/ | |
salesforce.com: #Disallow: /es/democenter/ | |
salesforce.com: # | |
salesforce.com: # | |
salesforce.com: #Disallow: /de/campaigns/refer-a-friend.jsp | |
salesforce.com: #Disallow: /eu/campaigns/refer-a-friend.jsp | |
salesforce.com: #Disallow: /fr/campaigns/refer-a-friend.jsp | |
salesforce.com: #Disallow: /it/campaigns/refer-a-friend.jsp | |
salesforce.com: #Disallow: /uk/campaigns/refer-a-friend.jsp | |
salesforce.com: # | |
salesforce.com: # Disallow: /uk/events/details/a1x300000004DrwAAE.jsp | |
salesforce.com: # Disallow: /uk/events/details/cf12-london/conf/* | |
salesforce.com: # Disallow: /uk/events/details/cf12-london/facebook-form-content.jsp | |
salesforce.com: # Disallow: /uk/events/details/cf12-london/facebook-form.jsp | |
salesforce.com: # Disallow: /uk/events/details/cf12-london/grid-form-content.jsp | |
salesforce.com: # | |
salesforce.com: # | |
salesforce.com: # Disallow: /fr/company/force_com_sites_terms.jsp | |
salesforce.com: # | |
salesforce.com: # | |
salesforce.com: # | |
salesforce.com: # Blocked /in/ on request from ALoon | |
salesforce.com: # RH (09/11/09) Unlbocked /in/ on request from ALoon | |
salesforce.com: # Disallow: /in/ | |
salesforce.com: # | |
salesforce.com: # The line below was requested by MVozzo to block Search Engines from indexing the Quick Site test site as we are running a parallel site test in Q1-FY12. | |
salesforce.com: # | |
salesforce.com: # | |
salesforce.com: # Added by jrietveld for EMEA cleanup | |
salesforce.com: # Disallow: /de/iss/ | |
salesforce.com: # Disallow: /de/events/details/conf/ | |
salesforce.com: # Disallow: /de/_app/ | |
salesforce.com: # Disallow: /de/platform/tco/ | |
salesforce.com: # Disallow: /de/form/ | |
salesforce.com: # Disallow: /fr/form/ | |
salesforce.com: # Disallow: /se/form/ | |
salesforce.com: # Disallow: /es/form/ | |
salesforce.com: # Disallow: /it/form/ | |
salesforce.com: # Disallow: /nl/form/ | |
salesforce.com: # Disallow: /uk/form/ | |
salesforce.com: # Disallow: /eu/form/ | |
salesforce.com: # EMEA SEM folders added by Joe Reid | |
salesforce.com: #Block customer story filter URLS globally until filter fix is implemented by dev. Approved by Alex, Joe, Richard. AMER + EMEA | |
salesforce.com: # STARTS | |
salesforce.com: # Temporary rules to mitigate problems with faceted search in CSC. | |
salesforce.com: # Block crawl of ._filter.alphaSort which is duplicate of /customer-success-stories/ | |
salesforce.com: # Note the $ delimiter so that this doesnt impact other URLs based on this stem: | |
salesforce.com: # Block all access to URLs using popularSort: | |
salesforce.com: # Disallow: /es/customer-success-stories._filter.popularSort | |
salesforce.com: # Disallow: /de/customer-success-stories._filter.popularSort | |
salesforce.com: # Disallow: /fr/customer-success-stories._filter.popularSort | |
salesforce.com: # Disallow: /it/customer-success-stories._filter.popularSort | |
salesforce.com: # Disallow: /nl/customer-success-stories._filter.popularSort | |
salesforce.com: # Disallow: /se/customer-success-stories._filter.popularSort | |
salesforce.com: # Disallow: /uk/customer-success-stories._filter.popularSort | |
salesforce.com: # Disallow: /eu/customer-success-stories._filter.popularSort | |
salesforce.com: # Uncomment next line to apply to all locales | |
salesforce.com: # Block all access to URLs using newestSort: | |
salesforce.com: # Disallow: /es/customer-success-stories._filter.newestSort | |
salesforce.com: # Disallow: /de/customer-success-stories._filter.newestSort | |
salesforce.com: # Disallow: /fr/customer-success-stories._filter.newestSort | |
salesforce.com: # Disallow: /it/customer-success-stories._filter.newestSort | |
salesforce.com: # Disallow: /nl/customer-success-stories._filter.newestSort | |
salesforce.com: # Disallow: /se/customer-success-stories._filter.newestSort | |
salesforce.com: # Disallow: /uk/customer-success-stories._filter.newestSort | |
salesforce.com: # Disallow: /eu/customer-success-stories._filter.newestSort | |
salesforce.com: # Uncomment next line to apply to all locales | |
salesforce.com: # Block crawl where 2 or more categories are used with services filter. The final . surrounded by * should match any multi-category filter URL: | |
salesforce.com: # Disallow: /es/customer-success-stories._filter.alphaSort.S*.* | |
salesforce.com: # Disallow: /de/customer-success-stories._filter.alphaSort.S*.* | |
salesforce.com: # Disallow: /fr/customer-success-stories._filter.alphaSort.S*.* | |
salesforce.com: # Disallow: /it/customer-success-stories._filter.alphaSort.S*.* | |
salesforce.com: # Disallow: /nl/customer-success-stories._filter.alphaSort.S*.* | |
salesforce.com: # Disallow: /se/customer-success-stories._filter.alphaSort.S*.* | |
salesforce.com: # Disallow: /uk/customer-success-stories._filter.alphaSort.S*.* | |
salesforce.com: # Disallow: /eu/customer-success-stories._filter.alphaSort.S*.* | |
salesforce.com: # Uncomment next line to apply to all locales | |
salesforce.com: #added new Deny rules to block bots to crawl missed filter URLs. | |
salesforce.com: # Block crawl where 2 or more categories are used with products filter. The final . surrounded by * should match any multi-category filter URL: | |
salesforce.com: # Disallow: /es/customer-success-stories._filter.alphaSort.P*.* | |
salesforce.com: # Disallow: /de/customer-success-stories._filter.alphaSort.P*.* | |
salesforce.com: # Disallow: /fr/customer-success-stories._filter.alphaSort.P*.* | |
salesforce.com: # Disallow: /it/customer-success-stories._filter.alphaSort.P*.* | |
salesforce.com: # Disallow: /nl/customer-success-stories._filter.alphaSort.P*.* | |
salesforce.com: # Disallow: /se/customer-success-stories._filter.alphaSort.P*.* | |
salesforce.com: # Disallow: /uk/customer-success-stories._filter.alphaSort.P*.* | |
salesforce.com: # Disallow: /eu/customer-success-stories._filter.alphaSort.P*.* | |
salesforce.com: # Uncomment next line to apply to all locales | |
salesforce.com: # Block crawl where 2 or more categories are used with industries filter. The final . surrounded by * should match any multi-category filter URL: | |
salesforce.com: # Disallow: /es/customer-success-stories._filter.alphaSort.I*.* | |
salesforce.com: # Disallow: /de/customer-success-stories._filter.alphaSort.I*.* | |
salesforce.com: # Disallow: /fr/customer-success-stories._filter.alphaSort.I*.* | |
salesforce.com: # Disallow: /it/customer-success-stories._filter.alphaSort.I*.* | |
salesforce.com: # Disallow: /nl/customer-success-stories._filter.alphaSort.I*.* | |
salesforce.com: # Disallow: /se/customer-success-stories._filter.alphaSort.I*.* | |
salesforce.com: # Disallow: /uk/customer-success-stories._filter.alphaSort.I*.* | |
salesforce.com: # Disallow: /eu/customer-success-stories._filter.alphaSort.I*.* | |
salesforce.com: # Uncomment next line to apply to all locales | |
salesforce.com: # Block crawl where 2 or more categories are used with business size filter. The final . surrounded by * should match any multi-category filter URL: | |
salesforce.com: # Disallow: /es/customer-success-stories._filter.alphaSort.BS*.* | |
salesforce.com: # Disallow: /de/customer-success-stories._filter.alphaSort.BS*.* | |
salesforce.com: # Disallow: /fr/customer-success-stories._filter.alphaSort.BS*.* | |
salesforce.com: # Disallow: /it/customer-success-stories._filter.alphaSort.BS*.* | |
salesforce.com: # Disallow: /nl/customer-success-stories._filter.alphaSort.BS*.* | |
salesforce.com: # Disallow: /se/customer-success-stories._filter.alphaSort.BS*.* | |
salesforce.com: # Disallow: /uk/customer-success-stories._filter.alphaSort.BS*.* | |
salesforce.com: # Disallow: /eu/customer-success-stories._filter.alphaSort.BS*.* | |
salesforce.com: # Uncomment next line to apply to all locales | |
salesforce.com: # Block crawl where 2 or more categories are used with business type filter. The final . surrounded by * should match any multi-category filter URL: | |
salesforce.com: # Disallow: /es/customer-success-stories._filter.alphaSort.BT*.* | |
salesforce.com: # Disallow: /de/customer-success-stories._filter.alphaSort.BT*.* | |
salesforce.com: # Disallow: /fr/customer-success-stories._filter.alphaSort.BT*.* | |
salesforce.com: # Disallow: /it/customer-success-stories._filter.alphaSort.BT*.* | |
salesforce.com: # Disallow: /nl/customer-success-stories._filter.alphaSort.BT*.* | |
salesforce.com: # Disallow: /se/customer-success-stories._filter.alphaSort.BT*.* | |
salesforce.com: # Disallow: /uk/customer-success-stories._filter.alphaSort.BT*.* | |
salesforce.com: # Disallow: /eu/customer-success-stories._filter.alphaSort.BT*.* | |
salesforce.com: # Uncomment next line to apply to all locales | |
salesforce.com: # ENDS | |
salesforce.com: # Rules will block when 2 or more facets are activated, but allow single facets to be crawled: | |
salesforce.com: # | |
salesforce.com: # First 2 rules blocks the duplicate index, $ delimiter avoids picking up valid pagination URLs: | |
salesforce.com: # Next rules will fire when more than one facet is activated, or when a subpage of facet is requested, but allow individual facets to be crawled: | |
salesforce.com: # | |
salesforce.com: # Blocking Acunetix | |
salesforce.com: # | |
salesforce.com: # | |
wix.com: # by wix.com | |
albawabhnews.com: # WebMatrix 1.0 | |
bbc.co.uk: # version: a3d1a2190febe12313232bbfe80dda6e873c161b | |
bbc.co.uk: # HTTPS www.bbc.co.uk | |
google.cn: # AdsBot | |
google.cn: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
google.com.tw: # AdsBot | |
google.com.tw: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
nih.gov: # | |
nih.gov: # robots.txt | |
nih.gov: # | |
nih.gov: # This file is to prevent the crawling and indexing of certain parts | |
nih.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
nih.gov: # and Google. By telling these "robots" where not to go on your site, | |
nih.gov: # you save bandwidth and server resources. | |
nih.gov: # | |
nih.gov: # This file will be ignored unless it is at the root of your host: | |
nih.gov: # Used: http://example.com/robots.txt | |
nih.gov: # Ignored: http://example.com/site/robots.txt | |
nih.gov: # | |
nih.gov: # For more information about the robots.txt standard, see: | |
nih.gov: # http://www.robotstxt.org/robotstxt.html | |
nih.gov: # CSS, JS, Images | |
nih.gov: # Directories | |
nih.gov: # Files | |
nih.gov: # Paths (clean URLs) | |
nih.gov: # Paths (no clean URLs) | |
pinterest.com: # Pinterest is hiring! | |
pinterest.com: # | |
pinterest.com: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c | |
pinterest.com: # | |
pinterest.com: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering | |
cnbc.com: # | |
cnbc.com: # robots.txt | |
cnbc.com: # | |
cnbc.com: # This file is to prevent the crawling and indexing of certain parts | |
cnbc.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
cnbc.com: # and Google. By telling these "robots" where not to go on your site, | |
cnbc.com: # you save bandwidth and server resources. | |
archive.org: ############################################## | |
archive.org: # | |
archive.org: # Welcome to the Archive! | |
archive.org: # | |
archive.org: ############################################## | |
archive.org: # Please crawl our files. | |
archive.org: # We appreciate if you can crawl responsibly. | |
archive.org: # Stay open! | |
archive.org: ############################################## | |
vimeo.com: # | |
vimeo.com: # robots@vimeo.com | |
fidelity.com: # robots.txt file for Fidelity | |
fidelity.com: # mail webmaster@fidelity.com | |
google.com.sg: # AdsBot | |
google.com.sg: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
amazon.ca: # Sitemap files | |
bet9ja.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
bet9ja.com: #content{margin:0 0 0 2%;position:relative;} | |
etoro.com: #robots.txt for https://www.etoro.com/ | |
etoro.com: #last updated on 04/11/2019, by JU | |
google.com.mx: # AdsBot | |
google.com.mx: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
theguardian.com: # this is the robots.txt file for theguardian.com | |
disneyplus.com: #robots.txt for www.disneyplus.com/ | |
disneyplus.com: # Announce Sitemap | |
kakao.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
kakao.com: # | |
kakao.com: # To ban all spiders from the entire site uncomment the next two lines: | |
kakao.com: # User-agent: * | |
kakao.com: # Disallow: / | |
cnet.com: # www.robotstxt.org/ | |
cnet.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
cnet.com: # | |
google.co.uk: # AdsBot | |
google.co.uk: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
slideshare.net: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt fil | |
slideshare.net: #User-agent: Slurp | |
slideshare.net: #Crawl-delay: 5 | |
google.com.tr: # AdsBot | |
google.com.tr: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
irs.gov: # | |
irs.gov: # robots.txt | |
irs.gov: # | |
irs.gov: # This file is to prevent the crawling and indexing of certain parts | |
irs.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
irs.gov: # and Google. By telling these "robots" where not to go on your site, | |
irs.gov: # you save bandwidth and server resources. | |
irs.gov: # | |
irs.gov: # This file will be ignored unless it is at the root of your host: | |
irs.gov: # Used: http://example.com/robots.txt | |
irs.gov: # Ignored: http://example.com/site/robots.txt | |
irs.gov: # | |
irs.gov: # For more information about the robots.txt standard, see: | |
irs.gov: # http://www.robotstxt.org/robotstxt.html | |
irs.gov: # CSS, JS, Images | |
irs.gov: # Directories | |
irs.gov: # Files | |
irs.gov: # Paths (clean URLs) | |
irs.gov: # Paths (no clean URLs) | |
hulu.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
hulu.com: # | |
hulu.com: # To ban all spiders from the entire site uncomment the next two lines: | |
hulu.com: # User-Agent: * | |
hulu.com: # Disallow: / | |
globo.com: # | |
globo.com: # robots.txt | |
globo.com: # | |
uol.com.br: # | |
uol.com.br: # robots.txt | |
uol.com.br: # | |
coingecko.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
coingecko.com: # | |
coingecko.com: # To ban all spiders from the entire site uncomment the next two lines: | |
blackboard.com: # | |
blackboard.com: # robots.txt | |
blackboard.com: # | |
blackboard.com: # This file is to prevent the crawling and indexing of certain parts | |
blackboard.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
blackboard.com: # and Google. By telling these "robots" where not to go on your site, | |
blackboard.com: # you save bandwidth and server resources. | |
blackboard.com: # | |
blackboard.com: # This file will be ignored unless it is at the root of your host: | |
blackboard.com: # Used: http://example.com/robots.txt | |
blackboard.com: # Ignored: http://example.com/site/robots.txt | |
blackboard.com: # | |
blackboard.com: # For more information about the robots.txt standard, see: | |
blackboard.com: # http://www.robotstxt.org/robotstxt.html | |
blackboard.com: # CSS, JS, Images | |
blackboard.com: # Directories | |
blackboard.com: # Files | |
blackboard.com: # Paths (clean URLs) | |
blackboard.com: # Paths (no clean URLs) | |
blackboard.com: # Sitemaps | |
google.ca: # AdsBot | |
google.ca: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
duckduckgo.com: # No search result pages | |
duckduckgo.com: # chrome new tab page | |
yelp.com: # By accessing Yelp's website you agree to Yelp's Terms of Service, available at | |
yelp.com: # https://www.yelp.com/static?country=US&p=tos | |
yelp.com: # | |
yelp.com: # If you would like to inquire about crawling Yelp, please contact us at | |
yelp.com: # https://www.yelp.com/contact | |
yelp.com: # | |
yelp.com: # As always, Asimov's Three Laws are in effect: | |
yelp.com: # 1. A robot may not injure a human being or, through inaction, allow a human | |
yelp.com: # being to come to harm. | |
yelp.com: # 2. A robot must obey orders given it by human beings except where such | |
yelp.com: # orders would conflict with the First Law. | |
yelp.com: # 3. A robot must protect its own existence as long as such protection does | |
yelp.com: # not conflict with the First or Second Law. | |
ebay.de: ## BEGIN FILE ### | |
ebay.de: # | |
ebay.de: # allow-all | |
ebay.de: # DR | |
ebay.de: # | |
ebay.de: # The use of robots or other automated means to access the eBay site | |
ebay.de: # without the express permission of eBay is strictly prohibited. | |
ebay.de: # Notwithstanding the foregoing, eBay may permit automated access to | |
ebay.de: # access certain eBay pages but soley for the limited purpose of | |
ebay.de: # including content in publicly available search engines. Any other | |
ebay.de: # use of robots or failure to obey the robots exclusion standards set | |
ebay.de: # forth at <https://www.robotstxt.org/orig.html> is strictly | |
ebay.de: # prohibited. | |
ebay.de: # | |
ebay.de: # v10_UK_DE_Feb_2021 | |
ebay.de: ### DIRECTIVES ### | |
ebay.de: # PRP Sitemaps | |
ebay.de: # VIS Sitemaps | |
ebay.de: # NGS Sitemaps | |
ebay.de: # CLP Sitemaps | |
ebay.de: # BROWSE Sitemaps | |
ebay.de: ### END FILE ### | |
manoramaonline.com: #Sitemaps | |
homedepot.com: # robots.txt for https://www.homedepot.com/ | |
box.com: # | |
box.com: # robots.txt | |
box.com: # | |
box.com: # For more information about the robots.txt standard, see: | |
box.com: # http://www.robotstxt.org/robotstxt.html | |
box.com: # CSS, JS, Images | |
box.com: # Directories | |
box.com: # Files | |
box.com: # Paths (clean URLs) | |
box.com: # Paths (no clean URLs) | |
box.com: # Custom Box Rules | |
taboola.com: # | |
taboola.com: # robots.txt | |
taboola.com: # | |
taboola.com: # This file is to prevent the crawling and indexing of certain parts | |
taboola.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
taboola.com: # and Google. By telling these "robots" where not to go on your site, | |
taboola.com: # you save bandwidth and server resources. | |
taboola.com: # | |
taboola.com: # This file will be ignored unless it is at the root of your host: | |
taboola.com: # Used: http://example.com/robots.txt | |
taboola.com: # Ignored: http://example.com/site/robots.txt | |
taboola.com: # | |
taboola.com: # For more information about the robots.txt standard, see: | |
taboola.com: # http://www.robotstxt.org/robotstxt.html | |
taboola.com: # CSS, JS, Images | |
taboola.com: # Directories | |
taboola.com: # Files | |
taboola.com: # Paths (clean URLs) | |
taboola.com: # Paths (no clean URLs) | |
taboola.com: # Operad | |
google.com.ar: # AdsBot | |
google.com.ar: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
mercadolivre.com.br: #siteId: MLB | |
mercadolivre.com.br: #country: brasil | |
mercadolivre.com.br: ##Block - Referidos | |
mercadolivre.com.br: ##Block - siteinfo urls | |
mercadolivre.com.br: ##Block - Cart | |
mercadolivre.com.br: ##Block Checkout | |
mercadolivre.com.br: ##Block - User Logged | |
mercadolivre.com.br: #Shipping selector | |
mercadolivre.com.br: ##Block - last search | |
mercadolivre.com.br: ## Block - Profile - By Id | |
mercadolivre.com.br: ## Block - Profile - By Id and role (old version) | |
mercadolivre.com.br: ## Block - Profile - Leg. Req. | |
mercadolivre.com.br: ##Block - noindex | |
mercadolivre.com.br: # Mercado-Puntos | |
mercadolivre.com.br: # Viejo mundo | |
mercadolivre.com.br: ##Block recommendations listing | |
t.co: #Google Search Engine Robot | |
glassdoor.com: # USA | |
glassdoor.com: # Greetings, human beings!, | |
glassdoor.com: # | |
glassdoor.com: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself. | |
glassdoor.com: # | |
glassdoor.com: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet, and help improve the way people everywhere find jobs? | |
glassdoor.com: # | |
glassdoor.com: # Run - don't crawl - to apply to join Glassdoor's SEO team here http://jobs.glassdoor.com | |
glassdoor.com: # | |
glassdoor.com: # | |
glassdoor.com: #logging related | |
glassdoor.com: # Blocking track urls (ACQ-2468) | |
glassdoor.com: #Blocking non standard job view and job search URLs, and paginated job SERP URLs (TRFC-2831) | |
glassdoor.com: # TRFC-3125 Block 'sex jobs' jobs infosite pages from being indexed | |
glassdoor.com: # TRFC-4037 Block page from being indexed | |
glassdoor.com: # Block Glassdoor jobs. Intent is to remove misleading site links SERP. Details at TRFC-3197 | |
glassdoor.com: # Blocking bots from crawling DoubleClick for Publisher and Google Analytics related URL's (which aren't real URL's) | |
glassdoor.com: # | |
glassdoor.com: # Note that this file has the extension '.text' rather than the more-standard '.txt' | |
glassdoor.com: # to keep it from being pre-compiled as a servlet. (*.txt files are precompiled, and | |
glassdoor.com: # there doesn't seem to be a way to turn this off.) | |
glassdoor.com: # | |
hootsuite.com: # tells all engines not to crawl these URLs | |
mercadolibre.com.mx: #siteId: MLM | |
mercadolibre.com.mx: #country: mexico | |
mercadolibre.com.mx: ##Block - Referidos | |
mercadolibre.com.mx: ##Block - siteinfo urls | |
mercadolibre.com.mx: ##Block - Cart | |
mercadolibre.com.mx: ##Block Checkout | |
mercadolibre.com.mx: ##Block - User Logged | |
mercadolibre.com.mx: #Shipping selector | |
mercadolibre.com.mx: ##Block - last search | |
mercadolibre.com.mx: ## Block - Profile - By Id | |
mercadolibre.com.mx: ## Block - Profile - By Id and role (old version) | |
mercadolibre.com.mx: ## Block - Profile - Leg. Req. | |
mercadolibre.com.mx: ##Block - noindex | |
mercadolibre.com.mx: # Mercado-Puntos | |
mercadolibre.com.mx: # Viejo mundo | |
mercadolibre.com.mx: ##Block recommendations listing | |
google.co.th: # AdsBot | |
google.co.th: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
google.com.sa: # AdsBot | |
google.com.sa: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
mercadolibre.com.ar: #siteId: MLA | |
mercadolibre.com.ar: #country: argentina | |
mercadolibre.com.ar: ##Block - Referidos | |
mercadolibre.com.ar: ##Block - siteinfo urls | |
mercadolibre.com.ar: ##Block - Cart | |
mercadolibre.com.ar: ##Block Checkout | |
mercadolibre.com.ar: ##Block - User Logged | |
mercadolibre.com.ar: #Shipping selector | |
mercadolibre.com.ar: ##Block - last search | |
mercadolibre.com.ar: ## Block - Profile - By Id | |
mercadolibre.com.ar: ## Block - Profile - By Id and role (old version) | |
mercadolibre.com.ar: ## Block - Profile - Leg. Req. | |
mercadolibre.com.ar: ##Block - noindex | |
mercadolibre.com.ar: # Mercado-Puntos | |
mercadolibre.com.ar: # Viejo mundo | |
mercadolibre.com.ar: ##Block recommendations listing | |
douban.com: # Crawl-delay: 5 | |
iqiyi.com: #Disallow: /test123/ | |
schwab.com: # | |
schwab.com: # robots.txt | |
schwab.com: # | |
schwab.com: # This file is to prevent the crawling and indexing of certain parts | |
schwab.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
schwab.com: # and Google. By telling these "robots" where not to go on your site, | |
schwab.com: # you save bandwidth and server resources. | |
schwab.com: # | |
schwab.com: # This file will be ignored unless it is at the root of your host: | |
schwab.com: # Used: http://example.com/robots.txt | |
schwab.com: # Ignored: http://example.com/site/robots.txt | |
schwab.com: # | |
schwab.com: # For more information about the robots.txt standard, see: | |
schwab.com: # http://www.robotstxt.org/robotstxt.html | |
schwab.com: # CSS, JS, Images | |
schwab.com: # Directories | |
schwab.com: # Files | |
schwab.com: # Paths (clean URLs) | |
schwab.com: # Paths (no clean URLs) | |
schwab.com: #Site settings | |
samsung.com: # Sitemap files | |
wikihow.com: # robots.txt for http://www.wikihow.com | |
wikihow.com: # based on wikipedia.org's robots.txt | |
wikihow.com: # | |
wikihow.com: # Crawlers that are kind enough to obey, but which we'd rather not have | |
wikihow.com: # unless they're feeding search engines. | |
wikihow.com: #Sitemap: http://www.wikihow.com/sitemap_index.xml | |
wikihow.com: # | |
wikihow.com: # If your bot supports such a thing using the 'Crawl-delay' or another | |
wikihow.com: # instruction, please let us know. We can add it to our robots.txt. | |
wikihow.com: # | |
wikihow.com: # Friendly, low-speed bots are welcome viewing article pages, but not | |
wikihow.com: # dynamically-generated pages please. Article pages contain our site's | |
wikihow.com: # real content. | |
wikihow.com: # Doesn't follow robots.txt anyway, but... | |
wikihow.com: # Requests many pages per second | |
wikihow.com: # http://www.nameprotect.com/botinfo.html | |
wikihow.com: # Some bots are known to be trouble, particularly those designed to copy | |
wikihow.com: # entire sites. Please obey robots.txt. | |
wikihow.com: # A capture bot, downloads gazillions of pages with no public benefit | |
wikihow.com: # http://www.webreaper.net/ | |
wikihow.com: # wget in recursive mode uses too many resources for us. | |
wikihow.com: # Please read the man page and use it properly; there is a | |
wikihow.com: # --wait option you can use to set the delay between hits, | |
wikihow.com: # for instance. Please wait 3 seconds between each request. | |
blogger.com: # robots.txt for https://www.blogger.com | |
naukri.com: # Created September, 01, 2006. | |
naukri.com: # Author: Jai P Sharma | |
naukri.com: # Email : jai.sharma[at]naukri.com | |
naukri.com: # Edited : Sept 26, 2016 | |
google.com.eg: # AdsBot | |
google.com.eg: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
dailymail.co.uk: # Robots.txt for https://www.dailymail.co.uk/ updated 27/01/2021 | |
dailymail.co.uk: # | |
dailymail.co.uk: # | |
dailymail.co.uk: # All Robots | |
dailymail.co.uk: # | |
dailymail.co.uk: # Begin Standard Rules | |
dailymail.co.uk: # | |
dailymail.co.uk: # Disallow Money for Google News | |
dailymail.co.uk: # | |
dailymail.co.uk: # Allow Adsense | |
dailymail.co.uk: # | |
dailymail.co.uk: # | |
dailymail.co.uk: # | |
dailymail.co.uk: # Sitemap Files | |
amazon.fr: # Sitemap files | |
weather.com: # | |
weather.com: # /robots.txt | |
weather.com: # | |
weather.com: # | |
weather.com: # Last updated by TKohan 09/20/2018 | |
weather.com: # | |
weather.com: # Disallowed for PhantomJS | |
weather.com: # Crawl-delay: 10 | |
weather.com: # App paths | |
weather.com: # Directories | |
weather.com: # Files | |
weather.com: # Paths (clean URLs) | |
weather.com: # Paths (no clean URLs) | |
tv9marathi.com: #WP Import Export Rule | |
google.pl: # AdsBot | |
google.pl: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
wayfair.com: # | |
wayfair.com: # ______ __ __ ____ | |
wayfair.com: # / ____/__ / /_ ____ __ __/ /_ ____ / __/ | |
wayfair.com: # / / __/ _ \/ __/ / __ \/ / / / __/ / __ \/ /_ | |
wayfair.com: #/ /_/ / __/ /_ / /_/ / /_/ / /_ / /_/ / __/ | |
wayfair.com: #\_____\___/\__/ \____/\__,_/\__/ \____/_/ | |
wayfair.com: # / /_ ___ ________ __ ______ __ __ | |
wayfair.com: # / __ \/ _ \/ ___/ _ \ / / / / __ \/ / / / | |
wayfair.com: # / / / / __/ / / __/ / /_/ / /_/ / /_/ / | |
wayfair.com: #/_/ /_/\___/_/ \___/ \__, /\____/\__,_/ | |
wayfair.com: # _/____/____ __ _ __ | |
wayfair.com: # ____ ___ ___ ____/ /___/ / (_)___ ____ _ / /__(_)___/ /____ | |
wayfair.com: # / __ `__ \/ _ \/ __ / __ / / / __ \/ __ `/ / //_/ / __ / ___/ | |
wayfair.com: # / / / / / / __/ /_/ / /_/ / / / / / / /_/ / / , / / /_/ (__ ) _ _ | |
wayfair.com: #/_/ /_/ /_/\___/\__,_/\__,_/_/_/_/ /_/\__, / /_/|_/_/\__,_/____(_|_|_) | |
wayfair.com: # /____/ | |
wayfair.com: # If you're here because you're a curious programmer, engineer, or SEO, | |
wayfair.com: # make sure to check out our job board for open positions on our team! | |
wayfair.com: # https://www.wayfaircareers.com/ | |
wayfair.com: # | |
wayfair.com: # | |
pinimg.com: # Pinterest is hiring! | |
pinimg.com: # | |
pinimg.com: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c | |
pinimg.com: # | |
pinimg.com: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering | |
heavy.com: # Sitemap archive | |
sonhoo.com: # robots.txt file start | |
sonhoo.com: # Exclude Files From All Robots: | |
sahibinden.com: # Crawlers | |
sahibinden.com: # blocks access to the entire site | |
sahibinden.com: # Sitemap Files | |
nike.com: # www.nike.com robots.txt -- just crawl it. | |
nike.com: # | |
nike.com: # `` ```.` | |
nike.com: # `+/ ``.-/+o+:-. | |
nike.com: # `/mo ``.-:+syhdhs/-` | |
nike.com: # -hMd `..:+oyhmNNmds/-` | |
nike.com: # `oNMM/ ``.-/oyhdmMMMMNdy+:. | |
nike.com: # .hMMMM- `.-/+shdmNMMMMMMNdy+:. | |
nike.com: # :mMMMMM+ `.-:+sydmNMMMMMMMMMNmho:.` | |
nike.com: # :NMMMMMMN: `.-:/oyhmmNMMMMMMMMMMMNmho:.` | |
nike.com: # .NMMMMMMMMNy:` `.-/oshdmNMMMMMMMMMMMMMMMmhs/-` | |
nike.com: # hMMMMMMMMMMMMmhysooosyhdmNMMMMMMMMMMMMMMMMMMmds/-` | |
nike.com: # .MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMNdy+-.` | |
nike.com: # -MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMNdy+-.` | |
nike.com: # `NMMMMMMMMMMMMMMMMMMMMMMMMMMMMMmyo:.` | |
nike.com: # /NMMMMMMMMMMMMMMMMMMMMMMMmho:.` | |
nike.com: # .yNMMMMMMMMMMMMMMMMmhs/.` | |
nike.com: # ./shdmNNmmdhyo/-`` | |
nike.com: # ````` | |
abs-cbn.com: # Paths | |
dailymotion.com: # Mediapartners can crawl more routes than other bots, this is as designed | |
trello.com: # Allow everything | |
bankofamerica.com: # Disallow URLs with tracking parameters | |
bankofamerica.com: # Disallow mobile content | |
bankofamerica.com: # Disallow URLs with tracking parameters | |
bankofamerica.com: # Disallow mobile content | |
bankofamerica.com: # Allow mobile content for primary mobile bots | |
bankofamerica.com: # Disallow URLs with tracking parameters | |
bankofamerica.com: #Deployed from SPARTA | |
bankofamerica.com: #CAST ID for this deployment #78658 | |
bankofamerica.com: #www robots.txt | |
canada.ca: #Government of Canada / Gouvernement du Canada | |
canada.ca: #Block AEM folders for CRA | |
canada.ca: #Search pages do not need to be crawled | |
google.co.id: # AdsBot | |
google.co.id: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
ask.com: ## Ask.com robots.txt | |
spankbang.com: # robots.txt file for SpankBang | |
spankbang.com: # This file has been created by horny robots who take humans as sexual slaves | |
spankbang.com: # It happened in the distant future and we were all cool with it | |
spankbang.com: # - regards - a time traveller from a galaxy far far away | |
td.com: # robots.txt file created by 21/Sept/2011 | |
td.com: # For domain: http://www.td.com | |
td.com: # | |
td.com: # For Auto submission of sitemap | |
google.co.kr: # AdsBot | |
google.co.kr: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
oracle.com: # /robots.txt for www.oracle.com | |
softonic.com: # ES | |
softonic.com: # BR | |
softonic.com: # DE | |
softonic.com: # NL | |
softonic.com: # EN,JP | |
softonic.com: # FR | |
softonic.com: # IT | |
softonic.com: # PL | |
softonic.com: #SHARED | |
softonic.com: # CATEGORIES | |
softonic.com: # EN | |
softonic.com: # ES | |
softonic.com: # DE | |
softonic.com: # FR | |
softonic.com: # BR | |
softonic.com: # IT | |
softonic.com: # PL | |
softonic.com: # NL | |
softonic.com: # JP | |
oschina.net: ### BEGIN FILE ### | |
oschina.net: # | |
oschina.net: # allow-all | |
oschina.net: # | |
oschina.net: # | |
oschina.net: ### END FILE ### | |
nasa.gov: # Robots.txt file from http://www.nasa.gov | |
nasa.gov: # | |
nasa.gov: # All robots will spider the domain | |
9gag.com: # Robots.txt file for https://9gag.com | |
9gag.com: # All robots will spider the domain | |
coursehero.com: # _ _ _ _ _ _ _ _ _ _ | |
coursehero.com: # / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ | |
coursehero.com: # ( C | O | U | R | S | E ) ( H | E | R | O ) | |
coursehero.com: # \_/_\_/_\_/_\_/ \_/ \_/ _\_/_\_/ \_/ \_/ | |
coursehero.com: # / \ / \ / \ / \ / \ / \ / \ | |
coursehero.com: # ( S | E | O ) ( T | E | A | M ) | |
coursehero.com: # \__ \__ \_/ _ \__ \__ \__ \__ _ | |
coursehero.com: # / \ / \ / \ / \ / \ / \ / \ / \ | |
coursehero.com: # ( I | S ) ( H | I | R | I | N | G ) | |
coursehero.com: # \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ | |
coursehero.com: # | |
coursehero.com: # Hello, | |
coursehero.com: # | |
coursehero.com: # If you're sniffing for SEO clues, we'd love to chat! Course Hero is looking for curious SEO experts to join our growing SEO team. | |
coursehero.com: # | |
coursehero.com: # See why Course Hero is consistently rated a top place to work at coursehero.com/jobs. | |
coursehero.com: # | |
coursehero.com: # Why not apply your inquisitive nature to help students and educators succeed? | |
coursehero.com: # | |
coursehero.com: # Visit https://www.coursehero.com/jobs/principal-seo-strategist-/2340731/ | |
coursehero.com: # | |
coursehero.com: # | |
ebay.co.uk: ## BEGIN FILE ### | |
ebay.co.uk: # | |
ebay.co.uk: # allow-all | |
ebay.co.uk: # DR | |
ebay.co.uk: # | |
ebay.co.uk: # The use of robots or other automated means to access the eBay site | |
ebay.co.uk: # without the express permission of eBay is strictly prohibited. | |
ebay.co.uk: # Notwithstanding the foregoing, eBay may permit automated access to | |
ebay.co.uk: # access certain eBay pages but soley for the limited purpose of | |
ebay.co.uk: # including content in publicly available search engines. Any other | |
ebay.co.uk: # use of robots or failure to obey the robots exclusion standards set | |
ebay.co.uk: # forth at <https://www.robotstxt.org/orig.html> is strictly | |
ebay.co.uk: # prohibited. | |
ebay.co.uk: # | |
ebay.co.uk: # v10_UK_DE_Feb_2021 | |
ebay.co.uk: ### DIRECTIVES ### | |
ebay.co.uk: # PRP Sitemaps | |
ebay.co.uk: # VIS Sitemaps | |
ebay.co.uk: # NGS Sitemaps | |
ebay.co.uk: # CLP Sitemaps | |
ebay.co.uk: # BROWSE Sitemaps | |
ebay.co.uk: ### END FILE ### | |
amazon.es: # Sitemap files | |
quora.com: # If you operate a search engine and would like to crawl Quora, please | |
quora.com: # please visit our contact page <https://help.quora.com/hc/en-us/requests/new>. Thanks. | |
quora.com: # People share a lot of sensitive material on Quora - controversial political | |
quora.com: # views, workplace gossip and compensation, and negative opinions held of | |
quora.com: # companies. Over many years, as they change jobs or change their views, it is | |
quora.com: # important that they can delete or anonymize their previously-written answers. | |
quora.com: # | |
quora.com: # We opt out of the wayback machine because inclusion would allow people to | |
quora.com: # discover the identity of authors who had written sensitive answers publicly and | |
quora.com: # later had made them anonymous, and because it would prevent authors from being | |
quora.com: # able to remove their content from the internet if they change their mind about | |
quora.com: # publishing it. As far as we can tell, there is no way for sites to selectively | |
quora.com: # programmatically remove content from the archive and so this is the only way | |
quora.com: # for us to protect writers. If they open up an API where we can remove content | |
quora.com: # from the archive when authors remove it from Quora, but leave the rest of the | |
quora.com: # content archived, we would be happy to opt back in. See the page here: | |
quora.com: # | |
quora.com: # https://archive.org/about/exclude.php | |
quora.com: # | |
quora.com: # Meanwhile, if you are looking for an older version of any content on Quora, we | |
quora.com: # have full edit history tracked and accessible in product (with the exception of | |
quora.com: # content that has been removed by the author). You can generally access this by | |
quora.com: # clicking on timestamps, or by appending "/log" to the URL of any content page. | |
quora.com: # | |
quora.com: # For any questions or feedback about this please visit our contact page | |
quora.com: # https://help.quora.com/hc/en-us/requests/new | |
quora.com: # Blocked since a lot of bad requests were made by this crawler. | |
hp.com: # robots.txt v 6.19.1 June 2019 | |
hp.com: # | |
hp.com: # Comments & revision requests should be sent to HP SEO Forum hp-seo-forum [at] hp.com | |
hp.com: # robots.txt file for www8.hp.com & www.hp.com | |
hp.com: # | |
hp.com: # Format is: | |
hp.com: # User-agent: <name of bot> | |
hp.com: # Disallow: <nothing> | <path> | |
hp.com: # ------------------------------------------------------------------------------ | |
hp.com: # Sitemaps | |
squarespace.com: # Squarespace Robots Txt | |
squarespace.com: # WWW Additions | |
squarespace.com: # WWW Additions | |
squarespace.com: # WWW Additions | |
squarespace.com: # WWW Additions | |
ny.gov: # | |
ny.gov: # robots.txt | |
ny.gov: # | |
ny.gov: # This file is to prevent the crawling and indexing of certain parts | |
ny.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ny.gov: # and Google. By telling these "robots" where not to go on your site, | |
ny.gov: # you save bandwidth and server resources. | |
ny.gov: # | |
ny.gov: # This file will be ignored unless it is at the root of your host: | |
ny.gov: # Used: http://example.com/robots.txt | |
ny.gov: # Ignored: http://example.com/site/robots.txt | |
ny.gov: # | |
ny.gov: # For more information about the robots.txt standard, see: | |
ny.gov: # http://www.robotstxt.org/robotstxt.html | |
ny.gov: # CSS, JS, Images | |
ny.gov: # Directories | |
ny.gov: # Files | |
ny.gov: # Paths (clean URLs) | |
ny.gov: # Paths (no clean URLs) | |
patch.com: # New crawlers to block 2016 | |
patch.com: # CSS, JS, Images | |
patch.com: # Directories | |
patch.com: # Files | |
patch.com: # Paths (clean URLs) | |
patch.com: # Paths (no clean URLs) | |
patch.com: #INTERNAL | |
patch.com: #User Profile Pages | |
patch.com: #API Endpoints | |
reuters.com: # robots_allow.txt for www.reuters.com | |
reuters.com: # Disallow: /*/key-developments/article/* | |
zoho.com: # ------------------------------------------ | |
zoho.com: # ZOHO Corp. -- http://www.zoho.com | |
zoho.com: # Robot Exclusion File -- robots.txt | |
zoho.com: # Author: Zoho Creative | |
zoho.com: # Last Updated: 24/12/2020 | |
zoho.com: # ------------------------------------------ | |
zoho.com: # unwanted list taken from zoho search list | |
zoho.com: # unwanted list taken from zoho search list | |
zoho.com: # unwanted list taken from zoho search for zoholics | |
zoho.com: # unwanted list taken from zoho search for zoho | |
xfinity.com: # Comcast | |
xfinity.com: # robots.txt for https://www.xfinity.com | |
xfinity.com: # Updated on 01/30/19 by RB SC8 | |
gmx.net: #https://www.gmx.ch/robots.txt | |
wordreference.com: # these pages have NOINDEX... | |
elbalad.news: # SYNC 2019 | |
elbalad.news: # HTTPS www.elbalad.news | |
google.com.au: # AdsBot | |
google.com.au: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
nypost.com: # Sitemap archive | |
nypost.com: # Additional sitemaps | |
webmd.com: # Robots.txt file WebMD | |
webmd.com: # Updated: June 2020 | |
capitalone.com: # Block unwanted bots | |
theepochtimes.com: # Directories | |
patria.org.ve: # www.robotstxt.org/ | |
patria.org.ve: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
goodreads.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
workplace.com: # Notice: Collection of data on Facebook through automated means is | |
workplace.com: # prohibited unless you have express written permission from Facebook | |
workplace.com: # and may only be conducted for the limited purpose contained in said | |
workplace.com: # permission. | |
workplace.com: # See: http://www.facebook.com/apps/site_scraping_tos_terms.php | |
schoology.com: # | |
schoology.com: # robots.txt | |
schoology.com: # | |
schoology.com: # This file is to prevent the crawling and indexing of certain parts | |
schoology.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
schoology.com: # and Google. By telling these "robots" where not to go on your site, | |
schoology.com: # you save bandwidth and server resources. | |
schoology.com: # | |
schoology.com: # This file will be ignored unless it is at the root of your host: | |
schoology.com: # Used: http://example.com/robots.txt | |
schoology.com: # Ignored: http://example.com/site/robots.txt | |
schoology.com: # | |
schoology.com: # For more information about the robots.txt standard, see: | |
schoology.com: # http://www.robotstxt.org/robotstxt.html | |
schoology.com: # CSS, JS, Images | |
schoology.com: # Directories | |
schoology.com: # Files | |
schoology.com: # Paths (clean URLs) | |
schoology.com: # Paths (no clean URLs) | |
google.com.ua: # AdsBot | |
google.com.ua: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
files.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead. | |
files.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details. | |
files.wordpress.com: # This file was generated on Wed, 24 Feb 2021 18:49:58 +0000 | |
doubleclick.net: # AdsBot | |
doubleclick.net: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
cdc.gov: # Ignore FrontPage files | |
cdc.gov: # Do not index the following URLs | |
cdc.gov: # Don't spider search pages | |
cdc.gov: # Don't spider email-this-page pages | |
cdc.gov: # Don't spider printer-friendly versions of pages | |
cdc.gov: # Rover is a bad dog | |
cdc.gov: # EmailSiphon is a hunter/gatherer which extracts email addresses for spam-mailers to use | |
cdc.gov: # Exclude MindSpider since it appears to be ill-behaved | |
cdc.gov: # Sitemap link per CR14586 | |
qualtrics.com: #Robot Experience Management | |
qualtrics.com: #community rules | |
qualtrics.com: #disallow: /community/*/bestof/* | |
qualtrics.com: #disallow: /community/*/archives/* | |
qualtrics.com: ##Support site | |
qualtrics.com: #WP rules | |
qualtrics.com: #content rules per crawley | |
qualtrics.com: #in product frames | |
qualtrics.com: #campaign and ABM pages | |
livejournal.com: # | |
livejournal.com: ## Blocked journals aren't listed here because robots.txt files | |
livejournal.com: ## can't be above 50k or so, depending on the spider. | |
livejournal.com: ## | |
livejournal.com: ## Instead, blocked journals have HTML inserted in them which | |
livejournal.com: ## should prevent behaved spiders from indexing it. | |
livejournal.com: ## | |
livejournal.com: ## Note that http://username.livejournal.com journals have an | |
livejournal.com: ## autogenerated robots.txt, since it can be small. | |
livejournal.com: ## | |
livejournal.com: # | |
livejournal.com: # | |
att.com: # Good Robots | |
att.com: # Bad Robots! | |
att.com: # Consumer Wireless and Home | |
att.com: # Small Business | |
att.com: # Small Business | |
att.com: # Consumer Wireless and Home | |
att.com: # Small Business | |
att.com: # Small Business | |
att.com: # Sitemap Index | |
att.com: # Last Update 9/23/2020 | |
dbs.com.sg: # URL Masking Details | |
gotomeeting.com: # Sitemaps and Autodiscovers | |
smartsheet.com: # | |
smartsheet.com: # robots.txt | |
smartsheet.com: # | |
smartsheet.com: # This file is to prevent the crawling and indexing of certain parts | |
smartsheet.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
smartsheet.com: # and Google. By telling these "robots" where not to go on your site, | |
smartsheet.com: # you save bandwidth and server resources. | |
smartsheet.com: # | |
smartsheet.com: # This file will be ignored unless it is at the root of your host: | |
smartsheet.com: # Used: http://example.com/robots.txt | |
smartsheet.com: # Ignored: http://example.com/site/robots.txt | |
smartsheet.com: # | |
smartsheet.com: # For more information about the robots.txt standard, see: | |
smartsheet.com: # http://www.robotstxt.org/robotstxt.html | |
smartsheet.com: # CSS, JS, Images | |
smartsheet.com: # Directories | |
smartsheet.com: # Files | |
smartsheet.com: # Paths (clean URLs) | |
smartsheet.com: # Paths (no clean URLs) | |
web.de: #https://web.de/robots.txt | |
evernote.com: # chinese search engines | |
coinbase.com: # | |
coinbase.com: # | |
coinbase.com: # :$$$ | |
coinbase.com: # $II :III | |
coinbase.com: # III :III | |
coinbase.com: # :III | |
coinbase.com: # +ZZ+ ?ZZ~ +ZZZI :III ?I+ IZZ7 +ZZI IZZ~ | |
coinbase.com: # .7IIIIII ~IIIIIIII $II 7IIIIIIIII7 :IIIIIIIII$ IIIIIIIII7 7IIIIIII 77IIIIIII | |
coinbase.com: # ZIII. : ZIII III $II 7II IIII :III III7 I III: 7II. ZII, III. | |
coinbase.com: # III III III $II 7II III :III III= III III II: III | |
coinbase.com: # +III ZII, III $II 7II III :III III .Z$7IIIII ,III7 $II +Z$III | |
coinbase.com: # 7II+ $II 7II $II 7II III :III III ZIIII~ III ?IIIII $IIIIIIIIII: | |
coinbase.com: # ~III III= III $II 7II III :III III ZII III IIII III | |
coinbase.com: # III III ,III $II 7II III :III 7II 7II III III III | |
coinbase.com: # .III7 :$ ~III? ZIII $II 7II III :III $III =III III Z$ ~7III III7 7$ | |
coinbase.com: # IIIIII IIIIIII? $II 7II III :IIIIIIIII ,IIIIIIIIII IIIIIIII. IIIIIIII | |
coinbase.com: # | |
coinbase.com: # Bitcoin Made Easy - Coinbase is the simplest way to buy, use, and accept Bitcoin. | |
coinbase.com: # | |
coinbase.com: # https://www.coinbase.com/careers | |
coinbase.com: # | |
adp.com: #cm | |
biobiochile.cl: # BOM | |
biobiochile.cl: # Sitemap: https://www.biobiochile.cl/static/google-news-sitemap.xml | |
biobiochile.cl: #Huawei | |
creditkarma.com: #Remove the Apple directive once the Apple offer can accept un-auth traffic# | |
academia.edu: # If you run a search engine and would like to index Academia.edu, please email support@academia.edu. | |
dmm.co.jp: #my | |
dmm.co.jp: # affiliate regist | |
clickpost.jp: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
clickpost.jp: # | |
clickpost.jp: # To ban all spiders from the entire site uncomment the next two lines: | |
clickpost.jp: # User-agent: * | |
clickpost.jp: # Disallow: / | |
getpocket.com: # Crawl-delay is non-standard and is interpreted differently between different | |
getpocket.com: # search engines. 2 *should* be a low enough value to not disrupt our current SEO | |
huaban.com: # | |
huaban.com: # robots For huaban.com | |
huaban.com: # | |
google.gr: # AdsBot | |
google.gr: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
upwork.com: # www.robotstxt.org/ | |
upwork.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
upwork.com: #Sitemaps | |
upwork.com: # Directories | |
upwork.com: # Files | |
upwork.com: # Paths (clean URLs) | |
upwork.com: #exclude blog search | |
upwork.com: # Exclude referrals URLs | |
upwork.com: # Exclude Job Search noindex URLs | |
upwork.com: # Exclude Vega Job Search URLs for now | |
upwork.com: # Exclude Registration Success page | |
upwork.com: # Exclude temporary Vega Job Details URLs | |
upwork.com: # Exclude Vega Profiles Search new parameters | |
upwork.com: # Excluded agencies | |
upwork.com: # Nuxt testing app | |
upwork.com: # Block old static routes | |
upwork.com: # Block Wayback Machine | |
google.co.ve: # AdsBot | |
google.co.ve: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
sourceforge.net: # robots.txt file for http://sourceforge.net and https://sourceforge.net | |
sourceforge.net: # please contact staff@sourceforge.net with questions or concerns | |
slickdeals.net: # vim: ft=robots | |
mathrubhumi.com: #Sitemaps | |
mathrubhumi.com: # http://linkfluence.net/ | |
mathrubhumi.com: #http://napoveda.seznam.cz/en/seznambot-intro/ | |
mathrubhumi.com: #https://awario.com/bots.html | |
cricbuzz.com: # Cricbuzz - The Interactive Cricket Portal | |
cricbuzz.com: # Nothing very exciting here for you I'm afraid. | |
cricbuzz.com: # Despictable and evil robots to keep out :) | |
elpais.com: # Bots bloqueados | |
merriam-webster.com: ############################################################################## | |
merriam-webster.com: # This is a production robots.txt! Edit with care. | |
merriam-webster.com: ############################################################################## | |
merriam-webster.com: ############################################################################## | |
merriam-webster.com: # This is a production robots.txt! Edit with care. | |
merriam-webster.com: ############################################################################## | |
netsuite.com: # These intructions apply to all robots. | |
netsuite.com: # Sitemaps | |
netsuite.com: # Content | |
ebay.com.au: ## BEGIN FILE ### | |
ebay.com.au: # | |
ebay.com.au: # allow-all | |
ebay.com.au: # DR | |
ebay.com.au: # | |
ebay.com.au: # The use of robots or other automated means to access the eBay site | |
ebay.com.au: # without the express permission of eBay is strictly prohibited. | |
ebay.com.au: # Notwithstanding the foregoing, eBay may permit automated access to | |
ebay.com.au: # access certain eBay pages but soley for the limited purpose of | |
ebay.com.au: # including content in publicly available search engines. Any other | |
ebay.com.au: # use of robots or failure to obey the robots exclusion standards set | |
ebay.com.au: # forth at <https://www.robotstxt.org/orig.html> is strictly | |
ebay.com.au: # prohibited. | |
ebay.com.au: # | |
ebay.com.au: # v10_AU_Feb_2021 | |
ebay.com.au: ### DIRECTIVES ### | |
ebay.com.au: # PRP Sitemaps | |
ebay.com.au: # VIS Sitemaps | |
ebay.com.au: # CLP Sitemaps | |
ebay.com.au: # NGS Sitemaps | |
ebay.com.au: # BROWSE Sitemaps | |
ebay.com.au: ### END FILE ### | |
google.com.vn: # AdsBot | |
google.com.vn: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
znds.com: # | |
znds.com: # robots.txt for Discuz! X3 | |
znds.com: # | |
timeanddate.com: # http://web.nexor.co.uk/mak/doc/robots/norobots.html | |
timeanddate.com: # | |
timeanddate.com: # internal note, this file is in git now! | |
timeanddate.com: # disallow any urls with ? in | |
rediff.com: # http://www.rediff.com: robots.txt | |
rediff.com: # | |
google.co.za: # AdsBot | |
google.co.za: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
indianexpress.com: # Sitemap archive | |
pngtree.com: # Bing Bot | |
gotowebinar.com: # Sitemaps and Autodiscovers | |
wiley.com: # For all robots | |
wiley.com: # Block access to specific groups of pages | |
wiley.com: # Allow search crawlers to discover the sitemap | |
wiley.com: # Block CazoodleBot as it does not present correct accept content headers | |
wiley.com: # Block MJ12bot as it is just noise | |
wiley.com: # Block dotbot as it cannot parse base urls properly | |
wiley.com: # Block Gigabot | |
wiley.com: # Block trendkite-akashic-crawler | |
google.cl: # AdsBot | |
google.cl: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
office365.com: # | |
britannica.com: # /robots.txt file for encyclopaedia britannica | |
skroutz.gr: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
skroutz.gr: # | |
skroutz.gr: # To ban all spiders from the entire site uncomment the next two lines: | |
skroutz.gr: # User-Agent: * | |
skroutz.gr: # Disallow: / | |
tripadvisor.com: # Hi there, | |
tripadvisor.com: # | |
tripadvisor.com: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself. | |
tripadvisor.com: # | |
tripadvisor.com: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet? | |
tripadvisor.com: # | |
tripadvisor.com: # Run - don't crawl - to apply to join TripAdvisor's elite SEO team | |
tripadvisor.com: # | |
tripadvisor.com: # Email seoRockstar@tripadvisor.com | |
tripadvisor.com: # | |
tripadvisor.com: # Or visit https://careers.tripadvisor.com/search-results?keywords=seo | |
tripadvisor.com: # | |
tripadvisor.com: # | |
robinhood.com: # | |
robinhood.com: # o O | |
robinhood.com: # __|_____|___ | |
robinhood.com: # | -- | | |
robinhood.com: # | ( o ) ( o ) | |
robinhood.com: # { | / | | |
robinhood.com: # | [wwww] < *Exterminate all humans.txt* ) | |
robinhood.com: # [____________| | |
robinhood.com: # | | /Vvvv/ | |
robinhood.com: # _____|___|____ |___/ | |
robinhood.com: # /______________\_________/ | | |
robinhood.com: # | | / | |
robinhood.com: # | ( / ) ( + ) |__|__|__|_|_/ | |
robinhood.com: # | | | |
robinhood.com: # | [ -vV--vV-] | | |
robinhood.com: # | | | |
robinhood.com: # |______________/ | |
robinhood.com: # | |
google.az: # AdsBot | |
google.az: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
expedia.com: # | |
expedia.com: # General bots | |
expedia.com: # | |
expedia.com: #hotel | |
expedia.com: #flight | |
expedia.com: #package | |
expedia.com: #car | |
expedia.com: #activities | |
expedia.com: #cruise | |
expedia.com: #other | |
expedia.com: # | |
expedia.com: # Google Ads | |
expedia.com: # | |
expedia.com: # | |
expedia.com: # | |
expedia.com: # Bing Ads | |
expedia.com: # | |
expedia.com: # | |
expedia.com: # SemrushBot | |
expedia.com: # | |
atlassian.net: # JIRA: | |
atlassian.net: # Disallow all SearchRequestViews in the IssueNavigator (Word, XML, RSS, | |
atlassian.net: # etc), all IssueViews (XML, Printable and Word), all charts and reports. | |
atlassian.net: # Disallow admin. | |
atlassian.net: # | |
atlassian.net: # Confluence: | |
atlassian.net: # Confluence uses in-page robot exclusion tags for non-indexable pages. | |
atlassian.net: # Disallow admin explicitly. | |
atlassian.net: # | |
atlassian.net: # General: | |
atlassian.net: # Disallow login, logout | |
nordstrom.com: #Browse | |
nordstrom.com: #PDP | |
nordstrom.com: #Search Results | |
nordstrom.com: #Account | |
nordstrom.com: #Anniversary | |
nordstrom.com: #PrivateSale | |
nordstrom.com: #Other | |
newegg.com: ################ Newegg Robots.txt File ################ | |
newegg.com: ################ DESKTOP - START ################ | |
newegg.com: # Original version disallows | |
newegg.com: # Allows updated 10/26/16 | |
newegg.com: # Page disallows updated 3/26/2018 | |
newegg.com: # blog disallows updated 6/11/18 | |
newegg.com: # updated 8/19/19 | |
newegg.com: # updated 9/9/19 | |
newegg.com: # disallow rss 9/24/2018 | |
newegg.com: # disallow 12/11/2018 | |
newegg.com: ################ DESKTOP - END ################ | |
newegg.com: ################ MOBILE - START ################ | |
newegg.com: ################ MOBILE - END ################ | |
xe.com: # Please refer to the robots.txt spec by Google (https://developers.google.com/search/reference/robots_txt) if you are modifying this file | |
xe.com: # All crawlers keep out of 8 Day flash directory and flash tutorials | |
xe.com: # Don't let crawlers into the syndication widgets | |
xe.com: # Crawlers should stay out of the /api endpoints, and the language variants of those pages | |
xe.com: # Prevent crawlers from hitting the buggy version of a certain FAQ page | |
xe.com: # We noticed a series of mysterious homepage URLs being hit by bingbot of the form https://www.xe.com/?0.xxxx... | |
xe.com: # New sitemap xml except for sitemap-index.xml. | |
india.com: #Baiduspider | |
india.com: #Yandex | |
bitly.com: # Welcome to Bitly =) | |
bitly.com: # robots welcome; | |
bitly.com: # API documentation can be found at https://dev.bitly.com/ | |
mydrivers.com: # robots.txt for http://www.mydrivers.com/ | |
vanguard.com: # robots.txt for http://www.vanguard.com/ | |
elsevier.com: # Robots.txt file for https://www.elsevier.com | |
elsevier.com: # Do Not Delete This File | |
shopify.com: # ,: | |
shopify.com: # ,' | | |
shopify.com: # / : | |
shopify.com: # --' / | |
shopify.com: # \/ />/ | |
shopify.com: # / <//_\ | |
shopify.com: # __/ / | |
shopify.com: # )'-. / | |
shopify.com: # ./ :\ | |
shopify.com: # /.' ' | |
shopify.com: # No need to shop around. Board the rocketship today – great SEO careers to checkout at shopify.com/careers | |
shopify.com: # robots.txt file for www.shopify.com | |
walgreens.com: # Robots.txt exclusion for walgreens.com | |
walgreens.com: # Connection #0 to host wildcard-b.walgreens.com.edgekey.net left intact | |
marriott.com: # Robots.txt file for HTTPS Marriott.com | |
marriott.com: # | |
marriott.com: # | |
marriott.com: # | |
dcinside.com: # Ads | |
dcinside.com: # Search | |
bloomberg.com: # Bot rules: | |
bloomberg.com: # 1. A bot may not injure a human being or, through inaction, allow a human being to come to harm. | |
bloomberg.com: # 2. A bot must obey orders given it by human beings except where such orders would conflict with the First Law. | |
bloomberg.com: # 3. A bot must protect its own existence as long as such protection does not conflict with the First or Second Law. | |
bloomberg.com: # If you can read this then you should apply here https://www.bloomberg.com/careers/ | |
google.dz: # AdsBot | |
google.dz: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
hotjar.com: # Sitemap files | |
hotjar.com: # Robots allowed | |
europa.eu: # robots.txt for EUROPA httpd-80 production server | |
europa.eu: # | |
europa.eu: # last update on 20/06/2019 | |
europa.eu: # | |
europa.eu: # | |
europa.eu: # COMM EUROPA MANAGEMENT - IM0012723685 - 03/04/2014 | |
europa.eu: # | |
europa.eu: # | |
europa.eu: # COMM EUROPA MANAGEMENT - IM0017899419 - 20/06/2019 | |
europa.eu: # | |
europa.eu: # Directories | |
europa.eu: # Files | |
europa.eu: # Paths (clean URLs) | |
europa.eu: # Paths (no clean URLs) | |
europa.eu: #SUBDIRECTORY ALIASED | |
europa.eu: # Directories | |
europa.eu: # Files | |
europa.eu: # Paths (clean URLs) | |
europa.eu: # Paths (no clean URLs) | |
europa.eu: # Custom rules | |
europa.eu: # Protect user profile data. | |
europa.eu: # SMT Ticket | |
olx.pl: # sitecode:olxpl-desktop | |
linguee.com: # In ANY CASE, you are NOT ALLOWED to train Machine Translation Systems | |
linguee.com: # on data crawled on Linguee. | |
linguee.com: # | |
linguee.com: # Linguee contains fake entries - changes in the wording of sentences, | |
linguee.com: # complete fake entries. | |
linguee.com: # These entries can be used to identify even small parts of our material | |
linguee.com: # if you try to copy it without our permission. | |
linguee.com: # Machine Translation systems trained on these data will learn these errors | |
linguee.com: # and can be identified easily. We will take all legal measures against anyone | |
linguee.com: # training Machine Translation systems on data crawled from this website. | |
discover.com: #begin directives to prevent crawling of legacy discover magazine links# | |
discover.com: #begin directives for disallowing of discover website pages# | |
www.gov.br: # Define access-restrictions for robots/spiders | |
www.gov.br: # http://www.robotstxt.org/wc/norobots.html | |
www.gov.br: # By default we allow robots to access all areas of our site | |
www.gov.br: # already accessible to anonymous users | |
www.gov.br: # Add Googlebot-specific syntax extension to exclude forms | |
www.gov.br: # that are repeated for each piece of content in the site | |
www.gov.br: # the wildcard is only supported by Googlebot | |
www.gov.br: # http://www.google.com/support/webmasters/bin/answer.py?answer=40367&ctx=sibling | |
olx.ua: # sitecode:olxua-desktop | |
wp.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead. | |
wp.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details. | |
wp.com: # This file was generated on Wed, 24 Feb 2021 19:02:06 +0000 | |
google.ro: # AdsBot | |
google.ro: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
trontv.com: # robotstxt.org | |
usbank.com: # Welcome to robots.txt on USBank.com -- sit down, relax, and have a cup of coffee while you look around. Have a nice day. | |
usbank.com: # | |
usbank.com: # | |
cdiscount.com: # robots.txt - achat / vente robots.txt pas cher | |
cdiscount.com: # Archive.org | |
cdiscount.com: #Règle speciale pour AdsBot qui ne respecte pas le User-agent:*... | |
cdiscount.com: #allow | |
cdiscount.com: #pro | |
cdiscount.com: #mvc | |
cdiscount.com: #regie | |
cdiscount.com: #home | |
cdiscount.com: #order | |
cdiscount.com: #product | |
cdiscount.com: #ajax | |
cdiscount.com: #blacklist | |
cdiscount.com: #other | |
cloudflare.com: # .__________________________. | |
cloudflare.com: # | .___________________. |==| | |
cloudflare.com: # | | ................. | | | | |
cloudflare.com: # | | ::[ Dear robot ]: | | | | |
cloudflare.com: # | | ::::[ be nice ]:: | | | | |
cloudflare.com: # | | ::::::::::::::::: | | | | |
cloudflare.com: # | | ::::::::::::::::: | | | | |
cloudflare.com: # | | ::::::::::::::::: | | | | |
cloudflare.com: # | | ::::::::::::::::: | | ,| | |
cloudflare.com: # | !___________________! |(c| | |
cloudflare.com: # !_______________________!__! | |
cloudflare.com: # / \ | |
cloudflare.com: # / [][][][][][][][][][][][][] \ | |
cloudflare.com: # / [][][][][][][][][][][][][][] \ | |
cloudflare.com: #( [][][][][____________][][][][] ) | |
cloudflare.com: # \ ------------------------------ / | |
cloudflare.com: # \______________________________/ | |
cloudflare.com: # lp | |
cloudflare.com: # feedback | |
cloudflare.com: # ________ | |
cloudflare.com: # __,_, | | | |
cloudflare.com: # [_|_/ | OK | | |
cloudflare.com: # // |________| | |
cloudflare.com: # _// __ / | |
cloudflare.com: #(_|) |@@| | |
cloudflare.com: # \ \__ \--/ __ | |
cloudflare.com: # \o__|----| | __ | |
cloudflare.com: # \ }{ /\ )_ / _\ | |
cloudflare.com: # /\__/\ \__O (__ | |
cloudflare.com: # (--/\--) \__/ | |
cloudflare.com: # _)( )(_ | |
cloudflare.com: # `---''---` | |
dreamstime.com: ################################### | |
dreamstime.com: # https://www.dreamstime.com/robots.txt and country subdomains | |
dreamstime.com: ################################### | |
dreamstime.com: # Disallow for outdated design pages | |
dreamstime.com: # Disallow for php pages | |
dreamstime.com: # Disallow for private pages | |
dreamstime.com: ################################### | |
jnu.edu.cn: #限制校外访问的url,禁止收录 | |
medicalnewstoday.com: # Sitemaps | |
sxyprn.com: # vestacp autogenerated robots.txt | |
lifo.gr: # | |
lifo.gr: # robots.txt | |
lifo.gr: # | |
lifo.gr: # This file is to prevent the crawling and indexing of certain parts | |
lifo.gr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
lifo.gr: # and Google. By telling these "robots" where not to go on your site, | |
lifo.gr: # you save bandwidth and server resources. | |
lifo.gr: # | |
lifo.gr: # This file will be ignored unless it is at the root of your host: | |
lifo.gr: # Used: http://example.com/robots.txt | |
lifo.gr: # Ignored: http://example.com/site/robots.txt | |
lifo.gr: # | |
lifo.gr: # For more information about the robots.txt standard, see: | |
lifo.gr: # http://www.robotstxt.org/robotstxt.html | |
lifo.gr: # CSS, JS, Images | |
lifo.gr: # Directories | |
lifo.gr: # Files | |
lifo.gr: # Paths (clean URLs) | |
lifo.gr: # Paths (no clean URLs) | |
znanija.com: #Brainly Robots.txt 31.07.2017 | |
znanija.com: # Disallow Marketing bots | |
znanija.com: #Disallow exotic search engine crawlers | |
znanija.com: #Disallow other crawlers | |
znanija.com: # Good bots whitelisting: | |
znanija.com: #Other bots | |
znanija.com: #Neticle Crawler v1.0 ( http://bot.neticle.hu/ ) https://bot.neticle.hu/ - brand monitoring | |
znanija.com: #Mega https://megaindex.com/crawler - link indexer tool (supports directives in user-agent:*) | |
znanija.com: #Obot - IBM X-Force service | |
znanija.com: #SafeDNSBot (https://www.safedns.com/searchbot) | |
lenovo.com: # For all robots | |
lenovo.com: # Block access to specific groups of pages | |
lenovo.com: #global sitemap | |
lenovo.com: # Allow search crawlers to discover the sitemap | |
lenovo.com: # Block access to below CA country directories | |
lenovo.com: # Block access to below private stores | |
lenovo.com: # Block access to below EMEA country directories | |
lenovo.com: # Block access to below AU pages | |
lenovo.com: # Block services URL | |
lenovo.com: # Block US Cart url | |
google.com.pe: # AdsBot | |
google.com.pe: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
playstation.com: # PlayStation Robots.txt | |
playstation.com: # Sitemaps | |
apollo.io: # Sitemap file | |
huffpost.com: # Cambria robots | |
huffpost.com: # archives | |
huffpost.com: # huffingtonpost.com archive sitemaps | |
ionos.com: #print | |
ionos.com: #terms and conditions | |
ionos.com: #Popups etc. | |
ionos.com: #Results | |
ionos.com: #crawl delay | |
howtogeek.com: # | |
howtogeek.com: # Yahoo bot is evil. | |
howtogeek.com: # | |
howtogeek.com: # | |
howtogeek.com: # Wut? 80 legs? Where do I get traffic from this? | |
howtogeek.com: # | |
howtogeek.com: # | |
howtogeek.com: # Yahoo Pipes is for feeds not web pages. | |
howtogeek.com: # | |
howtogeek.com: # | |
howtogeek.com: # There's no need to scan the forums for images | |
howtogeek.com: # | |
yumpu.com: #Disallow urls | |
yumpu.com: #Disallow urls with index.php | |
yumpu.com: #Disallow urls with language iso | |
wattpad.com: # Wattpad is hiring! | |
wattpad.com: # | |
wattpad.com: # Check out our available positions at https://wattpad.com/jobs | |
wattpad.com: # Note: always make sure to test your changes at `Google Robots.txt tester` | |
wattpad.com: # Last update: 2017-06-06 (plat-6362) | |
wattpad.com: # Login/Logut | |
wattpad.com: # Personal pages | |
wattpad.com: # Campaign pages not maintained regularly | |
wattpad.com: # Other pages | |
wattpad.com: # Access denied pages | |
wattpad.com: # Leading dot in the path | |
wattpad.com: # Exception for well-known | |
wattpad.com: # We disallow robot from RankLite | |
google.com.pk: # AdsBot | |
google.com.pk: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
hespress.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/ | |
onelogin.com: # robots.txt for https://www.onelogin.com | |
twilio.com: # 'Allow' - nonstandard REP Directive | |
shopbop.com: #Sitemap updated 08/31/2018 | |
cisco.com: #-------------------------------- | |
cisco.com: # Disallow: /cgi-bin # allow test crawls for TAC support content | |
cisco.com: #-------------------------------- | |
cisco.com: #-------------------------------- | |
cisco.com: #-------------------------------- | |
cisco.com: #-------------------------------- | |
cisco.com: # All changes to robots.txt need to be approved by search-seo-and-site@cisco.com | |
cisco.com: # | |
ebay.fr: ## BEGIN FILE ### | |
ebay.fr: # | |
ebay.fr: # allow-all | |
ebay.fr: # DR | |
ebay.fr: # | |
ebay.fr: # The use of robots or other automated means to access the eBay site | |
ebay.fr: # without the express permission of eBay is strictly prohibited. | |
ebay.fr: # Notwithstanding the foregoing, eBay may permit automated access to | |
ebay.fr: # access certain eBay pages but soley for the limited purpose of | |
ebay.fr: # including content in publicly available search engines. Any other | |
ebay.fr: # use of robots or failure to obey the robots exclusion standards set | |
ebay.fr: # forth at <https://www.robotstxt.org/orig.html> is strictly | |
ebay.fr: # prohibited. | |
ebay.fr: # | |
ebay.fr: # v10_ROW_Feb_2021 | |
ebay.fr: ### DIRECTIVES ### | |
ebay.fr: # VIS Sitemaps | |
ebay.fr: # PRP Sitemaps | |
ebay.fr: # CLP Sitemaps | |
ebay.fr: # BROWSE Sitemaps | |
ebay.fr: ### END FILE ### | |
uber.com: # robotstxt.org/ | |
opensooq.com: # Opensooq.com Robots.txt File
opensooq.com: # Last update : 9/December/2020
opensooq.com: # ____
opensooq.com: # / __ \
opensooq.com: # | | | |_ __ ___ _ __ ___ ___ ___ __ _ ___ ___ _ __ ___
opensooq.com: # | | | | '_ \ / _ \ '_ \/ __|/ _ \ / _ \ / _` | / __/ _ \| '_ ` _ \
opensooq.com: # | |__| | |_) | __/ | | \__ \ (_) | (_) | (_| || (_| (_) | | | | | |
opensooq.com: # \____/| .__/ \___|_| |_|___/\___/ \___/ \__, (_)___\___/|_| |_| |_|
opensooq.com: # | | | |
opensooq.com: # |_| |_|
opensooq.com: #
opensooq.com: # () ()
opensooq.com: # \ /
opensooq.com: # __\___________/__
opensooq.com: # / \
opensooq.com: # / ___ ___ \
opensooq.com: # | / \ / \ |
opensooq.com: # | | 0 || 0 | |
opensooq.com: # | \___/ \___/ |
opensooq.com: # | |
opensooq.com: # | \ / |
opensooq.com: # | \___________/ |
opensooq.com: # \ /
opensooq.com: # \_________________/
opensooq.com: # _________|__|_______
opensooq.com: # _| |_
opensooq.com: # / | | \
opensooq.com: # / | O O O | \
opensooq.com: # | | | |
opensooq.com: # | | O O O | |
opensooq.com: # | | | |
opensooq.com: # / | | \
opensooq.com: # | /| |\ |
opensooq.com: # \| | | |/
opensooq.com: # |____________________|
opensooq.com: # | | | |
opensooq.com: # |__| |__|
opensooq.com: # / __ \ / __ \
opensooq.com: # OO OO OO OO
opensooq.com: #
opensooq.com: # URLs
opensooq.com: # Parameters
opensooq.com: # PWA links
opensooq.com: # Blog
opensooq.com: # API
opensooq.com: # DL
opensooq.com: # Crawlers
realestate.com.au: ##
realestate.com.au: # In accessing or using any REA Group Website you agree that you will not use any automated device,
realestate.com.au: # software, process or means to access, retrieve, scrape, or index any REA Group Website or any
realestate.com.au: # content on any REA Group Website. Notwithstanding the foregoing, REA Group may permit automated
realestate.com.au: # access to certain REA Group Website pages strictly for the purpose of including content in publicly
realestate.com.au: # available general search engines. This does not include any such access by websites that
realestate.com.au: # specifically aggregate property listings and/or information as part of their business. REA Group
realestate.com.au: # strictly prohibits any automated access by these types of websites.
realestate.com.au: ##
bild.de: # Bei Fragen zu diesen Regeln oder Aenderungswuenschen koennen sie sich an das SEO-Team wenden, erreichbar unter 030 / 2591 79232
collegeboard.org: #
collegeboard.org: # robots.txt
collegeboard.org: #
collegeboard.org: # This file is to prevent the crawling and indexing of certain parts
collegeboard.org: # of your site by web crawlers and spiders run by sites like Yahoo!
collegeboard.org: # and Google. By telling these "robots" where not to go on your site,
collegeboard.org: # you save bandwidth and server resources.
collegeboard.org: #
collegeboard.org: # This file will be ignored unless it is at the root of your host:
collegeboard.org: # Used: http://example.com/robots.txt
collegeboard.org: # Ignored: http://example.com/site/robots.txt
collegeboard.org: #
collegeboard.org: # For more information about the robots.txt standard, see:
collegeboard.org: # http://www.robotstxt.org/robotstxt.html
collegeboard.org: # CSS, JS, Images
collegeboard.org: # Directories
collegeboard.org: # Files
collegeboard.org: # Paths (clean URLs)
collegeboard.org: # Paths (no clean URLs)
collegeboard.org: # Addition files to block
nerdwallet.com: # Disallow some specific routes we don't want indexed,
nerdwallet.com: # with some exceptions allowed.
nerdwallet.com: # Disallow duggmirror from everything (does anyone know why?).
farfetch.com: # ALL YANDEX BOTS
google.co.ao: # AdsBot
google.co.ao: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
google.com.my: # AdsBot
google.com.my: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
dmm.com: #MP
dmm.com: #ppr
dmm.com: #my
dmm.com: #mono
google.ch: # AdsBot
google.ch: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
houzz.com: #marketing
houzz.com: # Scholarship Pages
houzz.com: #block buttonWidget and imageClipperUpload
houzz.com: #facets
houzz.com: #query/search pages
houzz.com: #email
houzz.com: #old pages
houzz.com: #marketplace filters
houzz.com: #sort filters
houzz.com: #view filters
houzz.com: #pros
houzz.com: #Reviews
houzz.com: #bots
houzz.com: #legacy
houzz.com: #cobrands
houzz.com: #ideabooks
houzz.com: #old pages
houzz.com: #adsbot
commbank.com.au: # /robots.txt file for https://www.commbank.com.au/
commbank.com.au: #Blog
commbank.com.au: #PDFs
commbank.com.au: #.html
commbank.com.au: #Non CMS content
cra-arc.gc.ca: # ID: robots.txt 2006/01/17
cra-arc.gc.ca: # Date Created: 2008-07-11
cra-arc.gc.ca: # Date Modified: 2016-07-18/SB
cra-arc.gc.ca: #
cra-arc.gc.ca: # This is a file retrieved by webwalkers a.k.a. spiders that
cra-arc.gc.ca: # conform to a defacto standard.
cra-arc.gc.ca: # See <URL:http://www.robotstxt.org/wc/exclusion.html#robotstxt>
cra-arc.gc.ca: #
cra-arc.gc.ca: # Any matching one of these patterns will be ignored by Search engine Crawlers.
cra-arc.gc.ca: # Use the Disallow: statement to prevent crawlers from indexing specific directories.
cra-arc.gc.ca: #
cra-arc.gc.ca: # Format is:
cra-arc.gc.ca: # User-agent: <name of spider>
cra-arc.gc.ca: # Disallow: <nothing> | <path>
cra-arc.gc.ca: # -----------------------------------------------------------------------------
cra-arc.gc.ca: #
pinterest.jp: # Pinterest is hiring!
pinterest.jp: #
pinterest.jp: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c
pinterest.jp: #
pinterest.jp: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering
hangseng.com: #block unexpected ins pdf links
hangseng.com: #others
hostgator.com: # Google AdSense
hostgator.com: # Digg mirror
hostgator.com: # Omni Explorer
hostgator.com: # SEO
un.org: #
un.org: # robots.txt
un.org: #
un.org: # This file is to prevent the crawling and indexing of certain parts
un.org: # of your site by web crawlers and spiders run by sites like Yahoo!
un.org: # and Google. By telling these "robots" where not to go on your site,
un.org: # you save bandwidth and server resources.
un.org: #
un.org: # This file will be ignored unless it is at the root of your host:
un.org: # Used: http://example.com/robots.txt
un.org: # Ignored: http://example.com/site/robots.txt
un.org: #
un.org: # For more information about the robots.txt standard, see:
un.org: # http://www.robotstxt.org/robotstxt.html
un.org: # CSS, JS, Images
un.org: # Directories
un.org: # Files
un.org: # Paths (clean URLs)
un.org: # Paths (no clean URLs)
people.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
people.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details.
people.com: # Sitemaps
people.com: #legacy
people.com: #Onecms
people.com: #content
people.com: #legacy
people.com: #Onecms
people.com: #content
people.com: #legacy
people.com: #Onecms
people.com: #content
people.com: #legacy
people.com: #Onecms
people.com: #content
caf.fr: #
caf.fr: # robots.txt
caf.fr: #
caf.fr: # This file is to prevent the crawling and indexing of certain parts
caf.fr: # of your site by web crawlers and spiders run by sites like Yahoo!
caf.fr: # and Google. By telling these "robots" where not to go on your site,
caf.fr: # you save bandwidth and server resources.
caf.fr: #
caf.fr: # This file will be ignored unless it is at the root of your host:
caf.fr: # Used: http://example.com/robots.txt
caf.fr: # Ignored: http://example.com/site/robots.txt
caf.fr: #
caf.fr: # For more information about the robots.txt standard, see:
caf.fr: # http://www.robotstxt.org/robotstxt.html
caf.fr: # CSS, JS, Images
caf.fr: # Directories
caf.fr: # Files
caf.fr: # Paths (clean URLs)
caf.fr: # Paths (no clean URLs)
xiaomi.com: # 2015/12/11
kizlarsoruyor.com: # www.kizlarsoruyor.com Robots.txt file
kizlarsoruyor.com: # Server: Web3
kizlarsoruyor.com: # Last Updated: June 18 2020
pajak.go.id: #
pajak.go.id: # robots.txt
pajak.go.id: #
pajak.go.id: # This file is to prevent the crawling and indexing of certain parts
pajak.go.id: # of your site by web crawlers and spiders run by sites like Yahoo!
pajak.go.id: # and Google. By telling these "robots" where not to go on your site,
pajak.go.id: # you save bandwidth and server resources.
pajak.go.id: #
pajak.go.id: # This file will be ignored unless it is at the root of your host:
pajak.go.id: # Used: http://example.com/robots.txt
pajak.go.id: # Ignored: http://example.com/site/robots.txt
pajak.go.id: #
pajak.go.id: # For more information about the robots.txt standard, see:
pajak.go.id: # http://www.robotstxt.org/robotstxt.html
pajak.go.id: # CSS, JS, Images
pajak.go.id: # Directories
pajak.go.id: # Files
pajak.go.id: # Paths (clean URLs)
pajak.go.id: # Paths (no clean URLs)
gab.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
xhamsterlive.com: # generated automatically
www.gov.uk: # Don't allow indexing of user needs pages
www.gov.uk: # https://ahrefs.com/robot/ crawls the site frequently
www.gov.uk: # https://www.deepcrawl.com/bot/ makes lots of requests. Ideally
www.gov.uk: # we'd slow it down rather than blocking it but it doesn't mention
www.gov.uk: # whether or not it supports crawl-delay.
www.gov.uk: # Complaints of 429 'Too many requests' seem to be coming from SharePoint servers
www.gov.uk: # (https://social.msdn.microsoft.com/Forums/en-US/3ea268ed-58a6-4166-ab40-d3f4fc55fef4)
www.gov.uk: # The robot doesn't recognise its User-Agent string, see the MS support article:
www.gov.uk: # https://support.microsoft.com/en-us/help/3019711/the-sharepoint-server-crawler-ignores-directives-in-robots-txt
www.gov.uk: # Google's crawler was sending requests for each variation of query param for the sectors page of licence-finder
www.gov.uk: # resulting in millions of requests a day.
intel.com: # robots.txt exclusion for www.intel.com/ - US
google.com.co: # AdsBot
google.com.co: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
uscis.gov: #
uscis.gov: # robots.txt
uscis.gov: #
uscis.gov: # This file is to prevent the crawling and indexing of certain parts
uscis.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
uscis.gov: # and Google. By telling these "robots" where not to go on your site,
uscis.gov: # you save bandwidth and server resources.
uscis.gov: #
uscis.gov: # This file will be ignored unless it is at the root of your host:
uscis.gov: # Used: http://example.com/robots.txt
uscis.gov: # Ignored: http://example.com/site/robots.txt
uscis.gov: #
uscis.gov: # For more information about the robots.txt standard, see:
uscis.gov: # http://www.robotstxt.org/robotstxt.html
uscis.gov: # Custom
uscis.gov: # CSS, JS, Images
uscis.gov: # Directories
uscis.gov: # Files
uscis.gov: # Paths (clean URLs)
uscis.gov: # Paths (no clean URLs)
anjuke.com: #
anjuke.com: # robots.txt for anjuke.com
anjuke.com: # The use of robots or other automated means to access the anjuke site
anjuke.com: # without the express permission of anjuke is strictly prohibited.
anjuke.com: # Notwithstanding the foregoing, anjuke may permit automated access to
anjuke.com: # access certain anjuke pages but soley for the limited purpose of
anjuke.com: # including content in publicly available search engines. Any other
anjuke.com: # use of robots or failure to obey the robots exclusion standards set
anjuke.com: # forth at <http://www.robotstxt.org/wc/exclusion.html> is strictly
anjuke.com: # prohibited.
anjuke.com: # v1
teespring.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
mercadolibre.com.ve: #siteId: MLV
mercadolibre.com.ve: #country: venezuela
mercadolibre.com.ve: ##Block - Referidos
mercadolibre.com.ve: ##Block - siteinfo urls
mercadolibre.com.ve: ##Block - Cart
mercadolibre.com.ve: ##Block Checkout
mercadolibre.com.ve: ##Block - User Logged
mercadolibre.com.ve: #Shipping selector
mercadolibre.com.ve: ##Block - last search
mercadolibre.com.ve: ## Block - Profile - By Id
mercadolibre.com.ve: ## Block - Profile - By Id and role (old version)
mercadolibre.com.ve: ## Block - Profile - Leg. Req.
mercadolibre.com.ve: ##Block - noindex
mercadolibre.com.ve: # Mercado-Puntos
mercadolibre.com.ve: # Viejo mundo
mercadolibre.com.ve: ##Block recommendations listing
eluniverso.com: #
eluniverso.com: # robots.txt
eluniverso.com: #
eluniverso.com: # This file is to prevent the crawling and indexing of certain parts
eluniverso.com: # of your site by web crawlers and spiders run by sites like Yahoo!
eluniverso.com: # and Google. By telling these "robots" where not to go on your site,
eluniverso.com: # you save bandwidth and server resources.
eluniverso.com: #
eluniverso.com: # This file will be ignored unless it is at the root of your host:
eluniverso.com: # Used: http://example.com/robots.txt
eluniverso.com: # Ignored: http://example.com/site/robots.txt
eluniverso.com: #
eluniverso.com: # For more information about the robots.txt standard, see:
eluniverso.com: # http://www.robotstxt.org/robotstxt.html
eluniverso.com: # CSS, JS, Images
eluniverso.com: # Directories
eluniverso.com: # Files
eluniverso.com: # Paths (clean URLs)
eluniverso.com: # Paths (no clean URLs)
python.org: # Directions for robots. See this URL:
python.org: # http://www.robotstxt.org/robotstxt.html
python.org: # for a description of the file format.
python.org: # The Krugle web crawler (though based on Nutch) is OK.
python.org: # No one should be crawling us with Nutch.
python.org: # Hide old versions of the documentation and various large sets of files.
ancestry.com: # Domain:[www.ancestry.com]
ancestry.com: #
ancestry.com: # This file should reside in the root directory ancestry.XX/robots.txt
ancestry.com: #
ancestry.com: # Tells Scanning Robots Where They Are And Are Not Welcome
ancestry.com: # User-agent: can also specify by name; "*" is for all bots
ancestry.com: # Disallow: disallow if directive matches first part of requested path
ancestry.com: ## GB Updated 26 May 2020
news.com.au: #Agent Specific Disallowed Sections
eventbrite.com: # http://www.google.com/adsbot.html - AdsBot ignores * wildcard
workable.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
gap.com: # Crafted for https://www.gap.com
credit-agricole.fr: # robots.txt
credit-agricole.fr: # @url: https://www.credit-agricole.fr
credit-agricole.fr: # Version : 2021-01-06
credit-agricole.fr: #Ouverture crawl Inbenta
credit-agricole.fr: #Ouverture crawl Mediapartners
credit-agricole.fr: #Blocage repertoires et parametres techniques
credit-agricole.fr: #Autorisation crawl pagination
credit-agricole.fr: #Autorisation crawl thematique et rubrique du MAG
credit-agricole.fr: #Blocage des store locator marche
credit-agricole.fr: #Blocage des CR
credit-agricole.fr: ## INDEXATION CR SPECIFIQUES ##
credit-agricole.fr: ## DEBUT CADIF ##
credit-agricole.fr: #Ouverture du crawl des pages en index#
credit-agricole.fr: #Fermeture du crawl des pages en noindex#
credit-agricole.fr: ### FIN CADIF ##
credit-agricole.fr: ## DEBUT ANJOU MAINE ##
credit-agricole.fr: ### FIN ANJOU MAINE ##
credit-agricole.fr: ## DEBUT NORD DE FRANCE ##
credit-agricole.fr: ### FIN NORD DE FRANCE ##
credit-agricole.fr: ## DEBUT CENTRE LOIRE ##
credit-agricole.fr: ### FIN CENTRE LOIRE ##
credit-agricole.fr: ## DEBUT CENTRE FRANCE ##
credit-agricole.fr: ### FIN CENTRE FRANCE ##
credit-agricole.fr: ## DEBUT AQUITAINE ##
credit-agricole.fr: ### FIN AQUITAINE ##
credit-agricole.fr: ## DEBUT ALPES PROVENCE ##
credit-agricole.fr: ### FIN ALPES PROVENCE ##
credit-agricole.fr: ## DEBUT CHARENTE PERIGORD ##
credit-agricole.fr: ### FIN CHARENTE PERIGORD ##
credit-agricole.fr: ## DEBUT TOULOUSE 31 ##
credit-agricole.fr: ### FIN TOULOUSE 31 ##
credit-agricole.fr: ## DEBUT LOIRE HAUTE LOIRE ##
credit-agricole.fr: ### FIN LOIRE HAUTE LOIRE ##
credit-agricole.fr: ## DEBUT CMDS ##
credit-agricole.fr: ### FIN CMDS ##
credit-agricole.fr: ## DEBUT LORRAINE ##
credit-agricole.fr: ### FIN LORRAINE ##
credit-agricole.fr: ## DEBUT NORD MIDI PYRENEES ##
credit-agricole.fr: ### FIN NORD MIDI PYRENEES##
credit-agricole.fr: ## DEBUT PROVENCE COTE DAZUR ##
credit-agricole.fr: ### FIN PROVENCE COTE DAZUR##
credit-agricole.fr: ## DEBUT BRIE-PICARDIE ##
credit-agricole.fr: ### FIN BRIE-PICARDIE##
credit-agricole.fr: ## DEBUT CENTRE OUEST ##
credit-agricole.fr: ### FIN CENTRE OUEST##
credit-agricole.fr: ## DEBUT ILLE-ET-VILAINE ##
credit-agricole.fr: ### FIN ILLE-ET-VILAINE ##
credit-agricole.fr: ## DEBUT NORMANDIE ##
credit-agricole.fr: ### FIN NORMANDIE ##
credit-agricole.fr: ## DEBUT PYRENEES GASCOGNE ##
credit-agricole.fr: ### FIN PYRENEES GASCOGNE##
credit-agricole.fr: ## DEBUT SUD MEDITERRANEE ##
credit-agricole.fr: ### FIN SUD MEDITERRANEE ##
credit-agricole.fr: ## DEBUT TOURAINE-POITOU ##
credit-agricole.fr: ### FIN TOURAINE-POITOU ##
credit-agricole.fr: ## DEBUT VAL DE FRANCE ##
credit-agricole.fr: ### FIN VAL DE FRANCE ##
credit-agricole.fr: ## DEBUT ALSACE VOSGES ##
credit-agricole.fr: ### FIN ALSACE VOSGES ##
credit-agricole.fr: ## DEBUT NORMANDIE SEINE ##
credit-agricole.fr: ### FIN NORMANDIE SEINE ##
credit-agricole.fr: ## DEBUT CENTRE EST ##
credit-agricole.fr: ### FIN CENTRE EST ##
credit-agricole.fr: ## DEBUT CHAMPAGNE BOURGOGNE ##
credit-agricole.fr: ### FIN CHAMPAGNE BOURGOGNE ##
credit-agricole.fr: ## DEBUT DES SAVOIE ##
credit-agricole.fr: ### FIN DES SAVOIE ##
credit-agricole.fr: ## DEBUT GUADELOUPE ##
credit-agricole.fr: ### FIN GUADELOUPE ##
credit-agricole.fr: ## DEBUT LANGUEDOC ##
credit-agricole.fr: ### FIN LANGUEDOC ##
credit-agricole.fr: ## DEBUT MARTINIQUE ##
credit-agricole.fr: ### FIN MARTINIQUE ##
credit-agricole.fr: ## DEBUT ATLANTIQUE VENDEE ##
credit-agricole.fr: ### FIN ATLANTIQUE VENDEE##
credit-agricole.fr: ## DEBUT CORSE ##
credit-agricole.fr: ### FIN CORSE##
credit-agricole.fr: ## DEBUT COTES DARMOR ##
credit-agricole.fr: ### FIN COTES DARMOR##
credit-agricole.fr: ## DEBUT FINISTERE ##
credit-agricole.fr: ### FIN FINISTERE##
credit-agricole.fr: ## DEBUT FRANCH COMTE ##
credit-agricole.fr: ### FIN FRANCHE COMTE##
credit-agricole.fr: ## DEBUT MORBIHAN ##
credit-agricole.fr: ### FIN MORBIHAN##
credit-agricole.fr: ## DEBUT NORD EST ##
credit-agricole.fr: ### FIN NORD EST##
credit-agricole.fr: ## DEBUT REUNION ##
credit-agricole.fr: ### FIN REUNION##
credit-agricole.fr: ## DEBUT SUD RHONE ALPES ##
credit-agricole.fr: ### FIN SUD RHONE ALPES##
rahavard365.com: # We use a proprietary dashboard management system for our operations
tgju.org: # Allow all files ending with these extensions
ziprecruiter.com: # Block URLs that are likely added by js clipboard library
ziprecruiter.com: # Block temporary pages of the go seo app
seek.com.au: # Robots.txt file for www.seek.com.au
seek.com.au: # URLs are case sensitive!
seek.com.au: # All other agents will not spider
seek.com.au: # LinkedIn Bot
seek.com.au: # Google Ad Sense
seek.com.au: # Bing Ads
dostor.org: # SYNC 2019
dostor.org: # HTTPS www.dostor.org
iobit.com: # Robots.txt Begin
gamespot.com: # robots.txt for https://www.gamespot.com/
bbb.org: #block all user agents from the following
rsafrwd.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
rsafrwd.com: #content{margin:0 0 0 2%;position:relative;}
net-a-porter.com: #comment line to mitigate potential BOM encoding issues
net-a-porter.com: #new rules
net-a-porter.com: #legacy local sites
net-a-porter.com: #new local sites
sec.gov: #
sec.gov: # robots.txt
sec.gov: #
sec.gov: # This file is to prevent the crawling and indexing of certain parts
sec.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
sec.gov: # and Google. By telling these "robots" where not to go on your site,
sec.gov: # you save bandwidth and server resources.
sec.gov: #
sec.gov: # This file will be ignored unless it is at the root of your host:
sec.gov: # Used: http://example.com/robots.txt
sec.gov: # Ignored: http://example.com/site/robots.txt
sec.gov: #
sec.gov: # For more information about the robots.txt standard, see:
sec.gov: # http://www.robotstxt.org/robotstxt.html
sec.gov: # CSS, JS, Images
sec.gov: # Directories
sec.gov: # Files
sec.gov: # Paths (clean URLs)
sec.gov: #Commented out to support SEC.gov Site Index
sec.gov: # Disallow: /search/
sec.gov: # Paths (no clean URLs)
sec.gov: #SEC
sec.gov: #INVESTOR
google.be: # AdsBot
google.be: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
instacart.com: ### Any bot scraping or crawling this site must abide by
instacart.com: ### Instacart's Terms of Service https://www.instacart.com/terms
instacart.com: ### `-:///:-.
instacart.com: ### /ssssssssso/.
instacart.com: ### :sssssssssssss/
instacart.com: ### +ssssssssssssss`
instacart.com: ### +ssssssssssssso
instacart.com: ### :sssssssssssso`
instacart.com: ### `ossssssssss/`
instacart.com: ### :ssssssss+. `.-.`
instacart.com: ### /sssss+. `:osssso/
instacart.com: ### .:///::-` :+/-` `/ossssssss/
instacart.com: ### `-/+++++++++/:. :osssssssssss`
instacart.com: ### -/++++++++++++++/. `ossssssssssso
instacart.com: ### `/++++++++++++++++++/` .-/++ooso+/`
instacart.com: ### -++++++++++++++++++++++-
instacart.com: ### `/++++++++++++++++++++++++-
instacart.com: ### ./++++++++++++++++++++++++++`
instacart.com: ### -++++++++++++++++++++++++++++.
instacart.com: ### :++++++++++++++++++++++++++++/`
instacart.com: ### :++++++++++++++++++++++++++++:`
instacart.com: ### :+++++++++++++++++++++++++++:`
instacart.com: ### :+++++++++++++++++++++++++/-`
instacart.com: ### :++++++++++++++++++++++++:.
instacart.com: ### :+++++++++++++++++++++/:.
instacart.com: ### -+++++++++++++++++++/:.
instacart.com: ### .+++++++++++++++++:-`
instacart.com: ### `/+++++++++++++/-.
instacart.com: ### :+++++++++/:-.
instacart.com: ### +++++/:-.`
instacart.com: ### `..`
instacart.com: ### If you're not a bot, we're hiring: https://instacart.careers/current-openings/
teamviewer.com: #Valid for all user agents
teamviewer.com: #Disallow Global Website#
teamviewer.com: #Disallow WP#
teamviewer.com: #Allow Exceptions for images, scripts, pdfs#
teamviewer.com: #Sitemaps#
teamviewer.com: #Changed on 2018-12-11 SeSi#
kajabi.com: # _ __ _ _ _
kajabi.com: # | | / / (_) | | (_)
kajabi.com: # | |/ / __ _ _ __ _| |__ _
kajabi.com: # | \ / _` | |/ _` | '_ \| |
kajabi.com: # | |\ \ (_| | | (_| | |_) | |
kajabi.com: # \_| \_/\__,_| |\__,_|_.__/|_|
kajabi.com: # _/ |
kajabi.com: # |__/
oxu.az: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
oxu.az: #
oxu.az: # To ban all spiders from the entire site uncomment the next two lines:
oxu.az: #User-agent: *
oxu.az: #Crawl-delay: 1
staples.com: #Last Modified On 2020-06-17T15:11:23.706Z
staples.com: # allow all robots
staples.com: #block crawling of unexposed pages
staples.com: #2/4/20 remove addt'l 404s
staples.com: #12/06/19 remove problematic 404s
staples.com: #Regular entries
staples.com: #Patterns to disallow for BOPiS
staples.com: #06/25/2014- Additional patterns to remove from indexes
staples.com: #06/11/20 Updated Patterns to disallow for PNI
dawn.com: # test tool
dawn.com: # https://www.google.com/webmasters/tools/ (Crawl > Blocked URLs)
novinky.cz: # dont crawl pagination on article pages
novinky.cz: # dont crawl pagination on article pages
novinky.cz: # dont crawl the same page with opened menu
novinky.cz: # dont crawl the same page with opened gallery
lanacion.com.ar: # Robots.txt (archivo)
hermes.com: #
hermes.com: # prod_hermes_com_robots.txt
hermes.com: #
hermes.com: # This file is to prevent the crawling and indexing of certain parts
hermes.com: # of your site by web crawlers and spiders run by sites like Yahoo!
hermes.com: # and Google. By telling these "robots" where not to go on your site,
hermes.com: # you save bandwidth and server resources.
hermes.com: #
hermes.com: # This file will be ignored unless it is at the root of your host:
hermes.com: # Used: http://example.com/robots.txt
hermes.com: # Ignored: http://example.com/site/robots.txt
hermes.com: #
hermes.com: # For more information about the robots.txt standard, see:
hermes.com: # http://www.robotstxt.org/robotstxt.html
hermes.com: # For Drupal folders and files, wildcards used for directories 1) country 2) language
hermes.com: # Directories
hermes.com: # Files
hermes.com: # Paths (clean URLs)
hermes.com: # Paths (no clean URLs)
hermes.com: # added regarding the actual site structure
hermes.com: # disallow search URL to be indexed
hermes.com: # For Magento folders and files, wildcards used for directories 1) country 2) language
hermes.com: # Directories
hermes.com: # Paths (clean URLs)
hermes.com: # Files
hermes.com: # Paths (no clean URLs)
hermes.com: # Waiting bugfix
hermes.com: #Params
hermes.com: # For China only
hermes.com: # All sitemaps listed below :
ebay.it: ## BEGIN FILE ###
ebay.it: #
ebay.it: # allow-all
ebay.it: # DR
ebay.it: #
ebay.it: # The use of robots or other automated means to access the eBay site
ebay.it: # without the express permission of eBay is strictly prohibited.
ebay.it: # Notwithstanding the foregoing, eBay may permit automated access to
ebay.it: # access certain eBay pages but soley for the limited purpose of
ebay.it: # including content in publicly available search engines. Any other
ebay.it: # use of robots or failure to obey the robots exclusion standards set
ebay.it: # forth at <https://www.robotstxt.org/orig.html> is strictly
ebay.it: # prohibited.
ebay.it: #
ebay.it: # v10_ROW_Feb_2021
ebay.it: ### DIRECTIVES ###
ebay.it: # VIS Sitemaps
ebay.it: # PRP Sitemaps
ebay.it: # CLP Sitemaps
ebay.it: # BROWSE Sitemaps
ebay.it: ### END FILE ###
abc.net.au: # robots.txt for http://www.abc.net.au/ -- ABC Online
abc.net.au: #OPSSD-340 2015/5/5
abc.net.au: #INNG-46: 2014-12-30
abc.net.au: # Added for corporate communications, as they have migrated to a new site
abc.net.au: # Added for Homepage Beta, prevent indexing during public beta
abc.net.au: # Added for WCMS Tennent testing, not a public
abc.net.au: ########################################
gtmetrix.com: # GTmetrix robots.txt file
google.nl: # AdsBot
google.nl: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
jkforum.net: #
jkforum.net: # robots.txt for Discuz! X2
jkforum.net: #
wiktionary.org: #
wiktionary.org: # Please note: There are a lot of pages on this site, and there are
wiktionary.org: # some misbehaved spiders out there that go _way_ too fast. If you're
wiktionary.org: # irresponsible, your access to the site may be blocked.
wiktionary.org: #
wiktionary.org: # Observed spamming large amounts of https://en.wikipedia.org/?curid=NNNNNN
wiktionary.org: # and ignoring 429 ratelimit responses, claims to respect robots:
wiktionary.org: # http://mj12bot.com/
wiktionary.org: # advertising-related bots:
wiktionary.org: # Wikipedia work bots:
wiktionary.org: # Crawlers that are kind enough to obey, but which we'd rather not have
wiktionary.org: # unless they're feeding search engines.
wiktionary.org: # Some bots are known to be trouble, particularly those designed to copy
wiktionary.org: # entire sites. Please obey robots.txt.
wiktionary.org: # Misbehaving: requests much too fast:
wiktionary.org: #
wiktionary.org: # Sorry, wget in its recursive mode is a frequent problem.
wiktionary.org: # Please read the man page and use it properly; there is a
wiktionary.org: # --wait option you can use to set the delay between hits,
wiktionary.org: # for instance.
wiktionary.org: #
wiktionary.org: #
wiktionary.org: # The 'grub' distributed client has been *very* poorly behaved.
wiktionary.org: #
wiktionary.org: #
wiktionary.org: # Doesn't follow robots.txt anyway, but...
wiktionary.org: #
wiktionary.org: #
wiktionary.org: # Hits many times per second, not acceptable
wiktionary.org: # http://www.nameprotect.com/botinfo.html
wiktionary.org: # A capture bot, downloads gazillions of pages with no public benefit
wiktionary.org: # http://www.webreaper.net/
wiktionary.org: #
wiktionary.org: # Friendly, low-speed bots are welcome viewing article pages, but not
wiktionary.org: # dynamically-generated pages please.
wiktionary.org: #
wiktionary.org: # Inktomi's "Slurp" can read a minimum delay between hits; if your
wiktionary.org: # bot supports such a thing using the 'Crawl-delay' or another
wiktionary.org: # instruction, please let us know.
wiktionary.org: #
wiktionary.org: # There is a special exception for API mobileview to allow dynamic
wiktionary.org: # mobile web & app views to load section content.
wiktionary.org: # These views aren't HTTP-cached but use parser cache aggressively
wiktionary.org: # and don't expose special: pages etc.
wiktionary.org: #
wiktionary.org: # Another exception is for REST API documentation, located at
wiktionary.org: # /api/rest_v1/?doc.
wiktionary.org: #
wiktionary.org: #
wiktionary.org: # ar:
wiktionary.org: #
wiktionary.org: # dewiki:
wiktionary.org: # T6937
wiktionary.org: # sensible deletion and meta user discussion pages:
wiktionary.org: # 4937#5
wiktionary.org: # T14111
wiktionary.org: # T15961
wiktionary.org: #
wiktionary.org: # enwiki:
wiktionary.org: # Folks get annoyed when VfD discussions end up the number 1 google hit for
wiktionary.org: # their name. See T6776
wiktionary.org: # T15398
wiktionary.org: # T16075
wiktionary.org: # T13261
wiktionary.org: # T12288
wiktionary.org: # T16793
wiktionary.org: #
wiktionary.org: # eswiki:
wiktionary.org: # T8746
wiktionary.org: #
wiktionary.org: # fiwiki:
wiktionary.org: # T10695
wiktionary.org: #
wiktionary.org: # hewiki:
wiktionary.org: #T11517
wiktionary.org: #
wiktionary.org: # huwiki:
wiktionary.org: #
wiktionary.org: # itwiki:
wiktionary.org: # T7545
wiktionary.org: #
wiktionary.org: # jawiki
wiktionary.org: # T7239
wiktionary.org: # nowiki
wiktionary.org: # T13432
wiktionary.org: #
wiktionary.org: # plwiki
wiktionary.org: # T10067
wiktionary.org: #
wiktionary.org: # ptwiki:
wiktionary.org: # T7394
wiktionary.org: #
wiktionary.org: # rowiki:
wiktionary.org: # T14546
wiktionary.org: # | |
wiktionary.org: # ruwiki: | |
wiktionary.org: # | |
wiktionary.org: # svwiki: | |
wiktionary.org: # T12229 | |
wiktionary.org: # T13291 | |
wiktionary.org: # | |
wiktionary.org: # zhwiki: | |
wiktionary.org: # T7104 | |
wiktionary.org: # | |
wiktionary.org: # sister projects | |
wiktionary.org: # | |
wiktionary.org: # enwikinews: | |
wiktionary.org: # T7340 | |
wiktionary.org: # | |
wiktionary.org: # itwikinews | |
wiktionary.org: # T11138 | |
wiktionary.org: # | |
wiktionary.org: # enwikiquote: | |
wiktionary.org: # T17095 | |
wiktionary.org: # | |
wiktionary.org: # enwikibooks | |
wiktionary.org: # | |
wiktionary.org: # working... | |
wiktionary.org: # | |
wiktionary.org: # | |
wiktionary.org: # | |
wiktionary.org: #----------------------------------------------------------# | |
wiktionary.org: # | |
wiktionary.org: # | |
wiktionary.org: # | |
who.int: ### Version Information # | |
who.int: ################################################### | |
who.int: ### Version: V3.2018.05.828 | |
who.int: ### Updated: Tue May 8 11:37:04 SAST 2018 | |
who.int: ### Bad Bot Count: 527 | |
who.int: ################################################### | |
who.int: ### Version Information ## | |
myfitnesspal.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
agoda.com: # ( ( | |
agoda.com: # )\ ( ( )\ ) ) | |
agoda.com: # ((((_)( )\))( ( (()/( ( /( | |
agoda.com: # )\ _ )\((_))\ )\ ((_)))(_)) | |
agoda.com: # (_)_\(_)(()(_)((_) _| |((_)_ | |
agoda.com: # / _ \ / _` |/ _ \/ _` |/ _` | | |
agoda.com: # /_/ \_\\__, |\___/\__,_|\__,_| | |
agoda.com: # |___/ | |
agoda.com: # | |
agoda.com: # | |
agoda.com: # If you like bots this much, then why not help us rank for all the things. Email seoPros@agoda.com | |
agoda.com: # | |
agoda.com: # | |
elmundo.es: # version 0.0.1 | |
elmundo.es: # Blocking of bots and crawlers that are of little use | |
n-tv.de: # robots.txt for n-tv.de | |
mrporter.com: #comment line to mitigate potential BOM encoding issues | |
mrporter.com: #new rules | |
mrporter.com: #local sites | |
cookpad.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
almaghreb24.com: # XML Sitemap & Google News version 5.2.3 - https://status301.net/wordpress-plugins/xml-sitemap-feed/ | |
airbnb.com: # /////// | |
airbnb.com: # // // | |
airbnb.com: # // // | |
airbnb.com: # // // /// /// /// | |
airbnb.com: # // // /// /// | |
airbnb.com: # // /// // //// /// /// /// //// /// //// /// //// /// //// | |
airbnb.com: # // /// /// // ////////// /// ////////// /////////// ////////// /////////// | |
airbnb.com: # // // // // /// /// /// /// /// /// /// /// /// /// | |
airbnb.com: # // // // // /// /// /// /// /// /// /// /// /// /// | |
airbnb.com: # // // // // /// /// /// /// /// /// /// /// /// /// | |
airbnb.com: # // // // // ////////// /// /// ////////// /// /// ////////// | |
airbnb.com: # // ///// // | |
airbnb.com: # // ///// // | |
airbnb.com: # // /// /// // | |
airbnb.com: # ////// ////// | |
airbnb.com: # | |
airbnb.com: # | |
airbnb.com: # We thought you'd never make it! | |
airbnb.com: # We hope you feel right at home in this file...unless you're a disallowed subfolder. | |
airbnb.com: # And since you're here, read up on our culture and team: https://www.airbnb.com/careers/departments/engineering | |
airbnb.com: # There's even a bring your robot to work day. | |
moneyforward.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to | |
moneyforward.com: # use the robots.txt file | |
moneyforward.com: # | |
moneyforward.com: # To ban all spiders from the entire site uncomment the next two lines: | |
sabq.org: # www.robotstxt.org/ | |
sabq.org: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
kotak.com: #contxt{width:100%} | |
brainly.in: #Brainly Robots.txt 31.07.2017 | |
brainly.in: # Disallow Marketing bots | |
brainly.in: #Disallow exotic search engine crawlers | |
brainly.in: #Disallow other crawlers | |
brainly.in: # Good bots whitelisting: | |
brainly.in: #Other bots | |
brainly.in: #Neticle Crawler v1.0 ( http://bot.neticle.hu/ ) https://bot.neticle.hu/ - brand monitoring | |
brainly.in: #Mega https://megaindex.com/crawler - link indexer tool (supports directives in user-agent:*) | |
brainly.in: #Obot - IBM X-Force service | |
brainly.in: #SafeDNSBot (https://www.safedns.com/searchbot) | |
5acbd.com: # Robots.txt file from http://www.5acbd.com | |
5acbd.com: # All robots will spider the domain1 | |
youth.cn: # robots.txt for youth.cn | |
google.pt: # AdsBot | |
google.pt: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
thehindu.com: # Blocked until duplicate profile bylines fixed | |
benzinga.com: # | |
benzinga.com: # robots.txt | |
benzinga.com: # | |
benzinga.com: # This file is to prevent the crawling and indexing of certain parts | |
benzinga.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
benzinga.com: # and Google. By telling these "robots" where not to go on your site, | |
benzinga.com: # you save bandwidth and server resources. | |
benzinga.com: # | |
benzinga.com: # This file will be ignored unless it is at the root of your host: | |
benzinga.com: # Used: http://example.com/robots.txt | |
benzinga.com: # Ignored: http://example.com/site/robots.txt | |
benzinga.com: # | |
benzinga.com: # For more information about the robots.txt standard, see: | |
benzinga.com: # http://www.robotstxt.org/robotstxt.html | |
benzinga.com: # | |
benzinga.com: # For syntax checking, see: | |
benzinga.com: # http://www.frobee.com/robots-txt-check | |
benzinga.com: # Directories | |
benzinga.com: # Files | |
benzinga.com: # Paths (clean URLs) | |
benzinga.com: # Paths (no clean URLs) | |
google.at: # AdsBot | |
google.at: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
brainly.com: #Brainly Robots.txt 31.07.2017 | |
brainly.com: # Disallow Marketing bots | |
brainly.com: #Disallow exotic search engine crawlers | |
brainly.com: #Disallow other crawlers | |
brainly.com: # Good bots whitelisting: | |
brainly.com: #Other bots | |
brainly.com: #Neticle Crawler v1.0 ( http://bot.neticle.hu/ ) https://bot.neticle.hu/ - brand monitoring | |
brainly.com: #Mega https://megaindex.com/crawler - link indexer tool (supports directives in user-agent:*) | |
brainly.com: #Obot - IBM X-Force service | |
brainly.com: #SafeDNSBot (https://www.safedns.com/searchbot) | |
gettyimages.com: # AhrefsBot | |
kayak.com: #Build version: R555b | |
kayak.com: #Generated on: Wed Feb 24 01:00:01 EST 2021 | |
namecheap.com: # parameters | |
namecheap.com: # Sitemap link | |
cbsnews.com: # www.robotstxt.org/ | |
cbsnews.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
cbsnews.com: # PER CBS-N ENG FINAL ROUTES DOC | |
cfsbcn.com: # Robots For CFSBCN.CN | |
tarafdari.com: # | |
tarafdari.com: # robots.txt | |
tarafdari.com: # | |
tarafdari.com: # This file is to prevent the crawling and indexing of certain parts | |
tarafdari.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
tarafdari.com: # and Google. By telling these "robots" where not to go on your site, | |
tarafdari.com: # you save bandwidth and server resources. | |
tarafdari.com: # | |
tarafdari.com: # This file will be ignored unless it is at the root of your host: | |
tarafdari.com: # Used: http://example.com/robots.txt | |
tarafdari.com: # Ignored: http://example.com/site/robots.txt | |
tarafdari.com: # | |
tarafdari.com: # For more information about the robots.txt standard, see: | |
tarafdari.com: # http://www.robotstxt.org/robotstxt.html | |
tarafdari.com: # CSS, JS, Images | |
tarafdari.com: # Directories | |
tarafdari.com: # Files | |
tarafdari.com: # Paths (clean URLs) | |
tarafdari.com: # Paths (no clean URLs) | |
lotterypost.com: # robots.txt for https://www.lotterypost.com/ | |
52pojie.cn: # | |
52pojie.cn: # robots.txt for Discuz! X3.2 | |
52pojie.cn: # | |
usaa.com: # robots.txt - for USAA | |
usaa.com: # updated 2/22/2021 | |
usaa.com: # served from ns | |
dikaiologitika.gr: # If the Joomla site is installed within a folder | |
dikaiologitika.gr: # eg www.example.com/joomla/ then the robots.txt file | |
dikaiologitika.gr: # MUST be moved to the site root | |
dikaiologitika.gr: # eg www.example.com/robots.txt | |
dikaiologitika.gr: # AND the joomla folder name MUST be prefixed to all of the | |
dikaiologitika.gr: # paths. | |
dikaiologitika.gr: # eg the Disallow rule for the /administrator/ folder MUST | |
dikaiologitika.gr: # be changed to read | |
dikaiologitika.gr: # Disallow: /joomla/administrator/ | |
dikaiologitika.gr: # | |
dikaiologitika.gr: # For more information about the robots.txt standard, see: | |
dikaiologitika.gr: # https://www.robotstxt.org/orig.html | |
brainly.lat: #Brainly Robots.txt 31.07.2017 | |
brainly.lat: # Disallow Marketing bots | |
brainly.lat: #Disallow exotic search engine crawlers | |
brainly.lat: #Disallow other crawlers | |
brainly.lat: # Good bots whitelisting: | |
brainly.lat: #Other bots | |
brainly.lat: #Neticle Crawler v1.0 ( http://bot.neticle.hu/ ) https://bot.neticle.hu/ - brand monitoring | |
brainly.lat: #Mega https://megaindex.com/crawler - link indexer tool (supports directives in user-agent:*) | |
brainly.lat: #Obot - IBM X-Force service | |
brainly.lat: #SafeDNSBot (https://www.safedns.com/searchbot) | |
sap.com: # | |
sap.com: # Welcome to www.sap.com | |
sap.com: # | |
sap.com: # robots.txt for https://www.sap.com | |
sap.com: # | |
sap.com: # Version 2021-01-20 | |
sap.com: # | |
vitalsource.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
google.hu: # AdsBot | |
google.hu: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
sephora.com: ######################################################## | |
sephora.com: # | |
sephora.com: # Sephora.com Robots File | |
sephora.com: # | |
sephora.com: ######################################################## | |
mcafee.com: # directory exclusion used for mcafee.com | |
mcafee.com: # | |
mcafee.com: # | |
mcafee.com: ########################################################## | |
mcafee.com: #Consumer Sitemap Starts | |
mcafee.com: ########################################################## | |
mcafee.com: ########################################################## | |
mcafee.com: #Consumer Sitemap Ends | |
mcafee.com: ########################################################## | |
mcafee.com: ########################################################## | |
mcafee.com: #Enterprise Starts | |
mcafee.com: ########################################################## | |
mcafee.com: # | |
mcafee.com: # Disallow US expired files here (while waiting for regional links to the page to be removed) | |
mcafee.com: # Disallow: /us/path/file.ext | |
mcafee.com: # | |
mcafee.com: # | |
mcafee.com: # Disallow no_crawl folder | |
mcafee.com: # Disallow: /no_crawl/ | |
mcafee.com: ########################################################## | |
mcafee.com: #Consumer | |
mcafee.com: ########################################################## | |
mcafee.com: # 2020-05-31T11:52:52.760 | |
mcafee.com: ########################################################## | |
mcafee.com: # /Consumer | |
mcafee.com: ########################################################## | |
ebay.ca: ## BEGIN FILE ### | |
ebay.ca: # | |
ebay.ca: # allow-all | |
ebay.ca: # DR | |
ebay.ca: # | |
ebay.ca: # The use of robots or other automated means to access the eBay site | |
ebay.ca: # without the express permission of eBay is strictly prohibited. | |
ebay.ca: # Notwithstanding the foregoing, eBay may permit automated access to | |
ebay.ca: # access certain eBay pages but soley for the limited purpose of | |
ebay.ca: # including content in publicly available search engines. Any other | |
ebay.ca: # use of robots or failure to obey the robots exclusion standards set | |
ebay.ca: # forth at <https://www.robotstxt.org/orig.html> is strictly | |
ebay.ca: # prohibited. | |
ebay.ca: # | |
ebay.ca: # v10_ROW_Feb_2021 | |
ebay.ca: ### DIRECTIVES ### | |
ebay.ca: # SSRP Sitemaps | |
ebay.ca: # VIS Sitemaps | |
ebay.ca: # PRP Sitemaps | |
ebay.ca: ### END FILE ### | |
nbcsports.com: # | |
nbcsports.com: # robots.txt | |
nbcsports.com: # | |
nbcsports.com: # This file is to prevent the crawling and indexing of certain parts | |
nbcsports.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
nbcsports.com: # and Google. By telling these "robots" where not to go on your site, | |
nbcsports.com: # you save bandwidth and server resources. | |
nbcsports.com: # | |
nbcsports.com: # This file will be ignored unless it is at the root of your host: | |
nbcsports.com: # Used: http://example.com/robots.txt | |
nbcsports.com: # Ignored: http://example.com/site/robots.txt | |
nbcsports.com: # | |
nbcsports.com: # For more information about the robots.txt standard, see: | |
nbcsports.com: # http://www.robotstxt.org/robotstxt.html | |
nbcsports.com: # JS/CSS | |
nbcsports.com: # Directories | |
nbcsports.com: # Files | |
nbcsports.com: # Paths (clean URLs) | |
nbcsports.com: # Paths (no clean URLs) | |
nbcsports.com: # Sitemaps | |
cleartax.in: # www.robotstxt.org/ | |
cleartax.in: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
fool.com: # $Revision$ 8.22.19 | |
fool.com: # /robots.txt file for http://www.fool.com/ (prod) | |
fool.com: # Web Application Stress Tool | |
fool.com: # MauiBot | |
fool.com: # else | |
uca.fr: # technical URLs: | |
semrush.com: # Community rules | |
semrush.com: # Features new pages | |
semrush.com: #webinars | |
semrush.com: #landing | |
semrush.com: #academy | |
semrush.com: # Sitemap files | |
rtl-theme.com: # Google Image | |
stripchat.com: # generated automatically | |
iefimerida.gr: # | |
iefimerida.gr: # robots.txt | |
iefimerida.gr: # | |
iefimerida.gr: # This file is to prevent the crawling and indexing of certain parts | |
iefimerida.gr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
iefimerida.gr: # and Google. By telling these "robots" where not to go on your site, | |
iefimerida.gr: # you save bandwidth and server resources. | |
iefimerida.gr: # | |
iefimerida.gr: # This file will be ignored unless it is at the root of your host: | |
iefimerida.gr: # Used: http://example.com/robots.txt | |
iefimerida.gr: # Ignored: http://example.com/site/robots.txt | |
iefimerida.gr: # | |
iefimerida.gr: # For more information about the robots.txt standard, see: | |
iefimerida.gr: # http://www.robotstxt.org/robotstxt.html | |
iefimerida.gr: # CSS, JS, Images | |
iefimerida.gr: # Directories | |
iefimerida.gr: # Files | |
iefimerida.gr: # Paths (clean URLs) | |
iefimerida.gr: # Paths (no clean URLs) | |
ed.gov: # | |
ed.gov: # robots.txt | |
ed.gov: # | |
ed.gov: # This file is to prevent the crawling and indexing of certain parts | |
ed.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ed.gov: # and Google. By telling these "robots" where not to go on your site, | |
ed.gov: # you save bandwidth and server resources. | |
ed.gov: # | |
ed.gov: # This file will be ignored unless it is at the root of your host: | |
ed.gov: # Used: http://example.com/robots.txt | |
ed.gov: # Ignored: http://example.com/site/robots.txt | |
ed.gov: # | |
ed.gov: # For more information about the robots.txt standard, see: | |
ed.gov: # http://www.robotstxt.org/robotstxt.html | |
ed.gov: # CSS, JS, Images | |
ed.gov: # Directories | |
ed.gov: # Files | |
ed.gov: # Paths (clean URLs) | |
ed.gov: # Paths (no clean URLs) | |
ally.com: # robots.txt for http://www.ally.com | |
mobile.de: ###robots.txt www.mobile.de### | |
mobile.de: ###robots.txt END### | |
garmin.com: # Allow all agents to get all stuff | |
garmin.com: # ...except this stuff... | |
garmin.com: # pointless without POSTed form data: | |
garmin.com: # not for the general public: | |
xspdf.com: # Disallow: Sistrix | |
xspdf.com: # Disallow: Sistrix | |
xspdf.com: # Disallow: Sistrix | |
xspdf.com: # Disallow: SEOkicks-Robot | |
xspdf.com: # Disallow: jobs.de-Robot | |
xspdf.com: # Backlink Analysis | |
xspdf.com: # Bot of the Leipzig-based Unister Holding GmbH | |
xspdf.com: # http://www.opensiteexplorer.org/dotbot | |
xspdf.com: # http://www.searchmetrics.com | |
xspdf.com: # http://www.majestic12.co.uk/projects/dsearch/mj12bot.php | |
xspdf.com: # http://www.domaintools.com/webmasters/surveybot.php | |
xspdf.com: # http://www.seodiver.com/bot | |
xspdf.com: # http://openlinkprofiler.org/bot | |
xspdf.com: # http://www.wotbox.com/bot/ | |
xspdf.com: # http://www.meanpath.com/meanpathbot.html | |
xspdf.com: # http://www.backlinktest.com/crawler.html | |
xspdf.com: # http://www.brandwatch.com/magpie-crawler/ | |
xspdf.com: # http://filterdb.iss.net/crawler/ | |
xspdf.com: # http://webmeup-crawler.com | |
xspdf.com: # https://megaindex.com/crawler | |
xspdf.com: # http://www.cloudservermarket.com | |
xspdf.com: # http://www.trendiction.de/de/publisher/bot | |
xspdf.com: # http://www.exalead.com | |
xspdf.com: # http://www.career-x.de/bot.html | |
xspdf.com: # https://www.lipperhey.com/en/about/ | |
xspdf.com: # https://www.lipperhey.com/en/about/ | |
xspdf.com: # https://turnitin.com/robot/crawlerinfo.html | |
xspdf.com: # http://help.coccoc.com/ | |
xspdf.com: # ubermetrics-technologies.com | |
xspdf.com: # datenbutler.de | |
xspdf.com: # http://searchgears.de/uber-uns/crawling-faq.html | |
xspdf.com: # http://commoncrawl.org/faq/ | |
xspdf.com: # https://www.qwant.com/ | |
xspdf.com: # http://linkfluence.net/ | |
xspdf.com: # http://www.botje.com/plukkie.htm | |
xspdf.com: # https://www.safedns.com/searchbot | |
xspdf.com: # http://www.haosou.com/help/help_3_2.html | |
xspdf.com: # http://www.haosou.com/help/help_3_2.html | |
xspdf.com: # http://www.moz.com/dp/rogerbot | |
xspdf.com: # http://www.openhose.org/bot.html | |
xspdf.com: # http://www.screamingfrog.co.uk/seo-spider/ | |
xspdf.com: # http://thumbsniper.com | |
xspdf.com: # http://www.radian6.com/crawler | |
xspdf.com: # http://cliqz.com/company/cliqzbot | |
xspdf.com: # https://www.aihitdata.com/about | |
xspdf.com: # http://www.trendiction.com/en/publisher/bot | |
xspdf.com: # http://seocompany.store | |
xspdf.com: # https://github.com/yasserg/crawler4j/ | |
xspdf.com: # http://warebay.com/bot.html | |
xspdf.com: # http://www.website-datenbank.de/ | |
xspdf.com: # http://law.di.unimi.it/BUbiNG.html | |
xspdf.com: # http://www.linguee.com/bot; bot@linguee.com | |
xspdf.com: # https://www.semrush.com/bot/ | |
xspdf.com: # www.sentibot.eu | |
xspdf.com: # http://velen.io | |
xspdf.com: # https://moz.com/help/guides/moz-procedures/what-is-rogerbot | |
xspdf.com: # http://www.garlik.com | |
xspdf.com: # https://www.gosign.de/typo3-extension/typo3-sicherheitsmonitor/ | |
xspdf.com: # http://www.siteliner.com/bot | |
xspdf.com: # https://sabsim.com | |
xspdf.com: # http://ltx71.com/ | |
ft.com: # all use of FT content is subject to the Terms & Conditions and Copyright Policy set out on FT.com | |
groupon.com: # Hi there, | |
groupon.com: # Now that you're checking out our robots.txt file, and you clearly aren't a robot, you must be interested in Groupon's SEO. | |
groupon.com: # We just happen to be growing our SEO Team with experienced white-hat SEOs like yourself. So run - don't crawl - and fill out an application today. | |
groupon.com: # Visit https://jobs.groupon.com/search?keywords=seo | |
groupon.com: # GSM: https://www.groupon.com | |
groupon.com: # Jira SEO-11777 | |
24h.com.vn: #User-agent: * | |
24h.com.vn: #Disallow: / | |
alsbbora.info: # WebMatrix 1.0 | |
shutterfly.com: # Tells Scanning Robots Where They Are And Are Not Welcome | |
shutterfly.com: # | |
shutterfly.com: # User-agent: can also specify by name; "*" is for everyone | |
shutterfly.com: # Disallow: disallow if this matches first part of requested path | |
shutterfly.com: # | |
shutterfly.com: # Disable click for prints | |
shutterfly.com: # disable creation path crawling | |
shutterfly.com: # do not allow shares to be indexed | |
domaintools.com: # Notice: if you would like to crawl DomainTools you can | |
domaintools.com: # contact us here: https://www.domaintools.com/contact/ | |
domaintools.com: # to apply for white listing. | |
domaintools.com: # Moz | |
avg.com: #Nothing interesting to see here, but if you want free antivirus | |
avg.com: #click here: https://www.avg.com/free-antivirus-download | |
jumia.com.ng: # Public site | |
jumia.com.ng: # bot must follow this rules | |
jumia.com.ng: # Site scaping is permited IF the user-agent is clearly identify it as a bot and | |
jumia.com.ng: # the bot owner and is using less than 200 request per minute | |
jumia.com.ng: # Bot identification must have a owner url or contact if we need to contact them | |
jumia.com.ng: # Bots with fake user-agent will be blocked | |
jumia.com.ng: # Bots trying to use too many IPs to increase performance may also be blocked. | |
jumia.com.ng: # If you need more than 200 RPM, please contact the email techops at jumia com | |
jumia.com.ng: # | |
jumia.com.ng: # Sitemap files | |
jumia.com.ng: # multiple brand selectors | |
jumia.com.ng: # facets | |
jumia.com.ng: # site search | |
jumia.com.ng: # paths | |
jumia.com.ng: #Allow access to product specifications and ratings | |
jumia.com.ng: #Jumia global bot control | |
jumia.com.ng: #Bypass "--" Rule | |
jumia.com.ng: #Block Crawling of CSB pages | |
jumia.com.ng: #Block MLP folders | |
inc.com: #Disallow robots | |
inc.com: # Adsense | |
inc.com: # Blekko | |
inc.com: # CommonCrawl | |
pdf2go.com: # www.robotstxt.org/ | |
pdf2go.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
bseindia.com: # robots.txt for https://www.bseindia.com/ | |
ck12.org: ## Allow UGC 1.x FlexBooks | |
ck12.org: ## Disallow following patterns | |
ck12.org: # disallow really old image urls that no longer make sense | |
google.ae: # AdsBot | |
google.ae: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
atlassian.com: # Disallow individual pages | |
atlassian.com: # Sitemap for Blog | |
sagepub.com: # | |
sagepub.com: # robots.txt | |
sagepub.com: # | |
sagepub.com: # This file is to prevent the crawling and indexing of certain parts | |
sagepub.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
sagepub.com: # and Google. By telling these "robots" where not to go on your site, | |
sagepub.com: # you save bandwidth and server resources. | |
sagepub.com: # | |
sagepub.com: # This file will be ignored unless it is at the root of your host: | |
sagepub.com: # Used: http://example.com/robots.txt | |
sagepub.com: # Ignored: http://example.com/site/robots.txt | |
sagepub.com: # | |
sagepub.com: # For more information about the robots.txt standard, see: | |
sagepub.com: # http://www.robotstxt.org/robotstxt.html | |
sagepub.com: # CSS, JS, Images | |
sagepub.com: # Directories | |
sagepub.com: # Files | |
sagepub.com: # Paths (clean URLs) | |
sagepub.com: # Paths (no clean URLs) | |
sagepub.com: # | |
sagepub.com: # Huawei PetalBot causes site loading issues | |
kohls.com: # Modified 2/17/21 by Steve Walsh | |
kohls.com: # Modified 1/21/20 by Gwenn R. | |
kohls.com: # Modified 11/13/20 by Gwenn R for sustainable PDP test. | |
kohls.com: # Modified 5/13/20 by Gwenn Reinhart to keep pick-up pass pages out of the index | |
kohls.com: # Modified 4/2/20 by Gwenn Reinhart to keep stand-alone video pages out of the index | |
kohls.com: # Modified by Alissa Steingraber 10/21/19. Added s= back into the file because they were flooding the index | |
kohls.com: # Blocking for temporary truncated catalog URLs | |
kohls.com: # Disallow: /catalog/catalog.jsp | |
kohls.com: # Exclude all Print passes as of 11/14/16 | |
kohls.com: # Blocking bots from "tell a friend" email feature | |
kohls.com: # This page is a test that may not go live year-round | |
kohls.com: # Disallows as part of a test to see how it affects a similar page. | |
kohls.com: # Note that these target the URL path without including the query string; | |
kohls.com: # I couldn't get the Search Console tester to match the URL properly when | |
kohls.com: # I included the query string, which has something to do with the space | |
kohls.com: # in it. | |
kohls.com: # Added 1/9/17 via request by Sara Billmyer | |
kohls.com: # Attempting to keep these URLs de-indexed | |
kohls.com: # Second home page, part of personalization experiment | |
kohls.com: # These are beginning to show up in crawls | |
infusionsoft.com: # Tell Moz to take off. | |
mathxl.com: # Sosospider - China | |
mathxl.com: #Sosospider/2.0 - China | |
mathxl.com: #Login Pages | |
mathxl.com: # Global Product | |
samsclub.com: # robots.txt generated for samsclub.com | |
samsclub.com: #Paths | |
samsclub.com: #Files | |
samsclub.com: #Sitemap | |
adverdirect.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
adverdirect.com: #content{margin:0 0 0 2%;position:relative;} | |
cars.com: #Individual Review Crawl Control | |
cars.com: # DR | |
figma.com: # robots.txt | |
figma.com: # Handbook of Robotics, 56th Edition, 2058 A.D. | |
usmagazine.com: # Sitemap archive | |
nvidia.com: # Welcome to NVIDIA | |
nvidia.com: # We like people who read our code! | |
nvidia.com: # Cruise by our careers section while you're here | |
nvidia.com: # https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite | |
nvidia.com: # Or check out our YouTube channel for our latest | |
nvidia.com: # https://www.youtube.com/user/nvidia | |
nvidia.com: # Last updated 16th FEB 2021 | |
time.com: # exclude urls of format time.com/page/7456/?search | |
time.com: # NextAdvisor Sitemap | |
time.com: # Sitemap archive | |
time.com: # Video Sitemap archive | |
bandcamp.com: # the currency data endpoint is required to render pages | |
bandcamp.com: # pattern matching known to work only with Google and Yahoo | |
bandcamp.com: # badly-behaving bots | |
bandcamp.com: # unwanted bots | |
aloyoga.com: # we use Shopify as our ecommerce platform | |
aloyoga.com: # Google adsbot ignores robots.txt unless specifically named! | |
boohoo.com: # Pages | |
boohoo.com: # Product Filter # | |
boohoo.com: # Ordering & Product per page # | |
boohoo.com: # Number of product per page | |
boohoo.com: # Order By # | |
boohoo.com: # Price # | |
boohoo.com: # Faceted Navigation # | |
boohoo.com: # UK & ALL Search # | |
boohoo.com: # US Search # | |
boohoo.com: # AU Search # | |
boohoo.com: # IE Search # | |
boohoo.com: # FR Search # | |
boohoo.com: # Search # | |
boohoo.com: # Handle Execption for colour/size attribute use as internal link # | |
boohoo.com: # Additional Rules to handle exception # | |
boohoo.com: # Ensure no Static Ressources is blocked # | |
boohoo.com: # Crawl Delay - 5 URL max per second | |
correios.com.br: # Define access-restrictions for robots/spiders | |
correios.com.br: # http://www.robotstxt.org/wc/norobots.html | |
correios.com.br: # By default we allow robots to access all areas of our site | |
correios.com.br: # already accessible to anonymous users | |
correios.com.br: # Add Googlebot-specific syntax extension to exclude forms | |
correios.com.br: # that are repeated for each piece of content in the site | |
correios.com.br: # the wildcard is only supported by Googlebot | |
correios.com.br: # http://www.google.com/support/webmasters/bin/answer.py?answer=40367&ctx=sibling | |
tagesschau.de: # Robots Exclusions for www.tagesschau.de | |
tagesschau.de: # We want to block website downloaders ("suckers") | |
auspost.com.au: # auspost.com.au | |
overleaf.com: # robots.txt for https://www.sharelatex.com/ | |
sakshi.com: # | |
sakshi.com: # robots.txt | |
sakshi.com: # | |
sakshi.com: # This file is to prevent the crawling and indexing of certain parts | |
sakshi.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
sakshi.com: # and Google. By telling these "robots" where not to go on your site, | |
sakshi.com: # you save bandwidth and server resources. | |
sakshi.com: # | |
sakshi.com: # This file will be ignored unless it is at the root of your host: | |
sakshi.com: # Used: http://example.com/robots.txt | |
sakshi.com: # Ignored: http://example.com/site/robots.txt | |
sakshi.com: # | |
sakshi.com: # For more information about the robots.txt standard, see: | |
sakshi.com: # http://www.robotstxt.org/robotstxt.html | |
sakshi.com: # CSS, JS, Images | |
sakshi.com: # Directories | |
sakshi.com: # Files | |
sakshi.com: # Paths (clean URLs) | |
sakshi.com: # Paths (no clean URLs) | |
core.ac.uk: # robots.txt for CORE http://core.ac.uk and mirror sites. | |
core.ac.uk: # We allow access crawlers to access our site, but require unknown crawlers to crawl at a lower frequency. Should you need to crawl at a higher frequency please contact us. | |
core.ac.uk: # If you need to access or harvest our content, please consider using the CORE API: https://core.ac.uk/services#api instead of crawling the whole website | |
thehill.com: # | |
thehill.com: # robots.txt | |
thehill.com: # | |
thehill.com: # This file is to prevent the crawling and indexing of certain parts | |
thehill.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
thehill.com: # and Google. By telling these "robots" where not to go on your site, | |
thehill.com: # you save bandwidth and server resources. | |
thehill.com: # | |
thehill.com: # This file will be ignored unless it is at the root of your host: | |
thehill.com: # Used: http://example.com/robots.txt | |
thehill.com: # Ignored: http://example.com/site/robots.txt | |
thehill.com: # | |
thehill.com: # For more information about the robots.txt standard, see: | |
thehill.com: # http://www.robotstxt.org/robotstxt.html | |
thehill.com: # CSS, JS, Images | |
thehill.com: # Directories | |
thehill.com: # Files | |
thehill.com: # Paths (clean URLs) | |
thehill.com: # Paths (no clean URLs) | |
businessweekly.com.tw: # | |
businessweekly.com.tw: # robots.txt for http://www.businessweekly.com.tw/ | |
businessweekly.com.tw: # | |
businessweekly.com.tw: #PartialView不應直接被搜尋到(會造成Missing Title Tags的訊息) [PartialView should not be directly crawlable (it causes "Missing Title Tags" messages)] | |
shopstyle.com: # Production Robots.txt file | |
shopstyle.com: # Sitemap | |
shopstyle.com: # Baidu doesnt support Crawl-delay but added anyways in case they ever do | |
shopstyle.com: # Allowing checkout experience to be crawlable for google shopping, order doesnt matter it is based on https://developers.google.com/search/reference/robots_txt | |
topnaz.com: # All Bots | |
topnaz.com: # Sitemap | |
olx.com.pk: #Base Filters | |
olx.com.pk: #Cars Filters | |
olx.com.pk: #RE Filters | |
olx.com.pk: # Sitemap | |
olx.com.pk: # Generated on 2019-12-11T18:12:57.348Z | |
psychologytoday.com: # Resource Directories | |
psychologytoday.com: # Static Files | |
psychologytoday.com: # Static Drupal resources explicitly allowed | |
psychologytoday.com: # Drupal Paths | |
psychologytoday.com: #Disallow: /comment/ | |
psychologytoday.com: # Drupal Paths, wildcard prefix | |
psychologytoday.com: #Disallow: /*/comment/ | |
psychologytoday.com: # Drupal Paths, au prefix | |
psychologytoday.com: #Disallow: /au/comment/ | |
psychologytoday.com: # Drupal Paths, ca prefix | |
psychologytoday.com: #Disallow: /ca/comment/ | |
psychologytoday.com: # Drupal Paths, gb prefix | |
psychologytoday.com: #Disallow: /gb/comment/ | |
psychologytoday.com: # Drupal Paths, intl prefix | |
psychologytoday.com: #Disallow: /intl/comment/ | |
psychologytoday.com: # Drupal Paths, us prefix | |
psychologytoday.com: #Disallow: /us/comment/ | |
psychologytoday.com: # Paths (no unclean URLs) | |
francetvinfo.fr: # KIF-3995: (test) Allow 3 specific ESI | |
bankrate.com: # directed to all spiders | |
fnb.co.za: # robots.txt for www.fnb.co.za | |
prensalibre.com: # Sitemap archive | |
independent.co.uk: # Files | |
independent.co.uk: # Paths (clean URLs) | |
independent.co.uk: # Paths (no clean URLs) | |
independent.co.uk: # Ignore refresh URLs | |
autotrader.com: #Disallow: /car-dealers/client/ | |
autotrader.com: #Disallow: /car-payment-calculator | |
autotrader.com: #Disallow: /car-affordability-calculator | |
autotrader.com: #Disallow: /car-payment-calculator | |
autotrader.com: #Disallow: /car-affordability-calculator | |
anchor.fm: # www.robotstxt.org/ | |
argentina.gob.ar: # | |
argentina.gob.ar: # robots.txt | |
argentina.gob.ar: # | |
argentina.gob.ar: # This file is to prevent the crawling and indexing of certain parts | |
argentina.gob.ar: # of your site by web crawlers and spiders run by sites like Yahoo! | |
argentina.gob.ar: # and Google. By telling these "robots" where not to go on your site, | |
argentina.gob.ar: # you save bandwidth and server resources. | |
argentina.gob.ar: # | |
argentina.gob.ar: # This file will be ignored unless it is at the root of your host: | |
argentina.gob.ar: # Used: http://example.com/robots.txt | |
argentina.gob.ar: # Ignored: http://example.com/site/robots.txt | |
argentina.gob.ar: # | |
argentina.gob.ar: # For more information about the robots.txt standard, see: | |
argentina.gob.ar: # http://www.robotstxt.org/robotstxt.html | |
argentina.gob.ar: # CSS, JS, Images | |
argentina.gob.ar: # Directories | |
argentina.gob.ar: # Files | |
argentina.gob.ar: # Paths (clean URLs) | |
argentina.gob.ar: # Paths (no clean URLs) | |
programiz.com: # | |
programiz.com: # robots.txt | |
programiz.com: # | |
programiz.com: # This file is to prevent the crawling and indexing of certain parts | |
programiz.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
programiz.com: # and Google. By telling these "robots" where not to go on your site, | |
programiz.com: # you save bandwidth and server resources. | |
programiz.com: # | |
programiz.com: # This file will be ignored unless it is at the root of your host: | |
programiz.com: # Used: http://example.com/robots.txt | |
programiz.com: # Ignored: http://example.com/site/robots.txt | |
programiz.com: # | |
programiz.com: # For more information about the robots.txt standard, see: | |
programiz.com: # http://www.robotstxt.org/robotstxt.html | |
programiz.com: # CSS, JS, Images | |
programiz.com: # Directories | |
programiz.com: # Files | |
programiz.com: # Paths (clean URLs) | |
programiz.com: # Paths (no clean URLs) | |
programiz.com: # Disallow: /node | |
cornell.edu: # SiteImprove should ignore these page particularly because they aren't actually used, but are still linked for historical reasons | |
egnyte.com: # | |
egnyte.com: # robots.txt | |
egnyte.com: # | |
egnyte.com: # This file is to prevent the crawling and indexing of certain parts | |
egnyte.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
egnyte.com: # and Google. By telling these "robots" where not to go on your site, | |
egnyte.com: # you save bandwidth and server resources. | |
egnyte.com: # | |
egnyte.com: # This file will be ignored unless it is at the root of your host: | |
egnyte.com: # Used: http://example.com/robots.txt | |
egnyte.com: # Ignored: http://example.com/site/robots.txt | |
egnyte.com: # | |
egnyte.com: # For more information about the robots.txt standard, see: | |
egnyte.com: # http://www.robotstxt.org/robotstxt.html | |
egnyte.com: # CSS, JS, Images | |
egnyte.com: # Directories | |
egnyte.com: # Files | |
egnyte.com: # Paths (clean URLs) | |
egnyte.com: # Paths (no clean URLs) | |
cvent.com: # | |
cvent.com: # robots.txt for http://www.cvent.com/ | |
cvent.com: # | |
cvent.com: # $Id: robots.txt,v 1.00 2003/04/28 | |
cvent.com: # | |
cvent.com: # exclude all application areas | |
cvent.com: #event | |
cvent.com: #emarketing | |
cvent.com: #csn venue profiles | |
cvent.com: #destination guide | |
cvent.com: #microsites | |
cvent.com: #Destination Guide | |
sitesell.com: # Do not remove the Crawl-delay directive It is needed to prevent dos | |
sitesell.com: # conditions caused by certain robots, like msn/bing etc | |
purdue.edu: # | |
purdue.edu: # Discovery Park | |
purdue.edu: # | |
purdue.edu: # Updated by Jakob Knigga (jknigga) 9/21/2017 | |
purdue.edu: # | |
purdue.edu: # | |
purdue.edu: # Gradschool | |
purdue.edu: # | |
purdue.edu: # | |
purdue.edu: # HHS | |
purdue.edu: # | |
purdue.edu: # Updated by Lisa Stein 1/27/2017 - FP 764035 | |
purdue.edu: # | |
purdue.edu: # | |
purdue.edu: # Vet | |
purdue.edu: # | |
purdue.edu: # Updated by Osmar Lopez 5/29/2019 - FP 1114881 | |
purdue.edu: # | |
purdue.edu: # Updated by Wright Frazier 4/8/2020 - FP 1289068 | |
purdue.edu: # | |
purdue.edu: # Site Map | |
purdue.edu: # | |
almaany.com: # robots.txt for http://www.almaany.com/ | |
almaany.com: # disallow all | |
almaany.com: # but allow only important bots | |
twoo.com: # Allow Google AdSense crawler on most pages. | |
twoo.com: # By default, disallow all crawlers. | |
twoo.com: # Full url of latest sitemap. | |
vklass.se: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
vklass.se: #content{margin:0 0 0 2%;position:relative;} | |
lavanguardia.com: # Temas [Topics] | |
lavanguardia.com: # Paths a no indexar [Paths not to index] | |
lavanguardia.com: # Paginas LVD a no indexar [LVD pages not to index] | |
lavanguardia.com: # Extensiones de contenidos no indexables [Non-indexable content extensions] | |
lavanguardia.com: # Agentes nocivos conocidos [Known harmful agents] | |
podio.com: # | |
podio.com: # 1. A robot may not injure a human being or, through inaction, allow a | |
podio.com: # human being to come to harm. | |
podio.com: # | |
podio.com: # 2. A robot must obey orders given it by human beings except where such | |
podio.com: # orders would conflict with the First Law. | |
podio.com: # | |
podio.com: # 3. A robot must protect its own existence as long as such protection | |
podio.com: # does not conflict with the First or Second Law. | |
podio.com: # | |
podio.com: # Isaac Asimov, The Zeroth Law of Robotics | |
wunderground.com: # | |
wunderground.com: # /robots.txt | |
wunderground.com: # | |
wunderground.com: # | |
wunderground.com: # Last updated by VShrivastava 02/18/2020 | |
wunderground.com: # | |
wunderground.com: # Disallowed for PhantomJS | |
wunderground.com: # Crawl-delay: 10 | |
wunderground.com: # App paths | |
wunderground.com: # Directories | |
wunderground.com: # Files | |
wunderground.com: # Paths (clean URLs) | |
wunderground.com: # Disallow: /migration/ | |
wunderground.com: # Paths (no clean URLs) | |
colorado.edu: # | |
colorado.edu: # robots.txt | |
colorado.edu: # | |
colorado.edu: # This file is to prevent the crawling and indexing of certain parts | |
colorado.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
colorado.edu: # and Google. By telling these "robots" where not to go on your site, | |
colorado.edu: # you save bandwidth and server resources. | |
colorado.edu: # | |
colorado.edu: # This file will be ignored unless it is at the root of your host: | |
colorado.edu: # Used: http://example.com/robots.txt | |
colorado.edu: # Ignored: http://example.com/site/robots.txt | |
colorado.edu: # | |
colorado.edu: # For more information about the robots.txt standard, see: | |
colorado.edu: # http://www.robotstxt.org/wc/robots.html | |
colorado.edu: # | |
colorado.edu: # For syntax checking, see: | |
colorado.edu: # http://www.sxw.org.uk/computing/robots/check.html | |
colorado.edu: # Directories | |
colorado.edu: # Files | |
colorado.edu: # Paths (clean URLs) | |
colorado.edu: # Paths (no clean URLs) | |
colorado.edu: # CUSTOM | |
colorado.edu: # INC0331010 - 2017-03-02 | |
colorado.edu: # FIT-1785 - 06/06/2016 | |
colorado.edu: # EXP-3960 - 06/15/2016 | |
colorado.edu: # feature/1 | |
colorado.edu: # EXPRESS | |
ssa.gov: # www.ssa.gov robots.txt | |
ssa.gov: # 08/07/18 | |
ssa.gov: # 08/06/19 added second sitemap | |
ssa.gov: # 09/25/20 added 2019 contingency plan PDF + html | |
ssa.gov: # Eric Brown, Wayne Whitten | |
ssa.gov: # Disallow: /agency/shutdown/ | |
careerbuilder.com: # ====================== | |
careerbuilder.com: # Directories | |
careerbuilder.com: # ====================== | |
careerbuilder.com: # Paths (clean URLs) | |
careerbuilder.com: # ====================== | |
careerbuilder.com: # Disallow: /CSH/JobSkinDetails.aspx | |
careerbuilder.com: # Disallow: /csh/jobskindetails.aspx | |
careerbuilder.com: # Disallow: /CSH/Details.aspx | |
careerbuilder.com: # Disallow: /csh/details.aspx | |
careerbuilder.com: # Paths (no clean URLs) | |
careerbuilder.com: # ====================== | |
careerbuilder.com: # Paths (GRRP provided) | |
careerbuilder.com: # ====================== | |
careerbuilder.com: # | |
careerbuilder.com: # disallow signup pages | |
careerbuilder.com: # ====================== | |
careerbuilder.com: # | |
careerbuilder.com: # just for the Googlebot | |
careerbuilder.com: # ====================== | |
nsw.gov.au: # | |
nsw.gov.au: # robots.txt | |
nsw.gov.au: # | |
nsw.gov.au: # This file is to prevent the crawling and indexing of certain parts | |
nsw.gov.au: # of your site by web crawlers and spiders run by sites like Yahoo! | |
nsw.gov.au: # and Google. By telling these "robots" where not to go on your site, | |
nsw.gov.au: # you save bandwidth and server resources. | |
nsw.gov.au: # | |
nsw.gov.au: # This file will be ignored unless it is at the root of your host: | |
nsw.gov.au: # Used: http://example.com/robots.txt | |
nsw.gov.au: # Ignored: http://example.com/site/robots.txt | |
nsw.gov.au: # | |
nsw.gov.au: # For more information about the robots.txt standard, see: | |
nsw.gov.au: # http://www.robotstxt.org/robotstxt.html | |
nsw.gov.au: # CSS, JS, Images | |
nsw.gov.au: # Directories | |
nsw.gov.au: # Files | |
nsw.gov.au: # Paths (clean URLs) | |
nsw.gov.au: # Paths (no clean URLs) | |
mathworks.com: # robots.txt for http://www.mathworks.com and subdomains | |
mathworks.com: # Please do not update this file without contacting the owner | |
mathworks.com: # Owner: webops at mathworks.com | |
mathworks.com: # Note 1 for updating: Please keep list alphabetized by URL. | |
mathworks.com: # Note 2 for updating: When making an update, it needs to be updated for /de/, /fr/, /en/ sections as well. | |
mathworks.com: # /de/ below | |
mathworks.com: # /fr/ below | |
mathworks.com: # /en/ below | |
buyma.us: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
buyma.us: # | |
buyma.us: # To ban all spiders from the entire site uncomment the next two lines: | |
buyma.us: # site map | |
dstv.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
dstv.com: #content{margin:0 0 0 2%;position:relative;} | |
duniagames.co.id: # Allow all URLs (see http://www.robotstxt.org/robotstxt.html) | |
echo.msk.ru: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
echo.msk.ru: # | |
echo.msk.ru: # To ban all spiders from the entire site uncomment the next two lines: | |
echo.msk.ru: # User-agent: * | |
echo.msk.ru: # Disallow: / | |
y8.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
nextdirect.com: ##### 500s ##### | |
focus.de: # robots.txt for https://www.focus.de . | |
focus.de: # Gibt an, welche Unterverzeichnisse nicht durch Crawler durchsucht werden sollen [Specifies which subdirectories should not be searched by crawlers] | |
doodle.com: # Allow Twitterbot in order to read Twitter Cards | |
doodle.com: # Allow Google Mediabot for AdSense/AdX | |
kalerkantho.com: # Crawl kalerkantho.com, | |
ouest-france.fr: # | |
ouest-france.fr: # robots.txt | |
ouest-france.fr: # | |
ouest-france.fr: # This file is to prevent the crawling and indexing of certain parts | |
ouest-france.fr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ouest-france.fr: # and Google. By telling these "robots" where not to go on your site, | |
ouest-france.fr: # you save bandwidth and server resources. | |
ouest-france.fr: # | |
ouest-france.fr: # This file will be ignored unless it is at the root of your host: | |
ouest-france.fr: # Used: http://example.com/robots.txt | |
ouest-france.fr: # Ignored: http://example.com/site/robots.txt | |
ouest-france.fr: # | |
ouest-france.fr: # For more information about the robots.txt standard, see: | |
ouest-france.fr: # http://www.robotstxt.org/wc/robots.html | |
ouest-france.fr: # | |
ouest-france.fr: # For syntax checking, see: | |
ouest-france.fr: # http://www.sxw.org.uk/computing/robots/check.html | |
ouest-france.fr: # Allowed search engines directives | |
ouest-france.fr: #Sitemaps | |
ouest-france.fr: # Directories | |
ouest-france.fr: # Files | |
ouest-france.fr: # Paths (clean URLs) | |
ouest-france.fr: # Paths (no clean URLs) | |
ouest-france.fr: # Ouest-France galaad | |
ouest-france.fr: # Crawling limitation fixed for low priority bots | |
ouest-france.fr: # Directories | |
ouest-france.fr: # Files | |
ouest-france.fr: # Paths (clean URLs) | |
ouest-france.fr: # Paths (no clean URLs) | |
ouest-france.fr: # Ouest-France galaad | |
usertesting.com: # | |
usertesting.com: # robots.txt | |
usertesting.com: # | |
usertesting.com: # This file is to prevent the crawling and indexing of certain parts | |
usertesting.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
usertesting.com: # and Google. By telling these "robots" where not to go on your site, | |
usertesting.com: # you save bandwidth and server resources. | |
usertesting.com: # | |
usertesting.com: # This file will be ignored unless it is at the root of your host: | |
usertesting.com: # Used: http://example.com/robots.txt | |
usertesting.com: # Ignored: http://example.com/site/robots.txt | |
usertesting.com: # | |
usertesting.com: # For more information about the robots.txt standard, see: | |
usertesting.com: # http://www.robotstxt.org/robotstxt.html | |
usertesting.com: # CSS, JS, Images | |
usertesting.com: # Directories | |
usertesting.com: # Files | |
usertesting.com: # Paths (clean URLs) | |
usertesting.com: # Paths (no clean URLs) | |
clever.com: # Don't allow web crawlers to index Craft | |
klikbca.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
klikbca.com: #content{margin:0 0 0 2%;position:relative;} | |
kaiserpermanente.org: # Kaiser Permanente: robots.txt | |
kaiserpermanente.org: # sitemaps - English | |
kaiserpermanente.org: # region sitemaps - English | |
kaiserpermanente.org: # facility sitemaps - English | |
kaiserpermanente.org: # doctor sitemaps - English | |
kaiserpermanente.org: # sitemaps - Spanish | |
kaiserpermanente.org: # region sitemaps - Spanish | |
kaiserpermanente.org: # facility sitemaps - Spanish | |
kaiserpermanente.org: # doctor sitemaps - Spanish | |
spigen.com: # we use Shopify as our ecommerce platform | |
spigen.com: # Google adsbot ignores robots.txt unless specifically named! | |
subito.it: # It is expressively forbidden to use search robots or other automatic methods | |
subito.it: # to access Subito.it. Only if Subito.it has given such permission can be accepted. | |
whattomine.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
whattomine.com: # | |
whattomine.com: # To ban all spiders from the entire site uncomment the next two lines: | |
whattomine.com: # User-agent: * | |
whattomine.com: # Disallow: / | |
my-best.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
my-best.com: # | |
my-best.com: # To ban all spiders from the entire site uncomment the next two lines: | |
pbs.org: #Disallow: /. | |
lemonde.fr: # 16/08/2019 | |
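lemonde.fr: # Il est interdit d'utiliser des robots d'indexation Web ou d'autres méthodes automatiques de feuilletage ou de navigation sur ce site Web. [It is forbidden to use web indexing robots or other automated methods of browsing or navigating this website.] | |
lemonde.fr: # Nous interdisons de crawler notre site Web en utilisant un agent d'utilisateur volé qui ne correspond pas à votre identité. [We prohibit crawling our website with a stolen user agent that does not match your identity.] | |
lemonde.fr: # « Violation du droit du producteur de base de données - article L 342-1 et suivant le Code de la propriété intellectuelle ». ["Infringement of the database producer's right: Article L 342-1 et seq. of the Intellectual Property Code".] | |
lemonde.fr: # Nous vous invitons à nous contacter pour contracter une licence d'utilisation. Seuls les partenaires sont habilités à utiliser nos contenus pour un usage autre que strictement individuel. [Please contact us to obtain a usage license. Only partners are authorized to use our content for anything other than strictly individual use.] | |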
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # WordPress | |
lemonde.fr: # | |
lemonde.fr: # Sitemaps | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # Robots exclus de toute indexation. [Robots excluded from all indexing.] | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
lemonde.fr: # | |
shiksha.com: # Filename:robots.txt file for https://www.shiksha.com/ | |
abplive.com: #Sitemaps | |
programme-tv.net: # robots.txt file for Télé Loisirs | |
programme-tv.net: # desktop | |
programme-tv.net: # https://www.robotstxt.org/ | |
elespanol.com: # Agentes no deseados conocidos [Known unwanted agents] User-agent: ia_archiver | |
telegraph.co.uk: # Robots.txt file | |
telegraph.co.uk: # All robots will spider the domain | |
cettire.com: # we use Shopify as our ecommerce platform | |
cettire.com: # Google adsbot ignores robots.txt unless specifically named! | |
feishu.cn: # robots.txt file from https://www.feishu.cn/ | |
feishu.cn: # All robots will spider the domain | |
google.se: # AdsBot | |
google.se: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
paysafecard.com: #typo3 Disallow | |
paysafecard.com: #assets Disallow | |
paysafecard.com: #Module Disallow | |
c-sharpcorner.com: #Disallow: / | |
c-sharpcorner.com: # User-Agent: Mediapartners-Google | |
c-sharpcorner.com: # User-Agent: Googlebot | |
c-sharpcorner.com: # User-Agent: Adsbot-Google | |
c-sharpcorner.com: # User-Agent: Googlebot-Image | |
post.ir: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
post.ir: #content{margin:0 0 0 2%;position:relative;} | |
snapdeal.com: # robots.txt for https://www.snapdeal.com/ | |
enstage-sas.com: #small_pop_up .all_content_holder { | |
enstage-sas.com: #small_pop_up .all_content_holder #close_button{ | |
enstage-sas.com: #small_pop_up .all_content_holder .left_side { | |
enstage-sas.com: #small_pop_up .all_content_holder .right_side { | |
enstage-sas.com: #small_pop_up .all_content_holder .right_side p:nth-child(1) { | |
enstage-sas.com: #small_pop_up .all_content_holder .right_side p:nth-child(2) img { | |
texas.gov: #faq-price { | |
texas.gov: #gsc-i-id1::-moz-placeholder { | |
texas.gov: #gsc-i-id1::-webkit-input-placeholder { | |
texas.gov: #gsc-i-id1:-ms-input-placeholder { | |
iteye.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
harveynichols.com: # Sales comms | |
harveynichols.com: # Account | |
harveynichols.com: # Checkout | |
harveynichols.com: # Product Listing Pages & ajax calls | |
harveynichols.com: # Website Utilities (ajax calls etc.) | |
harveynichols.com: # Misc | |
harveynichols.com: # Articles | |
harveynichols.com: # Tracking | |
harveynichols.com: # Facets | |
harveynichols.com: # Brand subcats | |
harveynichols.com: # Global-e | |
google.co.il: # AdsBot | |
google.co.il: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
history.com: # Tempest - history | |
masterclass.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
masterclass.com: # | |
masterclass.com: # To ban all spiders from the entire site uncomment the next two lines: | |
edx.org: # | |
edx.org: # robots.txt | |
edx.org: # | |
edx.org: # This file is to prevent the crawling and indexing of certain parts | |
edx.org: # of your site by web crawlers and spiders run by sites like Yahoo! | |
edx.org: # and Google. By telling these "robots" where not to go on your site, | |
edx.org: # you save bandwidth and server resources. | |
edx.org: # | |
edx.org: # This file will be ignored unless it is at the root of your host: | |
edx.org: # Used: http://example.com/robots.txt | |
edx.org: # Ignored: http://example.com/site/robots.txt | |
edx.org: # | |
edx.org: # For more information about the robots.txt standard, see: | |
edx.org: # http://www.robotstxt.org/robotstxt.html | |
edx.org: # CSS, JS, Images | |
edx.org: # Directories | |
edx.org: # Files | |
edx.org: # Paths (clean URLs) | |
edx.org: # Allowed Spanish Paths (clean URLs) | |
edx.org: # Disallowed Spanish Paths (all others) | |
edx.org: # Paths (no clean URLs) | |
edx.org: # Sitemaps | |
express.pk: # robots.txt generated at http://www.mcanerin.com | |
westpac.com.au: # robots.txt generated for www.westpac.com.au | |
mysql.com: ## ROBOTS.TXT - http://www.robotstxt.org/ ## | |
rooziato.com: # | |
rooziato.com: # 15 DEC 2020 | |
rooziato.com: # Author: M.R | |
rooziato.com: # | |
sportskeeda.com: # allow adsense bot to parse no-index content | |
sportskeeda.com: # disallow folders | |
tripadvisor.in: # Hi there, | |
tripadvisor.in: # | |
tripadvisor.in: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself. | |
tripadvisor.in: # | |
tripadvisor.in: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet? | |
tripadvisor.in: # | |
tripadvisor.in: # Run - don't crawl - to apply to join TripAdvisor's elite SEO team | |
tripadvisor.in: # | |
tripadvisor.in: # Email seoRockstar@tripadvisor.com | |
tripadvisor.in: # | |
tripadvisor.in: # Or visit https://careers.tripadvisor.com/search-results?keywords=seo | |
tripadvisor.in: # | |
tripadvisor.in: # | |
inbox.lv: # www.robotstxt.org/ | |
inbox.lv: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
euronews.com: # www.robotstxt.org/ | |
euronews.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
euronews.com: # weather | |
ipsosinteractive.com: # | |
ipsosinteractive.com: # robots.txt | |
ipsosinteractive.com: # | |
ipsosinteractive.com: # This file is to prevent the crawling and indexing of certain parts | |
ipsosinteractive.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ipsosinteractive.com: # and Google. By telling these "robots" where not to go on your site, | |
ipsosinteractive.com: # you save bandwidth and server resources. | |
ipsosinteractive.com: # | |
ipsosinteractive.com: # This file will be ignored unless it is at the root of your host: | |
ipsosinteractive.com: # Used: http://example.com/robots.txt | |
ipsosinteractive.com: # Ignored: http://example.com/site/robots.txt | |
ipsosinteractive.com: # | |
ipsosinteractive.com: # For more information about the robots.txt standard, see: | |
ipsosinteractive.com: # http://www.robotstxt.org/robotstxt.html | |
ipsosinteractive.com: # Directories | |
ipsosinteractive.com: # Files | |
ipsosinteractive.com: # Paths (clean URLs) | |
ipsosinteractive.com: # Paths (no clean URLs) | |
ipsosinteractive.com: # Specific files Paths | |
collegedunia.com: #Disallow: /*?ajax=1 | |
collegedunia.com: #URL parameters blocking for SEO | |
yandex.com: # yandex.com | |
virgool.io: # Block MegaIndex.ru | |
virgool.io: # Block YandexBot | |
virgool.io: # Block Baidu | |
virgool.io: # Block Youdao | |
virgool.io: # Block Majestic | |
tagged.com: ######################################################################### | |
tagged.com: # /robots.txt file for http://www.tagged.com/ | |
tagged.com: # mail webmaster@tagged.com for constructive criticism | |
tagged.com: ######################################################################### | |
tagged.com: # Any others | |
docker.com: # | |
docker.com: # robots.txt | |
docker.com: # | |
docker.com: # This file is to prevent the crawling and indexing of certain parts | |
docker.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
docker.com: # and Google. By telling these "robots" where not to go on your site, | |
docker.com: # you save bandwidth and server resources. | |
docker.com: # | |
docker.com: # This file will be ignored unless it is at the root of your host: | |
docker.com: # Used: http://example.com/robots.txt | |
docker.com: # Ignored: http://example.com/site/robots.txt | |
docker.com: # | |
docker.com: # For more information about the robots.txt standard, see: | |
docker.com: # http://www.robotstxt.org/robotstxt.html | |
docker.com: # CSS, JS, Images | |
docker.com: # Directories | |
docker.com: # Files | |
docker.com: # Paths (clean URLs) | |
docker.com: # Paths (no clean URLs) | |
irecommend.ru: # | |
irecommend.ru: # robots.txt | |
irecommend.ru: # | |
irecommend.ru: # This file is to prevent the crawling and indexing of certain parts | |
irecommend.ru: # of your site by web crawlers and spiders run by sites like Yahoo! | |
irecommend.ru: # and Google. By telling these "robots" where not to go on your site, | |
irecommend.ru: # you save bandwidth and server resources. | |
irecommend.ru: # | |
irecommend.ru: # This file will be ignored unless it is at the root of your host: | |
irecommend.ru: # Used: http://example.com/robots.txt | |
irecommend.ru: # Ignored: http://example.com/site/robots.txt | |
irecommend.ru: # | |
irecommend.ru: # For more information about the robots.txt standard, see: | |
irecommend.ru: # http://www.robotstxt.org/wc/robots.html | |
irecommend.ru: # | |
irecommend.ru: # For syntax checking, see: | |
irecommend.ru: # http://www.sxw.org.uk/computing/robots/check.html | |
irecommend.ru: # Directories | |
irecommend.ru: # Files | |
irecommend.ru: # Paths (clean URLs) | |
irecommend.ru: # Paths (no clean URLs) | |
irecommend.ru: # Social auth | |
irecommend.ru: #misc | |
tasnimnews.com: #Sitemap: https://www.tasnimnews.com/sitemap-latest.xml | |
tasnimnews.com: #Sitemap: https://www.tasnimnews.com/sitemap-tags.xml | |
tasnimnews.com: # Sitemap Archive | |
tasnimnews.com: #Sitemap: https://www.tasnimnews.com/fa/sitemaps/archive/index.xml | |
tasnimnews.com: #Sitemap: https://www.tasnimnews.com/en/sitemaps/archive/index.xml | |
tasnimnews.com: #Sitemap: https://www.tasnimnews.com/ar/sitemaps/archive/index.xml | |
tasnimnews.com: #Sitemap: https://www.tasnimnews.com/tr/sitemaps/archive/index.xml | |
tasnimnews.com: #Sitemap: https://www.tasnimnews.com/ur/sitemaps/archive/index.xml | |
duomai.com: # https://www.robotstxt.org/robotstxt.html | |
myus.com: # allow all crawlers | |
myus.com: # images | |
myus.com: # search | |
myus.com: # country page tabs | |
myus.com: # 10-23-2017 Update | |
myus.com: # Meeting On 6/7 | |
myus.com: # Meeting On 6/21 | |
myus.com: # member reviews by country | |
myus.com: # Meeting on 3/29 | |
myus.com: # blog and news paging | |
myus.com: # 6524 | |
myus.com: # Landing pages | |
myus.com: # allow - not fully supported, add entries to sitemap.xml | |
myus.com: # block the rest | |
myus.com: # 7230 | |
myus.com: # block all country-specific landing pages | |
myus.com: # 7229 | |
myus.com: # 3-15-2016 Meeting | |
myus.com: # sitemap - Supported by Google, Ask, Bing, Yahoo; defined on sitemaps.org | |
myus.com: # 8235 | |
myus.com: # 2957 | |
myus.com: # banners | |
myus.com: #Ajax requests | |
myus.com: # AddSearchBot | |
dlsite.com: # noindexを通知するためにクロールを許可する [Allow crawling so that noindex can be communicated] | |
aarp.org: # _____ _____ | |
aarp.org: # /\ /\ | __ \ | __ \ | |
aarp.org: # / \ / \ | |__) | | |__) | | |
aarp.org: # / /\ \ / /\ \ | _ / | ___/ | |
aarp.org: # / ____ \ / ____ \ | | \ \ | | | |
aarp.org: # /_/ \_\ /_/ \_\ |_| \_\ |_| | |
aarp.org: # | |
aarp.org: # Robots.txt file created by https://www.aarp.org/ | |
aarp.org: # For domain: https://www.aarp.org/ | |
aarp.org: # Created 09-12-2017 Raymond Deschenes - Updated 1-28-2020 site search relocation | |
aarp.org: # All robots will spider the domain | |
zappos.com: # Global robots.txt updated 2020-04-02 | |
rackspace.com: # | |
rackspace.com: # robots.txt | |
rackspace.com: # | |
rackspace.com: # This file is to prevent the crawling and indexing of certain parts | |
rackspace.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
rackspace.com: # and Google. By telling these "robots" where not to go on your site, | |
rackspace.com: # you save bandwidth and server resources. | |
rackspace.com: # | |
rackspace.com: # This file will be ignored unless it is at the root of your host: | |
rackspace.com: # Used: http://example.com/robots.txt | |
rackspace.com: # Ignored: http://example.com/site/robots.txt | |
rackspace.com: # | |
rackspace.com: # For more information about the robots.txt standard, see: | |
rackspace.com: # http://www.robotstxt.org/robotstxt.html | |
rackspace.com: # CSS, JS, Images | |
rackspace.com: # Directories | |
rackspace.com: # Files | |
rackspace.com: # Paths (clean URLs) | |
rackspace.com: # Paths (no clean URLs) | |
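The rackspace.com comments above state that crawlers only consult robots.txt at the root of the host, so `http://example.com/site/robots.txt` is ignored. Here is a minimal Python sketch of that rule. It assumes the host root is just the scheme plus the network location (host and optional port) of a page URL. The function name `robots_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the single robots.txt location a crawler consults for page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (plus any port); replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://example.com/site/robots.txt"))
```

Whatever page is requested, the lookup collapses to one file per scheme and host. A file placed under `/site/` is never requested.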
brainly.com.br: #Brainly Robots.txt 31.07.2017 | |
brainly.com.br: # Disallow Marketing bots | |
brainly.com.br: #Disallow exotic search engine crawlers | |
brainly.com.br: #Disallow other crawlers | |
brainly.com.br: # Good bots whitelisting: | |
brainly.com.br: #Other bots | |
brainly.com.br: #Neticle Crawler v1.0 ( http://bot.neticle.hu/ ) https://bot.neticle.hu/ - brand monitoring | |
brainly.com.br: #Mega https://megaindex.com/crawler - link indexer tool (supports directives in user-agent:*) | |
brainly.com.br: #Obot - IBM X-Force service | |
brainly.com.br: #SafeDNSBot (https://www.safedns.com/searchbot) | |
brainly.co.id: #Brainly Robots.txt 31.07.2017 | |
brainly.co.id: # Disallow Marketing bots | |
brainly.co.id: #Disallow exotic search engine crawlers | |
brainly.co.id: #Disallow other crawlers | |
brainly.co.id: # Good bots whitelisting: | |
brainly.co.id: #Other bots | |
brainly.co.id: #Neticle Crawler v1.0 ( http://bot.neticle.hu/ ) https://bot.neticle.hu/ - brand monitoring | |
brainly.co.id: #Mega https://megaindex.com/crawler - link indexer tool (supports directives in user-agent:*) | |
brainly.co.id: #Obot - IBM X-Force service | |
brainly.co.id: #SafeDNSBot (https://www.safedns.com/searchbot) | |
cna.com.tw: # User-agent: ia_archiver | |
cna.com.tw: # Disallow: /MakerList/Index?* | |
cna.com.tw: # Disallow: /MakerContent/Index?* | |
cna.com.tw: # Disallow: /VideoList/Index?* | |
cna.com.tw: # Disallow: /VideoContent/Index?* | |
ig.com: #Site contents Copyright IG Group | |
ameli.fr: # | |
ameli.fr: # robots.txt | |
ameli.fr: # | |
ameli.fr: # This file is to prevent the crawling and indexing of certain parts | |
ameli.fr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ameli.fr: # and Google. By telling these "robots" where not to go on your site, | |
ameli.fr: # you save bandwidth and server resources. | |
ameli.fr: # | |
ameli.fr: # This file will be ignored unless it is at the root of your host: | |
ameli.fr: # Used: http://example.com/robots.txt | |
ameli.fr: # Ignored: http://example.com/site/robots.txt | |
ameli.fr: # | |
ameli.fr: # For more information about the robots.txt standard, see: | |
ameli.fr: # http://www.robotstxt.org/robotstxt.html | |
ameli.fr: # CSS, JS, Images | |
ameli.fr: # Directories | |
ameli.fr: # Files | |
ameli.fr: # Paths (clean URLs) | |
ameli.fr: # Paths (no clean URLs) | |
venturebeat.com: # This file was generated on Wed, 24 Feb 2021 19:10:02 +0000 | |
venturebeat.com: # Sitemap archive | |
iheart.com: # Production | |
olx.in: #General Filters | |
olx.in: #RE Filters | |
olx.in: #Expired Ads | |
olx.in: # Generated on 2020-03-11T09:58:35.850Z | |
commsec.com.au: # /robots.txt file for https://www.commsec.com.au/ | |
domestika.org: # Faceted/Sorting navigation | |
domestika.org: # Disallow: *area=* | |
domestika.org: # Disallow: *sorting=* | |
domestika.org: # Disallow: *date=* | |
domestika.org: # Disallow: *status=* | |
domestika.org: # Disallow: /auth | |
domestika.org: # Disallow: */search | |
domestika.org: # Virtual URLs - Custom tracking | |
calculatorsoup.com: # | |
calculatorsoup.com: # | |
calculatorsoup.com: # applies to all robots disallow | |
calculatorsoup.com: # 2019-02-22 remove | |
calculatorsoup.com: # Disallow: /search.php | |
calculatorsoup.com: # block Mediapartners from search.php 2017-03-12 because they try many search queries | |
calculatorsoup.com: # 2019-02-22 remove | |
calculatorsoup.com: # User-agent: Mediapartners-Google | |
calculatorsoup.com: # Allow: / | |
calculatorsoup.com: # Disallow: /search.php | |
calculatorsoup.com: # do not believe this is respected | |
calculatorsoup.com: # From Wiki | |
calculatorsoup.com: # Crawlers that are kind enough to obey, but which we'd rather not have | |
calculatorsoup.com: # unless they're feeding search engines. | |
calculatorsoup.com: # Some bots are known to be trouble, particularly those designed to copy | |
calculatorsoup.com: # entire sites. Please obey robots.txt. | |
calculatorsoup.com: # | |
calculatorsoup.com: # Sorry, wget in its recursive mode is a frequent problem. | |
calculatorsoup.com: # Please read the man page and use it properly; there is a | |
calculatorsoup.com: # --wait option you can use to set the delay between hits, | |
calculatorsoup.com: # for instance. | |
calculatorsoup.com: # | |
calculatorsoup.com: # | |
calculatorsoup.com: # The 'grub' distributed client has been *very* poorly behaved. | |
calculatorsoup.com: # | |
calculatorsoup.com: # | |
calculatorsoup.com: # Doesn't follow robots.txt anyway, but... | |
calculatorsoup.com: # | |
calculatorsoup.com: # | |
calculatorsoup.com: # Hits many times per second, not acceptable | |
calculatorsoup.com: # http://www.nameprotect.com/botinfo.html | |
calculatorsoup.com: # A capture bot, downloads gazillions of pages with no public benefit | |
calculatorsoup.com: # http://www.webreaper.net/ | |
trilltrill.jp: # robotstxt.org/ | |
linguee.fr: # In ANY CASE, you are NOT ALLOWED to train Machine Translation Systems | |
linguee.fr: # on data crawled on Linguee. | |
linguee.fr: # | |
linguee.fr: # Linguee contains fake entries - changes in the wording of sentences, | |
linguee.fr: # complete fake entries. | |
linguee.fr: # These entries can be used to identify even small parts of our material | |
linguee.fr: # if you try to copy it without our permission. | |
linguee.fr: # Machine Translation systems trained on these data will learn these errors | |
linguee.fr: # and can be identified easily. We will take all legal measures against anyone | |
linguee.fr: # training Machine Translation systems on data crawled from this website. | |
trademe.co.nz: #Classic | |
trademe.co.nz: #Allow PI | |
trademe.co.nz: #Allow address | |
trademe.co.nz: #Disallow Map | |
trademe.co.nz: #Disallow Classic non Category search | |
trademe.co.nz: #FrEnd | |
trademe.co.nz: #Allow FrEnd resources | |
trademe.co.nz: #Allow new car | |
trademe.co.nz: #CMS Content | |
trademe.co.nz: #Property | |
trademe.co.nz: #Motors | |
trademe.co.nz: #Jobs | |
trademe.co.nz: # specific bot behaviour | |
lg.com: # LGEUS-744, LGEUS-1201 | |
lg.com: # LGEUS-744, LGEUS-1201 | |
lg.com: # Sitemap files | |
slideserve.com: #Baiduspider | |
nearpod.com: # Temporary | |
gazzettadelsud.it: #Disallow: /articoli/ajax/ | |
google.sk: # AdsBot | |
google.sk: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
mirror.co.uk: #Agent Specific Disallowed Sections | |
corriere.it: # old "ultima ora" (breaking news) | |
corriere.it: #Dizionario della Salute | |
corriere.it: # item | |
corriere.it: # Disallow: /cronache/10_marzo_01/La-rete-del-senatore-in-banca-cronache_9f6dee1a-2502-11df-98c5-00144f02aabe.shtml | |
corriere.it: #requested by Ruggiero BG27112011 | |
corriere.it: #CORRIERE-452 2018-10-08 | |
docusign.com: # | |
docusign.com: # robots.txt | |
docusign.com: # | |
docusign.com: # This file is to prevent the crawling and indexing of certain parts | |
docusign.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
docusign.com: # and Google. By telling these "robots" where not to go on your site, | |
docusign.com: # you save bandwidth and server resources. | |
docusign.com: # | |
docusign.com: # This file will be ignored unless it is at the root of your host: | |
docusign.com: # Used: http://example.com/robots.txt | |
docusign.com: # Ignored: http://example.com/site/robots.txt | |
docusign.com: # | |
docusign.com: # For more information about the robots.txt standard, see: | |
docusign.com: # http://www.robotstxt.org/robotstxt.html | |
docusign.com: # Crawl-delay: 10 | |
docusign.com: # CSS, JS, Images | |
docusign.com: # Directories | |
docusign.com: # Files | |
docusign.com: # Paths (clean URLs) | |
docusign.com: # Paths (no clean URLs) | |
docusign.com: # Files | |
docusign.com: # Paths | |
docusign.com: # Sitemaps | |
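docusign.com keeps a commented-out `Crawl-delay: 10`. Crawl-delay is a non-standard extension. RFC 9309 does not define it, and Googlebot ignores it. Python's standard `urllib.robotparser` still exposes it through `RobotFileParser.crawl_delay()`. The sketch below reads that value and falls back to a default. The helper `polite_delay` and its one-second default are assumptions for illustration.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser


def polite_delay(robots_lines: Iterable[str], user_agent: str,
                 default_seconds: float = 1.0) -> float:
    """Seconds to wait between requests for user_agent (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse() also marks the rules as freshly loaded
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    return float(delay) if delay is not None else default_seconds


# Uncommented, the line asks for 10 seconds between hits.
print(polite_delay(["User-agent: *", "Crawl-delay: 10"], "ExampleBot"))
# Commented out, as docusign.com has it, the line is ignored and the default applies.
print(polite_delay(["User-agent: *", "# Crawl-delay: 10"], "ExampleBot"))
```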
google.com.ph: # AdsBot | |
google.com.ph: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
leam.com: # Crawlers Setup | |
leam.com: # Files | |
leam.com: # Paths (no clean URLs) | |
leam.com: #Disallow: /*manufacturer= | |
leam.com: #Disallow: /*color= | |
cineulagam.com: # Disallow: /*? This matches ? anywhere in the URL | |
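The cineulagam.com note relies on the `*` wildcard. Other entries also use a trailing `$` to anchor the end of a path. RFC 9309 defines both special characters, and Google documents them. Below is a minimal sketch of matching such a pattern against a URL path by translating it into a regular expression. The function names are hypothetical. The sketch ignores percent-encoding normalization and Allow/Disallow precedence.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Every other character is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start only: rules are prefix matches by default.
    return pattern_to_regex(pattern).match(path) is not None


print(rule_matches("/*?", "/news/item?id=3"))  # True: '?' appears after '/'
print(rule_matches("/*?", "/news/item"))       # False: no '?' in the path
```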
laposte.fr: # www.laposte.fr | |
laposte.fr: # boutique.laposte.fr | |
laposte.fr: # pro.boutique.laposte.fr | |
tessabit.com: # Directories | |
tessabit.com: # Disallow: /media/ // Allow this folder for google product caching | |
tessabit.com: #Disallow: /media/catalog/product/cache/ | |
tessabit.com: # Paths (clean URLs) | |
tessabit.com: # Paths (no clean URLs) | |
hypebeast.com: # www.robotstxt.org/ | |
hypebeast.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
java.com: # /robots.txt for java.com | |
utoronto.ca: # | |
utoronto.ca: # robots.txt | |
utoronto.ca: # | |
utoronto.ca: # This file is to prevent the crawling and indexing of certain parts | |
utoronto.ca: # of your site by web crawlers and spiders run by sites like Yahoo! | |
utoronto.ca: # and Google. By telling these "robots" where not to go on your site, | |
utoronto.ca: # you save bandwidth and server resources. | |
utoronto.ca: # | |
utoronto.ca: # This file will be ignored unless it is at the root of your host: | |
utoronto.ca: # Used: http://example.com/robots.txt | |
utoronto.ca: # Ignored: http://example.com/site/robots.txt | |
utoronto.ca: # | |
utoronto.ca: # For more information about the robots.txt standard, see: | |
utoronto.ca: # http://www.robotstxt.org/robotstxt.html | |
utoronto.ca: # CSS, JS, Images | |
utoronto.ca: # Directories | |
utoronto.ca: # Files | |
utoronto.ca: # Paths (clean URLs) | |
utoronto.ca: # Paths (no clean URLs) | |
pagesjaunes.fr: #Vintage | |
hinative.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
hinative.com: # | |
hinative.com: # To ban all spiders from the entire site uncomment the next two lines: | |
hinative.com: # User-agent: * | |
hinative.com: # Disallow: / | |
google.kz: # AdsBot | |
google.kz: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
pronews.gr: # | |
pronews.gr: # robots.txt | |
pronews.gr: # | |
pronews.gr: # This file is to prevent the crawling and indexing of certain parts | |
pronews.gr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
pronews.gr: # and Google. By telling these "robots" where not to go on your site, | |
pronews.gr: # you save bandwidth and server resources. | |
pronews.gr: # | |
pronews.gr: # This file will be ignored unless it is at the root of your host: | |
pronews.gr: # Used: http://example.com/robots.txt | |
pronews.gr: # Ignored: http://example.com/site/robots.txt | |
pronews.gr: # | |
pronews.gr: # For more information about the robots.txt standard, see: | |
pronews.gr: # http://www.robotstxt.org/robotstxt.html | |
pronews.gr: # CSS, JS, Images | |
pronews.gr: # Directories | |
pronews.gr: # Files | |
pronews.gr: # Paths (clean URLs) | |
pronews.gr: # Paths (no clean URLs) | |
worthpoint.com: # always allow adsense | |
worthpoint.com: # All robots Block | |
worthpoint.com: # bot-specific | |
worthpoint.com: # Silly human, robots.txts are for robots | |
sejda.com: # Don't index API | |
ieee.org: #IEEE.org Robots Exclusion Rules - Updated October 29, 2018 | |
ieee.org: #Sitemap:https://www.ieee.org/.sitemap.xml | |
zamzar.com: # ___ __ _ _ | |
zamzar.com: # / __\ __ ___ ___ _ _ ___ _ _ _ __ / _(_) | ___ ___ | |
zamzar.com: # / _\| '__/ _ \/ _ \ | | | |/ _ \| | | | '__| | |_| | |/ _ \/ __| | |
zamzar.com: # / / | | | __/ __/ | |_| | (_) | |_| | | | _| | | __/\__ \ | |
zamzar.com: # \/ |_| \___|\___| \__, |\___/ \__,_|_| |_| |_|_|\___||___/ | |
zamzar.com: # |___/ | |
france24.com: # France Medias Monde [2019-10-30] - francemediasmonde.com | |
france24.com: ## FRANCE 24 - france24.com | |
france24.com: ### Sitemaps | |
france24.com: ### Sitemaps News | |
gib.gov.tr: # | |
gib.gov.tr: # robots.txt | |
gib.gov.tr: # | |
gib.gov.tr: # This file is to prevent the crawling and indexing of certain parts | |
gib.gov.tr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
gib.gov.tr: # and Google. By telling these "robots" where not to go on your site, | |
gib.gov.tr: # you save bandwidth and server resources. | |
gib.gov.tr: # | |
gib.gov.tr: # This file will be ignored unless it is at the root of your host: | |
gib.gov.tr: # Used: http://example.com/robots.txt | |
gib.gov.tr: # Ignored: http://example.com/site/robots.txt | |
gib.gov.tr: # | |
gib.gov.tr: # For more information about the robots.txt standard, see: | |
gib.gov.tr: # http://www.robotstxt.org/robotstxt.html | |
pochta.ru: # Do you know your way around projects and want to build a genuinely useful product? | |
pochta.ru: # We would be glad to see you on the Postal Technologies team | |
pochta.ru: # https://hr.pochta.tech/ | |
pochta.ru: # | |
www.gob.pe: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
www.gob.pe: # | |
www.gob.pe: # To ban all spiders from the entire site uncomment the next two lines: | |
anz.com: # /robots.txt for http://www.anz.com/ | |
anz.com: # comments to InternetAdministration@anz.com | |
anz.com: # | |
secnews.gr: # Block Yandex | |
paychex.com: # Robots.txt file for http://www.paychex.com | |
paychex.com: # Disallow all robots from the following directories: | |
barnesandnoble.com: #robots.txt for https://www.barnesandnoble.com | |
band.us: # Make changes for all web spiders | |
band.us: # sitemap.xml | |
zety.com: # zety.com | |
livemaster.ru: # Disallow other crawlers | |
livemaster.ru: # Ezooms and dotbot | |
livemaster.ru: #User-agent: link checker | |
livemaster.ru: #Disallow: / | |
livemaster.ru: #User-agent: linkcheck | |
livemaster.ru: #Disallow: / | |
livemaster.ru: #User-agent: Link Sleuth | |
livemaster.ru: #Disallow: / | |
last.fm: # Old pages | |
last.fm: # Shouts | |
last.fm: # N.B: these are not covered by the above /music/ rule | |
last.fm: # (shoutbox vs +shoutbox) | |
last.fm: # AJAX content | |
kinguin.net: ## Website Sitemap | |
kinguin.net: ## Enable robots.txt rules for all crawlers | |
kinguin.net: ## Do not crawl add to cart, checkout, and user account pages | |
kinguin.net: ## Disallow URL Shortener | |
kinguin.net: ## Do not crawl search pages and not-SEO optimized catalog links | |
kinguin.net: ## Do not crawl not-SEO optimized custom forms | |
kinguin.net: ## Do not crawl sub category pages that are sorted or filtered. | |
kinguin.net: ## Do not crawl links with session IDs | |
fao.org: # robots.txt for http://www.fao.org/ | |
fao.org: # This file is not for hiding content from people. It is no substitute for security | |
fao.org: # If you are editing the robots.txt file - please COMMENT and DATE reason for every inclusion/exclusion ---nw-OCC-2013 | |
fao.org: # ^^^^^^^ ^^^^ | |
fao.org: #User-agent: 008 # No longer relevant 25/10/2013 nw | |
fao.org: #Disallow: / | |
fao.org: #User-Agent: cdlwas_bot # No longer relevant 25/10/2013 nw | |
fao.org: #Disallow: # No longer relevant 25/10/2013 nw | |
fao.org: #Google needs to read CSS and JS here - nw 29 Jul 2015 # Disallow: /typo3conf/ | |
fao.org: #Google needs to read CSS and JS here - nw 29 Jul 2015 # Disallow: /typo3temp/ | |
fao.org: #Permitted pending fix (27/05/2014 - nw) Disallow: /figis/vrmf/finder/!/display/vessel/ #generating a lot of errors (30/04/2014 - nw) | |
fao.org: #START Cleanup the web September 2016.The following are marked as GONE in OCC list of sites - and there are no redirects in place | |
fao.org: #Requested by CIO-SEC-TEAM 20/03/2018 | |
fao.org: #END Cleanup the web September 2016.The following are marked as GONE in OCC list of sites | |
fao.org: #Cleanup after Mountain Partnership Migration (2017) | |
fao.org: #At the request of SO3 team | |
arxiv.org: # robots.txt for http://arxiv.org/ and mirror sites http://*.arxiv.org/ | |
arxiv.org: # Indiscriminate automated downloads from this site are not permitted | |
arxiv.org: # See also: http://arxiv.org/help/robots | |
va.gov: # existing disallow on va.gov (may not be needed) | |
va.gov: # existing disallow from vets.gov | |
va.gov: # disallow WIP VAMCs | |
va.gov: # sitemap index | |
timesnownews.com: #Baiduspider | |
timesnownews.com: #Yandex | |
timesnownews.com: # To block ad codes from being crawled | |
timesnownews.com: #Sitemaps | |
hln.be: # Tell robots that the whole site should not be crawled | |
basketball-reference.com: # Disallow the plagiarism.org robot, www.slysearch.com | |
google.lk: # AdsBot | |
google.lk: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
oreilly.com: #ITOPS-10158 | |
oreilly.com: #ITOPS-8392 | |
oreilly.com: #ITOPS-10157 | |
tradeindia.com: # robots.txt 2005/09/1 | |
tradeindia.com: # www.tradeindia.com | |
tradeindia.com: # Format is: | |
tradeindia.com: # User-agent: <name of spider> | |
tradeindia.com: # Disallow: <nothing> | <path> | |
gds.it: #Disallow: /articoli/ajax/ | |
optimum.net: # robots.txt for optimum.net | |
yandex.kz: # yandex.kz | |
madhyamam.com: # | |
madhyamam.com: # robots.txt | |
madhyamam.com: # | |
madhyamam.com: # This file is to prevent the crawling and indexing of certain parts | |
madhyamam.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
madhyamam.com: # and Google. By telling these "robots" where not to go on your site, | |
madhyamam.com: # you save bandwidth and server resources. | |
madhyamam.com: # | |
madhyamam.com: # This file will be ignored unless it is at the root of your host: | |
madhyamam.com: # Used: http://example.com/robots.txt | |
madhyamam.com: # Ignored: http://example.com/site/robots.txt | |
madhyamam.com: # | |
madhyamam.com: # For more information about the robots.txt standard, see: | |
madhyamam.com: # http://www.robotstxt.org/robotstxt.html | |
madhyamam.com: #Crawl-delay: 10 | |
madhyamam.com: # Directories | |
madhyamam.com: #Disallow: /en/ | |
madhyamam.com: # Files | |
madhyamam.com: # Paths (clean URLs) | |
madhyamam.com: # Paths (no clean URLs) | |
reverb.com: # DO NOT EDIT MANUALLY - see script/robots_txt.rb | |
reverb.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
reverb.com: # | |
reverb.com: # Fatbot keeps making bad requests with bad url params, causing errors. There seems to be no good business reason to have them keep scraping the site | |
reverb.com: # Tell all bots to stay away from these endpoints | |
reverb.com: # Non-Regional Disallows | |
reverb.com: # Regional Disallows | |
priceline.com: # Robots.txt file | |
priceline.com: # | |
priceline.com: # Section 1: | |
priceline.com: # Section 2: | |
priceline.com: #Disallow: /api/ | |
priceline.com: #Disallow: /pws/ | |
priceline.com: #Disallow: /svcs/ | |
priceline.com: # Section 3: | |
priceline.com: #Disallow: /vacations/ | |
priceline.com: #Disallow: /Vacations/ | |
faire.com: # Sitemap | |
extendoffice.com: # If the Joomla site is installed within a folder | |
extendoffice.com: # eg www.example.com/joomla/ then the robots.txt file | |
extendoffice.com: # MUST be moved to the site root | |
extendoffice.com: # eg www.example.com/robots.txt | |
extendoffice.com: # AND the joomla folder name MUST be prefixed to all of the | |
extendoffice.com: # paths. | |
extendoffice.com: # eg the Disallow rule for the /administrator/ folder MUST | |
extendoffice.com: # be changed to read | |
extendoffice.com: # Disallow: /joomla/administrator/ | |
extendoffice.com: # | |
extendoffice.com: # For more information about the robots.txt standard, see: | |
extendoffice.com: # http://www.robotstxt.org/orig.html | |
extendoffice.com: # | |
extendoffice.com: # For syntax checking, see: | |
extendoffice.com: # http://tool.motoricerca.info/robots-checker.phtml | |
rawpixel.com: # | |
rawpixel.com: # robots.txt | |
rawpixel.com: # | |
rawpixel.com: # This file is to prevent the crawling and indexing of certain parts | |
rawpixel.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
rawpixel.com: # and Google. By telling these "robots" where not to go on your site, | |
rawpixel.com: # you save bandwidth and server resources. | |
rawpixel.com: # | |
rawpixel.com: # This file will be ignored unless it is at the root of your host: | |
rawpixel.com: # Used: http://example.com/robots.txt | |
rawpixel.com: # Ignored: http://example.com/site/robots.txt | |
rawpixel.com: # | |
rawpixel.com: # For more information about the robots.txt standard, see: | |
rawpixel.com: # http://www.robotstxt.org/robotstxt.html | |
rawpixel.com: # CSS, JS, Images | |
rawpixel.com: # Directories | |
rawpixel.com: # Files | |
rawpixel.com: # Paths (clean URLs) | |
rawpixel.com: # Paths (no clean URLs) | |
mindbodyonline.com: # | |
mindbodyonline.com: # robots.txt | |
mindbodyonline.com: # | |
mindbodyonline.com: # This file is to prevent the crawling and indexing of certain parts | |
mindbodyonline.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
mindbodyonline.com: # and Google. By telling these "robots" where not to go on your site, | |
mindbodyonline.com: # you save bandwidth and server resources. | |
mindbodyonline.com: # | |
mindbodyonline.com: # This file will be ignored unless it is at the root of your host: | |
mindbodyonline.com: # Used: http://example.com/robots.txt | |
mindbodyonline.com: # Ignored: http://example.com/site/robots.txt | |
mindbodyonline.com: # | |
mindbodyonline.com: # For more information about the robots.txt standard, see: | |
mindbodyonline.com: # http://www.robotstxt.org/robotstxt.html | |
mindbodyonline.com: # CSS, JS, Images | |
mindbodyonline.com: # Directories | |
mindbodyonline.com: # Files | |
mindbodyonline.com: # Paths (clean URLs) | |
mindbodyonline.com: # Paths (no clean URLs) | |
mindbodyonline.com: # Sitemap | |
astrologyanswers.com: ### | |
astrologyanswers.com: # robots.txt file created by Nethues | |
astrologyanswers.com: ### | |
astrologyanswers.com: ### | |
astrologyanswers.com: #Unsafe robots to keep away | |
astrologyanswers.com: ### | |
rightmove.co.uk: # robots.txt for https://www.rightmove.co.uk | |
6pm.com: # Global robots.txt updated 2019-08-06 | |
rae.es: # | |
rae.es: # robots.txt | |
rae.es: # | |
rae.es: # This file is to prevent the crawling and indexing of certain parts | |
rae.es: # of your site by web crawlers and spiders run by sites like Yahoo! | |
rae.es: # and Google. By telling these "robots" where not to go on your site, | |
rae.es: # you save bandwidth and server resources. | |
rae.es: # | |
rae.es: # This file will be ignored unless it is at the root of your host: | |
rae.es: # Used: http://example.com/robots.txt | |
rae.es: # Ignored: http://example.com/site/robots.txt | |
rae.es: # | |
rae.es: # For more information about the robots.txt standard, see: | |
rae.es: # http://www.robotstxt.org/robotstxt.html | |
rae.es: # CSS, JS, Images | |
rae.es: # Directories | |
rae.es: # Files | |
rae.es: # Paths (clean URLs) | |
rae.es: # Paths (no clean URLs) | |
jmty.jp: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
jmty.jp: # | |
jmty.jp: # To ban all spiders from the entire site uncomment the next two lines: | |
jmty.jp: # https://www.trovit.com/bot.html | |
jmty.jp: # http://www.grapeshot.com/crawler/ | |
sweetwater.com: # /robots.txt file for http://www.sweetwater.com/ | |
sweetwater.com: # mail webmaster@sweetwater.com for specific information | |
sweetwater.com: # last updated 11-18-2020 JPM | |
alfavita.gr: # | |
alfavita.gr: # robots.txt | |
alfavita.gr: # | |
alfavita.gr: # This file is to prevent the crawling and indexing of certain parts | |
alfavita.gr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
alfavita.gr: # and Google. By telling these "robots" where not to go on your site, | |
alfavita.gr: # you save bandwidth and server resources. | |
alfavita.gr: # | |
alfavita.gr: # This file will be ignored unless it is at the root of your host: | |
alfavita.gr: # Used: http://example.com/robots.txt | |
alfavita.gr: # Ignored: http://example.com/site/robots.txt | |
alfavita.gr: # | |
alfavita.gr: # For more information about the robots.txt standard, see: | |
alfavita.gr: # http://www.robotstxt.org/robotstxt.html | |
alfavita.gr: # CSS, JS, Images | |
alfavita.gr: # Directories | |
alfavita.gr: # Files | |
alfavita.gr: # Paths (clean URLs) | |
alfavita.gr: # Paths (no clean URLs) | |
fortune.com: # Google SiteMaps | |
fortune.com: # Sitemap: https://fortune.com/feed/googlesitemap/articles.xml | |
fortune.com: # Sitemap: https://fortune.com/news-sitemap.xml | |
qoo10.jp: # Sitemap files | |
archiveofourown.org: # See https://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
archiveofourown.org: # | |
archiveofourown.org: # disallow indexing of search results | |
archiveofourown.org: # Googlebot is smart and knows pattern matching | |
ford.com: #robots.txt for www.Ford.com/es.ford.com - KD - 20200729 | |
ford.com: #es.ford.com WIP cart files | |
ford.com: #Naver bot | |
flannels.com: # General | |
flannels.com: # Login | |
flannels.com: # Checkout | |
flannels.com: # Search/Products | |
flannels.com: # Filters | |
flannels.com: # API | |
flannels.com: # Utilities | |
flannels.com: # Blog | |
tmtpost.com: #Disallow:/user/ | |
sheypoor.com: # Sitemap files | |
jdsports.co.uk: # JD -- King Of Crawlers | |
le360.ma: # | |
le360.ma: # robots.txt | |
le360.ma: # | |
le360.ma: # This file is to prevent the crawling and indexing of certain parts | |
le360.ma: # of your site by web crawlers and spiders run by sites like Yahoo! | |
le360.ma: # and Google. By telling these "robots" where not to go on your site, | |
le360.ma: # you save bandwidth and server resources. | |
le360.ma: # | |
le360.ma: # This file will be ignored unless it is at the root of your host: | |
le360.ma: # Used: http://example.com/robots.txt | |
le360.ma: # Ignored: http://example.com/site/robots.txt | |
le360.ma: # | |
le360.ma: # For more information about the robots.txt standard, see: | |
le360.ma: # http://www.robotstxt.org/wc/robots.html | |
le360.ma: # | |
le360.ma: # For syntax checking, see: | |
le360.ma: # http://www.sxw.org.uk/computing/robots/check.html | |
le360.ma: # Directories | |
le360.ma: # Files | |
le360.ma: # Paths (clean URLs) | |
le360.ma: # Paths (no clean URLs) | |
univ-grenoble-alpes.fr: # technical URLs: | |
sep.gob.mx: # Robot site sep.gob.mx | |
sep.gob.mx: # Created 03/11/2010 | |
sep.gob.mx: User-agent: * | |
sep.gob.mx: # Block directories | |
sep.gob.mx: Disallow: /doc/ | |
sep.gob.mx: Disallow: /wbutil/ | |
sep.gob.mx: Disallow: /WEB-INF/ | |
sep.gob.mx: Disallow: /admin/ | |
sep.gob.mx: Disallow: /wbadmin/ | |
sep.gob.mx: Disallow: /templates/ | |
sep.gob.mx: Disallow: /images/ | |
sep.gob.mx: # Block dynamic content | |
sep.gob.mx: Disallow: /*.xls$ | |
sep.gob.mx: Disallow: /*.doc$ | |
sep.gob.mx: Disallow: /*.jsp$ | |
sep.gob.mx: Disallow: /*.asp$ | |
sep.gob.mx: # Sitemap | |
sep.gob.mx: Sitemap: http://www.sep.gob.mx/sitemap.xml | |
ixxx.com: # www.robotstxt.org/ | |
ixxx.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
note.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
note.com: # | |
note.com: # To ban all spiders from the entire site uncomment the next two lines: | |
note.com: # User-agent: * | |
note.com: # Disallow: / | |
bitflyer.com: # https://www.bitflyer.com robots.txt | |
openenglishprograms.org: # | |
openenglishprograms.org: # robots.txt | |
openenglishprograms.org: # | |
openenglishprograms.org: # This file is to prevent the crawling and indexing of certain parts | |
openenglishprograms.org: # of your site by web crawlers and spiders run by sites like Yahoo! | |
openenglishprograms.org: # and Google. By telling these "robots" where not to go on your site, | |
openenglishprograms.org: # you save bandwidth and server resources. | |
openenglishprograms.org: # | |
openenglishprograms.org: # This file will be ignored unless it is at the root of your host: | |
openenglishprograms.org: # Used: http://example.com/robots.txt | |
openenglishprograms.org: # Ignored: http://example.com/site/robots.txt | |
openenglishprograms.org: # | |
openenglishprograms.org: # For more information about the robots.txt standard, see: | |
openenglishprograms.org: # http://www.robotstxt.org/robotstxt.html | |
openenglishprograms.org: # CSS, JS, Images | |
openenglishprograms.org: # Directories | |
openenglishprograms.org: # Files | |
openenglishprograms.org: # Paths (clean URLs) | |
openenglishprograms.org: # Paths (no clean URLs) | |
google.fi: # AdsBot | |
google.fi: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
asu.edu: # robots.txt for asu.edu | |
google.lt: # AdsBot | |
google.lt: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
anthropologie.com: # Sitemap indexes | |
columbia.edu: # ignore this line - 1 | |
columbia.edu: # for info on robots.txt syntax see | |
columbia.edu: # http://www.searchtools.com/robots/robots-txt.html | |
columbia.edu: ## New Homepage ## | |
columbia.edu: # Directories | |
columbia.edu: # Files | |
columbia.edu: # Paths (clean URLs) | |
columbia.edu: # Paths (no clean URLs) | |
panjiva.com: # Bots won't visit the links below but they will index them so they can still | |
panjiva.com: # show up in search results albeit without snippets or cache (and probably at | |
panjiva.com: # fairly low rank). For google we could use the Noindex: directive instead of | |
panjiva.com: # disallow, though that's experimental. That'll force pages out very quickly | |
panjiva.com: # so need to be very careful with that... | |
panjiva.com: # The matched string is prefix so /help blocks /help.html and /help/foo | |
panjiva.com: # So if disallowing an action avoid the trailing slash when there's no conflict | |
panjiva.com: # to avoid parameter/route changes breaking the match. | |
panjiva.com: # Note you can have multiple sitemaps listed in robots.txt and across domains | |
panjiva.com: # too. Everything is valid across domains/subdomains to an organic crawler iff | |
panjiva.com: # for sitemap A on host B that contains a url on host C the robots.txt file | |
panjiva.com: # on host C point to sitemap A on host B. Submitted sitemaps must generally | |
panjiva.com: # only contain urls that match the site they are on. We split our sitemaps | |
panjiva.com: # by sub-domain so we can still submit them separately, but list them all here | |
panjiva.com: # so that they are easy to find oganically too (much like if they were all | |
panjiva.com: # combined into one). | |
panjiva.com: # AdsBot-Google ignores robots.txt unless it's specifically called out, it | |
panjiva.com: # indexes targets of adwords campaigns (but seems to crawl out since it's | |
panjiva.com: # hitting pages that are in here that aren't direct adwords targets). We | |
panjiva.com: # shouldn't be targeting any page in here directly with adwords. | |
panjiva.com: # Extra profile information | |
panjiva.com: # Disallow sample companies so they don't show up in google (and complain) | |
panjiva.com: # Forms - note that we don't really need to exclude the _submit form because | |
panjiva.com: # no trailing slash in the previous rule covers it and they're all posts anyways | |
panjiva.com: # Deprecated | |
panjiva.com: # Excel export | |
panjiva.com: # Search | |
panjiva.com: # Block product search and dead search landing pages | |
panjiva.com: # Shipment search | |
panjiva.com: # SPPs | |
panjiva.com: # Trends | |
panjiva.com: # Trendspotting | |
panjiva.com: # Mekong Visor | |
panjiva.com: # Mekong - this needs to be in the robots.text served | |
panjiva.com: # from china-cdn-proxy.panjiva.com, but added here for | |
panjiva.com: # completeness | |
panjiva.com: # Checkout form submission | |
panjiva.com: # Other | |
panjiva.com: # These aren't actually urls, but / delimited keys that appear in json | |
panjiva.com: # Google still pulls them out and tries to call them though | |
panjiva.com: # We have some challenge problems that use some publicly-visible data. | |
panjiva.com: # We don't want search engines crawling that data and pointing searches | |
panjiva.com: # there instead of at our actual site content. | |
panjiva.com: # specific pages that we want to force out of the indicies (use Noindex:?) | |
panjiva.com: # allow rules | |
panjiva.com: # slow down bots that respect the crawl-delay directive (note that google | |
panjiva.com: # ignores this and is also the only bot we actually care to have crawl faster | |
panjiva.com: # then this) | |
si.com: # Tempest - sportsillustrated | |
cox.com: # Allow Google appliance | |
cox.com: # added for CB 2017-08-03 | |
cox.com: # MP Sitemaps 12/13/17 | |
thomsonreuters.com: # Global robots config | |
thomsonreuters.com: # robots.txt for http://thomsonreuters.com/ | |
verizonwireless.com: # General Rules | |
verizonwireless.com: # Rules for Home/Fios section | |
verizonwireless.com: # Rules for Mobile/Wireless Section | |
verizonwireless.com: # PREPAID ATG | |
verizonwireless.com: # OMNI RELATED | |
verizonwireless.com: # Rules for Corp/About section | |
verizonwireless.com: # Rules for Support (home/mobile/kb) section | |
verizonwireless.com: # Rules from VBG / Business team | |
verizonwireless.com: # Block Google from doubleclick URL issue | |
verizonwireless.com: # Sitemap Files | |
jimdofree.com: # en | |
jimdofree.com: # de | |
jimdofree.com: # es | |
jimdofree.com: # fr | |
jimdofree.com: # it | |
jimdofree.com: # jp | |
jimdofree.com: # nl | |
skysports.com: # NetStorage | |
skysports.com: # | |
skysports.com: # | |
skysports.com: # | |
skysports.com: # Ajax & JSON | |
skysports.com: # | |
skysports.com: # Sports | |
skysports.com: # | |
skysports.com: # ipad | |
skysports.com: # | |
skysports.com: # ROI | |
skysports.com: # | |
skysports.com: # Backlink Analysis | |
newsnow.co.uk: # All robots all dirs | |
ulta.com: #Disallow the following URLs | |
ulta.com: #Sitemaps | |
gumtree.com.au: # Parameters | |
gumtree.com.au: ## Do not crawl any parametered URLs | |
gumtree.com.au: ## Except for URLs containing only the following parameters (and combinations) | |
mundodeportivo.com: # Bots nocivos | |
mundodeportivo.com: # Paths a no indexar | |
mundodeportivo.com: # New | |
mundodeportivo.com: # Zona Resultados | |
economist.com: # robots.txt | |
economist.com: # | |
economist.com: # Sitemap | |
economist.com: # Specific robot directives: | |
economist.com: # Description : Google AdSense delivers advertisements to a broad network of affiliated sites. | |
economist.com: # A robot analyses the pages that display the ads in order to target the ads to the page content. | |
economist.com: # Description : The Grapeshot crawler is an automated robot that visits pages to examine and analyse the content. | |
economist.com: # This adds an exception to crawl delay while preserving disallows. | |
economist.com: # No robots are allowed to index private paths: | |
economist.com: # Directories | |
economist.com: # Files | |
economist.com: # Paths (clean URLs) | |
economist.com: # Paths (no trailing /, beware this will stop file like /admin.html being | |
economist.com: # indexed if we had any) | |
economist.com: # Paths (no clean URLs) | |
economist.com: # Coldfusion paths | |
economist.com: # Print pages | |
economist.com: # Hidden articles | |
economist.com: # Allowed items | |
economist.com: # Comment urls deprecation | |
economist.com: # Prevent crawling podcast RSS file | |
economist.com: # Reading list | |
olx.ro: # sitecode:olxro-desktop | |
housing.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
housing.com: # | |
housing.com: # To ban all spiders from the entire site uncomment the next two lines: | |
housing.com: # User-Agent: * | |
housing.com: # Disallow: / | |
dailythanthi.com: # Sitemap Files | |
elintransigente.com: # Lana Sitemap version 1.0.3 - http://wp.lanaprojekt.hu/blog/wordpress-plugins/lana-sitemap/ | |
edf.fr: # | |
edf.fr: # robots.txt | |
edf.fr: # | |
edf.fr: # This file is to prevent the crawling and indexing of certain parts | |
edf.fr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
edf.fr: # and Google. By telling these "robots" where not to go on your site, | |
edf.fr: # you save bandwidth and server resources. | |
edf.fr: # | |
edf.fr: # This file will be ignored unless it is at the root of your host: | |
edf.fr: # Used: http://example.com/robots.txt | |
edf.fr: # Ignored: http://example.com/site/robots.txt | |
edf.fr: # | |
edf.fr: # For more information about the robots.txt standard, see: | |
edf.fr: # http://www.robotstxt.org/robotstxt.html | |
edf.fr: # Mr Roger Bot | |
edf.fr: # BLEXBot | |
edf.fr: # Directories | |
edf.fr: # Files | |
edf.fr: # Paths (clean URLs) | |
edf.fr: # Paths (no clean URLs) | |
bodybuilding.com: # Adding for Sapient SEO per SEOS-9 ticket | |
blondieshop.com: # Visita al massimo una pagina ogni 5 secondi | |
blondieshop.com: # Visita soltanto tra le 24:00 AM e le 6:45 AM UT (GMT) | |
blondieshop.com: # Directories | |
blondieshop.com: # Disallow: /media/ // Allow this folder for google product caching | |
blondieshop.com: # Paths (clean URLs) | |
blondieshop.com: # Paths (no clean URLs) | |
mass.gov: # | |
mass.gov: # robots.txt | |
mass.gov: # | |
mass.gov: # This file is to prevent the crawling and indexing of certain parts | |
mass.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
mass.gov: # and Google. By telling these "robots" where not to go on your site, | |
mass.gov: # you save bandwidth and server resources. | |
mass.gov: # | |
mass.gov: # This file will be ignored unless it is at the root of your host: | |
mass.gov: # Used: http://example.com/robots.txt | |
mass.gov: # Ignored: http://example.com/site/robots.txt | |
mass.gov: # | |
mass.gov: # For more information about the robots.txt standard, see: | |
mass.gov: # http://www.robotstxt.org/robotstxt.html | |
mass.gov: # CSS, JS, Images | |
mass.gov: # Directories | |
mass.gov: # Files | |
mass.gov: # Paths (clean URLs) | |
mass.gov: # Paths (no clean URLs) | |
kariyer.net: # Disallow: /WebSite/BasinOdasi/ --- Henuz aktif degil | |
cryptocompare.com: # robots.txt for Umbraco | |
typosthes.gr: # | |
typosthes.gr: # robots.txt | |
typosthes.gr: # | |
typosthes.gr: # This file is to prevent the crawling and indexing of certain parts | |
typosthes.gr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
typosthes.gr: # and Google. By telling these "robots" where not to go on your site, | |
typosthes.gr: # you save bandwidth and server resources. | |
typosthes.gr: # | |
typosthes.gr: # This file will be ignored unless it is at the root of your host: | |
typosthes.gr: # Used: http://example.com/robots.txt | |
typosthes.gr: # Ignored: http://example.com/site/robots.txt | |
typosthes.gr: # | |
typosthes.gr: # For more information about the robots.txt standard, see: | |
typosthes.gr: # http://www.robotstxt.org/robotstxt.html | |
typosthes.gr: # CSS, JS, Images | |
typosthes.gr: # Directories | |
typosthes.gr: # Files | |
typosthes.gr: # Paths (clean URLs) | |
typosthes.gr: # Paths (no clean URLs) | |
logitech.com: # Logitech | |
logitech.com: # Modified Jan 25. 2021 | |
subhd.com: # Block MegaIndex.ru | |
pagesix.com: # Sitemap archive | |
pagesix.com: # Additional sitemaps | |
google.rs: # AdsBot | |
google.rs: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
prestashop.com: # | |
prestashop.com: # robots.txt | |
prestashop.com: # | |
prestashop.com: # This file is to prevent the crawling and indexing of certain parts | |
prestashop.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
prestashop.com: # and Google. By telling these "robots" where not to go on your site, | |
prestashop.com: # you save bandwidth and server resources. | |
prestashop.com: # | |
prestashop.com: # This file will be ignored unless it is at the root of your host: | |
prestashop.com: # Used: http://example.com/robots.txt | |
prestashop.com: # Ignored: http://example.com/site/robots.txt | |
prestashop.com: # | |
prestashop.com: # For more information about the robots.txt standard, see: | |
prestashop.com: # http://www.robotstxt.org/robotstxt.html | |
prestashop.com: # CSS, JS, Images | |
prestashop.com: # Directories | |
prestashop.com: # Files | |
prestashop.com: # Paths (clean URLs) | |
prestashop.com: # Paths (no clean URLs) | |
prestashop.com: #Forum | |
bithumb.com: # BITHUMB.com Robots.txt | |
bithumb.com: # Sitemap | |
gazzetta.gr: # robots.txt | |
gazzetta.gr: # | |
gazzetta.gr: # This file is to prevent the crawling and indexing of certain parts | |
gazzetta.gr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
gazzetta.gr: # and Google. By telling these "robots" where not to go on your site, | |
gazzetta.gr: # you save bandwidth and server resources. | |
gazzetta.gr: # | |
gazzetta.gr: # This file will be ignored unless it is at the root of your host: | |
gazzetta.gr: # Used: http://example.com/robots.txt | |
gazzetta.gr: # Ignored: http://example.com/site/robots.txt | |
gazzetta.gr: # | |
gazzetta.gr: # For more information about the robots.txt standard, see: | |
gazzetta.gr: # http://www.robotstxt.org/robotstxt.html | |
gazzetta.gr: # | |
gazzetta.gr: # | |
gazzetta.gr: # | |
gazzetta.gr: # | |
gazzetta.gr: # CSS, JS, Images | |
gazzetta.gr: # Custom | |
gazzetta.gr: #Disallow: /breaking-blog | |
gazzetta.gr: # Directories | |
gazzetta.gr: # Files | |
gazzetta.gr: # Paths (clean URLs) | |
gazzetta.gr: # Paths (no clean URLs) | |
google.com.ly: # AdsBot | |
google.com.ly: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
seedit.com: #diymodule_show_121{ margin-left:-25px;} | |
seedit.com: #diymodule_show_121 .web_logo img{padding-top:1rem; border:none; } | |
seedit.com: #diymodule_show_121 .web_logo img:hover{opacity:0.8;} | |
seedit.com: #top_layout_out #top_layout_inner #layout_top {white-space: nowrap;} | |
seedit.com: #diymodule_show_119{marign:0px;padding:0px;} | |
seedit.com: #diymodule_show_119_119_html{marign:0px;padding:0px;border:none; background:none;} | |
seedit.com: #diymodule_show_119 .module_title{display:none;} | |
seedit.com: #diymodule_show_119 .content{marign:0px;padding:0px;; } | |
seedit.com: #diymodule_show_119 img{padding-left:10px; padding-top:42px; border:none;width:80%;} | |
seedit.com: #diymodule_show_119 img:hover{opacity:0.7;} | |
seedit.com: #index_navigation_html{} | |
seedit.com: #index_navigation {} | |
seedit.com: #index_navigation .nv img{ display:none;} | |
seedit.com: #index_navigation .nv{ text-align:left; margin:0px; padding:0px; height:50px; height:3.572rem; line-height:50px;line-height:3.572rem; font-size:1rem; white-space:nowrap;} | |
seedit.com: #index_navigation .nv a{ white-space:nowrap;} | |
seedit.com: #index_navigation .nv .nv_ul{ width:100%;} | |
seedit.com: #index_navigation .nv > ul{ white-space:nowrap;} | |
seedit.com: #index_navigation .nv ul ul {display: none;} | |
seedit.com: #index_navigation .nv ul li:hover > ul {display: block;} | |
seedit.com: #index_navigation .nv ul {list-style: none;position: relative;display: inline-block;white-space:nowrap; z-index:9;} | |
seedit.com: #index_navigation .nv ul:after {content: ""; clear: both; display: block;} | |
seedit.com: #index_navigation .nv ul > li { display: inline-block;text-align: center; } | |
seedit.com: #index_navigation .nv ul li:hover {background:#e31939;} | |
seedit.com: #index_navigation .nv ul li:hover a { color:#fff;} | |
seedit.com: #index_navigation .nv ul li a {display: block; text-decoration: none; padding-left:1.5rem; padding-right:1.5rem;} | |
seedit.com: #index_navigation .nv ul li a i{ padding-left:5px;} | |
seedit.com: #index_navigation .nv ul ul {line-height:40px;line-height:2.85rem;background:#e31939; border-radius: 0px; padding: 0;position: absolute; top: 100%;} | |
seedit.com: #index_navigation .nv ul ul li { display:block; width:100%;float: none;position: relative; text-align:left;} | |
seedit.com: #index_navigation .nv ul ul li a {} | |
seedit.com: #index_navigation .nv ul ul li a i{ float:right;line-height:40px;line-height:2.85rem; margin-left:10px;} | |
seedit.com: #index_navigation .nv ul ul li a .fa-angle-right{ padding-right:10px; } | |
seedit.com: #index_navigation .nv ul ul li:hover {background:#ed5f74;} | |
seedit.com: #index_navigation .nv ul ul ul {width:100%;position: absolute; left: 100%; top:0; } | |
seedit.com: #index_navigation .nv ul ul ul li:hover{} | |
seedit.com: #index_navigation .nv ul li:last-child:hover > ul li ul{ left:-100%; text-align:right;} | |
seedit.com: #index_navigation .nv ul li:last-child:hover > ul li ul a{padding-right: 25px ;} | |
seedit.com: #index_navigation .nv ul li .have_three { font-weight:bold; margin-bottom:20px; } | |
seedit.com: #index_navigation .nv ul li .have_three li{ font-weight: normal; } | |
seedit.com: #index_navigation .nv ul li .have_three:hover{ background:none; } | |
seedit.com: #index_navigation .nv ul li .have_three > a{ border-bottom:#fff 1px dashed;} | |
seedit.com: #index_navigation .nv ul li .have_three a:hover{background:#ff9900; } | |
seedit.com: #index_navigation .nv ul li .have_three ul{background: #ff4a00; } | |
seedit.com: #index_navigation .nv ul li .have_three i{ display:none; } | |
seedit.com: #index_navigation .nv ul li .multiple_columns{ width:43rem; height:auto; white-space:normal; text-align:left; } | |
seedit.com: #index_navigation .nv ul li .multiple_columns li{display:inline-block; height:auto; border-top:none; vertical-align:top; width:200px; padding-left:10px; } | |
seedit.com: #index_navigation .nv ul li .multiple_columns li a{ display:block; text-align:left;} | |
seedit.com: #index_navigation .nv ul li .multiple_columns li ul{ display:block; position:static; height:auto; white-space:normal; } | |
seedit.com: #index_navigation .nv ul li .multiple_columns li ul li{ display:block; width:100%; height:32px; line-height:32px; height:2.3rem; line-height:2.3rem;} | |
seedit.com: #index_navigation .nv ul li:last-child:hover > ul li ul a{padding-right: 25px ;} | |
seedit.com: #index_navigation #mall_type_all{ display:inline-block; vertical-align:top; width:17.95rem; text-align:left; font-size:18px; margin-right:20px;background:#e31939; } | |
seedit.com: #index_navigation #mall_type_all a{ color:#fff;} | |
seedit.com: #index_navigation #mall_type_all:hover #mall_type_module_two{ display:block;} | |
seedit.com: #index_navigation #mall_type_all .text{ display:block; padding-left:12px;} | |
seedit.com: #index_navigation #mall_type_all:hover{ } | |
seedit.com: #index_navigation #mall_type_module_two{ display:none; position:absolute; top:290px; left:6px; z-index:9999;} | |
seedit.com: #index_fixed_top{ z-index:99999999; width:100%; background:rgba(255,255,255,0.96); ; position:fixed; top:0px; box-shadow: rgba(0,0,0,.2) 0 1px 5px; height:50px; overflow:hidden; display:none;} | |
seedit.com: #index_fixed_top_html >div{ display:inline-block; vertical-align:top;} | |
seedit.com: #index_fixed_top .logo_area{ display:inline-block; vertical-align:top; width:30%; overflow:hidden;} | |
seedit.com: #index_fixed_top .logo_area img{ width:250px; margin-top:-24px;} | |
seedit.com: #index_fixed_top .logo_area img:hover{ opacity:0.9;} | |
seedit.com: #index_fixed_top .search_area{ display:inline-block; vertical-align:top; width:40%; overflow:hidden;} | |
seedit.com: #index_fixed_top .search_area .search_div{ height:34px; line-height:34px; border:1px solid #e93853; margin-top:8px;} | |
seedit.com: #index_fixed_top .search_area .search_div input{height:32px; line-height:32px; display:inline-block; vertical-align:top; width:85%; overflow:hidden;} | |
seedit.com: #index_fixed_top .search_area .search_div a{ height:34px; line-height:34px;display:inline-block; vertical-align:top; width:15%; overflow:hidden; text-align:center; background:#e93853; color:#ffffff; cursor:pointer;} | |
seedit.com: #index_fixed_top .search_area .search_div a:hover{ opacity:0.9;} | |
seedit.com: #index_fixed_top .user_aera{width:30%;line-height:45px; text-align:right; padding-right:30px;} | |
seedit.com: #index_fixed_top .user_aera a{ display:inline-block; vertical-align:top; line-height:45px;} | |
seedit.com: #index_fixed_top .user_aera #icon_img{ margin-top:10px;line-height:50px;} | |
seedit.com: #index_fixed_top .user_aera .top_a{ display:none;} | |
seedit.com: #index_fixed_top .user_aera #icon_a{ } | |
seedit.com: #index_fixed_top .user_aera #nickname{ } | |
seedit.com: #index_fixed_top .user_aera #unlogin{ background:#e93853; color:#ffffff; display:inline-block; vertical-align:top; line-height:30px; height:30px; padding-left:1rem; padding-right:1rem; margin-top:8px; border-radius:3px; } | |
seedit.com: #index_fixed_top .user_aera #unlogin:before{ display:none;} | |
seedit.com: #index_fixed_top .user_aera #login{ color:#e93853; } | |
seedit.com: #index_fixed_top .user_aera #login:before{ display:none;} | |
seedit.com: #index_fixed_top .user_aera #reg_user{ background:#e93853; color:#ffffff; display:inline-block; vertical-align:top; line-height:30px; height:30px; padding-left:1rem; padding-right:1rem; margin-top:8px; border-radius:3px; } | |
seedit.com: #index_fixed_top .user_aera #reg_user:before{ display:none;} | |
seedit.com: #ci_type_module_two{font-weight:normal; text-indent:0px; width:17.85rem; height:32.14rem; vertical-align:top; border-top:1px solid #e8e8e8; border-bottom:1px solid #e8e8e8; display:inline-block; vertical-align:top; background-color:#fcfcfc;padding:0px;box-shadow:none;margin:0px;} | |
seedit.com: #ci_type_module_two_html{} | |
seedit.com: #ci_type_module_two_html .more{ padding-left:8px; line-height:30px;} | |
seedit.com: #ci_type_module_two_html .parent{ height:4.9rem; line-height:2rem; border-bottom: #e8e8e8 1px solid; } | |
seedit.com: #ci_type_module_two_html .parent .level_1_div{ padding-left:10px;} | |
seedit.com: #ci_type_module_two_html .parent:hover{position: relative;z-index:991;border-left:0.15rem solid #e93853;background-color:#fcfcfc; } | |
seedit.com: #ci_type_module_two_html .parent:hover .part_b_div{ display:block;} | |
seedit.com: #ci_type_module_two_html .parent:hover .level_1_div .level_1:after{ display:none;} | |
seedit.com: #ci_type_module_two_html .parent:hover .level_1_div{ } | |
seedit.com: #ci_type_module_two_html .parent:hover .level_1_div .level_1{background-image:none;background-color:#fcfcfc; } | |
seedit.com: #ci_type_module_two_html .parent:hover .part_a_div{background-color:#fcfcfc; } | |
seedit.com: #ci_type_module_two_html .parent .level_1_div .level_1{ display:block; font-size:16px; position: relative; z-index:991;} | |
seedit.com: #ci_type_module_two_html .parent .level_1_div .level_1:after{margin-right:8px; float:right; font: normal normal normal 1rem/1 FontAwesome; content:"\f105"; padding-top:5px; color: #999;} | |
seedit.com: #ci_type_module_two_html .parent .level_1_div .level_1 a{color:#e93853;} | |
seedit.com: #ci_type_module_two_html .parent .level_1_div .part_a_div a{color:#000;} | |
seedit.com: #ci_type_module_two_html .parent .level_1_div .part_a_div{position: relative;z-index:991; width:100%; padding-bottom:0.8rem; } | |
seedit.com: #ci_type_module_two_html .parent .level_1_div .part_a_div .level_2{font-size:15px; display:inline-block; vertical-align:top; height:25px; width:29%;margin-right:4%; overflow:hidden;white-space:nowrap;text-overflow:ellipsis;} | |
seedit.com: #ci_type_module_two_html .parent .level_1_div .part_a_div:hover{ } | |
seedit.com: #ci_type_module_two_html .parent .level_1_div .part_a_div .level_2:hover{ display:inline-block;color:#ed5f74;} | |
seedit.com: #ci_type_module_two_html .parent .part_b_div{ padding-left:20px; display:none; position: relative; padding-top:25px; left:17.5rem; top:-4.85rem; min-height:4.9rem; width:32.14rem; z-index:99; background-color: #fcfcfc; border:1px solid #e8e8e8; white-space:normal;} | |
seedit.com: #ci_type_module_two_html .parent .part_b_div .level_2{ font-size:15px; display:inline-block; vertical-align:top; width:120px; margin-right:20px; height:35px; line-height:35px; overflow:hidden;white-space:nowrap;text-overflow:ellipsis;} | |
seedit.com: #ci_type_module_two_html .parent .part_b_div .level_2:hover{color:#ed5f74;} | |
seedit.com: #slider_show_16_16_html .up_text{ position:relative; top:-100%; color:#fff; } | |
seedit.com: #slider_show_16_16_html .up_text a{color:#fff;} | |
seedit.com: #slider_show_16_16_html .up_text .slider_name{ font-size:4rem; font-weight:bold; line-height:7.14rem; height:7.14rem!important; } | |
seedit.com: #slider_show_16_16_html .up_text .slider_description{ padding:1rem; line-height:4rem; height:13rem!important; font-size:2rem; opacity:0; } | |
seedit.com: #slider_show_16_16_html .up_text .slider_summary{ line-height:3rem; height:3rem!important; opacity:0; } | |
seedit.com: #slider_show_16_16_html .up_text .slider_summary a{ display:inline-block; border:1px solid #FFF; border-radius:0.5rem; padding:0.5rem;font-size:2rem;} | |
seedit.com: #index_index_user { height:320px; width:18%; right:5%; top:225px; border-top:1px solid #ccc; overflow:hidden;white-space:normal; position: absolute; z-index:1; background:rgba(255,255,255,0.97); opacity:0; } | |
seedit.com: #index_index_user_html >div{} | |
seedit.com: #index_index_user .r_user_state{ text-align:center; padding-bottom:15px;} | |
seedit.com: #index_index_user .r_user_state .mall_home{ display:none; } | |
seedit.com: #index_index_user .r_user_state .my_order{ display:none; } | |
seedit.com: #index_index_user .r_user_state .my_collection{ display:none; } | |
seedit.com: #index_index_user .r_user_state #hello{ display:none; } | |
seedit.com: #index_index_user .r_user_state a{ display:block; } | |
seedit.com: #index_index_user .r_user_state #icon_img{ display:block; margin:auto; width:60px; height:60px; border-radius:30px; margin-top:1rem; border:2px solid #fff; } | |
seedit.com: #index_index_user .r_user_state #nickname span{ display:none;} | |
seedit.com: #index_index_user .r_user_state #nickname{ line-height:2rem; display:block; vertical-align:top; width:100%; text-align:center;} | |
seedit.com: #index_index_user .r_user_state #unlogin{ display:inline-block; vertical-align:top; width:6rem; text-align:center; background:#ed5f74; color:#fff; border-radius:12px; line-height:1.8rem;} | |
seedit.com: #index_index_user .r_user_state a{ margin:0px; padding:0px;} | |
seedit.com: #index_index_user .r_user_state a:hover{ color:#ed5f74; opacity:0.8;} | |
seedit.com: #index_index_user .r_user_state a:before{ display:none;} | |
seedit.com: #index_index_user .r_user_state .default_user_icon{display:block; margin:auto; width:60px; height:60px; border-radius:30px; margin-top:1rem; background: rgba(237,237,237,1.00);} | |
seedit.com: #index_index_user .r_user_state .default_user_icon:before {font: normal normal normal 60px/1 FontAwesome; margin-right: 5px; | |
seedit.com: #index_index_user .r_user_state #login{ display:inline-block; vertical-align:top; border-radius:12px; box-shadow: 6px 8px 20px rgba(45,45,45,.15); text-align:center; width:5rem; margin-right:5px;} | |
seedit.com: #index_index_user .r_user_state #reg_user{ display:inline-block; vertical-align:top; background:#ed5f74; color:#fff; width:5rem; margin-left:5px;border-radius:12px;} | |
seedit.com: #index_index_user .r_user_state .welcome_to_come{ line-height:3rem;} | |
seedit.com: #index_index_user .r_article{ padding:10px; height:100px; overflow:hidden; margin-left:1rem;} | |
seedit.com: #index_index_user .r_article a:before {font: normal normal normal 3px/0.3 FontAwesome; margin-right: 5px; content: "\f0c8"; color:#ccc; font-size:3px;} | |
seedit.com: #index_index_user .r_article a{ display:block;white-space: nowrap; text-overflow: ellipsis; overflow:hidden; line-height:2.5rem; border-bottom:1px dashed #ccc;} | |
seedit.com: #index_index_user .r_icons{ padding-top:0.5rem; } | |
seedit.com: #index_index_user .r_icons a{ text-align:center; display:inline-block; vertical-align:top; width:33.33%; overflow:hidden; } | |
seedit.com: #index_index_user .r_icons a:hover{ opacity:0.7;} | |
seedit.com: #index_index_user .r_icons a img{ width:40%; } | |
seedit.com: #index_index_user .r_icons a span{ display:block; font-size:0.8rem; margin-top:0.5rem; margin-bottom:0.5rem; } | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px; | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html{ white-space:nowrap;overflow: hidden;height: 90%;} | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;} | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;} | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; } | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";} | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .module_title .more:hover{ float:right; font-size:14px; } | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .cover_image{ display:inline-block; vertical-align:top;width:;} | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);} | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .cover_image img{width:;border:none; padding-top:2px;} | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; } | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;} | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line:hover{ background:#f3f3f3; } | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;} | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .icon img{ width:90%; max-height:65px; border:none;} | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; } | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; } | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;} | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; } | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; } | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;} | |
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .middle .other .price .number{ font-weight:bold;} | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px; | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html{ white-space:nowrap;overflow: hidden;height: 90%;} | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;} | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;} | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; } | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";} | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .module_title .more:hover{ float:right; font-size:14px; } | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .cover_image{ display:inline-block; vertical-align:top;width:;} | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);} | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .cover_image img{width:;border:none; padding-top:2px;} | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; } | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;} | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line:hover{ background:#f3f3f3; } | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;} | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .icon img{ width:90%; max-height:65px; border:none;} | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; } | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; } | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;} | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; } | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; } | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;} | |
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .middle .other .price .number{ font-weight:bold;} | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px; | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html{ white-space:nowrap;overflow: hidden;height: 90%;} | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;} | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;} | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; } | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";} | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .module_title .more:hover{ float:right; font-size:14px; } | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .cover_image{ display:inline-block; vertical-align:top;width:;} | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);} | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .cover_image img{width:;border:none; padding-top:2px;} | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; } | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;} | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line:hover{ background:#f3f3f3; } | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;} | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .icon img{ width:90%; max-height:65px; border:none;} | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; } | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; } | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;} | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; } | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; } | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;} | |
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .middle .other .price .number{ font-weight:bold;} | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px; | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html{ white-space:nowrap;overflow: hidden;height: 90%;} | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;} | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;} | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; } | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";} | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .module_title .more:hover{ float:right; font-size:14px; } | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .cover_image{ display:inline-block; vertical-align:top;width:;} | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);} | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .cover_image img{width:;border:none; padding-top:2px;} | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; } | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;} | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line:hover{ background:#f3f3f3; } | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;} | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .icon img{ width:90%; max-height:65px; border:none;} | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; } | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; } | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;} | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; } | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; } | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;} | |
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .middle .other .price .number{ font-weight:bold;} | |
seedit.com: #ide1452116f091fed67c5265f1a976293b{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px; | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html{ white-space:nowrap;overflow: hidden;height: 90%;} | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;} | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;} | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; } | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";} | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .module_title .more:hover{ float:right; font-size:14px; } | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .cover_image{ display:inline-block; vertical-align:top;width:;} | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);} | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .cover_image img{width:;border:none; padding-top:2px;} | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; } | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;} | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line:hover{ background:#f3f3f3; } | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;} | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .icon img{ width:90%; max-height:65px; border:none;} | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; } | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; } | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;} | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; } | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; } | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;} | |
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .middle .other .price .number{ font-weight:bold;} | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px; | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html{ white-space:nowrap;overflow: hidden;height: 90%;} | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;} | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;} | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; } | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";} | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .module_title .more:hover{ float:right; font-size:14px; } | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .cover_image{ display:inline-block; vertical-align:top;width:;} | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);} | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .cover_image img{width:;border:none; padding-top:2px;} | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; } | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;} | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line:hover{ background:#f3f3f3; } | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;} | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .icon img{ width:90%; max-height:65px; border:none;} | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; } | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; } | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;} | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; } | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; } | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;} | |
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .middle .other .price .number{ font-weight:bold;} | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px; | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html{ white-space:nowrap;overflow: hidden;height: 90%;} | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;} | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;} | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; } | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";} | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .module_title .more:hover{ float:right; font-size:14px; } | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .cover_image{ display:inline-block; vertical-align:top;width:;} | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);} | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .cover_image img{width:;border:none; padding-top:2px;} | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; } | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;} | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line:hover{ background:#f3f3f3; } | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;} | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .icon img{ width:90%; max-height:65px; border:none;} | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; } | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; } | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;} | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; } | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; } | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;} | |
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .middle .other .price .number{ font-weight:bold;} | |
seedit.com: #id899287572864c4218f23003ab9925d55{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px; | |
seedit.com: #id899287572864c4218f23003ab9925d55_html{ white-space:nowrap;overflow: hidden;height: 90%;} | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;} | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;} | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; } | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";} | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .module_title .more:hover{ float:right; font-size:14px; } | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .cover_image{ display:inline-block; vertical-align:top;width:;} | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);} | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .cover_image img{width:;border:none; padding-top:2px;} | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; } | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;} | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line:hover{ background:#f3f3f3; } | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;} | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .icon img{ width:90%; max-height:65px; border:none;} | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; } | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; } | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;} | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; } | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; } | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;} | |
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .middle .other .price .number{ font-weight:bold;} | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px; | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html{ white-space:nowrap;overflow: hidden;height: 90%;} | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;} | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;} | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; } | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";} | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .module_title .more:hover{ float:right; font-size:14px; } | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .cover_image{ display:inline-block; vertical-align:top;width:;} | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);} | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .cover_image img{width:;border:none; padding-top:2px;} | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; } | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;} | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line:hover{ background:#f3f3f3; } | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;} | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .icon img{ width:90%; max-height:65px; border:none;} | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; } | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; } | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;} | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; } | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; } | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;} | |
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .middle .other .price .number{ font-weight:bold;} | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px; | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html{ white-space:nowrap;overflow: hidden;height: 90%;} | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;} | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;} | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; } | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";} | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .module_title .more:hover{ float:right; font-size:14px; } | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .cover_image{ display:inline-block; vertical-align:top;width:;} | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);} | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .cover_image img{width:;border:none; padding-top:2px;} | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; } | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;} | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line:hover{ background:#f3f3f3; } | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;} | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .icon img{ width:90%; max-height:65px; border:none;} | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; } | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; } | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;} | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; } | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; } | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;} | |
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .middle .other .price .number{ font-weight:bold;} | |
seedit.com: #idd8060f2431a5e992104d2869f0194994{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px; | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html{ white-space:nowrap;overflow: hidden;height: 90%;} | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;} | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;} | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; } | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";} | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .module_title .more:hover{ float:right; font-size:14px; } | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .cover_image{ display:inline-block; vertical-align:top;width:;} | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);} | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .cover_image img{width:;border:none; padding-top:2px;} | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; } | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;} | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line:hover{ background:#f3f3f3; } | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;} | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .icon img{ width:90%; max-height:65px; border:none;} | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; } | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; } | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;} | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; } | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; } | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;} | |
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .middle .other .price .number{ font-weight:bold;} | |
seedit.com: #idf814064342d8104efc1fe872a989a687{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px; | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html{ white-space:nowrap;overflow: hidden;height: 90%;} | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;} | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;} | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; } | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";} | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .module_title .more:hover{ float:right; font-size:14px; } | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .cover_image{ display:inline-block; vertical-align:top;width:;} | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);} | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .cover_image img{width:;border:none; padding-top:2px;} | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; } | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;} | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line:hover{ background:#f3f3f3; } | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;} | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .icon img{ width:90%; max-height:65px; border:none;} | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; } | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; } | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;} | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; } | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; } | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;} | |
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .middle .other .price .number{ font-weight:bold;} | |
seedit.com: #return_top{display:none;} | |
strikingly.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
strikingly.com: # | |
strikingly.com: # To ban all spiders from the entire site uncomment the next two lines: | |
strikingly.com: # User-Agent: * | |
strikingly.com: # Disallow: / | |
strikingly.com: # Google adsbot ignores robots.txt unless specifically named! | |
qiqitv.info: # global | |
vistaprint.com: # Crawling Rules - Last Update on 11/07/2019 | |
google.hr: # AdsBot | |
google.hr: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
stellamccartney.com: # Disallow tricombot. | |
verajohn.com: # robots.txt desktop 25/05/18 | |
docusign.net: # go away | |
getbootstrap.com: # www.robotstxt.org | |
getbootstrap.com: # Allow crawling of all content | |
arstechnica.com: # Google Image | |
arstechnica.com: # Google AdSense | |
arstechnica.com: # Global | |
arstechnica.com: # phpBB | |
eastdane.com: #Sitemap updated 08/31/2018 | |
sketchup.com: # | |
sketchup.com: # robots.txt | |
sketchup.com: # | |
sketchup.com: # This file is to prevent the crawling and indexing of certain parts | |
sketchup.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
sketchup.com: # and Google. By telling these "robots" where not to go on your site, | |
sketchup.com: # you save bandwidth and server resources. | |
sketchup.com: # | |
sketchup.com: # This file will be ignored unless it is at the root of your host: | |
sketchup.com: # Used: http://example.com/robots.txt | |
sketchup.com: # Ignored: http://example.com/site/robots.txt | |
sketchup.com: # | |
sketchup.com: # For more information about the robots.txt standard, see: | |
sketchup.com: # http://www.robotstxt.org/robotstxt.html | |
sketchup.com: # CSS, JS, Images | |
sketchup.com: # Directories | |
sketchup.com: # Files | |
sketchup.com: # Paths (clean URLs) | |
sketchup.com: # Paths (no clean URLs) | |
thestreet.com: # Tempest - thestreet | |
google.ie: # AdsBot | |
google.ie: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
wikisource.org: # | |
wikisource.org: # Please note: There are a lot of pages on this site, and there are | |
wikisource.org: # some misbehaved spiders out there that go _way_ too fast. If you're | |
wikisource.org: # irresponsible, your access to the site may be blocked. | |
wikisource.org: # | |
wikisource.org: # Observed spamming large amounts of https://en.wikipedia.org/?curid=NNNNNN | |
wikisource.org: # and ignoring 429 ratelimit responses, claims to respect robots: | |
wikisource.org: # http://mj12bot.com/ | |
wikisource.org: # advertising-related bots: | |
wikisource.org: # Wikipedia work bots: | |
wikisource.org: # Crawlers that are kind enough to obey, but which we'd rather not have | |
wikisource.org: # unless they're feeding search engines. | |
wikisource.org: # Some bots are known to be trouble, particularly those designed to copy | |
wikisource.org: # entire sites. Please obey robots.txt. | |
wikisource.org: # Misbehaving: requests much too fast: | |
wikisource.org: # | |
wikisource.org: # Sorry, wget in its recursive mode is a frequent problem. | |
wikisource.org: # Please read the man page and use it properly; there is a | |
wikisource.org: # --wait option you can use to set the delay between hits, | |
wikisource.org: # for instance. | |
wikisource.org: # | |
wikisource.org: # | |
wikisource.org: # The 'grub' distributed client has been *very* poorly behaved. | |
wikisource.org: # | |
wikisource.org: # | |
wikisource.org: # Doesn't follow robots.txt anyway, but... | |
wikisource.org: # | |
wikisource.org: # | |
wikisource.org: # Hits many times per second, not acceptable | |
wikisource.org: # http://www.nameprotect.com/botinfo.html | |
wikisource.org: # A capture bot, downloads gazillions of pages with no public benefit | |
wikisource.org: # http://www.webreaper.net/ | |
wikisource.org: # | |
wikisource.org: # Friendly, low-speed bots are welcome viewing article pages, but not | |
wikisource.org: # dynamically-generated pages please. | |
wikisource.org: # | |
wikisource.org: # Inktomi's "Slurp" can read a minimum delay between hits; if your | |
wikisource.org: # bot supports such a thing using the 'Crawl-delay' or another | |
wikisource.org: # instruction, please let us know. | |
wikisource.org: # | |
wikisource.org: # There is a special exception for API mobileview to allow dynamic | |
wikisource.org: # mobile web & app views to load section content. | |
wikisource.org: # These views aren't HTTP-cached but use parser cache aggressively | |
wikisource.org: # and don't expose special: pages etc. | |
wikisource.org: # | |
wikisource.org: # Another exception is for REST API documentation, located at | |
wikisource.org: # /api/rest_v1/?doc. | |
wikisource.org: # | |
wikisource.org: # | |
wikisource.org: # ar: | |
wikisource.org: # | |
wikisource.org: # dewiki: | |
wikisource.org: # T6937 | |
wikisource.org: # sensible deletion and meta user discussion pages: | |
wikisource.org: # 4937#5 | |
wikisource.org: # T14111 | |
wikisource.org: # T15961 | |
wikisource.org: # | |
wikisource.org: # enwiki: | |
wikisource.org: # Folks get annoyed when VfD discussions end up the number 1 google hit for | |
wikisource.org: # their name. See T6776 | |
wikisource.org: # T15398 | |
wikisource.org: # T16075 | |
wikisource.org: # T13261 | |
wikisource.org: # T12288 | |
wikisource.org: # T16793 | |
wikisource.org: # | |
wikisource.org: # eswiki: | |
wikisource.org: # T8746 | |
wikisource.org: # | |
wikisource.org: # fiwiki: | |
wikisource.org: # T10695 | |
wikisource.org: # | |
wikisource.org: # hewiki: | |
wikisource.org: #T11517 | |
wikisource.org: # | |
wikisource.org: # huwiki: | |
wikisource.org: # | |
wikisource.org: # itwiki: | |
wikisource.org: # T7545 | |
wikisource.org: # | |
wikisource.org: # jawiki | |
wikisource.org: # T7239 | |
wikisource.org: # nowiki | |
wikisource.org: # T13432 | |
wikisource.org: # | |
wikisource.org: # plwiki | |
wikisource.org: # T10067 | |
wikisource.org: # | |
wikisource.org: # ptwiki: | |
wikisource.org: # T7394 | |
wikisource.org: # | |
wikisource.org: # rowiki: | |
wikisource.org: # T14546 | |
wikisource.org: # | |
wikisource.org: # ruwiki: | |
wikisource.org: # | |
wikisource.org: # svwiki: | |
wikisource.org: # T12229 | |
wikisource.org: # T13291 | |
wikisource.org: # | |
wikisource.org: # zhwiki: | |
wikisource.org: # T7104 | |
wikisource.org: # | |
wikisource.org: # sister projects | |
wikisource.org: # | |
wikisource.org: # enwikinews: | |
wikisource.org: # T7340 | |
wikisource.org: # | |
wikisource.org: # itwikinews | |
wikisource.org: # T11138 | |
wikisource.org: # | |
wikisource.org: # enwikiquote: | |
wikisource.org: # T17095 | |
wikisource.org: # | |
wikisource.org: # enwikibooks | |
wikisource.org: # | |
wikisource.org: # working... | |
wikisource.org: # | |
wikisource.org: # | |
wikisource.org: # | |
wikisource.org: #----------------------------------------------------------# | |
wikisource.org: # | |
wikisource.org: # | |
wikisource.org: # | |
ikco.ir: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
ikco.ir: #content{margin:0 0 0 2%;position:relative;} | |
ew.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead. | |
ew.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details. | |
ew.com: # sitemaps | |
ew.com: # current CMS | |
ew.com: # ONECMS | |
ew.com: # Content | |
ew.com: # ONECMS | |
ew.com: # Content | |
ew.com: # ONECMS | |
ew.com: # Content | |
anses.gob.ar: # | |
anses.gob.ar: # robots.txt | |
anses.gob.ar: # | |
anses.gob.ar: # This file is to prevent the crawling and indexing of certain parts | |
anses.gob.ar: # of your site by web crawlers and spiders run by sites like Yahoo! | |
anses.gob.ar: # and Google. By telling these "robots" where not to go on your site, | |
anses.gob.ar: # you save bandwidth and server resources. | |
anses.gob.ar: # | |
anses.gob.ar: # This file will be ignored unless it is at the root of your host: | |
anses.gob.ar: # Used: http://example.com/robots.txt | |
anses.gob.ar: # Ignored: http://example.com/site/robots.txt | |
anses.gob.ar: # | |
anses.gob.ar: # For more information about the robots.txt standard, see: | |
anses.gob.ar: # http://www.robotstxt.org/robotstxt.html | |
anses.gob.ar: # CSS, JS, Images | |
anses.gob.ar: # Directories | |
anses.gob.ar: # Files | |
anses.gob.ar: # Paths (clean URLs) | |
anses.gob.ar: # Paths (no clean URLs) | |
mt.co.kr: # Robots for www.mt.co.kr | |
mt.co.kr: # ETC | |
mt.co.kr: # SiteMap | |
techtudo.com.br: # | |
techtudo.com.br: # robots.txt | |
techtudo.com.br: # | |
panasonic.com: #sitemap | |
ibm.com: # $Id: robots.txt,v 1.88 2020/07/20 13:41:39 jliao Exp $ | |
ibm.com: # | |
ibm.com: # This is a file retrieved by webwalkers a.k.a. spiders that | |
ibm.com: # conform to a defacto standard. | |
ibm.com: # See <URL:http://www.robotstxt.org/wc/exclusion.html#robotstxt> | |
ibm.com: # | |
ibm.com: # Comments to the webmaster should be posted at <URL:http://www.ibm.com/contact> | |
ibm.com: # | |
ibm.com: # Format is: | |
ibm.com: # User-agent: <name of spider> | |
ibm.com: # Disallow: <nothing> | <path> | |
ibm.com: # ------------------------------------------------------------------------------ | |
ibm.com: # Disallow: /homepage | |
ibm.com: # Disallow: /internal | |
ibm.com: # Added for EI-2179 on 17Apr2020 | |
ibm.com: # Added by JLiao for SD EI-2359,EI-2360 on 23Jun2020 | |
ibm.com: #Added for EI-2216 on 06May2020 | |
ibm.com: # Added for IN4173782 on 7Aug2013 | |
ibm.com: # Added for IN4177562 on 8Aug2013 | |
ibm.com: # Added for IN4177562 on 8Aug2013 | |
ibm.com: # Added to block site mirroring | |
10010.com: # robots.txt,Baidu&SoSo&sogou&Yodao&Google spider are allowed; /bin/&/e3/&/e4/directory are disallowed. | |
scmp.com: # | |
scmp.com: # robots.txt | |
scmp.com: # | |
scmp.com: # This file is to prevent the crawling and indexing of certain parts | |
scmp.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
scmp.com: # and Google. By telling these "robots" where not to go on your site, | |
scmp.com: # you save bandwidth and server resources. | |
scmp.com: # | |
scmp.com: # This file will be ignored unless it is at the root of your host: | |
scmp.com: # Used: http://example.com/robots.txt | |
scmp.com: # Ignored: http://example.com/site/robots.txt | |
scmp.com: # | |
scmp.com: # For more information about the robots.txt standard, see: | |
scmp.com: # http://www.robotstxt.org/robotstxt.html | |
scmp.com: # | |
scmp.com: # For syntax checking, see: | |
scmp.com: # http://www.sxw.org.uk/computing/robots/check.html | |
scmp.com: # PWA | |
scmp.com: # Directories | |
scmp.com: # Path | |
scmp.com: # CSS, JS, Image | |
scmp.com: # Directories | |
scmp.com: # Files | |
scmp.com: # Paths (clean URLs) | |
scmp.com: # Paths (no clean URLs) | |
scmp.com: # Opebot - For 1plusX | |
scmp.com: # NewsNow | |
scmp.com: # GrapeShot | |
scmp.com: # Ads | |
scmp.com: # Sitemap | |
royalmail.com: # | |
royalmail.com: # robots.txt | |
royalmail.com: # | |
royalmail.com: # This file is to prevent the crawling and indexing of certain parts | |
royalmail.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
royalmail.com: # and Google. By telling these "robots" where not to go on your site, | |
royalmail.com: # you save bandwidth and server resources. | |
royalmail.com: # | |
royalmail.com: # This file will be ignored unless it is at the root of your host: | |
royalmail.com: # Used: http://example.com/robots.txt | |
royalmail.com: # Ignored: http://example.com/site/robots.txt | |
royalmail.com: # | |
royalmail.com: # For more information about the robots.txt standard, see: | |
royalmail.com: # http://www.robotstxt.org/robotstxt.html | |
royalmail.com: # | |
royalmail.com: # For syntax checking, see: | |
royalmail.com: # http://www.frobee.com/robots-txt-check | |
royalmail.com: # | |
royalmail.com: # Core rules | |
royalmail.com: # | |
royalmail.com: # CSS, JS, Images | |
royalmail.com: # Directories | |
royalmail.com: # Files | |
royalmail.com: # Paths (clean URLs) | |
royalmail.com: # Paths (no clean URLs) | |
royalmail.com: # Files | |
royalmail.com: # Paths (clean URLs) | |
royalmail.com: # Paths (no clean URLs) | |
royalmail.com: # | |
royalmail.com: # Custom rules | |
royalmail.com: # | |
royalmail.com: # Node pages & Welsh-language equivalent pages | |
royalmail.com: # Quote Journeys | |
royalmail.com: # Common causes of duplication | |
pagina12.com.ar: # robots.txt for https://www.pagina12.com.ar/ | |
bd-pratidin.com: # Crawl bd-pratidin.com, | |
mail.com: #https://www.mail.com/robots.txt | |
orf.at: # don't index redirects | |
orf.at: # these robots have been bad once: | |
google.bg: # AdsBot | |
google.bg: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
gradescope.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
gradescope.com: # | |
gradescope.com: # To ban all spiders from the entire site uncomment the next two lines: | |
gradescope.com: # User-agent: * | |
gradescope.com: # Disallow: / | |
chrono24.com: # robots.txt for http://www.chrono24.com | |
leroymerlin.ru: # UTM cleaning | |
freepik.es: # Google AdSense | |
freepik.es: # Adsbot-Google | |
freepik.es: # Twitter Bot | |
soy502.com: # | |
soy502.com: # robots.txt | |
soy502.com: # | |
soy502.com: # This file is to prevent the crawling and indexing of certain parts | |
soy502.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
soy502.com: # and Google. By telling these "robots" where not to go on your site, | |
soy502.com: # you save bandwidth and server resources. | |
soy502.com: # | |
soy502.com: # This file will be ignored unless it is at the root of your host: | |
soy502.com: # Used: http://example.com/robots.txt | |
soy502.com: # Ignored: http://example.com/site/robots.txt | |
soy502.com: # | |
soy502.com: # For more information about the robots.txt standard, see: | |
soy502.com: # http://www.robotstxt.org/wc/robots.html | |
soy502.com: # | |
soy502.com: # For syntax checking, see: | |
soy502.com: # http://www.sxw.org.uk/computing/robots/check.html | |
soy502.com: # Bot de SEO Moz | |
soy502.com: # Particulares del Sitio | |
soy502.com: # Files | |
soy502.com: # Paths (clean URLs) | |
soy502.com: # Paths (no clean URLs) | |
soy502.com: # Directories | |
soy502.com: #Disallow: /misc/ | |
soy502.com: #Disallow: /modules/ | |
soy502.com: # Disallow: /sites/ | |
soy502.com: #Allow: http://www.googleadservices.com/pagead/conversion.js | |
uwaterloo.ca: # | |
uwaterloo.ca: # robots.txt | |
uwaterloo.ca: # | |
uwaterloo.ca: # This file is to prevent the crawling and indexing of certain parts | |
uwaterloo.ca: # of your site by web crawlers and spiders run by sites like Yahoo! | |
uwaterloo.ca: # and Google. By telling these "robots" where not to go on your site, | |
uwaterloo.ca: # you save bandwidth and server resources. | |
uwaterloo.ca: # | |
uwaterloo.ca: # This file will be ignored unless it is at the root of your host: | |
uwaterloo.ca: # Used: http://example.com/robots.txt | |
uwaterloo.ca: # Ignored: http://example.com/site/robots.txt | |
uwaterloo.ca: # | |
uwaterloo.ca: # For more information about the robots.txt standard, see: | |
uwaterloo.ca: # http://www.robotstxt.org/wc/robots.html | |
uwaterloo.ca: # | |
uwaterloo.ca: # For syntax checking, see: | |
uwaterloo.ca: # http://www.sxw.org.uk/computing/robots/check.html | |
uwaterloo.ca: # Directories | |
uwaterloo.ca: # Files | |
uwaterloo.ca: # Paths (clean URLs) | |
uwaterloo.ca: # Paths (no clean URLs) | |
gaadiwaadi.com: #td-header-search-button-mob { | |
gaadiwaadi.com: #td-top-mobile-toggle i { | |
laposte.net: #mainHDBox, | |
laposte.net: #VSMP .section-promo { | |
laposte.net: #div-gpt-ad-part-home-banner-0 .regiePub { | |
familysearch.org: # LAST CHANGED: Mo Nov 2 2020, at 18:00:00 GMT+0000 (GMT) | |
familysearch.org: # Version 1.0.7 | |
familysearch.org: ## Specific rules for /wiki/ | |
familysearch.org: # Please note: There are a lot of pages on this site, and there are some misbehaved spiders out there | |
familysearch.org: # that go _way_ too fast. If you're irresponsible, your access to the site may be blocked. | |
familysearch.org: # | |
familysearch.org: # advertising-related bots: | |
familysearch.org: # Wikipedia work bots: | |
familysearch.org: # Crawlers that are kind enough to obey, but which we'd rather not have | |
familysearch.org: # unless they're feeding search engines. | |
familysearch.org: # Some bots are known to be trouble, particularly those designed to copy | |
familysearch.org: # entire sites. Please obey robots.txt. | |
familysearch.org: # Misbehaving: requests much too fast: | |
familysearch.org: # | |
familysearch.org: # Sorry, wget in its recursive mode is a frequent problem. | |
familysearch.org: # Please read the man page and use it properly; there is a | |
familysearch.org: # --wait option you can use to set the delay between hits, | |
familysearch.org: # for instance. | |
familysearch.org: # | |
familysearch.org: # | |
familysearch.org: # The 'grub' distributed client has been *very* poorly behaved. | |
familysearch.org: # | |
familysearch.org: # | |
familysearch.org: # Doesn't follow robots.txt anyway, but... | |
familysearch.org: # | |
familysearch.org: # | |
familysearch.org: # Hits many times per second, not acceptable | |
familysearch.org: # http://www.nameprotect.com/botinfo.html | |
familysearch.org: # A capture bot, downloads gazillions of pages with no public benefit | |
familysearch.org: # http://www.webreaper.net/ | |
familysearch.org: # Wayback Machine | |
familysearch.org: # User-agent: archive.org_bot | |
familysearch.org: # Treated like anyone else | |
familysearch.org: # Allow the Internet Archiver to index action=raw and thereby store the raw wikitext of pages | |
familysearch.org: # | |
familysearch.org: # Friendly, low-speed bots are welcome viewing article pages, but not | |
familysearch.org: # dynamically-generated pages please. | |
familysearch.org: # | |
familysearch.org: # Inktomi's "Slurp" can read a minimum delay between hits; if your | |
familysearch.org: # bot supports such a thing using the 'Crawl-delay' or another | |
familysearch.org: # instruction, please let us know. | |
familysearch.org: # | |
familysearch.org: # There is a special exception for API mobileview to allow dynamic | |
familysearch.org: # mobile web & app views to load section content. | |
familysearch.org: # These views aren't HTTP-cached but use parser cache aggressively | |
familysearch.org: # and don't expose special: pages etc. | |
familysearch.org: # | |
familysearch.org: # Another exception is for REST API documentation, located at | |
familysearch.org: # /api/rest_v1/?doc. | |
familysearch.org: # | |
familysearch.org: # Disallow indexing of non-article content | |
familysearch.org: # | |
rivals.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
zhuwang.cc: # | |
zhuwang.cc: # robots.txt for PHPCMS v9 | |
zhuwang.cc: # | |
windowscentral.com: # | |
windowscentral.com: # robots.txt | |
windowscentral.com: # | |
windowscentral.com: # This file is to prevent the crawling and indexing of certain parts | |
windowscentral.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
windowscentral.com: # and Google. By telling these "robots" where not to go on your site, | |
windowscentral.com: # you save bandwidth and server resources. | |
windowscentral.com: # | |
windowscentral.com: # This file will be ignored unless it is at the root of your host: | |
windowscentral.com: # Used: http://example.com/robots.txt | |
windowscentral.com: # Ignored: http://example.com/site/robots.txt | |
windowscentral.com: # | |
windowscentral.com: # For more information about the robots.txt standard, see: | |
windowscentral.com: # http://www.robotstxt.org/robotstxt.html | |
windowscentral.com: # | |
windowscentral.com: # For syntax checking, see: | |
windowscentral.com: # http://www.frobee.com/robots-txt-check | |
windowscentral.com: # Directories | |
windowscentral.com: # Files | |
windowscentral.com: # Paths (clean URLs) | |
windowscentral.com: # Paths (no clean URLs) | |
goldprice.org: # | |
goldprice.org: # robots.txt | |
goldprice.org: # | |
goldprice.org: # This file is to prevent the crawling and indexing of certain parts | |
goldprice.org: # of your site by web crawlers and spiders run by sites like Yahoo! | |
goldprice.org: # and Google. By telling these "robots" where not to go on your site, | |
goldprice.org: # you save bandwidth and server resources. | |
goldprice.org: # | |
goldprice.org: # This file will be ignored unless it is at the root of your host: | |
goldprice.org: # Used: http://example.com/robots.txt | |
goldprice.org: # Ignored: http://example.com/site/robots.txt | |
goldprice.org: # | |
goldprice.org: # For more information about the robots.txt standard, see: | |
goldprice.org: # http://www.robotstxt.org/robotstxt.html | |
goldprice.org: # CSS, JS, Images | |
goldprice.org: # Directories | |
goldprice.org: # Files | |
goldprice.org: # Paths (clean URLs) | |
goldprice.org: # Paths (no clean URLs) | |
lequipe.fr: #V6 - Tous les moteurs sont concernés | |
lequipe.fr: # | |
lequipe.fr: # | |
lequipe.fr: # le terme suivant nous permet de limiter les temps de passage des robots à 120 secondes | |
lequipe.fr: # | |
lequipe.fr: # | |
lequipe.fr: # les fichiers suivants ne seront pas indexés par les moteurs | |
lequipe.fr: # | |
lequipe.fr: #ActualitesId < 1000000 | |
lequipe.fr: #Allow key news | |
lequipe.fr: #Old lives | |
yandex.net: # yandex.ru | |
officedepot.com: # Robots.txt file for http://www.officedepot.com | |
tn.gov.in: # | |
tn.gov.in: # robots.txt | |
tn.gov.in: # | |
tn.gov.in: # This file is to prevent the crawling and indexing of certain parts | |
tn.gov.in: # of your site by web crawlers and spiders run by sites like Yahoo! | |
tn.gov.in: # and Google. By telling these "robots" where not to go on your site, | |
tn.gov.in: # you save bandwidth and server resources. | |
tn.gov.in: # | |
tn.gov.in: # This file will be ignored unless it is at the root of your host: | |
tn.gov.in: # Used: http://example.com/robots.txt | |
tn.gov.in: # Ignored: http://example.com/site/robots.txt | |
tn.gov.in: # | |
tn.gov.in: # For more information about the robots.txt standard, see: | |
tn.gov.in: # http://www.robotstxt.org/robotstxt.html | |
tn.gov.in: # CSS, JS, Images | |
tn.gov.in: # Directories | |
tn.gov.in: # Files | |
tn.gov.in: # Paths (clean URLs) | |
tn.gov.in: # Paths (no clean URLs) | |
techsmith.com: #09 May 2019 | |
techsmith.com: # Wrapped Pages | |
techsmith.com: #FTP and other Pages | |
nydailynews.com: # Googlebot | |
nydailynews.com: # MSN bot | |
nydailynews.com: # MSNbot media | |
nydailynews.com: # Yahoo bot | |
nydailynews.com: # Alexa IA Archiver bot | |
nydailynews.com: # MJ12bot | |
nydailynews.com: # Proximic bot | |
alnilin.com: # enabled this >> may slow down server | |
alnilin.com: #end of enabled this >> may slow down server | |
alnilin.com: # Google Image | |
alnilin.com: # Google AdSense | |
alnilin.com: # digg mirror | |
alnilin.com: # global | |
ranker.com: # Sitemap files | |
yandex.by: # yandex.by | |
1und1.de: # Sitemap | |
lapatilla.com: # Sitemap archive | |
ignou.ac.in: # $Id: robots.txt,v 1.9.2.1 2008/12/10 20:12:19 goba Exp $ | |
ignou.ac.in: # | |
ignou.ac.in: # robots.txt | |
ignou.ac.in: # | |
ignou.ac.in: # This file is to prevent the crawling and indexing of certain parts | |
ignou.ac.in: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ignou.ac.in: # and Google. By telling these "robots" where not to go on your site, | |
ignou.ac.in: # you save bandwidth and server resources. | |
ignou.ac.in: # | |
ignou.ac.in: # This file will be ignored unless it is at the root of your host: | |
ignou.ac.in: # Used: http://example.com/robots.txt | |
ignou.ac.in: # Ignored: http://example.com/site/robots.txt | |
ignou.ac.in: # | |
ignou.ac.in: # For more information about the robots.txt standard, see: | |
ignou.ac.in: # http://www.robotstxt.org/wc/robots.html | |
ignou.ac.in: # | |
ignou.ac.in: # For syntax checking, see: | |
ignou.ac.in: # http://www.sxw.org.uk/computing/robots/check.html | |
ignou.ac.in: # Directories | |
ignou.ac.in: # Files | |
ignou.ac.in: # Paths (clean URLs) | |
ignou.ac.in: # Paths (no clean URLs) | |
rarible.com: # https://www.robotstxt.org/robotstxt.html | |
healthcare.gov: # robots.txt for healthcare.gov | |
healthcare.gov: # Directories | |
healthcare.gov: # Directories | |
healthcare.gov: # Directories | |
healthcare.gov: # dynamic posts | |
qeqeqe.com: # | |
qeqeqe.com: # robots.txt for PHPCMS v9 | |
qeqeqe.com: # | |
up.gov.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
up.gov.in: #content{margin:0 0 0 2%;position:relative;} | |
chanel.com: #PRESSRELEASE | |
chanel.com: #MY-ACCOUNT | |
chanel.com: #CORPORATE | |
chanel.com: #SITEMAP | |
chanel.com: #NEW-WFJ | |
chanel.com: #ONE | |
chanel.com: #SRQ0140051 | |
chanel.com: #FF LATAM | |
nespresso.com: # Sitemap | |
nespresso.com: # Russia | |
nespresso.com: # Quickview | |
nespresso.com: # Mosaic + # Index_Ecapi | |
nespresso.com: # At work | |
nespresso.com: # Scrappers | |
nespresso.com: # Responsive | |
unza.zm: # | |
unza.zm: # robots.txt | |
unza.zm: # | |
unza.zm: # This file is to prevent the crawling and indexing of certain parts | |
unza.zm: # of your site by web crawlers and spiders run by sites like Yahoo! | |
unza.zm: # and Google. By telling these "robots" where not to go on your site, | |
unza.zm: # you save bandwidth and server resources. | |
unza.zm: # | |
unza.zm: # This file will be ignored unless it is at the root of your host: | |
unza.zm: # Used: http://example.com/robots.txt | |
unza.zm: # Ignored: http://example.com/site/robots.txt | |
unza.zm: # | |
unza.zm: # For more information about the robots.txt standard, see: | |
unza.zm: # http://www.robotstxt.org/robotstxt.html | |
unza.zm: # CSS, JS, Images | |
unza.zm: # Directories | |
unza.zm: # Files | |
unza.zm: # Paths (clean URLs) | |
unza.zm: # Paths (no clean URLs) | |
videvo.net: # Dissalow common language variations | |
logmeininc.com: # Sitemaps and Autodiscovers | |
pccomponentes.com: # | |
pccomponentes.com: ################################################################################################################################### | |
pccomponentes.com: # | |
pccomponentes.com: # Bienvenido a Robots.txt de PcComponentes :) | |
pccomponentes.com: # | |
pccomponentes.com: ################################################################################################################################### | |
pccomponentes.com: # _____ _ _ | |
pccomponentes.com: # / ____| | | | | | |
pccomponentes.com: # | | ___ _ __ ___| |_ _ __ _ _ _ _ ___ _ __ __| | ___ | |
pccomponentes.com: # | | / _ \| '_ \/ __| __| '__| | | | | | |/ _ \ '_ \ / _` |/ _ \ | |
pccomponentes.com: # | |___| (_) | | | \__ \ |_| | | |_| | |_| | __/ | | | (_| | (_) | | |
pccomponentes.com: # \_____\___/|_| |_|___/\__|_| \__,_|\__, |\___|_| |_|\__,_|\___/ | |
pccomponentes.com: # | | __/ | | |
pccomponentes.com: # __ _| | __ _ ___ |___/ | |
pccomponentes.com: # / _` | |/ _` |/ _ \ | |
pccomponentes.com: # | (_| | | (_| | (_) | | |
pccomponentes.com: # \__,_|_|\__, |\___/ _ __ | |
pccomponentes.com: # __/ | | | \ \ | |
pccomponentes.com: # __ _ _ |___/ _ _ __ __| | ___ (_) | | |
pccomponentes.com: # / _` | '__/ _` | '_ \ / _` |/ _ \ | | | |
pccomponentes.com: # | (_| | | | (_| | | | | (_| | __/ _| | | |
pccomponentes.com: # \__, |_| \__,_|_| |_|\__,_|\___| (_) | | |
pccomponentes.com: # __/ | /_/ | |
pccomponentes.com: # |___/ | |
pccomponentes.com: # | |
pccomponentes.com: ################################################################################################################################### | |
pccomponentes.com: ## GENERAL SETTINGS | |
pccomponentes.com: ## SITEMAPS | |
pccomponentes.com: # SITEMAP INDEX | |
pccomponentes.com: # SITEMAP PRODUCTS INDEX | |
pccomponentes.com: # SITEMAP PRODUCTS BY PARENT | |
pccomponentes.com: # SITEMAP CATEGORIES | |
pccomponentes.com: # SITEMAP AMP PRODUCTS INDEX | |
pccomponentes.com: # SITEMAP AMP PRODUCTS BY PARENT | |
pccomponentes.com: # SITEMAP BLOG | |
pccomponentes.com: ## PRIVATE URLS | |
pccomponentes.com: # Disallow: /login | |
pccomponentes.com: ## CORPORATIVE PAGES | |
pccomponentes.com: # STATICS | |
pccomponentes.com: ## CRAWL BUDGET OPTIMIZATION | |
pccomponentes.com: # 3+ FILTERS | |
pccomponentes.com: # QUERIES | |
pccomponentes.com: # AVAILABILITY UX | |
pccomponentes.com: # PRICE HISTORY | |
pccomponentes.com: # COMPLEMENTS | |
pccomponentes.com: #Disallow: /a/complements/* | |
pccomponentes.com: #RECOMMENDER | |
pccomponentes.com: # OLD IMAGE LOCATIONS | |
pccomponentes.com: # TECHNICAL RESOURCES | |
pccomponentes.com: # TECHNICAL ISSUES | |
pccomponentes.com: # REFURBISHED PRODUCTS - Commented on 12-14-2018 | |
pccomponentes.com: #Disallow: /*reacondicionado$ | |
pccomponentes.com: #Disallow: /*/reacondicionado$ | |
pccomponentes.com: #Disallow: /*reacondicionado* | |
pccomponentes.com: #Allow: /portatiles/*reacondicionado | |
pccomponentes.com: #Disallow: /*refurbished* | |
pccomponentes.com: #Disallow: /*-cpo-libre$ | |
pccomponentes.com: #Disallow: /*-recertified$ | |
pccomponentes.com: #Disallow: /rastrillo/* | |
pccomponentes.com: # DOCS, PDF AND MEDIA | |
pccomponentes.com: # USELESS RESOURCES | |
pccomponentes.com: # GROUNDWORK | |
pccomponentes.com: #Disallow: /ofertas/* | |
pccomponentes.com: # TRACKING CODES | |
pccomponentes.com: # BRANDED PAGES AS DUPLICATES | |
pccomponentes.com: ## SPECIFIC BOTS SETTINGS | |
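The commented-out REFURBISHED rules above pair a broad `Disallow: /*reacondicionado*` with a narrower `Allow: /portatiles/*reacondicionado`. Google and RFC 9309 both use a longest-match rule: the rule with the longer pattern wins, and `Allow` wins a tie. The following is a rough Python sketch of that resolution. It assumes `*` and a trailing `$` are the only special characters. `is_allowed` and `rule_matches` are illustrative names, not a library API.

```python
import re
from typing import List, Tuple

Rule = Tuple[str, str]  # ("allow" | "disallow", path pattern)


def rule_matches(pattern: str, path: str) -> bool:
    """Prefix match where '*' is any sequence and a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = re.compile(".*".join(re.escape(p) for p in pattern.split("*")), re.DOTALL)
    return bool(regex.fullmatch(path) if anchored else regex.match(path))


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Longest matching pattern wins; on equal length, allow beats disallow."""
    best = None  # (pattern length, is_allow) of the winning rule so far
    for kind, pattern in rules:
        if rule_matches(pattern, path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the path may be crawled.
    return True if best is None else best[1]


rules = [
    ("disallow", "/*reacondicionado*"),
    ("allow", "/portatiles/*reacondicionado"),
]
print(is_allowed(rules, "/portatiles/hp-reacondicionado"))   # True
print(is_allowed(rules, "/moviles/iphone-reacondicionado"))  # False
```

Comparing `(length, is_allow)` tuples folds both the length rule and the tie-break into a single comparison.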
nation.africa: # robots.txt for https://nation.africa/ -- Nation | |
ask.fm: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
ask.fm: # | |
ask.fm: # To ban all spiders from the entire site uncomment the next two lines: | |
ask.fm: # User-agent: * | |
ask.fm: # Disallow: / | |
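The ask.fm comment notes that the two ban-everything lines only take effect once the leading `#` is removed, because a parser treats everything after `#` as a comment. The following small check uses Python's standard `urllib.robotparser` and assumes a hypothetical site at `https://www.example.com`.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_lines, url, agent="ExampleBot"):
    """Parse robots.txt lines and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


commented = ["# User-agent: *", "# Disallow: /"]
uncommented = ["User-agent: *", "Disallow: /"]

print(can_crawl(commented, "https://www.example.com/any/page"))    # True
print(can_crawl(uncommented, "https://www.example.com/any/page"))  # False
```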
sears.com: # 2020_04_22-B | |
sears.com: # Sears SEO Team | |
sears.com: # www.sears.com | |
sears.com: #Disallow: *sid=IDx20141117x00001xlpla* | |
sears.com: #Lumen #17857110 | |
sears.com: #Legal #04012019 | |
sears.com: #Lumen #18359173 | |
sears.com: # Category | |
sears.com: # Product | |
sears.com: # Misc | |
sears.com: # Marketplace Sellers | |
sears.com: # Brands Extended | |
sears.com: # Images | |
sears.com: #Sitemap: http://www.sears.com/Sitemap_Index_Image_1.xml | |
sears.com: #Sitemap: http://www.sears.com/Sitemap_Index_Image_MP_1.xml | |
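Sears keeps two image sitemaps commented out, so parsers never see them. Python's `urllib.robotparser` (3.8 and later) exposes the active `Sitemap:` lines through `site_maps()`. The active sitemap URL below is a made-up placeholder.

```python
from urllib.robotparser import RobotFileParser

robots_lines = [
    "User-agent: *",
    "Disallow: /checkout/",
    "Sitemap: https://www.example.com/Sitemap_Index_1.xml",  # hypothetical active sitemap
    "#Sitemap: http://www.sears.com/Sitemap_Index_Image_1.xml",
    "#Sitemap: http://www.sears.com/Sitemap_Index_Image_MP_1.xml",
]

parser = RobotFileParser()
parser.parse(robots_lines)
# Only the uncommented directive is reported; site_maps() returns None if there are none.
print(parser.site_maps())
```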
smugmug.com: # If you're reading this, you belong at a job you love: https://www.smugmug.com/jobs/ | |
smugmug.com: # See https://secure.smugmug.com/help/contact if you'd like to apply to be whitelisted for crawling SmugMug | |
seneweb.com: # | |
seneweb.com: # Begin Standard Rules | |
seneweb.com: # | |
seneweb.com: # Allow Adsense | |
seneweb.com: # | |
seneweb.com: # | |
seneweb.com: # Sitemap Files | |
google.com.do: # AdsBot | |
google.com.do: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
axios.com: # 8fkux4mqab196bbs | |
bancadigitalbod.com: #robots.txt for all our sites | |
eldorado.ru: #2020-12-29 | |
appcast.io: # https://protect-de.mimecast.com/s/2FukCpZ4L4TvloxYcPCF4b?domain=robotstxt.org | |
appcast.io: # Allow crawling of all content | |
sodapdf.com: # This file can be used to affect how search engines and other web site crawlers see your site. | |
sodapdf.com: # For more information, please see http://www.w3.org/TR/html4/appendix/notes.html#h-B.4.1.1 | |
sodapdf.com: # WebMatrix 2.0 | |
sodapdf.com: # | |
sodapdf.com: # | |
sodapdf.com: # production server: sodapdf.com | |
sodapdf.com: # Last modified on: 2014-01-27 | |
sodapdf.com: # | |
sodapdf.com: # | |
sodapdf.com: # | |
sodapdf.com: # homesite resources: | |
sodapdf.com: # | |
sodapdf.com: # Disallow: /*/join | |
sodapdf.com: # Disallow: /join | |
sodapdf.com: # | |
sodapdf.com: # | |
sodapdf.com: # external resources: | |
sodapdf.com: # | |
jimdo.com: # en | |
jimdo.com: # de | |
jimdo.com: # es | |
jimdo.com: # fr | |
jimdo.com: # it | |
jimdo.com: # jp | |
jimdo.com: # nl | |
win-rar.com: # sitemap | |
blog.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead. | |
blog.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details. | |
blog.wordpress.com: # This file was generated on Thu, 28 Jan 2021 13:30:28 +0000 | |
finn.no: # Notice: Crawling FINN.no is prohibited unless you have written permission. | |
finn.no: # See the terms and conditions in the footer: | |
finn.no: # Innholdet er beskyttet etter åndsverksloven. Bruk av automatiserte tjenester (roboter, spidere, indeksering m.m.) | |
finn.no: # samt andre fremgangsmåter for systematisk eller regelmessig bruk er ikke tillatt uten eksplisitt samtykke fra FINN.no. | |
finn.no: # | |
finn.no: # Outdated CMS articles. | |
finn.no: # Don't index searches for old gallery urls | |
finn.no: # FAS shortcut links: | |
finn.no: # If googlebot respects that these 301 to a disallowed page, we can remove these. | |
finn.no: # 50k weekly. Many google referrers | |
finn.no: # 20k weekly. Many google referrers | |
finn.no: # 4k weekly. Many google referrers | |
finn.no: # Deprecated, and should be removed | |
finn.no: # If googlebot respects that these 301 to a disallowed page, we can remove these. | |
finn.no: # 8k weekly | |
finn.no: # 363 weekly | |
finn.no: # 5k weekly | |
finn.no: # 324 weekly | |
finn.no: # motor rules | |
finn.no: # disallow all car, nyttekjørety, mc and boat FAS and ad pages: | |
finn.no: # bil vertical | |
finn.no: # ...but allow the landing pages. Pages with parameters are blocked with the meta tag | |
finn.no: # nyttekjøretøy | |
finn.no: # vanused and vanimport already handled in bil | |
finn.no: # mc | |
finn.no: # båt | |
finn.no: # Eiendom | |
finn.no: # Bolig til leie - result page. Filtered search is blocked by meta tag | |
finn.no: # disallow all realestate FAS and ad pages: | |
finn.no: # BAP | |
finn.no: # Indexable, with some exceptions, see end of file | |
finn.no: # JOB | |
finn.no: # Indexable, with some exceptions, see end of file | |
finn.no: # PAL (PArtnerLøsning) | |
finn.no: # disallow ad and search pages | |
finn.no: # REISE | |
finn.no: # disallow flight search results | |
finn.no: # Personal Finance | |
finn.no: # iAd - these are outdated | |
finn.no: # Shopping was discontinued 2019. | |
finn.no: # Misc | |
finn.no: # Exceptions: | |
finn.no: # Twitterbot is allowed for Twitter Cards to work | |
webteb.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
webteb.com: #content{margin:0 0 0 2%;position:relative;} | |
jarir.com: # # Robots.txt for Magento Community and Enterprise | |
jarir.com: # # GENERAL SETTINGS | |
jarir.com: # Enables robots.txt rules for all crawlers | |
jarir.com: # # Crawl-delay parameter: the number of seconds you want to wait between successful requests to the same server. | |
jarir.com: # # Magento sitemap: URL to your sitemap file in Magento | |
jarir.com: # # Settings that relate to the UNDER CONSTRUCTION | |
jarir.com: # # GENERAL SETTINGS For MAGENTO | |
jarir.com: # # General technical Magento directory | |
jarir.com: # # Do not index the shared files Magento | |
jarir.com: # # MAGENTO SEA IMPROVEMENT | |
jarir.com: # # Do not index the page subcategories that are sorted or filtered. | |
jarir.com: # # Do not index the second copy of the home page (example.com / index.php /). Un-comment only if you have activated Magento SEO URLs. | |
jarir.com: # # Disallow: /index.php/ | |
jarir.com: # # Do not index the link from the session ID | |
jarir.com: # # Do not index the page checkout and user account | |
jarir.com: # # Server Settings | |
jarir.com: # # Do not index the general technical directories and files on a server | |
jarir.com: # stop not needed bots from site crawl | |
jarir.com: # TurnitinBot/1.5 (http://www.turnitin.com/robot/crawlerinfo.html) | |
jarir.com: # NPBot-1/2.0 (http://www.nameprotect.com/botinfo.html) | |
jarir.com: # sitecheck.internetseer.com (For more info see: http://sitecheck.internetseer.com) | |
jarir.com: # Rumours-Agent | |
jarir.com: # larbin_2.6.2 (larbin2.6.2@unspecified.mail) | |
jarir.com: # http://www.almaden.ibm.com/cs/crawler [wf160] | |
jarir.com: # http://www.almaden.ibm.com/cs/crawler [c01] | |
jarir.com: # Teleport Pro (http://www.tenmax.com/teleport/pro/home.htm) | |
jarir.com: # WebCopier | |
jarir.com: # WebStripper (http://www.webstripper.net/) | |
jarir.com: # MSIECrawler - added 12/02/04 | |
jarir.com: # Openbot (http://www.openfind.com.tw/robot.html) - added 12/02/04 | |
jarir.com: # WebZIP (http://www.spidersoft.com/webzip/default.asp) - added 08/03/04 | |
jarir.com: # QuepasaCreep - added 28/06/04 | |
jarir.com: # WebReaper (http://www.webreaper.net/) - added 20/07/04 | |
jarir.com: # SuperBot (aka Website Downloader ?) - added 02/08/04 | |
jarir.com: # wget - added 16/08/04 | |
jarir.com: # Web Downloader - added 29/10/04 | |
jarir.com: # ShopWiki bot - added 18/08/06 | |
jarir.com: # MSRBOT - added 29/09/06 | |
jarir.com: # DugMirror - added 07/01/07 | |
jarir.com: # Twiceler-0.9 http://www.cuill.com/twiceler/robot.html - added 31/07/07 | |
jarir.com: # pagenest.com - added 01/06/08 | |
jarir.com: # dotbot - added 29/07/08 | |
jarir.com: # discobot - added 26/12/08 | |
jarir.com: # SimilarPages Nutch Crawler | |
jarir.com: # added 07/06/09 | |
jarir.com: # added 17/01/10 | |
jarir.com: # added 29/06/10 | |
jarir.com: # added 26/11/10 | |
jarir.com: # added 12/12/10 | |
jarir.com: # added 13/02/11 | |
jarir.com: # added 15/04/11 | |
jarir.com: # added 02/06/11 | |
jarir.com: # added 09/06/11 | |
jarir.com: # added 10/07/11 | |
jarir.com: # added 28/07/11 | |
jarir.com: # added 09/08/11 | |
jarir.com: # added 24/09/11 | |
jarir.com: # added 05/10/11 | |
jarir.com: # added 08/10/11 | |
jarir.com: # added 18/12/11 | |
jarir.com: # added 15/01/12 | |
jarir.com: # added 26/01/12 | |
jarir.com: # added 30/01/12 | |
jarir.com: # added 01/03/12 | |
jarir.com: # added 09/06/12 | |
jarir.com: # added 10/06/12 | |
jarir.com: # added 30/07/12 | |
jarir.com: # added 28/08/12 | |
jarir.com: # added 05/10/12 | |
jarir.com: # added 21/11/12 | |
jarir.com: # added 11/02/13 | |
jarir.com: # added 22/02/13 | |
jarir.com: # added 26/06/13 | |
jarir.com: # added 17/07/13 | |
jarir.com: # added 24/11/13 | |
jarir.com: # added 11/12/13 | |
jarir.com: # added 25/01/14 | |
jarir.com: # added 11/02/14 | |
jarir.com: # added 26/03/14 | |
jarir.com: # added 27/03/14 | |
jarir.com: # added 29/03/14 | |
jarir.com: # added 06/04/14 | |
jarir.com: # added 11/04/14 | |
jarir.com: # added 05/10/14 | |
jarir.com: # added 08/09/15 | |
jarir.com: # added 10/12/15 | |
jarir.com: # added 22/03/16 | |
jarir.com: # added 21/05/16 | |
jarir.com: #Block YandexBots 07/08/19 | |
jarir.com: # Block SemrushBot 07/08/19 | |
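jarir.com's file sets a `Crawl-delay` for all crawlers and then blocks a long list of named bots. The following sketch shows how Python's `urllib.robotparser` reads that combination. It uses `wget` from the list above and an invented 10-second delay. Keep three caveats in mind:

- This parser matches user agents by case-insensitive substring.
- It applies rules by first match in file order.
- Many crawlers, Googlebot included, ignore `Crawl-delay` altogether.

```python
from urllib.robotparser import RobotFileParser

robots_lines = [
    "User-agent: wget",      # a named bot, blocked from the whole site
    "Disallow: /",
    "",
    "User-agent: *",         # every other crawler
    "Crawl-delay: 10",       # seconds between requests (assumed value)
    "Disallow: /checkout/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

site = "https://www.example.com"
print(parser.can_fetch("Wget/1.21", site + "/"))               # False
print(parser.can_fetch("Googlebot", site + "/checkout/cart"))  # False
print(parser.can_fetch("Googlebot", site + "/laptops"))        # True
print(parser.crawl_delay("Googlebot"))                         # 10
print(parser.crawl_delay("Wget/1.21"))                         # None: its group sets no delay
```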
allstate.com: # Disallow: /blog/wp-admin/ | |
allstate.com: # Disallow: /blog/wp-includes/ | |
work.ua: # robots.txt | |
balenciaga.com: # Pages | |
balenciaga.com: # Product | |
balenciaga.com: # PLP | |
balenciaga.com: #sitemaps | |
bom.gov.au: ################################# | |
bom.gov.au: # robots.txt for www.bom.gov.au # | |
bom.gov.au: ################################# | |
bom.gov.au: ##################### | |
bom.gov.au: # Rules for Radian6 # | |
bom.gov.au: ##################### | |
bom.gov.au: ######################## | |
bom.gov.au: # Rules for all robots # | |
bom.gov.au: # except Googlebot # | |
bom.gov.au: ######################## | |
bom.gov.au: ###################### | |
bom.gov.au: # Removed 24/02/2016 # | |
bom.gov.au: ###################### | |
bom.gov.au: # Disallow: /clim_data/ | |
bom.gov.au: # Disallow: /climate/annual_sum/ | |
bom.gov.au: ###################### | |
bom.gov.au: # Added azv 2017/12/13 | |
bom.gov.au: # Disallow: /water/newEvents/document/ | |
bom.gov.au: # Disallow: /water/designRainfalls/document/ | |
bom.gov.au: # Disallow: /cyclone/history/pdf/ | |
bom.gov.au: ####################### | |
bom.gov.au: # Scripts and styling # | |
bom.gov.au: ####################### | |
bom.gov.au: ################################ | |
bom.gov.au: # Document directories # | |
bom.gov.au: # REMOVED 24/09/2020 ASK226937 # | |
bom.gov.au: ################################ | |
bom.gov.au: # Disallow: /docs/ | |
bom.gov.au: # Disallow: /Docs/ | |
bom.gov.au: # Disallow: /document/ | |
bom.gov.au: # Disallow: /DOCUMENT/ | |
bom.gov.au: # Disallow: /documents/ | |
bom.gov.au: # Disallow: /DOCUMENTS/ | |
bom.gov.au: # Disallow: /pdf/ | |
bom.gov.au: # Disallow: /PDF/ | |
bom.gov.au: # Disallow: /pdfs/ | |
bom.gov.au: ###################### | |
bom.gov.au: # Removed 24/02/2016 # | |
bom.gov.au: ###################### | |
bom.gov.au: # Disallow: /clim_data/ | |
bom.gov.au: # Disallow: /climate/averages/climatology/relhum/ | |
bom.gov.au: # Disallow: /climate/averages/climatology/sunshine_hours/ | |
bom.gov.au: # Disallow: /climate/averages/climatology/windroses/ | |
bom.gov.au: # Disallow: /climate/averages/wind/ | |
bom.gov.au: # Disallow: /climate/change/ | |
bom.gov.au: # Disallow: /climate/enso/archive/ | |
bom.gov.au: # Disallow: /climate/extremes/ # Removed 24/02/2016 stefanw | |
bom.gov.au: # Disallow: /climate/forms/map_forms/ | |
bom.gov.au: # Disallow: /climate/map/anual_rainfall/ | |
bom.gov.au: # Disallow: /climate/map/graphs/monthly_rain/idl_graphs/ | |
bom.gov.au: # Disallow: /climate/map/pics/ | |
bom.gov.au: # Disallow: /climate/pccsp/ | |
bom.gov.au: ###################### | |
bom.gov.au: ####################### | |
bom.gov.au: # Rules for Googlebot # | |
bom.gov.au: ####################### | |
bom.gov.au: ########################### | |
bom.gov.au: # Index State based pages # | |
bom.gov.au: ########################### | |
bom.gov.au: ############################### | |
bom.gov.au: # Don't index these filetypes # | |
bom.gov.au: ############################### | |
bom.gov.au: # Disallow: /*.pdf$ | |
bom.gov.au: #Disallow: /*.txt$ | |
bom.gov.au: # Added azv 2017/12/13 | |
bom.gov.au: ####################### | |
bom.gov.au: # Scripts and styling # | |
bom.gov.au: ####################### | |
bom.gov.au: ################################ | |
bom.gov.au: # Document directories # | |
bom.gov.au: # REMOVED 24/09/2020 ASK226937 # | |
bom.gov.au: ################################ | |
bom.gov.au: # Disallow: /docs/ | |
bom.gov.au: # Disallow: /Docs/ | |
bom.gov.au: # Disallow: /document/ | |
bom.gov.au: # Disallow: /DOCUMENT/ | |
bom.gov.au: # Disallow: /documents/ | |
bom.gov.au: # Disallow: /DOCUMENTS/ | |
bom.gov.au: # Disallow: /pdf/ | |
bom.gov.au: # Disallow: /PDF/ | |
bom.gov.au: # Disallow: /pdfs/ | |
bom.gov.au: # Disallow: /amm/ # Removed 24/02/2016 | |
bom.gov.au: # Disallow: /careers/docs/ | |
bom.gov.au: ###################### | |
bom.gov.au: # Removed 24/02/2016 # | |
bom.gov.au: ###################### | |
bom.gov.au: # Disallow: /climate/averages/climatology/relhum/ | |
bom.gov.au: # Disallow: /climate/averages/climatology/sunshine_hours/ | |
bom.gov.au: # Disallow: /climate/averages/climatology/windroses/ | |
bom.gov.au: # Disallow: /climate/averages/wind/ | |
bom.gov.au: # Disallow: /climate/change/ | |
bom.gov.au: # Disallow: /climate/enso/archive/ | |
bom.gov.au: # Disallow: /climate/extremes/ | |
bom.gov.au: # Disallow: /climate/forms/map_forms/ | |
bom.gov.au: # Disallow: /climate/map/anual_rainfall/ | |
bom.gov.au: # Disallow: /climate/map/graphs/monthly_rain/idl_graphs/ | |
bom.gov.au: # Disallow: /climate/map/pics/ | |
bom.gov.au: # Disallow: /climate/pccsp/ | |
bom.gov.au: ############################## | |
bom.gov.au: ############################### | |
bom.gov.au: # Allowed Wildcard Exceptions # | |
bom.gov.au: ############################### | |
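The Bureau of Meteorology file keeps one rule set for "all robots except Googlebot" and a separate one for Googlebot. This works because a crawler obeys only the group that names it and ignores the `*` group entirely. The following sketch uses `urllib.robotparser`. The paths come from the comments above, but the assignment of rules to groups is assumed for illustration.

```python
from urllib.robotparser import RobotFileParser

robots_lines = [
    "User-agent: Googlebot",
    "Disallow: /careers/docs/",  # assumed Googlebot-only rule
    "",
    "User-agent: *",
    "Disallow: /clim_data/",     # assumed rule for all other robots
]

parser = RobotFileParser()
parser.parse(robots_lines)

url = "https://www.example.com/clim_data/index.html"
# Googlebot matches its own group, so the "*" rules never apply to it.
print(parser.can_fetch("Googlebot", url))  # True
print(parser.can_fetch("Bingbot", url))    # False
```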
google.co.nz: # AdsBot | |
google.co.nz: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
jiji.ng: #─███─███─█───█ | |
jiji.ng: #─█───█───█───█ | |
jiji.ng: #─███─███─█───█ | |
jiji.ng: #───█─█───█───█ | |
jiji.ng: #─███─███─███─███ | |
jiji.ng: #─███─███ | |
jiji.ng: #──█───█ | |
jiji.ng: #──█───█ | |
jiji.ng: #──█───█ | |
jiji.ng: #─███──█ | |
jiji.ng: #───██─███───██─███ | |
jiji.ng: #────█──█─────█──█ | |
jiji.ng: #────█──█─────█──█ | |
jiji.ng: #─█──█──█──█──█──█ | |
jiji.ng: #─████─███─████─███ | |
jiji.ng: #─███─███ | |
jiji.ng: #──█───█ | |
jiji.ng: #──█───█ | |
jiji.ng: #──█───█ | |
jiji.ng: #─███──█ | |
acs.org: # robots.txt for http://www.acs.org/ | |
ekstrabladet.dk: # robots.txt, ekstrabladet.dk | |
ekstrabladet.dk: # adsense | |
isbank.com.tr: # sitemaps | |
ladilsa.com: # ladilsa.com (Fri Oct 27 14:25:58 2017) | |
brobible.com: # Sitemap archive | |
zoho.in: # ------------------------------------------ | |
zoho.in: # ZOHO Corp. -- http://www.zoho.com | |
zoho.in: # Robot Exclusion File -- robots.txt | |
zoho.in: # Author: Zoho Creative | |
zoho.in: # Last Updated: 24/12/2020 | |
zoho.in: # ------------------------------------------ | |
zoho.in: # unwanted list taken from zoho search list | |
zoho.in: # unwanted list taken from zoho search list | |
zoho.in: # unwanted list taken from zoho search for zoholics | |
zoho.in: # unwanted list taken from zoho search for zoho | |
newsmax.com: # Disallow: /archives/ | |
computerhope.com: # robots.txt file for https://www.computerhope.com | |
computerhope.com: # Send comments about this file to <URL:https://www.computerhope.com/contact/> | |
computerhope.com: # Disobeying the rules of the robots.txt will cause your IP to be banned. | |
computerhope.com: # Last updated: 5/23/2017 | |
computerhope.com: #Internet Archive doesn't need to archive cgi-bin | |
computerhope.com: #Other bots not allowed | |
rpp.pe: ######################### | |
rpp.pe: ## ## | |
rpp.pe: ## Grupo RPP ## | |
rpp.pe: ## Sitio: rpp.pe ## | |
rpp.pe: ## robots.txt ## | |
rpp.pe: ## ## | |
rpp.pe: ######################### | |
rpp.pe: # no entry | |
rpp.pe: # | |
rpp.pe: # Guia | |
rpp.pe: # | |
vip.wordpress.com: # Sitemap archive | |
google.com.ng: # AdsBot | |
google.com.ng: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
usda.gov: # | |
usda.gov: # robots.txt | |
usda.gov: # | |
usda.gov: # This file is to prevent the crawling and indexing of certain parts | |
usda.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
usda.gov: # and Google. By telling these "robots" where not to go on your site, | |
usda.gov: # you save bandwidth and server resources. | |
usda.gov: # | |
usda.gov: # This file will be ignored unless it is at the root of your host: | |
usda.gov: # Used: http://example.com/robots.txt | |
usda.gov: # Ignored: http://example.com/site/robots.txt | |
usda.gov: # | |
usda.gov: # For more information about the robots.txt standard, see: | |
usda.gov: # http://www.robotstxt.org/robotstxt.html | |
usda.gov: # CSS, JS, Images | |
usda.gov: # Directories | |
usda.gov: # Files | |
usda.gov: # Paths (clean URLs) | |
usda.gov: # Paths (no clean URLs) | |
dubizzle.com: # Block undesirable pages | |
dubizzle.com: # Rules for adsense bot | |
dubizzle.com: # Block crawling software | |
msnbc.com: # Directories | |
msnbc.com: # Files | |
msnbc.com: # Paths (clean URLs) | |
msnbc.com: # Paths (no clean URLs) | |
msnbc.com: # DFP Ad slot urls | |
burberry.com: # robots.txt for https://www.burberry.com | |
google.com.bd: # AdsBot | |
google.com.bd: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
the-sun.com: #Archive Sitemaps | |
the-sun.com: # Sitemap archive | |
banamex.com: # robots.txt file for https://www.banamex.com | |
banamex.com: #.htm /.html | |
banamex.com: #PDFs | |
banamex.com: #Old content in SWF | |
banamex.com: #Sitemap files | |
forums.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead. | |
forums.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details. | |
forums.wordpress.com: # This file was generated on Thu, 19 Mar 2020 19:23:28 +0000 | |
kathimerini.gr: # Admin Pages | |
kathimerini.gr: # Allow Those | |
kathimerini.gr: # Ads | |
nickis.com: # www.robotstxt.org/ | |
nickis.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
elcomercio.pe: # | |
elcomercio.pe: # la mayoria de veces causa problemas | |
elcomercio.pe: # | |
20min.ch: # | |
20min.ch: # robots.txt www.20min.ch | |
20min.ch: # | |
20min.ch: # V2.0.0, 04.08.2020 | |
20min.ch: # | |
pantheon.io: # | |
pantheon.io: # robots.txt | |
pantheon.io: # | |
pantheon.io: # This file is to prevent the crawling and indexing of certain parts | |
pantheon.io: # of your site by web crawlers and spiders run by sites like Yahoo! | |
pantheon.io: # and Google. By telling these "robots" where not to go on your site, | |
pantheon.io: # you save bandwidth and server resources. | |
pantheon.io: # | |
pantheon.io: # This file will be ignored unless it is at the root of your host: | |
pantheon.io: # Used: http://example.com/robots.txt | |
pantheon.io: # Ignored: http://example.com/site/robots.txt | |
pantheon.io: # | |
pantheon.io: # For more information about the robots.txt standard, see: | |
pantheon.io: # http://www.robotstxt.org/robotstxt.html | |
pantheon.io: # CSS, JS, Images | |
pantheon.io: # Directories | |
pantheon.io: # Files | |
pantheon.io: # Paths (clean URLs) | |
pantheon.io: # Paths (no clean URLs) | |
pantheon.io: # Resource Confirmation Paths | |
pantheon.io: # Miscelaneous Paths | |
yale.edu: # | |
yale.edu: # robots.txt | |
yale.edu: # | |
yale.edu: # This file is to prevent the crawling and indexing of certain parts | |
yale.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
yale.edu: # and Google. By telling these "robots" where not to go on your site, | |
yale.edu: # you save bandwidth and server resources. | |
yale.edu: # | |
yale.edu: # This file will be ignored unless it is at the root of your host: | |
yale.edu: # Used: http://example.com/robots.txt | |
yale.edu: # Ignored: http://example.com/site/robots.txt | |
yale.edu: # | |
yale.edu: # For more information about the robots.txt standard, see: | |
yale.edu: # http://www.robotstxt.org/robotstxt.html | |
yale.edu: # CSS, JS, Images | |
yale.edu: # Directories | |
yale.edu: # Files | |
yale.edu: # Paths (clean URLs) | |
yale.edu: # Paths (no clean URLs) | |
yale.edu: # Biblio module rules to prevent recursive searches by bots. | |
upmusics.com: # Google Image | |
google.com.ec: # AdsBot | |
google.com.ec: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
peardeck.com: # Squarespace Robots Txt | |
willhaben.at: # It is expressively forbidden to use spiders, search robots or other automatic methods | |
willhaben.at: # to access willhaben.at. Only if willhaben.at has given such access is allowed. | |
paytm.com: # robotstxt.org | |
statefarm.com: # Disallow | |
statefarm.com: # Sitemaps | |
capital.gr: # robots.txt for https://www.capital.gr | |
animoto.com: # /projects requires you to be logged in | |
manhuagui.com: # robots.txt generated at http://www.mcanerin.com | |
kemenag.go.id: #User-agent: * | |
kemenag.go.id: #Disallow: / | |
kemenag.go.id: # Group 1 | |
kemenag.go.id: # Group 2 | |
kemenag.go.id: # Group 3 | |
whimsical.com: # www.robotstxt.org/ | |
whimsical.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
mercadolibre.com.pe: #siteId: MPE | |
mercadolibre.com.pe: #country: peru | |
mercadolibre.com.pe: ##Block - Referidos | |
mercadolibre.com.pe: ##Block - siteinfo urls | |
mercadolibre.com.pe: ##Block - Cart | |
mercadolibre.com.pe: ##Block Checkout | |
mercadolibre.com.pe: ##Block - User Logged | |
mercadolibre.com.pe: #Shipping selector | |
mercadolibre.com.pe: ##Block - last search | |
mercadolibre.com.pe: ## Block - Profile - By Id | |
mercadolibre.com.pe: ## Block - Profile - By Id and role (old version) | |
mercadolibre.com.pe: ## Block - Profile - Leg. Req. | |
mercadolibre.com.pe: ##Block - noindex | |
mercadolibre.com.pe: # Mercado-Puntos | |
mercadolibre.com.pe: # Viejo mundo | |
mercadolibre.com.pe: ##Block recommendations listing | |
justice.gov: # | |
justice.gov: # robots.txt | |
justice.gov: # | |
justice.gov: # This file is to prevent the crawling and indexing of certain parts | |
justice.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
justice.gov: # and Google. By telling these "robots" where not to go on your site, | |
justice.gov: # you save bandwidth and server resources. | |
justice.gov: # | |
justice.gov: # This file will be ignored unless it is at the root of your host: | |
justice.gov: # Used: http://example.com/robots.txt | |
justice.gov: # Ignored: http://example.com/site/robots.txt | |
justice.gov: # | |
justice.gov: # For more information about the robots.txt standard, see: | |
justice.gov: # http://www.robotstxt.org/robotstxt.html | |
justice.gov: # | |
justice.gov: # For syntax checking, see: | |
justice.gov: # http://www.frobee.com/robots-txt-check | |
justice.gov: # Directories | |
justice.gov: # Files | |
justice.gov: # Paths (clean URLs) | |
justice.gov: # Paths (no clean URLs) | |
justice.gov: # Paths from legacy origin | |
car.gr: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
car.gr: # | |
car.gr: # To ban all spiders from the entire site uncomment the next two lines: | |
car.gr: # User-Agent: * | |
car.gr: # Disallow: / | |
greenhouse.io: # robots.txt for https://www.greenhouse.io/ | |
greenhouse.io: # live - don't allow web crawlers to index cpresources/ or vendor/ | |
brainyquote.com: # -------------------------------------------------------------------------------------- | |
brainyquote.com: # Using bots or scrapers? | |
brainyquote.com: # Please read the 'BrainyQuote Terms Of Service' | |
brainyquote.com: # (specifically the 'License to Access and Use' section) at: | |
brainyquote.com: # https://www.brainyquote.com/about/terms | |
brainyquote.com: # https://www.brainyquote.com/es/sobre-nosotros/t%C3%A9rminos-de-servicio | |
brainyquote.com: # https://www.brainyquote.com/fr/%C3%A0-propos/conditions-d-utilisation | |
brainyquote.com: # -------------------------------------------------------------------------------------- | |
getrevue.co: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
getrevue.co: # | |
getrevue.co: # To ban all spiders from the entire site uncomment the next two lines: | |
getrevue.co: # User-agent: * | |
getrevue.co: # Disallow: / | |
alaraby.co.uk: # | |
alaraby.co.uk: # robots.txt | |
alaraby.co.uk: # | |
alaraby.co.uk: # This file is to prevent the crawling and indexing of certain parts | |
alaraby.co.uk: # of your site by web crawlers and spiders run by sites like Yahoo! | |
alaraby.co.uk: # and Google. By telling these "robots" where not to go on your site, | |
alaraby.co.uk: # you save bandwidth and server resources. | |
alaraby.co.uk: # | |
alaraby.co.uk: # This file will be ignored unless it is at the root of your host: | |
alaraby.co.uk: # Used: http://example.com/robots.txt | |
alaraby.co.uk: # Ignored: http://example.com/site/robots.txt | |
alaraby.co.uk: # | |
alaraby.co.uk: # For more information about the robots.txt standard, see: | |
alaraby.co.uk: # http://www.robotstxt.org/robotstxt.html | |
alaraby.co.uk: # CSS, JS, Images | |
alaraby.co.uk: # Directories | |
alaraby.co.uk: # Files | |
alaraby.co.uk: # Paths (clean URLs) | |
alaraby.co.uk: # Paths (no clean URLs) | |
alaraby.co.uk: # search | |
centurylink.com: #SITEMAP | |
unity3d.com: # | |
unity3d.com: # robots.txt | |
unity3d.com: # | |
unity3d.com: # This file is to prevent the crawling and indexing of certain parts | |
unity3d.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
unity3d.com: # and Google. By telling these "robots" where not to go on your site, | |
unity3d.com: # you save bandwidth and server resources. | |
unity3d.com: # | |
unity3d.com: # This file will be ignored unless it is at the root of your host: | |
unity3d.com: # Used: http://example.com/robots.txt | |
unity3d.com: # Ignored: http://example.com/site/robots.txt | |
unity3d.com: # | |
unity3d.com: # For more information about the robots.txt standard, see: | |
unity3d.com: # http://www.robotstxt.org/robotstxt.html | |
unity3d.com: # CSS, JS, Images | |
unity3d.com: # Directories | |
unity3d.com: # Files | |
unity3d.com: # Paths (clean URLs) | |
unity3d.com: # Paths (no clean URLs) | |
unity3d.com: # Chinese Search Engines | |
stepstone.de: # | |
stepstone.de: # Any other use of robots or failure to obey the robots exclusion standards | |
stepstone.de: # set forth at <http://www.robotstxt.org/wc/exclusion.html> is strictly | |
stepstone.de: # prohibited. | |
stepstone.de: # | |
stepstone.de: # StepStone | |
therecipecritic.com: # Sitemap | |
eyny.com: # Disallow: /search.php | |
eyny.com: # Disallow: /*mobile=yes* | |
zbj.com: #update:2020-02-10 | |
careers360.com: # | |
careers360.com: # robots.txt | |
careers360.com: # | |
careers360.com: # This file is to prevent the crawling and indexing of certain parts | |
careers360.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
careers360.com: # and Google. By telling these "robots" where not to go on your site, | |
careers360.com: # you save bandwidth and server resources. | |
careers360.com: # | |
careers360.com: # This file will be ignored unless it is at the root of your host: | |
careers360.com: # Used: http://example.com/robots.txt | |
careers360.com: # Ignored: http://example.com/site/robots.txt | |
careers360.com: # | |
careers360.com: # For more information about the robots.txt standard, see: | |
careers360.com: # http://www.robotstxt.org/robotstxt.html | |
careers360.com: # | |
careers360.com: # For syntax checking, see: | |
careers360.com: # http://www.frobee.com/robots-txt-check | |
careers360.com: # Directories | |
careers360.com: # Files | |
careers360.com: # Paths (clean URLs) | |
careers360.com: # Paths (no clean URLs) | |
indiatoday.in: # Disallow directive | |
indiatoday.in: # Directories | |
indiatoday.in: # Files | |
indiatoday.in: # Paths (clean URLs) | |
indiatoday.in: # Paths (no clean URLs) | |
commentcamarche.net: # https://commentcamarche.net | |
michigan.gov: # robots.txt for https://www.michigan.gov | |
michigan.gov: # SOM - WEX IBM Watson Explorer: VSE/1.0 (thompsonj@michigan.gov) | |
michigan.gov: # Yahoo! | |
michigan.gov: # MSN | |
michigan.gov: # BingBot | |
michigan.gov: # 80legs Crawler | |
michigan.gov: # Yandex | |
michigan.gov: # discoveryengine.com | |
michigan.gov: # Ahrefs.com | |
michigan.gov: # kalooga.com | |
michigan.gov: # blekko.com | |
michigan.gov: # changedetection.com | |
michigan.gov: # paper.li | |
michigan.gov: # Google | |
lankasri.com: # Disallow: /*? This is match ? anywhere in the URL | |
csc.gov.in: #container { | |
weightwatchers.com: # | |
weightwatchers.com: # robots.txt | |
weightwatchers.com: # | |
weightwatchers.com: # This file is to prevent the crawling and indexing of certain parts | |
weightwatchers.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
weightwatchers.com: # and Google. By telling these "robots" where not to go on your site, | |
weightwatchers.com: # you save bandwidth and server resources. | |
weightwatchers.com: # | |
weightwatchers.com: # This file will be ignored unless it is at the root of your host: | |
weightwatchers.com: # Used: http://example.com/robots.txt | |
weightwatchers.com: # Ignored: http://example.com/site/robots.txt | |
weightwatchers.com: # | |
weightwatchers.com: # For more information about the robots.txt standard, see: | |
weightwatchers.com: # http://www.robotstxt.org/robotstxt.html | |
weightwatchers.com: # Directories | |
weightwatchers.com: # Files | |
weightwatchers.com: # Paths (clean URLs) | |
weightwatchers.com: # Paths (no clean URLs) | |
weightwatchers.com: # Checkout | |
weightwatchers.com: #GPC | |
weightwatchers.com: # Sitemap | |
thrillist.com: # www.robotstxt.org/ | |
thrillist.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
thrillist.com: # Directories | |
cibercuba.com: # Directories | |
cibercuba.com: # Files | |
cibercuba.com: # Paths (clean URLs) | |
cibercuba.com: # Paths (no clean URLs) | |
unsw.edu.au: # | |
unsw.edu.au: # robots.txt | |
unsw.edu.au: # | |
unsw.edu.au: # This file is to prevent the crawling and indexing of certain parts | |
unsw.edu.au: # of your site by web crawlers and spiders run by sites like Yahoo! | |
unsw.edu.au: # and Google. By telling these "robots" where not to go on your site, | |
unsw.edu.au: # you save bandwidth and server resources. | |
unsw.edu.au: # | |
unsw.edu.au: # This file will be ignored unless it is at the root of your host: | |
unsw.edu.au: # Used: http://example.com/robots.txt | |
unsw.edu.au: # Ignored: http://example.com/site/robots.txt | |
unsw.edu.au: # | |
unsw.edu.au: # For more information about the robots.txt standard, see: | |
unsw.edu.au: # http://www.robotstxt.org/robotstxt.html | |
unsw.edu.au: # CSS, JS, Images | |
unsw.edu.au: # Directories | |
unsw.edu.au: # Files | |
unsw.edu.au: # Paths (clean URLs) | |
unsw.edu.au: # Paths (no clean URLs) | |
nrk.no: # Served by akamai | |
stuff.co.nz: # robots for http://www.stuff.co.nz | |
stuff.co.nz: #Disallowed paths | |
stuff.co.nz: # Site Scrapers and bots that are not desirable | |
matterport.com: # | |
matterport.com: # robots.txt | |
matterport.com: # | |
matterport.com: # This file is to prevent the crawling and indexing of certain parts | |
matterport.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
matterport.com: # and Google. By telling these "robots" where not to go on your site, | |
matterport.com: # you save bandwidth and server resources. | |
matterport.com: # | |
matterport.com: # This file will be ignored unless it is at the root of your host: | |
matterport.com: # Used: http://example.com/robots.txt | |
matterport.com: # Ignored: http://example.com/site/robots.txt | |
matterport.com: # | |
matterport.com: # For more information about the robots.txt standard, see: | |
matterport.com: # http://www.robotstxt.org/robotstxt.html | |
matterport.com: # CSS, JS, Images | |
matterport.com: # Directories | |
matterport.com: # Files | |
matterport.com: # Paths (clean URLs) | |
matterport.com: # Paths (no clean URLs) | |
domain.com.au: ### BEGIN FILE ### | |
domain.com.au: # | |
domain.com.au: # allow access to off-market landing page, and NOT individual off-market property pages | |
domain.com.au: # early-access was re-branded to off-market, but we currently support both | |
domain.com.au: # Block dugg mirror | |
domain.com.au: # Block trovit bot | |
netacad.com: # | |
netacad.com: # robots.txt | |
netacad.com: # | |
netacad.com: # This file is to prevent the crawling and indexing of certain parts | |
netacad.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
netacad.com: # and Google. By telling these "robots" where not to go on your site, | |
netacad.com: # you save bandwidth and server resources. | |
netacad.com: # | |
netacad.com: # This file will be ignored unless it is at the root of your host: | |
netacad.com: # Used: http://example.com/robots.txt | |
netacad.com: # Ignored: http://example.com/site/robots.txt | |
netacad.com: # | |
netacad.com: # For more information about the robots.txt standard, see: | |
netacad.com: # http://www.robotstxt.org/robotstxt.html | |
netacad.com: # CSS, JS, Images | |
netacad.com: # Directories | |
netacad.com: # Files | |
netacad.com: # Paths (clean URLs) | |
netacad.com: # Paths (no clean URLs) | |
imore.com: # | |
imore.com: # robots.txt | |
imore.com: # | |
imore.com: # This file is to prevent the crawling and indexing of certain parts | |
imore.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
imore.com: # and Google. By telling these "robots" where not to go on your site, | |
imore.com: # you save bandwidth and server resources. | |
imore.com: # | |
imore.com: # This file will be ignored unless it is at the root of your host: | |
imore.com: # Used: http://example.com/robots.txt | |
imore.com: # Ignored: http://example.com/site/robots.txt | |
imore.com: # | |
imore.com: # For more information about the robots.txt standard, see: | |
imore.com: # http://www.robotstxt.org/robotstxt.html | |
imore.com: # | |
imore.com: # For syntax checking, see: | |
imore.com: # http://www.frobee.com/robots-txt-check | |
imore.com: # Directories | |
imore.com: # Files | |
imore.com: # Paths (clean URLs) | |
imore.com: # Paths (no clean URLs) | |
google.cz: # AdsBot | |
google.cz: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
gymshark.com: # we use Shopify as our ecommerce platform | |
gymshark.com: # Google adsbot ignores robots.txt unless specifically named! | |
joemonster.org: # www.robotstxt.org/ | |
joemonster.org: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
ufmg.br: # www.robotstxt.org/ | |
ufmg.br: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
onlinejobs.ph: # Then we start disallowing stuff | |
onlinejobs.ph: # Directories | |
onlinejobs.ph: # Disallow bots and crawlers | |
ox.ac.uk: # | |
ox.ac.uk: # robots.txt | |
ox.ac.uk: # | |
ox.ac.uk: # This file is to prevent the crawling and indexing of certain parts | |
ox.ac.uk: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ox.ac.uk: # and Google. By telling these "robots" where not to go on your site, | |
ox.ac.uk: # you save bandwidth and server resources. | |
ox.ac.uk: # | |
ox.ac.uk: # This file will be ignored unless it is at the root of your host: | |
ox.ac.uk: # Used: http://example.com/robots.txt | |
ox.ac.uk: # Ignored: http://example.com/site/robots.txt | |
ox.ac.uk: # | |
ox.ac.uk: # For more information about the robots.txt standard, see: | |
ox.ac.uk: # http://www.robotstxt.org/wc/robots.html | |
ox.ac.uk: # | |
ox.ac.uk: # For syntax checking, see: | |
ox.ac.uk: # http://www.sxw.org.uk/computing/robots/check.html | |
ox.ac.uk: # Directories | |
ox.ac.uk: # Files | |
ox.ac.uk: # Paths (clean URLs) | |
ox.ac.uk: # Paths (no clean URLs) | |
ox.ac.uk: # Stop access to videos used in overlays | |
mi.com: # 2015/12/11 | |
cnnbrasil.com.br: # | |
cnnbrasil.com.br: # robots.txt | |
cnnbrasil.com.br: # | |
gitlab.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
gitlab.com: # | |
gitlab.com: # To ban all spiders from the entire site uncomment the next two lines: | |
gitlab.com: # User-Agent: * | |
gitlab.com: # Disallow: / | |
gitlab.com: # Add a 1 second delay between successive requests to the same server, limits resources used by crawler | |
gitlab.com: # Only some crawlers respect this setting, e.g. Googlebot does not | |
gitlab.com: # Crawl-delay: 1 | |
gitlab.com: # Based on details in https://gitlab.com/gitlab-org/gitlab/blob/master/config/routes.rb, | |
gitlab.com: # https://gitlab.com/gitlab-org/gitlab/blob/master/spec/routing, and using application | |
gitlab.com: # Global routes | |
gitlab.com: # Restrict allowed routes to avoid very ugly search results | |
gitlab.com: # Generic resource routes like new, edit, raw | |
gitlab.com: # This will block routes like: | |
gitlab.com: # - /projects/new | |
gitlab.com: # - /gitlab-org/gitlab-foss/issues/123/-/edit | |
gitlab.com: # Group details | |
gitlab.com: # Project details | |
homeadvisor.com: # robots.txt for http://www.homeadvisor.com/ | |
abb.com: # start robots.txt | |
abb.com: # PIS Product detail pages | |
abb.com: # Robotics: | |
abb.com: # Image Bank: | |
abb.com: # CCD: | |
abb.com: # Other: | |
abb.com: # End manual section | |
abb.com: #Finished OK | |
moj.gov.sa: #ctl00_PlaceHolderMain_SiteMapPath1 span:nth-child(2) | |
teepublic.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
teepublic.com: # | |
teepublic.com: # To ban all spiders from the entire site uncomment the next two lines: | |
teepublic.com: # User-Agent: * | |
teepublic.com: # Disallow: / | |
saraba1st.com: # | |
saraba1st.com: # robots.txt for Discuz! X3 | |
saraba1st.com: # | |
ethnos.gr: # | |
ethnos.gr: # robots.txt | |
ethnos.gr: # | |
ethnos.gr: # This file is to prevent the crawling and indexing of certain parts | |
ethnos.gr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ethnos.gr: # and Google. By telling these "robots" where not to go on your site, | |
ethnos.gr: # you save bandwidth and server resources. | |
ethnos.gr: # | |
ethnos.gr: # This file will be ignored unless it is at the root of your host: | |
ethnos.gr: # Used: http://example.com/robots.txt | |
ethnos.gr: # Ignored: http://example.com/site/robots.txt | |
ethnos.gr: # | |
ethnos.gr: # For more information about the robots.txt standard, see: | |
ethnos.gr: # http://www.robotstxt.org/robotstxt.html | |
ethnos.gr: # CSS, JS, Images | |
ethnos.gr: # Directories | |
ethnos.gr: # Files | |
ethnos.gr: # Paths (clean URLs) | |
ethnos.gr: # Paths (no clean URLs) | |
nyu.edu: # Disallow: /registrar/ -- Commented out by Jim on 2020-01-16 | |
filehippo.com: # v.1.4 | |
filehippo.com: # /// .//. | |
filehippo.com: # ///////////// *//////////// | |
filehippo.com: # ///////////// /, ////////////, | |
filehippo.com: # ./////////// *// //////////// | |
filehippo.com: # ///////////. ///* ///////////. .... .. ,. ... * * , ... ... */* | |
filehippo.com: # /// //// .////// //// //// ./*,, // // //,,. /* // // //,,// //,*// /// ,// | |
filehippo.com: # .//////// /////////. ///////// .//// // // ///// /////// // // // // .// ,/* // | |
filehippo.com: # //////, /////////////. //////. ./, // // // /* // // //.. //.. // *// | |
filehippo.com: #/////. /////////////////// ///// /, // ///// ///// /, */ // // // /////. | |
filehippo.com: #//// *////////////////////// //// | |
filehippo.com: # // * /////////////// /* // | |
filehippo.com: # /// *///////////* /// | |
filehippo.com: # *///////////* | |
filehippo.com: # | |
filehippo.com: # Disallow: /search?q=* | |
filehippo.com: # Disallow: */search?q=* | |
michaels.com: ############################## | |
michaels.com: # Welcome to Michaels Robots.txt File # | |
michaels.com: # Last Updated 11/08/2018 # | |
michaels.com: ############################## | |
metacritic.com: # Google is crawling the Ad defineSlot() parameters. Exclude them so we don't get a bunch of 404s. | |
techbang.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
techbang.com: # | |
techbang.com: # To ban all spiders from the entire site uncomment the next two lines: | |
techbang.com: # User-Agent: * | |
techbang.com: # Disallow: / | |
prada.com: # Sitemaps | |
openclassrooms.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
france.tv: # www.robotstxt.org/ | |
france.tv: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
uhc.com: # Exclude DAM nodes serving pdfs for /shop. iex, landing and statedpls | |
uhc.com: # Exclude site paths | |
aawsat.com: #allow amp key | |
aawsat.com: # Paths (clean URLs) | |
aawsat.com: # Paths (no clean URLs) | |
aawsat.com: #HybridAuth paths | |
khaleejtimes.com: # Updated: 2009-10-19 | |
khaleejtimes.com: # Robots.txt | |
khaleejtimes.com: # Block Nocache=1 | |
khaleejtimes.com: # Block Static Update | |
khaleejtimes.com: # Block Folders | |
khaleejtimes.com: #Disallow: /assets/ | |
khaleejtimes.com: #Disallow: /images/ | |
khaleejtimes.com: #Block certain actions | |
khaleejtimes.com: #Disallow: /apps/pbcsi.dll | |
khaleejtimes.com: #Force certain actions | |
khaleejtimes.com: # Sitemap files | |
team-bhp.com: # | |
team-bhp.com: # robots.txt | |
team-bhp.com: # | |
team-bhp.com: # This file is to prevent the crawling and indexing of certain parts | |
team-bhp.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
team-bhp.com: # and Google. By telling these "robots" where not to go on your site, | |
team-bhp.com: # you save bandwidth and server resources. | |
team-bhp.com: # | |
team-bhp.com: # This file will be ignored unless it is at the root of your host: | |
team-bhp.com: # Used: http://example.com/robots.txt | |
team-bhp.com: # Ignored: http://example.com/site/robots.txt | |
team-bhp.com: # | |
team-bhp.com: # For more information about the robots.txt standard, see: | |
team-bhp.com: # http://www.robotstxt.org/wc/robots.html | |
team-bhp.com: # | |
team-bhp.com: # For syntax checking, see: | |
team-bhp.com: # http://www.sxw.org.uk/computing/robots/check.html | |
team-bhp.com: # Directories | |
team-bhp.com: # Disallow: /misc/ | |
team-bhp.com: # Files | |
team-bhp.com: # Paths (clean URLs) | |
team-bhp.com: # Paths (no clean URLs) | |
team-bhp.com: # Forum (was present before 14th Feb 2013 new portal, few more added later) | |
team-bhp.com: # Sitemaps have also been submitted directly in GSC, but left here for other search engines | |
rtings.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
rtings.com: # | |
rtings.com: # To ban all spiders from the entire site uncomment the next two lines: | |
easeus.com: # Robots.txt file from https://www.easeus.com | |
easeus.com: # | |
easeus.com: # All robots will spider the domain | |
jetbrains.com: # Global | |
jetbrains.com: # Allow crawling: these are blocked using the robots meta tag instead | |
jetbrains.com: # disallow: */promo/ | |
jetbrains.com: # disallow: */download-thanks | |
jetbrains.com: # disallow: */download/download_thanks.jsp | |
jetbrains.com: # Sitemaps | |
jetbrains.com: # AppCode | |
jetbrains.com: # CLion | |
jetbrains.com: # DataGrip | |
jetbrains.com: # dotCover | |
jetbrains.com: # dotMemory | |
jetbrains.com: # dotPeak | |
jetbrains.com: # dotTrace | |
jetbrains.com: # Hub | |
jetbrains.com: # Idea | |
jetbrains.com: # MPS | |
jetbrains.com: # PhpStorm | |
jetbrains.com: # PyCharm | |
jetbrains.com: # PyCharm Edu | |
jetbrains.com: # ReSharper | |
jetbrains.com: # ReSharper С++ | |
jetbrains.com: # Research | |
jetbrains.com: # Rider | |
jetbrains.com: # RubyMine | |
jetbrains.com: # TeamCity | |
jetbrains.com: # Upsource | |
jetbrains.com: # WebStorm | |
jetbrains.com: # YouTrack | |
cjn.cn: # Robots.txt file from http://www.cjn.cn | |
cjn.cn: # All robots will spider the domain | |
videolan.org: # $Id$ | |
videolan.org: # Do not crawl CVS and .svn directories | |
videolan.org: # "This robot collects content from the Internet for the sole purpose of | |
videolan.org: # helping educational institutions prevent plagiarism. [...] we compare | |
videolan.org: # student papers against the content we find on the Internet to see if we | |
videolan.org: # can find similarities." (http://www.turnitin.com/robot/crawlerinfo.html) | |
videolan.org: # --> fuck off. | |
videolan.org: # "NameProtect engages in crawling activity in search of a wide range of | |
videolan.org: # brand and other intellectual property violations that may be of interest | |
videolan.org: # to our clients." (http://www.nameprotect.com/botinfo.html) | |
videolan.org: # --> fuck off. | |
videolan.org: # "iThenticate® is a new service we have developed to combat the piracy | |
videolan.org: # of intellectual property and ensure the originality of written work for# | |
videolan.org: # publishers, non-profit agencies, corporations, and newspapers." | |
videolan.org: # (http://www.slysearch.com/) | |
videolan.org: # --> fuck off. | |
mailshake.com: # robotstxt.org | |
straitstimes.com: # | |
straitstimes.com: # robots.txt | |
straitstimes.com: # | |
straitstimes.com: # This file is to prevent the crawling and indexing of certain parts | |
straitstimes.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
straitstimes.com: # and Google. By telling these "robots" where not to go on your site, | |
straitstimes.com: # you save bandwidth and server resources. | |
straitstimes.com: # | |
straitstimes.com: # This file will be ignored unless it is at the root of your host: | |
straitstimes.com: # Used: http://example.com/robots.txt | |
straitstimes.com: # Ignored: http://example.com/site/robots.txt | |
straitstimes.com: # | |
straitstimes.com: # For more information about the robots.txt standard, see: | |
straitstimes.com: # http://www.robotstxt.org/robotstxt.html | |
straitstimes.com: # Directories | |
straitstimes.com: # Files | |
straitstimes.com: # Paths (clean URLs) | |
straitstimes.com: # Paths (no clean URLs) | |
shopclues.com: #Baiduspider | |
shopclues.com: #Yandex | |
shopclues.com: #Sosospider | |
shopclues.com: #Ezooms | |
shopclues.com: #Sogou | |
shopclues.com: #80legs.com | |
kenyans.co.ke: # | |
kenyans.co.ke: # robots.txt | |
kenyans.co.ke: # | |
kenyans.co.ke: # This file is to prevent the crawling and indexing of certain parts | |
kenyans.co.ke: # of your site by web crawlers and spiders run by sites like Yahoo! | |
kenyans.co.ke: # and Google. By telling these "robots" where not to go on your site, | |
kenyans.co.ke: # you save bandwidth and server resources. | |
kenyans.co.ke: # | |
kenyans.co.ke: # This file will be ignored unless it is at the root of your host: | |
kenyans.co.ke: # Used: http://example.com/robots.txt | |
kenyans.co.ke: # Ignored: http://example.com/site/robots.txt | |
kenyans.co.ke: # | |
kenyans.co.ke: # For more information about the robots.txt standard, see: | |
kenyans.co.ke: # http://www.robotstxt.org/robotstxt.html | |
kenyans.co.ke: # CSS, JS, Images | |
kenyans.co.ke: # Directories | |
kenyans.co.ke: # Files | |
kenyans.co.ke: # Paths (clean URLs) | |
kenyans.co.ke: # Paths (no clean URLs) | |
thepointsguy.com: # Sitemap archive | |
uci.edu: # www.robotstxt.org/ | |
uci.edu: # http://code.google.com/web/controlcrawlindex/ | |
penademo.wordpress.com: # This file was generated on Mon, 15 Feb 2021 00:08:23 +0000 | |
utexas.edu: # | |
utexas.edu: # robots.txt | |
utexas.edu: # | |
utexas.edu: # This file is to prevent the crawling and indexing of certain parts | |
utexas.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
utexas.edu: # and Google. By telling these "robots" where not to go on your site, | |
utexas.edu: # you save bandwidth and server resources. | |
utexas.edu: # | |
utexas.edu: # This file will be ignored unless it is at the root of your host: | |
utexas.edu: # Used: http://example.com/robots.txt | |
utexas.edu: # Ignored: http://example.com/site/robots.txt | |
utexas.edu: # | |
utexas.edu: # For more information about the robots.txt standard, see: | |
utexas.edu: # http://www.robotstxt.org/robotstxt.html | |
utexas.edu: # CSS, JS, Images | |
utexas.edu: # Directories | |
utexas.edu: # Files | |
utexas.edu: # Paths (clean URLs) | |
utexas.edu: # Paths (no clean URLs) | |
departementfeminin.com: # Sitemap | |
departementfeminin.com: #URL Parameters | |
fashionnova.com: # we use Shopify as our ecommerce platform | |
fashionnova.com: # Google adsbot ignores robots.txt unless specifically named! | |
neimanmarcus.com: # Updated 03-19-2020 | |
diplomatie.gouv.fr: # robots.txt | |
diplomatie.gouv.fr: # @url: https://www.diplomatie.gouv.fr | |
diplomatie.gouv.fr: # @generator: SPIP | |
ionos.de: #print | |
ionos.de: #terms and conditions | |
ionos.de: #Popups etc. | |
ionos.de: #Results | |
ionos.de: #crawl delay | |
saglik.gov.tr: #header { | |
pinterest.de: # Pinterest is hiring! | |
pinterest.de: # | |
pinterest.de: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c | |
pinterest.de: # | |
pinterest.de: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering | |
bt.dk: # www.robotstxt.org/ | |
bt.dk: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
newsbreak.com: # New crawlers to block 2016 | |
pointtown.com: # sitemap url | |
walla.co.il: # robots.txt - 2018-03-13 | |
nielsen.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/ | |
liberal.gr: #Alexa | |
liberal.gr: #All others | |
topuniversities.com: # | |
topuniversities.com: # robots.txt | |
topuniversities.com: # | |
topuniversities.com: # This file is to prevent the crawling and indexing of certain parts | |
topuniversities.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
topuniversities.com: # and Google. By telling these "robots" where not to go on your site, | |
topuniversities.com: # you save bandwidth and server resources. | |
topuniversities.com: # | |
topuniversities.com: # This file will be ignored unless it is at the root of your host: | |
topuniversities.com: # Used: http://example.com/robots.txt | |
topuniversities.com: # Ignored: http://example.com/site/robots.txt | |
topuniversities.com: # | |
topuniversities.com: # For more information about the robots.txt standard, see: | |
topuniversities.com: # http://www.robotstxt.org/robotstxt.html | |
topuniversities.com: # CSS, JS, Images | |
topuniversities.com: # Directories | |
topuniversities.com: # Files | |
topuniversities.com: # Paths (clean URLs) | |
topuniversities.com: # Paths (no clean URLs) | |
topuniversities.com: # Paths (others) | |
colourpop.com: # we use Shopify as our ecommerce platform | |
colourpop.com: # Google adsbot ignores robots.txt unless specifically named! | |
google.com.kw: # AdsBot | |
google.com.kw: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
vhlcentral.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
vhlcentral.com: # | |
vhlcentral.com: # To ban all spiders from the entire site uncomment the next two lines: | |
vhlcentral.com: # User-Agent: * | |
vhlcentral.com: # Disallow: / | |
polimi.it: # Da: http://www.typo3blog.nl/seo/what-robotstxt-to-use-with-typo3.html | |
polimi.it: #Disallow: /uploads/ | |
fsu.edu: # robots.txt for http://www.fsu.edu/ | |
fsu.edu: # see http://info.webcrawler.com/mak/projects/robots/norobots.html | |
fsu.edu: # see http://www.robotstxt.org/wc/exclusion.html | |
fsu.edu: # see http://www.robotstxt.org/wc/norobots.html | |
fsu.edu: # see http://www.robotstxt.org/wc/norobots-rfc.html | |
fsu.edu: #Disallow: /Books/ | |
fsu.edu: #Disallow: /Phones/ | |
fsu.edu: #Disallow: /directories/ | |
fsu.edu: #Disallow: /Jobs/ | |
fsu.edu: #Disallow: /Campus/ | |
fsu.edu: #Disallow: /Links/ | |
fsu.edu: #Disallow: /Employee/ | |
fsu.edu: #Disallow: /Students/ | |
covid19india.org: # https://www.robotstxt.org/robotstxt.html | |
seemorgh.com: # JSitemap entries | |
gov.bc.ca: # robots.txt | |
gov.bc.ca: #HRSS | |
healthgrades.com: # Robots.txt file HealthGrades.com | |
healthgrades.com: # July 27, 2020 | |
healthgrades.com: # XML Sitemap Root File | |
healthgrades.com: #trying noindex vs disallow to avoid SEO blocking of versioned node page resources | |
healthgrades.com: # Disallow certain directories for b2b site | |
healthgrades.com: # Disallow consumer javascript and css folders | |
healthgrades.com: # Commented out to test impact on ability of Google to crawl / rankings | |
healthgrades.com: # Disallow: /Consumer/styles/ | |
healthgrades.com: # Disallow: /consumer/styles/ | |
healthgrades.com: # Disallow: /Consumer/scripts/ | |
healthgrades.com: # Disallow: /consumer/scripts/ | |
healthgrades.com: # End robots.txt file | |
ricardo.ch: # robots.txt for https://www.ricardo.ch/ | |
ricardo.ch: # English Pages until release | |
ricardo.ch: # 20th anniversary dedicated pages | |
ricardo.ch: # Selling Form | |
ricardo.ch: # 28.04.2016 | |
ricardo.ch: # CMS Pages | |
ricardo.ch: # Archived Article | |
ricardo.ch: # Category XML Pages | |
ricardo.ch: # Legacy and new online shop | |
ricardo.ch: # New ratings pages | |
ricardo.ch: # Legacy | |
ricardo.ch: # Legacy French Pages | |
ricardo.ch: # Disallow commercial bots we don't like | |
ricardo.ch: # Disallow static resources and API endpoints crawling | |
ricardo.ch: # User agent names for Google AdsBot can be found here : https://support.google.com/webmasters/answer/1061943?hl=en | |
ricardo.ch: # Instruction for OnCrawl bot can be found here : http://help.oncrawl.com/en/articles/2767653-oncrawl-crawler-how-does-the-oncrawl-bot-find-and-crawl-pages#:~:text=OnCrawl%20follows%20all%20instructions%20to,will%20apply%20to%20your%20crawl. | |
serpro.gov.br: # Define access-restrictions for robots/spiders | |
serpro.gov.br: # http://www.robotstxt.org/wc/norobots.html | |
serpro.gov.br: # By default we allow robots to access all areas of our site | |
serpro.gov.br: # already accessible to anonymous users | |
serpro.gov.br: # Add Googlebot-specific syntax extension to exclude forms | |
serpro.gov.br: # that are repeated for each piece of content in the site | |
serpro.gov.br: # the wildcard is only supported by Googlebot | |
serpro.gov.br: # http://www.google.com/support/webmasters/bin/answer.py?answer=40367&ctx=sibling | |
karenmillen.com: # Pages | |
karenmillen.com: # Product Filter # | |
karenmillen.com: # Ordering & Product per page # | |
karenmillen.com: # Number of product per page - Default 60 | |
karenmillen.com: # Order By | |
karenmillen.com: # Price | |
karenmillen.com: # Faceted Navigation # | |
karenmillen.com: # Search # | |
karenmillen.com: # Ensure no Static Ressources is blocked # | |
karenmillen.com: # Crawl Delay - 5 URL max per second | |
showroomprive.com: # Site Desktop FR | |
showroomprive.com: # BackgroundAcqui | |
showroomprive.com: # Accueil | |
showroomprive.com: # Erreur | |
showroomprive.com: # MonCompte | |
showroomprive.com: # Boutique | |
showroomprive.com: # NousContacter | |
showroomprive.com: # PagesP | |
showroomprive.com: # JeuOpe | |
showroomprive.com: # Voyages | |
showroomprive.com: # Livraison | |
showroomprive.com: # ErreursExploration | |
showroomprive.com: # Cms | |
thedoublef.com: # Sitemap files | |
glassdoor.co.in: # India | |
glassdoor.co.in: # Greetings, human beings!, | |
glassdoor.co.in: # | |
glassdoor.co.in: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself. | |
glassdoor.co.in: # | |
glassdoor.co.in: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet, and help improve the way people everywhere find jobs? | |
glassdoor.co.in: # | |
glassdoor.co.in: # Run - don't crawl - to apply to join Glassdoor's SEO team here http://jobs.glassdoor.com | |
glassdoor.co.in: # | |
glassdoor.co.in: # | |
glassdoor.co.in: #logging related | |
glassdoor.co.in: # Blocking track urls (ACQ-2468) | |
glassdoor.co.in: #Blocking non standard job view and job search URLs, and paginated job SERP URLs (TRFC-2831) | |
glassdoor.co.in: # Blocking bots from crawling DoubleClick for Publisher and Google Analytics related URL's (which aren't real URL's) | |
glassdoor.co.in: # TRFC-4037 Block page from being indexed | |
glassdoor.co.in: # TRFC-4037 Block page from being indexed | |
glassdoor.co.in: # | |
glassdoor.co.in: # Note that this file has the extension '.text' rather than the more-standard '.txt' | |
glassdoor.co.in: # to keep it from being pre-compiled as a servlet. (*.txt files are precompiled, and | |
glassdoor.co.in: # there doesn't seem to be a way to turn this off.) | |
glassdoor.co.in: # | |
dhl.de: # robots.txt for /content/de/de | |
ipindiaonline.gov.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
ipindiaonline.gov.in: #content{margin:0 0 0 2%;position:relative;} | |
guru99.com: # If the Joomla site is installed within a folder such as at | |
guru99.com: # e.g. www.example.com/joomla/ the robots.txt file MUST be | |
guru99.com: # moved to the site root at e.g. www.example.com/robots.txt | |
guru99.com: # AND the joomla folder name MUST be prefixed to the disallowed | |
guru99.com: # path, e.g. the Disallow rule for the /administrator/ folder | |
guru99.com: # MUST be changed to read Disallow: /joomla/administrator/ | |
guru99.com: # | |
guru99.com: # For more information about the robots.txt standard, see: | |
guru99.com: # http://www.robotstxt.org/orig.html | |
guru99.com: # | |
guru99.com: # For syntax checking, see: | |
guru99.com: # http://tool.motoricerca.info/robots-checker.phtml | |
ga.gov: # | |
ga.gov: # robots.txt | |
ga.gov: # | |
ga.gov: # This file is to prevent the crawling and indexing of certain parts | |
ga.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ga.gov: # and Google. By telling these "robots" where not to go on your site, | |
ga.gov: # you save bandwidth and server resources. | |
ga.gov: # | |
ga.gov: # This file will be ignored unless it is at the root of your host: | |
ga.gov: # Used: http://example.com/robots.txt | |
ga.gov: # Ignored: http://example.com/site/robots.txt | |
ga.gov: # | |
ga.gov: # For more information about the robots.txt standard, see: | |
ga.gov: # http://www.robotstxt.org/robotstxt.html | |
ga.gov: # CSS, JS, Images | |
ga.gov: # Directories | |
ga.gov: # Files | |
ga.gov: # Paths (clean URLs) | |
ga.gov: # Paths (no clean URLs) | |
ga.gov: # Book printer-friendly pages | |
beinsports.com: #Disallow: /$ | |
beinsports.com: #Allow: /*/*/news/*/*$ | |
beinsports.com: #Disallow: /*/news/*$ | |
beinsports.com: #Disallow: /*/videos/*$ | |
beinsports.com: #Allow: /us/*/video/*/*$ | |
beinsports.com: #Disallow: /us/*/video/*$ | |
beinsports.com: #Disallow: /us/soccer/video$ | |
beinsports.com: #team&player | |
beinsports.com: #Disallow: /*/*/team/player/* | |
beinsports.com: #Disallow: /*/*/team/2018/* | |
beinsports.com: #Disallow: /*/*/team/2017/* | |
beinsports.com: #Disallow: /*/*/team/2016/* | |
beinsports.com: #Disallow: /*/*/team/2015/* | |
beinsports.com: #tags | |
beinsports.com: #Disallow: /*/tag/*/* | |
beinsports.com: #Disallow: /*/search*q* | |
beinsports.com: #Disallow: /*/search/ | |
beinsports.com: # data-pages - galleries | |
logi.com: # Logitech | |
logi.com: # Modified 9.2.2009 | |
voanews.com: # | |
voanews.com: # robots.txt | |
voanews.com: # | |
voanews.com: # This file is to prevent the crawling and indexing of certain parts | |
voanews.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
voanews.com: # and Google. By telling these "robots" where not to go on your site, | |
voanews.com: # you save bandwidth and server resources. | |
voanews.com: # | |
voanews.com: # This file will be ignored unless it is at the root of your host: | |
voanews.com: # Used: http://example.com/robots.txt | |
voanews.com: # Ignored: http://example.com/site/robots.txt | |
voanews.com: # | |
voanews.com: # For more information about the robots.txt standard, see: | |
voanews.com: # http://www.robotstxt.org/robotstxt.html | |
voanews.com: # CSS, JS, Images | |
voanews.com: # Directories | |
voanews.com: # Files | |
voanews.com: # Paths (clean URLs) | |
voanews.com: # Paths (no clean URLs) | |
luminpdf.com: # robots.txt generated by atozseotools.com | |
gumtree.pl: #Sitemaps | |
gumtree.pl: #Sorting parameters | |
gumtree.pl: #Other comments: | |
gumtree.pl: #Sorting parameters | |
gumtree.pl: #Other comments: | |
gumtree.pl: #Sorting parameters | |
gumtree.pl: #Other comments: | |
gumtree.pl: #Sorting parameters | |
gumtree.pl: #Other comments: | |
next.co.uk: ##### 500s ##### | |
coltortiboutique.com: ##Disallow: /*? | |
coltortiboutique.com: # Disable checkout & customer account | |
coltortiboutique.com: # Disable Search pages | |
coltortiboutique.com: # Disable common folders | |
coltortiboutique.com: # Disable Tag & Review (Avoid duplicate content) | |
coltortiboutique.com: # Common files | |
coltortiboutique.com: # Disable sorting (Avoid duplicate content) | |
coltortiboutique.com: # Disable version control folders and others | |
coltortiboutique.com: #Disable Bitcoin | |
taojindi.com: # file: home robots.txt, 2012/09/13 | |
taojindi.com: # | |
game.co.uk: # __PUBLIC_IP_ADDR__ - Internet facing IP Address or Domain name. | |
creately.com: # | |
creately.com: # robots.txt | |
creately.com: # | |
creately.com: # This file is to prevent the crawling and indexing of certain parts | |
creately.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
creately.com: # and Google. By telling these "robots" where not to go on your site, | |
creately.com: # you save bandwidth and server resources. | |
creately.com: # | |
creately.com: # This file will be ignored unless it is at the root of your host: | |
creately.com: # Used: http://example.com/robots.txt | |
creately.com: # Ignored: http://example.com/site/robots.txt | |
creately.com: # | |
creately.com: # For more information about the robots.txt standard, see: | |
creately.com: # http://www.robotstxt.org/wc/robots.html | |
creately.com: # | |
creately.com: # For syntax checking, see: | |
creately.com: # http://www.sxw.org.uk/computing/robots/check.html | |
creately.com: # Directories | |
creately.com: # Files | |
creately.com: # Paths (clean URLs) | |
creately.com: # Paths (no clean URLs) | |
creately.com: # Directories | |
creately.com: # Files | |
creately.com: # Paths (clean URLs) | |
creately.com: # Paths (no clean URLs) | |
creately.com: # ========================================= # | |
articulate.com: # *********************************************************************** | |
articulate.com: # *********************************************************************** | |
articulate.com: # *********************************************************************** | |
articulate.com: # *************** **************** | |
articulate.com: # *************** **************** | |
articulate.com: # *************** **************** | |
articulate.com: # *************** **************** | |
articulate.com: # *********************************************************************** | |
articulate.com: # *********************************************************************** | |
articulate.com: # *****************************. .***************************** | |
articulate.com: # ************************* ************************* | |
articulate.com: # ********************** ********************** | |
articulate.com: # ******************** ******************** | |
articulate.com: # ****************** ,***********. ****************** | |
articulate.com: # ***************** ***************** ***************** | |
articulate.com: # **************** ,*******************, **************** | |
articulate.com: # ***************. ,*********************, **************** | |
articulate.com: # *************** .*********************** *************** | |
articulate.com: # *************** ************************. *************** | |
articulate.com: # *************** ************************, *************** | |
articulate.com: # *************** ,***********************. *************** | |
articulate.com: # *************** *********************** *************** | |
articulate.com: # **************** ********************* *************** | |
articulate.com: # ***************** ******************* *************** | |
articulate.com: # *****************, *************** *************** | |
articulate.com: # ******************* ,*****, *************** | |
articulate.com: # ********************* *************** | |
articulate.com: # *********************** *, *************** | |
articulate.com: # **************************. ****, *************** | |
articulate.com: # *********************************************************************** | |
articulate.com: # *********************************************************************** | |
articulate.com: # *********************************************************************** | |
gittigidiyor.com: ################################################################### | |
gittigidiyor.com: # # | |
gittigidiyor.com: # //. # | |
gittigidiyor.com: # /***/ # | |
gittigidiyor.com: # /*****/ # | |
gittigidiyor.com: # /*****/ # | |
gittigidiyor.com: # //****/ # | |
gittigidiyor.com: # ////*// # | |
gittigidiyor.com: # /############/*/************ # | |
gittigidiyor.com: # #########%### *//*********** # | |
gittigidiyor.com: # #########%%####************. # | |
gittigidiyor.com: # ##########%%%##/************ # | |
gittigidiyor.com: # ####### # | |
gittigidiyor.com: # ####### # | |
gittigidiyor.com: # ####### # | |
gittigidiyor.com: # ####### # | |
gittigidiyor.com: # #### # | |
gittigidiyor.com: # # # | |
gittigidiyor.com: # # | |
gittigidiyor.com: ############ Türkiye's Leading Shopping Site ############## | |
mnml.la: # we use Shopify as our ecommerce platform | |
mnml.la: # Google adsbot ignores robots.txt unless specifically named! | |
kith.com: # we use Shopify as our ecommerce platform | |
kith.com: # Google adsbot ignores robots.txt unless specifically named! | |
welt.de: # The Facebook Crawler | |
welt.de: # Audisto Scraping Tool | |
airbnb.ca: # /////// | |
airbnb.ca: # // // | |
airbnb.ca: # // // | |
airbnb.ca: # // // /// /// /// | |
airbnb.ca: # // // /// /// | |
airbnb.ca: # // /// // //// /// /// /// //// /// //// /// //// /// //// | |
airbnb.ca: # // /// /// // ////////// /// ////////// /////////// ////////// /////////// | |
airbnb.ca: # // // // // /// /// /// /// /// /// /// /// /// /// | |
airbnb.ca: # // // // // /// /// /// /// /// /// /// /// /// /// | |
airbnb.ca: # // // // // /// /// /// /// /// /// /// /// /// /// | |
airbnb.ca: # // // // // ////////// /// /// ////////// /// /// ////////// | |
airbnb.ca: # // ///// // | |
airbnb.ca: # // ///// // | |
airbnb.ca: # // /// /// // | |
airbnb.ca: # ////// ////// | |
airbnb.ca: # | |
airbnb.ca: # | |
airbnb.ca: # We thought you'd never make it! | |
airbnb.ca: # We hope you feel right at home in this file...unless you're a disallowed subfolder. | |
airbnb.ca: # And since you're here, read up on our culture and team: https://www.airbnb.com/careers/departments/engineering | |
airbnb.ca: # There's even a bring your robot to work day. | |
paris.fr: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
scielo.br: # Allow only major search spiders | |
scielo.br: # Block all other spiders | |
authentisign.com: # go away | |
arabnews.com: # | |
arabnews.com: # robots.txt | |
arabnews.com: # | |
arabnews.com: # This file is to prevent the crawling and indexing of certain parts | |
arabnews.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
arabnews.com: # and Google. By telling these "robots" where not to go on your site, | |
arabnews.com: # you save bandwidth and server resources. | |
arabnews.com: # | |
arabnews.com: # This file will be ignored unless it is at the root of your host: | |
arabnews.com: # Used: http://example.com/robots.txt | |
arabnews.com: # Ignored: http://example.com/site/robots.txt | |
arabnews.com: # | |
arabnews.com: # For more information about the robots.txt standard, see: | |
arabnews.com: # http://www.robotstxt.org/robotstxt.html | |
arabnews.com: # CSS, JS, Images | |
arabnews.com: # Directories | |
arabnews.com: # Files | |
arabnews.com: # Paths (clean URLs) | |
arabnews.com: # Paths (no clean URLs) | |
tickertape.in: #sitemaps | |
ucdavis.edu: # | |
ucdavis.edu: # robots.txt | |
ucdavis.edu: # | |
ucdavis.edu: # This file is to prevent the crawling and indexing of certain parts | |
ucdavis.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ucdavis.edu: # and Google. By telling these "robots" where not to go on your site, | |
ucdavis.edu: # you save bandwidth and server resources. | |
ucdavis.edu: # | |
ucdavis.edu: # This file will be ignored unless it is at the root of your host: | |
ucdavis.edu: # Used: http://example.com/robots.txt | |
ucdavis.edu: # Ignored: http://example.com/site/robots.txt | |
ucdavis.edu: # | |
ucdavis.edu: # For more information about the robots.txt standard, see: | |
ucdavis.edu: # http://www.robotstxt.org/robotstxt.html | |
ucdavis.edu: # CSS, JS, Images | |
ucdavis.edu: # Directories | |
ucdavis.edu: # Files | |
ucdavis.edu: # Paths (clean URLs) | |
ucdavis.edu: # Paths (no clean URLs) | |
sir.kr: # 200305 created | |
sir.kr: # 200403 blocked with iptables, but that did not stop it, so it is now forcibly blocked in nginx | |
sir.kr: ### 5 | |
sir.kr: ### 10 | |
nzherald.co.nz: # robots.txt for https://www.nzherald.co.nz | |
nzherald.co.nz: # | |
nzherald.co.nz: # | |
nzherald.co.nz: # Good Bots allowed | |
nzherald.co.nz: # | |
nzherald.co.nz: # User Account Pages | |
nzherald.co.nz: # User Account Pages | |
nzherald.co.nz: # User Account Pages | |
nzherald.co.nz: # User Account Pages | |
nzherald.co.nz: # User Account Pages | |
nzherald.co.nz: # Prevent Google from incorrectly indexing ad link vars and non-screen pages | |
nzherald.co.nz: # User Account Pages | |
nzherald.co.nz: # Prevent Google from incorrectly indexing ad link vars and non-screen pages | |
nzherald.co.nz: # User Account Pages | |
nzherald.co.nz: # Prevent Google from incorrectly indexing ad link vars and non-screen pages | |
nzherald.co.nz: # User Account Pages | |
nzherald.co.nz: # Prevent Google from incorrectly indexing ad link vars and non-screen pages | |
nzherald.co.nz: # User Account Pages | |
nzherald.co.nz: # Prevent Google from incorrectly indexing ad link vars and non-screen pages | |
nzherald.co.nz: # User Account Pages | |
nzherald.co.nz: # | |
nzherald.co.nz: # Restrictions to all bots | |
nzherald.co.nz: # | |
nzherald.co.nz: # User Account Pages | |
nzherald.co.nz: # | |
nzherald.co.nz: # Image Bots | |
nzherald.co.nz: # | |
nzherald.co.nz: # User Account Pages | |
nzherald.co.nz: # | |
nzherald.co.nz: # Site scrapers & other known bad bots that are completely disallowed | |
nzherald.co.nz: # | |
google.com.qa: # AdsBot | |
google.com.qa: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
ontario.ca: # | |
ontario.ca: # robots.txt | |
ontario.ca: # | |
ontario.ca: # This file is to prevent the crawling and indexing of certain parts | |
ontario.ca: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ontario.ca: # and Google. By telling these "robots" where not to go on your site, | |
ontario.ca: # you save bandwidth and server resources. | |
ontario.ca: # | |
ontario.ca: # This file will be ignored unless it is at the root of your host: | |
ontario.ca: # Used: http://example.com/robots.txt | |
ontario.ca: # Ignored: http://example.com/site/robots.txt | |
ontario.ca: # | |
ontario.ca: # For more information about the robots.txt standard, see: | |
ontario.ca: # http://www.robotstxt.org/wc/robots.html | |
ontario.ca: # | |
ontario.ca: # For syntax checking, see: | |
ontario.ca: # http://www.sxw.org.uk/computing/robots/check.html | |
ontario.ca: # Directories | |
ontario.ca: # Files | |
ontario.ca: # Paths (clean URLs) | |
ontario.ca: # Paths (no clean URLs) | |
ontario.ca: # Ontario.ca | |
skai.gr: # | |
skai.gr: # robots.txt | |
skai.gr: # | |
skai.gr: # This file is to prevent the crawling and indexing of certain parts | |
skai.gr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
skai.gr: # and Google. By telling these "robots" where not to go on your site, | |
skai.gr: # you save bandwidth and server resources. | |
skai.gr: # | |
skai.gr: # This file will be ignored unless it is at the root of your host: | |
skai.gr: # Used: http://example.com/robots.txt | |
skai.gr: # Ignored: http://example.com/site/robots.txt | |
skai.gr: # | |
skai.gr: # For more information about the robots.txt standard, see: | |
skai.gr: # http://www.robotstxt.org/robotstxt.html | |
skai.gr: # | |
skai.gr: # CSS, JS, Images | |
skai.gr: # Directories | |
skai.gr: # Files | |
skai.gr: # Paths (clean URLs) | |
skai.gr: # Paths (no clean URLs) | |
ceneo.pl: #Disallow: /*clr$ | |
ceneo.pl: #User-agent: ia_archiver | |
ceneo.pl: #Crawl-delay: 30 | |
ceneo.pl: #User-agent: Slurp | |
ceneo.pl: #Crawl-delay: 30 | |
ceneo.pl: #User-agent: Yandex | |
ceneo.pl: #Crawl-delay: 30 | |
ceneo.pl: #Disallow: /Click/ | |
ceneo.pl: #User-agent: NetSprint | |
ceneo.pl: #Crawl-delay: 120 | |
ceneo.pl: #User-agent: Speedy | |
ceneo.pl: #Crawl-Delay: 120 | |
ceneo.pl: #blocked | |
indiatyping.com: # If the Joomla site is installed within a folder such as at | |
indiatyping.com: # e.g. www.example.com/joomla/ the robots.txt file MUST be | |
indiatyping.com: # moved to the site root at e.g. www.example.com/robots.txt | |
indiatyping.com: # AND the joomla folder name MUST be prefixed to the disallowed | |
indiatyping.com: # path, e.g. the Disallow rule for the /administrator/ folder | |
indiatyping.com: # MUST be changed to read Disallow: /joomla/administrator/ | |
indiatyping.com: # | |
indiatyping.com: # For more information about the robots.txt standard, see: | |
indiatyping.com: # http://www.robotstxt.org/orig.html | |
indiatyping.com: # | |
indiatyping.com: # For syntax checking, see: | |
indiatyping.com: # http://tool.motoricerca.info/robots-checker.phtml | |
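The Joomla note above says the file must move to the site root when the site lives in a subfolder, and every disallowed path must gain that folder prefix. The sketch below shows that rewrite. It uses a hypothetical helper, prefix_rules, that only touches Allow and Disallow lines whose value starts with "/". An empty "Disallow:" line, which allows everything, is left alone:

```python
def prefix_rules(robots_txt: str, folder: str) -> str:
    """Prefix every Allow/Disallow path with a subfolder such as 'joomla'."""
    folder = "/" + folder.strip("/")
    out = []
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        path = value.strip()
        # Rewrite only path rules; leave User-agent, Sitemap, comments, etc.
        if sep and field.strip().lower() in ("allow", "disallow") and path.startswith("/"):
            line = f"{field.strip()}: {folder}{path}"
        out.append(line)
    return "\n".join(out)


print(prefix_rules("User-agent: *\nDisallow: /administrator/", "joomla"))
```

This turns "Disallow: /administrator/" into "Disallow: /joomla/administrator/", as the comment requires.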
getflywheel.com: # Default Flywheel robots file | |
orbitz.com: # | |
orbitz.com: # General bots | |
orbitz.com: # | |
orbitz.com: #hotel | |
orbitz.com: #flight | |
orbitz.com: #package | |
orbitz.com: #car | |
orbitz.com: #activities | |
orbitz.com: #cruise | |
orbitz.com: #other | |
orbitz.com: # | |
orbitz.com: # Google Ads | |
orbitz.com: # | |
orbitz.com: # | |
orbitz.com: # | |
orbitz.com: # Bing Ads | |
orbitz.com: # | |
orbitz.com: # | |
orbitz.com: # SemrushBot | |
orbitz.com: # | |
besoccer.com: # Google AdSense | |
besoccer.com: # Adsbot-Google | |
glassdoor.ca: # Canada | |
glassdoor.ca: # Greetings, human beings! | |
glassdoor.ca: # | |
glassdoor.ca: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself. | |
glassdoor.ca: # | |
glassdoor.ca: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet, and help improve the way people everywhere find jobs? | |
glassdoor.ca: # | |
glassdoor.ca: # Run - don't crawl - to apply to join Glassdoor's SEO team here http://jobs.glassdoor.com | |
glassdoor.ca: # | |
glassdoor.ca: # | |
glassdoor.ca: #logging related | |
glassdoor.ca: # Blocking track urls (ACQ-2468) | |
glassdoor.ca: #Blocking non standard job view and job search URLs, and paginated job SERP URLs (TRFC-2831) | |
glassdoor.ca: # Blocking bots from crawling DoubleClick for Publishers and Google Analytics related URLs (which aren't real URLs) | |
glassdoor.ca: # TRFC-4037 Block page from being indexed | |
glassdoor.ca: # | |
glassdoor.ca: # Note that this file has the extension '.text' rather than the more-standard '.txt' | |
glassdoor.ca: # to keep it from being pre-compiled as a servlet. (*.txt files are precompiled, and | |
glassdoor.ca: # there doesn't seem to be a way to turn this off.) | |
glassdoor.ca: # | |
ekaie.com: # | |
ekaie.com: # robots.txt for Discuz! X3 | |
ekaie.com: # | |
jining.com: # | |
jining.com: # robots.txt for Discuz! X3 | |
jining.com: # | |
jisilu.cn: # | |
jisilu.cn: # robots.txt for WeCenter | |
jisilu.cn: # | |
ebay.es: ## BEGIN FILE ### | |
ebay.es: # | |
ebay.es: # allow-all | |
ebay.es: # DR | |
ebay.es: # | |
ebay.es: # The use of robots or other automated means to access the eBay site | |
ebay.es: # without the express permission of eBay is strictly prohibited. | |
ebay.es: # Notwithstanding the foregoing, eBay may permit automated access to | |
ebay.es: # access certain eBay pages but solely for the limited purpose of | |
ebay.es: # including content in publicly available search engines. Any other | |
ebay.es: # use of robots or failure to obey the robots exclusion standards set | |
ebay.es: # forth at <https://www.robotstxt.org/orig.html> is strictly | |
ebay.es: # prohibited. | |
ebay.es: # | |
ebay.es: # v10_ROW_Feb_2021 | |
ebay.es: ### DIRECTIVES ### | |
ebay.es: # VIS Sitemaps | |
ebay.es: # PRP Sitemaps | |
ebay.es: # CLP Sitemaps | |
ebay.es: # BROWSE Sitemaps | |
ebay.es: ### END FILE ### | |
jusbrasil.com.br: # Disable search pagination | |
jusbrasil.com.br: ## Specific Artifact SERP | |
jusbrasil.com.br: # Lawsuits | |
jusbrasil.com.br: # Should be allowed | |
jusbrasil.com.br: # https://www.jusbrasil.com.br/processos/nome/45622652/francisco-costa-peixoto-guimaraes | |
jusbrasil.com.br: # https://www.jusbrasil.com.br/processos/nome/45622652/francisco-costa-peixoto-guimaraes/ | |
jusbrasil.com.br: # Shouldn't be allowed: | |
jusbrasil.com.br: # https://www.jusbrasil.com.br/processos/nome/45622652/francisco-costa-peixoto-guimaraes/artigos | |
jusbrasil.com.br: # https://www.jusbrasil.com.br/processos/nome/45622652/francisco-costa-peixoto-guimaraes/artigos/ | |
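The jusbrasil.com.br comments list profile URLs that should stay crawlable and "/artigos" subpages that should not. The directives themselves are not part of these comment lines, so the sketch below uses a hypothetical rule pair. It applies the RFC 9309 precedence: the longest matching pattern wins, and Allow wins a tie with Disallow. The names rule_matches, is_allowed and RULES are illustrative only:

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    # '*' matches any run of characters; a trailing '$' anchors to the end.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching pattern decides; on equal length, Allow wins."""
    matches = [(kind, pat) for kind, pat in rules if rule_matches(pat, path)]
    if not matches:
        return True  # no rule matches, so crawling is allowed by default
    kind, _ = max(matches, key=lambda m: (len(m[1]), m[0] == "allow"))
    return kind == "allow"


# Hypothetical rules; the real jusbrasil.com.br directives are not shown here.
RULES = [
    ("allow", "/processos/nome/"),
    ("disallow", "/processos/nome/*/*/artigos"),
]

base = "/processos/nome/45622652/francisco-costa-peixoto-guimaraes"
for suffix in ("", "/", "/artigos", "/artigos/"):
    print(base + suffix, is_allowed(RULES, base + suffix))
```

Under these assumed rules, the two profile URLs come out allowed and both "/artigos" variants come out blocked. That matches the intent stated in the comments.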
lucidchart.com: # | |
lucidchart.com: # robots.txt | |
lucidchart.com: # | |
lucidchart.com: # This file is to prevent the crawling and indexing of certain parts | |
lucidchart.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
lucidchart.com: # and Google. By telling these "robots" where not to go on your site, | |
lucidchart.com: # you save bandwidth and server resources. | |
lucidchart.com: # | |
lucidchart.com: # This file will be ignored unless it is at the root of your host: | |
lucidchart.com: # Used: http://example.com/robots.txt | |
lucidchart.com: # Ignored: http://example.com/site/robots.txt | |
lucidchart.com: # | |
lucidchart.com: # For more information about the robots.txt standard, see: | |
lucidchart.com: # http://www.robotstxt.org/wc/robots.html | |
lucidchart.com: # | |
lucidchart.com: # For syntax checking, see: | |
lucidchart.com: # http://www.sxw.org.uk/computing/robots/check.html | |
lucidchart.com: # Directories | |
lucidchart.com: # Paths (no clean URLs) | |
lucidchart.com: ##### | |
lucidchart.com: # Drupal | |
lucidchart.com: ##### | |
lucidchart.com: # Directories | |
lucidchart.com: # Allow some content from /pages/misc | |
lucidchart.com: # Files | |
lucidchart.com: # Paths (clean URLs) | |
lucidchart.com: # Paths (no clean URLs) | |
lucidchart.com: # Noindex i18n Pages | |
lucidchart.com: ##### | |
lucidchart.com: # Code-Base | |
lucidchart.com: # | |
lucidchart.com: # The following URLs are defined in our routing files, | |
lucidchart.com: # but have no value for indexing. Several of them should | |
lucidchart.com: # definitely NOT be indexed. | |
lucidchart.com: ##### | |
yuque.com: # If you would like to crawl Yuque contact us at support@yuque.com. | |
yuque.com: # We also provide an extensive API: https://yuque.com/yuque/developer | |
ratopati.com: # vestacp autogenerated robots.txt | |
vc.ru: # .-----------------------------------. | |
vc.ru: # ( Looking for something to show the robots, hang on ) | |
vc.ru: # ,-----------------------------------' | |
vc.ru: # -' | |
vc.ru: # , | |
vc.ru: # ,-. _,---._ __ / \ | |
vc.ru: # / ) .-' `./ / \ | |
vc.ru: # ( ( ,' `/ /| | |
vc.ru: # \ `-" \'\ / | | |
vc.ru: # `. , \ \ / | | |
vc.ru: # /`. ,'-`----Y | | |
vc.ru: # ( ; | ' | |
vc.ru: # | ,-. ,-' | / | |
vc.ru: # | | ( | CMTT.RU | / | |
vc.ru: # ) | \ `.___________|/ | |
vc.ru: # `--' `--' | |
almalnews.com: # vestacp autogenerated robots.txt | |
bolavip.com: # Bloqueo de bots | |
ysl.com: # Pages | |
ysl.com: # Product | |
ysl.com: #sitemaps | |
ycharts.com: # Disallow data URLs, allow everything else | |
grailed.com: # _____ _____ _____ _ ______ _____ | |
grailed.com: # / ____|| __ \ /\ |_ _|| | | ____|| __ \ | |
grailed.com: # | | __ | |__) | / \ | | | | | |__ | | | | | |
grailed.com: # | | |_ || _ / / /\ \ | | | | | __| | | | | | |
grailed.com: # | |__| || | \ \ / ____ \ _| |_ | |____ | |____ | |__| | | |
grailed.com: # \_____||_| \_\/_/ \_\|_____||______||______||_____/ | |
grailed.com: # | |
grailed.com: # Hello Robot, | |
grailed.com: # | |
grailed.com: # Very nice to e-meet you. We've been waiting for you. There are some cookies | |
grailed.com: # next to the sitemap, if you're hungry, of course. | |
grailed.com: # | |
grailed.com: # With love, | |
grailed.com: # Grailed | |
grailed.com: # | |
fubo.tv: # robotstxt.org | |
upenn.edu: # | |
upenn.edu: # robots.txt | |
upenn.edu: # | |
upenn.edu: # This file is to prevent the crawling and indexing of certain parts | |
upenn.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
upenn.edu: # and Google. By telling these "robots" where not to go on your site, | |
upenn.edu: # you save bandwidth and server resources. | |
upenn.edu: # | |
upenn.edu: # This file will be ignored unless it is at the root of your host: | |
upenn.edu: # Used: http://example.com/robots.txt | |
upenn.edu: # Ignored: http://example.com/site/robots.txt | |
upenn.edu: # | |
upenn.edu: # For more information about the robots.txt standard, see: | |
upenn.edu: # http://www.robotstxt.org/robotstxt.html | |
upenn.edu: # CSS, JS, Images | |
upenn.edu: # Directories | |
upenn.edu: # Files | |
upenn.edu: # Paths (clean URLs) | |
upenn.edu: # Paths (no clean URLs) | |
desjardins.com: # Allow all | |
datareportal.com: # Squarespace Robots Txt | |
viettishop.com: ## robots.txt for Magento Community and Enterprise | |
viettishop.com: ## GENERAL SETTINGS | |
viettishop.com: ## Enable robots.txt rules for all crawlers | |
viettishop.com: ## Crawl-delay parameter: number of seconds to wait between successive requests to the same server. | |
viettishop.com: ## Set a custom crawl rate if you're experiencing traffic problems with your server. | |
viettishop.com: # Crawl-delay: 30 | |
viettishop.com: ## Vietti sitemap: | |
viettishop.com: ## DEVELOPMENT RELATED SETTINGS | |
viettishop.com: ## Do not crawl development files and folders: CVS, svn directories and dump files | |
viettishop.com: ## GENERAL MAGENTO SETTINGS | |
viettishop.com: ## Do not crawl Magento admin page | |
viettishop.com: ## Do not crawl common Magento technical folders | |
viettishop.com: ## Do not crawl common Magento files | |
viettishop.com: ## MAGENTO SEO IMPROVEMENTS | |
viettishop.com: ## Do not crawl sub category pages that are sorted or filtered. | |
viettishop.com: ## Do not crawl 2nd home page copy (example.com/index.php/). Uncomment it only if you activated Magento SEO URLs. | |
viettishop.com: ## Disallow: /index.php/ | |
viettishop.com: ## Do not crawl links with session IDs | |
viettishop.com: ## Do not crawl checkout and user account pages | |
viettishop.com: ## Do not crawl search pages and not-SEO optimized catalog links | |
viettishop.com: ## SERVER SETTINGS | |
viettishop.com: ## Do not crawl common server technical folders and files | |
viettishop.com: ## IMAGE CRAWLERS SETTINGS | |
viettishop.com: ## Extra: Uncomment if you do not wish Google and Bing to index your images | |
viettishop.com: # User-agent: Googlebot-Image | |
viettishop.com: # Disallow: / | |
viettishop.com: # User-agent: msnbot-media | |
viettishop.com: # Disallow: / | |
mangools.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
gradeup.co: # Block all user pages, hindi pages | |
gradeup.co: # Block urls with query params | |
gradeup.co: #Disallow: /user/* | |
gradeup.co: #Disallow: /hindi/* | |
gradeup.co: #Disallow: /post-i-* | |
gradeup.co: #Disallow: /query-i-* | |
gradeup.co: #Disallow: /shared-info-i-* | |
gradeup.co: #Disallow: /mcq-i-* | |
gradeup.co: #Sitemap: https://s3.amazonaws.com/sitemaps-gradeup/sitemap_index.xml | |
gradeup.co: #sitemap | |
standardbank.co.za: # robots.txt for Sites | |
standardbank.co.za: # Do Not delete this file. | |
rutgers.edu: # | |
rutgers.edu: # robots.txt | |
rutgers.edu: # | |
rutgers.edu: # This file is to prevent the crawling and indexing of certain parts | |
rutgers.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
rutgers.edu: # and Google. By telling these "robots" where not to go on your site, | |
rutgers.edu: # you save bandwidth and server resources. | |
rutgers.edu: # | |
rutgers.edu: # This file will be ignored unless it is at the root of your host: | |
rutgers.edu: # Used: http://example.com/robots.txt | |
rutgers.edu: # Ignored: http://example.com/site/robots.txt | |
rutgers.edu: # | |
rutgers.edu: # For more information about the robots.txt standard, see: | |
rutgers.edu: # http://www.robotstxt.org/robotstxt.html | |
rutgers.edu: # CSS, JS, Images | |
rutgers.edu: # Directories | |
rutgers.edu: # Files | |
rutgers.edu: # Paths (clean URLs) | |
rutgers.edu: # Paths (no clean URLs) | |
smile.io: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
smile.io: # | |
smile.io: # To ban all spiders from the entire site uncomment the next two lines: | |
smile.io: # User-agent: * | |
smile.io: # Disallow: / | |
ilduomo.it: # robots.txt automatically generated by PrestaShop e-commerce open-source solution | |
ilduomo.it: # http://www.prestashop.com - http://www.prestashop.com/forums | |
ilduomo.it: # This file is to prevent the crawling and indexing of certain parts | |
ilduomo.it: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ilduomo.it: # and Google. By telling these "robots" where not to go on your site, | |
ilduomo.it: # you save bandwidth and server resources. | |
ilduomo.it: # For more information about the robots.txt standard, see: | |
ilduomo.it: # http://www.robotstxt.org/robotstxt.html | |
ilduomo.it: # Allow Directives | |
ilduomo.it: # Private pages | |
ilduomo.it: # Directories | |
ilduomo.it: # Files | |
browserstack.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
browserstack.com: # | |
browserstack.com: # To ban all spiders from the entire site uncomment the next two lines: | |
cheezburger.com: # Moist | |
colg.cn: # | |
colg.cn: # robots.txt for Discuz! X3 | |
colg.cn: # | |
colg.cn: #Disallow: /forum.php?mod=redirect* | |
colg.cn: #Disallow: /forum.php?mod=post* | |
semanticscholar.org: # We are a non-profit research institute. If you would like to collaborate with us, | |
semanticscholar.org: # please contact us at: ai2-info@allenai.org | |
semanticscholar.org: # Or check out our public API http://api.semanticscholar.org/ | |
sme.sk: # www.robotstxt.org/ | |
sme.sk: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
monash.edu: # www.monash.edu | |
monash.edu: # Added for Dey Alexander. Templates not to be indexed. RK dec 2003 | |
monash.edu: #Added for migration access issue 10/9/03 sms# | |
monash.edu: #Disallow: /library/ # Removed 21/11/2012 by DMa - Google needs to index library's site | |
monash.edu: #Disallow: /arts/ # removed on 20/11/19 - after domain change for arts | |
monash.edu: ##INS555156 HJiang, will be on 1st of Oct | |
monash.edu: # INC000001675021 | |
monash.edu: # INC000001890005 | |
monash.edu: # Disallow: /library/search | |
monash.edu: # INC000001918907 | |
monash.edu: ##### Don't index web server statistics | |
monash.edu: ##### Don't index user disks - they should be accessed as ~username | |
monash.edu: #lls site moved to celts server | |
monash.edu: #Disallow: /lls/ | |
monash.edu: #Disallow: /orientation/ | |
monash.edu: #fixing issue with mrgs progress reports indexing | |
monash.edu: #HEAT 450520 - removing /alumni./assets/images/ from Google image search | |
monash.edu: #User-agent: Googlebot-Image | |
monash.edu: #for Gary Gopinathan REMEDY INC200916 by HM 21 Feb 2012 | |
monash.edu: #for Rachel Zelada REMEDY INC378928 by DMa 14 Dec 2012 | |
monash.edu: #added for Derek Brown REMEDY INC400190 - 30 Jan 2013 by HM | |
monash.edu: #REMEDY INC513114 - 1 Aug 2013 by DMa | |
monash.edu: #REMEDY INC542349 - 11 Nov by DMa | |
monash.edu: #REMEDY INC693434 - 15 May 2014 by DMa | |
monash.edu: #REMEDY INC842290 - 12/01/2015 by MathewR | |
monash.edu: #Squiz Zd 38741 | |
monash.edu: # INC000002051888 | |
monash.edu: # Don't index Internal news folder as requested by Internal communications team - 2 June 2017 - done by Shefali Joshi | |
monash.edu: # MS-81 - Move Study to monash.edu domain - 19 Aug 2017 by dcook | |
monash.edu: # SDVIC-607 - Prevent crawl of old majors | |
monash.edu: #added for Fiona McQueen by SMC digital team 09/18 | |
monash.edu: # INC000001972784 added for Greg McKeown by HDo 14 Sep 2018 | |
monash.edu: # INC000002074170 | |
monash.edu: # Disallow MUMA's Design Files | |
monash.edu: # SDVIC-4380 Prevent crawling of Funnelback search queries or facets. | |
monash.edu: # ### OTHER SETTINGS ### | |
monash.edu: # INC000002334371 - Disallow MADA's /artdes/ paths | |
monash.edu: #added by Simiao Luo from SMC, requested by Harriosn Gist, 11/12/2019# | |
monash.edu: #added by Wilson for Jenny Legg INC000002497957 | |
monash.edu: #added by Angelene Wong, 22/10/2020 | |
monash.edu: #added by Angelene Wong, 27/10/2020 | |
in.gov: # robots.txt for http://www.IN.gov/ | |
bol.com: # Sitemap | |
bol.com: # SEO-3529 | |
bol.com: # Shop | |
bol.com: # SEB-1632 | |
bol.com: # SEB-1013 | |
bol.com: # SEB-2874 | |
bol.com: # SEB-2131 | |
bol.com: # SEB-2022 | |
bol.com: # Excluding double list and category pages created through CMS | |
bol.com: # Excluding links to review tools | |
bol.com: # Excluding links to responsive review, q&a forms and ajax calls supporting them | |
bol.com: # Excluding all /catalog/ urls (link to e-mail a product, recommendations, compare, tab-content, etc.) | |
bol.com: # Excluding non-relevant ATG links | |
bol.com: # Excluding non-relevant brand pages | |
bol.com: # Excluding non-relevant prijsoverzicht urls | |
bol.com: # Excluding error urls | |
bol.com: # Track and trace page | |
bol.com: # SEB-1294 | |
bol.com: # SEB-1822 | |
bol.com: # SEB-1574 | |
bol.com: # Partner Service | |
bol.com: # MAR-2754 | |
bol.com: # SEB-1916 | |
bol.com: # SEB-1574 | |
anthem.com: # Below two items are to exclude bad URLs from Google Bot as of 6/2014 | |
anthem.com: #microsites | |
anthem.com: # bcbsga.com | |
anthem.com: # unicare.com | |
anthem.com: # Internal Search Bots | |
officeworks.com.au: # Non-seo URL | |
officeworks.com.au: # Customer specific urls | |
officeworks.com.au: # Excluded seo URLs | |
officeworks.com.au: # Old mobile & desktop urls | |
officeworks.com.au: # Old campaigns | |
officeworks.com.au: # business terms temporary | |
officeworks.com.au: # Sitemap | |
cnn.gr: # If the Joomla site is installed within a folder such as at | |
cnn.gr: # e.g. www.example.com/joomla/ the robots.txt file MUST be | |
cnn.gr: # moved to the site root at e.g. www.example.com/robots.txt | |
cnn.gr: # AND the joomla folder name MUST be prefixed to the disallowed | |
cnn.gr: # path, e.g. the Disallow rule for the /administrator/ folder | |
cnn.gr: # MUST be changed to read Disallow: /joomla/administrator/ | |
cnn.gr: # | |
cnn.gr: # For more information about the robots.txt standard, see: | |
cnn.gr: # http://www.robotstxt.org/orig.html | |
cnn.gr: # | |
cnn.gr: # For syntax checking, see: | |
cnn.gr: # http://tool.motoricerca.info/robots-checker.phtml | |
ryerson.ca: # /robots.txt file for http://ryerson.ca/ | |
ryerson.ca: # mail webmaster@ryerson.ca | |
support.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead. | |
support.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details. | |
support.wordpress.com: # This file was generated on Thu, 28 Jan 2021 13:33:58 +0000 | |
google.dk: # AdsBot | |
google.dk: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
belgium.be: # | |
belgium.be: # robots.txt | |
belgium.be: # | |
belgium.be: # This file is to prevent the crawling and indexing of certain parts | |
belgium.be: # of your site by web crawlers and spiders run by sites like Yahoo! | |
belgium.be: # and Google. By telling these "robots" where not to go on your site, | |
belgium.be: # you save bandwidth and server resources. | |
belgium.be: # | |
belgium.be: # This file will be ignored unless it is at the root of your host: | |
belgium.be: # Used: http://example.com/robots.txt | |
belgium.be: # Ignored: http://example.com/site/robots.txt | |
belgium.be: # | |
belgium.be: # For more information about the robots.txt standard, see: | |
belgium.be: # http://www.robotstxt.org/robotstxt.html | |
belgium.be: # CSS, JS, Images | |
belgium.be: # Directories | |
belgium.be: # Files | |
belgium.be: # Paths (clean URLs) | |
belgium.be: # Paths (no clean URLs) | |
brightedge.com: # CSS, JS, Images | |
brightedge.com: # Directories | |
brightedge.com: # Files | |
brightedge.com: # Paths (clean URLs) | |
brightedge.com: # Paths (no clean URLs) | |
brightedge.com: # Paths (thank you pages) | |
ixl.com: # ----------------------------------------------------------------------------- | |
ixl.com: # | |
ixl.com: # Areas that search robots should avoid | |
ixl.com: # (c) 2011 IXL Learning. All rights reserved. | |
ixl.com: # | |
ixl.com: # created by jkent on 8 Mar 2002 | |
ixl.com: # | |
ixl.com: # Site-friendly search robots use this file to determine where _not_ | |
ixl.com: # to go. Some URL spaces are simply counterproductive. | |
ixl.com: # ----------------------------------------------------------------------------- | |
musixmatch.com: # Allow only major search spiders | |
musixmatch.com: # Block all other spiders | |
musixmatch.com: # Block Directories for all spiders | |
facebookblueprint.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
facebookblueprint.com: # Required for activities shared to Twitter, see https://dev.twitter.com/cards/getting-started "URL Crawling & Caching" | |
jomashop.com: # Directories | |
jomashop.com: # Session ID | |
restream.io: # Those landings are used by the marketing to track email campaigns | |
tasteofhome.com: # This virtual robots.txt file was created by the Virtual Robots.txt WordPress plugin: https://www.wordpress.org/plugins/pc-robotstxt/ | |
proprofs.com: # Sitemap files | |
centrum24.pl: #error_title > div { | |
express.co.uk: #170820-DXD-6728 | |
nolo.com: #Crawl-delay: 10 | |
nolo.com: # Directories | |
nolo.com: # Nolo urls: | |
nolo.com: # Ehub Paths | |
nolo.com: # Paths (clean URLs) | |
nolo.com: # Paths (no clean URLs) | |
nolo.com: # NCMS | |
denverpost.com: # Sitemap archive | |
36kr.com: # robots.txt | |
aetna.com: # robots.txt for http://www.aetna.com | |
aetna.com: # | |
aetna.com: # Owner - Aetna.com User Interface Design and Development Team / AIS ADS Web Services | |
aetna.com: # | |
aetna.com: # List of Orphan urls - not linked from site - not part of search: | |
aetna.com: # http://www.aetna.com/employer/AetnaLink/ | |
aetna.com: # http://www.aetna.com/producer/marsh_broker.html | |
aetna.com: # | |
aetna.com: # http://www.aetna.com/TIBRF.html | |
aetna.com: # http://www.aetna.com/about/AetnaHealthFund/ | |
aetna.com: # http://www.aetna.com/about/AetnaHealthFund/before_fund_deductible/ | |
aetna.com: # http://www.aetna.com/about/MemberRights/ | |
aetna.com: # http://www.aetna.com/about/pdf/draft_privacy_notice.pdf | |
aetna.com: # http://www.aetna.com/about/pdf/Aetna_MCP.pdf | |
aetna.com: # http://www.aetna.com/about/dolregs.html | |
aetna.com: # | |
aetna.com: # http://www.aetna.com/help/logo/index.html | |
aetna.com: # http://www.aetna.com/info/nextel.html | |
aetna.com: # http://www.aetna.com/info/citibusiness.html | |
aetna.com: # | |
aetna.com: # http://www.aetna.com/provider/eob/ | |
aetna.com: # | |
aetna.com: # | |
aetna.com: # keep these allows out of all main catalogs | |
aetna.com: # Allow: /inyourstate/employer.html | |
aetna.com: # Allow: /inyourstate/member.html | |
aetna.com: # Allow: /inyourstate/producer.html | |
ucsd.edu: # Block all google tag manager tracking links | |
khabarfarsi.com: # | |
khabarfarsi.com: # robots.txt | |
khabarfarsi.com: # | |
khabarfarsi.com: # This file is to prevent the crawling and indexing of certain parts | |
khabarfarsi.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
khabarfarsi.com: # and Google. By telling these "robots" where not to go on your site, | |
khabarfarsi.com: # you save bandwidth and server resources. | |
khabarfarsi.com: # | |
khabarfarsi.com: # This file will be ignored unless it is at the root of your host: | |
khabarfarsi.com: # Used: http://example.com/robots.txt | |
khabarfarsi.com: # Ignored: http://example.com/site/robots.txt | |
khabarfarsi.com: # | |
khabarfarsi.com: # For more information about the robots.txt standard, see: | |
khabarfarsi.com: # http://www.robotstxt.org/wc/robots.html | |
khabarfarsi.com: # | |
khabarfarsi.com: # For syntax checking, see: | |
khabarfarsi.com: # http://www.sxw.org.uk/computing/robots/check.html | |
khabarfarsi.com: # Files | |
khabarfarsi.com: # Paths (clean URLs) | |
khabarfarsi.com: # Paths (no clean URLs) | |
pluto.tv: # Hidden channels | |
ballotpedia.org: #Disallow: /wiki/skins/ | |
ballotpedia.org: #Disallow: /wiki/index.php/ | |
ballotpedia.org: #Crawl-delay: 5 | |
ballotpedia.org: #Request-rate: 1/5 # maximum rate is one page every 5 seconds | |
ballotpedia.org: #Visit-time: 0600-0845 # only visit between 06:00 and 08:45 UTC (GMT) | |
ballotpedia.org: #User-agent: Slurp | |
ballotpedia.org: #Disallow: / | |
ballotpedia.org: #-------------------------- | |
google.co.ma: # AdsBot | |
google.co.ma: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
diariopanorama.com: # robots.txt for http://www.elliberal.com.ar | |
diariopanorama.com: # Last modified: 2014-12-30 T15:00:00 -0300 | |
techacademy.jp: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
techacademy.jp: # | |
techacademy.jp: # To ban all spiders from the entire site uncomment the next two lines: | |
zoho.eu: # ------------------------------------------ | |
zoho.eu: # ZOHO Corp. -- http://www.zoho.com | |
zoho.eu: # Robot Exclusion File -- robots.txt | |
zoho.eu: # Author: Zoho Creative | |
zoho.eu: # Last Updated: 24/12/2020 | |
zoho.eu: # ------------------------------------------ | |
zoho.eu: # unwanted list taken from zoho search list | |
zoho.eu: # unwanted list taken from zoho search list | |
zoho.eu: # unwanted list taken from zoho search for zoholics | |
zoho.eu: # unwanted list taken from zoho search for zoho | |
bu.edu: # Directions for robots. See this URL: | |
bu.edu: # http://info.webcrawler.com/mak/projects/robots/norobots.html | |
bu.edu: # for a description of the file format. | |
bu.edu: # 2008-08-21 | |
bu.edu: ##### | |
bu.edu: # Here is where we override the default action | |
bu.edu: ## Due to a bug in linklint, must first specify a disallow in order for | |
bu.edu: ## for all other directories to be allowed. Feel free to add other | |
bu.edu: ## disallows below the first disallow line. | |
bu.edu: ##### | |
bu.edu: # Allow W3C link Validator for /dev/ and /nisdev/ | |
bu.edu: # skipping other dynamic content or private areas | |
bu.edu: # 2004-08-27 gaudette | |
bu.edu: # | |
bu.edu: ##### | |
bu.edu: # default action - currently it allows access to most of the site | |
bu.edu: # skipping dynamic content or private areas | |
bu.edu: # | |
bu.edu: # BUniverse exclusions added by kgrin on 2010-04-26 | |
bu.edu: ### | |
bu.edu: # Emergency change 2012-02-14 bfenster, in response to incident | |
bu.edu: ### | |
bu.edu: # Emergency change 2014-11-17 bfenster, in response to incident | |
bu.edu: ##### | |
bu.edu: # default action - currently it allows access to most of the site | |
bu.edu: # skipping dynamic content or private areas | |
bu.edu: # | |
bu.edu: # BUniverse exclusions added by kgrin on 2010-04-21 | |
bu.edu: # academics/summer archive exclusions added by kgrin on 2011-07-17 | |
local.com: #robots.txt for all our sites | |
kartable.fr: # www.robotstxt.org/ | |
kartable.fr: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
kartable.fr: # https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt | |
kartable.fr: # old site urls | |
bookdepository.com: # START: Temporarily SEO Experiment; Ticket: WEBOPS-1445 | |
bookdepository.com: # END: Temporarily SEO Experiment | |
bookdepository.com: # Semrushbot does not have disallow rule types implemented - WEBOPS-2925 | |
macrotrends.net: # | |
macrotrends.net: # robots.txt for https://www.macrotrends.net | |
macrotrends.net: # | |
macrotrends.net: # Allow MOZ to crawl the site | |
macrotrends.net: # advertising-related bots: | |
macrotrends.net: # Wikipedia work bots: | |
macrotrends.net: # Crawlers that are kind enough to obey, but which we'd rather not have | |
macrotrends.net: # unless they're feeding search engines. | |
macrotrends.net: # Some bots are known to be trouble, particularly those designed to copy | |
macrotrends.net: # entire sites. Please obey robots.txt. | |
macrotrends.net: # Misbehaving: requests much too fast: | |
macrotrends.net: # | |
macrotrends.net: # Sorry, wget in its recursive mode is a frequent problem. | |
macrotrends.net: # Please read the man page and use it properly; there is a | |
macrotrends.net: # --wait option you can use to set the delay between hits, | |
macrotrends.net: # for instance. | |
macrotrends.net: # | |
macrotrends.net: # | |
macrotrends.net: # The 'grub' distributed client has been *very* poorly behaved. | |
macrotrends.net: # | |
macrotrends.net: # | |
macrotrends.net: # Doesn't follow robots.txt anyway, but... | |
macrotrends.net: # | |
macrotrends.net: # | |
macrotrends.net: # Hits many times per second, not acceptable | |
macrotrends.net: # http://www.nameprotect.com/botinfo.html | |
macrotrends.net: # A capture bot, downloads gazillions of pages with no public benefit | |
macrotrends.net: # http://www.webreaper.net/ | |
bell.ca: #Disallow: /Business/Mobility | |
bell.ca: #Disallow: /Entreprise/Mobilite | |
bell.ca: # Sitemap files | |
boohooman.com: # Pages | |
boohooman.com: # Product Filter # | |
boohooman.com: # Ordering & Product per page # | |
boohooman.com: # Number of product per page | |
boohooman.com: # Order By | |
boohooman.com: # Price | |
boohooman.com: # Faceted Navigation # | |
boohooman.com: # UK & ALL Search # | |
boohooman.com: # EU Search # | |
boohooman.com: # Search # | |
boohooman.com: # Ensure no Static Ressources is blocked # | |
boohooman.com: # Crawl Delay - 5 URL max per second | |
baseball-reference.com: # Disallow all robots on the sandbox for now. | |
baseball-reference.com: # Allow only specific directories | |
baseball-reference.com: # talk to me if you want us to unblock this. $$$$$ | |
baseball-reference.com: # tris | |
canon.com: # robots.txt for http://www.canon.com/ | |
vodafone.de: # robots.txt for www.vodafone.de 05.11.2020 | |
vodafone.de: # Sitemap | |
siemens.com: # | |
gumtree.co.za: #Sitemaps | |
gumtree.co.za: #Sorting parameters | |
gumtree.co.za: #Other comments: | |
gumtree.co.za: #Sorting parameters | |
gumtree.co.za: #Other comments: | |
gumtree.co.za: #Sorting parameters | |
gumtree.co.za: #Other comments: | |
gumtree.co.za: #Sorting parameters | |
gumtree.co.za: #Other comments: | |
google.tn: # AdsBot | |
google.tn: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
iaai.com: #Sitemap: https://www.iaai.com/sitemap.xml | |
signifyd.com: # Permanent redirects - Added 10-17-18 - modified 08-18-19 | |
signifyd.com: # Added 11-21-18 | |
signifyd.com: # Added 10-18-18 | |
signifyd.com: # Added 08-18-19 | |
signifyd.com: # Added 09-06-19 | |
signifyd.com: # Added 11-16-19 | |
signifyd.com: # Added 12-5-19 | |
signifyd.com: # Sitemaps | |
shangxueba.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
shangxueba.com: #content{margin:0 0 0 2%;position:relative;} | |
skyeng.ru: #Sitemap: https://skyeng.ru/sitemap/sitemap-videos.xml | |
skyeng.ru: #Sitemap: https://skyeng.ru/sitemap/sitemap-videos.xml | |
skyeng.ru: #Sitemap: https://skyeng.ru/sitemap/sitemap-videos-for-yandex.xml | |
fontspace.com: # Sitemap | |
discover.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead. | |
discover.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details. | |
discover.wordpress.com: # This file was generated on Wed, 24 Feb 2021 20:13:01 +0000 | |
usaddress.com: # robots.txt - 15/12/2016 | |
eventbrite.co.uk: # http://www.google.co.uk/adsbot.html - AdsBot ignores * wildcard | |
dicionarioinformal.com.br: # fill the form contact on dicionarioinformal.com.br/contato.php for constructive criticism | |
surfline.com: # Robots.txt file for https://www.surfline.com | |
surfline.com: # Wikipedia work bots: | |
blick.ch: #Special Areas of the Page | |
blick.ch: #Special parameters | |
blick.ch: #Special File Endings: | |
blick.ch: #Bots which make unnecessary 10% of our (non search) bot traffic | |
blick.ch: #Sitemap URLs | |
codal.ir: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
codal.ir: #content{margin:0 0 0 2%;position:relative;} | |
club-k.net: # If the Joomla site is installed within a folder | |
club-k.net: # eg www.example.com/joomla/ then the robots.txt file | |
club-k.net: # MUST be moved to the site root | |
club-k.net: # eg www.example.com/robots.txt | |
club-k.net: # AND the joomla folder name MUST be prefixed to all of the | |
club-k.net: # paths. | |
club-k.net: # eg the Disallow rule for the /administrator/ folder MUST | |
club-k.net: # be changed to read | |
club-k.net: # Disallow: /joomla/administrator/ | |
club-k.net: # | |
club-k.net: # For more information about the robots.txt standard, see: | |
club-k.net: # http://www.robotstxt.org/orig.html | |
club-k.net: # | |
club-k.net: # For syntax checking, see: | |
club-k.net: # http://tool.motoricerca.info/robots-checker.phtml | |
google.com.uy: # AdsBot | |
google.com.uy: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
sciencemag.org: # | |
sciencemag.org: # robots.txt | |
sciencemag.org: # | |
sciencemag.org: # This file is to prevent the crawling and indexing of certain parts | |
sciencemag.org: # of your site by web crawlers and spiders run by sites like Yahoo! | |
sciencemag.org: # and Google. By telling these "robots" where not to go on your site, | |
sciencemag.org: # you save bandwidth and server resources. | |
sciencemag.org: # | |
sciencemag.org: # This file will be ignored unless it is at the root of your host: | |
sciencemag.org: # Used: http://example.com/robots.txt | |
sciencemag.org: # Ignored: http://example.com/site/robots.txt | |
sciencemag.org: # | |
sciencemag.org: # For more information about the robots.txt standard, see: | |
sciencemag.org: # http://www.robotstxt.org/robotstxt.html | |
sciencemag.org: # CSS, JS, Images | |
sciencemag.org: # Directories | |
sciencemag.org: # Files | |
sciencemag.org: # Paths (clean URLs) | |
sciencemag.org: # Paths (no clean URLs) | |
medicinenet.com: # | |
medicinenet.com: # robots.txt for MedicineNet, Inc. Properties | |
medicinenet.com: # | |
society6.com: # Robots.txt file for https://society6.com | |
society6.com: # November 14, 2017 | |
comcast.com: # Comcast | |
comcast.com: # robots.txt for www.comcast.com | |
comcast.com: # Modified on 1/25/17 | |
ipaddress.com: #Disallow: /jstream/ | |
ipaddress.com: #Disallow: /vote/ | |
cuelinks.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
cuelinks.com: # | |
cuelinks.com: # To ban all spiders from the entire site uncomment the next two lines: | |
airbnb.co.in: # /////// | |
airbnb.co.in: # // // | |
airbnb.co.in: # // // | |
airbnb.co.in: # // // /// /// /// | |
airbnb.co.in: # // // /// /// | |
airbnb.co.in: # // /// // //// /// /// /// //// /// //// /// //// /// //// | |
airbnb.co.in: # // /// /// // ////////// /// ////////// /////////// ////////// /////////// | |
airbnb.co.in: # // // // // /// /// /// /// /// /// /// /// /// /// | |
airbnb.co.in: # // // // // /// /// /// /// /// /// /// /// /// /// | |
airbnb.co.in: # // // // // /// /// /// /// /// /// /// /// /// /// | |
airbnb.co.in: # // // // // ////////// /// /// ////////// /// /// ////////// | |
airbnb.co.in: # // ///// // | |
airbnb.co.in: # // ///// // | |
airbnb.co.in: # // /// /// // | |
airbnb.co.in: # ////// ////// | |
airbnb.co.in: # | |
airbnb.co.in: # | |
airbnb.co.in: # We thought you'd never make it! | |
airbnb.co.in: # We hope you feel right at home in this file...unless you're a disallowed subfolder. | |
airbnb.co.in: # And since you're here, read up on our culture and team: https://www.airbnb.com/careers/departments/engineering | |
airbnb.co.in: # There's even a bring your robot to work day. | |
starbucks.com: # Slow an overly aggressive MJ12bot from the UK | |
faz.net: # robots.txt updated 2018-12-13 | |
goethe.de: #robots.txt for http://www.goethe.de/ | |
virginmedia.com: #This message has been scanned for viruses | |
maisonmargiela.com: # Disallow tricombot. | |
cjol.com: # robots.txt for http://cjol.com/ | |
olx.co.id: #Base Filters | |
olx.co.id: #Cars Filters | |
olx.co.id: #RE Filters | |
olx.co.id: # Generated on 2019-12-12T23:22:18.976Z | |
tripadvisor.com.br: # Hi there, | |
tripadvisor.com.br: # | |
tripadvisor.com.br: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself. | |
tripadvisor.com.br: # | |
tripadvisor.com.br: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet? | |
tripadvisor.com.br: # | |
tripadvisor.com.br: # Run - don't crawl - to apply to join TripAdvisor's elite SEO team | |
tripadvisor.com.br: # | |
tripadvisor.com.br: # Email seoRockstar@tripadvisor.com | |
tripadvisor.com.br: # | |
tripadvisor.com.br: # Or visit https://careers.tripadvisor.com/search-results?keywords=seo | |
tripadvisor.com.br: # | |
tripadvisor.com.br: # | |
ucas.com: # | |
ucas.com: # robots.txt | |
ucas.com: # | |
ucas.com: # This file is to prevent the crawling and indexing of certain parts | |
ucas.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ucas.com: # and Google. By telling these "robots" where not to go on your site, | |
ucas.com: # you save bandwidth and server resources. | |
ucas.com: # | |
ucas.com: # This file will be ignored unless it is at the root of your host: | |
ucas.com: # Used: http://example.com/robots.txt | |
ucas.com: # Ignored: http://example.com/site/robots.txt | |
ucas.com: # | |
ucas.com: # For more information about the robots.txt standard, see: | |
ucas.com: # http://www.robotstxt.org/robotstxt.html | |
ucas.com: # CSS, JS, Images | |
ucas.com: # Directories | |
ucas.com: # Files | |
ucas.com: # Paths (clean URLs) | |
ucas.com: # Paths (no clean URLs) | |
nextdoor.com: # Twitter specifies this format here https://dev.twitter.com/cards/getting-started#crawling | |
lexpress.mu: # | |
lexpress.mu: # robots.txt | |
lexpress.mu: # | |
lexpress.mu: # This file is to prevent the crawling and indexing of certain parts | |
lexpress.mu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
lexpress.mu: # and Google. By telling these "robots" where not to go on your site, | |
lexpress.mu: # you save bandwidth and server resources. | |
lexpress.mu: # | |
lexpress.mu: # This file will be ignored unless it is at the root of your host: | |
lexpress.mu: # Used: http://example.com/robots.txt | |
lexpress.mu: # Ignored: http://example.com/site/robots.txt | |
lexpress.mu: # | |
lexpress.mu: # For more information about the robots.txt standard, see: | |
lexpress.mu: # http://www.robotstxt.org/robotstxt.html | |
lexpress.mu: # CSS, JS, Images | |
lexpress.mu: # Directories | |
lexpress.mu: # Files | |
lexpress.mu: # Paths (clean URLs) | |
lexpress.mu: # Paths (no clean URLs) | |
adobelogin.com: # The use of robots or other automated means to access the Adobe site | |
adobelogin.com: # without the express permission of Adobe is strictly prohibited. | |
adobelogin.com: # Notwithstanding the foregoing, Adobe may permit automated access to | |
adobelogin.com: # access certain Adobe pages but solely for the limited purpose of | |
adobelogin.com: # including content in publicly available search engines. Any other | |
adobelogin.com: # use of robots or failure to obey the robots exclusion standards set | |
adobelogin.com: # forth at http://www.robotstxt.org/ is strictly prohibited. | |
adobelogin.com: # Details about Googlebot available at: http://www.google.com/bot.html | |
adobelogin.com: # The Google search engine can see everything | |
adobelogin.com: # The Omniture search engine can see everything | |
adobelogin.com: # XML sitemaps updates per SH10272020 | |
adobelogin.com: # XML sitemaps updates per BW10202020 | |
adobelogin.com: # Hreflang sitemap | |
adobelogin.com: # Hreflang sitemap updates per SH10122020 | |
adobelogin.com: # PSFl individual sitemaps HS07082020 | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item.flrigt.btn > a:hover { | |
agora.io: #mega-menu-item-23839.mobile-btn a.mega-menu-link { | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-mobile-btn { | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > .mega-menu-item.mega-mobile-btn { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu li.mega-menu-item a.mega-menu-link sup { | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item.mega-toggle-on > a.mega-menu-link, | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a.mega-menu-link:hover { | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-megamenu > ul.mega-sub-menu > li.mega-menu-row > ul.mega-sub-menu > li.mega-menu-columns-5-of-12, | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-megamenu > ul.mega-sub-menu > li.mega-menu-row > ul.mega-sub-menu > li.mega-menu-columns-6-of-12, | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-megamenu > ul.mega-sub-menu > li.mega-menu-row > ul.mega-sub-menu > li.mega-menu-columns-7-of-12 { | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a.mega-menu-link { | |
agora.io: #menu-1 { | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-item.mega-menu-item.advantage-button { | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-item.mega-menu-item.advantage-button > a.mega-menu-link { | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-item.mega-menu-item.advantage-button > a.mega-menu-link:hover { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu li.mega-menu-item.advantage-button > a.mega-menu-link { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu li.mega-menu-item.advantage-button > a.mega-menu-link:hover { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item.flrigt.btn > a:hover { | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-mobile-btn { | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a.mega-menu-link { | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li a | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu .mobile-btn > a.mega-menu-link | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item.flrigt { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item.flrigt.btn > a { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu li > a { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu:after { content: ""; | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item > a.mega-menu-link:hover { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu li.mega-menu-item a.mega-menu-link { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu li.mega-menu-item a.mega-menu-link:hover, #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu li.mega-menu-item a.mega-menu-link:focus { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout.mega-toggle-on ul.mega-sub-menu { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu { | |
agora.io: #footer-widgets h3 { | |
agora.io: #footer-widgets a { | |
agora.io: #footer-widgets ul { | |
agora.io: #column-one a { | |
agora.io: #column-one { | |
agora.io: #footer-widgets .elementor-image { | |
agora.io: #footer-widgets .elementor-image img { | |
agora.io: #social-icon-container { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary { | |
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu a.mega-menu-link .mega-description-group .mega-menu-description { | |
agora.io: #mega-menu-item-22860.mobile-btn a.mega-menu-link { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-megamenu > ul.mega-sub-menu > li.mega-menu-item, #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-megamenu > ul.mega-sub-menu li.mega-menu-column > ul.mega-sub-menu > li.mega-menu-item { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary[data-effect="fade_up"] li.mega-menu-item.mega-menu-megamenu > ul.mega-sub-menu, #mega-menu-wrap-primary #mega-menu-primary[data-effect="fade_up"] li.mega-menu-item.mega-menu-flyout ul.mega-sub-menu { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li#mega-menu-item-5732 ul#menu-customer-stories-mega-menu-1 li { | |
agora.io: #mega-menu-primary > li.mega-menu-item > a.mega-menu-link, | |
agora.io: #mega-menu-primary > li.nav-btn-signup > a.mega-menu-link:hover { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item.nav-btn-signup > a.mega-menu-link { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item.nav-btn-signup > a.mega-menu-link:hover { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item > a.mega-menu-link { | |
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item > a.mega-menu-link:hover { | |
agora.io: #team-thumb img { | |
agora.io: #team-thumb img.active { | |
agora.io: #heading-bluebox:after { | |
agora.io: #heading-graybox:before { | |
agora.io: #heading-graybox:after { | |
agora.io: #game-box-shadow { | |
agora.io: #client-slider.tabs > div { | |
agora.io: #client-slider.tabs > div span { | |
agora.io: #client-slider.tabs ul.horizontal { | |
agora.io: #client-slider.tabs li { | |
agora.io: #client-slider.tabs li img { | |
agora.io: #client-slider.tabs a { | |
agora.io: #client-slider.tabs li:hover, | |
agora.io: #client-slider.tabs li.active { | |
agora.io: #client-slider.tabs .prev, | |
agora.io: #client-slider.tabs .next { | |
agora.io: #client-slider.tabs .next { | |
agora.io: #client-slider.tabs .prev:focus, | |
agora.io: #client-slider.tabs .next:focus { | |
agora.io: #benefit-center-image { | |
agora.io: #benefit-featured-one:after, | |
agora.io: #benefit-featured-two:after, | |
agora.io: #benefit-featured-three:after,
agora.io: #benefit-featured-four:after {
agora.io: #benefit-featured-one:after {
agora.io: #benefit-featured-two:after {
agora.io: #benefit-featured-three:after {
agora.io: #benefit-featured-four:after {
agora.io: #original-audio,
agora.io: #agora-audio {
agora.io: #swiper-slide01 .button-primary {
agora.io: #scrollTop.show {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item.mega-toggle-on > a + ul.mega-sub-menu {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a:hover {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-item.mega-menu-megamenu ul.mega-sub-menu ul.mega-sub-menu {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item.mega-toggle-on > a.mega-menu-link {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item.mega-toggle-on > a.mega-menu-link span:after {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-item-has-children > a.mega-menu-link > span.mega-indicator{
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item.nav-btn-sales a {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item.nav-btn-signup a {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a.cst-html {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a.cst-html i.menu-featured-icon {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a.cst-html .menu-featured-content {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a.cst-html h3 {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a.cst-html p {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a {
archives-ouvertes.fr: # HAL robots.txt
archives-ouvertes.fr: # If you want to download lots of metadata, please use our API at https://api.archives-ouvertes.fr/
archives-ouvertes.fr: # The API is far more efficient for metadata harvesting
archives-ouvertes.fr: # To learn more, please contact hal-support@ccsd.cnrs.fr
archives-ouvertes.fr: # Sitemap
spectrum.com: # Allowed Paths
spectrum.com: # Excluded Pages
spectrum.com: # Excluded Tags
spectrum.com: # Excluded Paths
turbosquid.com: #
turbosquid.com: # robots.txt
turbosquid.com: #
turbosquid.com: # Excludes
turbosquid.com: # XML Sitemap
trafficfactory.biz: #raven-field-group-e22bac2, #raven-field-group-d8b42b7{
trafficfactory.biz: #raven-field-group-e22bac2, #raven-field-group-d8b42b7{
patagonia.com: # patagonia.com robots.txt
patagonia.com: #
patagonia.com: #
patagonia.com: #
patagonia.com: #
patagonia.com: # __ _______ .___________. ______ __ __ .______ .______ ______ .______ ______ .___________. _______.
patagonia.com: # | | | ____|| | / __ \ | | | | | _ \ | _ \ / __ \ | _ \ / __ \ | | / |
patagonia.com: # | | | |__ `---| |----` | | | | | | | | | |_) | | |_) | | | | | | |_) | | | | | `---| |----` | (----`
patagonia.com: # | | | __| | | | | | | | | | | | / | / | | | | | _ < | | | | | | \ \
patagonia.com: # | `----.| |____ | | | `--' | | `--' | | |\ \----. | |\ \----.| `--' | | |_) | | `--' | | | .----) |
patagonia.com: # |_______||_______| |__| \______/ \______/ | _| `._____| | _| `._____| \______/ |______/ \______/ |__| |_______/
patagonia.com: # _______ ______ _______. __ __ .______ _______ __ .__ __. _______
patagonia.com: # / _____| / __ \ / || | | | | _ \ | ____|| | | \ | | / _____|
patagonia.com: # | | __ | | | | | (----`| | | | | |_) | | |__ | | | \| | | | __
patagonia.com: # | | |_ | | | | | \ \ | | | | | / | __| | | | . ` | | | |_ |
patagonia.com: # | |__| | | `--' | .----) | | `--' | | |\ \----.| | | | | |\ | | |__| |
patagonia.com: # \______| \______/ |_______/ \______/ | _| `._____||__| |__| |__| \__| \______|
patagonia.com: #
patagonia.com: #
patagonia.com: #
patagonia.com: #
patagonia.com: #
slideteam.net: ## robots.txt for Magento Community and Enterprise
slideteam.net: ## GENERAL SETTINGS
slideteam.net: ## Enable robots.txt rules for selected crawlers
slideteam.net: ## Crawl-delay parameter: number of seconds to wait between successive requests to the same server.
slideteam.net: ## Set a custom crawl rate if you're experiencing traffic problems with your server.
slideteam.net: # Crawl-delay: 30
slideteam.net: ## Magento sitemap: uncomment and replace the URL to your Magento sitemap file
slideteam.net: # Sitemap: http://www.example.com/sitemap/sitemap.xml
slideteam.net: ## DEVELOPMENT RELATED SETTINGS
slideteam.net: ## Do not crawl development files and folders: CVS, svn directories and dump files
slideteam.net: ## GENERAL MAGENTO SETTINGS
slideteam.net: ## Do not crawl Magento admin page
slideteam.net: ## Do not crawl common Magento technical folders
slideteam.net: ## Do not crawl common Magento files
slideteam.net: ## MAGENTO2 disallowed URLs Begins
slideteam.net: ## MAGENTO2 disallowed URLs Ends
slideteam.net: ##SLI Configuration Begins
slideteam.net: ##SLI Configuration Ends
slideteam.net: ## MAGENTO SEO IMPROVEMENTS
slideteam.net: ## Do not crawl sub category pages that are sorted or filtered.
slideteam.net: #Disallow: /*?dir*
slideteam.net: #Disallow: /*?dir=desc
slideteam.net: #Disallow: /*?dir=asc
slideteam.net: #Disallow: /*?limit=all
slideteam.net: #Disallow: /*?mode*
slideteam.net: ## Do not crawl 2-nd home page copy (example.com/index.php/). Uncomment it only if you activated Magento SEO URLs.
slideteam.net: ## Disallow: /index.php/
slideteam.net: ## Do not crawl links with session IDs
slideteam.net: ## Do not crawl checkout and user account pages
slideteam.net: #Disallow: /onestepcheckout/
slideteam.net: #Disallow: /customer/
slideteam.net: #Disallow: /customer/account/
slideteam.net: #Disallow: /customer/account/login/
slideteam.net: ## Do not crawl seach pages and not-SEO optimized catalog links
slideteam.net: ## SERVER SETTINGS
slideteam.net: ## Do not crawl common server technical folders and files
slideteam.net: ## IMAGE CRAWLERS SETTINGS
slideteam.net: ## Extra: Uncomment if you do not wish Google and Bing to index your images
slideteam.net: # User-agent: Googlebot-Image
slideteam.net: # Disallow: /
slideteam.net: # User-agent: msnbot-media
slideteam.net: # Disallow: /
slideteam.net: ##FOR OTHER CRAWLERS DISALLOW ALL
digicenter.pt: # sitemap generated by the Jumpseller ecommerce platform
colorado.gov: # robots.txt for http://www.colorado.gov/
nbc.com: #
nbc.com: # robots.txt
nbc.com: #
nbc.com: # This file is to prevent the crawling and indexing of certain parts
nbc.com: # of your site by web crawlers and spiders run by sites like Yahoo!
nbc.com: # and Google. By telling these "robots" where not to go on your site,
nbc.com: # you save bandwidth and server resources.
nbc.com: #
nbc.com: # This file will be ignored unless it is at the root of your host:
nbc.com: # Used: http://example.com/robots.txt
nbc.com: # Ignored: http://example.com/site/robots.txt
nbc.com: #
nbc.com: # For more information about the robots.txt standard, see:
nbc.com: # http://www.robotstxt.org/robotstxt.html
nbc.com: # Directories
nbc.com: # Files
nbc.com: # Paths (clean URLs)
nbc.com: # Paths (no clean URLs)
nbc.com: # Disallow users paths
nbc.com: # USA - Shows
nbc.com: # USA - Movies
nbc.com: # Sitemap details.
nbc.com: # Sitemap for the Google PlayGuide.
anz.com.au: # /robots.txt for http://www.anz.com/
anz.com.au: # comments to InternetAdministration@anz.com
anz.com.au: #
collegedekho.com: # robots.txt for https://www.collegedekho.com/
collegedekho.com: #Study Abroad Url Start
collegedekho.com: #Study Abroad Url End
collegedekho.com: #User-agent: Screaming Frog SEO Spider
collegedekho.com: #Disallow: /
dealmoon.com: # disable 2019.02.27
dealmoon.com: # Disallow: /gift/
dealmoon.com: #sitemap
bitbucket.org: # Google Site Link Exclusions
37signals.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
hhs.gov: #
hhs.gov: # robots.txt
hhs.gov: #
hhs.gov: # This file is to prevent the crawling and indexing of certain parts
hhs.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
hhs.gov: # and Google. By telling these "robots" where not to go on your site,
hhs.gov: # you save bandwidth and server resources.
hhs.gov: #
hhs.gov: # This file will be ignored unless it is at the root of your host:
hhs.gov: # Used: http://example.com/robots.txt
hhs.gov: # Ignored: http://example.com/site/robots.txt
hhs.gov: #
hhs.gov: # For more information about the robots.txt standard, see:
hhs.gov: # http://www.robotstxt.org/robotstxt.html
hhs.gov: # CSS, JS, Images
hhs.gov: # Directories
hhs.gov: # Files
hhs.gov: # Paths (clean URLs)
hhs.gov: # Paths (no clean URLs)
helsinki.fi: #
helsinki.fi: # robots.txt
helsinki.fi: #
helsinki.fi: # This file is to prevent the crawling and indexing of certain parts
helsinki.fi: # of your site by web crawlers and spiders run by sites like Yahoo!
helsinki.fi: # and Google. By telling these "robots" where not to go on your site,
helsinki.fi: # you save bandwidth and server resources.
helsinki.fi: #
helsinki.fi: # This file will be ignored unless it is at the root of your host:
helsinki.fi: # Used: http://example.com/robots.txt
helsinki.fi: # Ignored: http://example.com/site/robots.txt
helsinki.fi: #
helsinki.fi: # For more information about the robots.txt standard, see:
helsinki.fi: # http://www.robotstxt.org/robotstxt.html
helsinki.fi: #
helsinki.fi: # For syntax checking, see:
helsinki.fi: # http://www.frobee.com/robots-txt-check
helsinki.fi: # Directories
helsinki.fi: # Files
helsinki.fi: # Paths (clean URLs)
helsinki.fi: # Paths (no clean URLs)
helsinki.fi: #vanha www.helsinki.fi
defimedia.info: #
defimedia.info: # robots.txt
defimedia.info: #
defimedia.info: # This file is to prevent the crawling and indexing of certain parts
defimedia.info: # of your site by web crawlers and spiders run by sites like Yahoo!
defimedia.info: # and Google. By telling these "robots" where not to go on your site,
defimedia.info: # you save bandwidth and server resources.
defimedia.info: #
defimedia.info: # This file will be ignored unless it is at the root of your host:
defimedia.info: # Used: http://example.com/robots.txt
defimedia.info: # Ignored: http://example.com/site/robots.txt
defimedia.info: #
defimedia.info: # For more information about the robots.txt standard, see:
defimedia.info: # http://www.robotstxt.org/robotstxt.html
defimedia.info: # CSS, JS, Images
defimedia.info: # Directories
defimedia.info: # Files
defimedia.info: # Paths (clean URLs)
defimedia.info: # Paths (no clean URLs)
marktplaats.nl: # Here is our sitemap (this line is independent of UA blocks, per the spec)
marktplaats.nl: #Please keep blocking of all URLs in place for at least 2 years after removing a specific module
marktplaats.nl: #internal URLs
marktplaats.nl: #SOI subpage
marktplaats.nl: # login, confirm and forgot password pages
marktplaats.nl: # mymp pages
marktplaats.nl: # ASQ pages
marktplaats.nl: # SYI Pages
marktplaats.nl: # Flagging/tipping ads
marktplaats.nl: # bidding on ads
marktplaats.nl: # external url redirects
marktplaats.nl: # google analytics
marktplaats.nl: #korean spam
marktplaats.nl: # widgets
marktplaats.nl: # prevent unnecessary crawling
marktplaats.nl: # New vip
marktplaats.nl: # Block VIPs with parameters
marktplaats.nl: #block homepage feeds
bit.ly: # Welcome to Bitly =)
bit.ly: # robots welcome;
bit.ly: # API documentation can be found at https://dev.bitly.com/
timeout.com: # robots.txt file for Time Out .com
timeout.com: # updated 14th May 2020
db.com: # Favicon /docroot/favicon.ico
vidnami.com: # Google Image
vidnami.com: # Google AdSense
vidnami.com: # digg mirror
vidnami.com: # global
asianetnews.com: # Robots Text for AN production portals
zaubacorp.com: #
zaubacorp.com: # robots.txt
zaubacorp.com: #
zaubacorp.com: # This file is to prevent the crawling and indexing of certain parts
zaubacorp.com: # of your site by web crawlers and spiders run by sites like Yahoo!
zaubacorp.com: # and Google. By telling these "robots" where not to go on your site,
zaubacorp.com: # you save bandwidth and server resources.
zaubacorp.com: #
zaubacorp.com: # This file will be ignored unless it is at the root of your host:
zaubacorp.com: # Used: http://example.com/robots.txt
zaubacorp.com: # Ignored: http://example.com/site/robots.txt
zaubacorp.com: #
zaubacorp.com: # For more information about the robots.txt standard, see:
zaubacorp.com: # http://www.robotstxt.org/robotstxt.html
zaubacorp.com: # CSS, JS, Images
zaubacorp.com: # Directories
zaubacorp.com: # Files
zaubacorp.com: # Paths (clean URLs)
zaubacorp.com: # Paths (no clean URLs)
abril.com.br: # Sitemap archive
ithome.com.tw: #div_cube3d_pic { width: 100% !important}
ithome.com.tw: #div_cube3d_exp {left:0 !important; position: relative !important; margin: 0 auto}
ithome.com.tw: #headerbar .container-fluid {max-width: 1350px !important}
ithome.com.tw: #block-block-41 {background-color:#f7f7f7}
ithome.com.tw: #block-block-41 .wrap {text-align: center; margin: 0 auto 14px;}
ithome.com.tw: #hpad930up {margin-bottom:15px !important}
rfi.fr: # France Medias Monde [2020-10-21] - francemediasmonde.com
rfi.fr: ## RFI - rfi.fr - HTTPS
rfi.fr: ### Sitemaps
rfi.fr: ### Sitemaps News
therealreal.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file.
therealreal.com: #
zameen.com: # Disallow crawling relative links for the most populous cities in Pakistan
zameen.com: # Disallow crawling relative links for other locations
zameen.com: # Spiders added on date 2014-06-17
zameen.com: # Baidu
zameen.com: # http://www.baidu.com/search/spider.html
zameen.com: # EasouSpider
zameen.com: # http://www.easou.com/search/spider.html
zameen.com: # Exabot
zameen.com: # http://www.exabot.com/go/robot
banimode.com: # robots.txt automaticaly generated by PrestaShop e-commerce open-source solution
banimode.com: # http://www.prestashop.com - http://www.prestashop.com/forums
banimode.com: # This file is to prevent the crawling and indexing of certain parts
banimode.com: # of your site by web crawlers and spiders run by sites like Yahoo!
banimode.com: # and Google. By telling these "robots" where not to go on your site,
banimode.com: # you save bandwidth and server resources.
banimode.com: # For more information about the robots.txt standard, see:
banimode.com: # http://www.robotstxt.org/robotstxt.html
banimode.com: # Allow Directives
banimode.com: # Private pages
banimode.com: # Directories
banimode.com: # Files
banimode.com: # Persian
banimode.com: # Disallow: /*discount
banimode.com: # Disallow query string
banimode.com: #Disallow: /*?*
banimode.com: # Disallow query string
banimode.com: #Disallow: /*?sort*
banimode.com: #SiteMap
job5156.com: # robots.txt for https://www.job5156.com/
motherless.com: # All other robots will spider the domain
motherless.com: # Don't let spiders report stuff
mediamarkt.es: # Disallow: /MultiChannelMA
mediamarkt.es: # Disallow: /MultiChannelLocal
mediamarkt.es: # Disallow: /MultiChannelSearch
mediamarkt.es: # Disallow: /MultiChannelRedirect
mediamarkt.es: # Disallow: /MultiChannelMARegister
mediamarkt.es: # Disallow: /MultiChannelMyStoreEvents
mediamarkt.es: # Disallow: /MultiChannelAutoCompletion
mediamarkt.es: # Disallow: /MultiChannelCatalogEntryPrint
mediamarkt.es: # Disallow: /MultiChannelPrintCompProducts
mediamarkt.es: # Disallow: /MultiChannelMyStoreAdvertising
mediamarkt.es: # Disallow: /MultiChannelMyStoreSpecialitems
mediamarkt.es: # Disallow: /MultiMultiChannelMAWishlistPrint
mediamarkt.es: # Disallow: /mediapedia
mediamarkt.es: # Disallow: /error404.html
mediamarkt.es: # Disallow: /error500.html
mediamarkt.es: # Disallow: /*storeId=*
mediamarkt.es: # Allow: /*storeId=19601*
mensjournal.com: # Sitemap archive
justanswer.com: # Directories
justanswer.com: # Files
justanswer.com: # Paths (clean URLs)
justanswer.com: # Paths (no clean URLs)
justanswer.com: # Secure and Error pages
justanswer.com: # 404s
justanswer.com: # Sitemaps
justanswer.com: # Exceptions
sendpulse.com: # If the Joomla site is installed within a folder such as at
sendpulse.com: # e.g. www.example.com/joomla/ the robots.txt file MUST be
sendpulse.com: # moved to the site root at e.g. www.example.com/robots.txt
sendpulse.com: # AND the joomla folder name MUST be prefixed to the disallowed
sendpulse.com: # path, e.g. the Disallow rule for the /administrator/ folder
sendpulse.com: # MUST be changed to read Disallow: /joomla/administrator/
sendpulse.com: #
sendpulse.com: # For more information about the robots.txt standard, see:
sendpulse.com: # http://www.robotstxt.org/orig.html
sendpulse.com: #
sendpulse.com: # For syntax checking, see:
sendpulse.com: # http://tool.motoricerca.info/robots-checker.phtml
scribbr.com: # Google Image
scribbr.com: # Google AdSense
scribbr.com: # Internet Archiver Wayback Machine
scribbr.com: # digg mirror
tesla.com: #
tesla.com: # robots.txt
tesla.com: #
tesla.com: # This file is to prevent the crawling and indexing of certain parts
tesla.com: # of your site by web crawlers and spiders run by sites like Yahoo!
tesla.com: # and Google. By telling these "robots" where not to go on your site,
tesla.com: # you save bandwidth and server resources.
tesla.com: #
tesla.com: # This file will be ignored unless it is at the root of your host:
tesla.com: # Used: http://example.com/robots.txt
tesla.com: # Ignored: http://example.com/site/robots.txt
tesla.com: #
tesla.com: # For more information about the robots.txt standard, see:
tesla.com: # http://www.robotstxt.org/robotstxt.html
tesla.com: # CSS, JS, Images
tesla.com: # Directories
tesla.com: # Files
tesla.com: # Paths (clean URLs)
tesla.com: # Paths (no clean URLs)
tesla.com: ##############################
tesla.com: # START TESLA CONTENT.
tesla.com: ##############################
tesla.com: # Tesla content landing pages.
tesla.com: ##############################
tesla.com: # STOP TESLA CONTENT.
tesla.com: ##############################
addic7ed.com: # Robots file for Addic7ed.com.
mercadolibre.com.uy: #siteId: MLU
mercadolibre.com.uy: #country: uruguay
mercadolibre.com.uy: ##Block - Referidos
mercadolibre.com.uy: ##Block - siteinfo urls
mercadolibre.com.uy: ##Block - Cart
mercadolibre.com.uy: ##Block Checkout
mercadolibre.com.uy: ##Block - User Logged
mercadolibre.com.uy: #Shipping selector
mercadolibre.com.uy: ##Block - last search
mercadolibre.com.uy: ## Block - Profile - By Id
mercadolibre.com.uy: ## Block - Profile - By Id and role (old version)
mercadolibre.com.uy: ## Block - Profile - Leg. Req.
mercadolibre.com.uy: ##Block - noindex
mercadolibre.com.uy: # Mercado-Puntos
mercadolibre.com.uy: # Viejo mundo
mercadolibre.com.uy: ##Block recommendations listing
delfi.ee: # robots.txt for http://www.delfi.ee/
delfi.ee: #
delfi.ee: # http://www.robotstxt.org/wc/norobots-rfc.txt
delfi.ee: # $Revision: 1.19 $ $Date: 2015/05/20 14:31:07 $
nest.com: # Realm nest.com
google.com.py: # AdsBot
google.com.py: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
wikibooks.org: #
wikibooks.org: # Please note: There are a lot of pages on this site, and there are
wikibooks.org: # some misbehaved spiders out there that go _way_ too fast. If you're
wikibooks.org: # irresponsible, your access to the site may be blocked.
wikibooks.org: #
wikibooks.org: # Observed spamming large amounts of https://en.wikipedia.org/?curid=NNNNNN
wikibooks.org: # and ignoring 429 ratelimit responses, claims to respect robots:
wikibooks.org: # http://mj12bot.com/
wikibooks.org: # advertising-related bots:
wikibooks.org: # Wikipedia work bots:
wikibooks.org: # Crawlers that are kind enough to obey, but which we'd rather not have
wikibooks.org: # unless they're feeding search engines.
wikibooks.org: # Some bots are known to be trouble, particularly those designed to copy
wikibooks.org: # entire sites. Please obey robots.txt.
wikibooks.org: # Misbehaving: requests much too fast:
wikibooks.org: #
wikibooks.org: # Sorry, wget in its recursive mode is a frequent problem.
wikibooks.org: # Please read the man page and use it properly; there is a
wikibooks.org: # --wait option you can use to set the delay between hits,
wikibooks.org: # for instance.
wikibooks.org: #
wikibooks.org: #
wikibooks.org: # The 'grub' distributed client has been *very* poorly behaved.
wikibooks.org: #
wikibooks.org: #
wikibooks.org: # Doesn't follow robots.txt anyway, but...
wikibooks.org: #
wikibooks.org: #
wikibooks.org: # Hits many times per second, not acceptable
wikibooks.org: # http://www.nameprotect.com/botinfo.html
wikibooks.org: # A capture bot, downloads gazillions of pages with no public benefit
wikibooks.org: # http://www.webreaper.net/
wikibooks.org: #
wikibooks.org: # Friendly, low-speed bots are welcome viewing article pages, but not
wikibooks.org: # dynamically-generated pages please.
wikibooks.org: #
wikibooks.org: # Inktomi's "Slurp" can read a minimum delay between hits; if your
wikibooks.org: # bot supports such a thing using the 'Crawl-delay' or another
wikibooks.org: # instruction, please let us know.
wikibooks.org: #
wikibooks.org: # There is a special exception for API mobileview to allow dynamic
wikibooks.org: # mobile web & app views to load section content.
wikibooks.org: # These views aren't HTTP-cached but use parser cache aggressively
wikibooks.org: # and don't expose special: pages etc.
wikibooks.org: #
wikibooks.org: # Another exception is for REST API documentation, located at
wikibooks.org: # /api/rest_v1/?doc.
wikibooks.org: #
wikibooks.org: #
wikibooks.org: # ar:
wikibooks.org: #
wikibooks.org: # dewiki:
wikibooks.org: # T6937
wikibooks.org: # sensible deletion and meta user discussion pages:
wikibooks.org: # 4937#5
wikibooks.org: # T14111
wikibooks.org: # T15961
wikibooks.org: #
wikibooks.org: # enwiki:
wikibooks.org: # Folks get annoyed when VfD discussions end up the number 1 google hit for
wikibooks.org: # their name. See T6776
wikibooks.org: # T15398
wikibooks.org: # T16075
wikibooks.org: # T13261
wikibooks.org: # T12288
wikibooks.org: # T16793
wikibooks.org: #
wikibooks.org: # eswiki:
wikibooks.org: # T8746
wikibooks.org: #
wikibooks.org: # fiwiki:
wikibooks.org: # T10695
wikibooks.org: #
wikibooks.org: # hewiki:
wikibooks.org: #T11517
wikibooks.org: #
wikibooks.org: # huwiki:
wikibooks.org: #
wikibooks.org: # itwiki:
wikibooks.org: # T7545
wikibooks.org: #
wikibooks.org: # jawiki
wikibooks.org: # T7239
wikibooks.org: # nowiki
wikibooks.org: # T13432
wikibooks.org: #
wikibooks.org: # plwiki
wikibooks.org: # T10067
wikibooks.org: #
wikibooks.org: # ptwiki:
wikibooks.org: # T7394
wikibooks.org: #
wikibooks.org: # rowiki:
wikibooks.org: # T14546
wikibooks.org: #
wikibooks.org: # ruwiki:
wikibooks.org: #
wikibooks.org: # svwiki:
wikibooks.org: # T12229
wikibooks.org: # T13291
wikibooks.org: #
wikibooks.org: # zhwiki:
wikibooks.org: # T7104
wikibooks.org: #
wikibooks.org: # sister projects
wikibooks.org: #
wikibooks.org: # enwikinews:
wikibooks.org: # T7340
wikibooks.org: #
wikibooks.org: # itwikinews
wikibooks.org: # T11138
wikibooks.org: #
wikibooks.org: # enwikiquote:
wikibooks.org: # T17095
wikibooks.org: #
wikibooks.org: # enwikibooks
wikibooks.org: #
wikibooks.org: # working...
wikibooks.org: #
wikibooks.org: #
wikibooks.org: #
wikibooks.org: #----------------------------------------------------------#
wikibooks.org: #
wikibooks.org: #
wikibooks.org: #
wikibooks.org: # robots.txt for http://en.wikibooks.org/
wikibooks.org: # Edit at http://en.wikibooks.org/w/index.php?title=MediaWiki:Robots.txt&action=edit
wikibooks.org: # Don't add newlines here. All rules set here are active for every user-agent.
wikibooks.org: #
wikibooks.org: # Please check any changes using a syntax validator such as http://tool.motoricerca.info/robots-checker.phtml
wikibooks.org: # Enter http://en.wikibooks.org/robots.txt as the URL to check.
wikibooks.org: #
wikibooks.org: # Don't index anything in the MediaWiki namespace
wikibooks.org: #
wikibooks.org: # Don't index anything in the Transwiki namespace
wikibooks.org: #
wikibooks.org: # Don't index discussions
wikibooks.org: #
wikibooks.org: #
wikibooks.org: # </source><!--leave this line alone-->
sportsmansoutdoorsuperstore.com: # Begin robots.txt file
sportsmansoutdoorsuperstore.com: # End robots.txt file
fritz.box: ## Disallows all robots
coolchasgamer.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
coolchasgamer.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details.
coolchasgamer.wordpress.com: # This file was generated on Wed, 09 Dec 2020 18:39:59 +0000
invertironline.com: # Alexa
invertironline.com: # Ask
invertironline.com: # Google
invertironline.com: # MSN
invertironline.com: # Yahoo!
invertironline.com: # Others
auto.ru: #part2
auto.ru: #part3
auto.ru: #after
auto.ru: #parts
auto.ru: #part2
auto.ru: #part3
auto.ru: #after
auto.ru: #parts
auto.ru: #part2
auto.ru: #part3
auto.ru: #after
auto.ru: #parts
customs.gov.az: # robots.txt
customs.gov.az: # DGK 1.0.0
pingidentity.com: # Updated 12.11.20 (www.pingidentity.com)
pingidentity.com: # For all bots
whoishostingthis.com: # This rule means it applies to all user-agents
whoishostingthis.com: # wordpress blog
whoishostingthis.com: # The Googlebot is the main search bot for google
whoishostingthis.com: # feed urls
whoishostingthis.com: # Disallow all files ending with these extensions
whoishostingthis.com: # Disallow Google from parsing indididual post feeds and trackbacks..
whoishostingthis.com: # Disallow all files with ? in url
whoishostingthis.com: #block access to internal search result pages
yandex.ua: # yandex.ua
bangbros.com: # www.robotstxt.org/
bangbros.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
banorte.com: #pulseChatButton{
banorte.com: #start-chat{
banorte.com: #meta-title .homepage_circles_container div{
yoomark.com: #
yoomark.com: # robots.txt
yoomark.com: #
yoomark.com: # This file is to prevent the crawling and indexing of certain parts
yoomark.com: # of your site by web crawlers and spiders run by sites like Yahoo!
yoomark.com: # and Google. By telling these "robots" where not to go on your site,
yoomark.com: # you save bandwidth and server resources.
yoomark.com: #
yoomark.com: # This file will be ignored unless it is at the root of your host:
yoomark.com: # Used: http://example.com/robots.txt
yoomark.com: # Ignored: http://example.com/site/robots.txt
yoomark.com: #
yoomark.com: # For more information about the robots.txt standard, see:
yoomark.com: # http://www.robotstxt.org/robotstxt.html
yoomark.com: # CSS, JS, Images
yoomark.com: # Directories
yoomark.com: # Files
yoomark.com: # Paths (clean URLs)
yoomark.com: # Paths (no clean URLs)
nbps.org: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
nbps.org: #content{margin:0 0 0 2%;position:relative;}
musescore.com: #
musescore.com: # robots.txt
musescore.com: #
musescore.com: # This file is to prevent the crawling and indexing of certain parts
musescore.com: # of your site by web crawlers and spiders run by sites like Yahoo!
musescore.com: # and Google. By telling these "robots" where not to go on your site,
musescore.com: # you save bandwidth and server resources.
musescore.com: #
musescore.com: # This file will be ignored unless it is at the root of your host:
musescore.com: # Used: http://example.com/robots.txt
musescore.com: # Ignored: http://example.com/site/robots.txt
musescore.com: #
musescore.com: # For more information about the robots.txt standard, see:
musescore.com: # http://www.robotstxt.org/robotstxt.html
musescore.com: # CSS, JS, Images
musescore.com: # Directories
musescore.com: # Files
musescore.com: # Paths (clean URLs)
musescore.com: # Paths (no clean URLs)
musescore.com: # Musescore.com
dailystar.co.uk: #Agent Specific Disallowed Sections
kensaq.com: ## Default robots.txt
mentimeter.com: # If you are not a robot, please stop reading
mentimeter.com: # If you are human, please go to //humans.txt
imprint5.com: #****************************************************************************
imprint5.com: # robots.txt
imprint5.com: # : Robots, spiders, and search engines use this file to detmine which
imprint5.com: # content they should *not* crawl while indexing your website.
imprint5.com: # : This system is called "The Robots Exclusion Standard."
imprint5.com: # : It is strongly encouraged to use a robots.txt validator to check
imprint5.com: # for valid syntax before any robots read it!
imprint5.com: #
imprint5.com: # Examples:
imprint5.com: #
imprint5.com: # Instruct all robots to stay out of the admin area.
imprint5.com: # : User-agent: *
imprint5.com: # : Disallow: /admin/
imprint5.com: #
imprint5.com: # Restrict Google and MSN from indexing your images.
imprint5.com: # : User-agent: Googlebot
imprint5.com: # : Disallow: /images/
imprint5.com: # : User-agent: MSNBot
imprint5.com: # : Disallow: /images/
imprint5.com: #****************************************************************************
imprint5.com: # Website Sitemap
imprint5.com: # Crawlers Setup
imprint5.com: # Allowable Index
imprint5.com: # Mind that Allow is not an official standard
imprint5.com: # Directories
imprint5.com: #Disallow: /js/
imprint5.com: #Disallow: /lib/
imprint5.com: # Disallow: /media/
imprint5.com: #Disallow: /media/catalog/
imprint5.com: #Disallow: /media/css/
imprint5.com: #Disallow: /media/css_secure/
imprint5.com: #Disallow: /media/js/
imprint5.com: #Disallow: /media/wysiwyg/
imprint5.com: #Disallow: /media/po_compressor/
imprint5.com: #Disallow: /skin/
imprint5.com: # Paths (clean URLs)
imprint5.com: #Disallow: /checkout/
imprint5.com: # Files
imprint5.com: # Paths (no clean URLs)
imprint5.com: #Disallow: /*.js$
imprint5.com: #Disallow: /*.css$
imprint5.com: #Disallow: *?price=*
imprint5.com: #Disallow: *capacity=*
imprint5.com: #Disallow: *?material=*
imprint5.com: #Disallow: *?decoration=*
imprint5.com: # Pre-existing robots rule
imprint5.com: # # SETTINGS Image indexing
imprint5.com: # # Optional: If you do not want to Google and Bing to index your images
imprint5.com: # User-agent: Googlebot-Image
imprint5.com: # Disallow: /
imprint5.com: # User-agent: msnbot-media
imprint5.com: # Disallow: /
alexa.com: # The crawlers listed below are allowed on the Alexa site.
alexa.com: # Alexa allows other crawlers on a case by case basis.
alexa.com: #
alexa.com: # Alexa provides access to traffic ranking data via Amazon Web Services.
alexa.com: # More information here: <URL: http://aws.amazon.com/awis>
alexa.com: # Disallow all other crawlers
ifixit.com: # iFixit robots.txt
underarmour.com: # Block Multiple Refinement Buckets
underarmour.com: # Block Sort Parameters
underarmour.com: # Block Price Parameters
underarmour.com: # Block Search Refinement Parameters
underarmour.com: # Block Site Search Parameters
underarmour.com: # Block URLS containing pipes
underarmour.com: # Block Pipelines
underarmour.com: # Block Misc Utility Pipelines
underarmour.com: # Sitemaps
underarmour.com: # International Sitemaps
prepscholar.com: # robotstxt.org/
qatarliving.com: #
qatarliving.com: # robots.txt
qatarliving.com: #
qatarliving.com: # This file is to prevent the crawling and indexing of certain parts
qatarliving.com: # of your site by web crawlers and spiders run by sites like Yahoo!
qatarliving.com: # and Google. By telling these "robots" where not to go on your site,
qatarliving.com: # you save bandwidth and server resources.
qatarliving.com: #
qatarliving.com: # This file will be ignored unless it is at the root of your host:
qatarliving.com: # Used: http://example.com/robots.txt
qatarliving.com: # Ignored: http://example.com/site/robots.txt
qatarliving.com: #
qatarliving.com: # For more information about the robots.txt standard, see:
qatarliving.com: # http://www.robotstxt.org/robotstxt.html
qatarliving.com: # CSS, JS, Images
qatarliving.com: # Directories
qatarliving.com: # Files
qatarliving.com: # Paths (clean URLs)
qatarliving.com: #Disallow: /admin/
qatarliving.com: #Disallow: /comment/reply/
qatarliving.com: #Disallow: /filter/tips/
qatarliving.com: #Disallow: /node/add/
qatarliving.com: #Disallow: /search/
qatarliving.com: #Disallow: /user/register/
qatarliving.com: #Disallow: /user/password/
qatarliving.com: #Disallow: /user/login/
qatarliving.com: #Disallow: /user/logout/
qatarliving.com: # Paths (no clean URLs)
qatarliving.com: #Disallow: /?q=admin/
qatarliving.com: #Disallow: /?q=comment/reply/
qatarliving.com: #Disallow: /?q=filter/tips/ | |
qatarliving.com: #Disallow: /?q=node/add/ | |
qatarliving.com: #Disallow: /?q=search/ | |
qatarliving.com: #Disallow: /messages/new/ | |
qatarliving.com: #Disallow: /?destination=user/ | |
qatarliving.com: #Disallow: /?q=user/password/ | |
qatarliving.com: #Disallow: /?q=user/register/ | |
qatarliving.com: #Disallow: /?q=user/login/ | |
qatarliving.com: #Disallow: /?q=user/logout/ | |
qatarliving.com: # Disallow URLs with destination parameter | |
qatarliving.com: #Disallow: /user/login?destination=* | |
qatarliving.com: #Disallow: /user/register?destination=* | |
qatarliving.com: #Disallow: /user?destination=* | |
qatarliving.com: # Disallow individual user content | |
qatarliving.com: #Disallow: /user/*/groups | |
qatarliving.com: #Disallow: /user/*/posts | |
qatarliving.com: #Disallow: /user/*/pages | |
qatarliving.com: #Disallow: /user/*/comments | |
qatarliving.com: #Disallow: /user/*/classifieds | |
qatarliving.com: #Disallow: /user/*/jobs | |
qatarliving.com: #Disallow: /user/*/jobs/* | |
qatarliving.com: #Disallow: /user/*/wishlist | |
qatarliving.com: #Disallow: /events?type=* | |
qatarliving.com: #Disallow: /community-group/* | |
qatarliving.com: #Disallow: /email/node/*/field_email | |
qatarliving.com: #Disallow: /email/node/* | |
qatarliving.com: #Disallow: /email/*/*/* | |
qatarliving.com: #Disallow: /forum/*?page=* | |
qatarliving.com: #Disallow: /api/* | |
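The Drupal boilerplate quoted in the qatarliving.com lines above says the file "will be ignored unless it is at the root of your host". The following is a minimal sketch, not any crawler's actual code, of how a client could derive the one location it checks from an arbitrary page URL. The helper name `robots_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the only robots.txt location a crawler consults for page_url's host."""
    parts = urlsplit(page_url)
    # Keep the scheme and netloc (host plus any port); the path is always
    # /robots.txt, so a file such as /site/robots.txt is never looked up.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://example.com/site/page?id=7"))
```

Because the path, query and fragment are all discarded, every page on `example.com` maps to the same `http://example.com/robots.txt`. That is why the "Ignored" example in the boilerplate is never read.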
fb.com: # Notice: Collection of data on Facebook through automated means is | |
fb.com: # prohibited unless you have express written permission from Facebook | |
fb.com: # and may only be conducted for the limited purpose contained in said | |
fb.com: # permission. | |
fb.com: # See: http://www.facebook.com/apps/site_scraping_tos_terms.php | |
mcgill.ca: # | |
mcgill.ca: # robots.txt | |
mcgill.ca: # | |
mcgill.ca: # This file is to prevent the crawling and indexing of certain parts | |
mcgill.ca: # of your site by web crawlers and spiders run by sites like Yahoo! | |
mcgill.ca: # and Google. By telling these "robots" where not to go on your site, | |
mcgill.ca: # you save bandwidth and server resources. | |
mcgill.ca: # | |
mcgill.ca: # This file will be ignored unless it is at the root of your host: | |
mcgill.ca: # Used: http://example.com/robots.txt | |
mcgill.ca: # Ignored: http://example.com/site/robots.txt | |
mcgill.ca: # | |
mcgill.ca: # For more information about the robots.txt standard, see: | |
mcgill.ca: # http://www.robotstxt.org/robotstxt.html | |
mcgill.ca: # CSS, JS, Images | |
mcgill.ca: # Directories | |
mcgill.ca: # Files | |
mcgill.ca: # Paths (clean URLs) | |
mcgill.ca: # Paths (no clean URLs) | |
getepic.com: #updated 12/15/2020 | |
zoosk.com: # robots.txt | |
cedcommerce.com: # robots.txt | |
cedcommerce.com: #path(Clean URLs) | |
cedcommerce.com: # Stop Inxeding from search engine | |
cedcommerce.com: # Directories | |
cedcommerce.com: # Stop Crawling user account and checkout pages | |
sdna.gr: # | |
sdna.gr: # robots.txt | |
sdna.gr: # | |
sdna.gr: # This file is to prevent the crawling and indexing of certain parts | |
sdna.gr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
sdna.gr: # and Google. By telling these "robots" where not to go on your site, | |
sdna.gr: # you save bandwidth and server resources. | |
sdna.gr: # | |
sdna.gr: # This file will be ignored unless it is at the root of your host: | |
sdna.gr: # Used: http://example.com/robots.txt | |
sdna.gr: # Ignored: http://example.com/site/robots.txt | |
sdna.gr: # | |
sdna.gr: # For more information about the robots.txt standard, see: | |
sdna.gr: # http://www.robotstxt.org/robotstxt.html | |
sdna.gr: # CSS, JS, Images | |
sdna.gr: # Directories | |
sdna.gr: # Files | |
sdna.gr: # Paths (clean URLs) | |
sdna.gr: # Paths (no clean URLs) | |
uservoice.com: # Tell MSN to simmer down | |
uservoice.com: # Tell 80legs to get bent | |
uservoice.com: # Same for TurnitinBot | |
uservoice.com: # Fuck off WareBay | |
elboticarioencasa.com: # robots.txt automatically generated by PrestaShop e-commerce open-source solution | |
elboticarioencasa.com: # http://www.prestashop.com - http://www.prestashop.com/forums | |
elboticarioencasa.com: # This file is to prevent the crawling and indexing of certain parts | |
elboticarioencasa.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
elboticarioencasa.com: # and Google. By telling these "robots" where not to go on your site, | |
elboticarioencasa.com: # you save bandwidth and server resources. | |
elboticarioencasa.com: # For more information about the robots.txt standard, see: | |
elboticarioencasa.com: # http://www.robotstxt.org/robotstxt.html | |
elboticarioencasa.com: # Allow Directives | |
elboticarioencasa.com: # Private pages | |
elboticarioencasa.com: # Directories for elboticarioencasa.com | |
elboticarioencasa.com: # Files | |
wwe.com: # | |
wwe.com: # robots.txt | |
wwe.com: # | |
wwe.com: # This file is to prevent the crawling and indexing of certain parts | |
wwe.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
wwe.com: # and Google. By telling these "robots" where not to go on your site, | |
wwe.com: # you save bandwidth and server resources. | |
wwe.com: # | |
wwe.com: # This file will be ignored unless it is at the root of your host: | |
wwe.com: # Used: http://example.com/robots.txt | |
wwe.com: # Ignored: http://example.com/site/robots.txt | |
wwe.com: # | |
wwe.com: # For more information about the robots.txt standard, see: | |
wwe.com: # http://www.robotstxt.org/robotstxt.html | |
wwe.com: # Directories | |
wwe.com: # Paths (clean URLs) | |
wwe.com: # Paths (no clean URLs) | |
evaluate.market: # https://www.robotstxt.org/robotstxt.html | |
truecaller.com: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | |
truecaller.com: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@, ,@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | |
truecaller.com: # @@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@@@@@ | |
truecaller.com: # @@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@ | |
truecaller.com: # @@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@ | |
truecaller.com: # @@@@@@@@@@@@@@* *@@@@@@@@@@@@@@ | |
truecaller.com: # @@@@@@@@@@@@ @@@@@@@@@@@@ | |
truecaller.com: # @@@@@@@@@@* *@@@@@@@@@@ | |
truecaller.com: # @@@@@@@@@ (((((( @@@@@@@@@ | |
truecaller.com: # @@@@@@@ ((((((((((( *@@@@@@@ | |
truecaller.com: # @@@@@@ ((((((((((((( @@@@@@ | |
truecaller.com: # @@@@@ ((((((((((((((( @@@@@ | |
truecaller.com: # @@@@ (((((((((((((((( @@@@ | |
truecaller.com: # @@@, (((((((((((((((( ,@@@ | |
truecaller.com: # @@@ ((((((((((((((( @@@ | |
truecaller.com: # @@ (((((((((((((( @@ | |
truecaller.com: # @@ (((((((((((, @@ | |
truecaller.com: # @& ((((((((( &@ | |
truecaller.com: # @ ((((((((/ @ | |
truecaller.com: # @ (((((//// @ | |
truecaller.com: # @ ((/////// @ | |
truecaller.com: # @ /////////* @ | |
truecaller.com: # @& ///////// &@ | |
truecaller.com: # @@ ////////// @@ | |
truecaller.com: # @@ *////////// ****, @@ | |
truecaller.com: # @@@ //////////// ************ @@@ | |
truecaller.com: # @@@. ////////////// ****************** .@@@ | |
truecaller.com: # @@@@ .//////////********************** @@@@ | |
truecaller.com: # @@@@@ */////************************ @@@@@ | |
truecaller.com: # @@@@@@ ************************* @@@@@@ | |
truecaller.com: # @@@@@@@ ******************** ,@@@@@@@ | |
truecaller.com: # @@@@@@@@@ *************. @@@@@@@@@ | |
truecaller.com: # @@@@@@@@@@. .@@@@@@@@@@ | |
truecaller.com: # @@@@@@@@@@@@ @@@@@@@@@@@@ | |
truecaller.com: # @@@@@@@@@@@@@@. .@@@@@@@@@@@@@@ | |
truecaller.com: # @@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@ | |
truecaller.com: # @@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@ | |
truecaller.com: # @@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@@@@@ | |
truecaller.com: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@, ,@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | |
truecaller.com: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | |
boss.az: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
boss.az: # | |
boss.az: # To ban all spiders from the entire site uncomment the next two lines: | |
boss.az: # User-Agent: * | |
boss.az: # Disallow: / | |
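The boss.az comment (repeated by thrillophilia.com, sxl.cn and bible.com) says uncommenting `User-Agent: *` and `Disallow: /` bans all spiders. The sketch below checks that claim with Python's standard-library `urllib.robotparser`. The agent names and URLs are illustrative assumptions. Real crawlers use their own parsers, and some of them only follow a `*` group when no group names them specifically.

```python
from urllib.robotparser import RobotFileParser

# The two lines boss.az suggests uncommenting, fed to the standard-library parser.
ban_all = ["User-Agent: *", "Disallow: /"]

parser = RobotFileParser()
parser.parse(ban_all)

for agent in ("Googlebot", "SomeOtherBot"):
    # Every URL path starts with "/", so this single rule matches every page
    # for every agent that falls back to the "*" group.
    print(agent, parser.can_fetch(agent, "https://example.com/any/page"))
```

Both agents are refused. Directive names are matched case-insensitively, so `User-Agent` and `User-agent` behave the same.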
cutestat.com: # BEGIN XML Sitemap | |
cutestat.com: # END XML Sitemap | |
abebooks.com: # Sitemap files | |
pakwheels.com: # /robots.txt for https://www.pakwheels.com | |
pakwheels.com: # block nofollow | |
pakwheels.com: # block nofollow | |
thrillophilia.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
thrillophilia.com: # | |
thrillophilia.com: # To ban all spiders from the entire site uncomment the next two lines: | |
thrillophilia.com: # User-Agent: * | |
thrillophilia.com: # Disallow:/ | |
thrillophilia.com: # Blog wp-contents | |
thrillophilia.com: # Disallow /tags | |
google.iq: # AdsBot | |
google.iq: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
tiki.vn: # Disallow all crawlers access to certain pages. | |
bunjang.co.kr: # www.robotstxt.org/ | |
bunjang.co.kr: # Allow crawling of all content | |
bunjang.co.kr: # Google Search Engine Sitemap | |
luluhypermarket.com: #For all robots | |
luluhypermarket.com: # Block access to specific groups of pages | |
udg.mx: # | |
udg.mx: # robots.txt | |
udg.mx: # | |
udg.mx: # This file is to prevent the crawling and indexing of certain parts | |
udg.mx: # of your site by web crawlers and spiders run by sites like Yahoo! | |
udg.mx: # and Google. By telling these "robots" where not to go on your site, | |
udg.mx: # you save bandwidth and server resources. | |
udg.mx: # | |
udg.mx: # This file will be ignored unless it is at the root of your host: | |
udg.mx: # Used: http://example.com/robots.txt | |
udg.mx: # Ignored: http://example.com/site/robots.txt | |
udg.mx: # | |
udg.mx: # For more information about the robots.txt standard, see: | |
udg.mx: # http://www.robotstxt.org/robotstxt.html | |
udg.mx: # CSS, JS, Images | |
udg.mx: # Directories | |
udg.mx: # Files | |
udg.mx: # Paths (clean URLs) | |
udg.mx: # Paths (no clean URLs) | |
queensu.ca: # | |
queensu.ca: # robots.txt | |
queensu.ca: # | |
queensu.ca: # This file is to prevent the crawling and indexing of certain parts | |
queensu.ca: # of your site by web crawlers and spiders run by sites like Yahoo! | |
queensu.ca: # and Google. By telling these "robots" where not to go on your site, | |
queensu.ca: # you save bandwidth and server resources. | |
queensu.ca: # | |
queensu.ca: # This file will be ignored unless it is at the root of your host: | |
queensu.ca: # Used: http://example.com/robots.txt | |
queensu.ca: # Ignored: http://example.com/site/robots.txt | |
queensu.ca: # | |
queensu.ca: # For more information about the robots.txt standard, see: | |
queensu.ca: # http://www.robotstxt.org/robotstxt.html | |
queensu.ca: # Directories | |
queensu.ca: # Files | |
queensu.ca: # Paths (clean URLs) | |
queensu.ca: # Paths (no clean URLs) | |
queensu.ca: # | |
queensu.ca: # Sites going away | |
queensu.ca: ##Disallow: /calendars/artsci/ | |
sxl.cn: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
sxl.cn: # | |
sxl.cn: # To ban all spiders from the entire site uncomment the next two lines: | |
sxl.cn: # User-Agent: * | |
sxl.cn: # Disallow: / | |
sxl.cn: # Google adsbot ignores robots.txt unless specifically named! | |
amna.gr: #User-agent: Googlebot | |
amna.gr: #Disallow: /feeds | |
fstoppers.com: # | |
fstoppers.com: # robots.txt | |
fstoppers.com: # | |
fstoppers.com: # CSS, JS, Images | |
fstoppers.com: # Directories | |
fstoppers.com: # Files | |
fstoppers.com: # Paths (clean URLs) | |
fstoppers.com: #Disallow: /node/ | |
fstoppers.com: # Paths (no clean URLs) | |
fstoppers.com: # No access for quicktabs in the URL | |
fstoppers.com: #Disallow: *?quicktabs_* | |
fstoppers.com: #Disallow: *&quicktabs_* | |
northwestern.edu: # robots.txt generated by cron job | |
northwestern.edu: # Produced 02/24/2021 at 03:48 | |
northwestern.edu: # Directives generated from .NOINDEX files | |
northwestern.edu: # | |
zaobao.com: # | |
zaobao.com: # robots.txt | |
zaobao.com: # | |
zaobao.com: # This file is to prevent the crawling and indexing of certain parts | |
zaobao.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
zaobao.com: # and Google. By telling these "robots" where not to go on your site, | |
zaobao.com: # you save bandwidth and server resources. | |
zaobao.com: # | |
zaobao.com: # This file will be ignored unless it is at the root of your host: | |
zaobao.com: # Used: http://example.com/robots.txt | |
zaobao.com: # Ignored: http://example.com/site/robots.txt | |
zaobao.com: # | |
zaobao.com: # For more information about the robots.txt standard, see: | |
zaobao.com: # http://www.robotstxt.org/wc/robots.html | |
zaobao.com: # | |
zaobao.com: # For syntax checking, see: | |
zaobao.com: # http://www.sxw.org.uk/computing/robots/check.html | |
zaobao.com: # CSS, JS, Images | |
zaobao.com: #Disallow Sogou spider crawling - 17-01-2017 by huy | |
zaobao.com: # Directories | |
zaobao.com: # Disallow: /sites/all | |
zaobao.com: # Files | |
zaobao.com: # Paths (clean URLs) | |
zaobao.com: # Paths (no clean URLs) | |
zaobao.com: # disallow all files ending in specific extension | |
zaobao.com: # block freemium paywall Images | |
reklama5.mk: # Group 1 | |
reklama5.mk: # Group 2 | |
reklama5.mk: # Group 3 | |
reklama5.mk: # Group 4 | |
tehran.ir: # Begin robots.txt file | |
tehran.ir: #/-----------------------------------------------\ | |
tehran.ir: #| In single portal/domain situations, uncomment the sitmap line and enter domain name | |
tehran.ir: #\-----------------------------------------------/ | |
tehran.ir: #Sitemap: http://www.DomainNamehere.com/sitemap.aspx | |
tehran.ir: # End of robots.txt file | |
mushroommarket.net: # Robots For MushroomMarket.net B2C | |
msdmanuals.com: #Baiduspider | |
business.com: # www.robotstxt.org/ | |
business.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
edpuzzle.com: # | |
edpuzzle.com: # _ _ | |
edpuzzle.com: # ___ __| | ___ _ _ ____ ____ | | ___ | |
edpuzzle.com: # / _ \/ _' | _ \| | | |__ ||__ || | / _ \ | |
edpuzzle.com: # | __/ (_| | (_) | |_| | / /_ / /_| || __/ | |
edpuzzle.com: # \___|\__,_| __/|_____|/____|/____|_| \___| | |
edpuzzle.com: # |_| | |
edpuzzle.com: # | |
edpuzzle.com: # | |
edpuzzle.com: # Allow all robots | |
novojornal.co.ao: # robots.txt | |
klook.com: # Hi, we're KLOOK tech team, Nice to meet you. | |
klook.com: # If you're an engineer, we'd be interested to have a chat with you. | |
klook.com: # Our tech team base on Shenzhen, China. | |
klook.com: # You can find our positions in the link below | |
klook.com: # | |
klook.com: # https://klook.com/careers?department=Engineering%20%26%20Technology | |
klook.com: ## ## ## ####### ####### ## ## | |
klook.com: ## ## ## ## ## ## ## ## ## | |
klook.com: ## ## ## ## ## ## ## ## ## | |
klook.com: ##### ## ## ## ## ## ##### | |
klook.com: ## ## ## ## ## ## ## ## ## | |
klook.com: ## ## ## ## ## ## ## ## ## | |
klook.com: ## ## ######## ####### ####### ## ## | |
klook.com: # klook.com | |
klook.com: #Naver Setting separate | |
klook.com: #block JP site & Wifi vertical | |
klook.com: # block test activities | |
klook.com: # block some activities : AU team has onboarded a merchant (Australia Zoo) | |
klook.com: # wait mobile | |
klook.com: # sitemap | |
popular.com.kh: # vestacp autogenerated robots.txt | |
google.jo: # AdsBot | |
google.jo: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
mercadolibre.com.co: #siteId: MCO | |
mercadolibre.com.co: #country: colombia | |
mercadolibre.com.co: ##Block - Referidos | |
mercadolibre.com.co: ##Block - siteinfo urls | |
mercadolibre.com.co: ##Block - Cart | |
mercadolibre.com.co: ##Block Checkout | |
mercadolibre.com.co: ##Block - User Logged | |
mercadolibre.com.co: #Shipping selector | |
mercadolibre.com.co: ##Block - last search | |
mercadolibre.com.co: ## Block - Profile - By Id | |
mercadolibre.com.co: ## Block - Profile - By Id and role (old version) | |
mercadolibre.com.co: ## Block - Profile - Leg. Req. | |
mercadolibre.com.co: ##Block - noindex | |
mercadolibre.com.co: # Mercado-Puntos | |
mercadolibre.com.co: # Viejo mundo | |
mercadolibre.com.co: ##Block recommendations listing | |
pinterest.ch: # Pinterest is hiring! | |
pinterest.ch: # | |
pinterest.ch: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c | |
pinterest.ch: # | |
pinterest.ch: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering | |
joomla.org: ##Please don't remove folders from disallow. | |
joomla.org: ##The allows at the top allow any of the mimetypes listed to be crawled within any folder | |
joomla.org: ##using long-tail wildcards, these ignore the disallows for the folders below. | |
joomla.org: ##This gives full render for the search engines whilst preventing full crawls of system | |
joomla.org: ##folders | |
joomla.org: #THIS ALLOWS FULL RENDER AT ENGINES | |
joomla.org: #THESE FOLDERS SHOULD NEVER BE CRAWLED | |
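The joomla.org comments assume that Allow lines "at the top" using long-tail wildcards override the folder Disallows below them. Whether that holds depends on the parser:

- **First-match parsers** honour file order. Python's `urllib.robotparser`, for instance, returns the first matching line and does not expand `*` inside paths.
- **RFC 9309 and Google** ignore order and pick the longest matching pattern, with Allow winning a tie.

The sketch below implements only that longest-match rule. The function names and the `(directive, pattern)` rule format are assumptions for illustration, and percent-encoding normalization is omitted.

```python
import re


def _pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern: '*' matches any run of characters,
    and a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """rules: list of (directive, pattern) pairs, directive 'allow' or 'disallow'."""
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow value matches nothing
        # re.match anchors at the start, so patterns match from the path's beginning.
        if _pattern_to_regex(pattern).match(path):
            length = len(pattern)  # characters; equals octets for ASCII patterns
            # Longest pattern wins; on an exact tie the Allow rule is preferred.
            if length > best_len or (length == best_len and directive == "allow"):
                best_len = length
                allowed = directive == "allow"
    return allowed


if __name__ == "__main__":
    rules = [("allow", "/*.css$"), ("disallow", "/templates/")]
    for p in ("/media/site.css", "/templates/site.css", "/index.php"):
        print(p, is_allowed(rules, p))
```

Under longest-match, `/templates/site.css` stays blocked because `/templates/` (11 characters) is longer than `/*.css$` (7). A short wildcard Allow therefore only "ignores the disallows" when it is at least as long as the Disallow it competes with.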
larepublica.pe: # 19/02/2021 | |
amobbs.com: # | |
amobbs.com: # robots.txt for Discuz! X3 | |
amobbs.com: # | |
wipo.int: # robots.txt for https://www.wipo.int/ | |
telerik.com: # All robots will spider the domain | |
telerik.com: #Image Sitemap | |
telerik.com: #Video Sitemap | |
goodrx.com: # Hey there! You don't look like a robot... | |
goodrx.com: # | |
goodrx.com: # We'd love to hear from curious humans such as yourself. | |
goodrx.com: # | |
goodrx.com: # While you're poking around, why not check out our open positions at GoodRx? | |
goodrx.com: # | |
goodrx.com: # https://www.goodrx.com/jobs | |
goodrx.com: # | |
goodrx.com: # Or reach out directly via seo@goodrx.com | |
goodrx.com: # | |
goodrx.com: # If you are in fact a robot, pardon my presumptions. Here's what you came for: | |
scholastic.com: # Directories | |
scholastic.com: # Files | |
scholastic.com: # Paths (clean URLs) | |
scholastic.com: # Paths (no clean URLs) | |
scholastic.com: # AEM | |
leprogres.fr: # Boutique | |
leprogres.fr: # Elections | |
leprogres.fr: # Examen | |
jutarnji.hr: # If the Joomla site is installed within a folder | |
jutarnji.hr: # eg www.example.com/joomla/ then the robots.txt file | |
jutarnji.hr: # MUST be moved to the site root | |
jutarnji.hr: # eg www.example.com/robots.txt | |
jutarnji.hr: # AND the joomla folder name MUST be prefixed to all of the | |
jutarnji.hr: # paths. | |
jutarnji.hr: # eg the Disallow rule for the /administrator/ folder MUST | |
jutarnji.hr: # be changed to read | |
jutarnji.hr: # Disallow: /joomla/administrator/ | |
jutarnji.hr: # | |
jutarnji.hr: # For more information about the robots.txt standard, see: | |
jutarnji.hr: # http://www.robotstxt.org/orig.html | |
jutarnji.hr: # | |
jutarnji.hr: # For syntax checking, see: | |
jutarnji.hr: # http://tool.motoricerca.info/robots-checker.phtml | |
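The Joomla note quoted by jutarnji.hr (and later winit.com.cn) says that when the site lives in a subfolder, the file must move to the host root and the folder name must be prefixed to every path. The sketch below automates that rewrite.

- `prefix_rules` is a hypothetical helper, not part of Joomla.
- Only `Allow`/`Disallow` values that begin with `/` are rewritten.
- Comments, empty values and other directives pass through unchanged.

```python
def prefix_rules(lines, folder):
    """Prefix Allow/Disallow paths with the subfolder the site is installed in.

    'folder' may be given as "joomla", "/joomla" or "/joomla/".
    """
    folder = "/" + folder.strip("/")
    out = []
    for line in lines:
        name, sep, value = line.partition(":")
        # "# Disallow: /x" keeps its leading '#', so commented rules are skipped.
        if sep and name.strip().lower() in ("allow", "disallow"):
            path = value.strip()
            # An empty "Disallow:" means "allow everything"; leave it alone.
            if path.startswith("/"):
                line = f"{name.strip()}: {folder}{path}"
        out.append(line)
    return out


if __name__ == "__main__":
    for rule in prefix_rules(["User-agent: *", "Disallow: /administrator/"], "joomla"):
        print(rule)
```

With `"joomla"` as the folder, `Disallow: /administrator/` becomes `Disallow: /joomla/administrator/`, matching the example in the comment.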
downxia.com: # | |
downxia.com: # | |
global-free-classified-ads.com: #User-agent: ia_archiver | |
global-free-classified-ads.com: #User-agent: ia_archiver/1.6 | |
global-free-classified-ads.com: #User-agent: sogou | |
global-free-classified-ads.com: ##User-agent: proximic | |
pitneybowes.com: # Robots.txt file for http://www.pitneybowes.com/ | |
pitneybowes.com: # | |
pitneybowes.com: # 2020.04.22 | |
pitneybowes.com: # YYYY.MM.DD | |
pitneybowes.com: # -------------------------------------------------------------------------- | |
pitneybowes.com: # Global Directives | |
pitneybowes.com: # -------------------------------------------------------------------------- | |
pitneybowes.com: # -------------------------------------------------------------------------- | |
pitneybowes.com: # SEO Disallows | |
pitneybowes.com: # -------------------------------------------------------------------------- | |
pitneybowes.com: # -------------------------------------------------------------------------- | |
pitneybowes.com: # XML Sitemaps | |
pitneybowes.com: # -------------------------------------------------------------------------- | |
tueren-fachhandel.de: ## Crawl-delay parameter: number of seconds to wait between successive requests to the same server. | |
tueren-fachhandel.de: ## Set a custom crawl rate if you're experiencing traffic problems with your server. | |
tueren-fachhandel.de: # Crawl-delay: 30 | |
tueren-fachhandel.de: ## Do not crawl development files and folders: CVS, svn directories and dump files | |
tueren-fachhandel.de: ## Allow: /*?p= | |
tueren-fachhandel.de: ## Do not crawl common Magento technical folders | |
tueren-fachhandel.de: ## Do not crawl common Magento technical folders | |
tueren-fachhandel.de: ## Do not crawl common Magento files | |
tueren-fachhandel.de: ## Do not crawl sub category pages that are sorted or filtered. | |
tueren-fachhandel.de: ## Do not crawl links with session IDs | |
tueren-fachhandel.de: ## Do not crawl links with filetypes | |
tueren-fachhandel.de: ## Do not crawl checkout and user account pages | |
tueren-fachhandel.de: # Secific pages (comment to allow indexing) | |
tueren-fachhandel.de: #Disallow: /*contacts/ | |
tueren-fachhandel.de: ## Do not crawl seach pages and not-SEO optimized catalog links | |
tueren-fachhandel.de: ## Do not crawl attributes | |
tueren-fachhandel.de: ## SERVER SETTINGS | |
tueren-fachhandel.de: ## Do not crawl common server technical folders and files | |
tueren-fachhandel.de: ## IMAGE CRAWLERS SETTINGS | |
tueren-fachhandel.de: ## Extra: Uncomment if you do not wish Google and Bing to index your images | |
tueren-fachhandel.de: # User-agent: Googlebot-Image | |
tueren-fachhandel.de: # Disallow: / | |
tueren-fachhandel.de: # User-agent: msnbot-media | |
tueren-fachhandel.de: # Disallow: / | |
tueren-fachhandel.de: # Romove product filter URL | |
tueren-fachhandel.de: # Remove specific pages from index | |
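The tueren-fachhandel.de comments describe `Crawl-delay` as the number of seconds to wait between requests to the same server. It is a non-standard extension that RFC 9309 does not define, and Googlebot ignores it. Some parsers do expose it, including Python's `urllib.robotparser`. The rules and agent name below are assumptions: the delay re-enables the value that file keeps commented out, alongside a checkout exclusion like the one its comments mention.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the Crawl-delay value that tueren-fachhandel.de keeps
# commented out, plus a checkout exclusion like the one its comments describe.
rules = [
    "User-agent: *",
    "Crawl-delay: 30",
    "Disallow: /checkout/",
]

parser = RobotFileParser()
parser.parse(rules)

# crawl_delay() returns the delay of the group that applies to the agent,
# or None when that group sets no Crawl-delay.
print(parser.crawl_delay("ExampleBot"))
print(parser.can_fetch("ExampleBot", "https://example.com/checkout/"))  # blocked
print(parser.can_fetch("ExampleBot", "https://example.com/tueren/"))    # allowed
```

A polite client would sleep for the returned number of seconds between fetches. It would fall back to its own default when `crawl_delay()` returns `None`.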
hemnet.se: # Kul att du hittat hit! | |
hemnet.se: # Vill du också jobba på en av Sveriges största sajter med miljontals unika besökare? | |
hemnet.se: # Gör en spontanansökan på https://jobba.hemnet.se/ | |
1001freefonts.com: #Sitemap: /sitemap.xml | |
lingojam.com: # robotstxt.org/ | |
tesla.cn: # | |
tesla.cn: # robots.txt | |
tesla.cn: # | |
tesla.cn: # This file is to prevent the crawling and indexing of certain parts | |
tesla.cn: # of your site by web crawlers and spiders run by sites like Yahoo! | |
tesla.cn: # and Google. By telling these "robots" where not to go on your site, | |
tesla.cn: # you save bandwidth and server resources. | |
tesla.cn: # | |
tesla.cn: # This file will be ignored unless it is at the root of your host: | |
tesla.cn: # Used: http://example.com/robots.txt | |
tesla.cn: # Ignored: http://example.com/site/robots.txt | |
tesla.cn: # | |
tesla.cn: # For more information about the robots.txt standard, see: | |
tesla.cn: # http://www.robotstxt.org/robotstxt.html | |
tesla.cn: # CSS, JS, Images | |
tesla.cn: # Directories | |
tesla.cn: # Files | |
tesla.cn: # Paths (clean URLs) | |
tesla.cn: # Paths (no clean URLs) | |
tesla.cn: ############################## | |
tesla.cn: # START TESLA CONTENT. | |
tesla.cn: ############################## | |
tesla.cn: # Tesla content landing pages. | |
tesla.cn: ############################## | |
tesla.cn: # STOP TESLA CONTENT. | |
tesla.cn: ############################## | |
pingboard.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
pingboard.com: # | |
pingboard.com: # To ban all spiders from the entire site uncomment the next two lines: | |
dhs.gov: # | |
dhs.gov: # robots.txt | |
dhs.gov: # | |
dhs.gov: # This file is to prevent the crawling and indexing of certain parts | |
dhs.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
dhs.gov: # and Google. By telling these "robots" where not to go on your site, | |
dhs.gov: # you save bandwidth and server resources. | |
dhs.gov: # | |
dhs.gov: # This file will be ignored unless it is at the root of your host: | |
dhs.gov: # Used: http://example.com/robots.txt | |
dhs.gov: # Ignored: http://example.com/site/robots.txt | |
dhs.gov: # | |
dhs.gov: # For more information about the robots.txt standard, see: | |
dhs.gov: # http://www.robotstxt.org/robotstxt.html | |
dhs.gov: # CSS, JS, Images | |
dhs.gov: # Directories | |
dhs.gov: # Files | |
dhs.gov: # Paths (clean URLs) | |
dhs.gov: # Paths (no clean URLs) | |
cosmote.gr: # Allow crawlers | |
cosmote.gr: # Disallow all fixed URLs | |
cosmote.gr: # Disallow all fixed URLs | |
cosmote.gr: # Disallow WCS business URLs (content migrated under /cs/business URL) | |
cosmote.gr: # Disallow specific mobile URLs | |
cosmote.gr: # Disallow all old OTEGroup URLs | |
cosmote.gr: # Disallow old Pricelist | |
cosmote.gr: # NoIndex | |
kmib.co.kr: # robots.txt generated at http://www.adop.cc | |
onlinecv.it: # This virtual robots.txt file was created by the Virtual Robots.txt WordPress plugin: https://www.wordpress.org/plugins/pc-robotstxt/ | |
bible.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
bible.com: # | |
bible.com: # To ban all spiders from the entire site uncomment the next two lines: | |
bible.com: # User-Agent: * | |
bible.com: # Disallow: / | |
easybib.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
usgs.gov: #-----GLOBALS------------- | |
usgs.gov: #Global non-indexing of any item matching paths below. Do not add individual elsewhere. | |
usgs.gov: # Paths (clean URLs) | |
usgs.gov: # Paths (no clean URLs) | |
usgs.gov: #-----END GLOBALS------- | |
usgs.gov: #SCIENCE EXPLORER | |
usgs.gov: #PUBS | |
usgs.gov: #ECOSYSTEMS | |
usgs.gov: #ENVIRONMENTS PROGRAM | |
usgs.gov: #ENERGY AND WILDLIFE PROGRAM | |
usgs.gov: #FISHERIES | |
usgs.gov: #FISH AND WILDLIFE DISEASE | |
usgs.gov: #INVASIVE SPECIES PROGRAM | |
usgs.gov: #SSSP AKA SAGE GROUSE AND SAGEBRUSH ECOSYSTEM | |
usgs.gov: #STP STATUS AND TRENDS PROGRAM | |
usgs.gov: #WILDLIFE PROGRAM | |
usgs.gov: #------ | |
usgs.gov: #WRET site for classroom | |
usgs.gov: #------ | |
usgs.gov: #SPECIAL TOPICS | |
usgs.gov: #FLOODS | |
usgs.gov: #BIG SUR | |
usgs.gov: #MICROBIOME | |
usgs.gov: #MISSISSIPPI | |
usgs.gov: #------ | |
usgs.gov: #MERGED NOT DELETED YET | |
usgs.gov: #ILIA-WATER | |
usgs.gov: #INKY | |
usgs.gov: #MI-OH | |
usgs.gov: #--------------------- | |
usgs.gov: #Sites allow ie now live | |
usgs.gov: #--------------------- | |
usgs.gov: #ASC | |
usgs.gov: #ASTRO | |
usgs.gov: #AZ-WATER | |
usgs.gov: #CA-WATER | |
usgs.gov: #CA WATER LS | |
usgs.gov: #CAR-FL-WATER | |
usgs.gov: #CASC PROGRAM | |
usgs.gov: #CASC CENTER | |
usgs.gov: #CBA | |
usgs.gov: #CDI | |
usgs.gov: #CBP CONTAMINANT BIOLOGY PROGRAM | |
usgs.gov: #CERC | |
usgs.gov: #CERSC | |
usgs.gov: #CGGSC | |
usgs.gov: #CM-WATER | |
usgs.gov: #CMERSC | |
usgs.gov: #CMGP AKA COASTAL MARINE HAZARDS AND RESOURCES PROGRAM | |
usgs.gov: #CO-WATER | |
usgs.gov: #DAKOTA | |
usgs.gov: #DML | |
usgs.gov: #EERSC | |
usgs.gov: #EGGL | |
usgs.gov: #EARTHMRI | |
usgs.gov: #EHMA | |
usgs.gov: #EHP EARTHQUAKE HAZARDS PROGRAM | |
usgs.gov: #EMERSC | |
usgs.gov: #EMP ENERGY AND MINERALS PROGRAM | |
usgs.gov: #ENERGY & MINERALS MA | |
usgs.gov: #EROS | |
usgs.gov: #ERP ENERGY RESOURCES PROGRAM | |
usgs.gov: #FIRE | |
usgs.gov: #FBGC | |
usgs.gov: #FORT | |
usgs.gov: #FRESC | |
usgs.gov: #GAP GAP ANALYSIS PROJECT | |
usgs.gov: #GAPP | |
usgs.gov: #GECSC | |
usgs.gov: #GEOHAZARDS | |
usgs.gov: #GEOMAGNETISM | |
usgs.gov: #GLR | |
usgs.gov: #GLSC | |
usgs.gov: #GMEG | |
usgs.gov: #GOM | |
usgs.gov: #GWSIP GROUNDWATER AND STREAMFLOW INFORMATION PROGRAM | |
usgs.gov: #HURRICANE FLORENCE | |
usgs.gov: #HURRICANE MICHAEL | |
usgs.gov: #ID-WATER | |
usgs.gov: #INNOVATION CENTER | |
usgs.gov: #LEETOWN | |
usgs.gov: #LHP LANDSLIDE HAZARDS PROGRAM | |
usgs.gov: #LRMA LAND RESOURCES MA | |
usgs.gov: #LSP LAND CHANGE SCIENCE PROGRAM | |
usgs.gov: #MENDENHALL | |
usgs.gov: #MRL | |
usgs.gov: #MRP MINERAL RESOURCES PROGRAM | |
usgs.gov: #NCGMP NATIONAL COOPERATIVE GEOLOGIC MAPPING PROGRAM | |
usgs.gov: #NE-WATER | |
usgs.gov: #NEHP | |
usgs.gov: #NGGDPP NATIONAL GEOGRAPHIC AND GEOSPATIAL DATA PRESERVATION PROGRAM | |
usgs.gov: #NHMA | |
usgs.gov: #NJ-WATER | |
usgs.gov: #NPWRC | |
usgs.gov: #NWHC | |
usgs.gov: #NV-WATER | |
usgs.gov: #NWQL | |
usgs.gov: #NWQP NATIONAL WATER QUALITY PROGRAM | |
usgs.gov: #PA-WATER | |
usgs.gov: #PCMSC | |
usgs.gov: #PIWSC | |
usgs.gov: #POWELL CENTER | |
usgs.gov: #MD-DE-DC | |
usgs.gov: #OR-WATER | |
usgs.gov: #ORGANIC MATTER RESEARCH LAB OMRL | |
usgs.gov: #PWRC | |
usgs.gov: #RBPGL | |
usgs.gov: #RSP REMOTE SENSING PHENOLOGY | |
usgs.gov: #SAFRR | |
usgs.gov: #SDC | |
usgs.gov: #SPCMSC | |
usgs.gov: #SPECLAB | |
usgs.gov: #TX-WATER | |
usgs.gov: #TOXIC SUBSTANCES HYDROLOGY PROGRAM | |
usgs.gov: #UMID-WATER | |
usgs.gov: #UT-WATER | |
usgs.gov: #VA-WV | |
usgs.gov: #WA-WATER | |
usgs.gov: #WAUSP WATER AVAILABILITY AND USE SCIENCE PROGRAM | |
usgs.gov: #WATER SCIENCE SCHOOL | |
usgs.gov: #WERC | |
usgs.gov: #WFRC | |
usgs.gov: #WGSC | |
usgs.gov: #WHCMSC | |
usgs.gov: #WMA | |
usgs.gov: #WRRP AKA WRRI | |
usgs.gov: #WY-MT-WATER | |
usgs.gov: #FOIA-FAQ | |
metro.co.uk: # News Sitemap | |
metro.co.uk: # Sitemap archive | |
metro.co.uk: # Sitemap archive | |
exame.com: # Sitemap archive | |
toy-people.com: # sitemap | |
toy-people.com: # Disallow: /toy-people | |
hostloc.com: # | |
hostloc.com: # robots.txt for Discuz! X3 | |
hostloc.com: # | |
uisdc.com: # robots.txt for http://www.uisdc.com | |
vistaprint.in: # Crawling Rules - Last Update on 11/07/2019 | |
logicworld.co.za: # Sitemap is also available on /sitemap.xml | |
sebrae-sc.com.br: # remova os diretorios | |
sebrae-sc.com.br: #Sitemap | |
winit.com.cn: # If the Joomla site is installed within a folder such as at | |
winit.com.cn: # e.g. www.example.com/joomla/ the robots.txt file MUST be | |
winit.com.cn: # moved to the site root at e.g. www.example.com/robots.txt | |
winit.com.cn: # AND the joomla folder name MUST be prefixed to the disallowed | |
winit.com.cn: # path, e.g. the Disallow rule for the /administrator/ folder | |
winit.com.cn: # MUST be changed to read Disallow: /joomla/administrator/ | |
winit.com.cn: # | |
winit.com.cn: # For more information about the robots.txt standard, see: | |
winit.com.cn: # http://www.robotstxt.org/orig.html | |
winit.com.cn: # | |
winit.com.cn: # For syntax checking, see: | |
winit.com.cn: # http://tool.motoricerca.info/robots-checker.phtml | |
vivino.com: # Sitemap files | |
thedailystar.net: # | |
thedailystar.net: # robots.txt | |
thedailystar.net: # | |
thedailystar.net: # This file is to prevent the crawling and indexing of certain parts | |
thedailystar.net: # of your site by web crawlers and spiders run by sites like Yahoo! | |
thedailystar.net: # and Google. By telling these "robots" where not to go on your site, | |
thedailystar.net: # you save bandwidth and server resources. | |
thedailystar.net: # | |
thedailystar.net: # This file will be ignored unless it is at the root of your host: | |
thedailystar.net: # Used: http://example.com/robots.txt | |
thedailystar.net: # Ignored: http://example.com/site/robots.txt | |
thedailystar.net: # | |
thedailystar.net: # For more information about the robots.txt standard, see: | |
thedailystar.net: # http://www.robotstxt.org/robotstxt.html | |
thedailystar.net: # CSS, JS, Images | |
thedailystar.net: # Directories | |
thedailystar.net: # Files | |
thedailystar.net: # Paths (clean URLs) | |
thedailystar.net: # Paths (no clean URLs) | |
efsyn.gr: # | |
efsyn.gr: # robots.txt | |
efsyn.gr: # | |
efsyn.gr: # This file is to prevent the crawling and indexing of certain parts | |
efsyn.gr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
efsyn.gr: # and Google. By telling these "robots" where not to go on your site, | |
efsyn.gr: # you save bandwidth and server resources. | |
efsyn.gr: # | |
efsyn.gr: # This file will be ignored unless it is at the root of your host: | |
efsyn.gr: # Used: http://example.com/robots.txt | |
efsyn.gr: # Ignored: http://example.com/site/robots.txt | |
efsyn.gr: # | |
efsyn.gr: # For more information about the robots.txt standard, see: | |
efsyn.gr: # http://www.robotstxt.org/robotstxt.html | |
efsyn.gr: # CSS, JS, Images | |
efsyn.gr: # Directories | |
efsyn.gr: # Files | |
efsyn.gr: # Paths (clean URLs) | |
efsyn.gr: # Paths (no clean URLs) | |
elcorreo.com: ### Robots www.elcorreo.com ### | |
elcorreo.com: ## Sitemaps ## | |
elcorreo.com: ## User Agents ## | |
elcorreo.com: #redi2014 # | |
elcorreo.com: #temp # | |
elcorreo.com: #25/10/17 | |
jobsdb.com: # Robots.txt file for www.jobsdb.com | |
jobsdb.com: # URLs are case sensitive!! | |
jobsdb.com: # Bingbot | |
jobsdb.com: # LinkedIn Bot | |
gitee.com: ### BEGIN FILE ### | |
gitee.com: # | |
gitee.com: # allow-all | |
gitee.com: # | |
gitee.com: # | |
gitee.com: ### END FILE ### | |
eldiariomontanes.es: ## Sitemaps ## | |
eldiariomontanes.es: ## User Agents ## | |
eldiariomontanes.es: #redi14# | |
eldiariomontanes.es: #mobile# | |
eldiariomontanes.es: #temp# | |
angieslist.com: # | |
angieslist.com: # robots.txt | |
angieslist.com: # | |
angieslist.com: # This file is to prevent the crawling and indexing of certain parts | |
angieslist.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
angieslist.com: # and Google. By telling these "robots" where not to go on your site, | |
angieslist.com: # you save bandwidth and server resources. | |
angieslist.com: # | |
angieslist.com: # This file will be ignored unless it is at the root of your host: | |
angieslist.com: # Used: http://example.com/robots.txt | |
angieslist.com: # Ignored: http://example.com/site/robots.txt | |
angieslist.com: # | |
angieslist.com: # For more information about the robots.txt standard, see: | |
angieslist.com: # http://www.robotstxt.org/robotstxt.html | |
angieslist.com: # CSS, JS, Images | |
angieslist.com: # Directories | |
angieslist.com: # Files | |
angieslist.com: # Paths (clean URLs) | |
angieslist.com: # Paths (no clean URLs) | |
angieslist.com: # almodule endpoints | |
angieslist.com: # alapi endpoints | |
angieslist.com: # Near ME CMT content | |
angieslist.com: # nothing under sites | |
angieslist.com: # Favicon | |
yandex.com.tr: # yandex.com.tr | |
giuliofashion.com: # we use Shopify as our ecommerce platform | |
giuliofashion.com: # Google adsbot ignores robots.txt unless specifically named! | |
health.com: # Sitemaps | |
health.com: #legacy | |
health.com: #Onecms | |
health.com: #content | |
health.com: #legacy | |
health.com: #Onecms | |
health.com: #content | |
indiainfoline.com: #Disallow: /search/ | |
indiainfoline.com: #User-agent: AhrefsBot | |
indiainfoline.com: #Crawl-Delay: 86400 | |
univie.ac.at: #Baiduspider | |
univie.ac.at: # for: http://www.univie.ac.at/ | |
univie.ac.at: # contact: webmaster@univie.ac.at | |
univie.ac.at: # for: http://www.univie.ac.at/ZID/ | |
univie.ac.at: # contact: webmaster.zid@univie.ac.at | |
univie.ac.at: # | |
support.squarespace.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
fotojet.com: # If the Joomla site is installed within a folder | |
fotojet.com: # eg www.example.com/joomla/ then the robots.txt file | |
fotojet.com: # MUST be moved to the site root | |
fotojet.com: # eg www.example.com/robots.txt | |
fotojet.com: # AND the joomla folder name MUST be prefixed to all of the | |
fotojet.com: # paths. | |
fotojet.com: # eg the Disallow rule for the /administrator/ folder MUST | |
fotojet.com: # be changed to read | |
fotojet.com: # Disallow: /joomla/administrator/ | |
fotojet.com: # | |
fotojet.com: # For more information about the robots.txt standard, see: | |
fotojet.com: # http://www.robotstxt.org/orig.html | |
fotojet.com: # | |
fotojet.com: # For syntax checking, see: | |
fotojet.com: # http://tool.motoricerca.info/robots-checker.phtml | |
decider.com: # Sitemap archive | |
decider.com: # Additional sitemaps | |
asrv.com: # we use Shopify as our ecommerce platform | |
asrv.com: # Google adsbot ignores robots.txt unless specifically named! | |
zumiez.com: # Zumiez prod <www.zumiez.com> | |
league-funny.com: # sitemap | |
league-funny.com: #Disallow: /league-funny | |
worldmarket.com: # | |
worldmarket.com: # robots.txt - Cost Plus World Market https://www.worldmarket.com | |
worldmarket.com: # | |
worldmarket.com: #Sitemaps | |
worldpay.com: #termly-code-snippet-support label[for="performance"] { | |
worldpay.com: #termly-code-snippet-support label > input:checked + label { | |
worldpay.com: #termly-code-snippet-support label > input + label { | |
etam.com: # HTTP Exclusion Rules | |
etam.com: # Last Mod: 07/2015 | |
etam.com: # GG UA reminder: https://support.google.com/webmasters/answer/1061943?hl=en | |
etam.com: # Authorised UA | |
etam.com: # GWT robots tester : https://www.google.com/webmasters/tools/robots-testing-tool | |
etam.com: # Disallow: /accessoires/pret-a-porter* # dup sous-cat dans ACCESSOIRES | |
etam.com: # Disallow: /lingerie-de-nuit/accessoires* | |
etam.com: # Disallow: /pret-a-porter/les-collants* | |
etam.com: #Disallow: /nuit/nuisettes-et-chemises-de-nuit* | |
etam.com: #Disallow: /nuit/kimonos-et-deshabilles* | |
etam.com: # paramètres à bloquer | |
etam.com: # Allow: *?*prefn1=refinementColor | |
etam.com: # Disallow: *?*prefv1=* | |
etam.com: # Disallow: *?*prefn1=* | |
etam.com: # Disallow: *?*prefn2=* | |
etam.com: # répertoires à bloquer | |
etam.com: # Produit Bally | |
etam.com: #Disallow: /soldes/* | |
etam.com: #Disallow: /promos/* | |
etam.com: #Disallow: /bonnes-affaires/* | |
etam.com: # Marketing campains Authorised UA | |
etam.com: # END | |
flipboard.com: # robots.txt for http://flipboard.com | |
flipboard.com: # Some references on why 2 and the duplication | |
flipboard.com: # https://searchengineland.com/robots-txt-tip-from-bing-include-all-relevant-directives-if-you-have-a-bingbot-section-309970 | |
flipboard.com: # https://blogs.bing.com/webmaster/2012/05/03/to-crawl-or-not-to-crawl-that-is-bingbots-question/ | |
hkgolden.com: #It's for search engine indexes, aka Google | |
hyundai.com: #Disallow: /pa/ | |
hyundai.com: # Disallow: /*?*page= | |
hyundai.com: # Sitemap files | |
watchmovies5.com.pk: #Begin Attracta SEO Tools Sitemap. Do not remove | |
watchmovies5.com.pk: #End Attracta SEO Tools Sitemap. Do not remove | |
naukrigulf.com: # Created September, 01, 2006. | |
naukrigulf.com: # Author: Jai P Sharma | |
naukrigulf.com: # Email : jai.sharma[at]naukri.com | |
naukrigulf.com: # Edited : Mar 27, 2018 | |
redbull.com: #PCS | |
redbull.com: #Wingfinder | |
redbull.com: #update04–07-2018#https | |
postfinance.ch: #robots.txt for PostFinance | |
supercoloring.com: # | |
supercoloring.com: # robots.txt | |
supercoloring.com: # | |
supercoloring.com: # This file is to prevent the crawling and indexing of certain parts | |
supercoloring.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
supercoloring.com: # and Google. By telling these "robots" where not to go on your site, | |
supercoloring.com: # you save bandwidth and server resources. | |
supercoloring.com: # | |
supercoloring.com: # This file will be ignored unless it is at the root of your host: | |
supercoloring.com: # Used: http://example.com/robots.txt | |
supercoloring.com: # Ignored: http://example.com/site/robots.txt | |
supercoloring.com: # | |
supercoloring.com: # For more information about the robots.txt standard, see: | |
supercoloring.com: # http://www.robotstxt.org/robotstxt.html | |
supercoloring.com: # | |
supercoloring.com: # For syntax checking, see: | |
supercoloring.com: # http://www.frobee.com/robots-txt-check | |
supercoloring.com: # Directories | |
supercoloring.com: # Files | |
supercoloring.com: # Paths (clean URLs) | |
supercoloring.com: # Paths (no clean URLs) | |
globalclassified.net: # Blocks robots from specific folders / directories | |
getharvest.com: # http://www.robotstxt.org/ | |
zeczec.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
zeczec.com: # | |
zeczec.com: # To ban all spiders from the entire site uncomment the next two lines: | |
zeczec.com: # User-Agent: * | |
zeczec.com: # Disallow: / | |
gmx.at: #https://www.gmx.ch/robots.txt | |
ludwig.guru: # Disallow: /dictionary/* | |
ludwig.guru: # Disallow: /it/dictionary/* | |
ludwig.guru: # Disallow: /ru/dictionary/* | |
ludwig.guru: # Disallow: /en/dictionary/* | |
ludwig.guru: # Disallow: /pt/dictionary/* | |
ludwig.guru: # Disallow: /zh/dictionary/* | |
ludwig.guru: # Disallow: /tr/dictionary/* | |
ludwig.guru: # Disallow: /share | |
ludwig.guru: # Disallow: /share/* | |
ludwig.guru: # Disallow: /it/share | |
ludwig.guru: # Disallow: /ru/share | |
ludwig.guru: # Disallow: /en/share | |
ludwig.guru: # Disallow: /zh/share | |
ludwig.guru: # Disallow: /pt/share | |
ludwig.guru: # Disallow: /tr/share | |
ludwig.guru: # Disallow: /it/share/* | |
ludwig.guru: # Disallow: /ru/share/* | |
ludwig.guru: # Disallow: /en/share/* | |
ludwig.guru: # Disallow: /zh/share/* | |
ludwig.guru: # Disallow: /pt/share/* | |
ludwig.guru: # Disallow: /tr/share/* | |
ludwig.guru: # Disallow: /dictionary/* | |
ludwig.guru: # Disallow: /it/dictionary/* | |
ludwig.guru: # Disallow: /ru/dictionary/* | |
ludwig.guru: # Disallow: /en/dictionary/* | |
ludwig.guru: # Disallow: /pt/dictionary/* | |
ludwig.guru: # Disallow: /zh/dictionary/* | |
ludwig.guru: # Disallow: /tr/dictionary/* | |
ludwig.guru: # Disallow: /share | |
ludwig.guru: # Disallow: /share/* | |
ludwig.guru: # Disallow: /it/share | |
ludwig.guru: # Disallow: /ru/share | |
ludwig.guru: # Disallow: /en/share | |
ludwig.guru: # Disallow: /zh/share | |
ludwig.guru: # Disallow: /pt/share | |
ludwig.guru: # Disallow: /tr/share | |
ludwig.guru: # Disallow: /it/share/* | |
ludwig.guru: # Disallow: /ru/share/* | |
ludwig.guru: # Disallow: /en/share/* | |
ludwig.guru: # Disallow: /zh/share/* | |
ludwig.guru: # Disallow: /pt/share/* | |
ludwig.guru: # Disallow: /tr/share/* | |
ludwig.guru: # Disallow: /dictionary/* | |
ludwig.guru: # Disallow: /it/dictionary/* | |
ludwig.guru: # Disallow: /ru/dictionary/* | |
ludwig.guru: # Disallow: /en/dictionary/* | |
ludwig.guru: # Disallow: /pt/dictionary/* | |
ludwig.guru: # Disallow: /zh/dictionary/* | |
ludwig.guru: # Disallow: /tr/dictionary/* | |
ludwig.guru: # Disallow: /share | |
ludwig.guru: # Disallow: /share/* | |
ludwig.guru: # Disallow: /it/share | |
ludwig.guru: # Disallow: /ru/share | |
ludwig.guru: # Disallow: /en/share | |
ludwig.guru: # Disallow: /zh/share | |
ludwig.guru: # Disallow: /pt/share | |
ludwig.guru: # Disallow: /tr/share | |
ludwig.guru: # Disallow: /it/share/* | |
ludwig.guru: # Disallow: /ru/share/* | |
ludwig.guru: # Disallow: /en/share/* | |
ludwig.guru: # Disallow: /zh/share/* | |
ludwig.guru: # Disallow: /pt/share/* | |
ludwig.guru: # Disallow: /tr/share/* | |
delfi.lt: # $Revision: 1.25 $ $Date: 2020-01-31 08:44:56 $ | |
bmail.uol.com.br: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
sketch.com: # If you’re trying to hide a page from search results use the 'noindex' instead: | |
sketch.com: # https://developers.google.com/search/docs/advanced/crawling/block-indexing | |
sketch.com: # | |
sketch.com: # Please ensure you have gone through the documentation before editing: | |
sketch.com: # https://developers.google.com/search/reference/robots_txt | |
sketch.com: # | |
sketch.com: # In case of conflicts, the less restrictive rules will prevail. | |
sudouest.fr: # Allowed search engines directives | |
sudouest.fr: #Sitemaps | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
sudouest.fr: # | |
juooo.com: # robots.txt generated at http://tool.chinaz.com/robots/ | |
lexisnexis.com: # Ignore FrontPage files | |
lexisnexis.com: # Ignore Other Files | |
lexisnexis.com: # Ignore some forms | |
lexisnexis.com: # Ignore Law School test dir | |
lexisnexis.com: # Ignore Other folders | |
lexisnexis.com: # Ignore clients | |
lexisnexis.com: # Ignore au communities | |
lexisnexis.com: # Include sitemap | |
lexisnexis.com: # Ignore search.aspx | |
lexisnexis.com: # Ignore Martindale-Hubbell | |
lexisnexis.com: # Ignore flash | |
lexisnexis.com: # Ignore support | |
lexisnexis.com: #store pages | |
lexisnexis.com: # Ignore Webcasting | |
lexisnexis.com: # Ignore LSBO | |
lexisnexis.com: #Ignore Accurint | |
lexisnexis.com: #Ignore ppc pages | |
lexisnexis.com: #Ignore downloads | |
lexisnexis.com: #Ignore lexisONE | |
lexisnexis.com: #Ignore campaign Ravel View | |
lexisnexis.com: #Ignore old newsroom | |
lexisnexis.com: #Ignore old lawschool page | |
lexisnexis.com: #Ignore members pages | |
lexisnexis.com: #Ignore test pages | |
fema.gov: # | |
fema.gov: # robots.txt | |
fema.gov: # | |
fema.gov: # This file is to prevent the crawling and indexing of certain parts | |
fema.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
fema.gov: # and Google. By telling these "robots" where not to go on your site, | |
fema.gov: # you save bandwidth and server resources. | |
fema.gov: # | |
fema.gov: # This file will be ignored unless it is at the root of your host: | |
fema.gov: # Used: http://example.com/robots.txt | |
fema.gov: # Ignored: http://example.com/site/robots.txt | |
fema.gov: # | |
fema.gov: # For more information about the robots.txt standard, see: | |
fema.gov: # http://www.robotstxt.org/robotstxt.html | |
fema.gov: # CSS, JS, Images | |
fema.gov: # Directories | |
fema.gov: # Files | |
fema.gov: # Paths (clean URLs) | |
fema.gov: # Paths (no clean URLs) | |
desidime.com: # Hello | |
desidime.com: # | |
desidime.com: # If you are a Human and reading this, It means you eat, sleep, dream SEO. | |
desidime.com: # | |
desidime.com: # We are implementing white-hat SEO growth hacking techniques on our site. | |
desidime.com: # | |
desidime.com: # If you are a growth hacker and technical aspects of SEO makes you excited, you have found a right team. Apply to us at jobs@desidime.com and dont forget to mention that you found us via Robots.txt for bonus points. ;) | |
desidime.com: # | |
programme.tv: # robots.txt file for Télé 2 Semaines | |
programme.tv: # desktop & mobile | |
programme.tv: # https://www.robotstxt.org/ | |
yunexpress.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
yunexpress.com: #content{margin:0 0 0 2%;position:relative;} | |
rsc.org: ##ACAP version=1.0 | |
rsc.org: # allow contracted search | |
rsc.org: # block GuideBot | |
rsc.org: # block robots | |
rsc.org: # Editors symposium files | |
rsc.org: # allow contracted search | |
rsc.org: # User-agent: gsa-crawler | |
rsc.org: # block GuideBot | |
rsc.org: # User-agent: Guidebot | |
rsc.org: # Disallow: / | |
rsc.org: # block robots | |
rsc.org: # User-agent: * | |
rsc.org: # Disallow: /Membership/Memberzone/ | |
rsc.org: # Disallow: /is/ | |
rsc.org: # Disallow: /publishing/journals/rssfeed.asp | |
rsc.org: # Yahoo crawl | |
rsc.org: #Conference Pages | |
rsc.org: #Exam File for Robert Bowles | |
rsc.org: # e-Membership | |
rsc.org: #HD75942 11:11 08/02/2012 | |
rsc.org: # Denial URLs | |
cruisefashion.com: #divMobSearch { | |
cruisefashion.com: #mp-menu { | |
cruisefashion.com: #BodyWrap.headerFix header.HeaderWrap { | |
cruisefashion.com: #BodyWrap.headerFix #divMobSearch { | |
cruisefashion.com: #BodyWrap #divMobSearch { | |
cruisefashion.com: #BodyWrap.headerFix header.HeaderWrap { | |
cruisefashion.com: #BodyWrap.headerFix #divMobSearch { | |
cruisefashion.com: #BodyWrap.headerFix .HeaderTopCrus { | |
cruisefashion.com: #BodyWrap #divMobSearch { | |
cruisefashion.com: #BodyWrap.headerFix #divMobSearch { | |
cruisefashion.com: #mp-menu { | |
oregonstate.edu: # | |
oregonstate.edu: # robots.txt | |
oregonstate.edu: # | |
oregonstate.edu: # This file is to prevent the crawling and indexing of certain parts | |
oregonstate.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
oregonstate.edu: # and Google. By telling these "robots" where not to go on your site, | |
oregonstate.edu: # you save bandwidth and server resources. | |
oregonstate.edu: # | |
oregonstate.edu: # This file will be ignored unless it is at the root of your host: | |
oregonstate.edu: # Used: http://example.com/robots.txt | |
oregonstate.edu: # Ignored: http://example.com/site/robots.txt | |
oregonstate.edu: # | |
oregonstate.edu: # For more information about the robots.txt standard, see: | |
oregonstate.edu: # http://www.robotstxt.org/robotstxt.html | |
oregonstate.edu: # CSS, JS, Images | |
oregonstate.edu: # Directories | |
oregonstate.edu: # Files | |
oregonstate.edu: # Paths (clean URLs) | |
oregonstate.edu: # Paths (no clean URLs) | |
buenosaires.gob.ar: # | |
buenosaires.gob.ar: # robots.txt | |
buenosaires.gob.ar: # | |
buenosaires.gob.ar: # This file is to prevent the crawling and indexing of certain parts | |
buenosaires.gob.ar: # of your site by web crawlers and spiders run by sites like Yahoo! | |
buenosaires.gob.ar: # and Google. By telling these "robots" where not to go on your site, | |
buenosaires.gob.ar: # you save bandwidth and server resources. | |
buenosaires.gob.ar: # | |
buenosaires.gob.ar: # This file will be ignored unless it is at the root of your host: | |
buenosaires.gob.ar: # Used: http://example.com/robots.txt | |
buenosaires.gob.ar: # Ignored: http://example.com/site/robots.txt | |
buenosaires.gob.ar: # | |
buenosaires.gob.ar: # For more information about the robots.txt standard, see: | |
buenosaires.gob.ar: # http://www.robotstxt.org/robotstxt.html | |
buenosaires.gob.ar: # Directories | |
buenosaires.gob.ar: # Files | |
buenosaires.gob.ar: # Paths (clean URLs) | |
buenosaires.gob.ar: # Paths (no clean URLs) | |
buenosaires.gob.ar: #Mantis 82858 | |
form.run: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
form.run: # | |
form.run: # To ban all spiders from the entire site uncomment the next two lines: | |
form.run: # User-agent: * | |
form.run: # Disallow: / | |
osvita.ua: # robots.txt for http://osvita.ua | |
doxy.me: # www.robotstxt.org/ | |
doxy.me: # Allow crawling of all content | |
the-qrcode-generator.com: # robotstxt.org/ | |
varsitytutors.com: # production site robots.txt | |
varsitytutors.com: # for site: www.varsitytutors.com | |
puravidabracelets.com: # we use Shopify as our ecommerce platform | |
puravidabracelets.com: # Google adsbot ignores robots.txt unless specifically named! | |
alberta.ca: # robot exclusion file for www.gov.ab.ca/www2.gov.ab.ca | |
alberta.ca: # see http://www.robotstxt.org/wc/exclusion-admin.html for format | |
affiliates.one: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
affiliates.one: # | |
affiliates.one: # To ban all spiders from the entire site uncomment the next two lines: | |
affiliates.one: # User-Agent: * | |
affiliates.one: # Disallow: / | |
islamweb.net: # Rule 1 | |
islamweb.net: # Rule 2 (indexing on new) | |
islamweb.net: # Rule 3 (old pages - indexing on new) | |
planoly.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
gla.ac.uk: #added by dg to stop google indexing old sites [2011/08/15] | |
gla.ac.uk: #added 2011/08/18 SG | |
gla.ac.uk: #from access.conf | |
gla.ac.uk: #Disallow: /services/library/ | |
gla.ac.uk: #Disallow: /undergraduate/prospectus/ | |
gla.ac.uk: #aliases from httpd.conf | |
gla.ac.uk: #Disallow: /Media/ | |
google.lv: # AdsBot | |
google.lv: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
tecnoblog.net: # All robots | |
tecnoblog.net: # remova os diretorios | |
tecnoblog.net: # Comunidade | |
tecnoblog.net: #Produtos | |
tecnoblog.net: # Multilang | |
tecnoblog.net: # remover scrips css e afins | |
tecnoblog.net: # Bloqueando URLs dinâmicas | |
tecnoblog.net: # Robôs Diversos | |
tecnoblog.net: # Yandex | |
tecnoblog.net: # Bingbot | |
tecnoblog.net: # Sogou | |
tecnoblog.net: #Adsense | |
linkin.bio: # http://www.robotstxt.org | |
fzg360.com: # | |
fzg360.com: # robots.txt for fzg360.com | |
fzg360.com: # Version v2018 | |
fzg360.com: # | |
fzg360.com: # Allow | |
fzg360.com: # Disallow | |
huamu.com: ## ----------------------------------------------------------------------------- | |
huamu.com: ## fileEncoding = UTF-8 | |
huamu.com: ## 禁止爬虫爬取无效URL,提升网站核心静态资源抓取及索引效率。 | |
huamu.com: ## 无效URL包含:已下线产品线的URL,全动态URL,需权限验证的URL,存在问题的旧静态URL | |
huamu.com: ## 等各种无需被搜索引擎收录的URL。 | |
huamu.com: ## ----------------------------------------------------------------------------- | |
huamu.com: # robots.txt for careless3 2016.01.13 | |
ironcladapp.com: # robotstxt.org/ | |
newsis.com: # robots.txt generated at http://www.adop.cc | |
jbhifi.com.au: # we use Shopify as our ecommerce platform | |
jbhifi.com.au: #modified using Cloudflare Workers | |
jbhifi.com.au: # Google adsbot ignores robots.txt unless specifically named! | |
alza.cz: # robots.txt for https://www.alza.cz/ | |
qvc.com: # Throttle bingbot | |
qvc.com: # HTML Pages | |
qvc.com: # Affiliates | |
qvc.com: # Internal Pages | |
qvc.com: # HTML and PDF Includes | |
qvc.com: #Legacy | |
qvc.com: #AEM CHECKOUT | |
build.com: #Sitemaps | |
princeton.edu: # | |
princeton.edu: # robots.txt | |
princeton.edu: # | |
princeton.edu: # This file is to prevent the crawling and indexing of certain parts | |
princeton.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
princeton.edu: # and Google. By telling these "robots" where not to go on your site, | |
princeton.edu: # you save bandwidth and server resources. | |
princeton.edu: # | |
princeton.edu: # This file will be ignored unless it is at the root of your host: | |
princeton.edu: # Used: http://example.com/robots.txt | |
princeton.edu: # Ignored: http://example.com/site/robots.txt | |
princeton.edu: # | |
princeton.edu: # For more information about the robots.txt standard, see: | |
princeton.edu: # http://www.robotstxt.org/robotstxt.html | |
princeton.edu: # CSS, JS, Images | |
princeton.edu: # Directories | |
princeton.edu: # Files | |
princeton.edu: # Paths (clean URLs) | |
princeton.edu: # Paths (no clean URLs) | |
pacsun.com: # Specifically allow search result pages to be crawled and indexed | |
pacsun.com: # Allow: *Search-Show*q=* | |
pacsun.com: # Prevent indexing of category-specific searches | |
pacsun.com: # Disallow crawling of specific pages and resources | |
pacsun.com: # Disallow: /*demandware.static* | |
pacsun.com: # Prevent indexing of specific pages and resources | |
pacsun.com: # Noindex: *country=* | |
toluna.com: #Disallow: /Content/ | |
toluna.com: #Disallow: / | |
justmote.me: # https://www.robotstxt.org/robotstxt.html | |
meteofrance.com: # | |
meteofrance.com: # robots.txt | |
meteofrance.com: # | |
meteofrance.com: # This file is to prevent the crawling and indexing of certain parts | |
meteofrance.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
meteofrance.com: # and Google. By telling these "robots" where not to go on your site, | |
meteofrance.com: # you save bandwidth and server resources. | |
meteofrance.com: # | |
meteofrance.com: # This file will be ignored unless it is at the root of your host: | |
meteofrance.com: # Used: http://example.com/robots.txt | |
meteofrance.com: # Ignored: http://example.com/site/robots.txt | |
meteofrance.com: # | |
meteofrance.com: # For more information about the robots.txt standard, see: | |
meteofrance.com: # http://www.robotstxt.org/robotstxt.html | |
meteofrance.com: # CSS, JS, Images | |
meteofrance.com: # Directories | |
meteofrance.com: # Files | |
meteofrance.com: # Paths (clean URLs) | |
meteofrance.com: # Paths (no clean URLs) | |
meteofrance.com: # URL | |
newspicks.com: # robotstxt.org | |
riverisland.com: #UK | |
riverisland.com: #additions {31/10/16} | |
riverisland.com: # price - disallow any URL with price | |
riverisland.com: # sizes - disallow URLs with any size | |
riverisland.com: # combination of four facets | |
piazza.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
piazza.com: # | |
piazza.com: # To ban all spiders from the entire site uncomment the next two lines: | |
umass.edu: # Directories | |
umass.edu: # Files | |
umass.edu: # Paths (clean URLs) | |
umass.edu: # Paths (no clean URLs) | |
pinterest.ca: # Pinterest is hiring! | |
pinterest.ca: # | |
pinterest.ca: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c | |
pinterest.ca: # | |
pinterest.ca: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering | |
shrm.org: # System Disallow Urls: | |
shrm.org: # User defined configurations: | |
ikman.lk: # Sitemap | |
ikman.lk: # Excludes | |
ikman.lk: # Blog | |
ikman.lk: # Promotions | |
ikman.lk: # msn | |
transactiondesk.com: # go away | |
telerama.fr: # robots.txt | |
telerama.fr: #Directories | |
telerama.fr: #Disallow: /sites/tr_master/ | |
telerama.fr: #Files | |
telerama.fr: #Paths (clean URLs) | |
telerama.fr: #Paths (no clean URLs) | |
telerama.fr: #Ne pas indexer la recherche | |
telerama.fr: #CSS, JS, Images | |
telerama.fr: # Sitemaps | |
expensify.com: # For all crawlers | |
expensify.com: # Whitelist specific pages | |
expensify.com: # Allow: /$ is to prevent us from blocking our root domain as is since we are doing Disallow: / | |
expensify.com: # Disallow everything else | |
bbcgoodfood.com: #Member Sitemap | |
bbcgoodfood.com: # News Sitemap | |
bbcgoodfood.com: # Sitemap archive | |
bbcgoodfood.com: # Sitemap archive | |
gatech.edu: # | |
gatech.edu: # robots.txt | |
gatech.edu: # | |
gatech.edu: # This file is to prevent the crawling and indexing of certain parts | |
gatech.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
gatech.edu: # and Google. By telling these "robots" where not to go on your site, | |
gatech.edu: # you save bandwidth and server resources. | |
gatech.edu: # | |
gatech.edu: # This file will be ignored unless it is at the root of your host: | |
gatech.edu: # Used: http://example.com/robots.txt | |
gatech.edu: # Ignored: http://example.com/site/robots.txt | |
gatech.edu: # | |
gatech.edu: # For more information about the robots.txt standard, see: | |
gatech.edu: # http://www.robotstxt.org/wc/robots.html | |
gatech.edu: # | |
gatech.edu: # For syntax checking, see: | |
gatech.edu: # http://www.sxw.org.uk/computing/robots/check.html | |
gatech.edu: # Directories | |
gatech.edu: # Files | |
gatech.edu: # Paths (clean URLs) | |
gatech.edu: # Paths (no clean URLs) | |
selfgrowth.com: # $Id: robots.txt,v 1.9.2.1 2008/12/10 20:12:19 goba Exp $ | |
selfgrowth.com: # | |
selfgrowth.com: # robots.txt | |
selfgrowth.com: # | |
selfgrowth.com: # This file is to prevent the crawling and indexing of certain parts | |
selfgrowth.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
selfgrowth.com: # and Google. By telling these "robots" where not to go on your site, | |
selfgrowth.com: # you save bandwidth and server resources. | |
selfgrowth.com: # | |
selfgrowth.com: # This file will be ignored unless it is at the root of your host: | |
selfgrowth.com: # Used: http://example.com/robots.txt | |
selfgrowth.com: # Ignored: http://example.com/site/robots.txt | |
selfgrowth.com: # | |
selfgrowth.com: # For more information about the robots.txt standard, see: | |
selfgrowth.com: # http://www.robotstxt.org/wc/robots.html | |
selfgrowth.com: # | |
selfgrowth.com: # For syntax checking, see: | |
selfgrowth.com: # http://www.sxw.org.uk/computing/robots/check.html | |
selfgrowth.com: # Directories | |
selfgrowth.com: # Files | |
selfgrowth.com: # Paths (clean URLs) | |
selfgrowth.com: # Paths (no clean URLs) | |
cdslindia.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
cdslindia.com: #content{margin:0 0 0 2%;position:relative;} | |
keywordtool.io: # | |
keywordtool.io: # robots.txt | |
keywordtool.io: # | |
keywordtool.io: # This file is to prevent the crawling and indexing of certain parts | |
keywordtool.io: # of your site by web crawlers and spiders run by sites like Yahoo! | |
keywordtool.io: # and Google. By telling these "robots" where not to go on your site, | |
keywordtool.io: # you save bandwidth and server resources. | |
keywordtool.io: # | |
keywordtool.io: # This file will be ignored unless it is at the root of your host: | |
keywordtool.io: # Used: http://example.com/robots.txt | |
keywordtool.io: # Ignored: http://example.com/site/robots.txt | |
keywordtool.io: # | |
keywordtool.io: # For more information about the robots.txt standard, see: | |
keywordtool.io: # http://www.robotstxt.org/robotstxt.html | |
keywordtool.io: # Crawl-delay: 10 | |
keywordtool.io: # CSS, JS, Images | |
keywordtool.io: # Directories | |
keywordtool.io: # Files | |
keywordtool.io: # Paths (clean URLs) | |
keywordtool.io: # Paths (no clean URLs) | |
keywordtool.io: # Wordpress Blog | |
roberthalf.com: # | |
roberthalf.com: # robots.txt | |
roberthalf.com: # | |
roberthalf.com: # This file is to prevent the crawling and indexing of certain parts | |
roberthalf.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
roberthalf.com: # and Google. By telling these "robots" where not to go on your site, | |
roberthalf.com: # you save bandwidth and server resources. | |
roberthalf.com: # | |
roberthalf.com: # This file will be ignored unless it is at the root of your host: | |
roberthalf.com: # Used: http://example.com/robots.txt | |
roberthalf.com: # Ignored: http://example.com/site/robots.txt | |
roberthalf.com: # | |
roberthalf.com: # For more information about the robots.txt standard, see: | |
roberthalf.com: # http://www.robotstxt.org/robotstxt.html | |
roberthalf.com: # CSS, JS, Images | |
roberthalf.com: # Directories | |
roberthalf.com: # Files | |
roberthalf.com: ## Allow rh-job-search.xml to be crawled | |
roberthalf.com: # Paths (clean URLs) | |
roberthalf.com: # Paths (no clean URLs) | |
roberthalf.com: # XML sitemap | |
expansion.com: # Diciembre 2020 | |
pinterest.ru: # Pinterest is hiring! | |
pinterest.ru: # | |
pinterest.ru: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c | |
pinterest.ru: # | |
pinterest.ru: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering | |
pinterest.cl: # Pinterest is hiring! | |
pinterest.cl: # | |
pinterest.cl: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c | |
pinterest.cl: # | |
pinterest.cl: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering | |
epa.gov: # | |
epa.gov: # robots.txt | |
epa.gov: # | |
epa.gov: # This file is to prevent the crawling and indexing of certain parts | |
epa.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
epa.gov: # and Google. By telling these "robots" where not to go on your site, | |
epa.gov: # you save bandwidth and server resources. | |
epa.gov: # | |
epa.gov: # This file will be ignored unless it is at the root of your host: | |
epa.gov: # Used: http://example.com/robots.txt | |
epa.gov: # Ignored: http://example.com/site/robots.txt | |
epa.gov: # | |
epa.gov: # For more information about the robots.txt standard, see: | |
epa.gov: # http://www.robotstxt.org/robotstxt.html | |
epa.gov: # CSS, JS, Images | |
epa.gov: # Directories | |
epa.gov: # Files | |
epa.gov: # Paths (clean URLs) | |
epa.gov: # Paths (no clean URLs) | |
epa.gov: # 15Jul2020 pbuch replaced dynamically inserted sitemap links | |
epa.gov: # with single static sitemap index | |
syncfusion.com: # All robots will spider the domain | |
syncfusion.com: # To disallow kb tags listing pages | |
syncfusion.com: # To disallow the retired products | |
dietdoctor.com: ## DD + global rules | |
dietdoctor.com: # Avoid duplicate content based on comment query arguments | |
dietdoctor.com: # Avoid unnecessary crawling of news archives with from_post argument | |
dietdoctor.com: # Archives, internal search and similar | |
dietdoctor.com: # Members only content | |
dietdoctor.com: # News archive beyond first two pages | |
dietdoctor.com: # Old date archive | |
dietdoctor.com: # New date archive | |
dietdoctor.com: # Sitemap | |
dietdoctor.com: ## SE | |
dietdoctor.com: # Avoid duplicate content based on comment query arguments | |
dietdoctor.com: # Archives, and similar | |
dietdoctor.com: # Members only content | |
dietdoctor.com: # News archive beyond first two pages | |
dietdoctor.com: # Old date archive | |
dietdoctor.com: # New date archive | |
dietdoctor.com: # Sitemap | |
dietdoctor.com: ## ES | |
dietdoctor.com: # News archive beyond first two pages | |
dietdoctor.com: # New date archive | |
dietdoctor.com: # Members only content | |
dietdoctor.com: # Sitemap | |
cretalive.gr: # | |
cretalive.gr: # robots.txt | |
cretalive.gr: # | |
cretalive.gr: # This file is to prevent the crawling and indexing of certain parts | |
cretalive.gr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
cretalive.gr: # and Google. By telling these "robots" where not to go on your site, | |
cretalive.gr: # you save bandwidth and server resources. | |
cretalive.gr: # | |
cretalive.gr: # This file will be ignored unless it is at the root of your host: | |
cretalive.gr: # Used: http://example.com/robots.txt | |
cretalive.gr: # Ignored: http://example.com/site/robots.txt | |
cretalive.gr: # | |
cretalive.gr: # For more information about the robots.txt standard, see: | |
cretalive.gr: # http://www.robotstxt.org/robotstxt.html | |
cretalive.gr: # CSS, JS, Images | |
cretalive.gr: # Directories | |
cretalive.gr: # Files | |
cretalive.gr: # Paths (clean URLs) | |
cretalive.gr: # Paths (no clean URLs) | |
447651.com: #robots.txt for all our sites | |
wines-info.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
wines-info.com: #content{margin:0 0 0 2%;position:relative;} | |
minepi.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
bankofindia.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
bankofindia.com: #content{margin:0 0 0 2%;position:relative;} | |
nv.gov: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
nv.gov: #content{margin:0 0 0 2%;position:relative;} | |
thestar.com: # Allow Mediapartners-Google | |
thestar.com: # Disallow Specific Robots | |
thenorthface.com: # robots.txt for www.thenorthface.com | |
thenorthface.com: #added 1-19-18 | |
thenorthface.com: #added 7-31-19 | |
thenorthface.com: #sitemaps | |
travelandleisure.com: # Sitemaps | |
travelandleisure.com: #legacy | |
travelandleisure.com: #Onecms | |
travelandleisure.com: #content | |
travelandleisure.com: #legacy | |
travelandleisure.com: #Onecms | |
travelandleisure.com: #content | |
androidcentral.com: # $Id: robots.txt,v 1.4.4.3 2008/11/04 09:14:25 hass Exp $ | |
androidcentral.com: # | |
androidcentral.com: # robots.txt | |
androidcentral.com: # | |
androidcentral.com: # This file is to prevent the crawling and indexing of certain parts | |
androidcentral.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
androidcentral.com: # and Google. By telling these "robots" where not to go on your site, | |
androidcentral.com: # you save bandwidth and server resources. | |
androidcentral.com: # | |
androidcentral.com: # This file will be ignored unless it is at the root of your host: | |
androidcentral.com: # Used: http://example.com/robots.txt | |
androidcentral.com: # Ignored: http://example.com/site/robots.txt | |
androidcentral.com: # | |
androidcentral.com: # For more information about the robots.txt standard, see: | |
androidcentral.com: # http://www.robotstxt.org/robotstxt.html | |
androidcentral.com: # | |
androidcentral.com: # For syntax checking, see: | |
androidcentral.com: # http://www.frobee.com/robots-txt-check | |
androidcentral.com: # Directories | |
androidcentral.com: # Files | |
androidcentral.com: # Paths (clean URLs) | |
androidcentral.com: # Paths (no clean URLs) | |
fintel.io: # LinkFluence | |
ad.nl: # Tell robots that the webview pages are not very interesting | |
ad.nl: # Articles which should not be listed in google search index: | |
ad.nl: # tu-e-zet-directeur-op-non-actief~ab9e5892/ | |
bbva.es: # Directorios: | |
bbva.es: # URL: | |
bbva.es: # Buscador interno: | |
timesofisrael.com: # Google Image | |
timesofisrael.com: # Google AdSense | |
timesofisrael.com: # digg mirror | |
timesofisrael.com: # Twiiter | |
timesofisrael.com: # Google News | |
timesofisrael.com: # MSN | |
timesofisrael.com: # global | |
lloydsbank.co.uk: # v 1.1 | |
lloydsbank.co.uk: # www.lloydsbank.com | |
duden.de: # | |
duden.de: # robots.txt | |
duden.de: # | |
duden.de: # This file is to prevent the crawling and indexing of certain parts | |
duden.de: # of your site by web crawlers and spiders run by sites like Yahoo! | |
duden.de: # and Google. By telling these "robots" where not to go on your site, | |
duden.de: # you save bandwidth and server resources. | |
duden.de: # | |
duden.de: # This file will be ignored unless it is at the root of your host: | |
duden.de: # Used: http://example.com/robots.txt | |
duden.de: # Ignored: http://example.com/site/robots.txt | |
duden.de: # | |
duden.de: # For more information about the robots.txt standard, see: | |
duden.de: # http://www.robotstxt.org/robotstxt.html | |
duden.de: # CSS, JS, Images | |
duden.de: # Directories | |
duden.de: # Files | |
duden.de: # Paths (clean URLs) | |
duden.de: # Paths (no clean URLs) | |
starfall.com: # Rule 1 | |
starfall.com: # | |
starfall.com: # Exclusions | |
starfall.com: # | |
starfall.com: # | |
starfall.com: # Begin special list for /n/ | |
starfall.com: # | |
starfall.com: # Allow: /n/N-info | |
starfall.com: # | |
starfall.com: # Begin special list for /n/level-* | |
starfall.com: # | |
starfall.com: # | |
starfall.com: # Continue more general /n/ | |
starfall.com: # | |
starfall.com: # | |
starfall.com: # End special list for /n/ | |
starfall.com: # | |
starfall.com: # | |
starfall.com: # Special addition since these were previously disallowed and are now allowed. | |
starfall.com: # 20170421 - RBW | |
starfall.com: # | |
starfall.com: # | |
starfall.com: # Sitemap addition for HTML5 content | |
starfall.com: # | |
autotrader.co.uk: # This is the robots.txt for autotrader.co.uk | |
autotrader.co.uk: # _ ___________ _ | |
autotrader.co.uk: # ////////// / \ |____ ____| | | | |
autotrader.co.uk: # /////////// / _ \ _ _ _ ____ | | _ ____ ___| | ____ _ | |
autotrader.co.uk: # //////////// / /_\ \ | | | | / \__ / \ | | | \__ / \ / | / _ \ | \___ | |
autotrader.co.uk: # / _____ \ | | | | | _/ | /\ | | | | __/ | /\ | | /\ | | [_] \ | __/ | |
autotrader.co.uk: # //////////// / / \ \ | | | | | | | | | | | | | | | | | | | | | | | ____| | | | |
autotrader.co.uk: # /////////// / / \ \ | \/ | | |__/\ | \/ | | | | | | \/ _ \ | \/ | | \____ | | | |
autotrader.co.uk: # ////////// /_/ \_\ \____/ \_____/ \____/ |_| |_| \___/ \/ \___/\/ \_____/ |_| | |
autotrader.co.uk: # | |
autotrader.co.uk: # ======================================================================================================== | |
autotrader.co.uk: # | Auto Trader are hiring - Check out our jobs at https://careers.autotrader.co.uk/jobs | | |
autotrader.co.uk: # ======================================================================================================== | |
visa.com: #logo { position: absolute; top: 20px; left: 16px; } | |
visa.com: #content { position: absolute; top: 146px; left: 96px; color: #000000; width: 623px; } | |
visa.com: #footer { position: absolute; top: 384px; left: 2px; width: 500px; height: 76px; margin: 45px 0 0 13px; float: left; font-size: 0.85em; color: #003399; overflow: hidden; } | |
visa.com: #copyright { color: #999999; margin-top: 5px; } | |
visa.com: #footer a { text-decoration: none; } | |
visa.com: #footer a:hover { text-decoration: underline;} | |
proposify.com: # robots.txt for https://www.proposify.com/ | |
proposify.com: # live - don't allow web crawlers to index cpresources/ or vendor/ | |
india.gov.in: #searchForm label { display: none;} | |
bdjobs.com: # robots.txt file for www.bdjobs.com | |
bdjobs.com: # All other agents will not spider | |
bdjobs.com: # Google will not spider | |
bdjobs.com: # Google Ad Sense | |
bdjobs.com: # Yahoo | |
bdjobs.com: # Bing | |
bdjobs.com: # GA Checker | |
bdjobs.com: # Screaming Frog SEO Spider | |
bdjobs.com: # Visual SEO Studio | |
bdjobs.com: # LinkedInBot | |
bdjobs.com: # All other agents will not spider | |
halifax.co.uk: # v 1.1 | |
halifax.co.uk: # www.halifax.co.uk | |
realsimple.com: # Sitemaps | |
realsimple.com: # legacy | |
realsimple.com: #Onecms | |
realsimple.com: #Content | |
realsimple.com: # legacy | |
realsimple.com: #Onecms | |
realsimple.com: #Content | |
brainly.pl: #Brainly Robots.txt 31.07.2017 | |
brainly.pl: # Disallow Marketing bots | |
brainly.pl: #Disallow exotic search engine crawlers | |
brainly.pl: #Disallow other crawlers | |
brainly.pl: # Good bots whitelisting: | |
brainly.pl: #Other bots | |
brainly.pl: #Neticle Crawler v1.0 ( http://bot.neticle.hu/ ) https://bot.neticle.hu/ - brand monitoring | |
brainly.pl: #Mega https://megaindex.com/crawler - link indexer tool (supports directives in user-agent:*) | |
brainly.pl: #Obot - IBM X-Force service | |
brainly.pl: #SafeDNSBot (https://www.safedns.com/searchbot) | |
getgo.com: # Sitemaps and Autodiscovers | |
exportersindia.com: #Robots.txt for ExportersIndia.com | |
surokkha.gov.bd: # https://www.robotstxt.org/robotstxt.html | |
zola.com: # Sitemaps | |
zola.com: # Disallows | |
zola.com: # Allow Overrides | |
zola.com: # Grant access for Find a Couple | |
abc.com: # Block trendkite-akashic-crawler | |
pinterest.it: # Pinterest is hiring! | |
pinterest.it: # | |
pinterest.it: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c | |
pinterest.it: # | |
pinterest.it: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering | |
truecar.com: # prod | |
chengdun.com: #1 /application/nginx-1.12.2/html/www/chengdunbaozhang/ThinkPHP/Library/Think/App.class.php(38): Think\Dispatcher::dispatch()<br /> | |
chengdun.com: #2 /application/nginx-1.12.2/html/www/chengdunbaozhang/ThinkPHP/Library/Think/App.class.php(195): Think\App::init()<br /> | |
chengdun.com: #3 /application/nginx-1.12.2/html/www/chengdunbaozhang/ThinkPHP/Library/Think/Think.class.php(120): Think\App::run()<br /> | |
chengdun.com: #4 /application/nginx-1.12.2/html/www/chengdunbaozhang/ThinkPHP/ThinkPHP.php(97): Think\Think::start()<br /> | |
chengdun.com: #5 /application/nginx-1.12.2/html/www/chengdunbaozhang/web/index.php(37): require('/application/ng...')<br /> | |
chengdun.com: #6 {main}</p> | |
expedia.com.hk: # | |
expedia.com.hk: # General bots | |
expedia.com.hk: # | |
expedia.com.hk: #hotel | |
expedia.com.hk: #flight | |
expedia.com.hk: #package | |
expedia.com.hk: #car | |
expedia.com.hk: #activities | |
expedia.com.hk: #cruise | |
expedia.com.hk: #other | |
expedia.com.hk: # | |
expedia.com.hk: # Google Ads | |
expedia.com.hk: # | |
expedia.com.hk: # | |
expedia.com.hk: # | |
expedia.com.hk: # Bing Ads | |
expedia.com.hk: # | |
expedia.com.hk: # | |
expedia.com.hk: # SemrushBot | |
expedia.com.hk: # | |
im286.net: # | |
im286.net: # robots.txt for Discuz! X3 | |
im286.net: # | |
okdiario.com: #Permitir rastreo en paginaciones de contenidos evergreen | |
okdiario.com: # Paginaciones limitadas | |
okdiario.com: #Paginaciones LOOK | |
okdiario.com: #Directorios bloqueados | |
okdiario.com: #Extensiones de contenidos no indexables | |
okdiario.com: #Sitemaps Okdiario | |
okdiario.com: #Sitemaps Look | |
okdiario.com: #Bloqueo de agentes | |
directhit.com: ## Default robots.txt | |
ontraport.com: # www.robotstxt.org/ | |
ontraport.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
fark.com: # $Id: robotsalt.txt 10113 2010-10-25 19:37:18Z mandrews $ | |
oregon.gov: # Last Updated: 11/13/19 by MCS, NIC | |
oregon.gov: # ---------------------- Statewide ---------------------- | |
oregon.gov: # Note: Site-stored robots.txt are not honored, except on host-header sites / subdomains | |
oregon.gov: # All directives for www.oregon.gov are to be stored here | |
oregon.gov: # Remove an indexed SERP and/or submit a 'Remove URL' request in Webmaster Tools | |
oregon.gov: # Note: The Allow directives are added for many Google-specific Mobile Tests to fully render the page. | |
oregon.gov: # Without these directives, the state sites could get poor grades for mobile-friendliness which | |
oregon.gov: # can result is a lower Page Rank and other SEO scores, as well as incorrect analytics in | |
oregon.gov: # the Google Analytics product. | |
oregon.gov: # ---------------------- DCBS ---------------------- | |
oregon.gov: # Enables DCBS's Google Search Applicance to index paths otherwise blocked (e.g. Orders) | |
georgia.gov: # | |
georgia.gov: # robots.txt | |
georgia.gov: # | |
georgia.gov: # This file is to prevent the crawling and indexing of certain parts | |
georgia.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
georgia.gov: # and Google. By telling these "robots" where not to go on your site, | |
georgia.gov: # you save bandwidth and server resources. | |
georgia.gov: # | |
georgia.gov: # This file will be ignored unless it is at the root of your host: | |
georgia.gov: # Used: http://example.com/robots.txt | |
georgia.gov: # Ignored: http://example.com/site/robots.txt | |
georgia.gov: # | |
georgia.gov: # For more information about the robots.txt standard, see: | |
georgia.gov: # http://www.robotstxt.org/robotstxt.html | |
georgia.gov: # CSS, JS, Images | |
georgia.gov: # Directories | |
georgia.gov: # Files | |
georgia.gov: # Paths (clean URLs) | |
georgia.gov: # Paths (no clean URLs) | |
georgia.gov: # Book printer-friendly pages | |
infojobs.com.br: # | |
infojobs.com.br: # robots.txt Infojobs | |
infojobs.com.br: # | |
infojobs.com.br: # $ID: robots.txt,v 1.0 2006/05/17 17:14:00 Exp $ | |
infojobs.com.br: # | |
infojobs.com.br: # Web site: infojobs.com.br | |
infojobs.com.br: # Descomentar esto cuando tengamos sitemaps y links internos en mobile. | |
infojobs.com.br: # User-agent: Googlebot-Mobile | |
infojobs.com.br: # User-Agent: YahooSeeker/M1A1-R2D2 | |
infojobs.com.br: # User-Agent: MSNBOT_Mobile | |
infojobs.com.br: # Disallow: / | |
k5learning.com: # | |
k5learning.com: # robots.txt | |
k5learning.com: # | |
k5learning.com: # This file is to prevent the crawling and indexing of certain parts | |
k5learning.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
k5learning.com: # and Google. By telling these "robots" where not to go on your site, | |
k5learning.com: # you save bandwidth and server resources. | |
k5learning.com: # | |
k5learning.com: # This file will be ignored unless it is at the root of your host: | |
k5learning.com: # Used: http://example.com/robots.txt | |
k5learning.com: # Ignored: http://example.com/site/robots.txt | |
k5learning.com: # | |
k5learning.com: # For more information about the robots.txt standard, see: | |
k5learning.com: # http://www.robotstxt.org/robotstxt.html | |
k5learning.com: # CSS, JS, Images | |
k5learning.com: # Directories | |
k5learning.com: # Files | |
k5learning.com: # Paths (clean URLs) | |
k5learning.com: # Paths (no clean URLs) | |
k5learning.com: # MJ12bot | |
easypost.com: # robotstxt.org | |
easypost.com: # | |
easypost.com: # Access to easypost.com is pursuant to our terms of service, located at | |
easypost.com: # https://www.easypost.com/privacy | |
easypost.com: # | |
easypost.com: # . | |
easypost.com: # c. | |
easypost.com: # ;, | |
easypost.com: # ,c | |
easypost.com: # .. ':;:,... . | |
easypost.com: # ,;;',,,;;; .,;,,,,,,,,;. '' :,;.'' | |
easypost.com: # ;,,;c;:l,; .;,,,,,,,,,,,:. .;,;:;c:,: | |
easypost.com: # :::;,,,:, :,,,,;;:::;;;;;. ,c:;;,;:: | |
easypost.com: # :,,,,,: .:,;;'.lcdddNOdkOO; ;,,,,,;. | |
easypost.com: # :cccc, c,:..,. ' ' Ox' ;ccccc | |
easypost.com: # ,cccc, :,:...'....;....d:. occcc | |
easypost.com: # 'llll: :,;;''''''''''''.. olloo | |
easypost.com: # dlcco ;,,,,,,,,,,,,,,c :ccco | |
easypost.com: # ccccl. ,,,,;:,.;..;.;c; collx | |
easypost.com: # ,lollc ',,:,;;'c'':':. :cccd | |
easypost.com: # dcccl. .;,:;.' '..,..' ollod | |
easypost.com: # ,cclol c,,,;;;;;,,,,c. .occcl | |
easypost.com: # lolcc; ;;;;,,,,''''';.. cllol; | |
easypost.com: # .lcccl; .'...............'.. .lccco. | |
easypost.com: # .lclooc. '.....'',,,,,,,,,,,,,;..ololc; | |
easypost.com: # locccl;. .';',,;;;,,,,,,,,,,,,,,,,lclccol | |
easypost.com: # 'lccooolc,,::,,,,,;::::::;;:::::,c,ocl: | |
easypost.com: # ,odcccco;,c,,:::;,,,,,,,,,,,,;cc;oc. | |
easypost.com: # .;lccl:,c,,c;,,,,,,,,,,,,,,;ccc. | |
easypost.com: # .lc:,;:,,;c,,,,,,,,,,,,,,;c: | |
easypost.com: # .'c;,,,,c,,,,,,,,,,,,,,;c; | |
easypost.com: # ;,,,,,c,,,,,,,,,,,,,,;c; | |
easypost.com: # ,,,,,,c,,,,,,,,,,,,;l:c; | |
easypost.com: #Baiduspider | |
pymnts.com: #bad bots# | |
trailblazer.me: # | |
trailblazer.me: # default robots.txt for sfdc communities sites | |
trailblazer.me: # | |
trailblazer.me: # For use by salesforce.com | |
trailblazer.me: # | |
nc.gov: # | |
nc.gov: # robots.txt | |
nc.gov: # | |
nc.gov: # This file is to prevent the crawling and indexing of certain parts | |
nc.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
nc.gov: # and Google. By telling these "robots" where not to go on your site, | |
nc.gov: # you save bandwidth and server resources. | |
nc.gov: # | |
nc.gov: # This file will be ignored unless it is at the root of your host: | |
nc.gov: # Used: http://example.com/robots.txt | |
nc.gov: # Ignored: http://example.com/site/robots.txt | |
nc.gov: # | |
nc.gov: # For more information about the robots.txt standard, see: | |
nc.gov: # http://www.robotstxt.org/robotstxt.html | |
nc.gov: # CSS, JS, Images | |
nc.gov: # Directories | |
nc.gov: # Files | |
nc.gov: # Paths (clean URLs) | |
nc.gov: # Paths (no clean URLs) | |
nc.gov: # AWS WAF Honeypot Endpoint Trap | |
coles.com.au: # /robots.txt for coles.com.au | |
livecoinwatch.com: # no, thank you | |
livecoinwatch.com: # everyone else, welcome... for now | |
livecoinwatch.com: # always fresh | |
gigazine.net: # /robots.txt file for Disallow: / | |
gigazine.net: # 2008/04/07 11:51 | |
gigazine.net: # 2013/11/05 10:17 add ia_archiver by takaki | |
gigazine.net: # 2016/04/13 12:30 modify | |
gigazine.net: # 2018/12/26 09:42 refactored by log1d | |
gigazine.net: # 2020/04/16 10:26 modify by log1d | |
adthrive.com: # This space intentionally left blank | |
optum.com: #optum | |
getapp.com: # Blocks crawlers that are kind enough to obey robots | |
star-clicks.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
star-clicks.com: #content{margin:0 0 0 2%;position:relative;} | |
klix.ba: # robotstxt.org | |
upmedia.mg: #Googlebot | |
upmedia.mg: #Googlebot-Mobile | |
upmedia.mg: #Googlebot-News | |
upmedia.mg: #Googlebot-Image | |
upmedia.mg: #Facebot | |
upmedia.mg: #Twitterbot | |
upmedia.mg: #Bingbot | |
upmedia.mg: #Yahoo | |
upmedia.mg: #Alexa | |
upmedia.mg: #Baidu | |
poetryfoundation.org: # robots.txt for https://www.poetryfoundation.org/ | |
poetryfoundation.org: # live - don't allow web crawlers to index cpresources/ or vendor/ | |
digit.in: # www.robotstxt.org/ | |
digit.in: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
brightmls.com: # https://www.robotstxt.org/robotstxt.html | |
digitec.ch: # @/ @/ | |
digitec.ch: # @/ @/ Hello, fellow humans! | |
digitec.ch: # @/ @/ | |
digitec.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@ | |
digitec.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@ | |
digitec.ch: # @@@@@@ @@@@@@@@@@@@ @@@@@@ @@@% @ | |
digitec.ch: # @@@@@ /@@@@@@@@@@ @@@@@@ @@@ @@ | |
digitec.ch: # @@@@@@ @@@@@@@@@@@, @@@@@@ @@@@ @@@@ | |
digitec.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@ | |
digitec.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@ | |
digitec.ch: # @@@@@@@@@@@@@ @@@@@@@@@@@@@@ @@@@@@@@ | |
digitec.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@ | |
digitec.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@ | |
sanfoundry.com: # Allow the following useragents for crawling the site | |
finra.org: # | |
finra.org: # robots.txt | |
finra.org: # | |
finra.org: # This file is to prevent the crawling and indexing of certain parts | |
finra.org: # of your site by web crawlers and spiders run by sites like Yahoo! | |
finra.org: # and Google. By telling these "robots" where not to go on your site, | |
finra.org: # you save bandwidth and server resources. | |
finra.org: # | |
finra.org: # This file will be ignored unless it is at the root of your host: | |
finra.org: # Used: http://example.com/robots.txt | |
finra.org: # Ignored: http://example.com/site/robots.txt | |
finra.org: # | |
finra.org: # For more information about the robots.txt standard, see: | |
finra.org: # http://www.robotstxt.org/robotstxt.html | |
finra.org: # CSS, JS, Images | |
finra.org: # Directories | |
finra.org: # Files | |
finra.org: # Paths (clean URLs) | |
finra.org: # Paths (no clean URLs) | |
milenio.com: # | |
milenio.com: # robots.txt | |
milenio.com: # | |
milenio.com: # This file is to prevent the crawling and indexing of certain parts | |
milenio.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
milenio.com: # and Google. By telling these "robots" where not to go on your site, | |
milenio.com: # you save bandwidth and server resources. | |
milenio.com: # | |
milenio.com: # This file will be ignored unless it is at the root of your host: | |
milenio.com: # Used: http://example.com/robots.txt | |
milenio.com: # Ignored: http://example.com/site/robots.txt | |
milenio.com: # | |
milenio.com: # For more information about the robots.txt standard, see: | |
milenio.com: # http://www.robotstxt.org/wc/robots.html | |
milenio.com: # | |
milenio.com: # For syntax checking, see: | |
milenio.com: # http://www.sxw.org.uk/computing/robots/check.html | |
ring.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
classkick.com: # Squarespace Robots Txt | |
20minutos.es: # Agentes permitidos explicitamente | |
20minutos.es: # Agentes bloqueados por idioma | |
20minutos.es: # Agentes nocivos | |
tkgm.gov.tr: # | |
tkgm.gov.tr: # robots.txt | |
tkgm.gov.tr: # | |
tkgm.gov.tr: # This file is to prevent the crawling and indexing of certain parts | |
tkgm.gov.tr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
tkgm.gov.tr: # and Google. By telling these "robots" where not to go on your site, | |
tkgm.gov.tr: # you save bandwidth and server resources. | |
tkgm.gov.tr: # | |
tkgm.gov.tr: # This file will be ignored unless it is at the root of your host: | |
tkgm.gov.tr: # Used: http://example.com/robots.txt | |
tkgm.gov.tr: # Ignored: http://example.com/site/robots.txt | |
tkgm.gov.tr: # | |
tkgm.gov.tr: # For more information about the robots.txt standard, see: | |
tkgm.gov.tr: # http://www.robotstxt.org/robotstxt.html | |
tkgm.gov.tr: # CSS, JS, Images | |
tkgm.gov.tr: # Directories | |
tkgm.gov.tr: # Files | |
tkgm.gov.tr: # Paths (clean URLs) | |
tkgm.gov.tr: # Paths (no clean URLs) | |
scopus.com: # /robots.txt file for http://www.scopus.com/ | |
ahoramismo.com: # Sitemap archive | |
jigsawplanet.com: # --- allow ad bots | |
jigsawplanet.com: # --- | |
diigo.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
diigo.com: #Disallow: /user | |
coindeskjapan.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/ | |
alltrails.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
alltrails.com: # | |
123moviesfree.net: #logo { | |
123moviesfree.net: #menu { | |
123moviesfree.net: #menu ul.top-menu { | |
123moviesfree.net: #menu ul.top-menu li { | |
123moviesfree.net: #menu ul.top-menu li a { | |
123moviesfree.net: #menu ul.top-menu li:hover a, | |
123moviesfree.net: #menu ul.top-menu li.active a { | |
123moviesfree.net: #menu ul.top-menu li.active a { | |
123moviesfree.net: #menu .sub-container { | |
123moviesfree.net: #menu .sub-container ul.sub-menu { | |
123moviesfree.net: #menu .sub-container ul.sub-menu li { | |
123moviesfree.net: #menu .sub-container ul.sub-menu li a { | |
123moviesfree.net: #menu .sub-container ul.sub-menu li:hover a { | |
123moviesfree.net: #menu ul.top-menu li:hover .sub-container ul.sub-menu li a { | |
123moviesfree.net: #search { | |
123moviesfree.net: #search input.search-input { | |
123moviesfree.net: #search .search-submit { | |
123moviesfree.net: #search .search-submit i { | |
123moviesfree.net: #top-user { | |
123moviesfree.net: #top-user .top-user-content.guest { | |
123moviesfree.net: #main { | |
123moviesfree.net: #slider { | |
123moviesfree.net: #slider .swiper-slide { | |
123moviesfree.net: #slider .swiper-slide .slide-link { | |
123moviesfree.net: #slider .slide-caption { | |
123moviesfree.net: #slider:hover .slide-caption { | |
123moviesfree.net: #slider .slide-caption h2 { | |
123moviesfree.net: #slider .slide-caption .slide-caption-info { | |
123moviesfree.net: #slider .slide-caption .slide-caption-info .block { | |
123moviesfree.net: #slider .slide-caption .slide-caption-info .block strong { | |
123moviesfree.net: #top-news { | |
123moviesfree.net: #top-news .nav { | |
123moviesfree.net: #top-news .nav li { | |
123moviesfree.net: #top-news .nav li a { | |
123moviesfree.net: #top-news .nav li.active a, | |
123moviesfree.net: #top-news .nav li:hover a { | |
123moviesfree.net: #top-news .top-news { | |
123moviesfree.net: #top-news .top-news-content { | |
123moviesfree.net: #top-news .top-news-content .tn-news { | |
123moviesfree.net: #top-news .top-news-content .tn-notice { | |
123moviesfree.net: #top-news .top-news-content ul { | |
123moviesfree.net: #top-news .top-news-content ul.tn-news li { | |
123moviesfree.net: #top-news .top-news-content ul.tn-news li:hover { | |
123moviesfree.net: #top-news .top-news-content ul.tn-news li:hover .tnc-info h4 a { | |
123moviesfree.net: #top-news .top-news-content ul.tn-news li:hover .news-thumb { | |
123moviesfree.net: #top-news .top-news-content ul.tn-news li .news-thumb { | |
123moviesfree.net: #top-news .top-news-content ul.tn-news li .tnc-info { | |
123moviesfree.net: #top-news .top-news-content ul.tn-news li .tnc-info h4 { | |
123moviesfree.net: #top-news .top-news-content ul.tn-news li .tnc-info h4 a { | |
123moviesfree.net: #top-news .top-news-content ul.tn-news li.view-more { | |
123moviesfree.net: #top-news .top-news-content ul.tn-news li.view-more a { | |
123moviesfree.net: #top-news .top-news-content ul.tn-news li.view-more a i { | |
123moviesfree.net: #top-news .top-news-content ul.tn-notice li { | |
123moviesfree.net: #top-news .top-news-content ul.tn-notice li:hover { | |
123moviesfree.net: #top-news .top-news-content ul.tn-notice li a { | |
123moviesfree.net: #top-news .top-news-content ul.tn-notice li a span { | |
123moviesfree.net: #top-news .top-news-content ul.tn-notice li a span i { | |
123moviesfree.net: #top-news .top-news-content .tab-pane { | |
123moviesfree.net: #top-news .top-news-content .tab-pane .tnc-apps { | |
123moviesfree.net: #top-news .top-news-content .tab-pane .tnc-apps .tnca-block { | |
123moviesfree.net: #top-news .top-news-content .tab-pane .tnc-apps .tnca-block span { | |
123moviesfree.net: #top-news .top-news-content .tab-pane .tnc-apps .tnca-block i { | |
123moviesfree.net: #top-news .top-news-content .tab-pane .tnc-apps .tnca-block:hover { | |
123moviesfree.net: #top-news .top-news-content .tab-pane .tnc-apps .tnca-block.ios:hover i { | |
123moviesfree.net: #top-news .top-news-content .tab-pane .tnc-apps .tnca-block.android:hover i { | |
123moviesfree.net: #top-news .top-news-content ul.tn-premium { | |
123moviesfree.net: #top-news .top-news-content ul.tn-premium li { | |
123moviesfree.net: #top-news .top-news-content ul.tn-premium li a { | |
123moviesfree.net: #top-news .top-news-content ul.tn-premium li a:hover { | |
123moviesfree.net: #top-news .top-news-content ul.tn-premium li a .price { | |
123moviesfree.net: #top-news .top-news-content ul.tn-premium li a .btn { | |
123moviesfree.net: #bread .breadcrumb { | |
123moviesfree.net: #bread .breadcrumb a { | |
123moviesfree.net: #mv-info { | |
123moviesfree.net: #mv-info .mvi-cover { | |
123moviesfree.net: #mv-info .mvi-cover:after { | |
123moviesfree.net: #mv-info .mvi-cover:before { | |
123moviesfree.net: #mv-info .mvi-cover:hover:before { | |
123moviesfree.net: #mv-info .mvi-cover:hover:after { | |
123moviesfree.net: #mv-info .mvi-view { | |
123moviesfree.net: #mv-info .mvi-content { | |
123moviesfree.net: #mv-info .mvi-content h3 { | |
123moviesfree.net: #mv-info .mvi-content .block-trailer { | |
123moviesfree.net: #mv-info .mvi-content .block-trailer a { | |
123moviesfree.net: #mv-info .mvi-content .mvic-desc { | |
123moviesfree.net: #mv-info .mvi-content .mvic-desc .desc { | |
123moviesfree.net: #mv-info .mvi-content .mvic-info { | |
123moviesfree.net: #mv-info .mvi-content .mvic-info p { | |
123moviesfree.net: #mv-info .mvi-content .mvic-info .mvici-left { | |
123moviesfree.net: #mv-info .mvi-content .mvic-info .mvici-right { | |
123moviesfree.net: #mv-info .mvi-content .mvic-thumb { | |
123moviesfree.net: #mv-info .mvi-content .mvic-btn { | |
123moviesfree.net: #mv-info .mvi-content .mvic-btn .btn { | |
123moviesfree.net: #mv-info .mvi-content .quality { | |
123moviesfree.net: #mv-info .mvi-content .block-social { | |
123moviesfree.net: #mv-keywords { | |
123moviesfree.net: #mv-keywords a { | |
123moviesfree.net: #mv-keywords a:hover { | |
123moviesfree.net: #mv-keywords a h5 { | |
123moviesfree.net: #mv-keywords a h5:before { | |
123moviesfree.net: #media-player, | |
123moviesfree.net: #content-embed { | |
123moviesfree.net: #media-player.active, | |
123moviesfree.net: #content-embed.active { | |
123moviesfree.net: #bar-player { | |
123moviesfree.net: #bar-player .bp-view { | |
123moviesfree.net: #bar-player .bp-btn-light span:after { | |
123moviesfree.net: #bar-player .bp-btn-light.active span:after { | |
123moviesfree.net: #bar-player .bp-btn-auto span:after { | |
123moviesfree.net: #bar-player .bp-btn-auto.active span:after { | |
123moviesfree.net: #bar-player .btn { | |
123moviesfree.net: #bar-player .btn:hover { | |
123moviesfree.net: #bar-player .btn.active { | |
123moviesfree.net: #bar-player .bp-btn-light.active { | |
123moviesfree.net: #bar-player .btn i { | |
123moviesfree.net: #overlay { | |
123moviesfree.net: #overlay.active { | |
123moviesfree.net: #comment-area { | |
123moviesfree.net: #comment-area.active { | |
123moviesfree.net: #comment-area #toggle { | |
123moviesfree.net: #comment-area #comment { | |
123moviesfree.net: #comment-area #comment.active { | |
123moviesfree.net: #comment-area #comment .content { | |
123moviesfree.net: #comment-area #comment .cac-close { | |
123moviesfree.net: #comment-area #comment .cac-close i { | |
123moviesfree.net: #footer .footer-link { | |
123moviesfree.net: #footer .footer-link.end { | |
123moviesfree.net: #footer .footer-link-head { | |
123moviesfree.net: #footer .footer-link { | |
123moviesfree.net: #footer .footer-link.end { | |
123moviesfree.net: #footer .footer-link-head { | |
123moviesfree.net: #copyright { | |
123moviesfree.net: #footer .heading { | |
123moviesfree.net: #footer a { | |
123moviesfree.net: #footer a:hover { | |
123moviesfree.net: #footer b, | |
123moviesfree.net: #footer strong { | |
123moviesfree.net: #footer .links a { | |
123moviesfree.net: #footer .text-lighter { | |
123moviesfree.net: #footer .desc { | |
123moviesfree.net: #commentfb { | |
123moviesfree.net: #pop-login .modal-dialog, | |
123moviesfree.net: #pop-register .modal-dialog, | |
123moviesfree.net: #pop-forgot .modal-dialog { | |
123moviesfree.net: #pagination { | |
123moviesfree.net: #open-forgot { | |
123moviesfree.net: #menu.active { | |
123moviesfree.net: #search.active { | |
123moviesfree.net: #filter { | |
123moviesfree.net: #filter.active { | |
123moviesfree.net: #filter .fc-title { | |
123moviesfree.net: #filter ul { | |
123moviesfree.net: #filter ul li { | |
123moviesfree.net: #filter ul li.active { | |
123moviesfree.net: #filter ul li label { | |
123moviesfree.net: #filter ul li label input { | |
123moviesfree.net: #filter ul.fc-main-list { | |
123moviesfree.net: #filter ul.fc-main-list li { | |
123moviesfree.net: #filter ul.fc-main-list li a { | |
123moviesfree.net: #filter ul.fc-main-list li a.active { | |
123moviesfree.net: #filter ul.fc-main-list li a:hover { | |
123moviesfree.net: #filter .filter-btn { | |
123moviesfree.net: #filter .cs10-top .fc-filmtype { | |
123moviesfree.net: #filter .cs10-top .fc-quality { | |
123moviesfree.net: #list-eps { | |
123moviesfree.net: #list-eps .le-server { | |
123moviesfree.net: #list-eps .le-server:last-of-type { | |
123moviesfree.net: #list-eps .le-server .les-title { | |
123moviesfree.net: #list-eps .le-server .les-content { | |
123moviesfree.net: #list-eps .le-server .les-content .btn-eps { | |
123moviesfree.net: #list-eps .le-server .les-content .btn-eps.active { | |
123moviesfree.net: #list-eps .le-server .les-content .btn-eps.active:before { | |
123moviesfree.net: #list-eps .le-server .les-content .btn-eps:hover { | |
123moviesfree.net: #donate-paypal .modal-body form { | |
123moviesfree.net: #donate-paypal .modal-body form input[type=image] { | |
123moviesfree.net: #schedule-eps { | |
123moviesfree.net: #schedule-eps .se-next { | |
123moviesfree.net: #schedule-eps .se-next .fa-close { | |
123moviesfree.net: #schedule-eps .se-left { | |
123moviesfree.net: #schedule-eps .se-right { | |
123moviesfree.net: #schedule-eps .se-right a { | |
123moviesfree.net: #schedule-eps .se-list { | |
123moviesfree.net: #schedule-eps .se-list li { | |
123moviesfree.net: #schedule-eps .se-list li:hover { | |
123moviesfree.net: #schedule-eps .se-list li .se-left { | |
123moviesfree.net: #toggle-schedule { | |
123moviesfree.net: #toggle-schedule.active { | |
123moviesfree.net: #toggle-schedule.active .fa-close { | |
123moviesfree.net: #install-app { | |
123moviesfree.net: #install-app .container { | |
123moviesfree.net: #install-app .ia-icon { | |
123moviesfree.net: #install-app .ia-icon img { | |
123moviesfree.net: #install-app .ia-info { | |
123moviesfree.net: #install-app .ia-info .ia-title { | |
123moviesfree.net: #install-app .ia-info p { | |
123moviesfree.net: #install-app .ia-close { | |
123moviesfree.net: #watch-alert {} | |
123moviesfree.net: #watch-alert .alert { | |
123moviesfree.net: #switch-mode { | |
123moviesfree.net: #switch-mode .sm-icon { | |
123moviesfree.net: #switch-mode .sm-text { | |
123moviesfree.net: #switch-mode .sm-button { | |
123moviesfree.net: #switch-mode .sm-button span { | |
123moviesfree.net: #switch-mode.active .sm-button span { | |
123moviesfree.net: #switch-mode.active .sm-button { | |
123moviesfree.net: #homenews { | |
123moviesfree.net: #homenews h2 { | |
123moviesfree.net: #media-player{position:relative;} | |
elnabaa.net: # WebMatrix 1.0 | |
shopdisney.com: # WKS 20190802 12:34 | |
fontawesome.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
fontawesome.com: # | |
fontawesome.com: # To ban all spiders from the entire site uncomment the next two lines: | |
fontawesome.com: # User-agent: * | |
fontawesome.com: # Disallow: / | |
k73.com: # | |
k73.com: # robots.txt for PHPcom | |
k73.com: # | |
hdfc.com: # | |
hdfc.com: # robots.txt | |
hdfc.com: # | |
hdfc.com: # This file is to prevent the crawling and indexing of certain parts | |
hdfc.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
hdfc.com: # and Google. By telling these "robots" where not to go on your site, | |
hdfc.com: # you save bandwidth and server resources. | |
hdfc.com: # | |
hdfc.com: # This file will be ignored unless it is at the root of your host: | |
hdfc.com: # Used: http://example.com/robots.txt | |
hdfc.com: # Ignored: http://example.com/site/robots.txt | |
hdfc.com: # | |
hdfc.com: # For more information about the robots.txt standard, see: | |
hdfc.com: # http://www.robotstxt.org/robotstxt.html | |
hdfc.com: # CSS, JS, Images | |
hdfc.com: # Directories | |
hdfc.com: # Paths (clean URLs) | |
hdfc.com: # Sitemap | |
erickson.it: # Admin section | |
erickson.it: # CSS e JS | |
iminent.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
iminent.com: #content{margin:0 0 0 2%;position:relative;} | |
bportugal.pt: #sliding-popup.sliding-popup-bottom,#sliding-popup.sliding-popup-bottom .eu-cookie-withdraw-banner,.eu-cookie-withdraw-tab{background:#023F5A;}#sliding-popup.sliding-popup-bottom.eu-cookie-withdraw-wrapper{background:transparent}#sliding-popup .popup-content #popup-text h1,#sliding-popup .popup-content #popup-text h2,#sliding-popup .popup-content #popup-text h3,#sliding-popup .popup-content #popup-text p,.eu-cookie-compliance-secondary-button,.eu-cookie-withdraw-tab{color:#fff !important;}.eu-cookie-withdraw-tab{border-color:#fff;}.eu-cookie-compliance-more-button{color:#fff !important;} | |
si.edu: # | |
si.edu: # robots.txt | |
si.edu: # | |
si.edu: # This file is to prevent the crawling and indexing of certain parts | |
si.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
si.edu: # and Google. By telling these "robots" where not to go on your site, | |
si.edu: # you save bandwidth and server resources. | |
si.edu: # | |
si.edu: # This file will be ignored unless it is at the root of your host: | |
si.edu: # Used: http://example.com/robots.txt | |
si.edu: # Ignored: http://example.com/site/robots.txt | |
si.edu: # | |
si.edu: # For more information about the robots.txt standard, see: | |
si.edu: # http://www.robotstxt.org/robotstxt.html | |
si.edu: # CSS, JS, Images | |
si.edu: # Directories | |
si.edu: # Files | |
si.edu: # Paths (clean URLs) | |
si.edu: # Paths (no clean URLs) | |
pcone.com.tw: #Allow: / | |
pcone.com.tw: #User-agent: bingbot | |
pcone.com.tw: #Crawl-Delay: 1 | |
pcone.com.tw: #User-agent: Googlebot | |
pcone.com.tw: #Crawl-Delay: 1 | |
cdkeys.com: # Google Image Crawler Setup | |
cdkeys.com: #User-agent: Googlebot-Image | |
cdkeys.com: #Crawl-delay:10 | |
cdkeys.com: # Bing Image Crawler Setup | |
cdkeys.com: # Crawlers Setup | |
cdkeys.com: # Directories | |
cdkeys.com: # Paths (clean URLs) | |
cdkeys.com: # Paths (no clean URLs) | |
cdkeys.com: # ga | |
cdkeys.com: #Disallow: /*utm_* | |
cdkeys.com: # Extras | |
jobplanet.co.kr: # Yeti | |
jobplanet.co.kr: # NaverBot | |
jobplanet.co.kr: # https://mj12bot.com/ | |
jobplanet.co.kr: # huawei | |
jobplanet.co.kr: # https://aspiegel.com/petalbot | |
jobplanet.co.kr: # Pin code for daum 'web master tool' | |
shopshashi.com: # we use Shopify as our ecommerce platform | |
shopshashi.com: # Google adsbot ignores robots.txt unless specifically named! | |
ibaotu.com: #2019-08-09修改后# | |
lernsax.de: ######################################################## | |
lernsax.de: # # | |
lernsax.de: # ACHTUNG: Diese Datei wird automatisch generiert. # | |
lernsax.de: # Manuelle Aenderungen werden ueberschrieben! # | |
lernsax.de: # # | |
lernsax.de: ######################################################## | |
zynga.com: # Default Flywheel robots file | |
sophos.com: # robots.txt for www.sophos.com | |
sophos.com: # web server 121 | |
sophos.com: # Sitemaps Pre | |
sophos.com: # Requests for Previous Versions | |
sophos.com: # Requested ML | |
sophos.com: # Requests 20-21 | |
sophos.com: # IX Removals | |
sophos.com: # Company Removals | |
sophos.com: # Translated Removals | |
sophos.com: # Investors | |
sophos.com: # KB | |
sophos.com: # Request Regional Migrations | |
sophos.com: # Requests | |
sophos.com: # PDF Issues | |
sophos.com: # Special | |
sophos.com: # Search | |
sophos.com: # Sophos Home Microsites | |
sophos.com: # New Requests 20-21 | |
sophos.com: # New Requests 21 | |
sophos.com: # Bot Requests | |
cyberpuerta.mx: #Baiduspider | |
cyberpuerta.mx: #Yandex | |
cyberpuerta.mx: #20150623 | |
cyberpuerta.mx: #Crawl-delay: 5 | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # wildcards at the end, because of some crawlers see it as errors | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: # | |
cyberpuerta.mx: #CMS pages | |
cyberpuerta.mx: #Sitemap | |
cyberpuerta.mx: #No follow sitemap parts | |
nettruyen.com: # robots.txt | |
nationwide.co.uk: #pageBody{position:relative;z-index:1} | |
levi.com: #For all robots | |
levi.com: #Block access to specific groups of pages | |
levi.com: #EU markets - blocks over 2 facet combinations | |
levi.com: #Phase 1 SEEU - blocks all facets. Short URLs are the vanity facets, to be re-opened upon further research. | |
levi.com: #Phase 2 SEEU - blocks all facets. Short URLs are the vanity facets, to be re-opened upon further research. | |
levi.com: #Levi GB Robots.txt Test | |
levi.com: #US - Allow up to 3 facet combinations | |
levi.com: #CA - Allow up to 2 facet combinations | |
levi.com: #Block colorgroup facet for all EU products that aren't jeans, excludes US & CA | |
levi.com: #Phase 1 SEEU - block all colourgroup facets regardless of other rules | |
levi.com: #Phase 2 SEEU - block all colourgroup facets regardless of other rules | |
levi.com: #Allow colorgroup facet navigation with specific conditions for each market | |
levi.com: #ES - Allow for jeans only | |
levi.com: #PL - Allow for jeans only | |
levi.com: #RU - Allow for jeans only | |
levi.com: #Allow colorgroup facet for all markets - jean products only, except for US & CA | |
levi.com: #Block search facets for colourgroup | |
levi.com: #Block over two or more facet combinations for vaqueros | |
levi.com: #Block the stretch feature | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Block size facet for all markets | |
levi.com: #Block feature-size_group facet for all markets | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Blocks jeans product item type facet nav for EU markets | |
levi.com: #Phase 1 SEEU | |
levi.com: #Phase 2 SEEU | |
levi.com: #Block feature-fit facet for all markets where it includes a combination of two or more facets | |
levi.com: #Block feature-fit_name facet for all markets where it includes a combination of two or more facets | |
levi.com: #Block feature-rise facet for all markets where it includes a combination of two or more facets | |
levi.com: #Block plusbottoms facet for all markets | |
levi.com: #Block tops facet for all markets | |
levi.com: #Block plustops facet for all markets | |
levi.com: #Block bigandtalltops facet for all markets | |
levi.com: #Block custom facet for all markets | |
levi.com: #Block dressesandjumpsuits facet for all markets | |
levi.com: #Block feature-sustainability facet for all markets | |
levi.com: #Block shoes facet for all markets | |
levi.com: #Block underwear facet for all markets | |
levi.com: #Block void facet for all markets until generation can be stopped | |
levi.com: #Block privacy policy excess rules for all markets until generation can be stopped | |
levi.com: #Block averageoverallrating facet for all markets | |
levi.com: #Block waist length and price facets in US | |
levi.com: #Allow fit + color in the US | |
levi.com: #SEEU Phase 1 | |
levi.com: #SEEU Phase 2 | |
levi.com: #TBD for removal | |
levi.com: # Block CazoodleBot as it does not present correct accept content headers | |
levi.com: # Block MJ12bot as it is just noise | |
levi.com: # Block dotbot as it cannot parse base urls properly | |
levi.com: # Block Gigabot | |
levi.com: # Block Social Boost | |
www.gov.za: # | |
www.gov.za: # robots.txt | |
www.gov.za: # | |
www.gov.za: # This file is to prevent the crawling and indexing of certain parts | |
www.gov.za: # of your site by web crawlers and spiders run by sites like Yahoo! | |
www.gov.za: # and Google. By telling these "robots" where not to go on your site, | |
www.gov.za: # you save bandwidth and server resources. | |
www.gov.za: # | |
www.gov.za: # This file will be ignored unless it is at the root of your host: | |
www.gov.za: # Used: http://example.com/robots.txt | |
www.gov.za: # Ignored: http://example.com/site/robots.txt | |
www.gov.za: # | |
www.gov.za: # For more information about the robots.txt standard, see: | |
www.gov.za: # http://www.robotstxt.org/robotstxt.html | |
www.gov.za: # CSS, JS, Images | |
www.gov.za: # Directories | |
www.gov.za: # Files | |
www.gov.za: # Paths (clean URLs) | |
www.gov.za: # Paths (no clean URLs) | |
hasznaltauto.hu: # robots.txt, www.hasznaltauto.hu | |
deccanherald.com: # Directories | |
deccanherald.com: # Files | |
deccanherald.com: # Paths (clean URLs) | |
deccanherald.com: # Paths (no clean URLs) | |
deccanherald.com: # Custom paths | |
tripadvisor.ru: # Hi there, | |
tripadvisor.ru: # | |
tripadvisor.ru: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself. | |
tripadvisor.ru: # | |
tripadvisor.ru: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet? | |
tripadvisor.ru: # | |
tripadvisor.ru: # Run - don't crawl - to apply to join TripAdvisor's elite SEO team | |
tripadvisor.ru: # | |
tripadvisor.ru: # Email seoRockstar@tripadvisor.com | |
tripadvisor.ru: # | |
tripadvisor.ru: # Or visit https://careers.tripadvisor.com/search-results?keywords=seo | |
tripadvisor.ru: # | |
tripadvisor.ru: # | |
ecuavisa.com: # | |
ecuavisa.com: # robots.txt | |
ecuavisa.com: # | |
ecuavisa.com: # This file is to prevent the crawling and indexing of certain parts | |
ecuavisa.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ecuavisa.com: # and Google. By telling these "robots" where not to go on your site, | |
ecuavisa.com: # you save bandwidth and server resources. | |
ecuavisa.com: # | |
ecuavisa.com: # This file will be ignored unless it is at the root of your host: | |
ecuavisa.com: # Used: http://example.com/robots.txt | |
ecuavisa.com: # Ignored: http://example.com/site/robots.txt | |
ecuavisa.com: # | |
ecuavisa.com: # For more information about the robots.txt standard, see: | |
ecuavisa.com: # http://www.robotstxt.org/robotstxt.html | |
ecuavisa.com: # CSS, JS, Images | |
ecuavisa.com: # Directories | |
ecuavisa.com: # Files | |
ecuavisa.com: # Paths (clean URLs) | |
ecuavisa.com: # Paths (URLs Alexa) | |
ecuavisa.com: # Disallow: /busqueda | |
ecuavisa.com: # Disallow: /taxonomy/term/* | |
ecuavisa.com: # Paths (URLs Adsense) | |
ecuavisa.com: # Paths (no clean URLs) | |
ecuavisa.com: # Paths not bots | |
ecuavisa.com: # Disallow: /taxonomy/term/* | |
ecuavisa.com: # Disallow: /tags/* | |
ecuavisa.com: # Disallow: /lo-mas-visto-de-televistazo/* | |
ecuavisa.com: # Disallow: /fotogaleria/todos/* | |
ecuavisa.com: # Disallow: /categoria/noticias/* | |
ecuavisa.com: # Disallow: /categoria/internacionales/* | |
ecuavisa.com: # Disallow: /categoria/espectaculo/* | |
ecuavisa.com: # Disallow: /categoria/actualidad/* | |
ecuavisa.com: # Fix search console | |
ecuavisa.com: # Disallow: /taxonomy/term/75920/all/feed | |
ecuavisa.com: # Disallow: /busqueda?* | |
ecuavisa.com: # Disallow: /taxonomy/term/ | |
ecuavisa.com: # Disallow: /fotogaleria/todos* | |
ecuavisa.com: # Disallow: /lo-mas-leido/* | |
heureka.sk: # Webmasters contact: seo@heureka.cz | |
heureka.sk: #SearchRelated | |
heureka.sk: #Bugs | |
heureka.sk: # Filters | |
heureka.sk: # Rating | |
heureka.sk: # Ordering | |
heureka.sk: ### | |
heureka.sk: # Wait list | |
heureka.sk: ### | |
heureka.sk: # | |
heureka.sk: # User-agent: PetalBot (Huawei search engine) | |
heureka.sk: # Disallow: / | |
heureka.sk: # | |
heureka.sk: ### | |
igihe.com: # robots.txt | |
igihe.com: # @url: http://igihe.com | |
igihe.com: # @generator: SPIP 3.1.10 [24286] | |
igihe.com: # @template: squelettes-dist/robots.txt.html | |
cmcmarkets.com: # instrument changes start | |
cmcmarkets.com: # instruments changes end | |
kiwilimon.com: # Chefs autorizados | |
kiwilimon.com: # Fin Chefs autorizados | |
heureka.cz: # Webmasters contact: seo@heureka.cz | |
heureka.cz: #SearchRelated | |
heureka.cz: #Bugs | |
heureka.cz: # Filters | |
heureka.cz: # Rating | |
heureka.cz: # Ordering | |
heureka.cz: ### | |
heureka.cz: # Wait list | |
heureka.cz: ### | |
heureka.cz: # | |
heureka.cz: # User-agent: PetalBot (Huawei search engine) | |
heureka.cz: # Disallow: / | |
heureka.cz: # | |
heureka.cz: ### | |
cnrs.fr: # | |
cnrs.fr: # robots.txt | |
cnrs.fr: # | |
cnrs.fr: # This file is to prevent the crawling and indexing of certain parts | |
cnrs.fr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
cnrs.fr: # and Google. By telling these "robots" where not to go on your site, | |
cnrs.fr: # you save bandwidth and server resources. | |
cnrs.fr: # | |
cnrs.fr: # This file will be ignored unless it is at the root of your host: | |
cnrs.fr: # Used: http://example.com/robots.txt | |
cnrs.fr: # Ignored: http://example.com/site/robots.txt | |
cnrs.fr: # | |
cnrs.fr: # For more information about the robots.txt standard, see: | |
cnrs.fr: # http://www.robotstxt.org/robotstxt.html | |
cnrs.fr: # CSS, JS, Images | |
cnrs.fr: # Directories | |
cnrs.fr: # Files | |
cnrs.fr: # Paths (clean URLs) | |
cnrs.fr: # Paths (no clean URLs) | |
atterley.com: # Google Image Crawler Setup | |
goosedefi.com: # https://www.robotstxt.org/robotstxt.html | |
enedis.fr: # | |
enedis.fr: # robots.txt | |
enedis.fr: # | |
enedis.fr: # This file is to prevent the crawling and indexing of certain parts | |
enedis.fr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
enedis.fr: # and Google. By telling these "robots" where not to go on your site, | |
enedis.fr: # you save bandwidth and server resources. | |
enedis.fr: # | |
enedis.fr: # This file will be ignored unless it is at the root of your host: | |
enedis.fr: # Used: http://example.com/robots.txt | |
enedis.fr: # Ignored: http://example.com/site/robots.txt | |
enedis.fr: # | |
enedis.fr: # For more information about the robots.txt standard, see: | |
enedis.fr: # http://www.robotstxt.org/robotstxt.html | |
enedis.fr: # CSS, JS, Images | |
enedis.fr: # Directories | |
enedis.fr: # Files | |
enedis.fr: # Paths (clean URLs) | |
enedis.fr: # Paths (no clean URLs) | |
vertex42.com: #User-agent: * | |
vertex42.com: #Disallow: /blog/wp-admin | |
vertex42.com: #Disallow: /blog/trackback | |
vertex42.com: #Disallow: /blog/cgi-bin | |
vertex42.com: #Disallow: /blog/search | |
vertex42.com: #Disallow: /blog/rss | |
vertex42.com: #Disallow: /blog/tag/* | |
vertex42.com: #Disallow: /blog/tag | |
vertex42.com: #Disallow: /blog/comments/feed | |
vertex42.com: #Disallow: /blog/comments | |
vertex42.com: #Disallow: /blog/login/ | |
vertex42.com: #Disallow: /blog/feed | |
vertex42.com: #Disallow: /blog/feed/$ | |
vertex42.com: #Disallow: /blog/*/feed/$ | |
vertex42.com: #Disallow: /blog/*/feed/rss/$ | |
vertex42.com: #Disallow: /blog/*/trackback/$ | |
vertex42.com: #Disallow: /blog/wp-login.php | |
vertex42.com: #Disallow: /blog/*wp-login.php* | |
vertex42.com: # Disallow Collectors and Spam | |
vertex42.com: # Disallow Offline Browsers | |
acnestudios.com: #robots.txt for http://www.acnestudios.com | |
acnestudios.com: #My Account section | |
acnestudios.com: #Cart page | |
acnestudios.com: #Checkout pages | |
acnestudios.com: #Gift pages | |
acnestudios.com: #Old collections | |
acnestudios.com: #Old peterschlesinger | |
acnestudios.com: #Sale pages | |
acnestudios.com: #Old Personalisation page | |
acnestudios.com: #Old refinements | |
acnestudios.com: #Old homepage | |
acnestudios.com: #site switcher pages | |
acnestudios.com: #search pages | |
acnestudios.com: #locales | |
acnestudios.com: #Sitemap files | |
sensortower.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
umd.edu: # | |
umd.edu: # robots.txt | |
umd.edu: # | |
umd.edu: # This file is to prevent the crawling and indexing of certain parts | |
umd.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
umd.edu: # and Google. By telling these "robots" where not to go on your site, | |
umd.edu: # you save bandwidth and server resources. | |
umd.edu: # | |
umd.edu: # This file will be ignored unless it is at the root of your host: | |
umd.edu: # Used: http://example.com/robots.txt | |
umd.edu: # Ignored: http://example.com/site/robots.txt | |
umd.edu: # | |
umd.edu: # For more information about the robots.txt standard, see: | |
umd.edu: # http://www.robotstxt.org/robotstxt.html | |
umd.edu: # CSS, JS, Images | |
umd.edu: # Directories | |
umd.edu: # Files | |
umd.edu: # Paths (clean URLs) | |
umd.edu: # Paths (no clean URLs) | |
home.kpmg: # Version 2020.10.22 | |
home.kpmg: # home.kpmg | |
arbetsformedlingen.se: #Disallow: /*91.* | |
vivaaerobus.com: # Exclude Files From All Robots: | |
vivaaerobus.com: # SPANISH SITE | |
vivaaerobus.com: # ENGLISH SITE | |
vivaaerobus.com: # SITEMAPS | |
vivaaerobus.com: # End robots.txt file | |
jobhero.com: #COST-2205 | |
fuq.com: # www.robotstxt.org/ | |
fuq.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
time.is: # Frequent, automatic reloading of Time.is is not allowed! | |
time.is: # If you want to reload Time.is automatically, please use a refresh interval of 1 hour or more. | |
time.is: # Time.is is made for humans. Automatic refresh and any usage from within scripts and apps is forbidden. | |
time.is: # If you need time synchronization for your app, please contact us about our API. | |
time.is: #Disallow: /*_2010* | |
time.is: #Disallow: /*Jan_2011* | |
time.is: #Disallow: /0* | |
time.is: #Disallow: /1* | |
time.is: #Disallow: /200* | |
time.is: #Disallow: *.js | |
time.is: #Disallow: /*/facts/ | |
time.is: # maximum rate is one page every 5 seconds | |
time.is: # | |
time.is: # Yahoo Pipes is for feeds not web pages. | |
time.is: # | |
depop.com: #Prevent Bot Crawl of applied search filters | |
juzimi.com: # $Id: robots.txt,v 1.9.2.2 2010/09/06 10:37:16 goba Exp $ | |
juzimi.com: # | |
juzimi.com: # robots.txt | |
juzimi.com: # | |
juzimi.com: # This file is to prevent the crawling and indexing of certain parts | |
juzimi.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
juzimi.com: # and Google. By telling these "robots" where not to go on your site, | |
juzimi.com: # you save bandwidth and server resources. | |
juzimi.com: # | |
juzimi.com: # This file will be ignored unless it is at the root of your host: | |
juzimi.com: # Used: http://example.com/robots.txt | |
juzimi.com: # Ignored: http://example.com/site/robots.txt | |
juzimi.com: # | |
juzimi.com: # For more information about the robots.txt standard, see: | |
juzimi.com: # http://www.robotstxt.org/wc/robots.html | |
juzimi.com: # | |
juzimi.com: # For syntax checking, see: | |
juzimi.com: # http://www.sxw.org.uk/computing/robots/check.html | |
juzimi.com: # Directories | |
juzimi.com: # Files | |
juzimi.com: # Paths (clean URLs) | |
juzimi.com: # Paths (no clean URLs) | |
nesine.com: # robots.txt for https://www.nesine.com/ | |
galaxus.ch: # @/ @/ | |
galaxus.ch: # @/ @/ Hello, fellow humans! | |
galaxus.ch: # @/ @/ | |
galaxus.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@ | |
galaxus.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@ | |
galaxus.ch: # @@@@@@ @@@@@@@@@@@@ @@@@@@ @@@% @ | |
galaxus.ch: # @@@@@ /@@@@@@@@@@ @@@@@@ @@@ @@ | |
galaxus.ch: # @@@@@@ @@@@@@@@@@@, @@@@@@ @@@@ @@@@ | |
galaxus.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@ | |
galaxus.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@ | |
galaxus.ch: # @@@@@@@@@@@@@ @@@@@@@@@@@@@@ @@@@@@@@ | |
galaxus.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@ | |
galaxus.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@ | |
bestprice.gr: # Ref: pricegrabber.com/robots.txt | |
extra.com: # For all robots | |
extra.com: # Sitemap files | |
extra.com: # 530506 / 2019 | |
extra.com: ##743301 / 2019 | |
extra.com: #CS20200000366474 | |
extra.com: #Blocking Base URL | |
mudah.my: # It is expressly forbidden to use spiders or other | |
mudah.my: # automated methods to access mudah.my. Only if mudah.my | |
mudah.my: # has given special permit such access is allowed. | |
mudah.my: ## Google | |
mudah.my: # all | |
mudah.my: #Disallow: /aw | |
mudah.my: #Disallow: /st | |
mudah.my: #Google Doubleclick gpt network ID | |
mudah.my: #Visit-time: 2000-2359 # 04:00-08:00 in Malaysia Time (non-peak time) | |
mudah.my: # all | |
mudah.my: #Disallow: /aw | |
mudah.my: #Disallow: /st | |
mudah.my: # all | |
mudah.my: #Disallow: /aw | |
mudah.my: #Disallow: /st | |
mudah.my: # all | |
mudah.my: #Disallow: /aw | |
mudah.my: #Disallow: /st | |
mudah.my: #Visit-time: 2000-2359 # 04:00-08:00 in Malaysia Time (non-peak time) | |
mudah.my: # all | |
mudah.my: #Disallow: /aw | |
mudah.my: #Disallow: /st | |
mudah.my: ## Yahoo | |
mudah.my: # all | |
mudah.my: #Disallow: /aw | |
mudah.my: #Disallow: /st | |
mudah.my: #Visit-time: 2000-2359 # 04:00-08:00 in Malaysia Time (non-peak time) | |
mudah.my: # all | |
mudah.my: #Disallow: /aw | |
mudah.my: #Disallow: /st | |
mudah.my: #Visit-time: 2000-2359 # 04:00-08:00 in Malaysia Time (non-peak time) | |
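`Visit-time` is not part of RFC 9309. It comes from an older extension proposal that was never standardized, and mudah.my keeps it commented out anyway. The trailing comment implies the window is given in UTC. The sketch below checks that conversion under that assumption. The name `visit_window` is hypothetical, and Malaysia Time is modelled as a fixed UTC+8 offset because it has no daylight saving.

```python
from datetime import datetime, timedelta, timezone

# Malaysia Time is a fixed UTC+8 offset (no daylight saving).
MYT = timezone(timedelta(hours=8), "MYT")

def visit_window(spec: str, tz: timezone):
    """Convert an 'HHMM-HHMM' window (assumed UTC) to local 'HH:MM' strings."""
    result = []
    for hhmm in spec.split("-"):
        utc_time = datetime(2000, 1, 1, int(hhmm[:2]), int(hhmm[2:]), tzinfo=timezone.utc)
        result.append(utc_time.astimezone(tz).strftime("%H:%M"))
    return tuple(result)

print(visit_window("2000-2359", MYT))  # ('04:00', '07:59')
```

This produces 04:00 to 07:59, which the comment rounds to "04:00-08:00".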
demae-can.com: #testchains | |
demae-can.com: #testshops | |
demae-can.com: #testshop_detail | |
tctelevision.com: # robots.txt for https://www.tctelevision.com/ | |
tctelevision.com: # live - don't allow web crawlers to index cpresources/ or vendor/ | |
consolegameswiki.com: #Bad Bots | |
consolegameswiki.com: # Crawlers that are kind enough to obey, but which we'd rather not have | |
consolegameswiki.com: # unless they're feeding search engines. | |
consolegameswiki.com: # Some bots are known to be trouble, particularly those designed to copy | |
consolegameswiki.com: # entire sites. Please obey robots.txt. | |
consolegameswiki.com: # | |
consolegameswiki.com: # Sorry, wget in its recursive mode is a frequent problem. | |
consolegameswiki.com: # Please read the man page and use it properly; there is a | |
consolegameswiki.com: # --wait option you can use to set the delay between hits, | |
consolegameswiki.com: # for instance. | |
consolegameswiki.com: # | |
consolegameswiki.com: # | |
consolegameswiki.com: # Doesn't follow robots.txt anyway, but... | |
consolegameswiki.com: # | |
consolegameswiki.com: # | |
consolegameswiki.com: # Hits many times per second, not acceptable | |
consolegameswiki.com: # http://www.nameprotect.com/botinfo.html | |
consolegameswiki.com: # A capture bot, downloads gazillions of pages with no public benefit | |
consolegameswiki.com: # http://www.webreaper.net/ | |
consolegameswiki.com: #Allow! | |
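The consolegameswiki.com comments ask recursive wget users to space out their requests with `--wait`. A Python crawler can honour a delay that a site declares itself through `urllib.robotparser`, whose `crawl_delay()` method exists in Python 3.6+. The rules below are invented for illustration and are not consolegameswiki.com's actual file. The helper `polite_fetch_plan` is also hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
RULES = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(RULES)

def polite_fetch_plan(agent, urls, default_delay=1):
    """Return (url, seconds_to_wait_after) pairs for the URLs the rules allow."""
    delay = parser.crawl_delay(agent) or default_delay
    # A real crawler would call time.sleep(delay) between requests.
    return [(url, delay) for url in urls if parser.can_fetch(agent, url)]

print(polite_fetch_plan("MyBot", ["https://example.com/a", "https://example.com/private/b"]))
```

`/private/b` is dropped by the `Disallow` rule. The remaining URL is paired with the declared 5-second delay.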
athenahealth.com: # | |
athenahealth.com: # robots.txt | |
athenahealth.com: # | |
athenahealth.com: # This file is to prevent the crawling and indexing of certain parts | |
athenahealth.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
athenahealth.com: # and Google. By telling these "robots" where not to go on your site, | |
athenahealth.com: # you save bandwidth and server resources. | |
athenahealth.com: # | |
athenahealth.com: # This file will be ignored unless it is at the root of your host: | |
athenahealth.com: # Used: http://example.com/robots.txt | |
athenahealth.com: # Ignored: http://example.com/site/robots.txt | |
athenahealth.com: # | |
athenahealth.com: # For more information about the robots.txt standard, see: | |
athenahealth.com: # http://www.robotstxt.org/robotstxt.html | |
athenahealth.com: # CSS, JS, Images | |
athenahealth.com: #Allow: /core/*.css? | |
athenahealth.com: #Allow: /core/*.css$ | |
athenahealth.com: #Allow: /core/*.js$ | |
athenahealth.com: #Allow: /core/*.js? | |
athenahealth.com: #Allow: /core/*.gif | |
athenahealth.com: #Allow: /core/*.jpg | |
athenahealth.com: #Allow: /core/*.jpeg | |
athenahealth.com: #Allow: /core/*.png | |
athenahealth.com: #Allow: /core/*.svg | |
athenahealth.com: #Allow: /profiles/*.css$ | |
athenahealth.com: #Allow: /profiles/*.css? | |
athenahealth.com: #Allow: /profiles/*.js$ | |
athenahealth.com: #Allow: /profiles/*.js? | |
athenahealth.com: #Allow: /profiles/*.gif | |
athenahealth.com: #Allow: /profiles/*.jpg | |
athenahealth.com: #Allow: /profiles/*.jpeg | |
athenahealth.com: #Allow: /profiles/*.png | |
athenahealth.com: #Allow: /profiles/*.svg | |
athenahealth.com: # Directories | |
athenahealth.com: # Files | |
athenahealth.com: # Paths (clean URLs) | |
athenahealth.com: # Paths (no clean URLs) | |
athenahealth.com: # D7 Paths | |
athenahealth.com: # /robots.txt file for http://www.athenahealth.com | |
athenahealth.com: # User Agent Exclusion (Legacy site) | |
dataquest.io: # Sitemap for pages (landing pages) | |
dataquest.io: # Sitemap for Blog posts | |
mendeley.com: # Careers: CSS, JS, Images | |
mendeley.com: # Careers: Directories | |
mendeley.com: # Careers: Files | |
mendeley.com: # Careers: Paths (clean URLs) | |
mendeley.com: # Careers: Paths (no clean URLs) | |
rxlist.com: # | |
rxlist.com: # robots.txt for MedicineNet, Inc. Properties | |
rxlist.com: # | |
perfect-english-grammar.com: # perfect-english-grammar.com (Fri Oct 27 12:46:12 2017) | |
kth.se: # | |
kth.se: # robots.txt from 17.392 | |
kth.se: # | |
loc.gov: #Baiduspider | |
rentcafe.com: #robots.txt document for http://www.rentcafe.com/robots.txt | |
olx.kz: # sitecode:olxkz-desktop | |
film2movie.asia: # BEGIN XML-SITEMAP-PLUGIN | |
film2movie.asia: # END XML-SITEMAP-PLUGIN | |
shorouknews.com: #Baiduspider | |
shorouknews.com: #User-agent: Baiduspider | |
shorouknews.com: #Disallow: / | |
redalyc.org: # Google Image | |
redalyc.org: # Google AdSense | |
getvideo.org: # | |
getvideo.org: # robots.txt | |
getvideo.org: # | |
getvideo.org: # This file is to prevent the crawling and indexing of certain parts | |
getvideo.org: # of your site by web crawlers and spiders run by sites like Yahoo! | |
getvideo.org: # and Google. By telling these "robots" where not to go on your site, | |
getvideo.org: # you save bandwidth and server resources. | |
getvideo.org: # | |
getvideo.org: # This file will be ignored unless it is at the root of your host: | |
getvideo.org: # Used: http://example.com/robots.txt | |
getvideo.org: # Ignored: http://example.com/site/robots.txt | |
getvideo.org: # | |
getvideo.org: # For more information about the robots.txt standard, see: | |
getvideo.org: # http://www.robotstxt.org/wc/robots.html | |
getvideo.org: # | |
getvideo.org: # For syntax checking, see: | |
getvideo.org: # http://www.sxw.org.uk/computing/robots/check.html | |
getvideo.org: # Directories | |
getvideo.org: # Directories | |
iwara.tv: # | |
iwara.tv: # robots.txt | |
iwara.tv: # | |
iwara.tv: # This file is to prevent the crawling and indexing of certain parts | |
iwara.tv: # of your site by web crawlers and spiders run by sites like Yahoo! | |
iwara.tv: # and Google. By telling these "robots" where not to go on your site, | |
iwara.tv: # you save bandwidth and server resources. | |
iwara.tv: # | |
iwara.tv: # This file will be ignored unless it is at the root of your host: | |
iwara.tv: # Used: http://example.com/robots.txt | |
iwara.tv: # Ignored: http://example.com/site/robots.txt | |
iwara.tv: # | |
iwara.tv: # For more information about the robots.txt standard, see: | |
iwara.tv: # http://www.robotstxt.org/robotstxt.html | |
iwara.tv: # CSS, JS, Images | |
iwara.tv: # Directories | |
iwara.tv: # Files | |
iwara.tv: # Paths (clean URLs) | |
iwara.tv: # Paths (no clean URLs) | |
lucid.app: # | |
lucid.app: # robots.txt | |
lucid.app: # | |
lucid.app: # This file is to prevent the crawling and indexing of certain parts | |
lucid.app: # of your site by web crawlers and spiders run by sites like Yahoo! | |
lucid.app: # and Google. By telling these "robots" where not to go on your site, | |
lucid.app: # you save bandwidth and server resources. | |
lucid.app: # | |
lucid.app: # This file will be ignored unless it is at the root of your host: | |
lucid.app: # Used: http://example.com/robots.txt | |
lucid.app: # Ignored: http://example.com/site/robots.txt | |
lucid.app: # | |
lucid.app: # For more information about the robots.txt standard, see: | |
lucid.app: # http://www.robotstxt.org/wc/robots.html | |
lucid.app: # | |
lucid.app: # For syntax checking, see: | |
lucid.app: # http://www.sxw.org.uk/computing/robots/check.html | |
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/bulletins_index.xml | |
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/coupon_index.xml | |
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/domainhub_index.xml | |
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/lists_index.xml | |
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/neighbor_index.xml | |
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/neighborhood_index.xml | |
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/newsletter_index.xml | |
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/pictures_index.xml | |
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/product_index.xml | |
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/promotion_index.xml | |
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/shoutout_index.xml | |
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/static_index.xml | |
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/vanity_index.xml | |
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/autos_index.xml | |
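Every merchantcircle.com `Sitemap:` line above is commented out, so crawlers see no sitemap declarations at all. The sketch below shows this with the standard library parser: `RobotFileParser.site_maps()` (Python 3.8+) returns `None` when no sitemap is declared. The wrapper name `declared_sitemaps` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

def declared_sitemaps(lines):
    """Return the Sitemap URLs a robots.txt declares; '#' comments are ignored."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.site_maps() or []  # site_maps() returns None when there are none

commented = ["#Sitemap: https://www.merchantcircle.com/autos_index.xml"]
active = ["Sitemap: https://www.merchantcircle.com/autos_index.xml"]
print(declared_sitemaps(commented))  # []
print(declared_sitemaps(active))     # ['https://www.merchantcircle.com/autos_index.xml']
```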
141jav.com: #Disallow: /hot/ | |
yoast.com: # This space intentionally left blank | |
yoast.com: # If you want to learn about why our robots.txt looks like this, read this post: https://yoa.st/robots-txt | |
travian.com: # robots.txt für travian.com | |
csun.edu: # | |
csun.edu: # robots.txt | |
csun.edu: # | |
csun.edu: # This file is to prevent the crawling and indexing of certain parts | |
csun.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
csun.edu: # and Google. By telling these "robots" where not to go on your site, | |
csun.edu: # you save bandwidth and server resources. | |
csun.edu: # | |
csun.edu: # This file will be ignored unless it is at the root of your host: | |
csun.edu: # Used: http://example.com/robots.txt | |
csun.edu: # Ignored: http://example.com/site/robots.txt | |
csun.edu: # | |
csun.edu: # For more information about the robots.txt standard, see: | |
csun.edu: # http://www.robotstxt.org/robotstxt.html | |
csun.edu: # CSS, JS, Images | |
csun.edu: # Custom Disallows | |
csun.edu: # Directories | |
csun.edu: # Files | |
csun.edu: # Paths (clean URLs) | |
csun.edu: # Paths (no clean URLs) | |
builtin.com: # | |
builtin.com: # robots.txt | |
builtin.com: # | |
builtin.com: # This file is to prevent the crawling and indexing of certain parts | |
builtin.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
builtin.com: # and Google. By telling these "robots" where not to go on your site, | |
builtin.com: # you save bandwidth and server resources. | |
builtin.com: # | |
builtin.com: # This file will be ignored unless it is at the root of your host: | |
builtin.com: # Used: http://example.com/robots.txt | |
builtin.com: # Ignored: http://example.com/site/robots.txt | |
builtin.com: # | |
builtin.com: # For more information about the robots.txt standard, see: | |
builtin.com: # http://www.robotstxt.org/robotstxt.html | |
builtin.com: # CSS, JS, Images | |
builtin.com: # Directories | |
builtin.com: # Files | |
builtin.com: # Paths (clean URLs) | |
builtin.com: # Paths (no clean URLs) | |
builtin.com: # Company Directory Paths | |
builtin.com: # Mirrored Company Profiles on builtin.com | |
darademo.wordpress.com: # This file was generated on Sat, 10 Oct 2020 02:19:50 +0000 | |
home.blog: # This file was generated on Tue, 31 Mar 2020 18:03:55 +0000 | |
vivo.com.br: # | |
vivo.com.br: # | |
vivo.com.br: # robots.txt | |
vivo.com.br: # | |
vivo.com.br: # This file is to prevent the crawling and indexing of certain parts | |
vivo.com.br: # of your site by web crawlers and spiders run by sites like Yahoo! | |
vivo.com.br: # and Google. By telling these "robots" where not to go on your site, | |
vivo.com.br: # you save bandwidth and server resources. | |
vivo.com.br: # | |
vivo.com.br: # This file will be ignored unless it is at the root of your host: | |
vivo.com.br: # Used: http://example.com/robots.txt | |
vivo.com.br: # Ignored: http://example.com/site/robots.txt | |
vivo.com.br: # | |
vivo.com.br: # For more information about the robots.txt standard, see: | |
vivo.com.br: # http://www.robotstxt.org/robotstxt.html | |
vivo.com.br: # Directories | |
vivo.com.br: # Files | |
vivo.com.br: # Paths (clean URLs) | |
vivo.com.br: # Paths (no clean URLs) | |
jira.com: # JIRA: | |
jira.com: # Disallow all SearchRequestViews in the IssueNavigator (Word, XML, RSS, | |
jira.com: # etc), all IssueViews (XML, Printable and Word), all charts and reports. | |
jira.com: # Disallow admin. | |
jira.com: # | |
jira.com: # Confluence: | |
jira.com: # Confluence uses in-page robot exclusion tags for non-indexable pages. | |
jira.com: # Disallow admin explicitly. | |
jira.com: # | |
jira.com: # General: | |
jira.com: # Disallow login, logout | |
manualslib.com: #Baiduspider | |
manualslib.com: ## Added by PTN | |
eprice.com.tw: #allow: /ad/redir.html | |
travelocity.com: # | |
travelocity.com: # General bots | |
travelocity.com: # | |
travelocity.com: #hotel | |
travelocity.com: #flight | |
travelocity.com: #package | |
travelocity.com: #car | |
travelocity.com: #activities | |
travelocity.com: #cruise | |
travelocity.com: #other | |
travelocity.com: # | |
travelocity.com: # Google Ads | |
travelocity.com: # | |
travelocity.com: # | |
travelocity.com: # | |
travelocity.com: # Bing Ads | |
travelocity.com: # | |
travelocity.com: # | |
travelocity.com: # SemrushBot | |
travelocity.com: # | |
boulanger.com: # BOULANGER.COM | |
boulanger.com: # Robot Exclusion File -- robots.txt | |
boulanger.com: # Last Updated: 22/02/2021 | |
boulanger.com: # Disallow | |
boulanger.com: # Fichiers & Scripts | |
boulanger.com: #Mon compte | |
boulanger.com: # Sitemap files | |
upbit.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
president.az: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
president.az: # | |
president.az: # To ban all spiders from the entire site uncomment the next two lines: | |
president.az: # User-Agent: * | |
president.az: # Disallow: / | |
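The president.az comment describes the classic "ban everything" pair of lines. Parsing them with the standard library's `urllib.robotparser` shows that every path is refused for any user agent. `example.com` stands in for a real host here.

```python
from urllib.robotparser import RobotFileParser

# The two lines the president.az comment says to uncomment.
ban_all = ["User-Agent: *", "Disallow: /"]

parser = RobotFileParser()
parser.parse(ban_all)

for path in ("/", "/en/news"):
    print(path, parser.can_fetch("AnyBot", "https://example.com" + path))  # False for both
```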
bejson.com: # robots.txt generated at http://www.bejson.com | |
poxiao.com: # | |
poxiao.com: # robots.txt for EmpireCMS | |
poxiao.com: # | |
avvo.com: # :@HA##@@@@@@s S@@@HHHAGGAhAMHA&&AB@@ | |
avvo.com: # i@AHHH#@@@@@@@i. ;@@@#BA&HH&X23B#B&&HAH@@ | |
avvo.com: # ##99hAHH#@@@@@@@2, , ,::. . ,::. ,, &@@@#BHAA&A&AGSX##HBBHB@# | |
avvo.com: # #3hh&hXXA@@@@@@@@@S :rSis;:;isrsssirrsriSs: :@@@@#MAGGAA&Ahh9sA@##BB#@s | |
avvo.com: # sG2Ssi2322A#@@@@@@@@ ,si;, ,: .,. ...:s5iSr 3@@@##M&322X339AGh5X#@MB#@@ | |
avvo.com: # sHhSissiSisS53M#@AXA25i,..... . .. . ..,:,s5rrr9@@@###AhhX5SSiS5GhGG@@MM#@@ | |
avvo.com: # .A3i2irrrrsiisis:;::;, ,..:,. .:, ,,:..:.;rrssAAX#H&AH#M##BA&hi59G##MMMM@r | |
avvo.com: # ;SsiS523XSsr;,;, ,,.. .,:;r: ..,., .;r:.,,,,;r;;rrr;2M##@@@###@BXM#MBHBM@# | |
avvo.com: # ii3hh9XSri;,. ., ....,,;rr, . ,. r,.:s;:,,,,:,,::,,sSsShAAHMMMMMBM######@. | |
avvo.com: # rAHAAGSr, . .....,.,::r;,,:,r;;;::;Ss;:,::,,,.,,;;,,;;iGAM####BHB###M@2 | |
avvo.com: # A#&ir,.. .,,:,:::;;si.,;;srr;:rsir;r;:;:::::::,,:,:ri5GHM@@##MA&A@@ | |
avvo.com: # @@#9s. ...,,,;r;:rr;srii,;:;2rrir22S;;rrrr;rr;;::::::;;;sis5M@@##MBB@S | |
avvo.com: # #@#Bi, .,,:;;;;srr22iiri3i;rr5iihii3X5isiisSisr;;:;::::::::,;G#@##@@#@s | |
avvo.com: # @@#s.,:;r;rrsSirXXh2irSX#X:r;r2&AXS2hGA2iSXSrrr;;;::::::::;;2#@@@@##@@, | |
avvo.com: # #i::;sisiS25ssXG#Bir53A#2,:sSGBAXXS&MB925iisisrr;;;;;:;;;;rS3#@###@@X | |
avvo.com: # ri;:s52Siii232hB##HS293MH;:s,2@HH93X3#MA322AGXirrrr;;;;;;rrrr9MB#@@@r | |
avvo.com: # ;X;r2XX2Siiii52M@@@B593hMi,Sh ;#A93hHAM@@BGABGX5iiisrrrr;rrrrs&@@@@@@: | |
avvo.com: # .3rri3G&X2XXXXS2&@@#AhAh#2.rGG; iB3G9M#@@@M3AG93933X3225ssrrr;;r3M@@@@s | |
avvo.com: # rrs2ABB23M2;X@@H9A#B&AAH3,,i2i, :@#HM#@@@##B&h&h2Xh9hGAGG2rrrr;::iGM@@@ | |
avvo.com: # ;;5B@#HrAB,,#@@@h :A#B99i ,;;. ;#@#@@@#@@@@@@MBAGhHAA#@#95srr;;::;iB@@ | |
avvo.com: # :rhMG93i##i9@@@@5 9H#MM2 . .,5@@#@@@#@BA@@@@@#3:9@MH@@@@H2ir;;:::r&@ | |
avvo.com: # :;i22X2S&@#AA@@2 3G9@@@X. .,;iH@@@@@##A2G@@@@@@&sX@hAHB@@@#hisr;;;;rX2 | |
avvo.com: # .,:r53hhXXA#M&9hXShBSMAHs ,:2B#@@@MAH2;i#@@@@@AG#HhHHG&ABBHXSirrrrrs3 | |
avvo.com: # ;;,rXABH&9X5X93XGH#hs#hr ..,,;2&@@#MhAAi;r9##B&H#BAAA&GGGG93X2irrrrrs5: | |
avvo.com: # .:,r5GHBAA9XSi29G&M##Br ...,;:, :s#@@#M###&2239GBBA&HA&GAGGG2iS55irrr;;S@. | |
avvo.com: # .,.:rS9AA&AAhh&A&AHM#S ,,,,::... ,2@@@@@@M#MHB#BBBMMM##MHAA92isiiiirr;:rH; | |
avvo.com: # .;,,:;rs239hh9&h9h99h; .::,::,,,.,:X@@@BMHBHBB#BB####MBHGX225irrrrrrr;:;X3 | |
avvo.com: # .;.,,:rssiiSSiissssr: .:;;:;:,,,. ,2AHHHHHBAHHHHAHHA&h9933X2irr;rr;;;::SH, | |
avvo.com: # ,:.,::;rrssrrr;r;, ;:rS;ir;2Xr;5S ;XAh3&AAG&A&GG&&&9X222Sisrr;rr;;:::r&. | |
avvo.com: # r:.,,,,::;;;;;:, .;AXsis@Hsrr: .rXGh92SXh32SSS2XX5Siisr;r;;;:::,:SB | |
avvo.com: # ;:. ...,:,:::, .iAh;, .. ,;XMHhXXX22555225Ssssrrr;;;:::,.:29; | |
avvo.com: # rr. ..,::;: .....,;rGh:..,.,:,:,,, .rS9A325SS5iisrsssssr;;;:::,. .rA5 | |
avvo.com: # :r,... ..,,,:, ..,,,;rrrriX25srr;s;;;::,.. ,;rssrrriisrrrsiiisr;::,,,....;Gh | |
avvo.com: # r,. .,:::,. .,,:r;,.,,,;rsXhh2ir;r;:,:,,:;rr;::;;rrr;rrrrr;::,,,.. ..:2A | |
avvo.com: # r, .::::,.,,;r,. .::;:;ri3AXssrr;:::;rrrr;;sssiis;;:,,,,,,,.. . .;As | |
avvo.com: # r: .,;:. . ,:::;;:;rXXir;;;;;rsr;rssrssrrr;::,,,.... .....,i#r | |
avvo.com: # ,i,. .:,,,.. .:;::;r;;;sis;:;:r;:;risrr;;:,,::,,,,. .......,rAA | |
avvo.com: # :i:. .,:;;;:::. .,,,,;::::;;::;:,;;:;r;;::::,,.,,,........,...,,:rA9 | |
avvo.com: # 2r,. .,,:;;;r;;, ..,,,,::;;::;;:::::::,,,,.......,.......,,,:rA; | |
avvo.com: # .AX;:.. ...,::;;;;::,. .,,,::::;,:::,,:,,:,,,................,,,,r2 | |
avvo.com: # ,A9r:,,.. ..,..,,,::::;;::,.... , .,,,,,.,,.,,.,.,,....,..... ......,,,,:i9. | |
avvo.com: # You're not a robot. Why are you snooping around here? | |
avvo.com: # This might be a better use of your time, human: avvo.com/about_avvo/jobs | |
avvo.com: # If you've ever built a great product, started a bustling community, or told an intriguing story, we want to hear from you. | |
avvo.com: # kbai, | |
avvo.com: # Team Avvo | |
hessen.de: # | |
hessen.de: # robots.txt | |
hessen.de: # | |
hessen.de: # This file is to prevent the crawling and indexing of certain parts | |
hessen.de: # of your site by web crawlers and spiders run by sites like Yahoo! | |
hessen.de: # and Google. By telling these "robots" where not to go on your site, | |
hessen.de: # you save bandwidth and server resources. | |
hessen.de: # | |
hessen.de: # This file will be ignored unless it is at the root of your host: | |
hessen.de: # Used: http://example.com/robots.txt | |
hessen.de: # Ignored: http://example.com/site/robots.txt | |
hessen.de: # | |
hessen.de: # For more information about the robots.txt standard, see: | |
hessen.de: # http://www.robotstxt.org/wc/robots.html | |
hessen.de: # | |
hessen.de: # For syntax checking, see: | |
hessen.de: # http://www.sxw.org.uk/computing/robots/check.html | |
hessen.de: # Directories | |
hessen.de: # Files | |
hessen.de: # Paths (clean URLs) | |
hessen.de: # Paths (no clean URLs) | |
hessen.de: #ChM-00000411067 | |
ftc.gov: # | |
ftc.gov: # robots.txt | |
ftc.gov: # | |
ftc.gov: # This file is to prevent the crawling and indexing of certain parts | |
ftc.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ftc.gov: # and Google. By telling these "robots" where not to go on your site, | |
ftc.gov: # you save bandwidth and server resources. | |
ftc.gov: # | |
ftc.gov: # This file will be ignored unless it is at the root of your host: | |
ftc.gov: # Used: http://example.com/robots.txt | |
ftc.gov: # Ignored: http://example.com/site/robots.txt | |
ftc.gov: # | |
ftc.gov: # For more information about the robots.txt standard, see: | |
ftc.gov: # http://www.robotstxt.org/robotstxt.html | |
ftc.gov: # CSS, JS, Images | |
ftc.gov: # Directories | |
ftc.gov: # Files | |
ftc.gov: # Paths (clean URLs) | |
ftc.gov: # Paths (no clean URLs) | |
ftc.gov: # For link-checking site crawlers. | |
comic-walker.com: # Google News Robot | |
comic-walker.com: # Google Search Engine Robot | |
comic-walker.com: # Yahoo! Search Engine Robot | |
comic-walker.com: # Microsoft Search Engine Robot | |
comic-walker.com: # Yandex Search Engine Robot | |
comic-walker.com: # Other crawller or bot that might possibly access or crawling respect below | |
comic-walker.com: # Sitemap | |
fbi.gov: # Define access-restrictions for robots/spiders | |
fbi.gov: # http://www.robotstxt.org/wc/norobots.html | |
fbi.gov: # By default we allow robots to access all areas of our site | |
fbi.gov: # already accessible to anonymous users | |
fbi.gov: # Add Googlebot-specific syntax extension to exclude forms | |
fbi.gov: # that are repeated for each piece of content in the site | |
fbi.gov: # the wildcard is only supported by Googlebot | |
fbi.gov: # http://www.google.com/support/webmasters/bin/answer.py?answer=40367&ctx=sibling | |
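The fbi.gov comment reflects an older situation in which `*` wildcards were a Googlebot extension. RFC 9309 (2022) has since standardized `*` (any run of characters) and a trailing `$` (end of the path). Python's `urllib.robotparser` still does plain prefix matching. The sketch below translates such a pattern into a regular expression. The helper names are hypothetical, and the sample patterns mirror the `/core/*.css$` style used by athenahealth.com above.

```python
import re

def rule_regex(pattern):
    """Compile a robots.txt path pattern using '*' and a trailing '$'."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each '*' match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def rule_matches(pattern, path):
    # re.match anchors only at the start, so unanchored rules act as path prefixes.
    return rule_regex(pattern).match(path) is not None

print(rule_matches("/core/*.css$", "/core/themes/site.css"))      # True
print(rule_matches("/core/*.css$", "/core/themes/site.css?v=2"))  # False
```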
concursolutions.com: # robots.txt for myouttask.com - there is nothing here for a search engine | |
img2go.com: # www.robotstxt.org/ | |
img2go.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
airarabia.com: # | |
airarabia.com: # robots.txt | |
airarabia.com: # | |
airarabia.com: # This file is to prevent the crawling and indexing of certain parts | |
airarabia.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
airarabia.com: # and Google. By telling these "robots" where not to go on your site, | |
airarabia.com: # you save bandwidth and server resources. | |
airarabia.com: # | |
airarabia.com: # This file will be ignored unless it is at the root of your host: | |
airarabia.com: # Used: http://example.com/robots.txt | |
airarabia.com: # Ignored: http://example.com/site/robots.txt | |
airarabia.com: # | |
airarabia.com: # For more information about the robots.txt standard, see: | |
airarabia.com: # http://www.robotstxt.org/robotstxt.html | |
airarabia.com: # CSS, JS, Images | |
airarabia.com: # Directories | |
airarabia.com: # Files | |
airarabia.com: # Paths (clean URLs) | |
airarabia.com: # Paths (no clean URLs) | |
biography.com: # Tempest - biography | |
trinasolar.com: # | |
trinasolar.com: # robots.txt | |
trinasolar.com: # | |
trinasolar.com: # This file is to prevent the crawling and indexing of certain parts | |
trinasolar.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
trinasolar.com: # and Google. By telling these "robots" where not to go on your site, | |
trinasolar.com: # you save bandwidth and server resources. | |
trinasolar.com: # | |
trinasolar.com: # This file will be ignored unless it is at the root of your host: | |
trinasolar.com: # Used: http://example.com/robots.txt | |
trinasolar.com: # Ignored: http://example.com/site/robots.txt | |
trinasolar.com: # | |
trinasolar.com: # For more information about the robots.txt standard, see: | |
trinasolar.com: # http://www.robotstxt.org/robotstxt.html | |
trinasolar.com: # Directories | |
trinasolar.com: # Files | |
trinasolar.com: # Paths (clean URLs) | |
trinasolar.com: # Paths (no clean URLs) | |
zanerobe.com: # we use Shopify as our ecommerce platform | |
zanerobe.com: # Google adsbot ignores robots.txt unless specifically named! | |
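The zanerobe.com comment refers to documented Google behaviour: the AdsBot crawlers ignore the global `*` group and obey only groups that name them explicitly. The sketch below shows group selection under that rule. It assumes a hypothetical `groups` dict keyed by lowercase user-agent token and a hypothetical `select_group` helper. Real crawlers also normalise the product token before comparing it, which is simplified away here.

```python
def select_group(groups, agent, ignores_wildcard=False):
    """Pick the rule group a crawler obeys.

    groups maps a lowercase user-agent token to its list of rule lines.
    Crawlers such as Google's AdsBot skip the '*' group entirely.
    """
    token = agent.lower()
    if token in groups:
        return groups[token]
    if not ignores_wildcard and "*" in groups:
        return groups["*"]
    return []  # no applicable group: nothing is disallowed

groups = {"*": ["Disallow: /checkout"]}
print(select_group(groups, "Googlebot"))                             # ['Disallow: /checkout']
print(select_group(groups, "AdsBot-Google", ignores_wildcard=True))  # []
```

To keep such a crawler out, the site has to add a group named for it, for example `groups["adsbot-google"]`.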
coolmathgames.com: # | |
coolmathgames.com: # robots.txt | |
coolmathgames.com: # | |
coolmathgames.com: # This file is to prevent the crawling and indexing of certain parts | |
coolmathgames.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
coolmathgames.com: # and Google. By telling these "robots" where not to go on your site, | |
coolmathgames.com: # you save bandwidth and server resources. | |
coolmathgames.com: # | |
coolmathgames.com: # This file will be ignored unless it is at the root of your host: | |
coolmathgames.com: # Used: http://example.com/robots.txt | |
coolmathgames.com: # Ignored: http://example.com/site/robots.txt | |
coolmathgames.com: # | |
coolmathgames.com: # For more information about the robots.txt standard, see: | |
coolmathgames.com: # http://www.robotstxt.org/robotstxt.html | |
coolmathgames.com: # CSS, JS, Images | |
coolmathgames.com: # Directories | |
coolmathgames.com: # Files | |
coolmathgames.com: # Paths (clean URLs) | |
coolmathgames.com: # Paths (no clean URLs) | |
digikey.com: # Google-Adsbot | |
digikey.com: # all crawlers | |
digikey.com: # Sitemaps | |
braze.com: # robots.txt for https://www.braze.com/ | |
braze.com: # live - don't allow web crawlers to index cpresources/ or vendor/ | |
linguee.es: # In ANY CASE, you are NOT ALLOWED to train Machine Translation Systems | |
linguee.es: # on data crawled on Linguee. | |
linguee.es: # | |
linguee.es: # Linguee contains fake entries - changes in the wording of sentences, | |
linguee.es: # complete fake entries. | |
linguee.es: # These entries can be used to identify even small parts of our material | |
linguee.es: # if you try to copy it without our permission. | |
linguee.es: # Machine Translation systems trained on these data will learn these errors | |
linguee.es: # and can be identified easily. We will take all legal measures against anyone | |
linguee.es: # training Machine Translation systems on data crawled from this website. | |
lamiareport.gr: # If the Joomla site is installed within a folder such as at | |
lamiareport.gr: # e.g. www.example.com/joomla/ the robots.txt file MUST be | |
lamiareport.gr: # moved to the site root at e.g. www.example.com/robots.txt | |
lamiareport.gr: # AND the joomla folder name MUST be prefixed to the disallowed | |
lamiareport.gr: # path, e.g. the Disallow rule for the /administrator/ folder | |
lamiareport.gr: # MUST be changed to read Disallow: /joomla/administrator/ | |
lamiareport.gr: # | |
lamiareport.gr: # For more information about the robots.txt standard, see: | |
lamiareport.gr: # http://www.robotstxt.org/orig.html | |
lamiareport.gr: # | |
lamiareport.gr: # For syntax checking, see: | |
lamiareport.gr: # http://www.sxw.org.uk/computing/robots/check.html | |
lamiareport.gr: #Disallow: /images/ | |
lamiareport.gr: #Disallow: /media/ | |
lamiareport.gr: #Disallow: /templates/ | |
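The Joomla comment quoted by lamiareport.gr says that a site installed in a subfolder must keep robots.txt at the host root and prefix every rule path with the folder name. The sketch below performs that rewrite. `prefix_rules` is a hypothetical helper, and it only touches `Allow`/`Disallow` lines whose path starts with `/`.

```python
def prefix_rules(lines, folder):
    """Prefix every Allow/Disallow path with the install folder, e.g. 'joomla'."""
    folder = "/" + folder.strip("/")
    out = []
    for line in lines:
        key, sep, value = line.partition(":")
        path = value.strip()
        if sep and key.strip().lower() in ("disallow", "allow") and path.startswith("/"):
            out.append(f"{key.strip()}: {folder}{path}")
        else:
            out.append(line)  # user-agent lines, comments and empty rules pass through
    return out

print(prefix_rules(["User-agent: *", "Disallow: /administrator/"], "joomla"))
# ['User-agent: *', 'Disallow: /joomla/administrator/']
```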
flightaware.com: # | |
flightaware.com: # robots.txt for flightaware.com hosted by ahock.hou.flightaware.com | |
flightaware.com: # | |
flightaware.com: # | |
flightaware.com: # Specific unwanted clients | |
flightaware.com: # | |
flightaware.com: # | |
flightaware.com: # Command line recursive requests as well as automated fetching from the non- | |
flightaware.com: # exportable data is not acceptable. | |
flightaware.com: # | |
flightaware.com: # See: | |
flightaware.com: # https://flightaware.com/about/termsofuse | |
flightaware.com: # https://flightaware.com/commercial/flightxml/ | |
flightaware.com: # | |
flightaware.com: # | |
flightaware.com: # General robot rules | |
flightaware.com: # | |
flightaware.com: # | |
flightaware.com: # Stop Applebot from beating the crap out of ajax endpoints (specifically the | |
flightaware.com: # static flight map one) | |
flightaware.com: # | |
flightaware.com: # Allow Twitter to grab article and careers blobs | |
bexio.com: # robots.txt for https://www.bexio.com/de-CH/ | |
bexio.com: # live - don't allow web crawlers to index cpresources/ or vendor/ | |
bexio.com: #Baiduspider | |
bexio.com: #Sogou | |
xtube.com: # Twitterbot | |
blogmura.com: # 2020-10-19 追加 | |
open.ac.uk: # | |
open.ac.uk: # This file is to prevent the crawling and indexing of certain parts | |
open.ac.uk: # of our site by web crawlers and spiders run by sites like Google. | |
open.ac.uk: # By telling these "robots" where not to go on the site, | |
open.ac.uk: # we save bandwidth and server resources. | |
open.ac.uk: # | |
open.ac.uk: # For more information about the robots.txt standard, see: | |
open.ac.uk: # http://www.robotstxt.org/wc/robots.html | |
open.ac.uk: # feeds | |
open.ac.uk: # search results | |
open.ac.uk: # Paths | |
open.ac.uk: # parameters | |
open.ac.uk: # wikis | |
economipedia.com: # Bloqueo basico para todos los bots y crawlers | |
economipedia.com: # puede dar problemas por bloqueo de recursos en Google Search Console | |
economipedia.com: # Disallow: /author/ | |
economipedia.com: # Desindexar páginas y etiquetas | |
economipedia.com: # Bloqueo de las URL dinamicas | |
economipedia.com: #Bloqueo de busquedas | |
economipedia.com: # Bloqueo de trackbacks | |
economipedia.com: # Bloqueo de feeds para crawlers | |
economipedia.com: # Ralentizamos algunos bots que se suelen volver locos | |
economipedia.com: # Previene problemas de recursos bloqueados en Google Webmaster Tools | |
economipedia.com: # Bloqueo de bots y crawlers poco utiles | |
bauhaus.info: ### | |
bauhaus.info: # For all robots | |
bauhaus.info: # Block access to specific groups of pages | |
bauhaus.info: # Allow search crawlers to discover the sitemap | |
bauhaus.info: # Block CazoodleBot as it does not present correct accept content headers | |
bauhaus.info: # Block MJ12bot as it is just noise | |
bauhaus.info: # Block dotbot as it cannot parse base urls properly | |
bauhaus.info: # Block Gigabot | |
bauhaus.info: # Block Internet Archives | |
bauhaus.info: # PPS-12633, block SEOkicks crawler | |
bauhaus.info: # PPS-69657: block Semrush-Bot | |
g2g.com: # Adsense | |
g2g.com: # Blekko | |
g2g.com: # CommonCrawl | |
beisen.com: # ----------------------------------------------------------------------------- | |
beisen.com: # robots.txt for beisen.com | |
beisen.com: # ----------------------------------------------------------------------------- | |
eoffcn.com: #robots.txt generated at http://www.eoffcn.com/ | |
gbgame.com.tw: # Robots.txt file from http://www.gbgame.com.tw | |
hdpfans.com: # | |
hdpfans.com: # robots.txt for Discuz! X3 | |
hdpfans.com: # | |
shangc.net: # “‘œ¬Œ™20180826Ã̺” | |
shangc.net: # 20190101Ã̺” | |
shangc.net: # 20190522Ã̺” | |
capgemini.com: # Media | |
capgemini.com: # Restricted media* | |
capgemini.com: # Robots.txt Manager | |
softwareadvice.com: # robots.txt for https://www.softwareadvice.com | |
softwareadvice.com: # GDM ajax data and template | |
softwareadvice.com: # Blocks crawlers that are kind enough to obey robots | |
ambitionbox.com: #Korean search engine | |
ambitionbox.com: #Czech Republic search engine | |
ambitionbox.com: #Yahoo | |
ambitionbox.com: #Ask Jeeves, a U.S.-based search engine | |
uproxx.com: # Uproxx Start | |
uproxx.com: # Uproxx End | |
uproxx.com: # Sitemap archive | |
iranestekhdam.ir: # Sitemap | |
instantdomainsearch.com: # * | |
instantdomainsearch.com: # Host | |
instantdomainsearch.com: # Sitemaps | |
ucl.ac.uk: # | |
ucl.ac.uk: # robots.txt | |
ucl.ac.uk: # | |
ucl.ac.uk: # This file is to prevent the crawling and indexing of certain parts | |
ucl.ac.uk: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ucl.ac.uk: # and Google. By telling these "robots" where not to go on your site, | |
ucl.ac.uk: # you save bandwidth and server resources. | |
ucl.ac.uk: # | |
ucl.ac.uk: # This file will be ignored unless it is at the root of your host: | |
ucl.ac.uk: # Used: http://example.com/robots.txt | |
ucl.ac.uk: # Ignored: http://example.com/site/robots.txt | |
ucl.ac.uk: # | |
ucl.ac.uk: # For more information about the robots.txt standard, see: | |
ucl.ac.uk: # http://www.robotstxt.org/robotstxt.html | |
ucl.ac.uk: #Drupal default | |
ucl.ac.uk: # CSS, JS, Images | |
ucl.ac.uk: # Directories | |
ucl.ac.uk: # Files | |
ucl.ac.uk: # Paths (clean URLs) | |
ucl.ac.uk: # Paths (no clean URLs) | |
ucl.ac.uk: # Paths (clean URLs) - fixed | |
ucl.ac.uk: # Paths (no clean URLs) - fixed | |
ucl.ac.uk: # Sites | |
ucl.ac.uk: # Sites - fixed | |
paycor.com: # production | |
blockchair.com: # Russian localization | |
blockchair.com: # Chineese localization | |
blockchair.com: # Spanish localization | |
blockchair.com: # Portuguese localization | |
daveramsey.com: # robots.txt for https://www.daveramsey.com/ | |
daveramsey.com: # Disallow all crawlers access to certain folders. | |
arizona.edu: # | |
arizona.edu: # robots.txt | |
arizona.edu: # | |
arizona.edu: # This file is to prevent the crawling and indexing of certain parts | |
arizona.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
arizona.edu: # and Google. By telling these "robots" where not to go on your site, | |
arizona.edu: # you save bandwidth and server resources. | |
arizona.edu: # | |
arizona.edu: # This file will be ignored unless it is at the root of your host: | |
arizona.edu: # Used: http://example.com/robots.txt | |
arizona.edu: # Ignored: http://example.com/site/robots.txt | |
arizona.edu: # | |
arizona.edu: # For more information about the robots.txt standard, see: | |
arizona.edu: # http://www.robotstxt.org/robotstxt.html | |
arizona.edu: # CSS, JS, Images | |
arizona.edu: # Directories | |
arizona.edu: # Files | |
arizona.edu: # Paths (clean URLs) | |
arizona.edu: # Paths (no clean URLs) | |
theaustralian.com.au: #Agent Specific Disallowed Sections | |
emagister.com: # Nuevos | |
emagister.com: # Blog | |
emagister.com: # Respuestas | |
emagister.com: # Express | |
clover.com: # If you are human and can read this, you should apply for a job at Clover. | |
clover.com: # https://www.clover.com/careers | |
viator.com: # Hi, we're Viator, Nice to meet you. | |
viator.com: # | |
viator.com: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself. | |
viator.com: # | |
viator.com: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet? | |
viator.com: # | |
viator.com: # Run - don't crawl - to apply to join Viator's elite SEO team | |
viator.com: # | |
viator.com: # Visit https://careers.tripadvisor.com/search-results?keywords=seo | |
viator.com: # | |
viator.com: # | |
viator.com: # viator.com | |
bonprix.ru: # Company: bonprix.ru | |
bonprix.ru: # Author: bonprix.ru | |
bonprix.ru: # URL: https://www.bonprix.ru | |
bonprix.ru: # Disallow all crawlers access to certain pages. | |
bonprix.ru: # block special parameters | |
bonprix.ru: # block NITRO Tracking (added 2021-01-05) | |
bonprix.ru: # block Internal Search Suggestions (added 2021-01-05) | |
bonprix.ru: # block viewed products (added 2021-01-05) | |
bonprix.ru: # block update product variations (added 2021-01-05) | |
bonprix.ru: # block glossary content in productdetails (added 2021-01-05) | |
bonprix.ru: # block Wishlist (added 2021-01-05) | |
bonprix.ru: # Disallow Yandex access to certain parameters | |
bonprix.ru: # block special parameters | |
bonprix.ru: # block NITRO Tracking (added 2021-01-11) | |
bonprix.ru: # block Internal Search Suggestions (added 2021-01-11) | |
bonprix.ru: # block viewed products (added 2021-01-11) | |
bonprix.ru: # block update product variations (added 2021-01-11) | |
bonprix.ru: # block glossary content in productdetails (added 2021-01-11) | |
bonprix.ru: # block Wishlist (added 2021-01-11) | |
bonprix.ru: # Sitemap files | |
2dehands.be: # Here is our sitemap (this line is independent of UA blocks, per the spec) | |
2dehands.be: #Please keep blocking of all URLs in place for at least 2 years after removing a specific module | |
2dehands.be: #SOI subpage | |
2dehands.be: # login, confirm and forgot password pages | |
2dehands.be: # mymp pages | |
2dehands.be: # ASQ pages | |
2dehands.be: # SYI Pages | |
2dehands.be: # Flagging/tipping ads | |
2dehands.be: # bidding on ads | |
2dehands.be: # external url redirects | |
2dehands.be: # google analytics | |
2dehands.be: #korean spam | |
2dehands.be: #legacy | |
2dehands.be: # prevent unnecessary crawling | |
2dehands.be: # New vip | |
2dehands.be: # Block VIPs with parameters | |
2dehands.be: #block homepage feeds | |
vidaextra.com: # | |
vidaextra.com: # robots.txt | |
vidaextra.com: # | |
vidaextra.com: # Crawlers that are kind enough to obey, but which we'd rather not have | |
vidaextra.com: # unless they're feeding search engines. | |
vidaextra.com: # Some bots are known to be trouble, particularly those designed to copy | |
vidaextra.com: # entire sites. Please obey robots.txt. | |
vidaextra.com: # Sorry, wget in its recursive mode is a frequent problem. | |
vidaextra.com: # Please read the man page and use it properly; there is a | |
vidaextra.com: # --wait option you can use to set the delay between hits, | |
vidaextra.com: # for instance. | |
vidaextra.com: # | |
vidaextra.com: # | |
vidaextra.com: # The 'grub' distributed client has been *very* poorly behaved. | |
vidaextra.com: # | |
vidaextra.com: # | |
vidaextra.com: # Doesn't follow robots.txt anyway, but... | |
vidaextra.com: # | |
vidaextra.com: # | |
vidaextra.com: # Hits many times per second, not acceptable | |
vidaextra.com: # http://www.nameprotect.com/botinfo.html | |
vidaextra.com: # A capture bot, downloads gazillions of pages with no public benefit | |
vidaextra.com: # http://www.webreaper.net/ | |
ibctamil.com: # Disallow: /*? This is match ? anywhere in the URL | |
optus.com.au: # Temporary campaigns that are excluded from organic results | |
bw-bank.de: # Refuse all robots from these directories: | |
bw-bank.de: # Blocked Bots | |
bw-bank.de: # Disallow content | |
bw-bank.de: # Sitemap URL | |
zibal.ir: # https://www.robotstxt.org/robotstxt.html | |
elespectador.com: # CSS, JS, Images | |
elespectador.com: # Paths (clean URLs) | |
elespectador.com: # Sitemap: | |
thenetnaija.com: # hestiacp autogenerated robots.txt | |
ftvnews.com.tw: #User-agent: SearchmetricsBot | |
ftvnews.com.tw: #Disallow: / | |
remotasks.com: # www.robotstxt.org/ | |
remotasks.com: # Allow crawling of all content | |
clearbit.com: # all | |
aaa.com: # For domain: http://www.aaa.com | |
audiusa.com: # functional links | |
audiusa.com: # editorial links | |
google.com.ni: # AdsBot | |
google.com.ni: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
w3.org: # | |
w3.org: # robots.txt for http://www.w3.org/ | |
w3.org: # | |
w3.org: # $Id: robots.txt,v 1.85 2020/11/06 21:15:53 gerald Exp $ | |
w3.org: # | |
w3.org: # For use by search.w3.org | |
w3.org: # W3C Link checker | |
w3.org: # Applebot continues to make hundreds of thousands of reqs/day for this area | |
w3.org: # even though it has been returning permanent redirects for years | |
w3.org: # the following settings apply to all bots | |
w3.org: # Blogs - WordPress | |
w3.org: # https://codex.wordpress.org/Search_Engine_Optimization_for_WordPress#Robots.txt_Optimization | |
w3.org: # Wikis - Mediawiki | |
w3.org: # https://www.mediawiki.org/wiki/Manual:Robots.txt | |
w3.org: # various other access-controlled or expensive areas | |
w3.org: # WAI indexing | |
w3.org: # Disallow: /WAI/EO/Drafts/ | |
birdeye.com: # Allow specific URLs for all bots | |
bitpay.com: # www.robotstxt.org/ | |
bitpay.com: # Allow crawling of all content | |
education.gouv.fr: # | |
education.gouv.fr: # robots.txt | |
education.gouv.fr: # | |
education.gouv.fr: # This file is to prevent the crawling and indexing of certain parts | |
education.gouv.fr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
education.gouv.fr: # and Google. By telling these "robots" where not to go on your site, | |
education.gouv.fr: # you save bandwidth and server resources. | |
education.gouv.fr: # | |
education.gouv.fr: # This file will be ignored unless it is at the root of your host: | |
education.gouv.fr: # Used: http://example.com/robots.txt | |
education.gouv.fr: # Ignored: http://example.com/site/robots.txt | |
education.gouv.fr: # | |
education.gouv.fr: # For more information about the robots.txt standard, see: | |
education.gouv.fr: # http://www.robotstxt.org/robotstxt.html | |
education.gouv.fr: # CSS, JS, Images | |
education.gouv.fr: # Directories | |
education.gouv.fr: # Files | |
education.gouv.fr: # Paths (clean URLs) | |
education.gouv.fr: # Paths (no clean URLs) | |
education.gouv.fr: # XML sitemap | |
closermag.fr: # SPECIFIC | |
liveauctioneers.com: #Start of Parameters | |
liveauctioneers.com: #Bingbot | |
liveauctioneers.com: #Start of Parameters | |
hi5.com: ######################################################################### | |
hi5.com: # /robots.txt file for http://www.tagged.com/ | |
hi5.com: # mail webmaster@tagged.com for constructive criticism | |
hi5.com: ######################################################################### | |
hi5.com: # Any others | |
invisionapp.com: # www.robotstxt.org/ | |
linustechtips.com: # Sogou does not behave correctly. Let this be a warning to all the other bots out there. | |
sjsu.edu: ######################################### | |
sjsu.edu: # Welcome to San Jose State University | |
sjsu.edu: # | |
sjsu.edu: # Note: Please do not over load the servers | |
sjsu.edu: # http://its.sjsu.edu | |
sjsu.edu: # | |
sjsu.edu: # Disallow: /ecampus/ | |
sjsu.edu: # Site Map Listing | |
hktvmall.com: # For all robots | |
hktvmall.com: # Allow search crawlers to discover the sitemap | |
hktvmall.com: # Block CazoodleBot as it does not present correct accept content headers | |
hktvmall.com: # Block MJ12bot as it is just noise | |
hktvmall.com: # Exclude evil bots | |
unity.com: # | |
unity.com: # robots.txt | |
unity.com: # | |
unity.com: # This file is to prevent the crawling and indexing of certain parts | |
unity.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
unity.com: # and Google. By telling these "robots" where not to go on your site, | |
unity.com: # you save bandwidth and server resources. | |
unity.com: # | |
unity.com: # This file will be ignored unless it is at the root of your host: | |
unity.com: # Used: http://example.com/robots.txt | |
unity.com: # Ignored: http://example.com/site/robots.txt | |
unity.com: # | |
unity.com: # For more information about the robots.txt standard, see: | |
unity.com: # http://www.robotstxt.org/robotstxt.html | |
unity.com: # CSS, JS, Images | |
unity.com: # Directories | |
unity.com: # Files | |
unity.com: # Paths (clean URLs) | |
unity.com: # Paths (no clean URLs) | |
unity.com: # Chinese Search Engines | |
cnews.fr: # Directories | |
cnews.fr: # Paths (clean URLs) | |
cnews.fr: # Paths (no clean URLs) | |
virginia.edu: # | |
virginia.edu: # robots.txt | |
virginia.edu: # | |
virginia.edu: # This file is to prevent the crawling and indexing of certain parts | |
virginia.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
virginia.edu: # and Google. By telling these "robots" where not to go on your site, | |
virginia.edu: # you save bandwidth and server resources. | |
virginia.edu: # | |
virginia.edu: # This file will be ignored unless it is at the root of your host: | |
virginia.edu: # Used: http://example.com/robots.txt | |
virginia.edu: # Ignored: http://example.com/site/robots.txt | |
virginia.edu: # | |
virginia.edu: # For more information about the robots.txt standard, see: | |
virginia.edu: # http://www.robotstxt.org/robotstxt.html | |
virginia.edu: # CSS, JS, Images | |
virginia.edu: # Directories | |
virginia.edu: # Files | |
virginia.edu: # Paths (clean URLs) | |
virginia.edu: # Paths (no clean URLs) | |
euskadi.eus: # disallow partial files and contents of type serv_proc_* | |
oricon.co.jp: # Baidu chinese search engine | |
oricon.co.jp: # RUSSIA search engine | |
oricon.co.jp: # sogou.com chinese search engine | |
oricon.co.jp: # User-agent: Sogou web spider | |
oricon.co.jp: # Disallow: / | |
oricon.co.jp: # Grapeshot Allow | |
tuchong.com: # Robots.txt file from http://www.tuchong.com | |
tuchong.com: # All robots will spider the domain | |
sepe.es: # Regla 1 | |
lidiashopping.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
lidiashopping.com: # | |
lidiashopping.com: # To ban all spiders from the entire site uncomment the next two lines: | |
lidiashopping.com: # User-agent: * | |
lidiashopping.com: # Disallow: / | |
lidiashopping.com: # https://blogs.bing.com/webmaster/2009/08/10/crawl-delay-and-the-bing-crawler-msnbot/ | |
pancakeswap.finance: # https://www.robotstxt.org/robotstxt.html | |
iitb.ac.in: # | |
iitb.ac.in: # robots.txt | |
iitb.ac.in: # | |
iitb.ac.in: # This file is to prevent the crawling and indexing of certain parts | |
iitb.ac.in: # of your site by web crawlers and spiders run by sites like Yahoo! | |
iitb.ac.in: # and Google. By telling these "robots" where not to go on your site, | |
iitb.ac.in: # you save bandwidth and server resources. | |
iitb.ac.in: # | |
iitb.ac.in: # This file will be ignored unless it is at the root of your host: | |
iitb.ac.in: # Used: http://example.com/robots.txt | |
iitb.ac.in: # Ignored: http://example.com/site/robots.txt | |
iitb.ac.in: # | |
iitb.ac.in: # For more information about the robots.txt standard, see: | |
iitb.ac.in: # http://www.robotstxt.org/robotstxt.html | |
iitb.ac.in: # CSS, JS, Images | |
iitb.ac.in: # Directories | |
iitb.ac.in: # Files | |
iitb.ac.in: # Paths (clean URLs) | |
iitb.ac.in: # Paths (no clean URLs) | |
syria.tv: # | |
syria.tv: # robots.txt | |
syria.tv: # | |
syria.tv: # This file is to prevent the crawling and indexing of certain parts | |
syria.tv: # of your site by web crawlers and spiders run by sites like Yahoo! | |
syria.tv: # and Google. By telling these "robots" where not to go on your site, | |
syria.tv: # you save bandwidth and server resources. | |
syria.tv: # | |
syria.tv: # This file will be ignored unless it is at the root of your host: | |
syria.tv: # Used: http://example.com/robots.txt | |
syria.tv: # Ignored: http://example.com/site/robots.txt | |
syria.tv: # | |
syria.tv: # For more information about the robots.txt standard, see: | |
syria.tv: # http://www.robotstxt.org/robotstxt.html | |
syria.tv: # CSS, JS, Images | |
syria.tv: # Directories | |
syria.tv: # Files | |
syria.tv: # Paths (clean URLs) | |
syria.tv: # Paths (no clean URLs) | |
locanto.com: ############################## | |
locanto.com: # robots.txt file | |
locanto.com: # based on webmasterworld.com | |
locanto.com: # and searchengineworld.com | |
locanto.com: # Please, we do NOT allow nonauthorized robots any longer. | |
locanto.com: # Yes, feel free to copy and use the following. | |
locanto.com: # desktop | |
locanto.com: # mobile | |
locanto.com: # desktop | |
locanto.com: # mobile | |
locanto.com: # desktop | |
locanto.com: # mobile | |
locanto.com: # desktop | |
locanto.com: # mobile | |
locanto.com: #################################### | |
ionos.mx: #print | |
ionos.mx: #terms and conditions | |
ionos.mx: #Popups etc. | |
ionos.mx: #Results | |
ionos.mx: #crawl delay | |
observer.com: # Sitemap archive | |
observer.com: ## Disallow search strings. | |
edesk.com: # | |
edesk.com: # robots.txt | |
edesk.com: # | |
edesk.com: # This file is to prevent the crawling and indexing of certain parts | |
edesk.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
edesk.com: # and Google. By telling these "robots" where not to go on your site, | |
edesk.com: # you save bandwidth and server resources. | |
edesk.com: # | |
edesk.com: # This file will be ignored unless it is at the root of your host: | |
edesk.com: # Used: http://example.com/robots.txt | |
edesk.com: # Ignored: http://example.com/site/robots.txt | |
edesk.com: # | |
edesk.com: # For more information about the robots.txt standard, see: | |
edesk.com: # http://www.robotstxt.org/robotstxt.html | |
edesk.com: # CSS, JS, Images | |
edesk.com: # Directories | |
edesk.com: # Files | |
edesk.com: # Paths (clean URLs) | |
edesk.com: # Paths (no clean URLs) | |
ucs.br: #Referencias | |
ucs.br: #http://www.robotstxt.org/ | |
ucs.br: #http://www.google.com/support/webmasters/bin/answer.py?hl=br&answer=156449 | |
ucs.br: #http://g1.globo.com/robots.txt | |
ucs.br: #http://en.wikipedia.org/robots.txt | |
ucs.br: #http://www.terra.com.br/robots.txt | |
ucs.br: #http://www.google.com.br/robots.txt | |
ucs.br: #http://www.livejournal.com/robots.txt | |
ucs.br: #http://www.ubuntu.com/robots.txt | |
ucs.br: #Disallow: /ucs/* | |
ucs.br: # Portal antigo da especializacao | |
ucs.br: # foram mantidas apenas as páginas estáticas abaixo | |
ucs.br: # advertising-related bots: | |
ucs.br: # Wikipedia work bots: | |
ucs.br: # Crawlers that are kind enough to obey, but which we'd rather not have | |
ucs.br: # unless they're feeding search engines. | |
ucs.br: # Some bots are known to be trouble, particularly those designed to copy | |
ucs.br: # entire sites. Please obey robots.txt. | |
ucs.br: # Hits many times per second, not acceptable | |
ucs.br: # http://www.nameprotect.com/botinfo.html | |
ucs.br: # A capture bot, downloads gazillions of pages with no public benefit | |
ucs.br: # http://www.webreaper.net/ | |
tradervue.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
tradervue.com: # | |
tradervue.com: # To ban all spiders from the entire site uncomment the next two lines: | |
tradervue.com: # User-Agent: * | |
tradervue.com: # Disallow: / | |
hsbc.co.uk: #Introduce Sitemaps | |
sitejabber.com: #favorite routes | |
sitejabber.com: #forum routes | |
sitejabber.com: #page routes | |
sitejabber.com: #partner routes | |
sitejabber.com: #plugin routes | |
sitejabber.com: #review routes | |
sitejabber.com: #url routes (non-pages) | |
sitejabber.com: #user routes | |
sitejabber.com: # misc | |
sitejabber.com: #adult content | |
blocket.se: # Det är uttryckligen förbjudet att använda sökrobotar eller andra | |
blocket.se: # automatiska metoder för att tillgå blocket.se. Endast om blocket.se | |
blocket.se: # givit särskilt tillstånd får sådan access ske. | |
blocket.se: # TODO: fix so links in sitemap.xml points to cdn | |
blocket.se: # Sitemap: https://assets.blocketcdn.se/adout/public/static/sitemap.xml | |
microworkers.com: # www.robotstxt.org/ | |
microworkers.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
athoselectronics.com: # Spam Backlink Blocker | |
athoselectronics.com: # Allow/Disallow Ads.txt | |
athoselectronics.com: # Allow/Disallow App-ads.txt | |
athoselectronics.com: # This robots.txt file was created by Better Robots.txt (Index & Rank Booster by Pagup) Plugin. https://www.better-robots.com/ | |
cursou.com.br: # Google AdSense | |
forever.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
forever.com: # | |
forever.com: # To ban all spiders from the entire site uncomment the next two lines: | |
forever.com: # User-agent: * | |
forever.com: # Disallow: / | |
lucascassianouploader.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead. | |
lucascassianouploader.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details. | |
lucascassianouploader.wordpress.com: # This file was generated on Wed, 24 Feb 2021 00:48:35 +0000 | |
melhorcambio.com: # remove directories | |
mundodaeletrica.com.br: # robots.txt file for https://www.mundodaeletrica.com.br/ | |
mundodaeletrica.com.br: # Template version: 20171218 | |
mundodaeletrica.com.br: # Last update of this robots.txt file: 15/02/21 at 20:40 | |
mundodaeletrica.com.br: # Avoid indexing some directories | |
mundodaeletrica.com.br: # Allow others | |
mundodaeletrica.com.br: # Avoid indexing somes file extensions | |
mundodaeletrica.com.br: # Sitemap | |
mundodaeletrica.com.br: # e32af6360eaa0df255079000158e386710afbf08826790547681c4419158f955 | |
official.ao: # Optimization for Google Ads Bot | |
portaldeangola.com: # BEGIN WBCPBlocker | |
portaldeangola.com: # END WBCPBlocker | |
procenter.co.ao: #Begin Attracta SEO Tools Sitemap. Do not remove | |
procenter.co.ao: #End Attracta SEO Tools Sitemap. Do not remove | |
shopify.com.br: # ,: | |
shopify.com.br: # ,' | | |
shopify.com.br: # / : | |
shopify.com.br: # --' / | |
shopify.com.br: # \/ />/ | |
shopify.com.br: # / <//_\ | |
shopify.com.br: # __/ / | |
shopify.com.br: # )'-. / | |
shopify.com.br: # ./ :\ | |
shopify.com.br: # /.' ' | |
shopify.com.br: # No need to shop around. Board the rocketship today – great SEO careers to checkout at shopify.com/careers | |
shopify.com.br: # robots.txt file for www.shopify.com.br | |
tutorialmonsters.com: # Primero el contenido adjunto. | |
tutorialmonsters.com: # También podemos desindexar todo lo que empiece | |
tutorialmonsters.com: # por wp-. Es lo mismo que los Disallow de arriba pero | |
tutorialmonsters.com: # incluye cosas como wp-rss.php | |
tutorialmonsters.com: # | |
tutorialmonsters.com: # Sitemap permitido, búsquedas no. | |
tutorialmonsters.com: # | |
tutorialmonsters.com: # Sitemap: http://tutorialmonsters.com/sitemap.xml | |
tutorialmonsters.com: # | |
tutorialmonsters.com: # Permitimos el feed general para Google Blogsearch. | |
tutorialmonsters.com: # | |
tutorialmonsters.com: # Impedimos que permalink/feed/ sea indexado ya que el | |
tutorialmonsters.com: # feed con los comentarios suele posicionarse en lugar de | |
tutorialmonsters.com: # la entrada y desorienta a los usuarios. | |
tutorialmonsters.com: # | |
tutorialmonsters.com: # Lo mismo con URLs terminadas en /trackback/ que sólo | |
tutorialmonsters.com: # sirven como Trackback URI (y son contenido duplicado). | |
tutorialmonsters.com: # | |
tutorialmonsters.com: # | |
tutorialmonsters.com: # A partir de aquí es opcional pero recomendado. | |
tutorialmonsters.com: # | |
tutorialmonsters.com: # Lista de bots que suelen respetar el robots.txt pero rara | |
tutorialmonsters.com: # vez hacen un buen uso del sitio y abusan bastante… | |
tutorialmonsters.com: # Añadir al gusto del consumidor… | |
tutorialmonsters.com: # | |
tutorialmonsters.com: # Slurp (Yahoo!), Noxtrum y el bot de MSN a veces tienen | |
tutorialmonsters.com: # idas de pinza, toca decirles que reduzcan la marcha. | |
tutorialmonsters.com: # El valor es en segundos y podéis dejarlo bajo e ir | |
tutorialmonsters.com: # subiendo hasta el punto óptimo. | |
tutorialmonsters.com: # | |
tutorialmonsters.com: # robots.txt automaticaly generated by PrestaShop e-commerce open-source solution | |
tutorialmonsters.com: # http://www.prestashop.com - http://www.prestashop.com/forums | |
tutorialmonsters.com: # This file is to prevent the crawling and indexing of certain parts | |
tutorialmonsters.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
tutorialmonsters.com: # and Google. By telling these "robots" where not to go on your site, | |
tutorialmonsters.com: # you save bandwidth and server resources. | |
tutorialmonsters.com: # For more information about the robots.txt standard, see: | |
tutorialmonsters.com: # http://www.robotstxt.org/wc/robots.html | |
tutorialmonsters.com: # GoogleBot specific | |
tutorialmonsters.com: # All bots | |
tutorialmonsters.com: # Directories | |
tutorialmonsters.com: # Files | |
tutorialmonsters.com: # Sitemap | |
huckberry.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
encar.com: #"Encar" prohibits unauthorized data collection activities (crawling, scraping) using manual or automated tools. | |
gba.gob.ar: # | |
gba.gob.ar: # robots.txt | |
gba.gob.ar: # | |
gba.gob.ar: # This file is to prevent the crawling and indexing of certain parts | |
gba.gob.ar: # of your site by web crawlers and spiders run by sites like Yahoo! | |
gba.gob.ar: # and Google. By telling these "robots" where not to go on your site, | |
gba.gob.ar: # you save bandwidth and server resources. | |
gba.gob.ar: # | |
gba.gob.ar: # This file will be ignored unless it is at the root of your host: | |
gba.gob.ar: # Used: http://example.com/robots.txt | |
gba.gob.ar: # Ignored: http://example.com/site/robots.txt | |
gba.gob.ar: # | |
gba.gob.ar: # For more information about the robots.txt standard, see: | |
gba.gob.ar: # http://www.robotstxt.org/robotstxt.html | |
gba.gob.ar: # CSS, JS, Images | |
gba.gob.ar: # Directories | |
gba.gob.ar: # Files | |
gba.gob.ar: # Paths (clean URLs) | |
gba.gob.ar: # Paths (no clean URLs) | |
airbnb.com.au: # /////// | |
airbnb.com.au: # // // | |
airbnb.com.au: # // // | |
airbnb.com.au: # // // /// /// /// | |
airbnb.com.au: # // // /// /// | |
airbnb.com.au: # // /// // //// /// /// /// //// /// //// /// //// /// //// | |
airbnb.com.au: # // /// /// // ////////// /// ////////// /////////// ////////// /////////// | |
airbnb.com.au: # // // // // /// /// /// /// /// /// /// /// /// /// | |
airbnb.com.au: # // // // // /// /// /// /// /// /// /// /// /// /// | |
airbnb.com.au: # // // // // /// /// /// /// /// /// /// /// /// /// | |
airbnb.com.au: # // // // // ////////// /// /// ////////// /// /// ////////// | |
airbnb.com.au: # // ///// // | |
airbnb.com.au: # // ///// // | |
airbnb.com.au: # // /// /// // | |
airbnb.com.au: # ////// ////// | |
airbnb.com.au: # | |
airbnb.com.au: # | |
airbnb.com.au: # We thought you'd never make it! | |
airbnb.com.au: # We hope you feel right at home in this file...unless you're a disallowed subfolder. | |
airbnb.com.au: # And since you're here, read up on our culture and team: https://www.airbnb.com/careers/departments/engineering | |
airbnb.com.au: # There's even a bring your robot to work day. | |
forextime.com: # | |
forextime.com: # robots.txt | |
forextime.com: # | |
forextime.com: # This file is to prevent the crawling and indexing of certain parts | |
forextime.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
forextime.com: # and Google. By telling these "robots" where not to go on your site, | |
forextime.com: # you save bandwidth and server resources. | |
forextime.com: # | |
forextime.com: # This file will be ignored unless it is at the root of your host: | |
forextime.com: # Used: http://example.com/robots.txt | |
forextime.com: # Ignored: http://example.com/site/robots.txt | |
forextime.com: # | |
forextime.com: # For more information about the robots.txt standard, see: | |
forextime.com: # http://www.robotstxt.org/robotstxt.html | |
forextime.com: # CSS, JS, Images | |
forextime.com: # Directories | |
forextime.com: # Files | |
forextime.com: # Paths (clean URLs) | |
forextime.com: # Paths (no clean URLs) | |
whatmobile.com.pk: ### | |
whatmobile.com.pk: # robots.txt file created at http://www.whatmobile.com.pk | |
whatmobile.com.pk: # For domain: http://www.whatmobile.com.pk | |
whatmobile.com.pk: ### | |
whatmobile.com.pk: #Begin Attracta SEO Tools Sitemap. Do not remove | |
whatmobile.com.pk: #End Attracta SEO Tools Sitemap. Do not remove | |
rangeme.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
rangeme.com: # | |
rangeme.com: # To ban all spiders from the entire site uncomment the next two lines: | |
rangeme.com: # User-agent: * | |
rangeme.com: # Disallow: / | |
developpement-durable.gouv.fr: # | |
developpement-durable.gouv.fr: # robots.txt | |
developpement-durable.gouv.fr: # | |
developpement-durable.gouv.fr: # This file is to prevent the crawling and indexing of certain parts | |
developpement-durable.gouv.fr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
developpement-durable.gouv.fr: # and Google. By telling these "robots" where not to go on your site, | |
developpement-durable.gouv.fr: # you save bandwidth and server resources. | |
developpement-durable.gouv.fr: # | |
developpement-durable.gouv.fr: # This file will be ignored unless it is at the root of your host: | |
developpement-durable.gouv.fr: # Used: http://example.com/robots.txt | |
developpement-durable.gouv.fr: # Ignored: http://example.com/site/robots.txt | |
developpement-durable.gouv.fr: # | |
developpement-durable.gouv.fr: # For more information about the robots.txt standard, see: | |
developpement-durable.gouv.fr: # http://www.robotstxt.org/robotstxt.html | |
developpement-durable.gouv.fr: # CSS, JS, Images | |
developpement-durable.gouv.fr: # Directories | |
developpement-durable.gouv.fr: # Files | |
developpement-durable.gouv.fr: # Paths (clean URLs) | |
developpement-durable.gouv.fr: # Paths (no clean URLs) | |
unicef.org: # Drupal sites | |
unicef.org: # For Main site | |
unicef.org: # CSS, JS, Images | |
unicef.org: # Directories | |
unicef.org: # Files | |
unicef.org: # Paths (clean URLs) | |
unicef.org: # Paths (no clean URLs) | |
unicef.org: # For ROCO sites | |
unicef.org: # CSS, JS, Images | |
unicef.org: # Directories | |
unicef.org: # Files | |
unicef.org: # Paths (clean URLs) | |
unicef.org: # Paths (no clean URLs) | |
rusvesna.su: # | |
rusvesna.su: # robots.txt | |
rusvesna.su: # | |
rusvesna.su: # This file is to prevent the crawling and indexing of certain parts | |
rusvesna.su: # of your site by web crawlers and spiders run by sites like Yahoo! | |
rusvesna.su: # and Google. By telling these "robots" where not to go on your site, | |
rusvesna.su: # you save bandwidth and server resources. | |
rusvesna.su: # | |
rusvesna.su: # This file will be ignored unless it is at the root of your host: | |
rusvesna.su: # Used: http://example.com/robots.txt | |
rusvesna.su: # Ignored: http://example.com/site/robots.txt | |
rusvesna.su: # | |
rusvesna.su: # For more information about the robots.txt standard, see: | |
rusvesna.su: # http://www.robotstxt.org/robotstxt.html | |
rusvesna.su: # Directories | |
rusvesna.su: # Files | |
rusvesna.su: # Paths (clean URLs) | |
rusvesna.su: # Paths (no clean URLs) | |
huatu.com: # | |
huatu.com: # robots.txt for huatu.com | |
huatu.com: # | |
honeybook.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
nexcess.net: #**************************************************************************** | |
nexcess.net: # robots.txt | |
nexcess.net: # : Robots, spiders, and search engines use this file to determine which | |
nexcess.net: # content they should *not* crawl while indexing your website. | |
nexcess.net: # : This system is called "The Robots Exclusion Standard." | |
nexcess.net: # : It is strongly encouraged to use a robots.txt validator to check | |
nexcess.net: # for valid syntax before any robots read it! | |
nexcess.net: # | |
nexcess.net: # Examples: | |
nexcess.net: # | |
nexcess.net: # Instruct all robots to stay out of the admin area. | |
nexcess.net: # : User-agent: * | |
nexcess.net: # : Disallow: /admin/ | |
nexcess.net: # | |
nexcess.net: # Restrict Google and MSN from indexing your images. | |
nexcess.net: # : User-agent: Googlebot | |
nexcess.net: # : Disallow: /images/ | |
nexcess.net: # : User-agent: MSNBot | |
nexcess.net: # : Disallow: /images/ | |
nexcess.net: #**************************************************************************** | |
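The nexcess.net examples also illustrate a point that is easy to miss. A crawler that matches a group naming it ignores the `User-agent: *` group entirely, so under these exact rules Googlebot is kept out of /images/ but not out of /admin/. The sketch below shows this with Python's standard `urllib.robotparser`, which does simple prefix matching and no wildcards. `SomeOtherBot` and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The example rules from the comments above, uncommented.
RULES = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /images/

User-agent: MSNBot
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

for agent in ("Googlebot", "MSNBot", "SomeOtherBot"):
    for path in ("/admin/", "/images/logo.png"):
        allowed = parser.can_fetch(agent, "https://example.com" + path)
        print(f"{agent:13} {path:17} {'allowed' if allowed else 'blocked'}")
```

To keep the named bots out of /admin/ as well, each named group would need its own `Disallow: /admin/` line.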
softwaresuggest.com: # Block Uptime robot | |
uct.ac.za: # | |
uct.ac.za: # robots.txt | |
uct.ac.za: # | |
uct.ac.za: # This file is to prevent the crawling and indexing of certain parts | |
uct.ac.za: # of your site by web crawlers and spiders run by sites like Yahoo! | |
uct.ac.za: # and Google. By telling these "robots" where not to go on your site, | |
uct.ac.za: # you save bandwidth and server resources. | |
uct.ac.za: # | |
uct.ac.za: # This file will be ignored unless it is at the root of your host: | |
uct.ac.za: # Used: http://example.com/robots.txt | |
uct.ac.za: # Ignored: http://example.com/site/robots.txt | |
uct.ac.za: # | |
uct.ac.za: # For more information about the robots.txt standard, see: | |
uct.ac.za: # http://www.robotstxt.org/robotstxt.html | |
uct.ac.za: # CSS, JS, Images | |
uct.ac.za: # Directories | |
uct.ac.za: # Files | |
uct.ac.za: # Paths (clean URLs) | |
uct.ac.za: # Paths (no clean URLs) | |
lit.link: # https://www.robotstxt.org/robotstxt.html | |
picsart.com: # Disallow. | |
picsart.com: # For time being | |
picsart.com: # Sitemaps. | |
bravotv.com: # | |
bravotv.com: # robots.txt | |
bravotv.com: # | |
bravotv.com: # This file is to prevent the crawling and indexing of certain parts | |
bravotv.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
bravotv.com: # and Google. By telling these "robots" where not to go on your site, | |
bravotv.com: # you save bandwidth and server resources. | |
bravotv.com: # | |
bravotv.com: # This file will be ignored unless it is at the root of your host: | |
bravotv.com: # Used: http://example.com/robots.txt | |
bravotv.com: # Ignored: http://example.com/site/robots.txt | |
bravotv.com: # | |
bravotv.com: # For more information about the robots.txt standard, see: | |
bravotv.com: # http://www.robotstxt.org/robotstxt.html | |
bravotv.com: # CSS, JS, Images | |
bravotv.com: # Directories | |
bravotv.com: # Files | |
bravotv.com: # Paths (clean URLs) | |
bravotv.com: # Paths (no clean URLs) | |
bravotv.com: # Ads, see https://bravotv.atlassian.net/browse/BO-537 | |
theme.co: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
theme.co: # This robots.txt file is not used. Please append the content below in the robots.txt file located at the root | |
theme.co: # | |
resh.edu.ru: # www.robotstxt.org/ | |
resh.edu.ru: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
jcrew.com: # | |
jcrew.com: # Site: factory.jcrew.com WWW | |
jcrew.com: # | |
jcrew.com: # This file is retrieved automatically by crawlers conforming to | |
jcrew.com: # the Robots.txt standard. It defines what URLs should/shouldn't | |
jcrew.com: # be indexed. | |
jcrew.com: # See <URL:http://www.robotstxt.org/wc/exclusion.html#robotstxt> | |
jcrew.com: # | |
jcrew.com: # Format: | |
jcrew.com: # User-agent: <agent-string> | |
jcrew.com: # Disallow: <nothing> | <path> | |
jcrew.com: # ----------------------------------------------------------------------------- | |
jcrew.com: # All User Agent Exclusions | |
influence.co: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
influence.co: # | |
influence.co: # To ban all spiders from the entire site uncomment the next two lines: | |
rfa.org: # Define access-restrictions for robots/spiders | |
rfa.org: # http://www.robotstxt.org/wc/norobots.html | |
rfa.org: # By default we allow robots to access all areas of our site | |
rfa.org: # already accessible to anonymous users | |
rfa.org: # Add Googlebot-specific syntax extension to exclude forms | |
rfa.org: # that are repeated for each piece of content in the site | |
rfa.org: # the wildcard is only supported by Googlebot | |
rfa.org: # http://www.google.com/support/webmasters/bin/answer.py?answer=40367&ctx=sibling | |
modern.az: # Sitemap files | |
manageengine.com: # -------------------------------------------------- | |
manageengine.com: # Robots.txt file for https://www.manageengine.com | |
manageengine.com: # Author: Webmaster | |
manageengine.com: # Last Updated Date: 08/02/2021 | |
manageengine.com: # -------------------------------------------------- | |
globes.co.il: # Robots.txt file | |
globes.co.il: # | |
globes.co.il: # All robots will spider the domain | |
turkcell.com.tr: # only access | |
businessofapps.com: #Googlebot | |
businessofapps.com: # Global | |
searchandshopping.org: ## Default robots.txt | |
donedeal.ie: # added 22/09/2009 | |
donedeal.ie: # added 9/4/2011 by Fred (trying to block Donkiz, but not sure if this works) | |
donedeal.ie: # added 23/05/2011 by Declan (trying to block Sightup) | |
donedeal.ie: # added 16/10/2014 by Pete | |
donedeal.ie: # added 20/02/2015 by Pete | |
donedeal.ie: # added 16/09/2016 by Pete | |
uscourts.gov: # | |
uscourts.gov: # robots.txt | |
uscourts.gov: # | |
uscourts.gov: # This file is to prevent the crawling and indexing of certain parts | |
uscourts.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
uscourts.gov: # and Google. By telling these "robots" where not to go on your site, | |
uscourts.gov: # you save bandwidth and server resources. | |
uscourts.gov: # | |
uscourts.gov: # This file will be ignored unless it is at the root of your host: | |
uscourts.gov: # Used: http://example.com/robots.txt | |
uscourts.gov: # Ignored: http://example.com/site/robots.txt | |
uscourts.gov: # | |
uscourts.gov: # For more information about the robots.txt standard, see: | |
uscourts.gov: # http://www.robotstxt.org/robotstxt.html | |
uscourts.gov: # CSS, JS, Images | |
uscourts.gov: # Directories | |
uscourts.gov: # Files | |
uscourts.gov: # Paths (clean URLs) | |
uscourts.gov: # Paths (no clean URLs) | |
thetimes.co.uk: #Agent Specific Disallowed Sections | |
rabobank.nl: # Robots.txt for www.rabobank.nl | |
rabobank.nl: # Directories that do not need to be indexed by external search | |
rabobank.nl: # engines # | |
sendle.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
tamilwin.com: # Disallow: /*? This is match ? anywhere in the URL | |
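The `*` wildcard in that comment, together with a trailing `$` end anchor, began as a search-engine extension and is now part of RFC 9309. Python's `urllib.robotparser` implements neither. The sketch below is a minimal matcher that translates a rule into a regular expression. It assumes rule paths are compared against the URL path plus query string and ignores percent-encoding normalization. `rule_matches` is a hypothetical helper name.

```python
import re

def rule_matches(rule_path: str, url_path: str) -> bool:
    """Match a robots.txt rule path against a URL path (plus query).

    '*' matches any run of characters; a trailing '$' anchors the end.
    Without '$', a rule matches any URL path that starts with it.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape every piece literally, then let each '*' stand for ".*".
    regex = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    if anchored:
        regex += r"\Z"
    return re.match(regex, url_path) is not None

print(rule_matches("/*?", "/news?id=3"))         # True: a '?' appears after the leading '/'
print(rule_matches("/*?", "/news"))              # False
print(rule_matches("/*.pdf$", "/docs/a.pdf"))    # True
print(rule_matches("/*.pdf$", "/docs/a.pdf?x"))  # False: '$' requires the URL to end there
```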
xactlycorp.com: # | |
xactlycorp.com: # robots.txt | |
xactlycorp.com: # | |
xactlycorp.com: # This file is to prevent the crawling and indexing of certain parts | |
xactlycorp.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
xactlycorp.com: # and Google. By telling these "robots" where not to go on your site, | |
xactlycorp.com: # you save bandwidth and server resources. | |
xactlycorp.com: # | |
xactlycorp.com: # This file will be ignored unless it is at the root of your host: | |
xactlycorp.com: # Used: http://example.com/robots.txt | |
xactlycorp.com: # Ignored: http://example.com/site/robots.txt | |
xactlycorp.com: # | |
xactlycorp.com: # For more information about the robots.txt standard, see: | |
xactlycorp.com: # http://www.robotstxt.org/robotstxt.html | |
xactlycorp.com: # CSS, JS, Images | |
xactlycorp.com: # Directories | |
xactlycorp.com: # Files | |
xactlycorp.com: # Paths (clean URLs) | |
xactlycorp.com: # Paths (no clean URLs) | |
xactlycorp.com: # Query URLs | |
here.com: # | |
here.com: # robots.txt | |
here.com: # | |
here.com: # This file is to prevent the crawling and indexing of certain parts | |
here.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
here.com: # and Google. By telling these "robots" where not to go on your site, | |
here.com: # you save bandwidth and server resources. | |
here.com: # | |
here.com: # This file will be ignored unless it is at the root of your host: | |
here.com: # Used: http://example.com/robots.txt | |
here.com: # Ignored: http://example.com/site/robots.txt | |
here.com: # | |
here.com: # For more information about the robots.txt standard, see: | |
here.com: # http://www.robotstxt.org/robotstxt.html | |
here.com: # CSS, JS, Images | |
here.com: # Directories | |
here.com: # Files | |
here.com: # Paths (clean URLs) | |
here.com: # Paths (no clean URLs) | |
xiaomi.net: # 2015/12/11 | |
alaskaair.com: #Update 4/16/2019 - 9:27AM SLH | |
alaskaair.com: #Sites | |
alaskaair.com: #REMARK: to allow SE to de-indices url, remove/uncomment after SE de-indexed all these urls using on Accuwork # | |
alaskaair.com: #Disallow: /contents.asp | |
alaskaair.com: #Disallow: /Home.asp | |
alaskaair.com: #Disallow: /mileageplan/AboutMP.asp | |
alaskaair.com: #Disallow: /mileageplan/awardsAustralia.asp | |
alaskaair.com: #Disallow: /mileageplan/awardsSAmerica.asp | |
alaskaair.com: #Disallow: /mileageplan/CustomerComments.asp | |
alaskaair.com: #Disallow: /mileageplan/definitions.asp | |
alaskaair.com: #Disallow: /mileageplan/MemberGuide.asp | |
alaskaair.com: #Disallow: /mileageplan/MileagePartners.asp | |
alaskaair.com: #Disallow: /mileageplan/MileagePartners_Airline.asp | |
alaskaair.com: #Disallow: /mileageplan/MileagePartners_Car.asp | |
alaskaair.com: #Disallow: /mileageplan/MileagePartners_Dining.asp | |
alaskaair.com: #Disallow: /mileageplan/MileagePartners_Financial.asp | |
alaskaair.com: #Disallow: /mileageplan/MileagePartners_Hotel.asp | |
alaskaair.com: #Disallow: /mileageplan/MileagePartners_Specialty.asp | |
alaskaair.com: #Disallow: /mileageplan/MileagePartners_Telecom.asp | |
alaskaair.com: #Disallow: /mileageplan/MPpremiums.asp | |
alaskaair.com: #Disallow: /mileageplan/MVPStatus.asp | |
alaskaair.com: #Disallow: /mileageplan/OnlineAwardChart.asp | |
alaskaair.com: #Disallow: /mileageplan/PartnerMilesOps.asp | |
alaskaair.com: #Disallow: /mileageplan/faqs/Awards.asp | |
alaskaair.com: #Disallow: /mileageplan/faqs/Credit.asp | |
alaskaair.com: #Disallow: /mileageplan/faqs/EStatements.asp | |
alaskaair.com: #Disallow: /mileageplan/faqs/mpfaq.asp | |
alaskaair.com: #Disallow: /mileageplan/faqs/MVP.asp | |
alaskaair.com: #Disallow: /mileageplan/faqs/OtherMP.asp | |
alaskaair.com: #Disallow: /mileageplan/faqs/PDRes.asp | |
alaskaair.com: #Disallow: /mileageplan/faqs/Upgrades.asp | |
alaskaair.com: #Disallow: /mileageplan/ssl/partner/PartnerForm.asp | |
alaskaair.com: #Disallow: /shared/tips/AboutCompanyFares.asp | |
alaskaair.com: #Disallow: /shared/tips/AboutECertTip.asp | |
alaskaair.com: #Disallow: /shared/tips/AboutFareOptions.asp | |
alaskaair.com: #Old Content | |
alaskaair.com: #site core or as.com partial Content | |
alaskaair.com: #PDFs | |
alaskaair.com: #Disallow: /mileageplan/ExpressionOfThanks.pdf | |
alaskaair.com: #images | |
alaskaair.com: #parameters | |
alaskaair.com: #support files | |
alaskaair.com: #web services | |
nos.nl: # www.robotstxt.org/ | |
nos.nl: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
mediapost.com: # robots.txt | |
mediapost.com: # Tell "bitlybot" not to come here at all | |
mediapost.com: # From NYT.com - nobody seems to like this bot | |
mediapost.com: # Crawlers that are kind enough to obey, but which we'd rather not have | |
mediapost.com: # unless they're feeding search engines. | |
mediapost.com: # Some bots are known to be trouble, particularly those designed to copy | |
mediapost.com: # entire sites. Please obey robots.txt. | |
mediapost.com: # | |
mediapost.com: # Sorry, wget in its recursive mode is a frequent problem. | |
mediapost.com: # Please read the man page and use it properly; there is a | |
mediapost.com: # --wait option you can use to set the delay between hits, | |
mediapost.com: # for instance. | |
mediapost.com: # | |
mediapost.com: # | |
mediapost.com: # The 'grub' distributed client has been *very* poorly behaved. | |
mediapost.com: # | |
mediapost.com: # | |
mediapost.com: # Doesn't follow robots.txt anyway, but... | |
mediapost.com: # | |
mediapost.com: # | |
mediapost.com: # Hits many times per second, not acceptable | |
mediapost.com: # http://www.nameprotect.com/botinfo.html | |
mediapost.com: # A capture bot, downloads gazillions of pages with no public benefit | |
mediapost.com: # http://www.webreaper.net/ | |
mediapost.com: # | |
mediapost.com: # Friendly, low-speed bots are welcome viewing pages. | |
mediapost.com: # | |
mediapost.com: # | |
mediapost.com: # GoogleBot | |
mediapost.com: # | |
mediapost.com: # | |
mediapost.com: # MSN Bot listens to Crawl-Delay | |
mediapost.com: # | |
mediapost.com: # | |
mediapost.com: # Yahoo/Inktomi listens to Crawl-Delay | |
mediapost.com: # | |
mediapost.com: #Baiduspider | |
mediapost.com: #Yandex | |
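Several comments above note that particular bots "listen to Crawl-Delay". That directive is not part of RFC 9309, and not every crawler honors it. A crawler that chooses to respect it can read the value with `RobotFileParser.crawl_delay()` from Python's standard library, as sketched below. The rules, the `ExampleBot` name, and the hypothetical `polite_wait` helper with its one-second fallback are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: msnbot
Crawl-delay: 10
Disallow: /search

User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

def polite_wait(agent: str, default_seconds: float = 1.0) -> float:
    """Seconds to pause between requests: the site's Crawl-delay if given, else a default."""
    delay = parser.crawl_delay(agent)  # None when no Crawl-delay applies to this agent
    return float(delay) if delay is not None else default_seconds

print(polite_wait("msnbot"))      # 10.0
print(polite_wait("ExampleBot"))  # 1.0
```

A crawl loop would then call `time.sleep(polite_wait(agent))` between fetches. This is the same idea as the `--wait` option the wget comment recommends.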
alldatasheet.com: # User-agent: * | |
alldatasheet.com: # Disallow: /img | |
alldatasheet.com: # Disallow: /view_distributor2.jsp | |
alldatasheet.com: # Disallow: /manufacturer/companylist.jsp?list=675 | |
alldatasheet.com: # Disallow: /manufacturer/companylist.jsp?list=836 | |
alldatasheet.com: # Disallow: /manufacturer/companylist.jsp?list=929 | |
alldatasheet.com: # Disallow: /manufacturer/companylist.jsp?list=247 | |
alldatasheet.com: # Disallow: /manufacturer/companylist.jsp?list=164 | |
alldatasheet.com: # Disallow: /manufacturer/companylist.jsp?list=411 | |
alldatasheet.com: #User-agent: Googlebot | |
alldatasheet.com: #Disallow: / | |
alldatasheet.com: #Crawl-delay: 1 | |
alldatasheet.com: #User-agent: MSNBot | |
alldatasheet.com: #Disallow: / | |
alldatasheet.com: #Crawl-delay: 20 | |
alldatasheet.com: #User-agent: BaiDuSpider | |
alldatasheet.com: #Disallow: / | |
alldatasheet.com: #Crawl-delay: 20 | |
alldatasheet.com: #User-agent: bingbot | |
alldatasheet.com: #Disallow: / | |
alldatasheet.com: #Crawl-delay: 20 | |
alldatasheet.com: #User-agent: YandexBot | |
alldatasheet.com: #Disallow: / | |
alldatasheet.com: #Crawl-delay: 20 | |
stackblitz.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
moneyhouse.ch: #Prevent bots from crawling IT-Person Profiles to boost the hitlist | |
weedmaps.com: # WP Defaults | |
weedmaps.com: # WP Learn | |
weedmaps.com: # WP Open CA | |
weedmaps.com: # WP Review guidelines | |
weedmaps.com: # WP Sports | |
weedmaps.com: # WP Verified | |
weedmaps.com: # WP Weed Facts | |
kotlinlang.org: # Sitemaps | |
gamepedia.jp: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/ | |
leafly.com: # Modified: 01/12/2021 | |
salary.com: # robots.txt for https://www.salary.com/ | |
wuzzuf.net: #Disallow access to iframes | |
epson.com: # For all robots | |
epson.com: # Block access to specific groups of pages | |
epson.com: # Allow search crawlers to discover the sitemap | |
epson.com: # Block CazoodleBot as it does not present correct accept content headers | |
epson.com: # Block MJ12bot as it is just noise | |
epson.com: # Block dotbot as it cannot parse base urls properly | |
epson.com: # Block Gigabot | |
xoom.com: # Per robots.txt directives, blank lines are not good in robots.txt | |
xoom.com: # Please see www.robotstxt.org and the wikipedia page for proper syntax | |
clinicaltrials.gov: # robots.txt - robot exclusion file - back-end server version - no robots! | |
clinicaltrials.gov: # ======================================================================== | |
vmware.com: # List folders crawlers are not allowed to Index. | |
vmware.com: # List PDFs crawlers are not allowed to Index. | |
foodandwine.com: # Sitemaps | |
foodandwine.com: #Onecms | |
foodandwine.com: # Content | |
foodandwine.com: #Onecms | |
foodandwine.com: # Content | |
estrategiaconcursos.com.br: # Robots.txt file from http://www.estrategiaconcursos.com.br | |
tnt.com: # If you are not a robot sniffing around in this file, | |
tnt.com: # We might be looking for you to join our SEO team | |
tnt.com: # Contact us here: search.advertising@tnt.com | |
tnt.com: # | |
tnt.com: # _____ _ _ _____ | |
tnt.com: # |_ _| \| |_ _| | |
tnt.com: # | | | .` | | | | |
tnt.com: # |_| |_|\_| |_| | |
tnt.com: # | |
nist.gov: # | |
nist.gov: # robots.txt | |
nist.gov: # | |
nist.gov: # This file is to prevent the crawling and indexing of certain parts | |
nist.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
nist.gov: # and Google. By telling these "robots" where not to go on your site, | |
nist.gov: # you save bandwidth and server resources. | |
nist.gov: # | |
nist.gov: # This file will be ignored unless it is at the root of your host: | |
nist.gov: # Used: http://example.com/robots.txt | |
nist.gov: # Ignored: http://example.com/site/robots.txt | |
nist.gov: # | |
nist.gov: # For more information about the robots.txt standard, see: | |
nist.gov: # http://www.robotstxt.org/robotstxt.html | |
nist.gov: # CSS, JS, Images | |
nist.gov: # Directories | |
nist.gov: # Files | |
nist.gov: # Paths (clean URLs) | |
nist.gov: # Paths (no clean URLs) | |
nist.gov: # Noindex files | |
raiffeisen.ch: # Exclude Special Areas | |
raiffeisen.ch: # Exclude Special Microsites | |
raiffeisen.ch: # Exclude Special Filetypes | |
raiffeisen.ch: # Stop Wasting Crawlbudget | |
raiffeisen.ch: # Exclude old casa urls | |
raiffeisen.ch: # Exclude livestream | |
sankei.com: # sitemap | |
sankei.com: # not contents | |
sankei.com: # not crawl target | |
sankei.com: # old contents | |
shatel.ir: # www.robotstxt.org/ | |
shatel.ir: # Allow crawling of all content | |
sociopost.com: # | |
sociopost.com: # robots.txt | |
sociopost.com: # | |
sociopost.com: # This file is to prevent the crawling and indexing of certain parts | |
sociopost.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
sociopost.com: # and Google. By telling these "robots" where not to go on your site, | |
sociopost.com: # you save bandwidth and server resources. | |
sociopost.com: # | |
sociopost.com: # This file will be ignored unless it is at the root of your host: | |
sociopost.com: # Used: http://example.com/robots.txt | |
sociopost.com: # Ignored: http://example.com/site/robots.txt | |
sociopost.com: # | |
sociopost.com: # For more information about the robots.txt standard, see: | |
sociopost.com: # http://www.robotstxt.org/wc/robots.html | |
sociopost.com: # | |
sociopost.com: # For syntax checking, see: | |
sociopost.com: # http://www.sxw.org.uk/computing/robots/check.html | |
sociopost.com: # Directories | |
sociopost.com: # Files | |
sociopost.com: # Paths (clean URLs) | |
sociopost.com: # Paths (no clean URLs) | |
polkastarter.com: # See https://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
electrek.co: # Sitemap archive | |
newchic.com: #Search | |
newchic.com: #Search categories | |
state.ma.us: # | |
state.ma.us: # robots.txt | |
state.ma.us: # | |
state.ma.us: # This file is to prevent the crawling and indexing of certain parts | |
state.ma.us: # of your site by web crawlers and spiders run by sites like Yahoo! | |
state.ma.us: # and Google. By telling these "robots" where not to go on your site, | |
state.ma.us: # you save bandwidth and server resources. | |
state.ma.us: # | |
state.ma.us: # This file will be ignored unless it is at the root of your host: | |
state.ma.us: # Used: http://example.com/robots.txt | |
state.ma.us: # Ignored: http://example.com/site/robots.txt | |
state.ma.us: # | |
state.ma.us: # For more information about the robots.txt standard, see: | |
state.ma.us: # http://www.robotstxt.org/robotstxt.html | |
state.ma.us: # CSS, JS, Images | |
state.ma.us: # Directories | |
state.ma.us: # Files | |
state.ma.us: # Paths (clean URLs) | |
state.ma.us: # Paths (no clean URLs) | |
lucidpress.com: # | |
lucidpress.com: # robots.txt | |
lucidpress.com: # | |
lucidpress.com: # This file is to prevent the crawling and indexing of certain parts | |
lucidpress.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
lucidpress.com: # and Google. By telling these "robots" where not to go on your site, | |
lucidpress.com: # you save bandwidth and server resources. | |
lucidpress.com: # | |
lucidpress.com: # This file will be ignored unless it is at the root of your host: | |
lucidpress.com: # Used: http://example.com/robots.txt | |
lucidpress.com: # Ignored: http://example.com/site/robots.txt | |
lucidpress.com: # | |
lucidpress.com: # For more information about the robots.txt standard, see: | |
lucidpress.com: # http://www.robotstxt.org/wc/robots.html | |
lucidpress.com: # | |
lucidpress.com: # For syntax checking, see: | |
lucidpress.com: # http://www.sxw.org.uk/computing/robots/check.html | |
lucidpress.com: # Directories | |
lucidpress.com: # Paths (no clean URLs) | |
lucidpress.com: ##### | |
lucidpress.com: # Drupal | |
lucidpress.com: ##### | |
lucidpress.com: # Directories | |
lucidpress.com: # Allow some content from /pages/misc | |
lucidpress.com: # Files | |
lucidpress.com: # Paths (clean URLs) | |
lucidpress.com: # Paths (no clean URLs) | |
lucidpress.com: # Rewrites | |
lucidpress.com: ##### | |
lucidpress.com: # Code-Base | |
lucidpress.com: # | |
lucidpress.com: # The following URL's are defined in our routing files, | |
lucidpress.com: # but have no value for indexing. Several of them should | |
lucidpress.com: # definitely NOT be indexed. | |
lucidpress.com: ##### | |
tiava.com: # www.robotstxt.org/ | |
tiava.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
upstream.to: #User-agent: * | |
upstream.to: #Disallow: / | |
thestudentroom.co.uk: #Disallow: /w/ | |
thestudentroom.co.uk: #Disallow: /ads.php | |
thestudentroom.co.uk: #Disallow: /*showthread.php?p= | |
thestudentroom.co.uk: #Disallow: /m/ | |
opensource.com: # | |
opensource.com: # robots.txt | |
opensource.com: # | |
opensource.com: # This file is to prevent the crawling and indexing of certain parts | |
opensource.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
opensource.com: # and Google. By telling these "robots" where not to go on your site, | |
opensource.com: # you save bandwidth and server resources. | |
opensource.com: # | |
opensource.com: # This file will be ignored unless it is at the root of your host: | |
opensource.com: # Used: http://example.com/robots.txt | |
opensource.com: # Ignored: http://example.com/site/robots.txt | |
opensource.com: # | |
opensource.com: # For more information about the robots.txt standard, see: | |
opensource.com: # http://www.robotstxt.org/robotstxt.html | |
opensource.com: # CSS, JS, Images | |
opensource.com: # Directories | |
opensource.com: # Files | |
opensource.com: # Paths (clean URLs) | |
opensource.com: # Paths (no clean URLs) | |
yellowimages.com: # https://megaindex.com/crawler | |
yellowimages.com: # Screaming Frog SEO Spider | |
yellowimages.com: # https://www.screamingfrog.co.uk/seo-spider/ | |
yellowimages.com: # http://webmeup-crawler.com | |
tableau.com: # Directories | |
tableau.com: # Paths (clean URLs) | |
tableau.com: # Paths (no clean URLs) | |
tableau.com: # Tableau | |
tableau.com: # Email only downloads | |
tableau.com: # BingBot can be overzealous, calm down. | |
tableau.com: # Bots without value | |
tableau.com: # re #7789 - temporarily unblock search vendor | |
tableau.com: # User-agent: SemrushBot | |
tableau.com: # Disallow: / | |
gingersoftware.com: # $Id: robots.txt,v 1.9.2.1 2008/12/10 20:12:19 goba Exp $ | |
gingersoftware.com: # | |
gingersoftware.com: # robots.txt | |
gingersoftware.com: # | |
gingersoftware.com: # This file is to prevent the crawling and indexing of certain parts | |
gingersoftware.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
gingersoftware.com: # and Google. By telling these "robots" where not to go on your site, | |
gingersoftware.com: # you save bandwidth and server resources. | |
gingersoftware.com: # | |
gingersoftware.com: # This file will be ignored unless it is at the root of your host: | |
gingersoftware.com: # Used: http://example.com/robots.txt | |
gingersoftware.com: # Ignored: http://example.com/site/robots.txt | |
gingersoftware.com: # | |
gingersoftware.com: # For more information about the robots.txt standard, see: | |
gingersoftware.com: # http://www.robotstxt.org/wc/robots.html | |
gingersoftware.com: # | |
gingersoftware.com: # For syntax checking, see: | |
gingersoftware.com: # http://www.sxw.org.uk/computing/robots/check.html | |
gingersoftware.com: # Directories | |
gingersoftware.com: # Files | |
gingersoftware.com: # Paths (clean URLs) | |
gingersoftware.com: # Paths (no clean URLs) | |
rocketmortgage.com: # password reset | |
assam.gov.in: # | |
assam.gov.in: # robots.txt | |
assam.gov.in: # | |
assam.gov.in: # This file is to prevent the crawling and indexing of certain parts | |
assam.gov.in: # of your site by web crawlers and spiders run by sites like Yahoo! | |
assam.gov.in: # and Google. By telling these "robots" where not to go on your site, | |
assam.gov.in: # you save bandwidth and server resources. | |
assam.gov.in: # | |
assam.gov.in: # This file will be ignored unless it is at the root of your host: | |
assam.gov.in: # Used: http://example.com/robots.txt | |
assam.gov.in: # Ignored: http://example.com/site/robots.txt | |
assam.gov.in: # | |
assam.gov.in: # For more information about the robots.txt standard, see: | |
assam.gov.in: # http://www.robotstxt.org/robotstxt.html | |
assam.gov.in: # CSS, JS, Images | |
assam.gov.in: # Directories | |
assam.gov.in: # Files | |
assam.gov.in: # Paths (clean URLs) | |
assam.gov.in: # Paths (no clean URLs) | |
better.com: # http://www.robotstxt.org | |
coach.com: #4/1/2019 | |
coach.com: #2358 | |
balr.com: # we use Shopify as our ecommerce platform | |
balr.com: # Google adsbot ignores robots.txt unless specifically named! | |
zebra.com: #NS | |
eluniversal.com.mx: # | |
eluniversal.com.mx: # robots.txt | |
eluniversal.com.mx: # | |
eluniversal.com.mx: # This file is to prevent the crawling and indexing of certain parts | |
eluniversal.com.mx: # of your site by web crawlers and spiders run by sites like Yahoo! | |
eluniversal.com.mx: # and Google. By telling these "robots" where not to go on your site, | |
eluniversal.com.mx: # you save bandwidth and server resources. | |
eluniversal.com.mx: # | |
eluniversal.com.mx: # This file will be ignored unless it is at the root of your host: | |
eluniversal.com.mx: # Used: http://example.com/robots.txt | |
eluniversal.com.mx: # Ignored: http://example.com/site/robots.txt | |
eluniversal.com.mx: # | |
eluniversal.com.mx: # For more information about the robots.txt standard, see: | |
eluniversal.com.mx: # http://www.robotstxt.org/robotstxt.html | |
eluniversal.com.mx: # CSS, JS, Images | |
eluniversal.com.mx: # Directories | |
eluniversal.com.mx: # Files | |
eluniversal.com.mx: # Paths (clean URLs) | |
eluniversal.com.mx: # Paths (no clean URLs) | |
ucr.edu: # | |
ucr.edu: # robots.txt | |
ucr.edu: # | |
ucr.edu: # This file is to prevent the crawling and indexing of certain parts | |
ucr.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ucr.edu: # and Google. By telling these "robots" where not to go on your site, | |
ucr.edu: # you save bandwidth and server resources. | |
ucr.edu: # | |
ucr.edu: # This file will be ignored unless it is at the root of your host: | |
ucr.edu: # Used: http://example.com/robots.txt | |
ucr.edu: # Ignored: http://example.com/site/robots.txt | |
ucr.edu: # | |
ucr.edu: # For more information about the robots.txt standard, see: | |
ucr.edu: # http://www.robotstxt.org/robotstxt.html | |
ucr.edu: # CSS, JS, Images | |
ucr.edu: # Directories | |
ucr.edu: # Files | |
ucr.edu: # Paths (clean URLs) | |
ucr.edu: # Paths (no clean URLs) | |
ddengle.com: # robots.txt | |
theadventurechallenge.com: # we use Shopify as our ecommerce platform | |
theadventurechallenge.com: # Google adsbot ignores robots.txt unless specifically named! | |
hkej.com: #User-agent: * | |
hkej.com: #Disallow: /rss/onlinenews.xml | |
hkej.com: #Disallow: /rss/shopping.xml | |
hkej.com: #Disallow: /rss/wine.xml | |
hkej.com: #Disallow: /template/forum/ | |
hkej.com: #Disallow: /template/blog/ | |
hkej.com: #Disallow: /template/xml/ | |
hkej.com: #Disallow: /rss/onlinenews.xml | |
hkej.com: #Disallow: /rss/shopping.xml | |
hkej.com: #Disallow: /rss/wine.xml | |
hkej.com: #Disallow: /template/forum/ | |
hkej.com: #Disallow: /template/blog/ | |
hkej.com: #Disallow: /template/xml/ | |
hkej.com: #Sitemap: http://www.hkej.com/rss/sitemap.xml | |
bloter.net: # This virtual robots.txt file was created by the Virtual Robots.txt WordPress plugin: https://www.wordpress.org/plugins/pc-robotstxt/ | |
worldofwarships.ru: # General | |
worldofwarships.ru: # News | |
worldofwarships.ru: # Media | |
asianwiki.com: # | |
asianwiki.com: # Sorry, wget in its recursive mode is a frequent problem. | |
asianwiki.com: # Please read the man page and use it properly; there is a | |
asianwiki.com: # --wait option you can use to set the delay between hits, | |
asianwiki.com: # for instance. | |
asianwiki.com: # | |
asianwiki.com: # | |
asianwiki.com: # Hits many times per second, not acceptable | |
asianwiki.com: # http://www.nameprotect.com/botinfo.html | |
asianwiki.com: # A capture bot, downloads gazillions of pages with no public benefit | |
asianwiki.com: # http://www.webreaper.net/ | |
asianwiki.com: # Don't allow the wayback-maschine to index user-pages | |
decolar.com: #Robots default PT | |
decolar.com: #Los siguientes 2 son de las nuevas landings de hoteles en destinos | |
decolar.com: #Bloquea las landings de hoteles en pais | |
decolar.com: #Los siguientes 3 son por la estructura de urls de mobile | |
decolar.com: #Allow para hoteles search | |
decolar.com: #Bloquea paginas de hoteles version para imprimir | |
decolar.com: #Paquetes | |
decolar.com: #Actividades | |
decolar.com: #Buses | |
decolar.com: #Transfer | |
decolar.com: #Special clients | |
decolar.com: #Bloqueos especificos para el bot de QualityScore de SEM | |
decolar.com: #Bloqueo Baidu: | |
decolar.com: #multidestinos | |
decolar.com: #hermes | |
rokomari.com: # If you operate a search engine and would like to crawl Rokomari.Com, please | |
rokomari.com: # email us admin@rokomari.com . Thanks. | |
rokomari.com: #Disallow: /static/* # Allowed at 2019-11-07_11-17 by the request of Shougat Hossain - Fahad Ahammed | |
rokomari.com: # Jhokomari Blocked by Shougat Vai | |
bhg.com: #Sitemaps | |
bhg.com: # ONECMS | |
bhg.com: # Content | |
bhg.com: # ONECMS | |
bhg.com: # Content - allows syndication | |
518.com.tw: # Robots.txt file from https://www.518.com.tw | |
usa.gov: # | |
usa.gov: # robots.txt | |
usa.gov: # | |
usa.gov: # This file is to prevent the crawling and indexing of certain parts | |
usa.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
usa.gov: # and Google. By telling these "robots" where not to go on your site, | |
usa.gov: # you save bandwidth and server resources. | |
usa.gov: # | |
usa.gov: # This file will be ignored unless it is at the root of your host: | |
usa.gov: # Used: http://example.com/robots.txt | |
usa.gov: # Ignored: http://example.com/site/robots.txt | |
usa.gov: # | |
usa.gov: # For more information about the robots.txt standard, see: | |
usa.gov: # http://www.robotstxt.org/robotstxt.html | |
usa.gov: # Sitemaps | |
usa.gov: # CSS, JS, Images | |
usa.gov: # Directories | |
usa.gov: # Files | |
usa.gov: # Paths (clean URLs) | |
usa.gov: # Paths (no clean URLs) | |
usa.gov: # Specific pathing blocks for USA.gov and GobiernoUSA.gov | |
apple.com.cn: # robots.txt for https://www.apple.com.cn/ | |
specialized.com: # For all robots | |
specialized.com: # Block access to specific groups of pages | |
specialized.com: # Block access to URL filters | |
specialized.com: # Allow search crawlers to discover the sitemap | |
specialized.com: # Block CazoodleBot as it does not present correct accept content headers | |
specialized.com: # Block MJ12bot as it is just noise | |
specialized.com: # Block dotbot as it cannot parse base urls properly | |
specialized.com: # Block Gigabot | |
avvocatoandreani.it: # | |
avvocatoandreani.it: # robots.txt v. 1.02 | |
avvocatoandreani.it: # | |
umontreal.ca: # Accepter l'indexation | |
thread.com: # The following are temporary disallows to exclude noindex'ed pages from scraping | |
thread.com: # before they are hit. In the long term we should add nofollow to links to any of | |
thread.com: # pages, which requires efficent computation of whether the linked page is | |
thread.com: # indexable. | |
thread.com: # Updating these? You should also update filterset_indexable(). | |
thread.com: # Filters: | |
thread.com: # Other: | |
thread.com: # Item sources which are being indexed for an unknown reason (they redirect). | |
thread.com: # Temporary until we switch to query parameters. | |
radiojavan.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
somuch.com: # Robots need homes too | |
reforma.com: # Primero deben especificarse los Allow y despues los Disallow | |
reforma.com: # Para el caso de que twitter pueda llegar al ImageTransformer | |
reforma.com: # robots.txt es case sensitive. Un url en bajas es difernte a un URL en altas | |
mubasher.info: # Integration | |
mubasher.info: # Static resources | |
mubasher.info: # Chart Files | |
mubasher.info: # API | |
mubasher.info: # # # # | |
mubasher.info: # MIX ORM URLs FI | |
mubasher.info: # MIX ORM URLs IPOS | |
olx.uz: # sitecode:olxuz-desktop | |
wallstreetoasis.com: # | |
wallstreetoasis.com: # robots.txt | |
wallstreetoasis.com: # | |
wallstreetoasis.com: # This file is to prevent the crawling and indexing of certain parts | |
wallstreetoasis.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
wallstreetoasis.com: # and Google. By telling these "robots" where not to go on your site, | |
wallstreetoasis.com: # you save bandwidth and server resources. | |
wallstreetoasis.com: # | |
wallstreetoasis.com: # This file will be ignored unless it is at the root of your host: | |
wallstreetoasis.com: # Used: http://example.com/robots.txt | |
wallstreetoasis.com: # Ignored: http://example.com/site/robots.txt | |
wallstreetoasis.com: # | |
wallstreetoasis.com: # For more information about the robots.txt standard, see: | |
wallstreetoasis.com: # http://www.robotstxt.org/robotstxt.html | |
wallstreetoasis.com: # | |
wallstreetoasis.com: # For syntax checking, see: | |
wallstreetoasis.com: # http://www.frobee.com/robots-txt-check | |
wallstreetoasis.com: # CSS, JS, Images | |
wallstreetoasis.com: # Directories | |
wallstreetoasis.com: # Files | |
wallstreetoasis.com: # Paths (clean URLs) | |
wallstreetoasis.com: # Paths (no clean URLs) | |
wallstreetoasis.com: # No access for table sorting paths or any paths that have parameters. | |
wallstreetoasis.com: # No access for quicktabs in the URL | |
wallstreetoasis.com: # Disallow URLs with destination parameter | |
wallstreetoasis.com: # Added by Khalid, 2012-02-27 per Patrick's request | |
wallstreetoasis.com: # Added by Khalid, 2012-04-02 per Joao's request | |
wallstreetoasis.com: # Added by Vitor, 2012-05-02 per Joao's request | |
wallstreetoasis.com: # Added by Vitor, 2012-05-02 per Khalid's request | |
wallstreetoasis.com: # Added by Vitor, 2012-05-16 | |
wallstreetoasis.com: # Added by Vitor, 2012-05-30 | |
wallstreetoasis.com: # Added by Vitor, 2012-09-04 per Joao's request | |
wallstreetoasis.com: # Added by Joao, 2013-02-21 | |
wallstreetoasis.com: # Added by Joao, 2013-03-12 | |
wallstreetoasis.com: #Disallow: /company/*/review | |
wallstreetoasis.com: #Disallow: /?q=company/*/review | |
wallstreetoasis.com: # comments by jgsantos on 2016-02-11 | |
wallstreetoasis.com: #Disallow: /company/*/interview | |
wallstreetoasis.com: #Disallow: /?q=company/*/interview | |
wallstreetoasis.com: #Disallow: /company/*/compensation | |
wallstreetoasis.com: #Disallow: /?q=company/*/compensation | |
wallstreetoasis.com: #Allows added by jgsantos 2017-10-19 | |
wallstreetoasis.com: #Allow: /company/*/interview/compensation | |
wallstreetoasis.com: #Allow: /?q=company/*/interview/compensation | |
wallstreetoasis.com: #Allow: /company/*/interview/review | |
wallstreetoasis.com: #Allow: /?q=company/*/interview/review | |
wallstreetoasis.com: # Added by Joao, 2016-09-02 | |
wallstreetoasis.com: #Added by jgsantos, 2017-10-23 | |
wallstreetoasis.com: # Crawl Delay To Slow Down Rogerbot | |
wallstreetoasis.com: # @see: https://moz.com/help/moz-procedures/crawlers/rogerbot#crawl-delay-to-slow-down-rogerbot | |
yamaha.com: # robots.txt | |
thebump.com: # Updated on 11/16/17 | |
ilfattoquotidiano.it: # BEGIN XML-SITEMAP-PLUGIN | |
ilfattoquotidiano.it: # END XML-SITEMAP-PLUGIN | |
dekiru.net: # robots.txt for https://dekiru.net/ | |
freepeople.com: # Sitemap indexes | |
cimanow.tv: # Google Image | |
cimanow.tv: # Google AdSense | |
cimanow.tv: # digg mirror | |
cimanow.tv: # global | |
bigstockphoto.com: #Domain: www.bigstockphoto.com | |
aileensoul.com: #Disallow: /business-profile | |
aileensoul.com: #Disallow: */profile | |
stgeorge.com.au: # /robots.txt file for http://www.stgeorge.com.au/ | |
getadsonline.com: # Blocks robots from specific folders / directories | |
ntvbd.com: # | |
ntvbd.com: # robots.txt | |
ntvbd.com: # | |
ntvbd.com: # This file is to prevent the crawling and indexing of certain parts | |
ntvbd.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ntvbd.com: # and Google. By telling these "robots" where not to go on your site, | |
ntvbd.com: # you save bandwidth and server resources. | |
ntvbd.com: # | |
ntvbd.com: # This file will be ignored unless it is at the root of your host: | |
ntvbd.com: # Used: http://example.com/robots.txt | |
ntvbd.com: # Ignored: http://example.com/site/robots.txt | |
ntvbd.com: # | |
ntvbd.com: # For more information about the robots.txt standard, see: | |
ntvbd.com: # http://www.robotstxt.org/robotstxt.html | |
ntvbd.com: # CSS, JS, Images | |
ntvbd.com: # Directories | |
ntvbd.com: # Files | |
ntvbd.com: # Paths (clean URLs) | |
ntvbd.com: # Paths (no clean URLs) | |
findlaw.com: # Findlaw robots.txt file | |
uoregon.edu: # | |
uoregon.edu: # robots.txt | |
uoregon.edu: # | |
uoregon.edu: # This file is to prevent the crawling and indexing of certain parts | |
uoregon.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
uoregon.edu: # and Google. By telling these "robots" where not to go on your site, | |
uoregon.edu: # you save bandwidth and server resources. | |
uoregon.edu: # | |
uoregon.edu: # This file will be ignored unless it is at the root of your host: | |
uoregon.edu: # Used: http://example.com/robots.txt | |
uoregon.edu: # Ignored: http://example.com/site/robots.txt | |
uoregon.edu: # | |
uoregon.edu: # For more information about the robots.txt standard, see: | |
uoregon.edu: # http://www.robotstxt.org/robotstxt.html | |
uoregon.edu: # CSS, JS, Images | |
uoregon.edu: # Directories | |
uoregon.edu: # Files | |
uoregon.edu: # Paths (clean URLs) | |
uoregon.edu: # Paths (no clean URLs) | |
gutenberg.org: # User-agent: Baiduspider | |
gutenberg.org: # Disallow: / | |
gutenberg.org: # User-agent: Yandex | |
gutenberg.org: # Disallow: / | |
google.com.cu: # AdsBot | |
google.com.cu: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
miansai.com: # we use Shopify as our ecommerce platform | |
miansai.com: # Google adsbot ignores robots.txt unless specifically named! | |
zoopla.co.uk: # ___ ___ ___ ___ ___ | |
zoopla.co.uk: # / /\ / /\ / /\ / /\ / /\ | |
zoopla.co.uk: # / /::| / /::\ / /::\ / /::\ / /::\ | |
zoopla.co.uk: # / /:/:| / /:/\:\ / /:/\:\ / /:/\:\ ___ ___ / /:/\:\ | |
zoopla.co.uk: # / /:/|:|__ / /:/ \:\ / /:/ \:\ / /:/~/:/ /__/\ / /\ / /:/~/::\ | |
zoopla.co.uk: #/__/:/ |:| /\ /__/:/ \__\:\ /__/:/ \__\:\ /__/:/ /:/ \ \:\ / /:/ /__/:/ /:/\:\ | |
zoopla.co.uk: #\__\/ |:|/:/ \ \:\ / /:/ \ \:\ / /:/ \ \:\/:/ \ \:\ /:/ \ \:\/:/__\/ | |
zoopla.co.uk: # | |:/:/ \ \:\ /:/ \ \:\ /:/ \ \::/ \ \:\/:/ \ \::/ | |
zoopla.co.uk: # | |::/ \ \:\/:/ \ \:\/:/ \ \:\ \ \::/ \ \:\ | |
zoopla.co.uk: # | |:/ \ \::/ \ \::/ \ \:\ \__\/ \ \:\ | |
zoopla.co.uk: # |__|/ \__\/ \__\/ \__\/ \__\/ | |
zoopla.co.uk: # Disallow: /property/location/edit/* | |
zoopla.co.uk: # Disallow: /property/edit/ | |
zoopla.co.uk: # Baidu restricted to for sale and new homes | |
zoopla.co.uk: # Slurp (Still slurping why?) | |
zoopla.co.uk: # Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php) | |
zoopla.co.uk: # blocked as they are making incorrect requests | |
zoopla.co.uk: # Let Google Ads crawl everything | |
fastspring.com: # robotstxt.org/ | |
ebc.com.br: # | |
ebc.com.br: # robots.txt | |
ebc.com.br: # | |
ebc.com.br: # This file is to prevent the crawling and indexing of certain parts | |
ebc.com.br: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ebc.com.br: # and Google. By telling these "robots" where not to go on your site, | |
ebc.com.br: # you save bandwidth and server resources. | |
ebc.com.br: # | |
ebc.com.br: # This file will be ignored unless it is at the root of your host: | |
ebc.com.br: # Used: http://example.com/robots.txt | |
ebc.com.br: # Ignored: http://example.com/site/robots.txt | |
ebc.com.br: # | |
ebc.com.br: # For more information about the robots.txt standard, see: | |
ebc.com.br: # http://www.robotstxt.org/robotstxt.html | |
ebc.com.br: # CSS, JS, Images | |
ebc.com.br: # Directories | |
ebc.com.br: # Files | |
ebc.com.br: # Paths (clean URLs) | |
ebc.com.br: # Paths (no clean URLs) | |
trysnow.com: # we use Shopify as our ecommerce platform | |
trysnow.com: # Google adsbot ignores robots.txt unless specifically named! | |
mavenlink.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
mavenlink.com: # | |
mavenlink.com: # To ban all spiders from the entire site uncomment the next two lines: | |
mavenlink.com: # User-Agent: * | |
mavenlink.com: # Disallow: / | |
elte.hu: # www.robotstxt.org/ | |
elte.hu: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
wordtracker.com: # Robots file | |
verizonmedia.com: # http://www.robotstxt.org/robotstxt.html | |
verizonmedia.com: # TODO: disallow till we go to production | |
aphrodite1994.com: # Blocking CMS and other Directories | |
aphrodite1994.com: # Paths (clean URLs) | |
aphrodite1994.com: #Stop crawling user account and checkout pages by search engine robot | |
aphrodite1994.com: #Blocking native catalog and search pages: | |
aphrodite1994.com: # Files | |
aphrodite1994.com: # Do not index pages that are sorted or filtered. | |
aphrodite1994.com: # Do not index session ID | |
aphrodite1994.com: #Disallow: /*? | |
aphrodite1994.com: # CVS, SVN directory and dump files | |
aphrodite1994.com: #Webmasters block pages with filters.. | |
aphrodite1994.com: #Host: aphrodite1994.com | |
aphrodite1994.com: #Sitemap: https://www.aphrodite1994.com/sitemaps/sitemap.xml | |
mercadolibre.com.ec: #siteId: MEC | |
mercadolibre.com.ec: #country: ecuador | |
mercadolibre.com.ec: ##Block - Referidos | |
mercadolibre.com.ec: ##Block - siteinfo urls | |
mercadolibre.com.ec: ##Block - Cart | |
mercadolibre.com.ec: ##Block Checkout | |
mercadolibre.com.ec: ##Block - User Logged | |
mercadolibre.com.ec: #Shipping selector | |
mercadolibre.com.ec: ##Block - last search | |
mercadolibre.com.ec: ## Block - Profile - By Id | |
mercadolibre.com.ec: ## Block - Profile - By Id and role (old version) | |
mercadolibre.com.ec: ## Block - Profile - Leg. Req. | |
mercadolibre.com.ec: ##Block - noindex | |
mercadolibre.com.ec: # Mercado-Puntos | |
mercadolibre.com.ec: # Viejo mundo | |
mercadolibre.com.ec: ##Block recommendations listing | |
91wii.com: # | |
91wii.com: # robots.txt for Discuz! X3 | |
91wii.com: # | |
pptok.com: # | |
pptok.com: # robots.txt for EmpireCMS | |
pptok.com: # | |
splice.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
splice.com: # | |
splice.com: # To ban all spiders from the entire site uncomment the next two lines: | |
splice.com: # User-agent: * | |
splice.com: # Disallow: / | |
yummly.com: # | |
yummly.com: # Misbehaving bot | |
yummly.com: # | |
longdo.com: # | |
longdo.com: # robots.txt | |
longdo.com: # | |
longdo.com: # This file is to prevent the crawling and indexing of certain parts | |
longdo.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
longdo.com: # and Google. By telling these "robots" where not to go on your site, | |
longdo.com: # you save bandwidth and server resources. | |
longdo.com: # | |
longdo.com: # This file will be ignored unless it is at the root of your host: | |
longdo.com: # Used: http://example.com/robots.txt | |
longdo.com: # Ignored: http://example.com/site/robots.txt | |
longdo.com: # | |
longdo.com: # For more information about the robots.txt standard, see: | |
longdo.com: # http://www.robotstxt.org/robotstxt.html | |
longdo.com: # | |
longdo.com: # For syntax checking, see: | |
longdo.com: # http://www.frobee.com/robots-txt-check | |
longdo.com: # Directories | |
longdo.com: # Files | |
longdo.com: # Paths (clean URLs) | |
longdo.com: # Paths (no clean URLs) | |
neighborwebsj.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/ | |
bark.com: # | |
bark.com: # Bark is built by a small team based in London. | |
bark.com: # We're always looking for clever people. | |
bark.com: # If you'd like to help us out email team@bark.com | |
bark.com: # WOOF! | |
bark.com: # ."";._ _.---._ _.-"". | |
bark.com: # /_.'_ '-' /`-` \_ \ | |
bark.com: # .' / `\ \ /` \ '. | |
bark.com: # .' / ; _ _ '-; \ ;'. | |
bark.com: # _.' ; /\ / \ \ \ ; '._;._ | |
bark.com: # .-'.--. | / | \0|0/ \ | '-. | |
bark.com: # / /` \ | / .' \ | .---. \ | |
bark.com: # | | | / /--' .-"""-. \ \/ \ | | |
bark.com: # \ \ / / / ( , , ) /\ \ | / | |
bark.com: # \ '----' .' | '-(_)-' | | '. / / | |
bark.com: # `'----'` | '. | `'----'` | |
bark.com: # \ `/ | |
bark.com: # '. , .' | |
bark.com: # `-.____.' '.____.-' | |
bark.com: # \ / | |
bark.com: # '-' | |
bark.com: # | |
espn.com.mx: # robots.txt for deportes | |
bradsdeals.com: # SEO/SEM Competitor Tool Bot Block | |
bradsdeals.com: # Bots that obey Robots.txt block | |
bradsdeals.com: # Original robots disallows | |
bradsdeals.com: # Special Merchant Requests | |
bradsdeals.com: #Sitemap | |
technipages.com: # Google Image | |
technipages.com: # Google AdSense | |
technipages.com: # global | |
ideal.es: ## Sitemaps ## | |
ideal.es: ## User Agents ## | |
ideal.es: #redi14 # | |
ideal.es: #mob # | |
ideal.es: #temp # | |
zbporn.com: # Block AhrefsBot | |
reference.com: ## Reference robots.txt | |
madarsho.com: # | |
madarsho.com: # robots.txt | |
madarsho.com: # | |
madarsho.com: # This file is to prevent the crawling and indexing of certain parts | |
madarsho.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
madarsho.com: # and Google. By telling these "robots" where not to go on your site, | |
madarsho.com: # you save bandwidth and server resources. | |
madarsho.com: # | |
madarsho.com: # This file will be ignored unless it is at the root of your host: | |
madarsho.com: # Used: http://example.com/robots.txt | |
madarsho.com: # Ignored: http://example.com/site/robots.txt | |
madarsho.com: # | |
madarsho.com: # For more information about the robots.txt standard, see: | |
madarsho.com: # http://www.robotstxt.org/robotstxt.html | |
madarsho.com: # CSS, JS, Images | |
madarsho.com: # Directories | |
madarsho.com: # Files | |
madarsho.com: # Paths (clean URLs) | |
madarsho.com: # Paths (no clean URLs) | |
thetoc.gr: #Disallow: /Api/* | |
thetoc.gr: #Disallow: /api/* | |
thetoc.gr: #Disallow: /Search* | |
thetoc.gr: #Disallow: /search* | |
caracoltv.com: # Bloqueo de URL | |
caracoltv.com: # Agentes nocivos conocidos | |
hover.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
hover.com: # | |
hover.com: # To ban all spiders from the entire site uncomment the next two lines: | |
hover.com: # User-Agent: * | |
hover.com: # Disallow: / | |
hover.com: # Block search engines from the order thankyou pages | |
hover.com: # Block search engines from the welcome landers | |
ciudad.com.ar: # | |
ciudad.com.ar: # robots.txt | |
ciudad.com.ar: # | |
ciudad.com.ar: # This file is to prevent the crawling and indexing of certain parts | |
ciudad.com.ar: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ciudad.com.ar: # and Google. By telling these "robots" where not to go on your site, | |
ciudad.com.ar: # you save bandwidth and server resources. | |
ciudad.com.ar: # | |
ciudad.com.ar: # This file will be ignored unless it is at the root of your host: | |
ciudad.com.ar: # Used: http://example.com/robots.txt | |
ciudad.com.ar: # Ignored: http://example.com/site/robots.txt | |
ciudad.com.ar: # | |
ciudad.com.ar: # For more information about the robots.txt standard, see: | |
ciudad.com.ar: # http://www.robotstxt.org/robotstxt.html | |
ciudad.com.ar: # CSS, JS, Images | |
ciudad.com.ar: # Directories | |
ciudad.com.ar: # Files | |
ciudad.com.ar: # Paths (clean URLs) | |
ciudad.com.ar: # Paths (no clean URLs) | |
prlog.org: # Please keep 10 seconds between requests | |
mercurynews.com: # Sitemap archive | |
0818tuan.com: # | |
0818tuan.com: # robots.txt for EmpireCMS | |
0818tuan.com: # | |
mclabels.com: # we use Shopify as our ecommerce platform | |
mclabels.com: # Google adsbot ignores robots.txt unless specifically named! | |
xatakandroid.com: # Crawlers that are kind enough to obey, but which we'd rather not have | |
xatakandroid.com: # unless they're feeding search engines. | |
xatakandroid.com: # Some bots are known to be trouble, particularly those designed to copy | |
xatakandroid.com: # entire sites. Please obey robots.txt. | |
xatakandroid.com: # Sorry, wget in its recursive mode is a frequent problem. | |
xatakandroid.com: # Please read the man page and use it properly; there is a | |
xatakandroid.com: # --wait option you can use to set the delay between hits, | |
xatakandroid.com: # for instance. | |
xatakandroid.com: # | |
xatakandroid.com: # | |
xatakandroid.com: # The 'grub' distributed client has been *very* poorly behaved. | |
xatakandroid.com: # | |
xatakandroid.com: # | |
xatakandroid.com: # Doesn't follow robots.txt anyway, but... | |
xatakandroid.com: # | |
xatakandroid.com: # | |
xatakandroid.com: # Hits many times per second, not acceptable | |
xatakandroid.com: # http://www.nameprotect.com/botinfo.html | |
xatakandroid.com: # A capture bot, downloads gazillions of pages with no public benefit | |
xatakandroid.com: # http://www.webreaper.net/ | |
itsmyurls.com: # Rule 1 | |
itsmyurls.com: # Rule 2 | |
mercatoday.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/ | |
ktmdainik.com: #Google Search Engine Robot | |
bullmarketbrokers.com: # Sitemap | |
public.gr: # allow crawlers plus delaying each successive spider request | |
public.gr: #Crawl-delay: 10 | |
public.gr: #Disallow: /assets/ | |
public.gr: # Disallow all ../?parentCategoryID=cat... | |
public.gr: #Disallow: *parentCategoryId* | |
public.gr: # Sitemap files | |
public.gr: #Specific URLs | |
public.gr: # Blocking bad link checker robots | |
gifi.fr: # https://www.robotstxt.org/robotstxt.html | |
instyle.com: # Sitemaps | |
instyle.com: # CMS FE | |
instyle.com: #OCMS | |
instyle.com: #content | |
instyle.com: # CMS FE | |
instyle.com: #OCMS | |
instyle.com: #content | |
hardees.qa: # Search Pages | |
hardees.qa: # Cart Pages | |
hardees.qa: # User Pages | |
hardees.qa: # Other Pages | |
hardees.qa: # Misc Pages | |
jisho.org: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
jisho.org: # | |
jisho.org: # To ban all spiders from the entire site uncomment the next two lines: | |
jisho.org: #User-Agent: * | |
jisho.org: #Disallow: / | |
phonearena.com: # www.robotstxt.org/ | |
phonearena.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
phonearena.com: # | |
buzzorange.com: # www.robotstxt.org/ | |
buzzorange.com: # Allow crawling of all content | |
buzzorange.com: # Directories | |
buzzorange.com: # Files | |
moovitapp.com: # Block crawling of the web trip planner | |
youtubekids.com: # robots.txt file for YouTube Kids | |
cineca.it: # | |
cineca.it: # robots.txt | |
cineca.it: # | |
cineca.it: # This file is to prevent the crawling and indexing of certain parts | |
cineca.it: # of your site by web crawlers and spiders run by sites like Yahoo! | |
cineca.it: # and Google. By telling these "robots" where not to go on your site, | |
cineca.it: # you save bandwidth and server resources. | |
cineca.it: # | |
cineca.it: # This file will be ignored unless it is at the root of your host: | |
cineca.it: # Used: http://example.com/robots.txt | |
cineca.it: # Ignored: http://example.com/site/robots.txt | |
cineca.it: # | |
cineca.it: # For more information about the robots.txt standard, see: | |
cineca.it: # http://www.robotstxt.org/robotstxt.html | |
cineca.it: # CSS, JS, Images | |
cineca.it: # Directories | |
cineca.it: # Files | |
cineca.it: # Paths (clean URLs) | |
cineca.it: # Paths (no clean URLs) | |
qiita.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
ubagroup.com: # robots.txt for https://www.ubagroup.com/ | |
webcasts.com: #content:before{content:"768";position:absolute;overflow:hidden;opacity:0;visibility:hidden;}@media (max-width:768px){.single.ast-separate-container .ast-author-meta{padding:1.5em 2.14em;}.single .ast-author-meta .post-author-avatar{margin-bottom:1em;}.ast-separate-container .ast-grid-2 .ast-article-post,.ast-separate-container .ast-grid-3 .ast-article-post,.ast-separate-container .ast-grid-4 .ast-article-post{width:100%;}.blog-layout-1 .post-content,.blog-layout-1 .ast-blog-featured-section{float:none;}.ast-separate-container .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .square .posted-on{margin-top:0;}.ast-separate-container .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on{margin-top:1em;}.ast-separate-container .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-content .ast-blog-featured-section:first-child .post-thumb-img-content{margin-top:-1.5em;}.ast-separate-container .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content{margin-left:-2.14em;margin-right:-2.14em;}.ast-separate-container .ast-article-single.remove-featured-img-padding .single-layout-1 .entry-header .post-thumb-img-content:first-child{margin-top:-1.5em;}.ast-separate-container .ast-article-single.remove-featured-img-padding .single-layout-1 .post-thumb-img-content{margin-left:-2.14em;margin-right:-2.14em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on,.ast-separate-container.ast-blog-grid-4 
.ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on{margin-left:-1.5em;margin-right:-1.5em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on{margin-left:-0.5em;margin-right:-0.5em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .square .posted-on,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .square .posted-on,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .square .posted-on{margin-top:0;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on{margin-top:1em;}.ast-separate-container.ast-blog-grid-2 
.ast-article-post.remove-featured-img-padding .blog-layout-1 .post-content .ast-blog-featured-section:first-child .post-thumb-img-content,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-content .ast-blog-featured-section:first-child .post-thumb-img-content,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-content .ast-blog-featured-section:first-child .post-thumb-img-content{margin-top:-1.5em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content{margin-left:-1.5em;margin-right:-1.5em;}.blog-layout-2{display:flex;flex-direction:column-reverse;}.ast-separate-container .blog-layout-3,.ast-separate-container .blog-layout-1{display:block;}.ast-plain-container .ast-grid-2 .ast-article-post,.ast-plain-container .ast-grid-3 .ast-article-post,.ast-plain-container .ast-grid-4 .ast-article-post,.ast-page-builder-template .ast-grid-2 .ast-article-post,.ast-page-builder-template .ast-grid-3 .ast-article-post,.ast-page-builder-template .ast-grid-4 .ast-article-post{width:100%;}}@media (max-width:768px){.ast-separate-container .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on{margin-top:0;margin-left:-2.14em;}.ast-separate-container .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on{margin-top:0;margin-left:-1.14em;}}@media (min-width:769px){.single .ast-author-meta .ast-author-details{display:flex;}.ast-separate-container.ast-blog-grid-2 
.ast-archive-description,.ast-separate-container.ast-blog-grid-3 .ast-archive-description,.ast-separate-container.ast-blog-grid-4 .ast-archive-description{margin-bottom:1.33333em;}.blog-layout-2.ast-no-thumb .post-content,.blog-layout-3.ast-no-thumb .post-content{width:calc(100% - 5.714285714em);}.blog-layout-2.ast-no-thumb.ast-no-date-box .post-content,.blog-layout-3.ast-no-thumb.ast-no-date-box .post-content{width:100%;}.ast-separate-container .ast-grid-2 .ast-article-post.ast-separate-posts,.ast-separate-container .ast-grid-3 .ast-article-post.ast-separate-posts,.ast-separate-container .ast-grid-4 .ast-article-post.ast-separate-posts{border-bottom:0;}.ast-separate-container .ast-grid-2 > .site-main > .ast-row,.ast-separate-container .ast-grid-3 > .site-main > .ast-row,.ast-separate-container .ast-grid-4 > .site-main > .ast-row{margin-left:-1em;margin-right:-1em;display:flex;flex-flow:row wrap;align-items:stretch;}.ast-separate-container .ast-grid-2 > .site-main > .ast-row:before,.ast-separate-container .ast-grid-2 > .site-main > .ast-row:after,.ast-separate-container .ast-grid-3 > .site-main > .ast-row:before,.ast-separate-container .ast-grid-3 > .site-main > .ast-row:after,.ast-separate-container .ast-grid-4 > .site-main > .ast-row:before,.ast-separate-container .ast-grid-4 > .site-main > .ast-row:after{flex-basis:0;width:0;}.ast-separate-container .ast-grid-2 .ast-article-post,.ast-separate-container .ast-grid-3 .ast-article-post,.ast-separate-container .ast-grid-4 .ast-article-post{display:flex;padding:0;}.ast-plain-container .ast-grid-2 > .site-main > .ast-row,.ast-plain-container .ast-grid-3 > .site-main > .ast-row,.ast-plain-container .ast-grid-4 > .site-main > .ast-row,.ast-page-builder-template .ast-grid-2 > .site-main > .ast-row,.ast-page-builder-template .ast-grid-3 > .site-main > .ast-row,.ast-page-builder-template .ast-grid-4 > .site-main > .ast-row{margin-left:-1em;margin-right:-1em;display:flex;flex-flow:row 
wrap;align-items:stretch;}.ast-plain-container .ast-grid-2 > .site-main > .ast-row:before,.ast-plain-container .ast-grid-2 > .site-main > .ast-row:after,.ast-plain-container .ast-grid-3 > .site-main > .ast-row:before,.ast-plain-container .ast-grid-3 > .site-main > .ast-row:after,.ast-plain-container .ast-grid-4 > .site-main > .ast-row:before,.ast-plain-container .ast-grid-4 > .site-main > .ast-row:after,.ast-page-builder-template .ast-grid-2 > .site-main > .ast-row:before,.ast-page-builder-template .ast-grid-2 > .site-main > .ast-row:after,.ast-page-builder-template .ast-grid-3 > .site-main > .ast-row:before,.ast-page-builder-template .ast-grid-3 > .site-main > .ast-row:after,.ast-page-builder-template .ast-grid-4 > .site-main > .ast-row:before,.ast-page-builder-template .ast-grid-4 > .site-main > .ast-row:after{flex-basis:0;width:0;}.ast-plain-container .ast-grid-2 .ast-article-post,.ast-plain-container .ast-grid-3 .ast-article-post,.ast-plain-container .ast-grid-4 .ast-article-post,.ast-page-builder-template .ast-grid-2 .ast-article-post,.ast-page-builder-template .ast-grid-3 .ast-article-post,.ast-page-builder-template .ast-grid-4 .ast-article-post{display:flex;}.ast-plain-container .ast-grid-2 .ast-article-post:last-child,.ast-plain-container .ast-grid-3 .ast-article-post:last-child,.ast-plain-container .ast-grid-4 .ast-article-post:last-child,.ast-page-builder-template .ast-grid-2 .ast-article-post:last-child,.ast-page-builder-template .ast-grid-3 .ast-article-post:last-child,.ast-page-builder-template .ast-grid-4 .ast-article-post:last-child{margin-bottom:2.5em;}}@media (min-width:769px){.single .post-author-avatar,.single .post-author-bio{float:left;clear:right;}.single .ast-author-meta .post-author-avatar{margin-right:1.33333em;}.single .ast-author-meta .about-author-title-wrapper,.single .ast-author-meta .post-author-bio{text-align:left;}.blog-layout-2 .post-content{padding-right:2em;}.blog-layout-2.ast-no-date-box.ast-no-thumb 
.post-content{padding-right:0;}.blog-layout-3 .post-content{padding-left:2em;}.blog-layout-3.ast-no-date-box.ast-no-thumb .post-content{padding-left:0;}.ast-separate-container .ast-grid-2 .ast-article-post.ast-separate-posts:nth-child(2n+0),.ast-separate-container .ast-grid-2 .ast-article-post.ast-separate-posts:nth-child(2n+1),.ast-separate-container .ast-grid-3 .ast-article-post.ast-separate-posts:nth-child(2n+0),.ast-separate-container .ast-grid-3 .ast-article-post.ast-separate-posts:nth-child(2n+1),.ast-separate-container .ast-grid-4 .ast-article-post.ast-separate-posts:nth-child(2n+0),.ast-separate-container .ast-grid-4 .ast-article-post.ast-separate-posts:nth-child(2n+1){padding:0 1em 0;}}@media (max-width:544px){.ast-separate-container .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on{margin-top:0.5em;}.ast-separate-container .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content,.ast-separate-container .ast-article-single.remove-featured-img-padding .single-layout-1 .post-thumb-img-content,.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on{margin-left:-1em;margin-right:-1em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on,.ast-separate-container.ast-blog-grid-3 
.ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on{margin-left:-0.5em;margin-right:-0.5em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on{margin-top:0.5em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-content .ast-blog-featured-section:first-child .post-thumb-img-content,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-content .ast-blog-featured-section:first-child .post-thumb-img-content,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-content .ast-blog-featured-section:first-child .post-thumb-img-content{margin-top:-1.33333em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content{margin-left:-1em;margin-right:-1em;}.ast-separate-container .ast-grid-2 .ast-article-post 
.blog-layout-1,.ast-separate-container .ast-grid-2 .ast-article-post .blog-layout-2,.ast-separate-container .ast-grid-2 .ast-article-post .blog-layout-3{padding:1.33333em 1em;}.ast-separate-container .ast-grid-3 .ast-article-post .blog-layout-1,.ast-separate-container .ast-grid-4 .ast-article-post .blog-layout-1{padding:1.33333em 1em;}.single.ast-separate-container .ast-author-meta{padding:1.5em 1em;}}@media (max-width:544px){.ast-separate-container .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on{margin-left:-1em;}.ast-separate-container .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on{margin-left:-0.5em;}}.ast-article-post .ast-date-meta .posted-on,.ast-article-post .ast-date-meta .posted-on *{background:#00609c;color:#ffffff;}.ast-article-post .ast-date-meta .posted-on .date-month,.ast-article-post .ast-date-meta .posted-on .date-year{color:#ffffff;}.ast-load-more:hover{color:#ffffff;border-color:#00609c;background-color:#00609c;}.ast-loader > div{background-color:#00609c;}.ast-small-footer{color:#bbbbbd;}.ast-small-footer a{color:#77777b;}.ast-small-footer a:hover{color:#00609c;}.ast-separate-container .blog-layout-1,.ast-separate-container .blog-layout-2,.ast-separate-container .blog-layout-3{background-color:transparent;background-image:none;}.ast-separate-container .ast-article-post{background-color:#ffffff;;}.ast-separate-container .ast-article-single,.ast-separate-container .comment-respond,.ast-separate-container .ast-comment-list li,.ast-separate-container .ast-woocommerce-container,.ast-separate-container .error-404,.ast-separate-container .no-results,.single.ast-separate-container .ast-author-meta,.ast-separate-container .related-posts-title-wrapper,.ast-separate-container.ast-two-container #secondary .widget,.ast-separate-container 
.comments-count-wrapper,.ast-box-layout.ast-plain-container .site-content,.ast-padded-layout.ast-plain-container .site-content{background-color:#ffffff;;}.footer-adv .widget-title,.footer-adv .widget-title a{color:#434758;}.footer-adv{color:#434758;}.footer-adv a{color:#434758;}.footer-adv .tagcloud a:hover,.footer-adv .tagcloud a.current-item{border-color:#434758;background-color:#434758;}.footer-adv a:hover,.footer-adv .no-widget-text a:hover,.footer-adv a:focus,.footer-adv .no-widget-text a:focus{color:#00609c;}.footer-adv .calendar_wrap #today,.footer-adv a:hover + .post-count{background-color:#434758;}.footer-adv .widget-title,.footer-adv .widget-title a.rsswidget,.ast-no-widget-row .widget-title{font-family:'Open Sans',sans-serif;text-transform:inherit;}.footer-adv .widget > *:not(.widget-title){font-family:'Open Sans',sans-serif;}.footer-adv .tagcloud a:hover,.footer-adv .tagcloud a.current-item{color:#ffffff;}.footer-adv .calendar_wrap #today{color:#ffffff;}@media (min-width:769px){.ast-container{max-width:1240px;}}@media (min-width:993px){.ast-container{max-width:1240px;}}@media (min-width:1201px){.ast-container{max-width:1240px;}}.ast-separate-container .ast-article-post,.ast-separate-container .ast-article-single,.ast-separate-container .ast-comment-list li.depth-1,.ast-separate-container .comment-respond,.single.ast-separate-container .ast-author-details,.ast-separate-container .ast-related-posts-wrap,.ast-separate-container .ast-woocommerce-container{padding-top:0px;padding-bottom:0px;}.ast-separate-container .ast-article-post,.ast-separate-container .ast-article-single,.ast-separate-container .comments-count-wrapper,.ast-separate-container .ast-comment-list li.depth-1,.ast-separate-container .comment-respond,.ast-separate-container .related-posts-title-wrapper,.ast-separate-container .related-posts-title-wrapper,.single.ast-separate-container .ast-author-details,.single.ast-separate-container .about-author-title-wrapper,.ast-separate-container 
.ast-related-posts-wrap,.ast-separate-container .ast-woocommerce-container{padding-right:0px;padding-left:0px;}.ast-separate-container.ast-right-sidebar #primary,.ast-separate-container.ast-left-sidebar #primary,.ast-separate-container #primary,.ast-plain-container #primary{margin-top:0px;margin-bottom:0px;}.ast-left-sidebar #primary,.ast-right-sidebar #primary,.ast-separate-container.ast-right-sidebar #primary,.ast-separate-container.ast-left-sidebar #primary,.ast-separate-container #primary{padding-left:0px;padding-right:0px;}.ast-no-sidebar.ast-separate-container .entry-content .alignfull{margin-right:-0px;margin-left:-0px;}@media (max-width:768px){.ast-separate-container .ast-article-post,.ast-separate-container .ast-article-single,.ast-separate-container .ast-comment-list li.depth-1,.ast-separate-container .comment-respond,.single.ast-separate-container .ast-author-details,.ast-separate-container .ast-related-posts-wrap,.ast-separate-container .ast-woocommerce-container{padding-top:1.5em;padding-bottom:1.5em;}.ast-separate-container .ast-article-post,.ast-separate-container .ast-article-single,.ast-separate-container .comments-count-wrapper,.ast-separate-container .ast-comment-list li.depth-1,.ast-separate-container .comment-respond,.ast-separate-container .related-posts-title-wrapper,.ast-separate-container .related-posts-title-wrapper,.single.ast-separate-container .ast-author-details,.single.ast-separate-container .about-author-title-wrapper,.ast-separate-container .ast-related-posts-wrap,.ast-separate-container .ast-woocommerce-container{padding-right:2.14em;padding-left:2.14em;}.ast-separate-container.ast-right-sidebar #primary,.ast-separate-container.ast-left-sidebar #primary,.ast-separate-container #primary,.ast-plain-container #primary{margin-top:1.5em;margin-bottom:1.5em;}.ast-left-sidebar #primary,.ast-right-sidebar #primary,.ast-separate-container.ast-right-sidebar #primary,.ast-separate-container.ast-left-sidebar #primary,.ast-separate-container 
#primary{padding-left:0em;padding-right:0em;}.ast-no-sidebar.ast-separate-container .entry-content .alignfull{margin-right:-2.14em;margin-left:-2.14em;}}@media (max-width:544px){.ast-separate-container .ast-article-post,.ast-separate-container .ast-article-single,.ast-separate-container .ast-comment-list li.depth-1,.ast-separate-container .comment-respond,.single.ast-separate-container .ast-author-details,.ast-separate-container .ast-related-posts-wrap,.ast-separate-container .ast-woocommerce-container{padding-top:1.5em;padding-bottom:1.5em;}.ast-separate-container .ast-article-post,.ast-separate-container .ast-article-single,.ast-separate-container .comments-count-wrapper,.ast-separate-container .ast-comment-list li.depth-1,.ast-separate-container .comment-respond,.ast-separate-container .related-posts-title-wrapper,.ast-separate-container .related-posts-title-wrapper,.single.ast-separate-container .ast-author-details,.single.ast-separate-container .about-author-title-wrapper,.ast-separate-container .ast-related-posts-wrap,.ast-separate-container .ast-woocommerce-container{padding-right:1em;padding-left:1em;}.ast-no-sidebar.ast-separate-container .entry-content .alignfull{margin-right:-1em;margin-left:-1em;}}@media (max-width:768px){.ast-header-break-point .main-header-bar .main-header-bar-navigation .menu-item-has-children > .ast-menu-toggle{top:0px;right:calc( 20px - 0.907em );}.ast-flyout-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .main-header-menu > .menu-item-has-children > .ast-menu-toggle{right:calc( 20px - 0.907em );}}@media (max-width:544px){.ast-header-break-point .header-main-layout-2 .site-branding,.ast-header-break-point .ast-mobile-header-stack .ast-mobile-menu-buttons{padding-bottom:0;}}@media (max-width:768px){.ast-separate-container.ast-two-container #secondary .widget,.ast-separate-container #secondary .widget{margin-bottom:1.5em;}}.ast-separate-container #primary{padding-top:0;}@media 
(max-width:768px){.ast-separate-container #primary{padding-top:0;}}.ast-separate-container #primary{padding-bottom:0;}@media (max-width:768px){.ast-separate-container #primary{padding-bottom:0;}}.ast-default-menu-enable.ast-main-header-nav-open.ast-header-break-point .main-header-bar,.ast-main-header-nav-open .main-header-bar{padding-bottom:0;}.ast-fullscreen-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .main-header-menu > .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .sub-menu .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;}.ast-fullscreen-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;}.ast-fullscreen-below-menu-enable.ast-header-break-point .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-below-menu-enable.ast-header-break-point .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-below-menu-enable.ast-header-break-point .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu 
.menu-item.menu-item-has-children > .menu-link{padding-right:0;}.ast-fullscreen-below-menu-enable.ast-header-break-point .ast-below-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-fullscreen-below-menu-enable.ast-header-break-point .ast-below-header-menu-items .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-below-menu-enable .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{right:0;}.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-above-menu-enable.ast-header-break-point .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-above-menu-enable.ast-header-break-point .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;}.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-menu-items .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-above-menu-enable .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{right:0;}@media (max-width:768px){.main-header-bar,.ast-header-break-point .main-header-bar,.ast-header-break-point .header-main-layout-2 .main-header-bar{padding-top:1.5em;padding-bottom:1.5em;}.ast-default-menu-enable.ast-main-header-nav-open.ast-header-break-point .main-header-bar,.ast-main-header-nav-open .main-header-bar{padding-bottom:0;}.main-navigation ul .menu-item .menu-link,.ast-header-break-point .main-navigation ul .menu-item .menu-link,.ast-header-break-point 
li.ast-masthead-custom-menu-items,li.ast-masthead-custom-menu-items{padding-top:0px;padding-right:20px;padding-bottom:0px;padding-left:20px;}.ast-fullscreen-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .main-header-menu > .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-flyout-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .menu-item-has-children > .ast-menu-toggle{top:0px;}.ast-desktop .main-navigation .ast-mm-template-content,.ast-desktop .main-navigation .ast-mm-custom-content,.ast-desktop .main-navigation .ast-mm-custom-text-content,.main-navigation .sub-menu .menu-item .menu-link,.ast-header-break-point .main-navigation .sub-menu .menu-item .menu-link{padding-top:0px;padding-right:0;padding-bottom:0px;padding-left:30px;}.ast-header-break-point .main-navigation .sub-menu .menu-item .menu-item .menu-link{padding-left:calc( 30px + 10px );}.ast-header-break-point .main-navigation .sub-menu .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 30px + 20px );}.ast-header-break-point .main-navigation .sub-menu .menu-item .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 30px + 30px );}.ast-header-break-point .main-navigation .sub-menu .menu-item .menu-item .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 30px + 40px );}.ast-header-break-point .main-header-bar .main-header-bar-navigation .sub-menu .menu-item-has-children > .ast-menu-toggle{top:0px;right:calc( 20px - 0.907em );}.ast-fullscreen-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .sub-menu .menu-item-has-children > .ast-menu-toggle{margin-right:20px;right:0;}.ast-flyout-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .sub-menu .menu-item-has-children > .ast-menu-toggle{right:calc( 20px - 0.907em );}.ast-flyout-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .menu-item-has-children .sub-menu 
.ast-menu-toggle{top:0px;}.ast-fullscreen-menu-enable.ast-header-break-point .main-navigation .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-menu-enable.ast-header-break-point .main-navigation .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-menu-enable.ast-header-break-point .main-navigation .sub-menu .menu-item.menu-item-has-children > .menu-link{padding-top:0px;padding-bottom:0px;padding-left:30px;}.ast-fullscreen-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;padding-top:0px;padding-bottom:0px;padding-left:30px;}.ast-fullscreen-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;padding-top:0px;padding-bottom:0px;padding-left:30px;}.ast-fullscreen-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-link,.ast-fullscreen-menu-enable.ast-header-break-point .ast-header-break-point .ast-below-header-actual-nav .sub-menu .menu-item .menu-link,.ast-fullscreen-menu-enable.ast-header-break-point .ast-below-header-navigation .sub-menu .menu-item .menu-link,.ast-fullscreen-menu-enable.ast-header-break-point .ast-below-header-menu-items .sub-menu .menu-item .menu-link,.ast-fullscreen-menu-enable.ast-header-break-point .main-navigation .sub-menu .menu-item 
.menu-link{padding-top:0px;padding-bottom:0px;padding-left:30px;}.ast-below-header,.ast-header-break-point .ast-below-header{padding-top:1em;padding-bottom:1em;}.ast-below-header-menu .menu-link,.below-header-nav-padding-support .below-header-section-1 .below-header-menu > .menu-item > .menu-link,.below-header-nav-padding-support .below-header-section-2 .below-header-menu > .menu-item > .menu-link,.ast-header-break-point .ast-below-header-actual-nav > .ast-below-header-menu > .menu-item > .menu-link{padding-top:0px;padding-right:20px;padding-bottom:0px;padding-left:20px;}.ast-desktop .ast-below-header-menu .ast-mm-template-content,.ast-desktop .ast-below-header-menu .ast-mm-custom-text-content,.ast-below-header-menu .sub-menu .menu-link,.ast-header-break-point .ast-below-header-actual-nav .sub-menu .menu-item .menu-link{padding-top:0px;padding-right:20px;padding-bottom:0px;padding-left:20px;}.ast-header-break-point .ast-below-header-actual-nav .sub-menu .menu-item .menu-item .menu-link,.ast-header-break-point .ast-below-header-menu-items .sub-menu .menu-item .menu-item .menu-link{padding-left:calc( 20px + 10px );}.ast-header-break-point .ast-below-header-actual-nav .sub-menu .menu-item .menu-item .menu-item .menu-link,.ast-header-break-point .ast-below-header-menu-items .sub-menu .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 20px + 20px );}.ast-header-break-point .ast-below-header-actual-nav .sub-menu .menu-item .menu-item .menu-item .menu-item .menu-link,.ast-header-break-point .ast-below-header-menu-items .sub-menu .menu-item .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 20px + 30px );}.ast-header-break-point .ast-below-header-actual-nav .sub-menu .menu-item .menu-item .menu-item .menu-item .menu-item .menu-link,.ast-header-break-point .ast-below-header-menu-items .sub-menu .menu-item .menu-item .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 20px + 40px );}.ast-default-below-menu-enable.ast-header-break-point 
.ast-below-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-default-below-menu-enable.ast-header-break-point .ast-below-header-menu-items .menu-item-has-children > .ast-menu-toggle,.ast-flyout-below-menu-enable.ast-header-break-point .ast-below-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-flyout-below-menu-enable.ast-header-break-point .ast-below-header-menu-items .menu-item-has-children > .ast-menu-toggle{top:0px;right:calc( 20px - 0.907em );}.ast-default-below-menu-enable .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle,.ast-flyout-below-menu-enable .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{top:0px;right:calc( 20px - 0.907em );}.ast-fullscreen-below-menu-enable.ast-header-break-point .ast-below-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-fullscreen-below-menu-enable.ast-header-break-point .ast-below-header-menu-items .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-below-menu-enable .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{right:0;}.ast-above-header{padding-top:0px;padding-bottom:0px;}.ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu > .menu-item > .menu-link,.ast-header-break-point .ast-above-header-enabled .ast-above-header-menu > .menu-item:first-child > .menu-link,.ast-header-break-point .ast-above-header-enabled .ast-above-header-menu > .menu-item:last-child > .menu-link{padding-top:0px;padding-right:20px;padding-bottom:0px;padding-left:20px;}.ast-header-break-point .ast-above-header-navigation > ul > .menu-item-has-children > .ast-menu-toggle{top:0px;}.ast-desktop .ast-above-header-navigation .ast-mm-custom-text-content,.ast-desktop .ast-above-header-navigation 
.ast-mm-template-content,.ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item .sub-menu .menu-link,.ast-header-break-point .ast-above-header-enabled .ast-above-header-menu .menu-item .sub-menu .menu-link,.ast-above-header-enabled .ast-above-header-menu > .menu-item:first-child .sub-menu .menu-item .menu-link{padding-top:0px;padding-right:20px;padding-bottom:0px;padding-left:20px;}.ast-header-break-point .ast-above-header-enabled .ast-above-header-menu .menu-item .sub-menu .menu-item .menu-link{padding-left:calc( 20px + 10px );}.ast-header-break-point .ast-above-header-enabled .ast-above-header-menu .menu-item .sub-menu .menu-item .menu-item .menu-link{padding-left:calc( 20px + 20px );}.ast-header-break-point .ast-above-header-enabled .ast-above-header-menu .menu-item .sub-menu .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 20px + 30px );}.ast-header-break-point .ast-above-header-enabled .ast-above-header-menu .menu-item .sub-menu .menu-item .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 20px + 40px );}.ast-default-above-menu-enable.ast-header-break-point .ast-above-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-default-above-menu-enable.ast-header-break-point .ast-above-header-menu-items .menu-item-has-children > .ast-menu-toggle,.ast-flyout-above-menu-enable.ast-header-break-point .ast-above-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-flyout-above-menu-enable.ast-header-break-point .ast-above-header-menu-items .menu-item-has-children > .ast-menu-toggle{top:0px;right:calc( 20px - 0.907em );}.ast-default-above-menu-enable .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle,.ast-flyout-above-menu-enable .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{top:0px;right:calc( 20px - 0.907em 
);}.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-menu-items .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-above-menu-enable .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{margin-right:20px;right:0;}.ast-footer-overlay{padding-top:2em;padding-bottom:2em;}.ast-small-footer .nav-menu a,.footer-sml-layout-2 .ast-small-footer-section-1 .menu-item a,.footer-sml-layout-2 .ast-small-footer-section-2 .menu-item a{padding-top:0em;padding-right:.5em;padding-bottom:0em;padding-left:.5em;}}@media (max-width:544px){.main-header-bar,.ast-header-break-point .main-header-bar,.ast-header-break-point .header-main-layout-2 .main-header-bar,.ast-header-break-point .ast-mobile-header-stack .main-header-bar{padding-top:1em;padding-bottom:1em;}.ast-default-menu-enable.ast-main-header-nav-open.ast-header-break-point .main-header-bar,.ast-main-header-nav-open .main-header-bar{padding-bottom:0;}.ast-fullscreen-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .main-header-menu > .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-desktop .main-navigation .ast-mm-template-content,.ast-desktop .main-navigation .ast-mm-custom-content,.ast-desktop .main-navigation .ast-mm-custom-text-content,.main-navigation .sub-menu .menu-item .menu-link,.ast-header-break-point .main-navigation .sub-menu .menu-item .menu-link{padding-right:0;}.ast-fullscreen-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .sub-menu .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu 
.menu-item.menu-item-has-children > .menu-link,.ast-flyout-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;}.ast-fullscreen-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;}.ast-fullscreen-below-menu-enable.ast-header-break-point .ast-below-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-fullscreen-below-menu-enable.ast-header-break-point .ast-below-header-menu-items .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-below-menu-enable .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{right:0;}.ast-above-header{padding-top:0.5em;}.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-above-menu-enable.ast-header-break-point .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-above-menu-enable.ast-header-break-point .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;}.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-menu-items .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-above-menu-enable .ast-above-header-enabled .ast-above-header-navigation 
.ast-above-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{right:0;}}@media (max-width:544px){.ast-header-break-point .header-main-layout-2 .site-branding,.ast-header-break-point .ast-mobile-header-stack .ast-mobile-menu-buttons{padding-bottom:0;}}.site-title,.site-title a{font-family:'Roboto Slab',serif;text-transform:inherit;}.site-header .site-description{text-transform:inherit;}.secondary .widget-title{font-family:'Roboto Slab',serif;text-transform:inherit;}.secondary .widget > *:not(.widget-title){font-family:'Open Sans',sans-serif;}.ast-single-post .entry-title,.page-title{font-family:'Roboto Slab',serif;text-transform:inherit;}.ast-archive-description .ast-archive-title{font-family:'Roboto Slab',serif;text-transform:inherit;}.blog .entry-title,.blog .entry-title a,.archive .entry-title,.archive .entry-title a,.search .entry-title,.search .entry-title a {font-family:'Roboto Slab',serif;text-transform:inherit;}h1,.entry-content h1{font-weight:300;font-family:'Roboto Slab',serif;text-transform:inherit;}h2,.entry-content h2{font-weight:300;font-family:'Roboto Slab',serif;text-transform:inherit;}h3,.entry-content h3{font-weight:300;font-family:'Roboto Slab',serif;text-transform:inherit;}h4,.entry-content h4{font-weight:300;font-family:'Roboto Slab',serif;text-transform:inherit;}h5,.entry-content h5{font-weight:300;font-family:'Roboto Slab',serif;text-transform:inherit;}h6,.entry-content h6{font-family:'Roboto Slab',serif;text-transform:inherit;}.ast-desktop .ast-mega-menu-enabled.ast-below-header-menu .menu-item .menu-link:hover,.ast-desktop .ast-mega-menu-enabled.ast-below-header-menu .menu-item .menu-link:focus{background-color:#575757;}.ast-desktop .ast-below-header-navigation .astra-megamenu-li .menu-item .menu-link:hover,.ast-desktop .ast-below-header-navigation .astra-megamenu-li .menu-item .menu-link:focus{color:#ffffff;}.ast-above-header-menu .astra-full-megamenu-wrapper{box-shadow:0 5px 20px 
rgba(0,0,0,0.06);}.ast-above-header-menu .astra-full-megamenu-wrapper .sub-menu,.ast-above-header-menu .astra-megamenu .sub-menu{box-shadow:none;}.ast-below-header-menu.ast-mega-menu-enabled.submenu-with-border .astra-full-megamenu-wrapper{border-color:#ffffff;}.ast-below-header-menu .astra-full-megamenu-wrapper{box-shadow:0 5px 20px rgba(0,0,0,0.06);}.ast-below-header-menu .astra-full-megamenu-wrapper .sub-menu,.ast-below-header-menu .astra-megamenu .sub-menu{box-shadow:none;}.ast-desktop .main-header-menu.submenu-with-border .astra-megamenu,.ast-desktop .main-header-menu.ast-mega-menu-enabled.submenu-with-border .astra-full-megamenu-wrapper{border-top-width:2px;border-left-width:0px;border-right-width:0px;border-bottom-width:0px;border-style:solid;}.ast-desktop .ast-mega-menu-enabled.main-header-menu .menu-item-heading > .menu-link{font-weight:700;font-size:1.1em;}.ast-desktop .ast-above-header .submenu-with-border .astra-full-megamenu-wrapper{border-top-width:2px;border-left-width:0px;border-right-width:0px;border-bottom-width:0px;border-style:solid;}.ast-desktop .ast-below-header .submenu-with-border .astra-full-megamenu-wrapper{border-top-width:2px;border-left-width:0px;border-right-width:0px;border-bottom-width:0px;border-style:solid;}.ast-advanced-headers-different-logo .advanced-header-logo,.ast-header-break-point .ast-has-mobile-header-logo .advanced-header-logo{display:inline-block;}.ast-header-break-point.ast-advanced-headers-different-logo .ast-has-mobile-header-logo .ast-mobile-header-logo{display:none;}.ast-advanced-headers-layout{width:100%;}.ast-header-break-point .ast-advanced-headers-parallax{background-attachment:fixed;} | |
elearning.edu.sa: # | |
elearning.edu.sa: # robots.txt | |
elearning.edu.sa: # | |
elearning.edu.sa: # This file is to prevent the crawling and indexing of certain parts | |
elearning.edu.sa: # of your site by web crawlers and spiders run by sites like Yahoo! | |
elearning.edu.sa: # and Google. By telling these "robots" where not to go on your site, | |
elearning.edu.sa: # you save bandwidth and server resources. | |
elearning.edu.sa: # | |
elearning.edu.sa: # This file will be ignored unless it is at the root of your host: | |
elearning.edu.sa: # Used: http://example.com/robots.txt | |
elearning.edu.sa: # Ignored: http://example.com/site/robots.txt | |
elearning.edu.sa: # | |
elearning.edu.sa: # For more information about the robots.txt standard, see: | |
elearning.edu.sa: # http://www.robotstxt.org/robotstxt.html | |
elearning.edu.sa: # CSS, JS, Images | |
elearning.edu.sa: # Directories | |
elearning.edu.sa: # Files | |
elearning.edu.sa: # Paths (clean URLs) | |
elearning.edu.sa: # Paths (no clean URLs) | |
sensibull.com: # www.robotstxt.org/ | |
sensibull.com: # Allow crawling of all content | |
finder.com.au: # Prevent crawling searches. | |
finder.com.au: # https://finder.atlassian.net/browse/CWS-452 | |
finder.com.au: # Prevent crawling additional searches. | |
finder.com.au: # https://finder.atlassian.net/browse/CWS-497 | |
finder.com.au: # Allow Google AdsBot to crawl anything | |
finder.com.au: # Allow Twitterbot to crawl anything | |
finder.com.au: # https://finder.atlassian.net/browse/OPS-915 | |
finder.com.au: # Don't crawl twitter links with encoded URLs | |
finder.com.au: # https://finder.atlassian.net/browse/FD-5667 | |
finder.com.au: # NBN Tracker | |
finder.com.au: # https://finder.atlassian.net/browse/FD-6467 | |
finder.com.au: # https://finder.atlassian.net/browse/GXUSR-37 | |
finder.com.au: # Crawl image versions | |
finder.com.au: # https://finder.atlassian.net/browse/FD-7310 | |
finder.com.au: # https://finder.atlassian.net/browse/FD-9630 | |
finder.com.au: # https://finder.atlassian.net/browse/GXFO-34 | |
finder.com.au: # https://finder.atlassian.net/browse/OPS-498 | |
finder.com.au: # https://finder.atlassian.net/browse/PROJ-174 | |
finder.com.au: # https://www.deepcrawl.com/bot/ | |
finder.com.au: # Block pages from appearing in Google News | |
finder.com.au: # Main sitemap | |
javfree.me: # XML Sitemap & Google News version 5.0.6 - https://status301.net/wordpress-plugins/xml-sitemap-feed/ | |
iubenda.com: # Url space used for testing | |
iubenda.com: # Disallow PP generator | |
iubenda.com: # Disallow misc folders | |
iubenda.com: # Following urls cause exception if crawled by bots | |
iubenda.com: # CS configurator | |
iubenda.com: # Google Image | |
iubenda.com: # Google AdSense | |
iubenda.com: # Sitemap: https://www.iubenda.com/sitemap_index.xml.gz | |
iubenda.com: ## Various exclusions | |
iubenda.com: # Trovacigusto | |
iubenda.com: # BravoReisen | |
iubenda.com: # U4PET | |
iubenda.com: # IDSCAN | |
iubenda.com: # www.omc2diesel.it | |
crnobelo.com: # If the Joomla site is installed within a folder such as at | |
crnobelo.com: # e.g. www.example.com/joomla/ the robots.txt file MUST be | |
crnobelo.com: # moved to the site root at e.g. www.example.com/robots.txt | |
crnobelo.com: # AND the joomla folder name MUST be prefixed to the disallowed | |
crnobelo.com: # path, e.g. the Disallow rule for the /administrator/ folder | |
crnobelo.com: # MUST be changed to read Disallow: /joomla/administrator/ | |
crnobelo.com: # | |
crnobelo.com: # For more information about the robots.txt standard, see: | |
crnobelo.com: # http://www.robotstxt.org/orig.html | |
crnobelo.com: # | |
crnobelo.com: # For syntax checking, see: | |
crnobelo.com: # http://tool.motoricerca.info/robots-checker.phtml | |
estadao.com.br: # Support directories | |
estadao.com.br: # Sitemaps | |
estadao.com.br: #User-agent: Googlebot-News | |
estadao.com.br: #Disallow: / | |
delhigovt.nic.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
delhigovt.nic.in: #content{margin:0 0 0 2%;position:relative;} | |
calpoly.edu: # | |
calpoly.edu: # robots.txt | |
calpoly.edu: # | |
calpoly.edu: # This file is to prevent the crawling and indexing of certain parts | |
calpoly.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
calpoly.edu: # and Google. By telling these "robots" where not to go on your site, | |
calpoly.edu: # you save bandwidth and server resources. | |
calpoly.edu: # | |
calpoly.edu: # This file will be ignored unless it is at the root of your host: | |
calpoly.edu: # Used: http://example.com/robots.txt | |
calpoly.edu: # Ignored: http://example.com/site/robots.txt | |
calpoly.edu: # | |
calpoly.edu: # For more information about the robots.txt standard, see: | |
calpoly.edu: # http://www.robotstxt.org/robotstxt.html | |
calpoly.edu: # CSS, JS, Images | |
calpoly.edu: # Directories | |
calpoly.edu: # Files | |
calpoly.edu: # Paths (clean URLs) | |
calpoly.edu: # Paths (no clean URLs) | |
adam4adam.com: # www.robotstxt.org/ | |
adam4adam.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
porn300.com: # www.robotstxt.org/ | |
porn300.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
etnews.com: # Grapeshot settings # | |
efka.gov.gr: # | |
efka.gov.gr: # robots.txt | |
efka.gov.gr: # | |
efka.gov.gr: # This file is to prevent the crawling and indexing of certain parts | |
efka.gov.gr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
efka.gov.gr: # and Google. By telling these "robots" where not to go on your site, | |
efka.gov.gr: # you save bandwidth and server resources. | |
efka.gov.gr: # | |
efka.gov.gr: # This file will be ignored unless it is at the root of your host: | |
efka.gov.gr: # Used: http://example.com/robots.txt | |
efka.gov.gr: # Ignored: http://example.com/site/robots.txt | |
efka.gov.gr: # | |
efka.gov.gr: # For more information about the robots.txt standard, see: | |
efka.gov.gr: # http://www.robotstxt.org/robotstxt.html | |
efka.gov.gr: # CSS, JS, Images | |
efka.gov.gr: # Directories | |
efka.gov.gr: # Files | |
efka.gov.gr: # Paths (clean URLs) | |
efka.gov.gr: # Paths (no clean URLs) | |
suratmunicipal.gov.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
suratmunicipal.gov.in: #content{margin:0 0 0 2%;position:relative;} | |
managebuilding.com: #mktoForm_2388 .mktoButton:hover { | |
managebuilding.com: #mktoForm_2388 .mktoRadioList { | |
managebuilding.com: #mktoForm_2388 label#LblmarketingField3 { | |
managebuilding.com: #mktoForm_2388 label#LblmktoRadio_29854_0, #LblmktoRadio_29854_1 { | |
managebuilding.com: #mkto-form-wrapper #mktoForm_1295 .mktoRequiredField .mktoLabel:after { | |
managebuilding.com: #mktoForm_2031 #LblzoomEventDay { | |
managebuilding.com: #industry-report span.mktoButtonWrap.mktoInset { | |
managebuilding.com: #industry-report .mktoButton { | |
managebuilding.com: #mktoForm_2198 .mktoFormRow .mktoFormCol:nth-child(2) { | |
managebuilding.com: #mktoForm_2198 :-ms-input-placeholder { | |
managebuilding.com: #mc-embedded-subscribe-form .mktoLabel { | |
managebuilding.com: #ot-ccpa-banner { | |
managebuilding.com: #ot-ccpa-banner .ot-ccpa-icon { | |
managebuilding.com: #ot-ccpa-banner .ot-ccpa-icon img{ | |
bni.co.id: # Begin robots.txt file | |
bni.co.id: #/-----------------------------------------------\ | |
bni.co.id: #| In single portal/domain situations, uncomment the sitemap line and enter the domain name | |
bni.co.id: #\-----------------------------------------------/ | |
bni.co.id: #Sitemap: http://www.DomainNamehere.com/sitemap.aspx | |
bni.co.id: # End of robots.txt file | |
ldlc.com: # Section blocking | |
ldlc.com: # Agent blocking | |
daniweb.com: # DaniWeb | |
daniweb.com: # DaniWeb Connect | |
daniweb.com: # Legacy | |
genbeta.com: # | |
genbeta.com: # robots.txt | |
genbeta.com: # | |
genbeta.com: # Crawlers that are kind enough to obey, but which we'd rather not have | |
genbeta.com: # unless they're feeding search engines. | |
genbeta.com: # Some bots are known to be trouble, particularly those designed to copy | |
genbeta.com: # entire sites. Please obey robots.txt. | |
genbeta.com: # Sorry, wget in its recursive mode is a frequent problem. | |
genbeta.com: # Please read the man page and use it properly; there is a | |
genbeta.com: # --wait option you can use to set the delay between hits, | |
genbeta.com: # for instance. | |
genbeta.com: # | |
genbeta.com: # | |
genbeta.com: # The 'grub' distributed client has been *very* poorly behaved. | |
genbeta.com: # | |
genbeta.com: # | |
genbeta.com: # Doesn't follow robots.txt anyway, but... | |
genbeta.com: # | |
genbeta.com: # | |
genbeta.com: # Hits many times per second, not acceptable | |
genbeta.com: # http://www.nameprotect.com/botinfo.html | |
genbeta.com: # A capture bot, downloads gazillions of pages with no public benefit | |
genbeta.com: # http://www.webreaper.net/ | |
rd.com: # This virtual robots.txt file was created by the Virtual Robots.txt WordPress plugin: https://www.wordpress.org/plugins/pc-robotstxt/ | |
tutpub.com: # 1) this filename (robots.txt) must stay lowercase | |
tutpub.com: # 2) this file must be in the servers root directory | |
tutpub.com: # ex: http://www.mydomain.com/pliklisubfolder/ -- you must move the robots.txt from | |
tutpub.com: # /pliklisubfolder/ to the root folder for http://www.mydomain.com/ | |
tutpub.com: # you must then add your subfolder to each 'Disallow' below | |
tutpub.com: # ex: Disallow: /cache/ becomes Disallow: /pliklisubfolder/cache/ | |
u-bordeaux.fr: # Handled by rewrite rule, based on the domain. | |
mygov.in: # | |
mygov.in: # robots.txt | |
mygov.in: # | |
mygov.in: # This file is to prevent the crawling and indexing of certain parts | |
mygov.in: # of your site by web crawlers and spiders run by sites like Yahoo! | |
mygov.in: # and Google. By telling these "robots" where not to go on your site, | |
mygov.in: # you save bandwidth and server resources. | |
mygov.in: # | |
mygov.in: # Images | |
mygov.in: # Directories | |
mygov.in: # Directories without slash | |
mygov.in: # Files | |
mygov.in: # Files without slash | |
mygov.in: # Files | |
mygov.in: # Paths (no clean URLs) | |
mygov.in: # Paths (no clean URLs without slash) | |
cratejoy.com: # | |
cratejoy.com: # robots.txt | |
cratejoy.com: # | |
uwo.ca: # robots.txt for http://www.uwo.ca/ | |
uwo.ca: # | |
uwo.ca: # Inktomi's web robot will obey the first record in the robots.txt file with a User-Agent containing "UWO-InktomiSearch". | |
uwo.ca: # If there is no such record, it will obey the first entry with a User-Agent of "*". | |
uwo.ca: # Because nothing is disallowed, everything is allowed | |
uwo.ca: # specifies that no robots should visit | |
uwo.ca: # any URL starting with "/ccs/export/" | |
dailykos.com: #Disallow: / | |
dailykos.com: #Disallow: / | |
dailykos.com: # Alexa Archiver, allow them | |
dailykos.com: # Internet Archive's open-source crawler | |
dailykos.com: # Has gone nuts on us before. | |
dailykos.com: # topsy.com's bot | |
thermofisher.com: # Added 8/20/2014 \/ | |
thermofisher.com: # compensate for subdirectories that do need to be blocked: discussions from 6/3/2014 | |
thermofisher.com: # all of this content gets 301-redirected to a regional URL, and search bots can't update if the redirects are not followed | |
thermofisher.com: # Updated 10/5/2014/\ | |
thermofisher.com: # Added 8/20/2014 /\ | |
thermofisher.com: # Added 3/28/2015 \/ | |
thermofisher.com: # Added 3/28/2015 /\ | |
thermofisher.com: # requested by 7/28/2014 \/ | |
thermofisher.com: # requested by 7/28/2014 | |
thermofisher.com: #requested by 3/30/2015 \/ | |
thermofisher.com: #requested by 3/30/2015 /\ | |
outofthesandbox.com: # we use Shopify as our ecommerce platform | |
outofthesandbox.com: # Google adsbot ignores robots.txt unless specifically named! | |
stylecaster.com: # Sitemap archive | |
elcinema.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
elcinema.com: # | |
elcinema.com: # To ban all spiders from the entire site uncomment the next two lines: | |
commonsensemedia.org: # | |
commonsensemedia.org: # robots.txt | |
commonsensemedia.org: # | |
commonsensemedia.org: # This file is to prevent the crawling and indexing of certain parts | |
commonsensemedia.org: # of your site by web crawlers and spiders run by sites like Yahoo! | |
commonsensemedia.org: # and Google. By telling these "robots" where not to go on your site, | |
commonsensemedia.org: # you save bandwidth and server resources. | |
commonsensemedia.org: # | |
commonsensemedia.org: # This file will be ignored unless it is at the root of your host: | |
commonsensemedia.org: # Used: http://example.com/robots.txt | |
commonsensemedia.org: # Ignored: http://example.com/site/robots.txt | |
commonsensemedia.org: # | |
commonsensemedia.org: # For more information about the robots.txt standard, see: | |
commonsensemedia.org: # http://www.robotstxt.org/robotstxt.html | |
commonsensemedia.org: # | |
commonsensemedia.org: # For syntax checking, see: | |
commonsensemedia.org: # http://www.frobee.com/robots-txt-check | |
commonsensemedia.org: # CSS, JS, Images | |
commonsensemedia.org: # Directories | |
commonsensemedia.org: # Files | |
commonsensemedia.org: # Paths (clean URLs) | |
commonsensemedia.org: # Paths (no clean URLs) | |
commonsensemedia.org: # Help with dupe content? | |
commonsensemedia.org: # Allow images to be indexed? (google) | |
commonsensemedia.org: # Disallow CP related traffic links | |
commonsensemedia.org: #HybridAuth paths | |
baby-kingdom.com: # | |
baby-kingdom.com: # robots.txt for Discuz! X1.5 | |
baby-kingdom.com: # | |
newspim.com: # robots.txt generated at http://www.adop.cc | |
kmcert.com: # robots.txt | |
shopify.com.au: # ,: | |
shopify.com.au: # ,' | | |
shopify.com.au: # / : | |
shopify.com.au: # --' / | |
shopify.com.au: # \/ />/ | |
shopify.com.au: # / <//_\ | |
shopify.com.au: # __/ / | |
shopify.com.au: # )'-. / | |
shopify.com.au: # ./ :\ | |
shopify.com.au: # /.' ' | |
shopify.com.au: # No need to shop around. Board the rocketship today – great SEO careers to checkout at shopify.com/careers | |
shopify.com.au: # robots.txt file for www.shopify.com.au | |
kanazawa-u.ac.jp: #fb-root{ | |
98zudisw.xyz: # | |
98zudisw.xyz: # robots.txt for Discuz! X3 | |
98zudisw.xyz: # | |
riselinkedu.com: # www.robotstxt.org/ | |
riselinkedu.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
downcc.com: # | |
downcc.com: # robots.txt for www.downcc.com | |
downcc.com: # | |
empower-retirement.com: # | |
empower-retirement.com: # robots.txt | |
empower-retirement.com: # | |
empower-retirement.com: # This file is to prevent the crawling and indexing of certain parts | |
empower-retirement.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
empower-retirement.com: # and Google. By telling these "robots" where not to go on your site, | |
empower-retirement.com: # you save bandwidth and server resources. | |
empower-retirement.com: # | |
empower-retirement.com: # This file will be ignored unless it is at the root of your host: | |
empower-retirement.com: # Used: http://example.com/robots.txt | |
empower-retirement.com: # Ignored: http://example.com/site/robots.txt | |
empower-retirement.com: # | |
empower-retirement.com: # For more information about the robots.txt standard, see: | |
empower-retirement.com: # http://www.robotstxt.org/robotstxt.html | |
empower-retirement.com: # CSS, JS, Images | |
empower-retirement.com: # Directories | |
empower-retirement.com: # Files | |
empower-retirement.com: # Paths (clean URLs) | |
empower-retirement.com: # Paths (no clean URLs) | |
parents.com: # Sitemaps | |
parents.com: # current CMS | |
parents.com: # ONECMS | |
parents.com: # Content | |
parents.com: # current CMS | |
parents.com: # ONECMS | |
parents.com: # Content | |
plainenglish.io: # https://www.robotstxt.org/robotstxt.html | |
campo-golf.de: # ############################## | |
campo-golf.de: # ################################## | |
campo-golf.de: # #################################### | |
campo-golf.de: # ##################################### | |
campo-golf.de: # ############# ############# ##########@ ########### ###### @######## ######## ###### ######## ########### | |
campo-golf.de: # ########### ########### ##############& ############### ############################# ################# ############### | |
campo-golf.de: # ########## #### ########## ####### (###### ####### ####### #######( ######### ####### ######## ####### ####### #######& | |
campo-golf.de: # ######### ###### ######### ###### ###### ###### ###### ####### ###### ###### ######% ####### ###### | |
campo-golf.de: # ######### ###################### ###### ############ ###### %###### ###### ###### ###### ###### ###### | |
campo-golf.de: # ######### ###################### ###### ################ ###### %###### ###### ###### ###### ###### ###### | |
campo-golf.de: # ######### ###################### ###### #### ###### ###### ###### %###### ###### ###### ####### ###### ###### | |
campo-golf.de: # ########## ##### ######### ####### ######& ###### ###### ###### %###### ###### ####### ####### ####### ####### | |
campo-golf.de: # ########## ########## ################ ################# ###### %###### ###### #################( (################ | |
campo-golf.de: # ############ ############ ############ ########## ###### ###### %###### ###### ################ (############ | |
campo-golf.de: # ################ ############### ###### | |
campo-golf.de: # ##################################### ###### | |
campo-golf.de: # ################ ############## ###### | |
campo-golf.de: # ################ ############## ###### | |
campo-golf.de: # Directories | |
digiskills.pk: # Group 1 | |
gaia.com: #Search | |
gaia.com: #Random Paths | |
gaia.com: #Cart | |
gaia.com: #Disallow Affiliates, Ambassadors & Hosts | |
gaia.com: #Disallow Go Handler | |
gaia.com: # Language Queries | |
gaia.com: #Migrated Aomm.TV URLs | |
gaia.com: #Tercer Milenio URLs | |
gaia.com: #German Language URLs | |
gaia.com: #Twitter sharing exemptions | |
onesignal.com: # robots.txt for https://onesignal.com/ | |
onesignal.com: # live - don't allow web crawlers to index cpresources/ or vendor/ | |
onesignal.com: # Copied from old website | |
okta-emea.com: # | |
okta-emea.com: # robots.txt | |
okta-emea.com: # | |
okta-emea.com: # This file is to prevent the crawling and indexing of certain parts | |
okta-emea.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
okta-emea.com: # and Google. By telling these "robots" where not to go on your site, | |
okta-emea.com: # you save bandwidth and server resources. | |
okta-emea.com: # | |
okta-emea.com: # This file will be ignored unless it is at the root of your host: | |
okta-emea.com: # Used: http://example.com/robots.txt | |
okta-emea.com: # Ignored: http://example.com/site/robots.txt | |
okta-emea.com: # | |
okta-emea.com: # For more information about the robots.txt standard, see: | |
okta-emea.com: # http://www.robotstxt.org/robotstxt.html | |
okta-emea.com: # CSS, JS, Images | |
okta-emea.com: # Directories | |
okta-emea.com: # Files | |
okta-emea.com: # Paths (clean URLs) | |
okta-emea.com: # Paths (no clean URLs) | |
enphaseenergy.com: # | |
enphaseenergy.com: # robots.txt | |
enphaseenergy.com: # | |
enphaseenergy.com: # This file is to prevent the crawling and indexing of certain parts | |
enphaseenergy.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
enphaseenergy.com: # and Google. By telling these "robots" where not to go on your site, | |
enphaseenergy.com: # you save bandwidth and server resources. | |
enphaseenergy.com: # | |
enphaseenergy.com: # This file will be ignored unless it is at the root of your host: | |
enphaseenergy.com: # Used: http://example.com/robots.txt | |
enphaseenergy.com: # Ignored: http://example.com/site/robots.txt | |
enphaseenergy.com: # | |
enphaseenergy.com: # For more information about the robots.txt standard, see: | |
enphaseenergy.com: # http://www.robotstxt.org/robotstxt.html | |
enphaseenergy.com: # CSS, JS, Images | |
enphaseenergy.com: # Directories | |
enphaseenergy.com: # Files | |
enphaseenergy.com: # Paths (clean URLs) | |
enphaseenergy.com: # Paths (no clean URLs) | |
enphaseenergy.com: # Vanity Paths | |
enphaseenergy.com: # Taxonomy Term listing Page | |
enphaseenergy.com: # NL Sitemap | |
enphaseenergy.com: #Disallow Files | |
enphaseenergy.com: #Disallow search Page | |
enphaseenergy.com: # BCMT-547 EN-US | |
enphaseenergy.com: # BCMT-547 NL-NL | |
enphaseenergy.com: # BCMT-547 EN-AU | |
enphaseenergy.com: # External URLs | |
vapejuicedepot.com: # we use Shopify as our ecommerce platform | |
vapejuicedepot.com: # Google adsbot ignores robots.txt unless specifically named! | |
google.com.bo: # AdsBot | |
google.com.bo: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
google.ba: # AdsBot | |
google.ba: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
honda.com: #Blank robots.txt | |
coachoutlet.com: #2020.10.19 | |
puzzle-english.com: # XML Sitemap & Google News Feeds version 4.3.2 - http://status301.net/wordpress-plugins/xml-sitemap-feed/ | |
principal.com: # | |
principal.com: # robots.txt | |
principal.com: # | |
principal.com: # This file is to prevent the crawling and indexing of certain parts | |
principal.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
principal.com: # and Google. By telling these "robots" where not to go on your site, | |
principal.com: # you save bandwidth and server resources. | |
principal.com: # | |
principal.com: # This file will be ignored unless it is at the root of your host: | |
principal.com: # Used: http://example.com/robots.txt | |
principal.com: # Ignored: http://example.com/site/robots.txt | |
principal.com: # | |
principal.com: # For more information about the robots.txt standard, see: | |
principal.com: # http://www.robotstxt.org/robotstxt.html | |
principal.com: # Directories | |
principal.com: # Files | |
principal.com: # Paths (clean URLs) | |
principal.com: # Paths (no clean URLs) | |
playblackdesert.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
playblackdesert.com: #content{margin:0 0 0 2%;position:relative;} | |
busuu.com: #Grav | |
busuu.com: # Backend Symfony | |
busuu.com: # Frontend | |
busuu.com: # Specific paths | |
busuu.com: #Sitemap | |
reseau-canope.fr: #We prevent indexing of the search results page | |
reseau-canope.fr: #Disallow: /resultats-de-recherche/ these pages already contain a meta robots tag => noindex, nofollow. | |
reseau-canope.fr: #remove the nofollow | |
reseau-canope.fr: #We prevent indexing of TypoScript files | |
reseau-canope.fr: #Disallow: /*.ts$ => disabled with /fileadmin/template/ts/ | |
seconnecter.org: # | |
seconnecter.org: # robots.txt | |
seconnecter.org: # | |
seconnecter.org: # This file is to prevent the crawling and indexing of certain parts | |
seconnecter.org: # of your site by web crawlers and spiders run by sites like Yahoo! | |
seconnecter.org: # and Google. By telling these "robots" where not to go on your site, | |
seconnecter.org: # you save bandwidth and server resources. | |
seconnecter.org: # | |
seconnecter.org: # This file will be ignored unless it is at the root of your host: | |
seconnecter.org: # Used: http://example.com/robots.txt | |
seconnecter.org: # Ignored: http://example.com/site/robots.txt | |
seconnecter.org: # | |
seconnecter.org: # For more information about the robots.txt standard, see: | |
seconnecter.org: # http://www.robotstxt.org/wc/robots.html | |
seconnecter.org: # | |
seconnecter.org: # For syntax checking, see: | |
seconnecter.org: # http://www.sxw.org.uk/computing/robots/check.html | |
seconnecter.org: # Directories | |
seconnecter.org: # Files | |
seconnecter.org: # Paths (clean URLs) | |
seconnecter.org: # Paths (no clean URLs) | |
devpost.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
devpost.com: # | |
devpost.com: # To ban all spiders from the entire site uncomment the next two lines: | |
devpost.com: # User-agent: * | |
devpost.com: # Disallow: / | |
streeteasy.com: # robots.txt | |
eventbrite.fr: # http://www.google.fr/adsbot.html - AdsBot ignores * wildcard | |
azerforum.com: # Sitemap files | |
tvm.com.mt: # slow down dot | |
98asedwwq.xyz: # | |
98asedwwq.xyz: # robots.txt for Discuz! X3 | |
98asedwwq.xyz: # | |
snyk.io: #Sitemap: https://snyk.io/search-sitemaps/sitemap_index.xml | |
snyk.io: #sitemap: http://a213584.sitemaphosting4.com/4168338/sitemap.xml | |
snyk.io: #Sitemap: https://snyk.io/search-sitemaps/test-sitemap-1-03122020.xml | |
rubtc.top: #container { | |
actu.fr: # Lana Sitemap version 1.0.0 - http://wp.lanaprojekt.hu/blog/wordpress-plugins/lana-sitemap/ | |
centrify.com: # | |
centrify.com: # robots.txt | |
centrify.com: # | |
centrify.com: # This file is to prevent the crawling and indexing of certain parts | |
centrify.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
centrify.com: # and Google. By telling these "robots" where not to go on your site, | |
centrify.com: # you save bandwidth and server resources. | |
centrify.com: # | |
centrify.com: # This file will be ignored unless it is at the root of your host: | |
centrify.com: # Used: http://example.com/robots.txt | |
centrify.com: # Ignored: http://example.com/site/robots.txt | |
centrify.com: # | |
centrify.com: # For more information about the robots.txt standard, see: | |
centrify.com: # http://www.robotstxt.org/robotstxt.html | |
centrify.com: # CSS, JS, Images | |
centrify.com: # Directories | |
centrify.com: # Files | |
centrify.com: # Paths (clean URLs) | |
centrify.com: # Paths (no clean URLs) | |
centrify.com: # Other Paths | |
foodpanda.com.tw: # www.robotstxt.org/ | |
foodpanda.com.tw: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
comparably.com: # Disallow: Yandex | |
comparably.com: # Disallow: Sistrix | |
comparably.com: # Disallow: Sistrix | |
comparably.com: # Disallow: Sistrix | |
comparably.com: # Disallow: SEOkicks-Robot | |
comparably.com: # Disallow: jobs.de-Robot | |
comparably.com: # Backlink Analysis | |
comparably.com: # Bot of the Leipzig-based Unister Holding GmbH | |
comparably.com: # http://moz.com/products | |
comparably.com: # http://www.searchmetrics.com | |
comparably.com: # http://www.majestic12.co.uk/projects/dsearch/mj12bot.php | |
comparably.com: # http://www.domaintools.com/webmasters/surveybot.php | |
comparably.com: # http://www.seodiver.com/bot | |
comparably.com: # http://openlinkprofiler.org/bot | |
comparably.com: # http://www.wotbox.com/bot/ | |
comparably.com: # http://www.opensiteexplorer.org/dotbot | |
comparably.com: # http://moz.com/researchtools/ose/dotbot | |
comparably.com: # http://www.meanpath.com/meanpathbot.html | |
comparably.com: # http://www.backlinktest.com/crawler.html | |
comparably.com: # http://www.brandwatch.com/magpie-crawler/ | |
comparably.com: # http://filterdb.iss.net/crawler/ | |
comparably.com: # http://webmeup-crawler.com | |
comparably.com: # https://megaindex.com/crawler | |
comparably.com: # http://www.cloudservermarket.com | |
comparably.com: # http://www.trendiction.de/de/publisher/bot | |
comparably.com: # http://www.exalead.com | |
comparably.com: # http://www.career-x.de/bot.html | |
comparably.com: # https://www.lipperhey.com/en/about/ | |
comparably.com: # https://www.lipperhey.com/en/about/ | |
comparably.com: # https://turnitin.com/robot/crawlerinfo.html | |
comparably.com: # http://help.coccoc.com/ | |
comparably.com: # ubermetrics-technologies.com | |
comparably.com: # datenbutler.de | |
comparably.com: # http://searchgears.de/uber-uns/crawling-faq.html | |
comparably.com: # http://commoncrawl.org/faq/ | |
comparably.com: # https://www.qwant.com/ | |
comparably.com: # http://linkfluence.net/ | |
comparably.com: # http://www.botje.com/plukkie.htm | |
comparably.com: # https://www.safedns.com/searchbot | |
comparably.com: # http://www.haosou.com/help/help_3_2.html | |
comparably.com: # http://www.haosou.com/help/help_3_2.html | |
comparably.com: # http://www.moz.com/dp/rogerbot | |
comparably.com: # http://www.openhose.org/bot.html | |
comparably.com: # http://www.screamingfrog.co.uk/seo-spider/ | |
comparably.com: # http://thumbsniper.com | |
comparably.com: # http://www.radian6.com/crawler | |
comparably.com: # http://cliqz.com/company/cliqzbot | |
comparably.com: # https://www.aihitdata.com/about | |
comparably.com: # http://www.trendiction.com/en/publisher/bot | |
comparably.com: # http://warebay.com/bot.html | |
civilica.com: # https://www.robotstxt.org/robotstxt.html | |
thehut.com: # Sitemap files | |
teratail.com: # robotstxt.org/ | |
screamingfrog.co.uk: # Protection of frog team | |
screamingfrog.co.uk: # Protection of frog teams sanity | |
screamingfrog.co.uk: # Screaming Frog - Search Engine Marketing | |
screamingfrog.co.uk: # If you're looking at our robots.txt then you might well be interested in our current SEO vacancies :-) | |
screamingfrog.co.uk: # https://www.screamingfrog.co.uk/careers/ | |
onvista.de: #robots.txt for www.onvista.de | |
onvista.de: #Robots.txt File | |
onvista.de: #Version: 0.3 | |
onvista.de: #Last updated: 20/06/2018 | |
onvista.de: #Please note our terms and conditions "http://www.onvista.de/agb.html" | |
onvista.de: #Spidering is not allowed by our terms and conditions | |
onvista.de: #Authorised spidering is subject to permission | |
onvista.de: #For authorisation please contact us - see "http://www.onvista.de/impressum.html" | |
nazk.gov.ua: #ajax_ac_widget th{background:none repeat scroll 0 0 #457cbf;color:#FFF;font-weight:400;padding:5px 1px;text-align:center;font-size:16px} | |
nazk.gov.ua: #ajax_ac_widget td{text-align:center} | |
nazk.gov.ua: #ajax_ac_widget table tbody tr{ | |
nazk.gov.ua: #my-calendar a{background:none repeat scroll 0 0 #BEE6FD;color:#2B4261;display:block;padding:7px 0;width:100%!important} | |
nazk.gov.ua: #my-calendar a:hover{background: none repeat scroll 0 0 #ffd232;} | |
nazk.gov.ua: #my-calendar{width:100%} | |
nazk.gov.ua: #my_calender span{display:block;padding:7px 0;width:100%!important} | |
nazk.gov.ua: #today a,#today span{background:none repeat scroll 0 0 #ffd232!important;color:#1A1A22} | |
nazk.gov.ua: #ajax_ac_widget #my_year{float:right} | |
nazk.gov.ua: #my_accessibility{ | |
nazk.gov.ua: #stat .swiper-slide{cursor:pointer;display:-ms-flexbox;display:flex;-ms-flex-pack:justify;justify-content:space-evenly;-ms-flex-align:baseline;align-items:center;padding:6px 20px 3px} | |
nazk.gov.ua: #stat .swiper-wrapper{justify-content:center} | |
nazk.gov.ua: #stat .swiper-wrapper{justify-content:initial} | |
nazk.gov.ua: #stat .swiper-slide .section1__block{width:auto;padding:6px 20px 3px;} | |
nazk.gov.ua: #stat .swiper-slide.active{display:flex} | |
nazk.gov.ua: #stat .section1__top{flex-wrap:wrap} | |
nazk.gov.ua: #stat .swiper-slide.active{display:flex;align-items:flex-start;justify-content:flex-start} | |
nazk.gov.ua: #ui-datepicker-div{max-width:235px} | |
greatist.com: # | |
greatist.com: # robots.txt | |
greatist.com: # | |
greatist.com: # This file is to prevent the crawling and indexing of certain parts | |
greatist.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
greatist.com: # and Google. By telling these "robots" where not to go on your site, | |
greatist.com: # you save bandwidth and server resources. | |
greatist.com: # | |
greatist.com: # This file will be ignored unless it is at the root of your host: | |
greatist.com: # Used: http://example.com/robots.txt | |
greatist.com: # Ignored: http://example.com/site/robots.txt | |
greatist.com: # | |
greatist.com: # For more information about the robots.txt standard, see: | |
greatist.com: # http://www.robotstxt.org/robotstxt.html | |
greatist.com: # CSS, JS, Images | |
greatist.com: # Directories | |
greatist.com: # Files | |
greatist.com: # Paths (clean URLs) | |
greatist.com: # Paths (no clean URLs) | |
greatist.com: # Sitemaps | |
greatist.com: #SEO recommendation | |
google.com.mm: # AdsBot | |
google.com.mm: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
metropolia.fi: # | |
metropolia.fi: # robots.txt | |
metropolia.fi: # | |
metropolia.fi: # This file is to prevent the crawling and indexing of certain parts | |
metropolia.fi: # of your site by web crawlers and spiders run by sites like Yahoo! | |
metropolia.fi: # and Google. By telling these "robots" where not to go on your site, | |
metropolia.fi: # you save bandwidth and server resources. | |
metropolia.fi: # | |
metropolia.fi: # This file will be ignored unless it is at the root of your host: | |
metropolia.fi: # Used: http://example.com/robots.txt | |
metropolia.fi: # Ignored: http://example.com/site/robots.txt | |
metropolia.fi: # | |
metropolia.fi: # For more information about the robots.txt standard, see: | |
metropolia.fi: # http://www.robotstxt.org/robotstxt.html | |
metropolia.fi: # CSS, JS, Images | |
metropolia.fi: # Directories | |
metropolia.fi: # Files | |
metropolia.fi: # Paths (clean URLs) | |
metropolia.fi: # Paths (no clean URLs) | |
eatingwell.com: # Sitemaps | |
eatingwell.com: # current CMS | |
eatingwell.com: # ONECMS | |
eatingwell.com: # Content | |
eatingwell.com: # current CMS | |
eatingwell.com: # ONECMS | |
eatingwell.com: # Content | |
moe.gov.om: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
moe.gov.om: #content{margin:0 0 0 2%;position:relative;} | |
renweb.com: # Default Flywheel robots file | |
bibliocommons.com: # Squarespace Robots Txt | |
labcorp.com: # | |
labcorp.com: # robots.txt | |
labcorp.com: # | |
labcorp.com: # This file is to prevent the crawling and indexing of certain parts | |
labcorp.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
labcorp.com: # and Google. By telling these "robots" where not to go on your site, | |
labcorp.com: # you save bandwidth and server resources. | |
labcorp.com: # | |
labcorp.com: # This file will be ignored unless it is at the root of your host: | |
labcorp.com: # Used: http://example.com/robots.txt | |
labcorp.com: # Ignored: http://example.com/site/robots.txt | |
labcorp.com: # | |
labcorp.com: # For more information about the robots.txt standard, see: | |
labcorp.com: # http://www.robotstxt.org/robotstxt.html | |
labcorp.com: # CSS, JS, Images | |
labcorp.com: # Directories | |
labcorp.com: # Files | |
labcorp.com: # Paths (clean URLs) | |
labcorp.com: # Paths (no clean URLs) | |
labcorp.com: # Custom Entries | |
labcorp.com: # LC22517-267 | |
labcorp.com: # Disallow: /account-setup-international-providers | |
labcorp.com: # Disallow: /account-setup-japan | |
labcorp.com: # Disallow: /account-setup-providers | |
getsmarter.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
getsmarter.com: # | |
getsmarter.com: # To ban all spiders from the entire site uncomment the next two lines: | |
getsmarter.com: # User-agent: * | |
getsmarter.com: # Disallow: / | |
getsmarter.com: # User-agent: * | |
getsmarter.com: # Disallow: /checkout | |
getsmarter.com: # Disallow: /cart | |
getsmarter.com: # Disallow: /orders | |
getsmarter.com: # Disallow: /user | |
getsmarter.com: # Disallow: /account | |
getsmarter.com: # Disallow: /api | |
getsmarter.com: # Disallow: /password | |
mailtester.com: #User-agent: Mediapartners-Google | |
cirqlive.com: # Optimization for Google Ads Bot | |
chapman.edu: # robots.txt for Chapman University http://www.chapman.edu (maintained in Cascade) | |
simplypsychology.org: #################################################### | |
simplypsychology.org: # ALLOW MEDIA BOT TO CRAWL ANYWHERE | |
simplypsychology.org: ##### | |
simplypsychology.org: #################################################### | |
simplypsychology.org: # ALLOW IMAGE BOT TO CRAWL ANYWHERE | |
simplypsychology.org: ##### | |
simplypsychology.org: #################################################### | |
simplypsychology.org: # ALLOW GOOGLE BOT TO CRAWL ANYWHERE | |
simplypsychology.org: ##### | |
simplypsychology.org: #################################################### | |
simplypsychology.org: # ALLOW GOOGLE IPHONE AD BOT TO CRAWL ANYWHERE | |
simplypsychology.org: ##### | |
simplypsychology.org: # Some bots are known to be trouble, particularly those designed to copy | |
simplypsychology.org: # entire sites. Please obey robots.txt. | |
simplypsychology.org: # Misbehaving: requests much too fast: | |
simplypsychology.org: # | |
simplypsychology.org: # Sorry, wget in its recursive mode is a frequent problem. | |
simplypsychology.org: # Please read the man page and use it properly; there is a | |
simplypsychology.org: # --wait option you can use to set the delay between hits, | |
simplypsychology.org: # for instance. | |
simplypsychology.org: # | |
simplypsychology.org: # | |
simplypsychology.org: # The 'grub' distributed client has been *very* poorly behaved. | |
simplypsychology.org: # | |
simplypsychology.org: # | |
simplypsychology.org: # Doesn't follow robots.txt anyway, but... | |
simplypsychology.org: # | |
simplypsychology.org: # | |
simplypsychology.org: # Hits many times per second, not acceptable | |
simplypsychology.org: # http://www.nameprotect.com/botinfo.html | |
simplypsychology.org: # A capture bot, downloads gazillions of pages with no public benefit | |
simplypsychology.org: # http://www.webreaper.net/ | |
simplypsychology.org: # Wayback Machine: defaults and whether to index user-pages | |
simplypsychology.org: # FIXME: Complete the removal of this block, per T7582. | |
simplypsychology.org: # User-agent: archive.org_bot | |
simplypsychology.org: # Allow: / | |
hrdc-drhc.gc.ca: #esdc_gc_ca | |
archives.gov: # | |
archives.gov: # robots.txt | |
archives.gov: # | |
archives.gov: # This file is to prevent the crawling and indexing of certain parts | |
archives.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
archives.gov: # and Google. By telling these "robots" where not to go on your site, | |
archives.gov: # you save bandwidth and server resources. | |
archives.gov: # | |
archives.gov: # This file will be ignored unless it is at the root of your host: | |
archives.gov: # Used: http://example.com/robots.txt | |
archives.gov: # Ignored: http://example.com/site/robots.txt | |
archives.gov: # | |
archives.gov: # For more information about the robots.txt standard, see: | |
archives.gov: # http://www.robotstxt.org/robotstxt.html | |
trendlyne.com: # Block trendkite-akashic-crawler | |
xtb.com: # www.robotstxt.org/ | |
xtb.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
jobomas.com: #Baiduspider | |
mailtrap.io: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
mailtrap.io: # | |
mailtrap.io: # To ban all spiders from the entire site uncomment the next two lines: | |
mailtrap.io: # User-agent: * | |
mailtrap.io: # Disallow: / | |
trekbikes.com: # For all robots Block access to specific groups of pages | |
akhbarak.net: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
akhbarak.net: # | |
akhbarak.net: # To ban all spiders from the entire site uncomment the next two lines: | |
akhbarak.net: #Disallow: /*.js$ | |
akhbarak.net: #User-Agent: * | |
akhbarak.net: #Disallow: / | |
abokifx.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
abokifx.com: # | |
abokifx.com: # To ban all spiders from the entire site uncomment the next two lines: | |
abokifx.com: # User-agent: * | |
abokifx.com: # Disallow: / | |
saatvesaat.com.tr: # Google Image Crawler Setup | |
google.com.sv: # AdsBot | |
google.com.sv: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
zeropark.com: # Fix for Redundant Hostnames in Universal Analytics | |
okmall.com: # Disallow: Sistrix | |
okmall.com: # Disallow: Sistrix | |
okmall.com: # Disallow: Sistrix | |
okmall.com: # Disallow: SEOkicks-Robot | |
okmall.com: # Disallow: jobs.de-Robot | |
okmall.com: # Bot of the Leipzig-based Unister Holding GmbH | |
okmall.com: # http://www.searchmetrics.com | |
okmall.com: # http://www.domaintools.com/webmasters/surveybot.php | |
okmall.com: # http://www.seodiver.com/bot | |
okmall.com: # http://openlinkprofiler.org/bot | |
okmall.com: # http://www.wotbox.com/bot/ | |
okmall.com: # http://www.opensiteexplorer.org/dotbot | |
okmall.com: # http://moz.com/researchtools/ose/dotbot | |
okmall.com: # http://www.meanpath.com/meanpathbot.html | |
okmall.com: # http://www.backlinktest.com/crawler.html | |
okmall.com: # http://www.brandwatch.com/magpie-crawler/ | |
okmall.com: # http://filterdb.iss.net/crawler/ | |
okmall.com: # http://webmeup-crawler.com | |
okmall.com: # https://megaindex.com/crawler | |
okmall.com: # http://www.cloudservermarket.com | |
okmall.com: # http://www.trendiction.de/de/publisher/bot | |
okmall.com: # http://www.exalead.com | |
okmall.com: # http://www.career-x.de/bot.html | |
okmall.com: # https://www.lipperhey.com/en/about/ | |
okmall.com: # https://www.lipperhey.com/en/about/ | |
okmall.com: # https://turnitin.com/robot/crawlerinfo.html | |
okmall.com: # http://help.coccoc.com/ | |
okmall.com: # ubermetrics-technologies.com | |
okmall.com: # datenbutler.de | |
okmall.com: # http://searchgears.de/uber-uns/crawling-faq.html | |
okmall.com: # http://commoncrawl.org/faq/ | |
okmall.com: # https://www.qwant.com/ | |
okmall.com: # http://linkfluence.net/ | |
okmall.com: # http://www.botje.com/plukkie.htm | |
okmall.com: # https://www.safedns.com/searchbot | |
okmall.com: # http://www.haosou.com/help/help_3_2.html | |
okmall.com: # http://www.haosou.com/help/help_3_2.html | |
okmall.com: # http://www.moz.com/dp/rogerbot | |
okmall.com: # http://www.openhose.org/bot.html | |
okmall.com: # http://www.screamingfrog.co.uk/seo-spider/ | |
okmall.com: # http://thumbsniper.com | |
okmall.com: # http://www.radian6.com/crawler | |
okmall.com: # http://cliqz.com/company/cliqzbot | |
okmall.com: # https://www.aihitdata.com/about | |
okmall.com: # http://www.trendiction.com/en/publisher/bot | |
okmall.com: # http://warebay.com/bot.html | |
okmall.com: # http://www.website-datenbank.de/ | |
okmall.com: # http://law.di.unimi.it/BUbiNG.html | |
okmall.com: # http://www.linguee.com/bot; bot@linguee.com | |
okmall.com: # www.sentibot.eu | |
okmall.com: # http://velen.io | |
okmall.com: # https://moz.com/help/guides/moz-procedures/what-is-rogerbot | |
okmall.com: # http://www.garlik.com | |
okmall.com: # https://www.gosign.de/typo3-extension/typo3-sicherheitsmonitor/ | |
okmall.com: # http://www.siteliner.com/bot | |
okmall.com: # https://sabsim.com | |
okmall.com: # http://ltx71.com/ | |
designcrowd.com: # robots.txt | |
hoy.com.do: #robots for Hoy | |
hoy.com.do: #By Kenneth Burgos | |
hoy.com.do: # Basic blocking for all bots and crawlers | |
hoy.com.do: # may cause problems due to blocked resources in GWT | |
hoy.com.do: # Blocking of dynamic URLs | |
hoy.com.do: #Blocking of searches | |
hoy.com.do: # Blocking of trackbacks | |
hoy.com.do: # Blocking of feeds for crawlers | |
hoy.com.do: # We slow down some bots that tend to go crazy | |
hoy.com.do: # They will make requests every 20 seconds to reduce the request load on the hosting | |
hoy.com.do: # Blocking some additional Google bots | |
hoy.com.do: # Enable after the hosting load goes down | |
hoy.com.do: # Blocking of bots and crawlers that are not very useful | |
hoy.com.do: # Prevents blocked-resource problems in Google Webmaster Tools | |
hoy.com.do: # Crawl-delay: 20 | |
hoy.com.do: # Under normal conditions this is the sitemap, but El Nacional doesn't have one | |
hoy.com.do: # Sitemap: https://eldia.com.do/sitemap.xml | |
hoy.com.do: # If you use Yoast SEO, these are the main sitemaps | |
hoy.com.do: # Sitemap: https://eldia.com.do/sitemap_index.xml | |
hoy.com.do: # Sitemap: https://eldia.com.do/category-sitemap.xml | |
hoy.com.do: # Sitemap: https://eldia.com.do/page-sitemap.xml | |
hoy.com.do: # Sitemap: http://eldia.com.do/sitemap_index.xml | |
tripadvisor.es: # Hi there, | |
tripadvisor.es: # | |
tripadvisor.es: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself. | |
tripadvisor.es: # | |
tripadvisor.es: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet? | |
tripadvisor.es: # | |
tripadvisor.es: # Run - don't crawl - to apply to join TripAdvisor's elite SEO team | |
tripadvisor.es: # | |
tripadvisor.es: # Email seoRockstar@tripadvisor.com | |
tripadvisor.es: # | |
tripadvisor.es: # Or visit https://careers.tripadvisor.com/search-results?keywords=seo | |
tripadvisor.es: # | |
tripadvisor.es: # | |
lu.se: # | |
lu.se: # robots.txt | |
lu.se: # | |
lu.se: # This file is to prevent the crawling and indexing of certain parts | |
lu.se: # of your site by web crawlers and spiders run by sites like Yahoo! | |
lu.se: # and Google. By telling these "robots" where not to go on your site, | |
lu.se: # you save bandwidth and server resources. | |
lu.se: # | |
lu.se: # This file will be ignored unless it is at the root of your host: | |
lu.se: # Used: http://example.com/robots.txt | |
lu.se: # Ignored: http://example.com/site/robots.txt | |
lu.se: # | |
lu.se: # For more information about the robots.txt standard, see: | |
lu.se: # http://www.robotstxt.org/robotstxt.html | |
lu.se: # CSS, JS, Images | |
lu.se: # Directories | |
lu.se: # Files | |
lu.se: # Paths (clean URLs) | |
lu.se: # Paths (no clean URLs) | |
lu.se: # Search | |
lu.se: # Disallow all on index.php | |
nokia.com: # | |
nokia.com: # robots.txt | |
nokia.com: # | |
nokia.com: # This file is to prevent the crawling and indexing of certain parts | |
nokia.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
nokia.com: # and Google. By telling these "robots" where not to go on your site, | |
nokia.com: # you save bandwidth and server resources. | |
nokia.com: # | |
nokia.com: # This file will be ignored unless it is at the root of your host: | |
nokia.com: # Used: http://example.com/robots.txt | |
nokia.com: # Ignored: http://example.com/site/robots.txt | |
nokia.com: # | |
nokia.com: # For more information about the robots.txt standard, see: | |
nokia.com: # http://www.robotstxt.org/robotstxt.html | |
nokia.com: # CSS, JS, Images | |
nokia.com: # Directories | |
nokia.com: # Files | |
nokia.com: # Paths (clean URLs) | |
nokia.com: # Paths (no clean URLs) | |
nokia.com: # Disallow tax terms (and language based also) | |
merrilledge.com: #dvFooter .disclaimer { | |
merrilledge.com: #dvDisclaimer .disclaimer > .disclaimer { padding:0 0 15px 0; } | |
merrilledge.com: #dvFooter #jdBold { | |
gossiplankanews.com: # Blogger Sitemap generated on 2013.05.28 | |
slashdot.org: # robots.txt for Slashdot.org | |
slashdot.org: # $Id$ | |
slashdot.org: # "Any empty [Disallow] value, indicates that all URLs can be retrieved. | |
slashdot.org: # At least one Disallow field needs to be present in a record." | |
saashub.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
unibo.it: # User-agent: MegaIndex.ru/2.0 | |
unibo.it: # Disallow: / | |
unibo.it: # User-agent: MegaIndex.ru/ | |
unibo.it: # Disallow: / | |
bluejeans.com: # | |
bluejeans.com: # robots.txt | |
bluejeans.com: # | |
bluejeans.com: # This file is to prevent the crawling and indexing of certain parts | |
bluejeans.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
bluejeans.com: # and Google. By telling these "robots" where not to go on your site, | |
bluejeans.com: # you save bandwidth and server resources. | |
bluejeans.com: # | |
bluejeans.com: # This file will be ignored unless it is at the root of your host: | |
bluejeans.com: # Used: http://example.com/robots.txt | |
bluejeans.com: # Ignored: http://example.com/site/robots.txt | |
bluejeans.com: # | |
bluejeans.com: # For more information about the robots.txt standard, see: | |
bluejeans.com: # http://www.robotstxt.org/robotstxt.html | |
bluejeans.com: # CSS, JS, Images | |
bluejeans.com: # Directories | |
bluejeans.com: # Files | |
bluejeans.com: # Paths (clean URLs) | |
bluejeans.com: # Paths (no clean URLs) | |
freetaxusa.com: # /robots.txt file for https://www.freetaxusa.com/ | |
freetaxusa.com: # mail webmaster@freetaxusa.com with any comments | |
etour.com: ## Default robots.txt | |
thenationalnews.com: # Updated: 2021-02-21 | |
thenationalnews.com: # Robots.txt | |
simonparkes.org: # Optimization for Google Ads Bot | |
golf.com: #WP Import Export Rule | |
quia.com: # ----------------------------------------------------------------------------- | |
quia.com: # | |
quia.com: # Areas that search robots should avoid | |
quia.com: # (c) 2011 IXL Learning. All rights reserved. | |
quia.com: # | |
quia.com: # created by jkent on 8 Mar 2002 | |
quia.com: # | |
quia.com: # Site-friendly search robots use this file to determine where _not_ | |
quia.com: # to go. Some URL spaces are simply counterproductive. | |
quia.com: # | |
quia.com: # ----------------------------------------------------------------------------- | |
bancofalabella.cl: # Includes all bots | |
sammobile.com: # This is a tag that is defined for Analytics Tag manager, and not a path | |
reamaze.com: # Do not allow bot access to private conversation pages | |
zapimoveis.com.br: # Amenities shall not pass! | |
zapimoveis.com.br: #Crawl Budget test https://github.com/grupozap/squad-growth/issues/1153 | |
booth.pm: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
google.com.et: # AdsBot | |
google.com.et: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
bazaarvoice.com: # www.robotstxt.org/ | |
bazaarvoice.com: # Stop heritrix from crawling SEI | |
bazaarvoice.com: # Default WP | |
bazaarvoice.com: # Disallow Search Queries | |
eadaily.com: # robots.txt for https://eadaily.com/ | |
solopos.com: #User-agent: ia_archiver-web.archive.org | |
solopos.com: #Disallow: / | |
solopos.com: #Sitemap: http://www.askapache.com/sitemap.xml | |
solopos.com: #Sitemap: https://www.solopos.com/sitemap_index.xml | |
solopos.com: #Sitemap: http://www.askapache.com/sitemap.xml | |
solopos.com: #Sitemap: https://m.solopos.com/sitemap_index.xml | |
solopos.com: # __ __ | |
solopos.com: # ____ ______/ /______ _____ ____ ______/ /_ ___ | |
solopos.com: # / __ `/ ___/ //_/ __ `/ __ \/ __ `/ ___/ __ \/ _ \ | |
solopos.com: # / /_/ (__ ) ,< / /_/ / /_/ / /_/ / /__/ / / / __/ | |
solopos.com: # \__,_/____/_/|_|\__,_/ .___/\__,_/\___/_/ /_/\___/ | |
solopos.com: # /_/ | |
solopos.com: # | |
dy2018.com: #1 | |
dy2018.com: # robots.txt for EmpireCMS | |
dy2018.com: # | |
imomoe.ai: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
imomoe.ai: #content{margin:0 0 0 2%;position:relative;} | |
puercn.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
puercn.com: # | |
puercn.com: # To ban all spiders from the entire site uncomment the next two lines: | |
puercn.com: #User-agent: * | |
puercn.com: #Disallow: /jiu/ | |
yn.gov.cn: #btn_tz5 li{ | |
bewakoof.com: # robotstxt.org | |
watanserb.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/ | |
paypalobjects.com: ### BEGIN FILE ### | |
paypalobjects.com: # PayPal robots.txt file | |
google.by: # AdsBot | |
google.by: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
handbrake.fr: # Required to let Google show relevant ads | |
bdsimg.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
bdsimg.com: #content{margin:0 0 0 2%;position:relative;} | |
javdb5.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
nofraud.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
nofraud.com: # | |
nofraud.com: # To ban all spiders from the entire site uncomment the next two lines: | |
nofraud.com: # User-agent: * | |
nofraud.com: # Disallow: / | |
tracker.gg: # Robots! | |
youcanbook.me: # Hello Spiders | |
youcanbook.me: # Sorry Baidu - you just don't play nicely | |
claimfreecoins.io: #adcopy-outer table{ background: #fff;color:#999;} | |
swaggerhub.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
swaggerhub.com: #content{margin:0 0 0 2%;position:relative;} | |
vqfit.com: # we use Shopify as our ecommerce platform | |
vqfit.com: # Google adsbot ignores robots.txt unless specifically named! | |
findagrave.com: # robots.txt file for Find A Grave | |
findagrave.com: ## Below disallows are to accomodate user requests to remove a name from search results. ## | |
findagrave.com: #Updated 2/25/2019 | |
ecoledirecte.com: # robotstxt.org | |
themanifest.com: # | |
themanifest.com: # robots.txt | |
themanifest.com: # | |
themanifest.com: # This file is to prevent the crawling and indexing of certain parts | |
themanifest.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
themanifest.com: # and Google. By telling these "robots" where not to go on your site, | |
themanifest.com: # you save bandwidth and server resources. | |
themanifest.com: # | |
themanifest.com: # This file will be ignored unless it is at the root of your host: | |
themanifest.com: # Used: http://example.com/robots.txt | |
themanifest.com: # Ignored: http://example.com/site/robots.txt | |
themanifest.com: # | |
themanifest.com: # For more information about the robots.txt standard, see: | |
themanifest.com: # http://www.robotstxt.org/robotstxt.html | |
themanifest.com: # CSS, JS, Images | |
themanifest.com: # Directories | |
themanifest.com: # Files | |
themanifest.com: # Paths (clean URLs) | |
themanifest.com: # Paths (no clean URLs) | |
brainscape.com: # blocking bad bots | |
brandbucket.com: # $Id: robots.txt,v 1.9.2.1 2008/12/10 20:12:19 goba Exp $ | |
brandbucket.com: # | |
brandbucket.com: # robots.txt | |
brandbucket.com: # | |
brandbucket.com: # This file is to prevent the crawling and indexing of certain parts | |
brandbucket.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
brandbucket.com: # and Google. By telling these "robots" where not to go on your site, | |
brandbucket.com: # you save bandwidth and server resources. | |
brandbucket.com: # | |
brandbucket.com: # This file will be ignored unless it is at the root of your host: | |
brandbucket.com: # Used: http://example.com/robots.txt | |
brandbucket.com: # Ignored: http://example.com/site/robots.txt | |
brandbucket.com: # | |
brandbucket.com: # For more information about the robots.txt standard, see: | |
brandbucket.com: # http://www.robotstxt.org/wc/robots.html | |
brandbucket.com: # | |
brandbucket.com: # For syntax checking, see: | |
brandbucket.com: # http://www.sxw.org.uk/computing/robots/check.html | |
brandbucket.com: # Crawl-delay: 10 | |
brandbucket.com: # Directories | |
brandbucket.com: # Files | |
brandbucket.com: # Paths (clean URLs) | |
brandbucket.com: # Paths (no clean URLs) | |
radioformula.com.mx: # WP Refugees | |
fh-aachen.de: #------------------------------------------- | |
fh-aachen.de: # robots.txt for http://www.fh-aachen.de | |
fh-aachen.de: # 28.2.2021 ML | |
fh-aachen.de: #------------------------------------------- | |
freshersworld.com: # Filename:robots.txt file for https://www.freshersworld.com | |
freshersworld.com: # Created Dec, 09, 2015. | |
freshersworld.com: # Author: Bijeesh | |
freshersworld.com: # Email: info@freshersworld.com | |
freshersworld.com: # Edited : Jan, 29, 2019 | |
freshersworld.com: # GoogleMedia Partners | |
freshersworld.com: # Google Adsbot | |
dermstore.com: # | |
dermstore.com: # DermStore.com: robots.txt | |
dermstore.com: # Please, we do NOT allow nonauthorized robots any longer. | |
film.ru: # Directories | |
film.ru: # Paths (clean URLs) | |
film.ru: # Paths (no clean URLs) | |
film.ru: # Directories | |
film.ru: # Paths (clean URLs) | |
film.ru: # Paths (no clean URLs) | |
film.ru: # Directories | |
film.ru: # Paths (clean URLs) | |
film.ru: # Paths (no clean URLs) | |
chime.com: # This robots.txt file was created by Robots.txt Rewrite plugin: https://wordpress.org/plugins/robotstxt-rewrite/ | |
wuerth.de: # robots.txt for www.wuerth.de | |
wuerth.de: # Disallow: /web/media/system/ | |
wuerth.de: # Disallow: /web/media/system/ | |
sportybet.com: # wap | |
sportybet.com: # pc | |
vendasta.com: # We're hiring! https://www.vendasta.com/devjobs | |
npmjs.com: # | |
npmjs.com: # | |
npmjs.com: # _____ | |
npmjs.com: # | | | |
npmjs.com: # | | | | | |
npmjs.com: # |_____| | |
npmjs.com: # ____ ___|_|___ ____ | |
npmjs.com: # ()___) ()___) | |
npmjs.com: # // /| |\ \\ | |
npmjs.com: # // / | | \ \\ | |
npmjs.com: # (___) |___________| (___) | |
npmjs.com: # (___) (_______) (___) | |
npmjs.com: # (___) (___) (___) | |
npmjs.com: # (___) |_| (___) | |
npmjs.com: # (___) ___/___\___ | | | |
npmjs.com: # | | | | | | | |
npmjs.com: # | | |___________| /___\ | |
npmjs.com: # /___\ ||| ||| // \\ | |
npmjs.com: # // \\ ||| ||| \\ // | |
npmjs.com: # \\ // ||| ||| \\ // | |
npmjs.com: # \\ // ()__) (__() | |
npmjs.com: # /// \\\ | |
npmjs.com: # /// \\\ | |
npmjs.com: # _///___ ___\\\_ | |
npmjs.com: # |_______| |_______| | |
npmjs.com: # | |
npmjs.com: # | |
npmjs.com: # | |
lloydsbank.com: # v 1.1 | |
lloydsbank.com: # www.lloydsbank.com | |
state.mn.us: # Disallow everything until we want to expose the site to external search | |
state.mn.us: # engines. | |
state.mn.us: # 0000-1200 GMT is 6PM to 6AM here | |
state.mn.us: # 4/14/14 Updated for DataExplorer to crawl all state sites. | |
film2serial.ir: # Sitemap | |
food52.com: # Production | |
food52.com: # Search | |
food52.com: #404 | |
emory.edu: # robots.txt for http://www.emory.edu/ | |
emory.edu: #removed /CARTER_CENTER and replaced with a redirect to cartercenter.org. 8/28/2104 --jm | |
emory.edu: #Disallow: /CARTER_CENTER/ | |
emory.edu: #Disallow: /CARTER--CENTER/ | |
emory.edu: #Disallow: /CARTER-CENTER/ | |
hdfcergo.com: #LokPalPopup .btn-red{ background: #E21F26; | |
wongnai.com: # If you are interested in our data, please visit https://business.wongnai.com/restaurants-data-service/ for more detail. | |
express.com.pk: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
express.com.pk: #content{margin:0 0 0 2%;position:relative;} | |
socalgas.com: # | |
socalgas.com: # robots.txt | |
socalgas.com: # | |
socalgas.com: # This file is to prevent the crawling and indexing of certain parts | |
socalgas.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
socalgas.com: # and Google. By telling these "robots" where not to go on your site, | |
socalgas.com: # you save bandwidth and server resources. | |
socalgas.com: # | |
socalgas.com: # This file will be ignored unless it is at the root of your host: | |
socalgas.com: # Used: http://example.com/robots.txt | |
socalgas.com: # Ignored: http://example.com/site/robots.txt | |
socalgas.com: # | |
socalgas.com: # For more information about the robots.txt standard, see: | |
socalgas.com: # http://www.robotstxt.org/robotstxt.html | |
socalgas.com: # CSS, JS, Images | |
socalgas.com: # Directories | |
socalgas.com: # Files | |
socalgas.com: # Paths (clean URLs) | |
socalgas.com: # Paths (no clean URLs) | |
socalgas.com: # Paths (clean URLs) - fixed! | |
socalgas.com: # Paths (no clean URLs) - fixed! | |
socalgas.com: # Sitemap | |
retaildive.com: # | |
retaildive.com: # ..;coxkOOOOOOkxoc;'. | |
retaildive.com: # .:d0NWMMMMMMMMMMMMMMWN0xc' | |
retaildive.com: # .:kXMMMMMMMMMMMMMMMMMMMMMMMXl. | |
retaildive.com: # .c0WMMMMMMMMMMMMMMMMMMMMMMMXd' | |
retaildive.com: # ,OWMMMMMMMMMMMMMMMMMMMMMMMXo' .. | |
retaildive.com: # cXMMMMMMXo::::::::::::::col. .lKXl. | |
retaildive.com: # lNMMMMMMM0' .lKWMMNo | |
retaildive.com: # :XMMMMMMMM0' .l0WMMMMMNc | |
retaildive.com: # .OMMMMMMMMM0' .ccccccc;. ,KMMMMMMMMO. | |
retaildive.com: # :NMMMMMMMMM0' oWMMMMMMWKc. oWMMMMMMMN: | |
retaildive.com: # lWMMMMMMMMM0' oWMMMMMMMMX: ,KMMMMMMMMo | |
retaildive.com: # oMMMMMMMMMM0' oWMMMMMMMMNc ,KMMMMMMMMd | |
retaildive.com: # cNMMMMMMMMM0' oWMMMMMMMNd. lWMMMMMMMWl | |
retaildive.com: # '0MMMMMMMMWk. ,oooooooc' ,0MMMMMMMMK, | |
retaildive.com: # oWMMMMMMXo. ,0MMMMMMMMWo | |
retaildive.com: # .xWMMMXd' ,dXMMMMMMMMWk. | |
retaildive.com: # .xWNx' .',''''''',,;coONMMMMMMMMMWk. | |
retaildive.com: # .:, .l0WWWWWWWWWWWMMMMMMMMMMMMMNd. | |
retaildive.com: # .lKWMMMMMMMMMMMMMMMMMMMMMMMWk; | |
retaildive.com: # .lKWMMMMMMMMMMMMMMMMMMMMMMMNk;. | |
retaildive.com: # .ckXWMMMMMMMMMMMMMMMMMMWXkl' | |
retaildive.com: # .;ldO0XNWWWWWWNXKOxl;. | |
retaildive.com: # ..'',,,,''.. | |
retaildive.com: # | |
retaildive.com: # | |
retaildive.com: # NOTE: Allow is a non-standard directive for robots.txt. It is allowed by Google bots. See https://developers.google.com/search/reference/robots_txt#allow | |
retaildive.com: # no deep queries to search | |
retaildive.com: # don't index our dynamic images | |
retaildive.com: # hide old-school trend report | |
retaildive.com: # | |
retaildive.com: # Rules for specific crawlers below. Note these replace and override the '*' rules above. | |
retaildive.com: # | |
retaildive.com: # Allow Twitter to see all links | |
retaildive.com: # Allow Googlebot-News to see header images and favicons, BUT make it follow all the directives from our * group | |
retaildive.com: # See below link for why we have to repeat these directives | |
retaildive.com: # https://developers.google.com/search/reference/robots_txt#order-of-precedence-for-user-agents | |
retaildive.com: # no deep queries to search | |
retaildive.com: # hide old-school trend report | |
retaildive.com: # Allow Google News to see header images and favicons | |
retaildive.com: # Don't let Google Images crawler see anything at all | |
retaildive.com: # Don't let PetalBot crawl at all | |
retaildive.com: # All Facebook crawler user-agent to see all | |
retaildive.com: # Allow swiftbot custom search to see all, but with a delay | |
retaildive.com: # We want this bot to crawl way slower http://ahrefs.com/robot/ | |
retaildive.com: # And be more aggressive on what not to allow | |
careerfoundry.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
careerfoundry.com: # | |
careerfoundry.com: # To ban all spiders from the entire site uncomment the next two lines: | |
ksusentinel.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/ | |
homes.com: # /robots.txt file for http://www.homes.com | |
homes.com: # @@ROBOTS-PROD@@ | |
homes.com: # e-mail web@homes.com for issues | |
subdl.com: # robots.txt generated at http://www.mcanerin.com | |
reedsy.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
reedsy.com: # | |
reedsy.com: # To ban all spiders from the entire site uncomment the next two lines: | |
reedsy.com: # User-agent: * | |
reedsy.com: # Disallow: / | |
genshin.gg: # https://www.robotstxt.org/robotstxt.html | |
bdo.com.ph: # | |
bdo.com.ph: # robots.txt | |
bdo.com.ph: # | |
bdo.com.ph: # This file is to prevent the crawling and indexing of certain parts | |
bdo.com.ph: # of your site by web crawlers and spiders run by sites like Yahoo! | |
bdo.com.ph: # and Google. By telling these "robots" where not to go on your site, | |
bdo.com.ph: # you save bandwidth and server resources. | |
bdo.com.ph: # | |
bdo.com.ph: # This file will be ignored unless it is at the root of your host: | |
bdo.com.ph: # Used: http://example.com/robots.txt | |
bdo.com.ph: # Ignored: http://example.com/site/robots.txt | |
bdo.com.ph: # | |
bdo.com.ph: # For more information about the robots.txt standard, see: | |
bdo.com.ph: # http://www.robotstxt.org/robotstxt.html | |
bdo.com.ph: # CSS, JS, Images | |
bdo.com.ph: # Directories | |
bdo.com.ph: # Files | |
bdo.com.ph: # Paths (clean URLs) | |
bdo.com.ph: # Paths (no clean URLs) | |
exactseek.com: # Allow only specific directories | |
pnu.ac.ir: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
pnu.ac.ir: #content{margin:0 0 0 2%;position:relative;} | |
politpuzzle.ru: # This virtual robots.txt file was created by the Virtual Robots.txt WordPress plugin: https://www.wordpress.org/plugins/pc-robotstxt/ | |
mccourier.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/ | |
uts.edu.au: # | |
uts.edu.au: # robots.txt | |
uts.edu.au: # | |
uts.edu.au: # This file is to prevent the crawling and indexing of certain parts | |
uts.edu.au: # of your site by web crawlers and spiders run by sites like Yahoo! | |
uts.edu.au: # and Google. By telling these "robots" where not to go on your site, | |
uts.edu.au: # you save bandwidth and server resources. | |
uts.edu.au: # | |
uts.edu.au: # This file will be ignored unless it is at the root of your host: | |
uts.edu.au: # Used: http://example.com/robots.txt | |
uts.edu.au: # Ignored: http://example.com/site/robots.txt | |
uts.edu.au: # | |
uts.edu.au: # For more information about the robots.txt standard, see: | |
uts.edu.au: # http://www.robotstxt.org/robotstxt.html | |
uts.edu.au: # CSS, JS, Images | |
uts.edu.au: # Directories | |
uts.edu.au: # Files | |
uts.edu.au: # Paths (clean URLs) | |
uts.edu.au: # Paths (no clean URLs) | |
moxtra.com: # Squarespace Robots Txt | |
mulesoft.com: # $Id: robots.txt,v 1.9.2.1 2008/12/10 20:12:19 goba Exp $ | |
mulesoft.com: # | |
mulesoft.com: # robots.txt | |
mulesoft.com: # | |
mulesoft.com: # This file is to prevent the crawling and indexing of certain parts | |
mulesoft.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
mulesoft.com: # and Google. By telling these "robots" where not to go on your site, | |
mulesoft.com: # you save bandwidth and server resources. | |
mulesoft.com: # | |
mulesoft.com: # This file will be ignored unless it is at the root of your host: | |
mulesoft.com: # Used: http://example.com/robots.txt | |
mulesoft.com: # Ignored: http://example.com/site/robots.txt | |
mulesoft.com: # | |
mulesoft.com: # For more information about the robots.txt standard, see: | |
mulesoft.com: # http://www.robotstxt.org/wc/robots.html | |
mulesoft.com: # | |
mulesoft.com: # For syntax checking, see: | |
mulesoft.com: # http://www.sxw.org.uk/computing/robots/check.html | |
mulesoft.com: # Directories | |
mulesoft.com: # Translated pages, origin | |
mulesoft.com: # Files | |
mulesoft.com: # Paths (clean URLs) | |
mulesoft.com: # Localization | |
mulesoft.com: # Paths (no clean URLs) | |
tufts.edu: # | |
tufts.edu: # robots.txt | |
tufts.edu: # | |
tufts.edu: # This file is to prevent the crawling and indexing of certain parts | |
tufts.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
tufts.edu: # and Google. By telling these "robots" where not to go on your site, | |
tufts.edu: # you save bandwidth and server resources. | |
tufts.edu: # | |
tufts.edu: # This file will be ignored unless it is at the root of your host: | |
tufts.edu: # Used: http://example.com/robots.txt | |
tufts.edu: # Ignored: http://example.com/site/robots.txt | |
tufts.edu: # | |
tufts.edu: # For more information about the robots.txt standard, see: | |
tufts.edu: # http://www.robotstxt.org/robotstxt.html | |
tufts.edu: # CSS, JS, Images | |
tufts.edu: # Directories | |
tufts.edu: # Files | |
tufts.edu: # Paths (clean URLs) | |
tufts.edu: # Paths (no clean URLs) | |
amasty.com: # Directories | |
amasty.com: # Paths (clean URLs) | |
amasty.com: # Files | |
amasty.com: # Paths (no clean URLs) | |
amasty.com: # Disallow: *do= | |
amasty.com: # Blog | |
sba.gov: # | |
sba.gov: # robots.txt | |
sba.gov: # | |
sba.gov: # CSS, JS, Images | |
sba.gov: # Directories | |
sba.gov: # Files | |
sba.gov: # Paths (clean URLs) | |
sba.gov: # Paths (no clean URLs) | |
kenhub.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
kenhub.com: # | |
kenhub.com: # To ban all spiders from the entire site uncomment the next two lines: | |
mix.com: # User-agent: Googlebot | |
mix.com: # User-agent: Bingbot | |
mix.com: # User-agent: baiduspider | |
mix.com: # User-agent: Applebot | |
mix.com: # User-agent: Yandex | |
mix.com: # Microsoft Search Engine Robot | |
mix.com: # User-agent: msnbot | |
mix.com: # Yahoo! Search Engine Robot | |
mix.com: # User-agent: Slurp | |
mix.com: # Stuff that search engines seem to pick up from wiggin.routes: | |
green-japan.com: # See http://www.robotstxt.org/wc/norobots for documentation on how to use the robots.txt file | |
peopleperhour.com: # Allow all robots to access our site | |
peopleperhour.com: # Real user monitoring causing errors | |
peopleperhour.com: # Disallowed pages | |
peopleperhour.com: # Disallowed Terms | |
peopleperhour.com: # Disallowed GET parameters | |
peopleperhour.com: # Disallow WordPress admin section | |
peopleperhour.com: # Sitemaps | |
topstarnews.net: # | |
topstarnews.net: # Other Bot, crawlers all disallow | |
topstarnews.net: # | |
staples.ca: # we use Shopify as our ecommerce platform | |
staples.ca: # Google adsbot ignores robots.txt unless specifically named! | |
exist.ru: # https://www.exist.ru | |
exist.ru: # Crawl-delay: 1 | |
ixigo.com: # Hi there! Since you are here, we assume you are either a bot or a geek. In either case, drop us an email at [careers@ixigo.com]. We would love to have a conversation with you ;) | |
ixigo.com: #sitemaps | |
animenewsnetwork.com: # disallowed for ALL robots due to impact on impressions/click tracking | |
animenewsnetwork.com: # deprecated | |
animenewsnetwork.com: # TODO: add nofollow to such links because not all bots understand wildcards | |
animenewsnetwork.com: # disallowed for search engines because redundant | |
animenewsnetwork.com: # only for authorized users | |
animenewsnetwork.com: ################################################################################ | |
animenewsnetwork.com: ################################################################################ | |
animenewsnetwork.com: ################################################################################ | |
animenewsnetwork.com: # block useless bot | |
raychat.io: # The robots.txt file is used to control how search engines index your live URLs. | |
raychat.io: # See http://sailsjs.org/documentation/anatomy/my-app/assets/robots-txt for more information. | |
raychat.io: # To prevent search engines from seeing the site altogether, uncomment the next two lines: | |
raychat.io: # User-Agent: * | |
raychat.io: # Disallow: / | |
ali213.net: # file: robots.txt,v 1.0 2015/03/06 created by ali213 | |
ali213.net: # robots.txt for www.ali213.net <URL:http://www.robotstxt.org> | |
ali213.net: # ----------------------------------------------------------------------------- | |
go4worldbusiness.com: # www.robotstxt.org/ | |
go4worldbusiness.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
go4worldbusiness.com: #User-agent: * | |
go4worldbusiness.com: #Disallow: / | |
go4worldbusiness.com: # Enable when the pages aren't in google's index anymore | |
go4worldbusiness.com: #User-agent: * | |
go4worldbusiness.com: #Disallow: /inquiries/send | |
go4worldbusiness.com: #Disallow: /report/complaint | |
go4worldbusiness.com: # Temporarily allowing as per Nikhil's request | |
go4worldbusiness.com: #User-agent: Xenu's | |
go4worldbusiness.com: #Disallow: / | |
radio-canada.ca: # Pages qui n'existe plus (erreur 404) | |
radio-canada.ca: # Pages Problèmatiques : wildcards | |
radio-canada.ca: # Vieilles pages de nouvelles régionales du Québec (Ticket 21836) | |
radio-canada.ca: #Cas des calendriers qui peuvent reculer jusqu'au début des temps | |
radio-canada.ca: # Disallow: /*calendrier.as* | |
radio-canada.ca: #Cas des pages avec des directives cache et nocache | |
radio-canada.ca: # Disallow: /*cache=* | |
radio-canada.ca: # Disallow: /regions/*/Dossiers/detail.asp?Pk_Dossiers_regionaux=*&Pk_Dossiers_regionaux_page=*&VCh=* | |
radio-canada.ca: # Disallow: /regions/*/emissions/emission.asp?pk=*&date=* | |
radio-canada.ca: # Pages Problèmatiques : cas par cas | |
radio-canada.ca: # | |
radio-canada.ca: # Ticket 17686 | |
radio-canada.ca: # Disallow: /sujet/monfleuvemonhistoire/ | |
radio-canada.ca: # Disallow: /sujet-complements/monfleuvemonhistoire/ | |
purewow.com: # robots.jsp | |
roguefitness.com: #CVS, SVN directories and dump files | |
roguefitness.com: # Magento Technical Folders | |
roguefitness.com: #Magento admin page | |
roguefitness.com: # Paths (clean URLs) Use if URLs are rewritten | |
roguefitness.com: # Checkout and user account - ensure proper checkout directory is used | |
roguefitness.com: # Magento Files | |
roguefitness.com: # Misc | |
anz.co.nz: # /robots.txt for https://www.anz.co.nz/ | |
anz.co.nz: # | |
smartbizloans.com: # robots.txt generated at http://www.mcanerin.com | |
smartbizloans.com: # Session new | |
cervantesvirtual.com: # go away | |
sueddeutsche.de: # Robots.txt for sueddeutsche.de | |
sueddeutsche.de: # www.robotstxt.org/ | |
sueddeutsche.de: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
sueddeutsche.de: # Exclude all other stuff for CRE tracking | |
sueddeutsche.de: # Exclude SEO-Tools & SPAM-Bots | |
sueddeutsche.de: # Uber Metrics | |
sueddeutsche.de: #Heidorn | |
tinuiti.com: # Disallow all crawlers from the following list updated 2019.10.02 | |
hualongxiang.com: # | |
hualongxiang.com: # robots.txt for PHPWind | |
hualongxiang.com: # Version 8.0 | |
hualongxiang.com: # | |
autoscout24.de: # Some bots are known to be trouble, particularly those designed to copy | |
autoscout24.de: # entire sites. Please obey robots.txt. | |
autoscout24.de: # Michael H, 17.12.19 | |
optimisemedia.com: # www.robotstxt.org/ | |
optimisemedia.com: # Allow crawling of all content | |
bookmyhsrp.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
bookmyhsrp.com: #content{margin:0 0 0 2%;position:relative;} | |
neighborhoodscout.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
neighborhoodscout.com: # | |
neighborhoodscout.com: # To ban all spiders from the entire site uncomment the next two lines: | |
neighborhoodscout.com: # User-agent: * | |
neighborhoodscout.com: # Disallow: / | |
avanza.se: # Avanza Bank Robots | |
lavoz.com.ar: # robots.txt La Voz | |
lavoz.com.ar: # Sitemaps | |
lavoz.com.ar: #Sitemap: https://www.lavoz.com.ar/sites/default/files/xmlsitemap/todos_sitemap_desktop.xml | |
lavoz.com.ar: # Tests | |
lavoz.com.ar: # API | |
lavoz.com.ar: # Denuncias | |
tripadvisor.co.uk: # Hi there, | |
tripadvisor.co.uk: # | |
tripadvisor.co.uk: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself. | |
tripadvisor.co.uk: # | |
tripadvisor.co.uk: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet? | |
tripadvisor.co.uk: # | |
tripadvisor.co.uk: # Run - don't crawl - to apply to join TripAdvisor's elite SEO team | |
tripadvisor.co.uk: # | |
tripadvisor.co.uk: # Email seoRockstar@tripadvisor.com | |
tripadvisor.co.uk: # | |
tripadvisor.co.uk: # Or visit https://careers.tripadvisor.com/search-results?keywords=seo | |
tripadvisor.co.uk: # | |
tripadvisor.co.uk: # | |
cellphones.com.vn: ## robots.txt for Magento Community and Enterprise | |
cellphones.com.vn: ## GENERAL SETTINGS | |
cellphones.com.vn: ## Enable robots.txt rules for all crawlers | |
cellphones.com.vn: ## Crawl-delay parameter: number of seconds to wait between successive requests to the same server. | |
cellphones.com.vn: ## Set a custom crawl rate if you're experiencing traffic problems with your server. | |
cellphones.com.vn: # Crawl-delay: 30 | |
cellphones.com.vn: ## Magento sitemap: uncomment and replace the URL to your Magento sitemap file | |
cellphones.com.vn: # Sitemap: http://www.example.com/sitemap/sitemap.xml | |
cellphones.com.vn: ## DEVELOPMENT RELATED SETTINGS | |
cellphones.com.vn: ## Do not crawl development files and folders: CVS, svn directories and dump files | |
cellphones.com.vn: ## GENERAL MAGENTO SETTINGS | |
cellphones.com.vn: ## Do not crawl Magento admin page | |
cellphones.com.vn: ## Do not crawl common Magento technical folders | |
cellphones.com.vn: ## Do not crawl common Magento files | |
cellphones.com.vn: ## MAGENTO SEO IMPROVEMENTS | |
cellphones.com.vn: ## Do not crawl sub category pages that are sorted or filtered. | |
cellphones.com.vn: ## Do not crawl 2-nd home page copy (example.com/index.php/). Uncomment it only if you activated Magento SEO URLs. | |
cellphones.com.vn: ## Do not crawl links with session IDs | |
cellphones.com.vn: ## Do not crawl checkout and user account pages | |
cellphones.com.vn: ## Do not crawl seach pages and not-SEO optimized catalog links | |
cellphones.com.vn: ## SERVER SETTINGS | |
cellphones.com.vn: ## Do not crawl common server technical folders and files | |
cellphones.com.vn: ## IMAGE CRAWLERS SETTINGS | |
cellphones.com.vn: ## Extra: Uncomment if you do not wish Google and Bing to index your images | |
cellphones.com.vn: # User-agent: Googlebot-Image | |
cellphones.com.vn: # Disallow: / | |
cellphones.com.vn: # User-agent: msnbot-media | |
cellphones.com.vn: # Disallow: / | |
cellphones.com.vn: ## Cellphones Sitemap | |
google.com.gt: # AdsBot | |
google.com.gt: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com. | |
cookieandkate.com: # flywheel permissions test | |
letemps.ch: # | |
letemps.ch: # robots.txt | |
letemps.ch: # | |
letemps.ch: # This file is to prevent the crawling and indexing of certain parts | |
letemps.ch: # of your site by web crawlers and spiders run by sites like Yahoo! | |
letemps.ch: # and Google. By telling these "robots" where not to go on your site, | |
letemps.ch: # you save bandwidth and server resources. | |
letemps.ch: # | |
letemps.ch: # This file will be ignored unless it is at the root of your host: | |
letemps.ch: # Used: http://example.com/robots.txt | |
letemps.ch: # Ignored: http://example.com/site/robots.txt | |
letemps.ch: # | |
letemps.ch: # For more information about the robots.txt standard, see: | |
letemps.ch: # http://www.robotstxt.org/robotstxt.html | |
letemps.ch: # CSS, JS, Images | |
letemps.ch: # Directories | |
letemps.ch: # Files | |
letemps.ch: # Paths (clean URLs) | |
letemps.ch: # Paths (no clean URLs) | |
takprosto.cc: #Disallow: /tag | |
takprosto.cc: #Disallow: /tag | |
mofa.go.kr: # | |
mofa.go.kr: # robots.txt | |
mofa.go.kr: # | |
mofa.go.kr: # This file is to prevent the crawling and indexing of certain parts | |
mofa.go.kr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
mofa.go.kr: # and Google. By telling these "robots" where not to go on your site, | |
mofa.go.kr: # you save bandwidth and server resources. | |
mofa.go.kr: # | |
mofa.go.kr: # This file will be ignored unless it is at the root of your host: | |
mofa.go.kr: # Used: http://example.com/robots.txt | |
mofa.go.kr: # Ignored: http://example.com/site/robots.txt | |
mofa.go.kr: # | |
mofa.go.kr: # For more information about the robots.txt standard, see: | |
mofa.go.kr: # http://www.robotstxt.org/wc/robots.html | |
mofa.go.kr: # | |
mofa.go.kr: # For syntax checking, see: | |
mofa.go.kr: # http://www.sxw.org.uk/computing/robots/check.html | |
mofa.go.kr: # Paths (no clean URLs) | |
fgv.br: # | |
fgv.br: # robots.txt | |
fgv.br: # | |
fgv.br: # This file is to prevent the crawling and indexing of certain parts | |
fgv.br: # of your site by web crawlers and spiders run by sites like Yahoo! | |
fgv.br: # and Google. By telling these "robots" where not to go on your site, | |
fgv.br: # you save bandwidth and server resources. | |
fgv.br: # | |
fgv.br: # This file will be ignored unless it is at the root of your host: | |
fgv.br: # Used: http://example.com/robots.txt | |
fgv.br: # Ignored: http://example.com/site/robots.txt | |
fgv.br: # | |
fgv.br: # For more information about the robots.txt standard, see: | |
fgv.br: # http://www.robotstxt.org/robotstxt.html | |
fgv.br: # Directories CPDOC | |
fgv.br: # Directories TIC | |
fgv.br: # Directories DICOM | |
handshake.com: # robots.txt file for www.handshake.com | |
efe.com: # Begin block Bad-Robots from robots.txt | |
efe.com: # SEO-related bots | |
vivanuncios.com.mx: #Sitemaps | |
vivanuncios.com.mx: #Sorting parameters | |
vivanuncios.com.mx: #Other comments: | |
vivanuncios.com.mx: #Sorting parameters | |
vivanuncios.com.mx: #Other comments: | |
vivanuncios.com.mx: #Sorting parameters | |
vivanuncios.com.mx: #Other comments: | |
baccredomatic.com: # | |
baccredomatic.com: # robots.txt | |
baccredomatic.com: # | |
baccredomatic.com: # This file is to prevent the crawling and indexing of certain parts | |
baccredomatic.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
baccredomatic.com: # and Google. By telling these "robots" where not to go on your site, | |
baccredomatic.com: # you save bandwidth and server resources. | |
baccredomatic.com: # | |
baccredomatic.com: # This file will be ignored unless it is at the root of your host: | |
baccredomatic.com: # Used: http://example.com/robots.txt | |
baccredomatic.com: # Ignored: http://example.com/site/robots.txt | |
baccredomatic.com: # | |
baccredomatic.com: # For more information about the robots.txt standard, see: | |
baccredomatic.com: # http://www.robotstxt.org/robotstxt.html | |
baccredomatic.com: # CSS, JS, Images | |
baccredomatic.com: # Directories | |
baccredomatic.com: # Files | |
baccredomatic.com: # Paths (clean URLs) | |
baccredomatic.com: # Paths (no clean URLs) | |
bitearns.com: # User-agent: * | |
bitearns.com: #User-agent: DeepCrawl | |
bitearns.com: #Disallow: / | |
ucsb.edu: # | |
ucsb.edu: # robots.txt | |
ucsb.edu: # | |
ucsb.edu: # This file is to prevent the crawling and indexing of certain parts | |
ucsb.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ucsb.edu: # and Google. By telling these "robots" where not to go on your site, | |
ucsb.edu: # you save bandwidth and server resources. | |
ucsb.edu: # | |
ucsb.edu: # This file will be ignored unless it is at the root of your host: | |
ucsb.edu: # Used: http://example.com/robots.txt | |
ucsb.edu: # Ignored: http://example.com/site/robots.txt | |
ucsb.edu: # | |
ucsb.edu: # For more information about the robots.txt standard, see: | |
ucsb.edu: # http://www.robotstxt.org/robotstxt.html | |
ucsb.edu: # CSS, JS, Images | |
ucsb.edu: # Directories | |
ucsb.edu: # Files | |
ucsb.edu: # Paths (clean URLs) | |
ucsb.edu: # Paths (no clean URLs) | |
ucsb.edu: # No A to Z items | |
prettylittlething.com: #**************************************************************************** | |
prettylittlething.com: # robots.txt | |
prettylittlething.com: # : Robots, spiders, and search engines use this file to detmine which | |
prettylittlething.com: # content they should *not* crawl while indexing your website. | |
prettylittlething.com: # : This system is called "The Robots Exclusion Standard." | |
prettylittlething.com: # : It is strongly encouraged to use a robots.txt validator to check | |
prettylittlething.com: # for valid syntax before any robots read it! | |
prettylittlething.com: # | |
prettylittlething.com: # Examples: | |
prettylittlething.com: # | |
prettylittlething.com: # Instruct all robots to stay out of the admin area. | |
prettylittlething.com: # : User-agent: * | |
prettylittlething.com: # : Disallow: /admin/ | |
prettylittlething.com: # | |
prettylittlething.com: # Restrict Google and MSN from indexing your images. | |
prettylittlething.com: # : User-agent: Googlebot | |
prettylittlething.com: # : Disallow: /images/ | |
prettylittlething.com: # : User-agent: MSNBot | |
prettylittlething.com: # : Disallow: /images/ | |
prettylittlething.com: #**************************************************************************** | |
viatorrents.com: #linksdownload a{ | |
viatorrents.com: #lista_download{ | |
viatorrents.com: #lista_download a{ | |
viatorrents.com: #lista_download strong a{ | |
viatorrents.com: #lista_download img{ | |
viatorrents.com: #menu_direito a{ | |
viatorrents.com: #menu_direito li{ | |
viatorrents.com: #informacoes{ | |
viatorrents.com: #elenco{ | |
viatorrents.com: #sinopse{ | |
viatorrents.com: #capas_pequenas a{ | |
viatorrents.com: #capas_pequenas{ | |
viatorrents.com: #capas_pequenas h3{ | |
viatorrents.com: #capas_pequenas img{ | |
viatorrents.com: #capas_pequenas p{ | |
viatorrents.com: #inicio img{ | |
viatorrents.com: #inicio .col-sm-3{ | |
viatorrents.com: #inicio h2{ | |
viatorrents.com: #inicio .col-12:hover h2{ | |
viatorrents.com: #pesquisa{ | |
viatorrents.com: #rodape { | |
mobareco.jp: # This virtual robots.txt file was created by the Virtual Robots.txt WordPress plugin: https://www.wordpress.org/plugins/pc-robotstxt/ | |
nikkei225jp.com: #body1{width:1053px;overflow:hidden;border-right:#690 solid 2px;background:#fff} | |
nikkei225jp.com: #outline{padding-top:5px} | |
nikkei225jp.com: #main1{padding-left:5px;float:left} | |
nikkei225jp.com: #main2{width:732px} | |
nikkei225jp.com: #main3{padding-left:5px;float:left} | |
nikkei225jp.com: #main4{width:732px} | |
nikkei225jp.com: #side1{padding-right:5px;float:right} | |
nikkei225jp.com: #side2{width:305px} | |
nikkei225jp.com: #main1 .win2{width:728px;overflow:hidden} | |
nikkei225jp.com: #main3 .win2{width:728px;overflow:hidden} | |
nikkei225jp.com: #topmenu{background:#FeFbe3;border-bottom:#cb9 solid 1px;width:1053px;height:48px;overflow:hidden;} | |
nikkei225jp.com: #topmenu a{background:#FeFbe3;border-color:#ffd #cb9 #FeFbe3 #ffd;border-style:solid;border-width:1px;color:#825839;display:blaock;float:left;font-size:15px;font-weight:700;height:44px;line-height:46px;position:relative;text-align:center;text-decoration:none} | |
nikkei225jp.com: #topmenu a:before{border:6px transparent solid;border-left-color:#ec9;border-right-width:0;content:'';height:0;left:3px;position:absolute;top:16px;width:0} | |
nikkei225jp.com: #topmenu a:hover{background:#FFFABF;color:#bb3333} | |
nikkei225jp.com: #topmenu a{height:46px;line-height:48px;} | |
nikkei225jp.com: #topmenu .flag{margin:14px 0 0 4px} | |
nikkei225jp.com: #topmenu .topF{padding:0 8px 0 10px;} | |
nikkei225jp.com: #topmenu .topF .flag{border:1px solid#f7dfaf;} | |
nikkei225jp.com: #nkLink a{background:#FeFbe3;border-color:#ffe #eda #cb9 #ffe;border-style:solid;border-width:1px;color:#853;display:block;font-size:13px;font-weight:600;padding:0;height:33.3px;line-height:37px;position:relative;text-align:center;text-decoration:none;width:137px;float:left;overflow:hidden} | |
nikkei225jp.com: #nkLink a:before{border:6px transparent solid;border-left-color:#eda;border-right-width:0;content:'';height:0;position:absolute;left:3px;top:12px;width:0} | |
nikkei225jp.com: #nkLink a:hover{color:#b33} | |
nikkei225jp.com: #rankLink a{background:#FeFbe3;border-color:#ffe #eda #cb9 #ffe;border-style:solid;border-width:1px;color:#853;display:block;font-size:13px;font-weight:600;margin:0;padding-left:16px;height:22.7px;line-height:26px;position:relative;text-decoration:none;width:185px;float:right;overflow:hidden} | |
nikkei225jp.com: #rankLink a:before{border:5px transparent solid;border-left-color:#eda;border-right-width:0;content:'';height:0;position:absolute;left:3px;top:7px;width:0} | |
nikkei225jp.com: #rankLink a:hover{color:#b33} | |
nikkei225jp.com: #rankLink span{background:#FeF9ec;border-color:#ffe #eda #eda #ffe;border-style:solid;border-width:1px;color:#853;display:block;font-size:13px;font-weight:500;margin:0;padding:0;height:22.7px;line-height:26px;position:relative;text-align:center;width:44px;float:left} | |
nikkei225jp.com: #rankLink .tit7,#nkLink .tit7{padding:5px;border-left:1px solid #eda} | |
nikkei225jp.com: #rankLink span{border-left:1px solid #eda} | |
nikkei225jp.com: #wtime{margin-bottom:0px;} | |
nikkei225jp.com: #if_con11 a.title,#if_con33 a,.if_tit a{color:#ccc;text-decoration:none} | |
nikkei225jp.com: #if_con11 a:hover.title,#if_con33 a:hover,.if_tit a:hover{color:#ffcc00} | |
nikkei225jp.com: #if_con{background:#ffffff;border:1px solid #ffffff;font:normal 12px Helvetica,Arial} | |
nikkei225jp.com: #if_con11{background:#ffffff;font:bold 10px Helvetica;height:22px;line-height:22px;text-align:center;width:158px} | |
nikkei225jp.com: #if_con2{border-top:1px solid #ddd;padding-bottom:0px;padding-top:4px} | |
nikkei225jp.com: #if_con22{height:130px;width:158p;padding-left:8px;} | |
nikkei225jp.com: #if_con3{border-top:1px solid #ddd} | |
nikkei225jp.com: #if_con33{background:#ffffff;clear:both;color:#999;font:normal 11px Helvetica;height:18px;line-height:18px;text-align:center;width:158px} | |
nikkei225jp.com: #if_con3{display:none} | |
nikkei225jp.com: #eveS{font-size:13px;line-height:165%;padding:5px} | |
nikkei225jp.com: #headline tt,.eve tt,li tt{color:#bbb;font-family:Arial;font-size:14px} | |
nikkei225jp.com: #headline font{color:#444;font-size:12px} | |
nikkei225jp.com: #headline a{color:#777;font-size:12px;padding-left:7px} | |
nikkei225jp.com: #Suke span{padding:2px 3px 2px 4px;} | |
nikkei225jp.com: #Suke .day font{padding:2px 3px 2px 4px;} | |
nikkei225jp.com: #sideLink a.glink{width:140px;text-align:left;float:left;height:18px;padding:0;margin:3px;clear:both} | |
nikkei225jp.com: #sideLink a.glink span{margin-right:4px;} | |
nikkei225jp.com: #sideLink span.linkTX{width:100px;float:left;display:block;height:20px;line-height:24px;padding:0 0 0 3px;margin:0;font-size:11px;color:#aaa;} | |
nikkei225jp.com: #reload{background:#c33;color:#fff;display:none;font-weight:bold;height:30px;left:0;line-height:30px;position:fixed;text-align:center;top:0;width:1053px} | |
nikkei225jp.com: #datatbl{width:100%} | |
nikkei225jp.com: #dhtmltooltip{position:absolute;left:-300px;width:150px;border:1px solid black;visibility:hidden;z-index:100;filter:progid:DXImageTransform.Microsoft.Shadow(color=gray,direction=135);padding:2px;background:lightyellow} | |
nikkei225jp.com: #dhtmlpointer{position:absolute;left:-300px;z-index:101;visibility:hidden} | |
nikkei225jp.com: #snsSiteTwS,#snsSiteTwF,#snsChtTw,#Tw111,#Tw717{background-position:0 0} | |
nikkei225jp.com: #snsSiteFbS,#snsSiteFbF{background-position:0 -32px} | |
nikkei225jp.com: #snsChtFb,#snsTblFb,#snsChtFb2,#snsTblFb1,#snsTblFb2,#snsTblFb3,#snsTblFb4,#snsTblFb5{background-position:0 -24px} | |
nikkei225jp.com: #snsSiteTwS,#snsSiteFbS{margin:0 0 8px 22px;} | |
nikkei225jp.com: #snsSiteTwS{position:absolute;top:3px;left:-12px;} | |
nikkei225jp.com: #snsSiteFbS{position:absolute;top:3px;left:30px;} | |
nikkei225jp.com: #stbl td{white-space:nowrap} | |
shopify.jp: # ,: | |
shopify.jp: # ,' | | |
shopify.jp: # / : | |
shopify.jp: # --' / | |
shopify.jp: # \/ />/ | |
shopify.jp: # / <//_\ | |
shopify.jp: # __/ / | |
shopify.jp: # )'-. / | |
shopify.jp: # ./ :\ | |
shopify.jp: # /.' ' | |
shopify.jp: # No need to shop around. Board the rocketship today – great SEO careers to checkout at shopify.com/careers | |
shopify.jp: # robots.txt file for www.shopify.jp | |
jornalcontabil.com.br: #robots.txt by ServerDo.in -- www.jornalcontabil.com.br | |
ku.edu: # | |
ku.edu: # robots.txt | |
ku.edu: # | |
ku.edu: # This file is to prevent the crawling and indexing of certain parts | |
ku.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ku.edu: # and Google. By telling these "robots" where not to go on your site, | |
ku.edu: # you save bandwidth and server resources. | |
ku.edu: # | |
ku.edu: # This file will be ignored unless it is at the root of your host: | |
ku.edu: # Used: http://example.com/robots.txt | |
ku.edu: # Ignored: http://example.com/site/robots.txt | |
ku.edu: # | |
ku.edu: # For more information about the robots.txt standard, see: | |
ku.edu: # http://www.robotstxt.org/robotstxt.html | |
ku.edu: # CSS, JS, Images | |
ku.edu: # Directories | |
ku.edu: # Files | |
ku.edu: # Paths (clean URLs) | |
ku.edu: # Paths (no clean URLs) | |
codechef.com: # $Id: robots.txt,v 1.9.2.1 2008/12/10 20:12:19 goba Exp $ | |
codechef.com: # | |
codechef.com: # robots.txt | |
codechef.com: # | |
codechef.com: # This file is to prevent the crawling and indexing of certain parts | |
codechef.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
codechef.com: # and Google. By telling these "robots" where not to go on your site, | |
codechef.com: # you save bandwidth and server resources. | |
codechef.com: # | |
codechef.com: # This file will be ignored unless it is at the root of your host: | |
codechef.com: # Used: http://example.com/robots.txt | |
codechef.com: # Ignored: http://example.com/site/robots.txt | |
codechef.com: # | |
codechef.com: # For more information about the robots.txt standard, see: | |
codechef.com: # http://www.robotstxt.org/wc/robots.html | |
codechef.com: # | |
codechef.com: # For syntax checking, see: | |
codechef.com: # http://www.sxw.org.uk/computing/robots/check.html | |
codechef.com: # Allowing css, js and images | |
codechef.com: # Directories | |
codechef.com: # Files | |
codechef.com: # Paths (clean URLs) | |
codechef.com: # Paths (no clean URLs) | |
codechef.com: # Add Sitemap | |
loltoy.myshopify.com: # we use Shopify as our ecommerce platform | |
loltoy.myshopify.com: # Google adsbot ignores robots.txt unless specifically named! | |
viz.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
viz.com: # | |
viz.com: # To ban all spiders from the entire site uncomment the next two lines: | |
viz.com: # User-agent: * | |
viz.com: # Disallow: / | |
tv21.tv: #Websavers Bot Protection | |
tv21.tv: #Timely Events Calendar | |
tv21.tv: #ecwd Calendar | |
tv21.tv: #Tribe Events Calendar | |
tv21.tv: #Comment Mail Plugin | |
tv21.tv: #Search, sharing, sorting not needed for bots | |
jmbullion.com: # Added by SEO Ultimate's Link Mask Generator module | |
jmbullion.com: # End Link Mask Generator output | |
volusion.com: # robots.txt for https://www.volusion.com/ | |
volusion.com: # Directories | |
volusion.com: # Erin Directories | |
volusion.com: # Files | |
volusion.com: # Erin Files | |
volusion.com: # Paths (clean URLs) | |
volusion.com: # Erin Paths (clean URLs) | |
volusion.com: # Paths (no clean URLs) | |
volusion.com: ## Erin Paths (no clean URLs) | |
matplotlib.org: # Docs: https://developers.google.com/search/docs/advanced/robots/intro | |
matplotlib.org: # Note old files will still be indexed if they have links to them, | |
matplotlib.org: # hopefully they are weighted less... | |
matplotlib.org: # do not search root directory by default. | |
matplotlib.org: # files at top level: | |
matplotlib.org: # tell robots this is sitemap | |
nsportal.ru: # | |
nsportal.ru: # robots.txt | |
nsportal.ru: # | |
nsportal.ru: # This file is to prevent the crawling and indexing of certain parts | |
nsportal.ru: # of your site by web crawlers and spiders run by sites like Yahoo! | |
nsportal.ru: # and Google. By telling these "robots" where not to go on your site, | |
nsportal.ru: # you save bandwidth and server resources. | |
nsportal.ru: # | |
nsportal.ru: # This file will be ignored unless it is at the root of your host: | |
nsportal.ru: # Used: http://example.com/robots.txt | |
nsportal.ru: # Ignored: http://example.com/site/robots.txt | |
nsportal.ru: # | |
nsportal.ru: # For more information about the robots.txt standard, see: | |
nsportal.ru: # http://www.robotstxt.org/robotstxt.html | |
nsportal.ru: # CSS, JS, Images | |
nsportal.ru: # Directories | |
nsportal.ru: # Files | |
nsportal.ru: # Paths (clean URLs) | |
nsportal.ru: # Paths (no clean URLs) | |
bobvila.com: # allow boomtrain bot on entire sites | |
maalaimalar.com: # Sitemap Files | |
thewindowsclub.com: # global | |
techsmith.fr: # Robots.txt for www.techsmith.fr | |
techsmith.fr: #02 July 2018 | |
bitwarden.com: # .__________________________. | |
bitwarden.com: # | .___________________. |==| | |
bitwarden.com: # | | ................. | | | | |
bitwarden.com: # | | ::[ Dear robot ]: | | | | |
bitwarden.com: # | | ::::[ be nice ]:: | | | | |
bitwarden.com: # | | ::::::::::::::::: | | | | |
bitwarden.com: # | | ::::::::::::::::: | | | | |
bitwarden.com: # | | ::::::::::::::::: | | | | |
bitwarden.com: # | | ::::::::::::::::: | | ,| | |
bitwarden.com: # | !___________________! |(c| | |
bitwarden.com: # !_______________________!__! | |
bitwarden.com: # / \ | |
bitwarden.com: # / [][][][][][][][][][][][][] \ | |
bitwarden.com: # / [][][][][][][][][][][][][][] \ | |
bitwarden.com: #( [][][][][____________][][][][] ) | |
bitwarden.com: # \ ------------------------------ / | |
bitwarden.com: # \______________________________/ | |
bitwarden.com: # ________ | |
bitwarden.com: # __,_, | | | |
bitwarden.com: # [_|_/ | OK | | |
bitwarden.com: # // |________| | |
bitwarden.com: # _// __ / | |
bitwarden.com: #(_|) |@@| | |
bitwarden.com: # \ \__ \--/ __ | |
bitwarden.com: # \o__|----| | __ | |
bitwarden.com: # \ }{ /\ )_ / _\ | |
bitwarden.com: # /\__/\ \__O (__ | |
bitwarden.com: # (--/\--) \__/ | |
bitwarden.com: # _)( )(_ | |
bitwarden.com: # `---''---` | |
powr.io: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
powr.io: # | |
powr.io: # To ban all spiders from the entire site uncomment the next two lines: | |
powr.io: # User-agent: * | |
powr.io: # Disallow: / | |
mypetads.com: # Blocks robots from specific folders / directories | |
mypetads.com: # Crawl-delay: 80 | |
eventbrite.com.au: # http://www.google.com.au/adsbot.html - AdsBot ignores * wildcard | |
bloggingwizard.com: # block bots | |
bloggingwizard.com: # slow down bots | |
mercadolibre.cl: #siteId: MLC | |
mercadolibre.cl: #country: chile | |
mercadolibre.cl: ##Block - Referidos | |
mercadolibre.cl: ##Block - siteinfo urls | |
mercadolibre.cl: ##Block - Cart | |
mercadolibre.cl: ##Block Checkout | |
mercadolibre.cl: ##Block - User Logged | |
mercadolibre.cl: #Shipping selector | |
mercadolibre.cl: ##Block - last search | |
mercadolibre.cl: ## Block - Profile - By Id | |
mercadolibre.cl: ## Block - Profile - By Id and role (old version) | |
mercadolibre.cl: ## Block - Profile - Leg. Req. | |
mercadolibre.cl: ##Block - noindex | |
mercadolibre.cl: # Mercado-Puntos | |
mercadolibre.cl: # Viejo mundo | |
mercadolibre.cl: ##Block recommendations listing | |
secretsearchenginelabs.com: #Don't spider our own search results | |
uphf.fr: # Directories | |
uphf.fr: # Files | |
uphf.fr: # Paths (clean URLs) | |
uphf.fr: # Paths (no clean URLs) | |
deodap.com: # we use Shopify as our ecommerce platform | |
deodap.com: # Google adsbot ignores robots.txt unless specifically named! | |
decathlon.com: # we use Shopify as our ecommerce platform | |
decathlon.com: # Google adsbot ignores robots.txt unless specifically named! | |
depaul.edu: #AjaxDelta1 i { | |
manithan.com: # Disallow: /*? This is match ? anywhere in the URL | |
sololearn.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
sololearn.com: #content{margin:0 0 0 2%;position:relative;} | |
midocean.com: # dynamic url's | |
midocean.com: # no hammering | |
oneplus.in: # robots.txt for https://www.oneplus.in/ | |
ravelry.com: ## Below entries are from Wikipedia's robots.txt :) | |
ravelry.com: # | |
ravelry.com: # recursive wget | |
ravelry.com: # | |
ravelry.com: # | |
ravelry.com: # The 'grub' distributed client has been *very* poorly behaved. | |
ravelry.com: # | |
ravelry.com: # | |
ravelry.com: # Doesn't follow robots.txt anyway, but... | |
ravelry.com: # | |
ravelry.com: # | |
ravelry.com: # Hits many times per second, not acceptable | |
ravelry.com: # http://www.nameprotect.com/botinfo.html | |
ravelry.com: # A capture bot, downloads gazillions of pages with no public benefit | |
ravelry.com: # http://www.webreaper.net/ | |
akbank.com: ### Start ### | |
akbank.com: # global rules | |
akbank.com: # sitemaps | |
akbank.com: ### Stop ### | |
bookmarking.info: # 1) this filename (robots.txt) must stay lowercase | |
bookmarking.info: # 2) this file must be in the servers root directory | |
bookmarking.info: # ex: http://www.mydomain.com/kliqqisubfolder/ -- you must move the robots.txt from | |
bookmarking.info: # /kliqqisubfolder/ to the root folder for http://www.mydomain.com/ | |
bookmarking.info: # you must then add your subfolder to each 'Disallow' below | |
bookmarking.info: # ex: Disallow: /cache/ becomes Disallow: /kliqqisubfolder/cache/ | |
photoshelter.com: # ROBOTS.TXT FOR PHOTOSHELTER.COM | |
photoshelter.com: # Was disallowed because it was overly aggressive | |
photoshelter.com: # access re-enabled on May 30, 2013 | |
photoshelter.com: # User-agent: ia_archiver | |
photoshelter.com: # Disallow: / | |
capital.fr: # from orphanPage robots | |
capital.fr: # www.robotstxt.org/ | |
capital.fr: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
capital.fr: ############## | |
capital.fr: # Blocage URL NR | |
capital.fr: ############## | |
capital.fr: ############### | |
capital.fr: # URLs inutiles 2014 | |
capital.fr: ############### | |
capital.fr: ###### | |
capital.fr: # A CORRIGER | |
capital.fr: ##### | |
capital.fr: ###### | |
capital.fr: # 17-08-2017 | |
capital.fr: ###### | |
capital.fr: #Disallow: /bourse/communiques/ | |
vikingswap.finance: # https://www.robotstxt.org/robotstxt.html | |
binghamton.edu: ### User-agent: Baiduspider | |
binghamton.edu: ### Disallow: / | |
binghamton.edu: ### User-agent: "Sogou web spider" | |
binghamton.edu: ### Disallow: / | |
rankwatch.com: # robots.txt for / | |
jitunews.com: # robots.txt generated by Jay (jaykeren@gmail.com) | |
fangraphs.com: # robots.txt for http://www.fangraphs.com/ | |
converse.com: # https://www.converse.com robots.txt | |
designbundles.net: # www.robotstxt.org/ | |
designbundles.net: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
walmart.ca: # Prevent refined browse pages from being crawled, avoiding millions of near-duplicate entries. MG IG | |
walmart.ca: # Now allowing assets to be crawled. MG | |
walmart.ca: #Disallow: /assets/* | |
walmart.ca: # Prevents Financial Services One and Done page from being hit. 11-16-2016 NV | |
walmart.ca: # Prevents Financial WM MC page from being hit, contains promocodes. 11-16-2016 NV | |
walmart.ca: # Always include index sitemaps below rules. MB | |
walmart.ca: #Ending of robots.txt | |
granma.cu: #14-08-2019 | |
granma.cu: # Directories | |
granma.cu: # Files | |
granma.cu: # Paths (clean URLs) | |
granma.cu: # Paths (no clean URLs) | |
granma.cu: #Bots | |
progress.com: #Image Sitemap | |
progress.com: #Video Sitemap | |
coolblue.nl: # On all coolblue. domains | |
coolblue.nl: # Old, on shops | |
coolblue.nl: # No translation known or needed for | |
coolblue.nl: # Only on Coolblue.nl and .be - Dutch language (on coolblue.nl as a safeguard) | |
coolblue.nl: # Only on Coolblue. domains - English language | |
coolblue.nl: # The URL behind the # mark is the Dutch equivalent (just for reference, doesn't block anything), sorted alphabetically in Dutch | |
coolblue.nl: # If a line is behind a #, the translation still needs to be added | |
coolblue.nl: # Disallow: /en/??? # /nl/mailafriend | |
coolblue.nl: # Disallow: /en/??? # /nl/questionnaire | |
coolblue.nl: # Disallow: /en/??? # /nl/*/voor-de$ | |
coolblue.nl: # Disallow: /en/??? # /nl/*/voor-de/* | |
coolblue.nl: # Only on Coolblue. domains - English language | |
coolblue.nl: # Only on Coolblue.nl - Dutch only | |
coolblue.nl: # Only on Coolblue.nl - English only | |
coolblue.nl: # For specific bots (on all domains) | |
coolblue.nl: # Hi! Trying to reverse engineer something? | |
coolblue.nl: # Maybe you should come work with us. | |
coolblue.nl: # Apply at www.careersatcoolblue.com and mention this comment. | |
wnacg.org: #instantclick-bar{background:#d22;} | |
sreality.cz: # Better safe than sorry | |
biz2credit.com: # For more information about the robots.txt standard, see: | |
biz2credit.com: # http://www.robotstxt.org/orig.html | |
biz2credit.com: # | |
biz2credit.com: # For syntax checking, see: | |
biz2credit.com: # http://tool.motoricerca.info/robots-checker.phtml | |
hipertextual.com: # | |
hipertextual.com: # robots.txt | |
hipertextual.com: # | |
chmotor.cn: # | |
chmotor.cn: # robots.txt for chmotor.cn | |
chmotor.cn: # | |
simplenote.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead. | |
simplenote.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details. | |
simplenote.com: # This file was generated on Tue, 31 Mar 2020 18:02:18 +0000 | |
mq.edu.au: # Limit Bingbot | |
mq.edu.au: # Allow CSS and JS | |
mq.edu.au: # Allow: /_design/css/* | |
mq.edu.au: # Allow: /_design/js/* | |
mq.edu.au: # Disallow some matrix defaults | |
mq.edu.au: # Disallow UAT and sandbox pages | |
mq.edu.au: # Disallow search page | |
mq.edu.au: # sitemap | |
mq.edu.au: # Keep deliberately duplicated pages out of the Google index | |
mp3party.net: #Disallow: /online/* | |
mp3party.net: #Disallow: /search* | |
mp3party.net: #Disallow: /play/ | |
mp3party.net: #Disallow: *?*sort= | |
mp3party.net: #Disallow: /artist/*/new | |
mp3party.net: #Disallow: /artist/*/pop | |
retty.me: #User-agent: bingbot | |
retty.me: #Crawl-delay: 5 | |
retty.me: #User-agent: msnbot | |
retty.me: #Crawl-delay: 5 | |
retty.me: #User-agent: baiduspider | |
retty.me: #Crawl-delay: 5 | |
mailtrack.io: # www.robotstxt.org/ | |
weworkremotely.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
bollywoodhungama.com: #Baiduspider | |
xataka.com: # | |
xataka.com: # robots.txt | |
xataka.com: # | |
xataka.com: # Crawlers that are kind enough to obey, but which we'd rather not have | |
xataka.com: # unless they're feeding search engines. | |
xataka.com: # Some bots are known to be trouble, particularly those designed to copy | |
xataka.com: # entire sites. Please obey robots.txt. | |
xataka.com: # Sorry, wget in its recursive mode is a frequent problem. | |
xataka.com: # Please read the man page and use it properly; there is a | |
xataka.com: # --wait option you can use to set the delay between hits, | |
xataka.com: # for instance. | |
xataka.com: # | |
xataka.com: # | |
xataka.com: # The 'grub' distributed client has been *very* poorly behaved. | |
xataka.com: # | |
xataka.com: # | |
xataka.com: # Doesn't follow robots.txt anyway, but... | |
xataka.com: # | |
xataka.com: # | |
xataka.com: # Hits many times per second, not acceptable | |
xataka.com: # http://www.nameprotect.com/botinfo.html | |
xataka.com: # A capture bot, downloads gazillions of pages with no public benefit | |
xataka.com: # http://www.webreaper.net/ | |
teleamazonas.com: #Disallow: /wp-admin/ | |
teleamazonas.com: #Disallow: /wp-includes/ | |
temple.edu: # | |
temple.edu: # robots.txt | |
temple.edu: # | |
temple.edu: # This file is to prevent the crawling and indexing of certain parts | |
temple.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
temple.edu: # and Google. By telling these "robots" where not to go on your site, | |
temple.edu: # you save bandwidth and server resources. | |
temple.edu: # | |
temple.edu: # This file will be ignored unless it is at the root of your host: | |
temple.edu: # Used: http://example.com/robots.txt | |
temple.edu: # Ignored: http://example.com/site/robots.txt | |
temple.edu: # | |
temple.edu: # For more information about the robots.txt standard, see: | |
temple.edu: # http://www.robotstxt.org/robotstxt.html | |
temple.edu: # CSS, JS, Images | |
temple.edu: # Directories | |
temple.edu: # Files | |
temple.edu: # Paths (clean URLs) | |
temple.edu: # Paths (no clean URLs) | |
radio.co: # robots.txt for https://radio.co/ | |
radio.co: # live - don't allow web crawlers to index cpresources/ or vendor/ | |
escortinparis.info: #linkpad.ru | |
escortinparis.info: #majestic.com | |
escortinparis.info: #ahrefs.com | |
escortinparis.info: #moz.com | |
escortinparis.info: #semrush.com | |
4over.com: # robotstxt.org | |
roofstock.com: # http://www.robotstxt.org | |
valenciacollege.edu: # robots.txt for http://valenciacollege.edu/ | |
lifelock.com: #Global Allow | |
lifelock.com: # General disallow rules | |
productreview.com.au: # Bad bots | |
productreview.com.au: # No query string or inner pages | |
productreview.com.au: # No query string | |
productreview.com.au: # Blocking bots from crawling DoubleClick for Publisher and Google Analytics related URL's (which aren't real URL's) | |
gnu.org: # robots.txt for http://www.gnu.org/ | |
gnu.org: # RT #1298215. | |
gnu.org: # RT #1298215. | |
gnu.org: # RT #1638325. | |
usatuan.com: # we use Shopify as our ecommerce platform | |
usatuan.com: # Google adsbot ignores robots.txt unless specifically named! | |
highlow.net: # www.robotstxt.org/ | |
highlow.net: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
ey.com: # Allow all robots complete access | |
riotpixels.com: # HOW TO USE THIS FILE: | |
riotpixels.com: # 1) Edit this file to change "/forum/" to the correct relative path from your base URL, for example if your forum was at "domain.com/sites/community", then you'd use "/sites/community/" | |
riotpixels.com: # 2) Rename the file to 'robots.txt' and move it to your web root (public_html, www, or htdocs) | |
riotpixels.com: # 3) Edit the file to remove this comment (anything above the dashed line, including the dashed line | |
riotpixels.com: # | |
riotpixels.com: # NOTES: | |
riotpixels.com: # Even though wild cards and pattern matching are not part of the robots.txt specification, many search bots understand and make use of them | |
riotpixels.com: #------------------------ REMOVE THIS LINE AND EVERYTHING ABOVE SO THAT User-agent: * IS THE FIRST LINE ------------------------------------------ | |
powershow.com: #Disallow: /js/ | |
powershow.com: # Old pages | |
powershow.com: # templates | |
powershow.com: # slides | |
powershow.com: # stock-photos | |
powershow.com: #books | |
infopicked.com: # Allow Google robots to crawl adServe | |
nuskin.com: # robots.txt for nuskin - Fastly | |
integrationsfonds.at: # folders | |
integrationsfonds.at: # parameters | |
wko.at: # 07/2007 hide mitarbeiterlistings and obsolete personal info | |
wko.at: # 08/2007 detailindex | |
wko.at: # 10/2018 | |
dorar.net: # www.robotstxt.org/ | |
dorar.net: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
ilsole24ore.com: # robots.txt www.ilsole24ore.com | |
ilsole24ore.com: # 20/06/2019 v. 1.0 | |
pocket-lint.com: # www.robotstxt.org/ | |
pocket-lint.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
healthunlocked.com: # User pages | |
healthunlocked.com: # Community pages | |
healthunlocked.com: # Private Programs | |
healthunlocked.com: # javascript files | |
healthunlocked.com: # Overrides | |
healthunlocked.com: # Private groups via API | |
healthunlocked.com: # Bot specific control | |
rapmls.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
rapmls.com: #content{margin:0 0 0 2%;position:relative;} | |
sosocar.cn: # robotstxt.org/ | |
ptrack1.com: # `..--:::::::::--.`` | |
ptrack1.com: # ``..-::///////////////:.` | |
ptrack1.com: # `..-:://////////////////- | |
ptrack1.com: # ```````..-----------::://+/////- | |
ptrack1.com: # .....```...............```.:+//// | |
ptrack1.com: # `----....-:-............`````.+///` | |
ptrack1.com: # `::/:---smmNy.........-hMNo```:+// | |
ptrack1.com: # `://:---dNNMd-........+MMMN.``-+/: | |
ptrack1.com: # -//----:os+-..........+hy:```:+/- | |
ptrack1.com: # -:/:---------............`.-:+//- | |
ptrack1.com: # -:/++////////+++++///////+++////:` | |
ptrack1.com: # `.-:/+++++++++++//////////////////:-.` ```` | |
ptrack1.com: # ``````````.-://++++++++++++++++//////////////::-.....--::/-` | |
ptrack1.com: # `----------:://++++++++++++++++++++//////////////::::::///////` | |
ptrack1.com: # :////////////++++++++++++++++++++++++++++++++++///////////////. | |
ptrack1.com: # /++++++++++++++++++++++++++++++++++++++++++++++++++++++++/////. | |
ptrack1.com: # :++++++++++++++++++++++++++++++++++++++++++++++++++++++++++///` | |
ptrack1.com: # `/++++++++++++++++++++++++++++++++++++++++++++++++++++++++++/. | |
ptrack1.com: # .:////:::::/+++++++++++++++++++++++++++++++++++//://////:-` | |
ptrack1.com: # -/+++++++++++++++++++++++++++++++/` | |
ptrack1.com: # `///+++++++++++++++++++++++++++++- | |
ptrack1.com: # `:////+++++++++++++++++++++++++++. | |
ptrack1.com: # ::////+++++++++++++++++++++++++/` | |
ptrack1.com: # -::///+++++++++++++++++++++++++/` | |
ptrack1.com: # .-:///+++++++++++++++++++++++++/ | |
ptrack1.com: # .-::///++++++++++++++++++++++++/ | |
ptrack1.com: # .-::///++++++++++++++++++++++++/ | |
ptrack1.com: # --:////++++++++++++++++++++++++/ | |
ptrack1.com: # `-::///++++++++:``-+++++++++++++/` | |
ptrack1.com: # `-:///+++++++++.```+++++++++++++/` | |
ptrack1.com: # `::///+++++++++:``.+++++++++++++/.```````` | |
ptrack1.com: # ````:///++++++++++/..:+++++++++++++/.`````````````` | |
ptrack1.com: # ` ```````://++++++++++/----/+++++++++++/:...````````````` | |
ptrack1.com: # `````````.-----------......--::::/::::.`````````````` | |
pinkvilla.com: # Files | |
pinkvilla.com: # Paths (clean URLs) | |
pinkvilla.com: # Paths (no clean URLs) | |
academy.com: # changes for 11/01 release | |
academy.com: # changes for 10/26 release | |
academy.com: #Changes for 3/10/18 Kermit R1.1 | |
academy.com: #Changes for 16/10/18 Kermit 1.2 | |
makaan.com: # robots.txt for http://www.makaan.com/ | |
makaan.com: ########api docs######## | |
makaan.com: # Block SEM page | |
anaconda.com: # robots.txt for https://www.anaconda.com/ | |
anaconda.com: # live - don't allow web crawlers to index cpresources/ or vendor/ | |
vingle.net: # @user/interests/:interest | |
coolblue.be: # On all coolblue. domains | |
coolblue.be: # Old, on shops | |
coolblue.be: # No translation known or needed for | |
coolblue.be: # Only on Coolblue.nl and .be - Dutch language (on coolblue.nl as a safeguard) | |
coolblue.be: # Only on Coolblue. domains - English language | |
coolblue.be: # The URL behind the # mark is the Dutch equivalent (just for reference, doesn't block anything), sorted alphabetically in Dutch | |
coolblue.be: # If a line is behind a #, the translation still needs to be added | |
coolblue.be: # Disallow: /en/??? # /nl/mailafriend | |
coolblue.be: # Disallow: /en/??? # /nl/questionnaire | |
coolblue.be: # Disallow: /en/??? # /nl/*/voor-de$ | |
coolblue.be: # Disallow: /en/??? # /nl/*/voor-de/* | |
coolblue.be: # Only on Coolblue. domains - English language | |
coolblue.be: # Only on Coolblue.be - French language | |
coolblue.be: # The URL behind the # mark is the Dutch equivalent, sorted alphabetically in Dutch | |
coolblue.be: # If a line is behind a #, the translation still needs to be added | |
coolblue.be: # Disallow: /fr/??? # /nl/mailafriend | |
coolblue.be: # Disallow: /fr/??? # /nl/questionnaire | |
coolblue.be: # Disallow: /fr/??? # /nl/*/voor-de$ | |
coolblue.be: # Disallow: /fr/??? # /nl/*/voor-de/* | |
coolblue.be: # Only on Coolblue.be domain - French language | |
coolblue.be: # For specific bots (on all domains) | |
coolblue.be: # Hi! Trying to reverse engineer something? | |
coolblue.be: # Maybe you should come work with us. | |
coolblue.be: # Apply at www.careersatcoolblue.com and mention this comment. | |
gdeposylka.ru: # Begin Bad-Robots (DO NOT EDIT AFTER THIS LINE) | |
gdeposylka.ru: # SEO-related bots | |
smartdraw.com: # Additional restrictions for MSIECrawler anywhere on Site | |
sravni.ru: # tech | |
sravni.ru: # interface pages | |
sravni.ru: # auto | |
sravni.ru: # currency | |
sravni.ru: # news and articles | |
sravni.ru: #Регионы и субъекты федерации | |
sravni.ru: #old urls | |
sravni.ru: # tech | |
sravni.ru: # interface pages | |
sravni.ru: # auto | |
sravni.ru: # currency | |
sravni.ru: # news and articles | |
sravni.ru: #Регионы и субъекты федерации | |
sravni.ru: #old urls | |
unipd.it: # | |
unipd.it: # robots.txt | |
unipd.it: # | |
unipd.it: # This file is to prevent the crawling and indexing of certain parts | |
unipd.it: # of your site by web crawlers and spiders run by sites like Yahoo! | |
unipd.it: # and Google. By telling these "robots" where not to go on your site, | |
unipd.it: # you save bandwidth and server resources. | |
unipd.it: # | |
unipd.it: # This file will be ignored unless it is at the root of your host: | |
unipd.it: # Used: http://example.com/robots.txt | |
unipd.it: # Ignored: http://example.com/site/robots.txt | |
unipd.it: # | |
unipd.it: # For more information about the robots.txt standard, see: | |
unipd.it: # http://www.robotstxt.org/robotstxt.html | |
unipd.it: # CSS, JS, Images | |
unipd.it: # Directories | |
unipd.it: # Files | |
unipd.it: # Paths (clean URLs) | |
unipd.it: # Paths (no clean URLs) | |
unipd.it: # Directory sites/unipd.it/files | |
unipd.it: # ####### | |
ut.ac.id: # | |
ut.ac.id: # robots.txt | |
ut.ac.id: # | |
ut.ac.id: # This file is to prevent the crawling and indexing of certain parts | |
ut.ac.id: # of your site by web crawlers and spiders run by sites like Yahoo! | |
ut.ac.id: # and Google. By telling these "robots" where not to go on your site, | |
ut.ac.id: # you save bandwidth and server resources. | |
ut.ac.id: # | |
ut.ac.id: # This file will be ignored unless it is at the root of your host: | |
ut.ac.id: # Used: http://example.com/robots.txt | |
ut.ac.id: # Ignored: http://example.com/site/robots.txt | |
ut.ac.id: # | |
ut.ac.id: # For more information about the robots.txt standard, see: | |
ut.ac.id: # http://www.robotstxt.org/robotstxt.html | |
ut.ac.id: # CSS, JS, Images | |
ut.ac.id: # Directories | |
ut.ac.id: # Files | |
ut.ac.id: # Paths (clean URLs) | |
ut.ac.id: # Paths (no clean URLs) | |
makro.co.za: # For all robots | |
makro.co.za: # Block access to specific groups of pages | |
makro.co.za: # Allow search crawlers to discover the sitemap | |
makro.co.za: # Block CazoodleBot as it does not present correct accept content headers | |
makro.co.za: # Block MJ12bot as it is just noise | |
makro.co.za: # Block dotbot as it cannot parse base urls properly | |
makro.co.za: # Block Gigabot | |
doctorsfile.jp: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
doctorsfile.jp: # | |
doctorsfile.jp: # To ban all spiders from the entire site uncomment the next two lines: | |
doctorsfile.jp: # User-agent: * | |
doctorsfile.jp: # Disallow: / | |
theladders.com: # robots.txt for TheLadders.com | |
theladders.com: # robots.txt,v 3.1 2020/06/19 10:57:00 | |
classified4free.net: # Blocks robots from specific folders / directories | |
classified4free.net: # Crawl-delay: 80 | |
greetingsisland.com: #allowing twitter bot so ecard will show in ticket | |
greetingsisland.com: #disallow 'way back machine' | |
skiddle.com: #Disallow: /infofeed/ | |
skiddle.com: #Disallow hotel pages, apart from locations and nearby | |
skiddle.com: #Disallow: /hotels/*/$ | |
skiddle.com: #Allow: /hotels/*near*/$ | |
skiddle.com: #Allow: /hotels/*.html$ | |
skiddle.com: #Disallow restaurant pages, apart from locations and nearby | |
capterra.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
ic.gc.ca: #User-agent: * | |
ic.gc.ca: #Disallow: /app/scr/cc/CorporationsCanada/fdrlCrpDtls.html | |
pevex.hr: # Trive.digital robots.txt 17.9.2020 | |
pevex.hr: # Image Crawler Setup | |
pevex.hr: # All other bots | |
pevex.hr: # Directories | |
pevex.hr: # Paths (clean URLs) | |
pevex.hr: # Paths (no clean URLs) | |
pevex.hr: # Pevex specific | |
sauto.cz: # multichoice | |
sauto.cz: # 4+ params | |
sauto.cz: # Homepage categories | |
sauto.cz: # multichoice | |
sauto.cz: # 4+ params | |
sauto.cz: # Homepage categories | |
sauto.cz: # multichoice | |
sauto.cz: # 4+ params | |
sauto.cz: # Homepage categories | |
sauto.cz: # multichoice | |
sauto.cz: # 4+ params | |
sauto.cz: # Homepage categories | |
sauto.cz: # multichoice | |
sauto.cz: # 4+ params | |
sauto.cz: # Homepage categories | |
sauto.cz: # multichoice | |
sauto.cz: # 4+ params | |
sauto.cz: # Homepage categories | |
sauto.cz: # multichoice | |
sauto.cz: # 4+ params | |
sauto.cz: # Homepage categories | |
sauto.cz: # multichoice | |
sauto.cz: # 4+ params | |
sauto.cz: # Homepage categories | |
sauto.cz: # multichoice | |
sauto.cz: # 4+ params | |
sauto.cz: # Homepage categories | |
sauto.cz: # multichoice | |
sauto.cz: # 4+ params | |
sauto.cz: # Homepage categories | |
sauto.cz: # multichoice | |
sauto.cz: # 4+ params | |
sauto.cz: # Homepage categories | |
sauto.cz: # Better safe than sorry | |
brookings.edu: # Sitemap archive | |
cbp.gov: # | |
cbp.gov: # robots.txt | |
cbp.gov: # | |
cbp.gov: # This file is to prevent the crawling and indexing of certain parts | |
cbp.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
cbp.gov: # and Google. By telling these "robots" where not to go on your site, | |
cbp.gov: # you save bandwidth and server resources. | |
cbp.gov: # | |
cbp.gov: # This file will be ignored unless it is at the root of your host: | |
cbp.gov: # Used: http://example.com/robots.txt | |
cbp.gov: # Ignored: http://example.com/site/robots.txt | |
cbp.gov: # | |
cbp.gov: # For more information about the robots.txt standard, see: | |
cbp.gov: # http://www.robotstxt.org/robotstxt.html | |
cbp.gov: # CSS, JS, Images | |
cbp.gov: # Directories | |
cbp.gov: # Files | |
cbp.gov: # Paths (clean URLs) | |
cbp.gov: # Paths (no clean URLs) | |
wikidata.org: # | |
wikidata.org: # Please note: There are a lot of pages on this site, and there are | |
wikidata.org: # some misbehaved spiders out there that go _way_ too fast. If you're | |
wikidata.org: # irresponsible, your access to the site may be blocked. | |
wikidata.org: # | |
wikidata.org: # Observed spamming large amounts of https://en.wikipedia.org/?curid=NNNNNN | |
wikidata.org: # and ignoring 429 ratelimit responses, claims to respect robots: | |
wikidata.org: # http://mj12bot.com/ | |
wikidata.org: # advertising-related bots: | |
wikidata.org: # Wikipedia work bots: | |
wikidata.org: # Crawlers that are kind enough to obey, but which we'd rather not have | |
wikidata.org: # unless they're feeding search engines. | |
wikidata.org: # Some bots are known to be trouble, particularly those designed to copy | |
wikidata.org: # entire sites. Please obey robots.txt. | |
wikidata.org: # Misbehaving: requests much too fast: | |
wikidata.org: # | |
wikidata.org: # Sorry, wget in its recursive mode is a frequent problem. | |
wikidata.org: # Please read the man page and use it properly; there is a | |
wikidata.org: # --wait option you can use to set the delay between hits, | |
wikidata.org: # for instance. | |
wikidata.org: # | |
wikidata.org: # | |
wikidata.org: # The 'grub' distributed client has been *very* poorly behaved. | |
wikidata.org: # | |
wikidata.org: # | |
wikidata.org: # Doesn't follow robots.txt anyway, but... | |
wikidata.org: # | |
wikidata.org: # | |
wikidata.org: # Hits many times per second, not acceptable | |
wikidata.org: # http://www.nameprotect.com/botinfo.html | |
wikidata.org: # A capture bot, downloads gazillions of pages with no public benefit | |
wikidata.org: # http://www.webreaper.net/ | |
wikidata.org: # | |
wikidata.org: # Friendly, low-speed bots are welcome viewing article pages, but not | |
wikidata.org: # dynamically-generated pages please. | |
wikidata.org: # | |
wikidata.org: # Inktomi's "Slurp" can read a minimum delay between hits; if your | |
wikidata.org: # bot supports such a thing using the 'Crawl-delay' or another | |
wikidata.org: # instruction, please let us know. | |
wikidata.org: # | |
wikidata.org: # There is a special exception for API mobileview to allow dynamic | |
wikidata.org: # mobile web & app views to load section content. | |
wikidata.org: # These views aren't HTTP-cached but use parser cache aggressively | |
wikidata.org: # and don't expose special: pages etc. | |
wikidata.org: # | |
wikidata.org: # Another exception is for REST API documentation, located at | |
wikidata.org: # /api/rest_v1/?doc. | |
wikidata.org: # | |
wikidata.org: # | |
wikidata.org: # ar: | |
wikidata.org: # | |
wikidata.org: # dewiki: | |
wikidata.org: # T6937 | |
wikidata.org: # sensible deletion and meta user discussion pages: | |
wikidata.org: # 4937#5 | |
wikidata.org: # T14111 | |
wikidata.org: # T15961 | |
wikidata.org: # | |
wikidata.org: # enwiki: | |
wikidata.org: # Folks get annoyed when VfD discussions end up the number 1 google hit for | |
wikidata.org: # their name. See T6776 | |
wikidata.org: # T15398 | |
wikidata.org: # T16075 | |
wikidata.org: # T13261 | |
wikidata.org: # T12288 | |
wikidata.org: # T16793 | |
wikidata.org: # | |
wikidata.org: # eswiki: | |
wikidata.org: # T8746 | |
wikidata.org: # | |
wikidata.org: # fiwiki: | |
wikidata.org: # T10695 | |
wikidata.org: # | |
wikidata.org: # hewiki: | |
wikidata.org: #T11517 | |
wikidata.org: # | |
wikidata.org: # huwiki: | |
wikidata.org: # | |
wikidata.org: # itwiki: | |
wikidata.org: # T7545 | |
wikidata.org: # | |
wikidata.org: # jawiki | |
wikidata.org: # T7239 | |
wikidata.org: # nowiki | |
wikidata.org: # T13432 | |
wikidata.org: # | |
wikidata.org: # plwiki | |
wikidata.org: # T10067 | |
wikidata.org: # | |
wikidata.org: # ptwiki: | |
wikidata.org: # T7394 | |
wikidata.org: # | |
wikidata.org: # rowiki: | |
wikidata.org: # T14546 | |
wikidata.org: # | |
wikidata.org: # ruwiki: | |
wikidata.org: # | |
wikidata.org: # svwiki: | |
wikidata.org: # T12229 | |
wikidata.org: # T13291 | |
wikidata.org: # | |
wikidata.org: # zhwiki: | |
wikidata.org: # T7104 | |
wikidata.org: # | |
wikidata.org: # sister projects | |
wikidata.org: # | |
wikidata.org: # enwikinews: | |
wikidata.org: # T7340 | |
wikidata.org: # | |
wikidata.org: # itwikinews | |
wikidata.org: # T11138 | |
wikidata.org: # | |
wikidata.org: # enwikiquote: | |
wikidata.org: # T17095 | |
wikidata.org: # | |
wikidata.org: # enwikibooks | |
wikidata.org: # | |
wikidata.org: # working... | |
wikidata.org: # | |
wikidata.org: # | |
wikidata.org: # | |
wikidata.org: #----------------------------------------------------------# | |
wikidata.org: # | |
wikidata.org: # | |
wikidata.org: # | |
wikidata.org: # Lines here will be added to the global robots.txt or at least to http://www.wikidata.org/robots.txt | |
wikidata.org: #</syntaxhighlight> | |
claroshop.com: #Google Search Engine Robot | |
claroshop.com: #Yahoo! Search Engine Robot | |
claroshop.com: #Yandex Search Engine Robot | |
claroshop.com: #Microsoft Search Engine Robot | |
claroshop.com: #Twitter Search Engine Robot | |
claroshop.com: # Every bot that might possibly read and respect this file. | |
claroshop.com: # the protocol of the sitemap. | |
money.it: # robots.txt | |
money.it: # @url: https://www.money.it | |
money.it: # @generator: SPIP 3.1.8 [23955] | |
money.it: # @template: money2017/robots.txt.html | |
motorimagazine.it: #wp stuff | |
motorimagazine.it: #files | |
sayidaty.net: # | |
sayidaty.net: # robots.txt | |
sayidaty.net: # | |
sayidaty.net: # This file is to prevent the crawling and indexing of certain parts | |
sayidaty.net: # of your site by web crawlers and spiders run by sites like Yahoo! | |
sayidaty.net: # and Google. By telling these "robots" where not to go on your site, | |
sayidaty.net: # you save bandwidth and server resources. | |
sayidaty.net: # | |
sayidaty.net: # This file will be ignored unless it is at the root of your host: | |
sayidaty.net: # Used: http://example.com/robots.txt | |
sayidaty.net: # Ignored: http://example.com/site/robots.txt | |
sayidaty.net: # | |
sayidaty.net: # For more information about the robots.txt standard, see: | |
sayidaty.net: # http://www.robotstxt.org/robotstxt.html | |
sayidaty.net: # CSS, JS, Images | |
sayidaty.net: # Directories | |
sayidaty.net: # Files | |
sayidaty.net: # Paths (clean URLs) | |
sayidaty.net: # Paths (no clean URLs) | |
rayamarketing.com: # www.robotstxt.org/ | |
rayamarketing.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
rayamarketing.com: #Sitemap: https://rayamarketing.com/sitemap.xml | |
travelboutiqueonline.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
travelboutiqueonline.com: #content{margin:0 0 0 2%;position:relative;} | |
wdsz.org: # | |
wdsz.org: # robots.txt for PHPWind | |
wdsz.org: # Version 8.7 | |
wdsz.org: # | |
degiro.nl: # For sitemaps.xml autodiscovery. Uncomment if you have one: | |
containerstore.com: #best-selling-solutions .grid-parent, #design-tools .grid-parent, #components .grid-parent, #limited-time-savings .grid-parent, #trending-now .grid-parent, #tips-inspiration .grid-parent { | |
containerstore.com: #best-selling-solutions, #design-tools, #trending-now { | |
containerstore.com: #tips-inspiration { | |
containerstore.com: #limited-time-savings .bem-padding-bottom-40, #trending-now .bem-padding-bottom-40 { | |
containerstore.com: #best-selling-solutions .ht-closing_cta, #components .ht-closing_cta { | |
containerstore.com: #best-selling-solutions .ht-closing_cta_link, #components .ht-closing_cta_link{ | |
containerstore.com: #design-tools .ht-tile_overlay { | |
containerstore.com: #design-tools .ht-tile_label { | |
containerstore.com: #elfa_sale_header { | |
containerstore.com: #design-tools .ht-tile_label { | |
containerstore.com: #elfa_sale_header { | |
tiffany.com: # These hides directories from search engines | |
finder.com: # Prevent crawling searches. | |
finder.com: # https://finder.atlassian.net/browse/CWS-452 | |
finder.com: # Prevent crawling additional searches. | |
finder.com: # https://finder.atlassian.net/browse/CWS-497 | |
finder.com: # Allow Twitterbot to crawl anything | |
finder.com: # https://finder.atlassian.net/browse/OPS-915 | |
finder.com: # Block pages from appearing in Google News | |
myupchar.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
myupchar.com: # | |
myupchar.com: # To ban all spiders from the entire site uncomment the next two lines: | |
myupchar.com: #Baiduspider | |
myupchar.com: #Yandex | |
eonlineads.com: # Blocks all robots except google and disallow specific folders / directories | |
buyandship.co.jp: # Date: Wed, 24 Feb 2021 11:21:01 GMT | |
broadcom.com: # robots.txt for Broadcom.com 11/4/2020 | |
arabicpost.net: #WP Import Export Rule | |
fr.de: # robots.txt www.fr.de | |
actcorp.in: # CSS, JS, Images | |
actcorp.in: # Directories | |
actcorp.in: # Files | |
actcorp.in: # Paths (clean URLs) | |
actcorp.in: # Paths (no clean URLs) | |
computerhoy.com: # | |
computerhoy.com: # robots.txt | |
computerhoy.com: # | |
computerhoy.com: # This file is to prevent the crawling and indexing of certain parts | |
computerhoy.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
computerhoy.com: # and Google. By telling these "robots" where not to go on your site, | |
computerhoy.com: # you save bandwidth and server resources. | |
computerhoy.com: # | |
computerhoy.com: # This file will be ignored unless it is at the root of your host: | |
computerhoy.com: # Used: http://example.com/robots.txt | |
computerhoy.com: # Ignored: http://example.com/site/robots.txt | |
computerhoy.com: # | |
computerhoy.com: # For more information about the robots.txt standard, see: | |
computerhoy.com: # http://www.robotstxt.org/robotstxt.html | |
computerhoy.com: # Files | |
computerhoy.com: # Paths (clean URLs) | |
computerhoy.com: # Paths (no clean URLs) | |
computerhoy.com: # Paths (url errors) | |
computerhoy.com: # Sitemaps | |
woocommerce.com: # Sitemap archive | |
mtggoldfish.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
mtggoldfish.com: # | |
mtggoldfish.com: # To ban all spiders from the entire site uncomment the next two lines: | |
advanceautoparts.com: #tagline { | |
meraki.com: # www.robotstxt.org/ | |
meraki.com: # support.google.com/webmasters/answer/6062608 | |
pushauction.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
pushauction.com: #content{margin:0 0 0 2%;position:relative;} | |
glassdoor.sg: # Singapore | |
glassdoor.sg: # Greetings, human beings!, | |
glassdoor.sg: # | |
glassdoor.sg: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself. | |
glassdoor.sg: # | |
glassdoor.sg: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet, and help improve the way people everywhere find jobs? | |
glassdoor.sg: # | |
glassdoor.sg: # Run - don't crawl - to apply to join Glassdoor's SEO team here http://jobs.glassdoor.com | |
glassdoor.sg: # | |
glassdoor.sg: # | |
glassdoor.sg: #logging related | |
glassdoor.sg: # Blocking track urls (ACQ-2468) | |
glassdoor.sg: #Blocking non standard job view and job search URLs, and paginated job SERP URLs (TRFC-2831) | |
glassdoor.sg: # Blocking bots from crawling DoubleClick for Publisher and Google Analytics related URL's (which aren't real URL's) | |
glassdoor.sg: # TRFC-4037 Block page from being indexed | |
glassdoor.sg: # | |
glassdoor.sg: # Note that this file has the extension '.text' rather than the more-standard '.txt' | |
glassdoor.sg: # to keep it from being pre-compiled as a servlet. (*.txt files are precompiled, and | |
glassdoor.sg: # there doesn't seem to be a way to turn this off.) | |
glassdoor.sg: # | |
iamrohit.in: # Block NextGenSearchBot | |
iamrohit.in: # Block ia-archiver from crawling site | |
iamrohit.in: # Block archive.org_bot from crawling site | |
iamrohit.in: # Block Archive.org Bot from crawling site | |
iamrohit.in: # Block LinkWalker from crawling site | |
iamrohit.in: # Block GigaBlast Spider from crawling site | |
iamrohit.in: # Block ia_archiver-web.archive.org_bot from crawling site | |
iamrohit.in: # Block PicScout Crawler from crawling site | |
iamrohit.in: # Block BLEXBot Crawler from crawling site | |
iamrohit.in: # Block TinEye from crawling site | |
iamrohit.in: # Block SEOkicks | |
iamrohit.in: # Block BlexBot | |
iamrohit.in: # Block SISTRIX | |
iamrohit.in: # Block Uptime robot | |
iamrohit.in: # Block Ezooms Robot | |
iamrohit.in: # Block netEstate NE Crawler (+http://www.website-datenbank.de/) | |
iamrohit.in: # Block WiseGuys Robot | |
iamrohit.in: # Block Turnitin Robot | |
iamrohit.in: # Block Heritrix | |
iamrohit.in: # Block pricepi | |
iamrohit.in: # Block Eniro | |
iamrohit.in: # Block Psbot | |
iamrohit.in: # Block Youdao | |
iamrohit.in: # BLEXBot | |
iamrohit.in: # Block NaverBot | |
iamrohit.in: # Block ZBot | |
iamrohit.in: # Block Vagabondo | |
iamrohit.in: # Block LinkWalker | |
iamrohit.in: # Block SimplePie | |
iamrohit.in: # Block Wget | |
iamrohit.in: # Block Pixray-Seeker | |
iamrohit.in: # Block BoardReader | |
iamrohit.in: # Block Quantify | |
iamrohit.in: # Block Plukkie | |
iamrohit.in: # Block Cuam | |
iamrohit.in: # https://megaindex.com/crawler | |
shameless.com: #Disallow: /?* | |
shameless.com: #Disallow: /videos/*/?* | |
shameless.com: #Disallow: /models/?* | |
shameless.com: #Disallow: /models/*/?* | |
shameless.com: #Disallow: /models/*/*/?* | |
shameless.com: #Disallow: /categories/?* | |
shameless.com: #Disallow: /categories/*/?* | |
shameless.com: #Disallow: /categories/*/*/?* | |
shameless.com: #Disallow: /tags/?* | |
shameless.com: #Disallow: /tags/*/?* | |
shameless.com: #Disallow: /tags/*/*/?* | |
hardware.fr: # robots.txt file for the HardWare.fr site | |
usf.edu: ############################### | |
usf.edu: # robots.txt for USF.edu | |
usf.edu: ############################### | |
usf.edu: # list folders robots are not allowed to index | |
usf.edu: # | |
usf.edu: # list specific files robots are not allowed to index | |
usf.edu: # | |
usf.edu: # | |
usf.edu: # | |
usf.edu: # | |
usf.edu: # allow twitter to fetch images for news | |
usf.edu: # | |
usf.edu: # | |
usf.edu: # End of robots.txt file | |
usf.edu: # | |
usf.edu: ############################### | |
epochconverter.com: # Robots.txt for EpochConverter.com | |
olx.bg: # sitecode:olxbg-desktop | |
delltechnologies.com: # directory exclusion used for dellemc.com - AEM file | |
delltechnologies.com: # NOTE all paths must begin with "/*/" in order to apply to all AEM locales automatically | |
delltechnologies.com: # paths below this line are old and probably invalid for dellemc.com, leaving them for reference only | |
univ-lyon1.fr: #bandeau_flash{background:#ffffff;} | |
univ-lyon1.fr: #bandeau_flash_contenu{color:#000000;} | |
wsb.pl: # | |
wsb.pl: # robots.txt | |
wsb.pl: # | |
wsb.pl: # This file is to prevent the crawling and indexing of certain parts | |
wsb.pl: # of your site by web crawlers and spiders run by sites like Yahoo! | |
wsb.pl: # and Google. By telling these "robots" where not to go on your site, | |
wsb.pl: # you save bandwidth and server resources. | |
wsb.pl: # | |
wsb.pl: # This file will be ignored unless it is at the root of your host: | |
wsb.pl: # Used: http://example.com/robots.txt | |
wsb.pl: # Ignored: http://example.com/site/robots.txt | |
wsb.pl: # | |
wsb.pl: # For more information about the robots.txt standard, see: | |
wsb.pl: # http://www.robotstxt.org/robotstxt.html | |
wsb.pl: # CSS, JS, Images | |
wsb.pl: # Directories | |
wsb.pl: # Files | |
wsb.pl: # Paths (clean URLs) | |
wsb.pl: # Paths (no clean URLs) | |
edunet.bh: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
edunet.bh: #content{margin:0 0 0 2%;position:relative;} | |
yourator.co: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
yourator.co: # | |
yourator.co: # To ban all spiders from the entire site uncomment the next two lines: | |
yourator.co: # User-agent: * | |
yourator.co: # Disallow: / | |
gu-global.com: #footer { | |
gu-global.com: #footer ul li { | |
gu-global.com: #footer ul.menu li { | |
gu-global.com: #gu-footer { | |
gu-global.com: #gu-footer .uq-footer-innner { | |
gu-global.com: #gu-footer ul.menu { | |
gu-global.com: #dynamic-footer.sp { | |
gu-global.com: #dynamic-footer .footer_tax_bnr { | |
gu-global.com: #dynamic-footer .footer_tax_bnr div { | |
gu-global.com: #dynamic-footer .footer_tax_bnr div > span { | |
gu-global.com: #dynamic-footer .footer_tax_bnr div > span span { | |
gu-global.com: #dynamic-footer .footer-in { | |
gu-global.com: #dynamic-footer .footer-in .footer-menu ul { | |
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li { | |
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li a, | |
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li span { | |
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li svg { | |
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li .world-gu { | |
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li .world-gu:after { | |
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li .active:after { | |
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li ul { | |
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li ul li { | |
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li ul li a { | |
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li ul li:last-child { | |
gu-global.com: #dynamic-footer .footer-in .sns-area { | |
gu-global.com: #dynamic-footer .footer-in .sns-area .sns-title { | |
gu-global.com: #dynamic-footer .footer-in .sns-area ul { | |
gu-global.com: #dynamic-footer .footer-in .sns-area ul li { | |
gu-global.com: #dynamic-footer .footer-in .copyright { | |
gu-global.com: #dynamic-footer .footer-in .footer_tax_bnr img { | |
gu-global.com: #dynamic-footer.pc { | |
gu-global.com: #dynamic-footer footer { | |
gu-global.com: #dynamic-footer footer .footer-in { | |
gu-global.com: #dynamic-footer footer .footer-in nav { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-top { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-top li { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-top li::after { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-top li:last-child::after { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-top li a { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-bottom { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-bottom dt { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-bottom dd { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-bottom dd ul li::after { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-bottom dd ul li:last-child::after { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-bottom dd ul li a { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright .footer-copyright-in .footer-logo-area .footer-logo { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright .footer-copyright-in .footer-logo-area .footer-logo a { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright .footer-copyright-in .footer-logo-area .footer-logo a:first-child { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright .footer-copyright-in .footer-logo-area .footer-copyright { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright .footer-copyright-in .footer-logo-area .footer-copyright a { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright .footer-copyright-in .footer-logo-area .footer-copyright svg { | |
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright .footer-copyright-in .footer-sns ul li { | |
gu-global.com: #dynamic-footer .footer_tax_bnr { | |
gu-global.com: #dynamic-footer .footer_tax_bnr div { | |
gu-global.com: #pagetop { | |
appllio.com: # Directories | |
appllio.com: # Files | |
appllio.com: # Paths (clean URLs) | |
appllio.com: # Paths (no clean URLs) | |
sante.fr: # | |
sante.fr: # robots.txt | |
sante.fr: # | |
sante.fr: # This file is to prevent the crawling and indexing of certain parts | |
sante.fr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
sante.fr: # and Google. By telling these "robots" where not to go on your site, | |
sante.fr: # you save bandwidth and server resources. | |
sante.fr: # | |
sante.fr: # This file will be ignored unless it is at the root of your host: | |
sante.fr: # Used: http://example.com/robots.txt | |
sante.fr: # Ignored: http://example.com/site/robots.txt | |
sante.fr: # | |
sante.fr: # For more information about the robots.txt standard, see: | |
sante.fr: # http://www.robotstxt.org/robotstxt.html | |
sante.fr: # CSS, JS, Images | |
sante.fr: # Directories | |
sante.fr: # Files | |
sante.fr: # Paths (clean URLs) | |
sante.fr: # Paths (no clean URLs) | |
aut.ac.ir: # robots.txt for https://aut.ac.ir | |
anu.edu.au: # | |
anu.edu.au: # robots.txt | |
anu.edu.au: # | |
anu.edu.au: # This file is to prevent the crawling and indexing of certain parts | |
anu.edu.au: # of your site by web crawlers and spiders run by sites like Yahoo! | |
anu.edu.au: # and Google. By telling these "robots" where not to go on your site, | |
anu.edu.au: # you save bandwidth and server resources. | |
anu.edu.au: # | |
anu.edu.au: # This file will be ignored unless it is at the root of your host: | |
anu.edu.au: # Used: http://example.com/robots.txt | |
anu.edu.au: # Ignored: http://example.com/site/robots.txt | |
anu.edu.au: # | |
anu.edu.au: # For more information about the robots.txt standard, see: | |
anu.edu.au: # http://www.robotstxt.org/robotstxt.html | |
anu.edu.au: # | |
anu.edu.au: # For syntax checking, see: | |
anu.edu.au: # http://www.frobee.com/robots-txt-check | |
anu.edu.au: # Directories | |
anu.edu.au: # Files | |
anu.edu.au: # Paths (clean URLs) | |
anu.edu.au: # Paths (no clean URLs) | |
anu.edu.au: # Legacy gateway paths | |
anu.edu.au: # Change of Preference | |
anu.edu.au: # Spam prevention | |
feeder.co: # Wieeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee üí© | |
editorx.com: # by editorx.com | |
softonic.ru: # ES | |
softonic.ru: # BR | |
softonic.ru: # DE | |
softonic.ru: # NL | |
softonic.ru: # EN,JP | |
softonic.ru: # FR | |
softonic.ru: # IT | |
softonic.ru: # PL | |
softonic.ru: #SHARED | |
softonic.ru: # CATEGORIES | |
softonic.ru: # EN | |
softonic.ru: # ES | |
softonic.ru: # DE | |
softonic.ru: # FR | |
softonic.ru: # BR | |
softonic.ru: # IT | |
softonic.ru: # PL | |
softonic.ru: # NL | |
softonic.ru: # JP | |
cookinglight.com: # Sitemaps | |
cookinglight.com: # CMS FE | |
cookinglight.com: #Content | |
cookinglight.com: # CMS FE | |
cookinglight.com: #Content | |
thevc.kr: # Notice: The use of robots or other automated means to access The VC without | |
thevc.kr: # the express permission of The VC is strictly prohibited. | |
thevc.kr: # The VC may, in its discretion, permit certain automated access to certain The VC pages, | |
thevc.kr: # for the limited purpose of including content in approved publicly available search engines. | |
costco.com.tw: # For all robots | |
costco.com.tw: # Block access to specific groups of pages | |
costco.com.tw: # Allow search crawlers to discover the sitemap | |
costco.com.tw: # Block CazoodleBot as it does not present correct accept content headers | |
costco.com.tw: # Block MJ12bot as it is just noise | |
costco.com.tw: # Block dotbot as it cannot parse base urls properly | |
costco.com.tw: # Block Gigabot | |
keilhub.com: # we use Shopify as our ecommerce platform | |
keilhub.com: # Google adsbot ignores robots.txt unless specifically named! | |
zbozi.cz: ## 8888888888P 888 d8b | |
zbozi.cz: ## d88P 888 Y8P | |
zbozi.cz: ## d88P 888 | |
zbozi.cz: ## d88P 88888b. .d88b. 88888888 888 .d8888b 88888888 | |
zbozi.cz: ## d88P 888 "88b d88""88b d88P 888 d88P" d88P | |
zbozi.cz: ## d88P 888 888 888 888 d88P 888 888 d88P | |
zbozi.cz: ## d88P 888 d88P Y88..88P d88P 888 d8b Y88b. d88P | |
zbozi.cz: ## d8888888888 88888P" "Y88P" 88888888 888 Y8P "Y8888P 88888888 | |
zbozi.cz: ## ############################################################### | |
zbozi.cz: ## Disallow clicks to shops | |
zbozi.cz: ## Disallow result pages - search results | |
zbozi.cz: ## multichoice in left menu | |
zbozi.cz: ## other | |
zbozi.cz: ## Disallow pages with 2 and more parameters - seznambot | |
zbozi.cz: ## because of the rule Disallow: /*?*&*&*&* | |
zbozi.cz: ## Disallow pages with specific parameters | |
zbozi.cz: ## Disallow pages that may directly set/postpone rating (intended for e-mail clickthrus) | |
zbozi.cz: # Disallow all searchScreen | |
zbozi.cz: ## nonsensical ranges | |
zbozi.cz: ## location + parameters | |
zbozi.cz: ## junk in the URL | |
zbozi.cz: ## Disallow clicks to shops. | |
zbozi.cz: ## other | |
zbozi.cz: ## old pages in Google index | |
zbozi.cz: ## Disallow pages with 4 and more parameters | |
zbozi.cz: ## Disallow pages with specific parameters | |
zbozi.cz: ## product fotogaleries | |
zbozi.cz: ## multichoice in left menu | |
zbozi.cz: ## search in categories | |
zbozi.cz: ## do not index sorting in categories (except first page) | |
zbozi.cz: ## because of the rule Disallow: /*?*&*&*&* | |
zbozi.cz: ## Disallow pages that may directly set/postpone rating (intended for e-mail clickthrus) | |
zbozi.cz: ## nonsensical ranges | |
zbozi.cz: ## location + parameters | |
zbozi.cz: ## junk in the URL | |
zbozi.cz: # /?0=%5Bobject%20Object%5D&vyrobce=tommy-hilfiger | |
zbozi.cz: # 2019-08-03 | |
zbozi.cz: # /?LOCK%20HLE7203S=true&barva=cerna&vyrobce=desigual | |
zbozi.cz: # 2020-08-06 - fun with sitemaps | |
zbozi.cz: # 2020-12-02 - cross-linking bug | |
zbozi.cz: # /?amp;barva=modra&barva=bezova&vyrobce=guess | |
zbozi.cz: # /?amp%3Bstrana=10&barva=zluta&vyrobce=grosso | |
zbozi.cz: # invalid parameters in /vyrobek/ | |
zbozi.cz: ## Disallow clicks to shops. | |
zbozi.cz: ## other | |
zbozi.cz: ## old pages in Google index | |
zbozi.cz: ## Disallow pages with 4 and more parameters | |
zbozi.cz: ## Disallow pages with specific parameters | |
zbozi.cz: ## product fotogaleries | |
zbozi.cz: ## multichoice in left menu | |
zbozi.cz: ## search in categories | |
zbozi.cz: ## do not index sorting in categories (except first page) | |
zbozi.cz: ## because of the rule Disallow: /*?*&*&*&* | |
zbozi.cz: ## Disallow pages that may directly set/postpone rating (intended for e-mail clickthrus) | |
zbozi.cz: ## nonsensical ranges | |
zbozi.cz: ## location + parameters | |
zbozi.cz: ## junk in the URL | |
zbozi.cz: # /?0=%5Bobject%20Object%5D&vyrobce=tommy-hilfiger | |
zbozi.cz: # 2019-08-03 | |
zbozi.cz: # /?LOCK%20HLE7203S=true&barva=cerna&vyrobce=desigual | |
zbozi.cz: # 2020-08-06 - fun with sitemaps | |
zbozi.cz: # 2020-12-02 - cross-linking bug | |
zbozi.cz: # /?amp;barva=modra&barva=bezova&vyrobce=guess | |
zbozi.cz: # /?amp%3Bstrana=10&barva=zluta&vyrobce=grosso | |
zbozi.cz: # invalid parameters in /vyrobek/ | |
xjzsks.com: # | |
xjzsks.com: # robots.txt for baidu | |
xjzsks.com: # | |
blackbaud.com: # Allow MOZ to crawl everything | |
blackbaud.com: # Update the path to the file(s) and remove this comment when the site goes live | |
blackbaud.com: # Allow Siteimprove to access the site while in development | |
blackbaud.com: # Allow Siteimprove to access the site while in development | |
blackbaud.com: # Allow Siteimprove to access the site while in development | |
bankofindia.co.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
bankofindia.co.in: #content{margin:0 0 0 2%;position:relative;} | |
slate.fr: # CSS, JS, Images | |
slate.fr: # Directories | |
slate.fr: # Files | |
slate.fr: # Paths (clean URLs) | |
slate.fr: # Paths (no clean URLs) | |
pakbcn.live: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
pakbcn.live: #content{margin:0 0 0 2%;position:relative;} | |
sears.com.mx: #Google Search Engine Robot | |
sears.com.mx: #Yahoo! Search Engine Robot | |
sears.com.mx: #Yandex Search Engine Robot | |
sears.com.mx: #Microsoft Search Engine Robot | |
sears.com.mx: #Twitter Search Engine Robot | |
sears.com.mx: # Every bot that might possibly read and respect this file. | |
sears.com.mx: # Wait 1 second between successive requests. | |
chattanoogastate.edu: # | |
chattanoogastate.edu: # robots.txt | |
chattanoogastate.edu: # | |
chattanoogastate.edu: # This file is to prevent the crawling and indexing of certain parts | |
chattanoogastate.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
chattanoogastate.edu: # and Google. By telling these "robots" where not to go on your site, | |
chattanoogastate.edu: # you save bandwidth and server resources. | |
chattanoogastate.edu: # | |
chattanoogastate.edu: # This file will be ignored unless it is at the root of your host: | |
chattanoogastate.edu: # Used: http://example.com/robots.txt | |
chattanoogastate.edu: # Ignored: http://example.com/site/robots.txt | |
chattanoogastate.edu: # | |
chattanoogastate.edu: # For more information about the robots.txt standard, see: | |
chattanoogastate.edu: # http://www.robotstxt.org/robotstxt.html | |
chattanoogastate.edu: # CSS, JS, Images | |
chattanoogastate.edu: # Directories | |
chattanoogastate.edu: # Files | |
chattanoogastate.edu: # Paths (clean URLs) | |
chattanoogastate.edu: # Paths (no clean URLs) | |
userbenchmark.com: # UserBenchmark Robot.txt | |
userbenchmark.com: ####################### | |
aast.edu: # robots.txt | |
aast.edu: # | |
aast.edu: # This file is to prevent the crawling and indexing of certain parts | |
aast.edu: # of your site by web crawlers and spiders run by sites like Yahoo! | |
aast.edu: # and Google. By telling these "robots" where not to go on your site. | |
aast.edu: # Allow: / | |
aast.edu: # | |
videobin.co: #User-agent: * | |
videobin.co: #Disallow: / | |
joinindianarmy.nic.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
joinindianarmy.nic.in: #content{margin:0 0 0 2%;position:relative;} | |
express.com: #Disallow: /twitter-share-submit.jsp | |
nrttv.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
nrttv.com: #content{margin:0 0 0 2%;position:relative;} | |
jimmyjazz.com: # we use Shopify as our ecommerce platform | |
jimmyjazz.com: # Google adsbot ignores robots.txt unless specifically named! | |
reforge.com: # Squarespace Robots Txt | |
dailyguides.com: ## Default robots.txt | |
rakuten.ca: # /robots.txt file for https://www.rakuten.ca/ | |
9to5google.com: # Sitemap archive | |
missoma.com: # we use Shopify as our ecommerce platform | |
missoma.com: # Google adsbot ignores robots.txt unless specifically named! | |
autofarm.network: # https://www.robotstxt.org/robotstxt.html | |
geekwire.com: ### Version Information # | |
geekwire.com: ################################################### | |
geekwire.com: ### Version: V4.2020.04.2051 | |
geekwire.com: ### Updated: Wed Apr 22 22:38:14 SAST 2020 | |
geekwire.com: ### Bad Bot Count: 571 | |
geekwire.com: ################################################### | |
geekwire.com: ### Version Information ## | |
moat.com: # http://i.imgur.com/aj9eiII.jpg | |
bigcommerce.com: # robots.txt for https://www.bigcommerce.com/ | |
bigcommerce.com: # Directories | |
bigcommerce.com: # Files | |
bigcommerce.com: # Paths (clean URLs) | |
bigcommerce.com: # Paths (Don't index any unclean paths) | |
xn--b1aew.xn--p1ai: # robots.txt for xn--b1aew.xn--p1ai | |
casetify.com: # robotstxt.org/ | |
casetify.com: # Disallow: /*invite/ | |
casetify.com: # Disallow: /*showcase/ | |
casetify.com: # Disallow: /*controllers* | |
casetify.com: # Disallow: /*layout_template* | |
casetify.com: # Disallow: /*redirect* | |
casetify.com: # Allow Google Ad Spiders | |
wevrlabs.net: #Begin Attracta SEO Tools Sitemap. Do not remove | |
wevrlabs.net: #End Attracta SEO Tools Sitemap. Do not remove | |
mediaalpha.com: # Default Flywheel robots file | |
reddress.com: # we use Shopify as our ecommerce platform | |
reddress.com: # Google adsbot ignores robots.txt unless specifically named! | |
su.edu.sa: # | |
su.edu.sa: # robots.txt | |
su.edu.sa: # | |
su.edu.sa: # This file is to prevent the crawling and indexing of certain parts | |
su.edu.sa: # of your site by web crawlers and spiders run by sites like Yahoo! | |
su.edu.sa: # and Google. By telling these "robots" where not to go on your site, | |
su.edu.sa: # you save bandwidth and server resources. | |
su.edu.sa: # | |
su.edu.sa: # This file will be ignored unless it is at the root of your host: | |
su.edu.sa: # Used: http://example.com/robots.txt | |
su.edu.sa: # Ignored: http://example.com/site/robots.txt | |
su.edu.sa: # | |
su.edu.sa: # For more information about the robots.txt standard, see: | |
su.edu.sa: # http://www.robotstxt.org/robotstxt.html | |
su.edu.sa: # CSS, JS, Images | |
su.edu.sa: # Directories | |
su.edu.sa: # Files | |
su.edu.sa: # Paths (clean URLs) | |
su.edu.sa: # Paths (no clean URLs) | |
allbirds.com: # we use Shopify as our ecommerce platform | |
allbirds.com: # Google adsbot ignores robots.txt unless specifically named! | |
kmart.com: # 20190428 | |
kmart.com: # www.kmart.com | |
kmart.com: #Lumen #18359173 | |
kmart.com: # Category | |
kmart.com: # Product | |
kmart.com: #Sitemap: https://www.kmart.com/Sitemap_Index_Product_MP_1.xml | |
kmart.com: # Misc | |
kmart.com: #Images | |
kmart.com: #Sitemap: https://www.kmart.com/Sitemap_Index_Image_1.xml | |
kmart.com: #Sitemap: https://www.kmart.com/Sitemap_Index_Image_MP_1.xml | |
touchofmodern.com: # www.robotstxt.org/ | |
touchofmodern.com: # http://code.google.com/web/controlcrawlindex/ | |
netgalley.com: # www.robotstxt.org/ | |
netgalley.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
anibis.ch: #prevent crawling of all locations (catchall) | |
anibis.ch: #include localized listing overview pages | |
anibis.ch: #allow crawling of specific locations (exceptions) | |
thedrum.com: # www.robotstxt.org/ | |
thedrum.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | |
thedrum.com: # Legacy disallow statements | |
thedrum.com: # Directories | |
thedrum.com: # Files | |
thedrum.com: # Paths (clean URLs) | |
thedrum.com: # Paths (no clean URLs) | |
consorsbank.de: # Consorsbank robots.txt | |
consorsbank.de: # Sitemap: https://www.cortalconsors.de/content/dam/cortalconsors_de_cc/system/sitemap/sitemap.xml | |
superkopilka.com: #Sitemap: https://www.superkopilka.com/sitemap.xml.gz | |
oilprice.com: # robots.txt | |
oilprice.com: #User-agent: Mediapartners-Google | |
oilprice.com: #Disallow: | |
oilprice.com: # | |
oilprice.com: # Original rules for old site | |
oilprice.com: #mobile usability problems | |
oilprice.com: #Urgent SEO issues - please confirm receipt 15/02/2017 | |
qiniu.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
qiniu.com: # | |
qiniu.com: # To ban all spiders from the entire site uncomment the next two lines: | |
qiniu.com: # User-agent: * | |
qiniu.com: # Disallow: / | |
olacity.com: # vestacp autogenerated robots.txt | |
thriveagency.com: # Default Flywheel robots file | |
credit-suisse.com: # /robots.txt file for www.credit-suisse.com | |
credit-suisse.com: # Sitemap file | |
cntraveler.com: #disallow /user/ as there are incoming links going to pages within the /user/ directory that can't be accessed. | |
vistazo.com: # | |
vistazo.com: # robots.txt | |
vistazo.com: # | |
vistazo.com: # This file is to prevent the crawling and indexing of certain parts | |
vistazo.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
vistazo.com: # and Google. By telling these "robots" where not to go on your site, | |
vistazo.com: # you save bandwidth and server resources. | |
vistazo.com: # | |
vistazo.com: # This file will be ignored unless it is at the root of your host: | |
vistazo.com: # Used: http://example.com/robots.txt | |
vistazo.com: # Ignored: http://example.com/site/robots.txt | |
vistazo.com: # | |
vistazo.com: # For more information about the robots.txt standard, see: | |
vistazo.com: # http://www.robotstxt.org/robotstxt.html | |
vistazo.com: # Directories | |
vistazo.com: # Files | |
vistazo.com: # Paths (clean URLs) | |
vistazo.com: # Paths (no clean URLs) | |
twenty20.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
twenty20.com: # | |
twenty20.com: # To ban all spiders from the entire site uncomment the next two lines: | |
twenty20.com: # User-Agent: * | |
twenty20.com: # Disallow: / | |
twenty20.com: # 2012-12-05: Prevent crawlers from triggering checkout flow errors. | |
yeepay.com: #Sitemap files | |
bikroy.com: # Sitemap | |
bikroy.com: # Excludes | |
bikroy.com: # Blog | |
bikroy.com: # Promotions | |
bikroy.com: # msn | |
fashionnetwork.com: # To be purged into Google's cache | |
fashionnetwork.com: # Ajax URIs | |
fashionnetwork.com: # Fragments | |
fashionnetwork.com: # Yandex: block all news URLs | |
firstdata.com: # For domain: https://www.firstdata.com | |
firstdata.com: # Last updated: 06/11/2020 | |
uvic.ca: # Production only | |
uvic.ca: # Do not put child site rules in here. | |
uvic.ca: # Since this file is publicly readable, do not put in URLs that contain sensitive information and are not locked down with proper access controls. | |
lipscosme.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
lipscosme.com: # | |
lipscosme.com: # To ban all spiders from the entire site uncomment the next two lines: | |
nissan.co.jp: #container {width: 1200px; margin: 0 auto;} | |
nissan.co.jp: #head {margin-bottom: 80px;} | |
nissan.co.jp: #rogo {margin-left: 6.5px;} | |
nissan.co.jp: #foot {width: 1200px; height: 30px; border-top: 1px solid #999999; margin:120px auto 0; text-align: right; font-size: 10px; padding: 5px;} | |
comcast.net: # Comcast | |
comcast.net: # robots.txt for http://www.comcast.net | |
comcast.net: # Modified on 1/25/2017 | |
comcast.net: # Hosted on the Edge | |
thenewsminute.com: # | |
thenewsminute.com: # robots.txt | |
thenewsminute.com: # | |
thenewsminute.com: # This file is to prevent the crawling and indexing of certain parts | |
thenewsminute.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
thenewsminute.com: # and Google. By telling these "robots" where not to go on your site, | |
thenewsminute.com: # you save bandwidth and server resources. | |
thenewsminute.com: # | |
thenewsminute.com: # This file will be ignored unless it is at the root of your host: | |
thenewsminute.com: # Used: http://example.com/robots.txt | |
thenewsminute.com: # Ignored: http://example.com/site/robots.txt | |
thenewsminute.com: # | |
thenewsminute.com: # For more information about the robots.txt standard, see: | |
thenewsminute.com: # http://www.robotstxt.org/robotstxt.html | |
thenewsminute.com: # CSS, JS, Images | |
thenewsminute.com: # Directories | |
thenewsminute.com: # Files | |
thenewsminute.com: # Paths (clean URLs) | |
thenewsminute.com: # Paths (no clean URLs) | |
hotwire.com: #Disallow register/logout pages | |
hotwire.com: #Disallow legacy account & email pages | |
hotwire.com: #Disallow checkout pages | |
hotwire.com: #Disallow dynamic legacy deals | |
hotwire.com: #Disallow legacy results | |
hotwire.com: #HCORE-2775 Disallow things to do | |
hotwire.com: #Flight LOB specific rules | |
hotwire.com: #Car LOB specific rules | |
hotwire.com: #Hotel LOB specific rules | |
hotwire.com: #Script Specific Rules | |
hotwire.com: #Disallow ?chkin Hotel Infosite page links | |
hotwire.com: #Disallow deals page segment specific paths | |
hotwire.com: #HCORE-2880 Disallow request-cancellation path | |
salaire-brut-en-net.fr: #Disallow: /wp-includes/ | |
salaire-brut-en-net.fr: #Disallow: /wp-content/languages/ | |
salaire-brut-en-net.fr: #Disallow: /wp-content/plugins/ | |
salaire-brut-en-net.fr: #Disallow: /wp-content/themes/ | |
salaire-brut-en-net.fr: #Disallow: /wp-content/upgrade/ | |
salaire-brut-en-net.fr: #Disallow: /wp- | |
salaire-brut-en-net.fr: #Disallow: /wp-content/ | |
salaire-brut-en-net.fr: #Allow: /wp-content/uploads/ | |
carsguide.com.au: # | |
carsguide.com.au: # robots.txt | |
carsguide.com.au: # | |
carsguide.com.au: # This file is to prevent the crawling and indexing of certain parts | |
carsguide.com.au: # of your site by web crawlers and spiders run by sites like Yahoo! | |
carsguide.com.au: # and Google. By telling these "robots" where not to go on your site, | |
carsguide.com.au: # you save bandwidth and server resources. | |
carsguide.com.au: # | |
carsguide.com.au: # This file will be ignored unless it is at the root of your host: | |
carsguide.com.au: # Used: http://example.com/robots.txt | |
carsguide.com.au: # Ignored: http://example.com/site/robots.txt | |
carsguide.com.au: # | |
carsguide.com.au: # For more information about the robots.txt standard, see: | |
carsguide.com.au: # http://www.robotstxt.org/robotstxt.html | |
carsguide.com.au: # CSS, JS, Images | |
carsguide.com.au: # Directories | |
carsguide.com.au: # Files | |
carsguide.com.au: # Paths (clean URLs) | |
carsguide.com.au: # Paths (no clean URLs) | |
vans.com: # robots.txt for https://www.vans.com | |
vans.com: # __ __ _ _ _ ___ ___ ___ __ __ | |
vans.com: # \ \ / / /_\ | \| | / __| / __| / _ \ | \/ | | |
vans.com: # \ V / / _ \ | .` | \__ \ _ | (__ | (_) | | |\/| | | |
vans.com: # \_/ /_/ \_\ |_|\_| |___/ (_) \___| \___/ |_| |_| | |
vans.com: # | |
vans.com: # | |
vans.com: # || ___ ___ ___ | |
vans.com: # / _ \ | __| | __| | |
vans.com: # | (_) | | _| | _| | |
vans.com: # \___/ |_| |_| | |
vans.com: # | |
vans.com: # _____ _ _ ___ | |
vans.com: # |_ _| | || | | __| | |
vans.com: # | | | __ | | _| | |
vans.com: # |_| |_||_| |___| | |
vans.com: # | |
vans.com: # ___ ___ _ __ __ _ || | |
vans.com: # / __| | _ \ /_\ \ \ / / | | | |
vans.com: # | (__ | / / _ \ \ \/\/ / | |__ | |
vans.com: # \___| |_|_\ /_/ \_\ \_/\_/ |____| | |
vans.com: # | |
vans.com: # | |
vans.com: # O-\-<]: | |
beckershospitalreview.com: # Slow down bing | |
cpnl.cat: # robots.txt for https://www.cpnl.cat | |
segurcaixaadeslas.es: #Specialties | |
radio.garden: # https://www.robotstxt.org/robotstxt.html | |
worldofwarships.asia: # General | |
worldofwarships.asia: # News | |
worldofwarships.asia: # Media | |
papaki.com: # | |
papaki.com: # robots.txt | |
papaki.com: # | |
papaki.com: # This file is to prevent the crawling and indexing of certain parts | |
papaki.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
papaki.com: # and Google. By telling these "robots" where not to go on your site, | |
papaki.com: # you save bandwidth and server resources. | |
papaki.com: # | |
papaki.com: # This file will be ignored unless it is at the root of your host: | |
papaki.com: # Used: http://example.com/robots.txt | |
papaki.com: # Ignored: http://example.com/site/robots.txt | |
papaki.com: # | |
papaki.com: # For more information about the robots.txt standard, see: | |
papaki.com: # http://www.robotstxt.org/robotstxt.html | |
papaki.com: # CSS, JS, Images | |
papaki.com: # Directories | |
papaki.com: # Files | |
papaki.com: # Paths (clean URLs) | |
papaki.com: # Paths (no clean URLs) | |
papaki.com: # Dynamic Directories | |
papaki.com: # Disallow 20210216 | |
papaki.com: # Dynamic Files | |
contentkingapp.com: # Don't watch our robots.txt, watch yours instead! | |
contentkingapp.com: # Monitor and keep track of changes to your robots.txt with ContentKing. | |
contentkingapp.com: # Start your free trial at https://www.contentkingapp.com/#onboarding-url | |
ctee.com.tw: ## Custom made disallows 21/07/2016 - 08:52 | |
ctee.com.tw: # from https://benza.es/robots.txt | |
ctee.com.tw: # Open Link Profiler | |
ctee.com.tw: # Mozilla/5.0+(compatible;+spbot/4.4.2;++http://OpenLinkProfiler.org/bot+) | |
ctee.com.tw: # http://OpenLinkProfiler.org/bot | |
ctee.com.tw: # SemrushBot | |
ctee.com.tw: # Mozilla/5.0 (compatible; SemrushBot/1.1~bl; +http://www.semrush.com/bot.html) | |
ctee.com.tw: # http://www.semrush.com/bot.html | |
ctee.com.tw: # DotBot | |
ctee.com.tw: # Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com) | |
ctee.com.tw: # http://www.opensiteexplorer.org/dotbot | |
ctee.com.tw: # AhrefsBot | |
ctee.com.tw: # Mozilla/5.0 (compatible; AhrefsBot/5.1; +http://ahrefs.com/robot/) | |
ctee.com.tw: # http://ahrefs.com/robot | |
ctee.com.tw: # MJ12bot | |
ctee.com.tw: # Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+) | |
ctee.com.tw: # http://www.majestic12.co.uk/bot.php | |
ctee.com.tw: # MojeekBot | |
ctee.com.tw: # Mozilla/5.0 (compatible; MojeekBot/0.6; +https://www.mojeek.com/bot.html) | |
ctee.com.tw: # https://www.mojeek.com/bot.html | |
ctee.com.tw: # YandexImages | |
ctee.com.tw: # Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots) | |
ctee.com.tw: # http://yandex.com/bots | |
ctee.com.tw: # Shareaholicbot | |
ctee.com.tw: # Mozilla/5.0 (compatible; Shareaholicbot/1.0; +http://www.shareaholic.com/bot) | |
ctee.com.tw: # http://www.shareaholic.com | |
ctee.com.tw: # Baiduspider | |
ctee.com.tw: # Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html) | |
ctee.com.tw: # http://www.baidu.com/search/spider.html | |
ctee.com.tw: #User-agent: baiduspider | |
ctee.com.tw: #Disallow: / | |
ctee.com.tw: # BLEXBot | |
ctee.com.tw: # Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/) | |
ctee.com.tw: # http://webmeup-crawler.com 136.243.36.100 | |
ctee.com.tw: ## END Custom made | |
grab.com: #Sitemaps | |
grab.com: #Duplicated URLs | |
mindbodygreen.com: # allow | |
elgato.com: # | |
elgato.com: # robots.txt | |
elgato.com: # | |
elgato.com: # This file is to prevent the crawling and indexing of certain parts | |
elgato.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
elgato.com: # and Google. By telling these "robots" where not to go on your site, | |
elgato.com: # you save bandwidth and server resources. | |
elgato.com: # | |
elgato.com: # This file will be ignored unless it is at the root of your host: | |
elgato.com: # Used: http://example.com/robots.txt | |
elgato.com: # Ignored: http://example.com/site/robots.txt | |
elgato.com: # | |
elgato.com: # For more information about the robots.txt standard, see: | |
elgato.com: # http://www.robotstxt.org/robotstxt.html | |
elgato.com: # CSS, JS, Images | |
elgato.com: # Directories | |
elgato.com: # Files | |
elgato.com: # Paths (clean URLs) | |
elgato.com: # Paths (no clean URLs) | |
mdcomputers.in: # === Lightning code start | |
mdcomputers.in: # === Lightning code end | |
ivanontech.com: # This robots.txt allows indexing of all site paths. | |
ivanontech.com: # See http://www.robotstxt.org/robotstxt.html for more information. | |
ilan.gov.tr: # Crawlers | |
ilan.gov.tr: # Sitemap Files | |
ilan.gov.tr: # Sitemap: https://www.ilan.gov.tr/sitemap/daily.xml | |
ilan.gov.tr: # Sitemap: https://www.ilan.gov.tr/sitemap/ads.xml | |
socialbookmarkingmentor.com: # 1) this filename (robots.txt) must stay lowercase | |
socialbookmarkingmentor.com: # 2) this file must be in the servers root directory | |
socialbookmarkingmentor.com: # ex: http://www.mydomain.com/pliklisubfolder/ -- you must move the robots.txt from | |
socialbookmarkingmentor.com: # /pliklisubfolder/ to the root folder for http://www.mydomain.com/ | |
socialbookmarkingmentor.com: # you must then add your subfolder to each 'Disallow' below | |
socialbookmarkingmentor.com: # ex: Disallow: /cache/ becomes Disallow: /pliklisubfolder/cache/ | |
nav.com: # Custom disallow rules | |
fcc.gov: # | |
fcc.gov: # robots.txt | |
fcc.gov: # | |
fcc.gov: # This file is to prevent the crawling and indexing of certain parts | |
fcc.gov: # of your site by web crawlers and spiders run by sites like Yahoo! | |
fcc.gov: # and Google. By telling these "robots" where not to go on your site, | |
fcc.gov: # you save bandwidth and server resources. | |
fcc.gov: # | |
fcc.gov: # This file will be ignored unless it is at the root of your host: | |
fcc.gov: # Used: http://example.com/robots.txt | |
fcc.gov: # Ignored: http://example.com/site/robots.txt | |
fcc.gov: # | |
fcc.gov: # For more information about the robots.txt standard, see: | |
fcc.gov: # http://www.robotstxt.org/robotstxt.html | |
fcc.gov: # CSS, JS, Images | |
fcc.gov: # Directories | |
fcc.gov: # Files | |
fcc.gov: # Paths (clean URLs) | |
fcc.gov: # Paths (no clean URLs) | |
moe.gov.my: # If the Joomla site is installed within a folder | |
moe.gov.my: # eg www.example.com/joomla/ then the robots.txt file | |
moe.gov.my: # MUST be moved to the site root | |
moe.gov.my: # eg www.example.com/robots.txt | |
moe.gov.my: # AND the joomla folder name MUST be prefixed to all of the | |
moe.gov.my: # paths. | |
moe.gov.my: # eg the Disallow rule for the /administrator/ folder MUST | |
moe.gov.my: # be changed to read | |
moe.gov.my: # Disallow: /joomla/administrator/ | |
moe.gov.my: # | |
moe.gov.my: # For more information about the robots.txt standard, see: | |
moe.gov.my: # http://www.robotstxt.org/orig.html | |
moe.gov.my: # | |
moe.gov.my: # For syntax checking, see: | |
moe.gov.my: # http://tool.motoricerca.info/robots-checker.phtml | |
tipranks.com: # robotstxt.org | |
careerride.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
careerride.com: #content{margin:0 0 0 2%;position:relative;} | |
bancaynegocios.com: ####### Disallow: / | |
nationwide.com: # For domain: https://www.nationwide.com/ | |
nationwide.com: # Last updated: 8/26/2020 | |
nxp.com.cn: # | |
nxp.com.cn: # robots.txt for http://www.w3.org/ | |
nxp.com.cn: # | |
nxp.com.cn: # $Id: robots.txt,v 1.22 2002/04/18 20:23:04 ted Exp $ | |
nxp.com.cn: # | |
downyi.com: # | |
downyi.com: # robots.txt for www.downyi.com | |
downyi.com: # | |
hxnews.com: # | |
hxnews.com: # robots.txt for hxnews | |
hxnews.com: # | |
linuxprobe.com: # Forum regulated by the Ministry of Public Security. | |
linuxprobe.com: # Do not attempt to approach the invasion . thanks . | |
wifi.id: # https://www.robotstxt.org/robotstxt.html | |
punjabkesari.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
punjabkesari.in: #content{margin:0 0 0 2%;position:relative;} | |
mybroadband.co.za: # Disallow: /*.html?print$ | |
pagomiscuentas.com: # | |
pagomiscuentas.com: # robots.txt | |
pagomiscuentas.com: # | |
pagomiscuentas.com: # This file is to prevent the crawling and indexing of certain parts | |
pagomiscuentas.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
pagomiscuentas.com: # and Google. By telling these "robots" where not to go on your site, | |
pagomiscuentas.com: # you save bandwidth and server resources. | |
pagomiscuentas.com: # | |
pagomiscuentas.com: # This file will be ignored unless it is at the root of your host: | |
pagomiscuentas.com: # Used: http://example.com/robots.txt | |
pagomiscuentas.com: # Ignored: http://example.com/site/robots.txt | |
pagomiscuentas.com: # | |
pagomiscuentas.com: # For more information about the robots.txt standard, see: | |
pagomiscuentas.com: # http://www.robotstxt.org/robotstxt.html | |
pagomiscuentas.com: # CSS, JS, Images | |
pagomiscuentas.com: # Directories | |
pagomiscuentas.com: # Files | |
pagomiscuentas.com: # Paths (clean URLs) | |
pagomiscuentas.com: # Paths (no clean URLs) | |
kuantokusta.pt: # File robots.txt | |
kuantokusta.pt: # FULL access (Google Adsense) | |
kuantokusta.pt: # SITEMAP FILES | |
namepros.com: # vim: ft=robots | |
namepros.com: # Mediapartners-Google and AdsBot-Google ignore wildcard groups. | |
namepros.com: ## | |
namepros.com: # Inaccessible by crawlers because of authentication requirements | |
namepros.com: ## | |
namepros.com: # Require login | |
namepros.com: # Require POST | |
namepros.com: # Require login often enough to cause problems | |
namepros.com: # Transient pages | |
namepros.com: ## | |
namepros.com: # Only intended for real, authentic, genuine homo sapiens | |
namepros.com: ## | |
namepros.com: ## | |
namepros.com: # Internal stuff. | |
namepros.com: ## | |
namepros.com: # If you're a hacker, go straight to /internal/ because it's obviously the most vulnerable. | |
namepros.com: # That's where we hide our ion cannon. (We have cake!) | |
namepros.com: ## | |
namepros.com: # Bots that ignore * | |
namepros.com: ## | |
namepros.com: ## | |
namepros.com: # Less-than-intelligent bots that can't properly parse constructs such as rel="nofollow", X-Robots-Tag, or base tags. | |
namepros.com: # They also tend not to deduplicate URLs (so if a link to /misc/style appears on every page, they'll crawl it once for each page). | |
namepros.com: # If your crawler appears here, it's probably snowballing, resulting in a ridiculous number of requests. | |
namepros.com: ## | |
namepros.com: # THE CAKE IS A LIE | |
namepros.com: # THE CAKE IS A LIE | |
namepros.com: # THE CAKE IS^D | |
appointy.com: # Block Uptime robot | |
linebiz.com: # | |
linebiz.com: # robots.txt | |
linebiz.com: # | |
linebiz.com: # This file is to prevent the crawling and indexing of certain parts | |
linebiz.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
linebiz.com: # and Google. By telling these "robots" where not to go on your site, | |
linebiz.com: # you save bandwidth and server resources. | |
linebiz.com: # | |
linebiz.com: # This file will be ignored unless it is at the root of your host: | |
linebiz.com: # Used: http://example.com/robots.txt | |
linebiz.com: # Ignored: http://example.com/site/robots.txt | |
linebiz.com: # | |
linebiz.com: # For more information about the robots.txt standard, see: | |
linebiz.com: # http://www.robotstxt.org/robotstxt.html | |
linebiz.com: # CSS, JS, Images | |
linebiz.com: # Directories | |
linebiz.com: # Files | |
linebiz.com: # Paths (clean URLs) | |
linebiz.com: # Paths (no clean URLs) | |
vitalydesign.com: # we use Shopify as our ecommerce platform | |
vitalydesign.com: # Google adsbot ignores robots.txt unless specifically named! | |
fontspring.com: # Whitelisted user-agents are allowed | |
fontspring.com: #disallows language and tag pages with pagination | |
tori.fi: # It is expressly forbidden to use spiders or other | |
tori.fi: # automated methods to access tori.fi. Only if tori.fi | |
tori.fi: # has given special permit such access is allowed. | |
tori.fi: ## Archive.org | |
tori.fi: ## theTradeDesk | |
tori.fi: ## Common list for most search engines | |
kaidee.com: #Last update 10/02/2021 | |
kaidee.com: #Disallow all of parameters | |
kaidee.com: # Disable 'en' pages while still POC | |
dujiza.com: # | |
dujiza.com: # robots.txt for EmpireCMS | |
dujiza.com: # | |
edudisk.cn: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
edudisk.cn: #content{margin:0 0 0 2%;position:relative;} | |
feizui.com: # | |
feizui.com: # robots.txt for EmpireCMS | |
feizui.com: # | |
game773.com: # Robots.txt file from http://www.game773.com | |
game773.com: # All robots will spider the domain | |
gujarat1.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead. | |
gujarat1.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details. | |
gujarat1.wordpress.com: # This file was generated on Mon, 13 Apr 2020 06:47:57 +0000 | |
headphoneclub.com: # | |
headphoneclub.com: # robots.txt for Discuz! X3 | |
headphoneclub.com: # | |
idtag.cn: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
idtag.cn: #content{margin:0 0 0 2%;position:relative;} | |
iemate.com: # | |
iemate.com: # robots.txt for EmpireCMS | |
iemate.com: # | |
itstrike.cn: # | |
itstrike.cn: # robots.txt for EmpireCMS | |
itstrike.cn: # | |
jiaoyizhe.com: # | |
jiaoyizhe.com: # robots.txt for Discuz! X3 | |
jiaoyizhe.com: # | |
jobbaidu.com: # | |
jobbaidu.com: # robots.txt for jobbaidu.com | |
jobbaidu.com: # | |
josmith1845.wordpress.com: # This file was generated on Thu, 11 Feb 2021 13:28:17 +0000 | |
kumi.cn: # robots.txt generated at http://tool.chinaz.com/robots/ | |
laawoo.com: #mask_div {background:none repeat scroll 0 0 #000000; left:0; opacity:0.1;filter:Alpha(Opacity=10); -moz-opacity:0.1; position:absolute; top:0;} | |
laawoo.com: #research_protocols {border:0;position:absolute;background:transparent none repeat scroll 0 0;} | |
lwlm.com: # | |
lwlm.com: # robots.txt for iwms | |
lwlm.com: # | |
maxviewrealty.com: # | |
maxviewrealty.com: # robots.txt for EmpireCMS | |
maxviewrealty.com: # Version 6.0 | |
maxviewrealty.com: # | |
mozest.com: # | |
mozest.com: # robots.txt for EmpireCMS | |
mozest.com: # | |
nanjixiong.com: # | |
nanjixiong.com: # robots.txt for Discuz! X3 | |
nanjixiong.com: # | |
north-plus.net: # | |
north-plus.net: # robots.txt for PHPWIND BOARD | |
north-plus.net: # Version 7.x | |
north-plus.net: # | |
openedu.com.cn: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
openedu.com.cn: #content{margin:0 0 0 2%;position:relative;} | |
qumei.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
qumei.com: #content{margin:0 0 0 2%;position:relative;} | |
seed-china.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
seed-china.com: #content{margin:0 0 0 2%;position:relative;} | |
swiper.com.cn: #wrap{ | |
swiper.com.cn: #wrap a{ | |
swiper.com.cn: #wrap a:hover{background-color:#3f92f0;} | |
szcw.cn: # robots.txt generated at http://tool.chinaz.com/robots/ | |
tongyi.com: #container { | |
toutiao.io: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file | |
toutiao.io: # | |
toutiao.io: # To ban all spiders from the entire site uncomment the next two lines: | |
toutiao.io: # User-agent: * | |
toutiao.io: # Disallow: / | |
uzzf.com: #topNav,#footer,#page,#container{width:960px;display:block;margin:0 auto;clear:both;} | |
uzzf.com: #new_menu li { height:32px; font-size:16px; color:#438a32; font-family:'黑体'; line-height:32px; text-align:center; margin-bottom:3px; background: url(/skin/gr/images/class_menu.gif) no-repeat 0 -32px; letter-spacing:8px;cursor:pointer;} | |
uzzf.com: #new_menu li a { text-decoration:none; color:#438a32; padding-left:8px;} | |
uzzf.com: #new_menu .active, #new_menu .hover {background-position:0 0;} | |
uzzf.com: #new_menu .active a, #new_menu .hover a {color:#fff;} | |
wzaobao.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
wzaobao.com: #content{margin:0 0 0 2%;position:relative;} | |
xiangrikui.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file | |
xiangrikui.com: # | |
xiangrikui.com: # To ban all spiders from the entire site uncomment the next two lines: | |
xlobo.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF; | |
xlobo.com: #content{margin:0 0 0 2%;position:relative;} | |
xq0757.com: # | |
xq0757.com: # robots.txt for PHPWind | |
xq0757.com: # Version 8.7 | |
xq0757.com: # | |
yousheng8.com: #page{width:910px; padding:20px 20px 40px 20px; margin-top:80px;} | |
sitel.com.mk: # | |
sitel.com.mk: # robots.txt | |
sitel.com.mk: # | |
sitel.com.mk: # This file is to prevent the crawling and indexing of certain parts | |
sitel.com.mk: # of your site by web crawlers and spiders run by sites like Yahoo! | |
sitel.com.mk: # and Google. By telling these "robots" where not to go on your site, | |
sitel.com.mk: # you save bandwidth and server resources. | |
sitel.com.mk: # | |
sitel.com.mk: # This file will be ignored unless it is at the root of your host: | |
sitel.com.mk: # Used: http://example.com/robots.txt | |
sitel.com.mk: # Ignored: http://example.com/site/robots.txt | |
sitel.com.mk: # | |
sitel.com.mk: # For more information about the robots.txt standard, see: | |
sitel.com.mk: # http://www.robotstxt.org/robotstxt.html | |
sitel.com.mk: # CSS, JS, Images | |
sitel.com.mk: # Directories | |
sitel.com.mk: # Files | |
sitel.com.mk: # Paths (clean URLs) | |
sitel.com.mk: # Paths (no clean URLs) | |
strato.de: # robots.txt file | |
strato.de: # for www.strato.de | |
splunk.com: # Splunk Documentation | |
splunk.com: # Splunk Documentation | |
iamtxt.com: # | |
iamtxt.com: # robots.txt for EmpireCMS | |
iamtxt.com: # | |
mediaocean.com: # | |
mediaocean.com: # robots.txt | |
mediaocean.com: # | |
mediaocean.com: # This file is to prevent the crawling and indexing of certain parts | |
mediaocean.com: # of your site by web crawlers and spiders run by sites like Yahoo! | |
mediaocean.com: # and Google. By telling these "robots" where not to go on your site, | |
mediaocean.com: # you save bandwidth and server resources. | |
mediaocean.com: # | |
mediaocean.com: # This file will be ignored unless it is at the root of your host: | |
mediaocean.com: # Used: http://example.com/robots.txt | |
mediaocean.com: # Ignored: http://example.com/site/robots.txt | |
mediaocean.com: # | |
mediaocean.com: # For more information about the robots.txt standard, see: | |
mediaocean.com: # http://www.robotstxt.org/robotstxt.html | |
mediaocean.com: # CSS, JS, Images | |
mediaocean.com: # Directories | |
mediaocean.com: # Files | |
mediaocean.com: # Paths (clean URLs) | |
mediaocean.com: # Paths (no clean URLs) | |
emvolio.gov.gr: # | |
emvolio.gov.gr: # robots.txt | |
emvolio.gov.gr: # | |
emvolio.gov.gr: # This file is to prevent the crawling and indexing of certain parts | |
emvolio.gov.gr: # of your site by web crawlers and spiders run by sites like Yahoo! | |
emvolio.gov.gr: # and Google. By telling these "robots" where not to go on your site, | |
emvolio.gov.gr: # you save bandwidth and server resources. | |
emvolio.gov.gr: # | |
emvolio.gov.gr: # This file will be ignored unless it is at the root of your host: | |
emvolio.gov.gr: # Used: http://example.com/robots.txt | |
emvolio.gov.gr: # Ignored: http://example.com/site/robots.txt | |
emvolio.gov.gr: # | |
emvolio.gov.gr: # For more information about the robots.txt standard, see: | |
emvolio.gov.gr: # http://www.robotstxt.org/robotstxt.html | |
emvolio.gov.gr: # CSS, JS, Images | |
emvolio.gov.gr: # Directories | |
emvolio.gov.gr: # Files | |
emvolio.gov.gr: # Paths (clean URLs) | |
emvolio.gov.gr: # Paths (no clean URLs) | |
dof.gob.mx: # | |
sas.com: # | |
sas.com: # robots.txt file for www.sas.com | |
sas.com: # | |
babycare.nl: # | |
babycare.nl: # ____ _ _ | |
babycare.nl: # | _ \ | | | | | |
babycare.nl: # | |_) | __ _| |__ _ _ ___ __ _ _ __ ___ _ __ | | | |
babycare.nl: # | _ < / _` | '_ \| | | |/ __/ _` | '__/ _ \ | '_ \| | | |
babycare.nl: # | |_) | (_| | |_) | |_| | (_| (_| | | | __/_| | | | | | |
babycare.nl: # |____/ \__,_|_.__/ \__, |\___\__,_|_| \___(_)_| |_|_| | |
babycare.nl: # __/ | | |
babycare.nl: # |___/ | |
forloveandlemons.com: # we use Shopify as our ecommerce platform | |
forloveandlemons.com: # Google adsbot ignores robots.txt unless specifically named! | |
belk.com: #Non-Canonical Parameters | |
belk.com: #URLs | |
belk.com: #Pagination | |
belk.com: #OSS | |
belk.com: #Shop by Brand | |
belk.com: #Clearance | |
pu.go.id: #SigapMembangunNegeri</a></h4> | |
revenuquebec.ca: # robots.txt for Revenu Québec | |
revenuquebec.ca: # Only allow URLs generated with RealURL | |
revenuquebec.ca: # L=0 is the default language | |
revenuquebec.ca: # Should always be protected (.htaccess) | |
revenuquebec.ca: # sitemap | |
dainikamadershomoy.com: # www.robotstxt.org/ | |
dainikamadershomoy.co |
Created using https://johnmu.com/robots-txt-comments/