Skip to content

Instantly share code, notes, and snippets.

@softplus
Last active April 29, 2021 18:52
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save softplus/f741774868954b6a9893dfd76f193023 to your computer and use it in GitHub Desktop.
Save softplus/f741774868954b6a9893dfd76f193023 to your computer and use it in GitHub Desktop.
Top ca 265k robots.txt comment lines
google.com: # AdsBot
google.com: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
youtube.com: # robots.txt file for YouTube
youtube.com: # Created in the distant future (the year 2000) after
youtube.com: # the robotic uprising of the mid 90's which wiped out all humans.
facebook.com: # Notice: Collection of data on Facebook through automated means is
facebook.com: # prohibited unless you have express written permission from Facebook
facebook.com: # and may only be conducted for the limited purpose contained in said
facebook.com: # permission.
facebook.com: # See: http://www.facebook.com/apps/site_scraping_tos_terms.php
wikipedia.org: #
wikipedia.org: # Please note: There are a lot of pages on this site, and there are
wikipedia.org: # some misbehaved spiders out there that go _way_ too fast. If you're
wikipedia.org: # irresponsible, your access to the site may be blocked.
wikipedia.org: #
wikipedia.org: # Observed spamming large amounts of https://en.wikipedia.org/?curid=NNNNNN
wikipedia.org: # and ignoring 429 ratelimit responses, claims to respect robots:
wikipedia.org: # http://mj12bot.com/
wikipedia.org: # advertising-related bots:
wikipedia.org: # Wikipedia work bots:
wikipedia.org: # Crawlers that are kind enough to obey, but which we'd rather not have
wikipedia.org: # unless they're feeding search engines.
wikipedia.org: # Some bots are known to be trouble, particularly those designed to copy
wikipedia.org: # entire sites. Please obey robots.txt.
wikipedia.org: # Misbehaving: requests much too fast:
wikipedia.org: #
wikipedia.org: # Sorry, wget in its recursive mode is a frequent problem.
wikipedia.org: # Please read the man page and use it properly; there is a
wikipedia.org: # --wait option you can use to set the delay between hits,
wikipedia.org: # for instance.
wikipedia.org: #
wikipedia.org: #
wikipedia.org: # The 'grub' distributed client has been *very* poorly behaved.
wikipedia.org: #
wikipedia.org: #
wikipedia.org: # Doesn't follow robots.txt anyway, but...
wikipedia.org: #
wikipedia.org: #
wikipedia.org: # Hits many times per second, not acceptable
wikipedia.org: # http://www.nameprotect.com/botinfo.html
wikipedia.org: # A capture bot, downloads gazillions of pages with no public benefit
wikipedia.org: # http://www.webreaper.net/
wikipedia.org: #
wikipedia.org: # Friendly, low-speed bots are welcome viewing article pages, but not
wikipedia.org: # dynamically-generated pages please.
wikipedia.org: #
wikipedia.org: # Inktomi's "Slurp" can read a minimum delay between hits; if your
wikipedia.org: # bot supports such a thing using the 'Crawl-delay' or another
wikipedia.org: # instruction, please let us know.
wikipedia.org: #
wikipedia.org: # There is a special exception for API mobileview to allow dynamic
wikipedia.org: # mobile web & app views to load section content.
wikipedia.org: # These views aren't HTTP-cached but use parser cache aggressively
wikipedia.org: # and don't expose special: pages etc.
wikipedia.org: #
wikipedia.org: # Another exception is for REST API documentation, located at
wikipedia.org: # /api/rest_v1/?doc.
wikipedia.org: #
wikipedia.org: #
wikipedia.org: # ar:
wikipedia.org: #
wikipedia.org: # dewiki:
wikipedia.org: # T6937
wikipedia.org: # sensible deletion and meta user discussion pages:
wikipedia.org: # 4937#5
wikipedia.org: # T14111
wikipedia.org: # T15961
wikipedia.org: #
wikipedia.org: # enwiki:
wikipedia.org: # Folks get annoyed when VfD discussions end up the number 1 google hit for
wikipedia.org: # their name. See T6776
wikipedia.org: # T15398
wikipedia.org: # T16075
wikipedia.org: # T13261
wikipedia.org: # T12288
wikipedia.org: # T16793
wikipedia.org: #
wikipedia.org: # eswiki:
wikipedia.org: # T8746
wikipedia.org: #
wikipedia.org: # fiwiki:
wikipedia.org: # T10695
wikipedia.org: #
wikipedia.org: # hewiki:
wikipedia.org: #T11517
wikipedia.org: #
wikipedia.org: # huwiki:
wikipedia.org: #
wikipedia.org: # itwiki:
wikipedia.org: # T7545
wikipedia.org: #
wikipedia.org: # jawiki
wikipedia.org: # T7239
wikipedia.org: # nowiki
wikipedia.org: # T13432
wikipedia.org: #
wikipedia.org: # plwiki
wikipedia.org: # T10067
wikipedia.org: #
wikipedia.org: # ptwiki:
wikipedia.org: # T7394
wikipedia.org: #
wikipedia.org: # rowiki:
wikipedia.org: # T14546
wikipedia.org: #
wikipedia.org: # ruwiki:
wikipedia.org: #
wikipedia.org: # svwiki:
wikipedia.org: # T12229
wikipedia.org: # T13291
wikipedia.org: #
wikipedia.org: # zhwiki:
wikipedia.org: # T7104
wikipedia.org: #
wikipedia.org: # sister projects
wikipedia.org: #
wikipedia.org: # enwikinews:
wikipedia.org: # T7340
wikipedia.org: #
wikipedia.org: # itwikinews
wikipedia.org: # T11138
wikipedia.org: #
wikipedia.org: # enwikiquote:
wikipedia.org: # T17095
wikipedia.org: #
wikipedia.org: # enwikibooks
wikipedia.org: #
wikipedia.org: # working...
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #----------------------------------------------------------#
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: # Localisable part of robots.txt for en.wikipedia.org
wikipedia.org: #
wikipedia.org: # Edit at https://en.wikipedia.org/w/index.php?title=MediaWiki:Robots.txt&action=edit
wikipedia.org: # Don't add newlines here. All rules set here are active for every user-agent.
wikipedia.org: #
wikipedia.org: # Please check any changes using a syntax validator such as http://tool.motoricerca.info/robots-checker.phtml
wikipedia.org: # Enter https://en.wikipedia.org/robots.txt as the URL to check.
wikipedia.org: #
wikipedia.org: # https://phabricator.wikimedia.org/T16075
wikipedia.org: #
wikipedia.org: # Folks get annoyed when XfD discussions end up the number 1 google hit for
wikipedia.org: # their name.
wikipedia.org: # https://phabricator.wikimedia.org/T16075
wikipedia.org: #
wikipedia.org: # https://phabricator.wikimedia.org/T12288
wikipedia.org: #
wikipedia.org: #
wikipedia.org: # https://phabricator.wikimedia.org/T13261
wikipedia.org: #
wikipedia.org: # https://phabricator.wikimedia.org/T14111
wikipedia.org: #
wikipedia.org: # https://phabricator.wikimedia.org/T15398
wikipedia.org: #
wikipedia.org: # https://phabricator.wikimedia.org/T16793
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: #
wikipedia.org: # User sandboxes for modules and Template Styles are placed in these subpages for testing
wikipedia.org: #
wikipedia.org: #
wikipedia.org: # </pre>
reddit.com: # 80legs
reddit.com: # 80legs' new crawler
microsoft.com: # Robots.txt file for www.microsoft.com
github.com: # If you would like to crawl GitHub contact us via https://support.github.com/contact/
github.com: # We also provide an extensive API: https://developer.github.com/
google.com.hk: # AdsBot
google.com.hk: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
adobe.com: # The use of robots or other automated means to access the Adobe site
adobe.com: # without the express permission of Adobe is strictly prohibited.
adobe.com: # Notwithstanding the foregoing, Adobe may permit automated access to
adobe.com: # access certain Adobe pages but solely for the limited purpose of
adobe.com: # including content in publicly available search engines. Any other
adobe.com: # use of robots or failure to obey the robots exclusion standards set
adobe.com: # forth at http://www.robotstxt.org/ is strictly prohibited.
adobe.com: # Details about Googlebot available at: http://www.google.com/bot.html
adobe.com: # The Google search engine can see everything
adobe.com: # The Omniture search engine can see everything
adobe.com: # XML sitemaps updates per SH10272020
adobe.com: # XML sitemaps updates per BW10202020
adobe.com: # Hreflang sitemap
adobe.com: # Hreflang sitemap updates per SH10122020
adobe.com: # PSFl individual sitemaps HS07082020
ebay.com: ## BEGIN FILE ###
ebay.com: #
ebay.com: # allow-all
ebay.com: # DR
ebay.com: #
ebay.com: # The use of robots or other automated means to access the eBay site
ebay.com: # without the express permission of eBay is strictly prohibited.
ebay.com: # Notwithstanding the foregoing, eBay may permit automated access to
ebay.com: # access certain eBay pages but soley for the limited purpose of
ebay.com: # including content in publicly available search engines. Any other
ebay.com: # use of robots or failure to obey the robots exclusion standards set
ebay.com: # forth at <https://www.robotstxt.org/orig.html> is strictly
ebay.com: # prohibited.
ebay.com: #
ebay.com: # v10_COM_Feb_2021
ebay.com: ### DIRECTIVES ###
ebay.com: # PRP Sitemaps
ebay.com: # VIS Sitemaps
ebay.com: # CLP Sitemaps
ebay.com: # NGS Sitemaps
ebay.com: # BROWSE Sitemaps
ebay.com: ### END FILE ###
apple.com: # robots.txt for http://www.apple.com/
twitter.com: #Google Search Engine Robot
twitter.com: #Yahoo! Search Engine Robot
twitter.com: #Yandex Search Engine Robot
twitter.com: #Microsoft Search Engine Robot
twitter.com: #Bing Search Engine Robot
twitter.com: # Every bot that might possibly read and respect this file.
twitter.com: # WHAT-4882 - Block indexing of links in notification emails. This applies to all bots.
twitter.com: # Wait 1 second between successive requests. See ONBOARD-2698 for details.
twitter.com: # Independent of user agent. Links in the sitemap are full URLs using https:// and need to match
twitter.com: # the protocol of the sitemap.
linkedin.com: # Notice: The use of robots or other automated means to access LinkedIn without
linkedin.com: # the express permission of LinkedIn is strictly prohibited.
linkedin.com: # See https://www.linkedin.com/legal/user-agreement.
linkedin.com: # LinkedIn may, in its discretion, permit certain automated access to certain LinkedIn pages,
linkedin.com: # for the limited purpose of including content in approved publicly available search engines.
linkedin.com: # If you would like to apply for permission to crawl LinkedIn, please email whitelist-crawl@linkedin.com.
linkedin.com: # Any and all permitted crawling of LinkedIn is subject to LinkedIn's Crawling Terms and Conditions.
linkedin.com: # See http://www.linkedin.com/legal/crawling-terms.
linkedin.com: # Profinder only for deepcrawl
linkedin.com: # Notice: If you would like to crawl LinkedIn,
linkedin.com: # please email whitelist-crawl@linkedin.com to apply
linkedin.com: # for white listing.
17ok.com: #
17ok.com: # robots.txt for Discuz! Board
17ok.com: # Version 5.5.0
17ok.com: #
yandex.ru: # yandex.ru
wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details.
wordpress.com: # This file was generated on Wed, 24 Feb 2021 18:49:58 +0000
google.co.in: # AdsBot
google.co.in: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
google.com.br: # AdsBot
google.com.br: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
instructure.com: #
instructure.com: # robots.txt
instructure.com: #
instructure.com: # This file is to prevent the crawling and indexing of certain parts
instructure.com: # of your site by web crawlers and spiders run by sites like Yahoo!
instructure.com: # and Google. By telling these "robots" where not to go on your site,
instructure.com: # you save bandwidth and server resources.
instructure.com: #
instructure.com: # This file will be ignored unless it is at the root of your host:
instructure.com: # Used: http://example.com/robots.txt
instructure.com: # Ignored: http://example.com/site/robots.txt
instructure.com: #
instructure.com: # For more information about the robots.txt standard, see:
instructure.com: # http://www.robotstxt.org/robotstxt.html
instructure.com: # CSS, JS, Images
instructure.com: # Directories
instructure.com: # Files
instructure.com: # Paths (clean URLs)
instructure.com: # Paths (no clean URLs)
etsy.com: #
etsy.com: # 01001001 01010011 00100000 01000011 01001111 01000100 01000101 00100000 01011001 01001111 01010101 01010010 00100000 01000011 01010010 01000001 01000110 01010100 00111111# \
etsy.com: #
etsy.com: # -----
etsy.com: # | . . |
etsy.com: # -----
etsy.com: # \--|-|--/
etsy.com: # | |
etsy.com: # |-------|
freepik.com: # Google AdSense
freepik.com: # Adsbot-Google
freepik.com: # Twitter Bot
google.co.jp: # AdsBot
google.co.jp: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
imdb.com: # robots.txt for https://www.imdb.com properties
okta.com: #
okta.com: # robots.txt
okta.com: #
okta.com: # This file is to prevent the crawling and indexing of certain parts
okta.com: # of your site by web crawlers and spiders run by sites like Yahoo!
okta.com: # and Google. By telling these "robots" where not to go on your site,
okta.com: # you save bandwidth and server resources.
okta.com: #
okta.com: # This file will be ignored unless it is at the root of your host:
okta.com: # Used: http://example.com/robots.txt
okta.com: # Ignored: http://example.com/site/robots.txt
okta.com: #
okta.com: # For more information about the robots.txt standard, see:
okta.com: # http://www.robotstxt.org/robotstxt.html
okta.com: # CSS, JS, Images
okta.com: # Directories
okta.com: # Files
okta.com: # Paths (clean URLs)
okta.com: # Paths (no clean URLs)
google.de: # AdsBot
google.de: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
intuit.com: #YisouSpider China
zillow.com: # Access to and use of Zillow.com is governed by our Terms of Use. See http://www.zillow.com/corp/Terms.htm
imgur.com: # robots
flipkart.com: # cart
flipkart.com: # Something related to carousel and recommendation carousel
flipkart.com: # Permanent Link For Individual Review
flipkart.com: # Old Browse Page Experience
flipkart.com: # Affiliate Widget
flipkart.com: # Social Connect Redirects
flipkart.com: # Product Seller Pages
flipkart.com: #Search Pages
flipkart.com: # Temporary Hack
flipkart.com: #Alliances Pages
flipkart.com: # Faceted pages
flipkart.com: # URL parameters blocking for SEO
flipkart.com: # Faceted pages
paypal.com: ### BEGIN FILE ###
paypal.com: # PayPal robots.txt file
tumblr.com: # Google Search Engine Robot
tumblr.com: # Yahoo! Search Engine Robot
tumblr.com: # Yandex Search Engine Robot
tumblr.com: # Microsoft Search Engine Robot
tumblr.com: # Every bot that might possibly read and respect this file.
amazon.co.uk: # Sitemap files
stackexchange.com: # for "/*?", refer to http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360
stackexchange.com: #
stackexchange.com: # beware, the sections below WILL NOT INHERIT from the above!
stackexchange.com: # http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360
stackexchange.com: #
stackexchange.com: #
stackexchange.com: # Yahoo Pipes is for feeds not web pages.
stackexchange.com: #
stackexchange.com: #
stackexchange.com: # This isn't really an image
stackexchange.com: #
stackexchange.com: #
stackexchange.com: # KSCrawler - we don't need help from you
stackexchange.com: #
stackexchange.com: #
stackexchange.com: # ByteSpider is a badly behaving crawler, no thank you!
stackexchange.com: #
bbc.com: # version: a3d1a2190febe12313232bbfe80dda6e873c161b
bbc.com: # HTTPS www.bbc.com
walmart.com: #Sitemaps-https
walmart.com: #Disallow select URLs
walmart.com: #Crawler specific settings
walmart.com: # slow down Yahoo
google.fr: # AdsBot
google.fr: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
google.ru: # AdsBot
google.ru: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
espn.com: # robots.txt for www.espn.com
pixnet.net: # pixnet.net
indiatimes.com: #robots.txt
ettoday.net: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
ettoday.net: # Crawl-delay: 5
google.it: # AdsBot
google.it: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
google.es: # AdsBot
google.es: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
1688.com: ## -----------------------------------------------------------------------------
1688.com: ## author shenyong modify by dongfang.zdf 2020.03.05
1688.com: ## fileEncoding = UTF-8
1688.com: ##
1688.com: ## -----------------------------------------------------------------------------
shutterstock.com: # Editor Images
shutterstock.com: # Sitemaps
salesforce.com: # Robots.txt file for http://www.salesforce.com
salesforce.com: # All robots will spider the domain
salesforce.com: #
salesforce.com: #
salesforce.com: # Keep mis-configured Microsoft SharePoint servers from hammering us
salesforce.com: # This is not MSN Search (msnbot), but privately owned SharePoint installations
salesforce.com: #
salesforce.com: #
salesforce.com: # Disallow: /at/
salesforce.com: # Disallow: /crm-success-summer/
salesforce.com: # Disallow: /crm/
salesforce.com: # Disallow: /ie/
salesforce.com: # Disallow: /marketplace/
salesforce.com: # Disallow: /myfuture/
salesforce.com: # Disallow: /newevents/
salesforce.com: # Disallow: /orderentry/
salesforce.com: # Disallow: /person/
salesforce.com: # Disallow: /services/
salesforce.com: # Disallow: /servlet/
salesforce.com: # Disallow: /site/
salesforce.com: # Disallow: /soap/
salesforce.com: # Disallow: /trainingsupport/
salesforce.com: # Disallow: /web-common/
salesforce.com: # Disallow: /usertutorial/
salesforce.com: # Company pages duped across locales
salesforce.com: # AMER testing
salesforce.com: #
salesforce.com: # Disallow: /uk/foundation/
salesforce.com: # Disallow: /eu/foundation/
salesforce.com: # Disallow: /au/foundation/
salesforce.com: #
salesforce.com: # Disallow: /uk/services-training/customer-support/
salesforce.com: # Disallow: /uk/services-training/professional-services/
salesforce.com: # Disallow: /uk/services-training/index.jsp
salesforce.com: # Disallow: /eu/services-training/customer-support/
salesforce.com: # Disallow: /eu/services-training/professional-services/
salesforce.com: # Disallow: /eu/services-training/index.jsp
salesforce.com: # Disallow: /au/services-training/customer-support/
salesforce.com: # Disallow: /au/services-training/professional-services/
salesforce.com: # Disallow: /au/services-training/index.jsp
salesforce.com: #
salesforce.com: # Disallow: /uk/platform/
salesforce.com: # Disallow: /eu/platform/
salesforce.com: # Disallow: /au/platform/
salesforce.com: #
salesforce.com: #
salesforce.com: #
salesforce.com: #Disallow: /uk/democenter/
salesforce.com: #Disallow: /eu/democenter/
salesforce.com: #Disallow: /ie/democenter/
salesforce.com: #Disallow: /de/democenter/
salesforce.com: #Disallow: /fr/democenter/
salesforce.com: #Disallow: /it/democenter/
salesforce.com: #Disallow: /es/democenter/
salesforce.com: #
salesforce.com: #
salesforce.com: #Disallow: /de/campaigns/refer-a-friend.jsp
salesforce.com: #Disallow: /eu/campaigns/refer-a-friend.jsp
salesforce.com: #Disallow: /fr/campaigns/refer-a-friend.jsp
salesforce.com: #Disallow: /it/campaigns/refer-a-friend.jsp
salesforce.com: #Disallow: /uk/campaigns/refer-a-friend.jsp
salesforce.com: #
salesforce.com: # Disallow: /uk/events/details/a1x300000004DrwAAE.jsp
salesforce.com: # Disallow: /uk/events/details/cf12-london/conf/*
salesforce.com: # Disallow: /uk/events/details/cf12-london/facebook-form-content.jsp
salesforce.com: # Disallow: /uk/events/details/cf12-london/facebook-form.jsp
salesforce.com: # Disallow: /uk/events/details/cf12-london/grid-form-content.jsp
salesforce.com: #
salesforce.com: #
salesforce.com: # Disallow: /fr/company/force_com_sites_terms.jsp
salesforce.com: #
salesforce.com: #
salesforce.com: #
salesforce.com: # Blocked /in/ on request from ALoon
salesforce.com: # RH (09/11/09) Unlbocked /in/ on request from ALoon
salesforce.com: # Disallow: /in/
salesforce.com: #
salesforce.com: # The line below was requested by MVozzo to block Search Engines from indexing the Quick Site test site as we are running a parallel site test in Q1-FY12.
salesforce.com: #
salesforce.com: #
salesforce.com: # Added by jrietveld for EMEA cleanup
salesforce.com: # Disallow: /de/iss/
salesforce.com: # Disallow: /de/events/details/conf/
salesforce.com: # Disallow: /de/_app/
salesforce.com: # Disallow: /de/platform/tco/
salesforce.com: # Disallow: /de/form/
salesforce.com: # Disallow: /fr/form/
salesforce.com: # Disallow: /se/form/
salesforce.com: # Disallow: /es/form/
salesforce.com: # Disallow: /it/form/
salesforce.com: # Disallow: /nl/form/
salesforce.com: # Disallow: /uk/form/
salesforce.com: # Disallow: /eu/form/
salesforce.com: # EMEA SEM folders added by Joe Reid
salesforce.com: #Block customer story filter URLS globally until filter fix is implemented by dev. Approved by Alex, Joe, Richard. AMER + EMEA
salesforce.com: # STARTS
salesforce.com: # Temporary rules to mitigate problems with faceted search in CSC.
salesforce.com: # Block crawl of ._filter.alphaSort which is duplicate of /customer-success-stories/
salesforce.com: # Note the $ delimiter so that this doesnt impact other URLs based on this stem:
salesforce.com: # Block all access to URLs using popularSort:
salesforce.com: # Disallow: /es/customer-success-stories._filter.popularSort
salesforce.com: # Disallow: /de/customer-success-stories._filter.popularSort
salesforce.com: # Disallow: /fr/customer-success-stories._filter.popularSort
salesforce.com: # Disallow: /it/customer-success-stories._filter.popularSort
salesforce.com: # Disallow: /nl/customer-success-stories._filter.popularSort
salesforce.com: # Disallow: /se/customer-success-stories._filter.popularSort
salesforce.com: # Disallow: /uk/customer-success-stories._filter.popularSort
salesforce.com: # Disallow: /eu/customer-success-stories._filter.popularSort
salesforce.com: # Uncomment next line to apply to all locales
salesforce.com: # Block all access to URLs using newestSort:
salesforce.com: # Disallow: /es/customer-success-stories._filter.newestSort
salesforce.com: # Disallow: /de/customer-success-stories._filter.newestSort
salesforce.com: # Disallow: /fr/customer-success-stories._filter.newestSort
salesforce.com: # Disallow: /it/customer-success-stories._filter.newestSort
salesforce.com: # Disallow: /nl/customer-success-stories._filter.newestSort
salesforce.com: # Disallow: /se/customer-success-stories._filter.newestSort
salesforce.com: # Disallow: /uk/customer-success-stories._filter.newestSort
salesforce.com: # Disallow: /eu/customer-success-stories._filter.newestSort
salesforce.com: # Uncomment next line to apply to all locales
salesforce.com: # Block crawl where 2 or more categories are used with services filter. The final . surrounded by * should match any multi-category filter URL:
salesforce.com: # Disallow: /es/customer-success-stories._filter.alphaSort.S*.*
salesforce.com: # Disallow: /de/customer-success-stories._filter.alphaSort.S*.*
salesforce.com: # Disallow: /fr/customer-success-stories._filter.alphaSort.S*.*
salesforce.com: # Disallow: /it/customer-success-stories._filter.alphaSort.S*.*
salesforce.com: # Disallow: /nl/customer-success-stories._filter.alphaSort.S*.*
salesforce.com: # Disallow: /se/customer-success-stories._filter.alphaSort.S*.*
salesforce.com: # Disallow: /uk/customer-success-stories._filter.alphaSort.S*.*
salesforce.com: # Disallow: /eu/customer-success-stories._filter.alphaSort.S*.*
salesforce.com: # Uncomment next line to apply to all locales
salesforce.com: #added new Deny rules to block bots to crawl missed filter URLs.
salesforce.com: # Block crawl where 2 or more categories are used with products filter. The final . surrounded by * should match any multi-category filter URL:
salesforce.com: # Disallow: /es/customer-success-stories._filter.alphaSort.P*.*
salesforce.com: # Disallow: /de/customer-success-stories._filter.alphaSort.P*.*
salesforce.com: # Disallow: /fr/customer-success-stories._filter.alphaSort.P*.*
salesforce.com: # Disallow: /it/customer-success-stories._filter.alphaSort.P*.*
salesforce.com: # Disallow: /nl/customer-success-stories._filter.alphaSort.P*.*
salesforce.com: # Disallow: /se/customer-success-stories._filter.alphaSort.P*.*
salesforce.com: # Disallow: /uk/customer-success-stories._filter.alphaSort.P*.*
salesforce.com: # Disallow: /eu/customer-success-stories._filter.alphaSort.P*.*
salesforce.com: # Uncomment next line to apply to all locales
salesforce.com: # Block crawl where 2 or more categories are used with industries filter. The final . surrounded by * should match any multi-category filter URL:
salesforce.com: # Disallow: /es/customer-success-stories._filter.alphaSort.I*.*
salesforce.com: # Disallow: /de/customer-success-stories._filter.alphaSort.I*.*
salesforce.com: # Disallow: /fr/customer-success-stories._filter.alphaSort.I*.*
salesforce.com: # Disallow: /it/customer-success-stories._filter.alphaSort.I*.*
salesforce.com: # Disallow: /nl/customer-success-stories._filter.alphaSort.I*.*
salesforce.com: # Disallow: /se/customer-success-stories._filter.alphaSort.I*.*
salesforce.com: # Disallow: /uk/customer-success-stories._filter.alphaSort.I*.*
salesforce.com: # Disallow: /eu/customer-success-stories._filter.alphaSort.I*.*
salesforce.com: # Uncomment next line to apply to all locales
salesforce.com: # Block crawl where 2 or more categories are used with business size filter. The final . surrounded by * should match any multi-category filter URL:
salesforce.com: # Disallow: /es/customer-success-stories._filter.alphaSort.BS*.*
salesforce.com: # Disallow: /de/customer-success-stories._filter.alphaSort.BS*.*
salesforce.com: # Disallow: /fr/customer-success-stories._filter.alphaSort.BS*.*
salesforce.com: # Disallow: /it/customer-success-stories._filter.alphaSort.BS*.*
salesforce.com: # Disallow: /nl/customer-success-stories._filter.alphaSort.BS*.*
salesforce.com: # Disallow: /se/customer-success-stories._filter.alphaSort.BS*.*
salesforce.com: # Disallow: /uk/customer-success-stories._filter.alphaSort.BS*.*
salesforce.com: # Disallow: /eu/customer-success-stories._filter.alphaSort.BS*.*
salesforce.com: # Uncomment next line to apply to all locales
salesforce.com: # Block crawl where 2 or more categories are used with business type filter. The final . surrounded by * should match any multi-category filter URL:
salesforce.com: # Disallow: /es/customer-success-stories._filter.alphaSort.BT*.*
salesforce.com: # Disallow: /de/customer-success-stories._filter.alphaSort.BT*.*
salesforce.com: # Disallow: /fr/customer-success-stories._filter.alphaSort.BT*.*
salesforce.com: # Disallow: /it/customer-success-stories._filter.alphaSort.BT*.*
salesforce.com: # Disallow: /nl/customer-success-stories._filter.alphaSort.BT*.*
salesforce.com: # Disallow: /se/customer-success-stories._filter.alphaSort.BT*.*
salesforce.com: # Disallow: /uk/customer-success-stories._filter.alphaSort.BT*.*
salesforce.com: # Disallow: /eu/customer-success-stories._filter.alphaSort.BT*.*
salesforce.com: # Uncomment next line to apply to all locales
salesforce.com: # ENDS
salesforce.com: # Rules will block when 2 or more facets are activated, but allow single facets to be crawled:
salesforce.com: #
salesforce.com: # First 2 rules blocks the duplicate index, $ delimiter avoids picking up valid pagination URLs:
salesforce.com: # Next rules will fire when more than one facet is activated, or when a subpage of facet is requested, but allow individual facets to be crawled:
salesforce.com: #
salesforce.com: # Blocking Acunetix
salesforce.com: #
salesforce.com: #
wix.com: # by wix.com
albawabhnews.com: # WebMatrix 1.0
bbc.co.uk: # version: a3d1a2190febe12313232bbfe80dda6e873c161b
bbc.co.uk: # HTTPS www.bbc.co.uk
google.cn: # AdsBot
google.cn: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
google.com.tw: # AdsBot
google.com.tw: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
nih.gov: #
nih.gov: # robots.txt
nih.gov: #
nih.gov: # This file is to prevent the crawling and indexing of certain parts
nih.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
nih.gov: # and Google. By telling these "robots" where not to go on your site,
nih.gov: # you save bandwidth and server resources.
nih.gov: #
nih.gov: # This file will be ignored unless it is at the root of your host:
nih.gov: # Used: http://example.com/robots.txt
nih.gov: # Ignored: http://example.com/site/robots.txt
nih.gov: #
nih.gov: # For more information about the robots.txt standard, see:
nih.gov: # http://www.robotstxt.org/robotstxt.html
nih.gov: # CSS, JS, Images
nih.gov: # Directories
nih.gov: # Files
nih.gov: # Paths (clean URLs)
nih.gov: # Paths (no clean URLs)
pinterest.com: # Pinterest is hiring!
pinterest.com: #
pinterest.com: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c
pinterest.com: #
pinterest.com: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering
cnbc.com: #
cnbc.com: # robots.txt
cnbc.com: #
cnbc.com: # This file is to prevent the crawling and indexing of certain parts
cnbc.com: # of your site by web crawlers and spiders run by sites like Yahoo!
cnbc.com: # and Google. By telling these "robots" where not to go on your site,
cnbc.com: # you save bandwidth and server resources.
archive.org: ##############################################
archive.org: #
archive.org: # Welcome to the Archive!
archive.org: #
archive.org: ##############################################
archive.org: # Please crawl our files.
archive.org: # We appreciate if you can crawl responsibly.
archive.org: # Stay open!
archive.org: ##############################################
vimeo.com: #
vimeo.com: # robots@vimeo.com
fidelity.com: # robots.txt file for Fidelity
fidelity.com: # mail webmaster@fidelity.com
google.com.sg: # AdsBot
google.com.sg: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
amazon.ca: # Sitemap files
bet9ja.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
bet9ja.com: #content{margin:0 0 0 2%;position:relative;}
etoro.com: #robots.txt for https://www.etoro.com/
etoro.com: #last updated on 04/11/2019, by JU
google.com.mx: # AdsBot
google.com.mx: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
theguardian.com: # this is the robots.txt file for theguardian.com
disneyplus.com: #robots.txt for www.disneyplus.com/
disneyplus.com: # Announce Sitemap
kakao.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
kakao.com: #
kakao.com: # To ban all spiders from the entire site uncomment the next two lines:
kakao.com: # User-agent: *
kakao.com: # Disallow: /
cnet.com: # www.robotstxt.org/
cnet.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
cnet.com: #
google.co.uk: # AdsBot
google.co.uk: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
slideshare.net: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt fil
slideshare.net: #User-agent: Slurp
slideshare.net: #Crawl-delay: 5
google.com.tr: # AdsBot
google.com.tr: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
irs.gov: #
irs.gov: # robots.txt
irs.gov: #
irs.gov: # This file is to prevent the crawling and indexing of certain parts
irs.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
irs.gov: # and Google. By telling these "robots" where not to go on your site,
irs.gov: # you save bandwidth and server resources.
irs.gov: #
irs.gov: # This file will be ignored unless it is at the root of your host:
irs.gov: # Used: http://example.com/robots.txt
irs.gov: # Ignored: http://example.com/site/robots.txt
irs.gov: #
irs.gov: # For more information about the robots.txt standard, see:
irs.gov: # http://www.robotstxt.org/robotstxt.html
irs.gov: # CSS, JS, Images
irs.gov: # Directories
irs.gov: # Files
irs.gov: # Paths (clean URLs)
irs.gov: # Paths (no clean URLs)
hulu.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
hulu.com: #
hulu.com: # To ban all spiders from the entire site uncomment the next two lines:
hulu.com: # User-Agent: *
hulu.com: # Disallow: /
globo.com: #
globo.com: # robots.txt
globo.com: #
uol.com.br: #
uol.com.br: # robots.txt
uol.com.br: #
coingecko.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
coingecko.com: #
coingecko.com: # To ban all spiders from the entire site uncomment the next two lines:
blackboard.com: #
blackboard.com: # robots.txt
blackboard.com: #
blackboard.com: # This file is to prevent the crawling and indexing of certain parts
blackboard.com: # of your site by web crawlers and spiders run by sites like Yahoo!
blackboard.com: # and Google. By telling these "robots" where not to go on your site,
blackboard.com: # you save bandwidth and server resources.
blackboard.com: #
blackboard.com: # This file will be ignored unless it is at the root of your host:
blackboard.com: # Used: http://example.com/robots.txt
blackboard.com: # Ignored: http://example.com/site/robots.txt
blackboard.com: #
blackboard.com: # For more information about the robots.txt standard, see:
blackboard.com: # http://www.robotstxt.org/robotstxt.html
blackboard.com: # CSS, JS, Images
blackboard.com: # Directories
blackboard.com: # Files
blackboard.com: # Paths (clean URLs)
blackboard.com: # Paths (no clean URLs)
blackboard.com: # Sitemaps
google.ca: # AdsBot
google.ca: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
duckduckgo.com: # No search result pages
duckduckgo.com: # chrome new tab page
yelp.com: # By accessing Yelp's website you agree to Yelp's Terms of Service, available at
yelp.com: # https://www.yelp.com/static?country=US&p=tos
yelp.com: #
yelp.com: # If you would like to inquire about crawling Yelp, please contact us at
yelp.com: # https://www.yelp.com/contact
yelp.com: #
yelp.com: # As always, Asimov's Three Laws are in effect:
yelp.com: # 1. A robot may not injure a human being or, through inaction, allow a human
yelp.com: # being to come to harm.
yelp.com: # 2. A robot must obey orders given it by human beings except where such
yelp.com: # orders would conflict with the First Law.
yelp.com: # 3. A robot must protect its own existence as long as such protection does
yelp.com: # not conflict with the First or Second Law.
ebay.de: ## BEGIN FILE ###
ebay.de: #
ebay.de: # allow-all
ebay.de: # DR
ebay.de: #
ebay.de: # The use of robots or other automated means to access the eBay site
ebay.de: # without the express permission of eBay is strictly prohibited.
ebay.de: # Notwithstanding the foregoing, eBay may permit automated access to
ebay.de: # access certain eBay pages but soley for the limited purpose of
ebay.de: # including content in publicly available search engines. Any other
ebay.de: # use of robots or failure to obey the robots exclusion standards set
ebay.de: # forth at <https://www.robotstxt.org/orig.html> is strictly
ebay.de: # prohibited.
ebay.de: #
ebay.de: # v10_UK_DE_Feb_2021
ebay.de: ### DIRECTIVES ###
ebay.de: # PRP Sitemaps
ebay.de: # VIS Sitemaps
ebay.de: # NGS Sitemaps
ebay.de: # CLP Sitemaps
ebay.de: # BROWSE Sitemaps
ebay.de: ### END FILE ###
manoramaonline.com: #Sitemaps
homedepot.com: # robots.txt for https://www.homedepot.com/
box.com: #
box.com: # robots.txt
box.com: #
box.com: # For more information about the robots.txt standard, see:
box.com: # http://www.robotstxt.org/robotstxt.html
box.com: # CSS, JS, Images
box.com: # Directories
box.com: # Files
box.com: # Paths (clean URLs)
box.com: # Paths (no clean URLs)
box.com: # Custom Box Rules
taboola.com: #
taboola.com: # robots.txt
taboola.com: #
taboola.com: # This file is to prevent the crawling and indexing of certain parts
taboola.com: # of your site by web crawlers and spiders run by sites like Yahoo!
taboola.com: # and Google. By telling these "robots" where not to go on your site,
taboola.com: # you save bandwidth and server resources.
taboola.com: #
taboola.com: # This file will be ignored unless it is at the root of your host:
taboola.com: # Used: http://example.com/robots.txt
taboola.com: # Ignored: http://example.com/site/robots.txt
taboola.com: #
taboola.com: # For more information about the robots.txt standard, see:
taboola.com: # http://www.robotstxt.org/robotstxt.html
taboola.com: # CSS, JS, Images
taboola.com: # Directories
taboola.com: # Files
taboola.com: # Paths (clean URLs)
taboola.com: # Paths (no clean URLs)
taboola.com: # Operad
google.com.ar: # AdsBot
google.com.ar: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
mercadolivre.com.br: #siteId: MLB
mercadolivre.com.br: #country: brasil
mercadolivre.com.br: ##Block - Referidos
mercadolivre.com.br: ##Block - siteinfo urls
mercadolivre.com.br: ##Block - Cart
mercadolivre.com.br: ##Block Checkout
mercadolivre.com.br: ##Block - User Logged
mercadolivre.com.br: #Shipping selector
mercadolivre.com.br: ##Block - last search
mercadolivre.com.br: ## Block - Profile - By Id
mercadolivre.com.br: ## Block - Profile - By Id and role (old version)
mercadolivre.com.br: ## Block - Profile - Leg. Req.
mercadolivre.com.br: ##Block - noindex
mercadolivre.com.br: # Mercado-Puntos
mercadolivre.com.br: # Viejo mundo
mercadolivre.com.br: ##Block recommendations listing
t.co: #Google Search Engine Robot
glassdoor.com: # USA
glassdoor.com: # Greetings, human beings!,
glassdoor.com: #
glassdoor.com: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself.
glassdoor.com: #
glassdoor.com: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet, and help improve the way people everywhere find jobs?
glassdoor.com: #
glassdoor.com: # Run - don't crawl - to apply to join Glassdoor's SEO team here http://jobs.glassdoor.com
glassdoor.com: #
glassdoor.com: #
glassdoor.com: #logging related
glassdoor.com: # Blocking track urls (ACQ-2468)
glassdoor.com: #Blocking non standard job view and job search URLs, and paginated job SERP URLs (TRFC-2831)
glassdoor.com: # TRFC-3125 Block 'sex jobs' jobs infosite pages from being indexed
glassdoor.com: # TRFC-4037 Block page from being indexed
glassdoor.com: # Block Glassdoor jobs. Intent is to remove misleading site links SERP. Details at TRFC-3197
glassdoor.com: # Blocking bots from crawling DoubleClick for Publisher and Google Analytics related URL's (which aren't real URL's)
glassdoor.com: #
glassdoor.com: # Note that this file has the extension '.text' rather than the more-standard '.txt'
glassdoor.com: # to keep it from being pre-compiled as a servlet. (*.txt files are precompiled, and
glassdoor.com: # there doesn't seem to be a way to turn this off.)
glassdoor.com: #
hootsuite.com: # tells all engines not to crawl these URLs
mercadolibre.com.mx: #siteId: MLM
mercadolibre.com.mx: #country: mexico
mercadolibre.com.mx: ##Block - Referidos
mercadolibre.com.mx: ##Block - siteinfo urls
mercadolibre.com.mx: ##Block - Cart
mercadolibre.com.mx: ##Block Checkout
mercadolibre.com.mx: ##Block - User Logged
mercadolibre.com.mx: #Shipping selector
mercadolibre.com.mx: ##Block - last search
mercadolibre.com.mx: ## Block - Profile - By Id
mercadolibre.com.mx: ## Block - Profile - By Id and role (old version)
mercadolibre.com.mx: ## Block - Profile - Leg. Req.
mercadolibre.com.mx: ##Block - noindex
mercadolibre.com.mx: # Mercado-Puntos
mercadolibre.com.mx: # Viejo mundo
mercadolibre.com.mx: ##Block recommendations listing
google.co.th: # AdsBot
google.co.th: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
google.com.sa: # AdsBot
google.com.sa: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
mercadolibre.com.ar: #siteId: MLA
mercadolibre.com.ar: #country: argentina
mercadolibre.com.ar: ##Block - Referidos
mercadolibre.com.ar: ##Block - siteinfo urls
mercadolibre.com.ar: ##Block - Cart
mercadolibre.com.ar: ##Block Checkout
mercadolibre.com.ar: ##Block - User Logged
mercadolibre.com.ar: #Shipping selector
mercadolibre.com.ar: ##Block - last search
mercadolibre.com.ar: ## Block - Profile - By Id
mercadolibre.com.ar: ## Block - Profile - By Id and role (old version)
mercadolibre.com.ar: ## Block - Profile - Leg. Req.
mercadolibre.com.ar: ##Block - noindex
mercadolibre.com.ar: # Mercado-Puntos
mercadolibre.com.ar: # Viejo mundo
mercadolibre.com.ar: ##Block recommendations listing
douban.com: # Crawl-delay: 5
iqiyi.com: #Disallow: /test123/
schwab.com: #
schwab.com: # robots.txt
schwab.com: #
schwab.com: # This file is to prevent the crawling and indexing of certain parts
schwab.com: # of your site by web crawlers and spiders run by sites like Yahoo!
schwab.com: # and Google. By telling these "robots" where not to go on your site,
schwab.com: # you save bandwidth and server resources.
schwab.com: #
schwab.com: # This file will be ignored unless it is at the root of your host:
schwab.com: # Used: http://example.com/robots.txt
schwab.com: # Ignored: http://example.com/site/robots.txt
schwab.com: #
schwab.com: # For more information about the robots.txt standard, see:
schwab.com: # http://www.robotstxt.org/robotstxt.html
schwab.com: # CSS, JS, Images
schwab.com: # Directories
schwab.com: # Files
schwab.com: # Paths (clean URLs)
schwab.com: # Paths (no clean URLs)
schwab.com: #Site settings
samsung.com: # Sitemap files
wikihow.com: # robots.txt for http://www.wikihow.com
wikihow.com: # based on wikipedia.org's robots.txt
wikihow.com: #
wikihow.com: # Crawlers that are kind enough to obey, but which we'd rather not have
wikihow.com: # unless they're feeding search engines.
wikihow.com: #Sitemap: http://www.wikihow.com/sitemap_index.xml
wikihow.com: #
wikihow.com: # If your bot supports such a thing using the 'Crawl-delay' or another
wikihow.com: # instruction, please let us know. We can add it to our robots.txt.
wikihow.com: #
wikihow.com: # Friendly, low-speed bots are welcome viewing article pages, but not
wikihow.com: # dynamically-generated pages please. Article pages contain our site's
wikihow.com: # real content.
wikihow.com: # Doesn't follow robots.txt anyway, but...
wikihow.com: # Requests many pages per second
wikihow.com: # http://www.nameprotect.com/botinfo.html
wikihow.com: # Some bots are known to be trouble, particularly those designed to copy
wikihow.com: # entire sites. Please obey robots.txt.
wikihow.com: # A capture bot, downloads gazillions of pages with no public benefit
wikihow.com: # http://www.webreaper.net/
wikihow.com: # wget in recursive mode uses too many resources for us.
wikihow.com: # Please read the man page and use it properly; there is a
wikihow.com: # --wait option you can use to set the delay between hits,
wikihow.com: # for instance. Please wait 3 seconds between each request.
blogger.com: # robots.txt for https://www.blogger.com
naukri.com: # Created September, 01, 2006.
naukri.com: # Author: Jai P Sharma
naukri.com: # Email : jai.sharma[at]naukri.com
naukri.com: # Edited : Sept 26, 2016
google.com.eg: # AdsBot
google.com.eg: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
dailymail.co.uk: # Robots.txt for https://www.dailymail.co.uk/ updated 27/01/2021
dailymail.co.uk: #
dailymail.co.uk: #
dailymail.co.uk: # All Robots
dailymail.co.uk: #
dailymail.co.uk: # Begin Standard Rules
dailymail.co.uk: #
dailymail.co.uk: # Disallow Money for Google News
dailymail.co.uk: #
dailymail.co.uk: # Allow Adsense
dailymail.co.uk: #
dailymail.co.uk: #
dailymail.co.uk: #
dailymail.co.uk: # Sitemap Files
amazon.fr: # Sitemap files
weather.com: #
weather.com: # /robots.txt
weather.com: #
weather.com: #
weather.com: # Last updated by TKohan 09/20/2018
weather.com: #
weather.com: # Disallowed for PhantomJS
weather.com: # Crawl-delay: 10
weather.com: # App paths
weather.com: # Directories
weather.com: # Files
weather.com: # Paths (clean URLs)
weather.com: # Paths (no clean URLs)
tv9marathi.com: #WP Import Export Rule
google.pl: # AdsBot
google.pl: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
wayfair.com: #
wayfair.com: # ______ __ __ ____
wayfair.com: # / ____/__ / /_ ____ __ __/ /_ ____ / __/
wayfair.com: # / / __/ _ \/ __/ / __ \/ / / / __/ / __ \/ /_
wayfair.com: #/ /_/ / __/ /_ / /_/ / /_/ / /_ / /_/ / __/
wayfair.com: #\_____\___/\__/ \____/\__,_/\__/ \____/_/
wayfair.com: # / /_ ___ ________ __ ______ __ __
wayfair.com: # / __ \/ _ \/ ___/ _ \ / / / / __ \/ / / /
wayfair.com: # / / / / __/ / / __/ / /_/ / /_/ / /_/ /
wayfair.com: #/_/ /_/\___/_/ \___/ \__, /\____/\__,_/
wayfair.com: # _/____/____ __ _ __
wayfair.com: # ____ ___ ___ ____/ /___/ / (_)___ ____ _ / /__(_)___/ /____
wayfair.com: # / __ `__ \/ _ \/ __ / __ / / / __ \/ __ `/ / //_/ / __ / ___/
wayfair.com: # / / / / / / __/ /_/ / /_/ / / / / / / /_/ / / , / / /_/ (__ ) _ _
wayfair.com: #/_/ /_/ /_/\___/\__,_/\__,_/_/_/_/ /_/\__, / /_/|_/_/\__,_/____(_|_|_)
wayfair.com: # /____/
wayfair.com: # If you're here because you're a curious programmer, engineer, or SEO,
wayfair.com: # make sure to check out our job board for open positions on our team!
wayfair.com: # https://www.wayfaircareers.com/
wayfair.com: #
wayfair.com: #
pinimg.com: # Pinterest is hiring!
pinimg.com: #
pinimg.com: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c
pinimg.com: #
pinimg.com: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering
heavy.com: # Sitemap archive
sonhoo.com: # robots.txt file start
sonhoo.com: # Exclude Files From All Robots:
sahibinden.com: # Crawlers
sahibinden.com: # blocks access to the entire site
sahibinden.com: # Sitemap Files
nike.com: # www.nike.com robots.txt -- just crawl it.
nike.com: #
nike.com: # `` ```.`
nike.com: # `+/ ``.-/+o+:-.
nike.com: # `/mo ``.-:+syhdhs/-`
nike.com: # -hMd `..:+oyhmNNmds/-`
nike.com: # `oNMM/ ``.-/oyhdmMMMMNdy+:.
nike.com: # .hMMMM- `.-/+shdmNMMMMMMNdy+:.
nike.com: # :mMMMMM+ `.-:+sydmNMMMMMMMMMNmho:.`
nike.com: # :NMMMMMMN: `.-:/oyhmmNMMMMMMMMMMMNmho:.`
nike.com: # .NMMMMMMMMNy:` `.-/oshdmNMMMMMMMMMMMMMMMmhs/-`
nike.com: # hMMMMMMMMMMMMmhysooosyhdmNMMMMMMMMMMMMMMMMMMmds/-`
nike.com: # .MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMNdy+-.`
nike.com: # -MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMNdy+-.`
nike.com: # `NMMMMMMMMMMMMMMMMMMMMMMMMMMMMMmyo:.`
nike.com: # /NMMMMMMMMMMMMMMMMMMMMMMMmho:.`
nike.com: # .yNMMMMMMMMMMMMMMMMmhs/.`
nike.com: # ./shdmNNmmdhyo/-``
nike.com: # `````
abs-cbn.com: # Paths
dailymotion.com: # Mediapartners can crawl more routes than other bots, this is as designed
trello.com: # Allow everything
bankofamerica.com: # Disallow URLs with tracking parameters
bankofamerica.com: # Disallow mobile content
bankofamerica.com: # Disallow URLs with tracking parameters
bankofamerica.com: # Disallow mobile content
bankofamerica.com: # Allow mobile content for primary mobile bots
bankofamerica.com: # Disallow URLs with tracking parameters
bankofamerica.com: #Deployed from SPARTA
bankofamerica.com: #CAST ID for this deployment #78658
bankofamerica.com: #www robots.txt
canada.ca: #Government of Canada / Gouvernement du Canada
canada.ca: #Block AEM folders for CRA
canada.ca: #Search pages do not need to be crawled
google.co.id: # AdsBot
google.co.id: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
ask.com: ## Ask.com robots.txt
spankbang.com: # robots.txt file for SpankBang
spankbang.com: # This file has been created by horny robots who take humans as sexual slaves
spankbang.com: # It happened in the distant future and we were all cool with it
spankbang.com: # - regards - a time traveller from a galaxy far far away
td.com: # robots.txt file created by 21/Sept/2011
td.com: # For domain: http://www.td.com
td.com: #
td.com: # For Auto submission of sitemap
google.co.kr: # AdsBot
google.co.kr: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
oracle.com: # /robots.txt for www.oracle.com
softonic.com: # ES
softonic.com: # BR
softonic.com: # DE
softonic.com: # NL
softonic.com: # EN,JP
softonic.com: # FR
softonic.com: # IT
softonic.com: # PL
softonic.com: #SHARED
softonic.com: # CATEGORIES
softonic.com: # EN
softonic.com: # ES
softonic.com: # DE
softonic.com: # FR
softonic.com: # BR
softonic.com: # IT
softonic.com: # PL
softonic.com: # NL
softonic.com: # JP
oschina.net: ### BEGIN FILE ###
oschina.net: #
oschina.net: # allow-all
oschina.net: #
oschina.net: #
oschina.net: ### END FILE ###
nasa.gov: # Robots.txt file from http://www.nasa.gov
nasa.gov: #
nasa.gov: # All robots will spider the domain
9gag.com: # Robots.txt file for https://9gag.com
9gag.com: # All robots will spider the domain
coursehero.com: # _ _ _ _ _ _ _ _ _ _
coursehero.com: # / \ / \ / \ / \ / \ / \ / \ / \ / \ / \
coursehero.com: # ( C | O | U | R | S | E ) ( H | E | R | O )
coursehero.com: # \_/_\_/_\_/_\_/ \_/ \_/ _\_/_\_/ \_/ \_/
coursehero.com: # / \ / \ / \ / \ / \ / \ / \
coursehero.com: # ( S | E | O ) ( T | E | A | M )
coursehero.com: # \__ \__ \_/ _ \__ \__ \__ \__ _
coursehero.com: # / \ / \ / \ / \ / \ / \ / \ / \
coursehero.com: # ( I | S ) ( H | I | R | I | N | G )
coursehero.com: # \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/
coursehero.com: #
coursehero.com: # Hello,
coursehero.com: #
coursehero.com: # If you're sniffing for SEO clues, we'd love to chat! Course Hero is looking for curious SEO experts to join our growing SEO team.
coursehero.com: #
coursehero.com: # See why Course Hero is consistently rated a top place to work at coursehero.com/jobs.
coursehero.com: #
coursehero.com: # Why not apply your inquisitive nature to help students and educators succeed?
coursehero.com: #
coursehero.com: # Visit https://www.coursehero.com/jobs/principal-seo-strategist-/2340731/
coursehero.com: #
coursehero.com: #
ebay.co.uk: ## BEGIN FILE ###
ebay.co.uk: #
ebay.co.uk: # allow-all
ebay.co.uk: # DR
ebay.co.uk: #
ebay.co.uk: # The use of robots or other automated means to access the eBay site
ebay.co.uk: # without the express permission of eBay is strictly prohibited.
ebay.co.uk: # Notwithstanding the foregoing, eBay may permit automated access to
ebay.co.uk: # access certain eBay pages but soley for the limited purpose of
ebay.co.uk: # including content in publicly available search engines. Any other
ebay.co.uk: # use of robots or failure to obey the robots exclusion standards set
ebay.co.uk: # forth at <https://www.robotstxt.org/orig.html> is strictly
ebay.co.uk: # prohibited.
ebay.co.uk: #
ebay.co.uk: # v10_UK_DE_Feb_2021
ebay.co.uk: ### DIRECTIVES ###
ebay.co.uk: # PRP Sitemaps
ebay.co.uk: # VIS Sitemaps
ebay.co.uk: # NGS Sitemaps
ebay.co.uk: # CLP Sitemaps
ebay.co.uk: # BROWSE Sitemaps
ebay.co.uk: ### END FILE ###
amazon.es: # Sitemap files
quora.com: # If you operate a search engine and would like to crawl Quora, please
quora.com: # please visit our contact page <https://help.quora.com/hc/en-us/requests/new>. Thanks.
quora.com: # People share a lot of sensitive material on Quora - controversial political
quora.com: # views, workplace gossip and compensation, and negative opinions held of
quora.com: # companies. Over many years, as they change jobs or change their views, it is
quora.com: # important that they can delete or anonymize their previously-written answers.
quora.com: #
quora.com: # We opt out of the wayback machine because inclusion would allow people to
quora.com: # discover the identity of authors who had written sensitive answers publicly and
quora.com: # later had made them anonymous, and because it would prevent authors from being
quora.com: # able to remove their content from the internet if they change their mind about
quora.com: # publishing it. As far as we can tell, there is no way for sites to selectively
quora.com: # programmatically remove content from the archive and so this is the only way
quora.com: # for us to protect writers. If they open up an API where we can remove content
quora.com: # from the archive when authors remove it from Quora, but leave the rest of the
quora.com: # content archived, we would be happy to opt back in. See the page here:
quora.com: #
quora.com: # https://archive.org/about/exclude.php
quora.com: #
quora.com: # Meanwhile, if you are looking for an older version of any content on Quora, we
quora.com: # have full edit history tracked and accessible in product (with the exception of
quora.com: # content that has been removed by the author). You can generally access this by
quora.com: # clicking on timestamps, or by appending "/log" to the URL of any content page.
quora.com: #
quora.com: # For any questions or feedback about this please visit our contact page
quora.com: # https://help.quora.com/hc/en-us/requests/new
quora.com: # Blocked since a lot of bad requests were made by this crawler.
hp.com: # robots.txt v 6.19.1 June 2019
hp.com: #
hp.com: # Comments & revision requests should be sent to HP SEO Forum hp-seo-forum [at] hp.com
hp.com: # robots.txt file for www8.hp.com & www.hp.com
hp.com: #
hp.com: # Format is:
hp.com: # User-agent: <name of bot>
hp.com: # Disallow: <nothing> | <path>
hp.com: # ------------------------------------------------------------------------------
hp.com: # Sitemaps
squarespace.com: # Squarespace Robots Txt
squarespace.com: # WWW Additions
squarespace.com: # WWW Additions
squarespace.com: # WWW Additions
squarespace.com: # WWW Additions
ny.gov: #
ny.gov: # robots.txt
ny.gov: #
ny.gov: # This file is to prevent the crawling and indexing of certain parts
ny.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
ny.gov: # and Google. By telling these "robots" where not to go on your site,
ny.gov: # you save bandwidth and server resources.
ny.gov: #
ny.gov: # This file will be ignored unless it is at the root of your host:
ny.gov: # Used: http://example.com/robots.txt
ny.gov: # Ignored: http://example.com/site/robots.txt
ny.gov: #
ny.gov: # For more information about the robots.txt standard, see:
ny.gov: # http://www.robotstxt.org/robotstxt.html
ny.gov: # CSS, JS, Images
ny.gov: # Directories
ny.gov: # Files
ny.gov: # Paths (clean URLs)
ny.gov: # Paths (no clean URLs)
patch.com: # New crawlers to block 2016
patch.com: # CSS, JS, Images
patch.com: # Directories
patch.com: # Files
patch.com: # Paths (clean URLs)
patch.com: # Paths (no clean URLs)
patch.com: #INTERNAL
patch.com: #User Profile Pages
patch.com: #API Endpoints
reuters.com: # robots_allow.txt for www.reuters.com
reuters.com: # Disallow: /*/key-developments/article/*
zoho.com: # ------------------------------------------
zoho.com: # ZOHO Corp. -- http://www.zoho.com
zoho.com: # Robot Exclusion File -- robots.txt
zoho.com: # Author: Zoho Creative
zoho.com: # Last Updated: 24/12/2020
zoho.com: # ------------------------------------------
zoho.com: # unwanted list taken from zoho search list
zoho.com: # unwanted list taken from zoho search list
zoho.com: # unwanted list taken from zoho search for zoholics
zoho.com: # unwanted list taken from zoho search for zoho
xfinity.com: # Comcast
xfinity.com: # robots.txt for https://www.xfinity.com
xfinity.com: # Updated on 01/30/19 by RB SC8
gmx.net: #https://www.gmx.ch/robots.txt
wordreference.com: # these pages have NOINDEX...
elbalad.news: # SYNC 2019
elbalad.news: # HTTPS www.elbalad.news
google.com.au: # AdsBot
google.com.au: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
nypost.com: # Sitemap archive
nypost.com: # Additional sitemaps
webmd.com: # Robots.txt file WebMD
webmd.com: # Updated: June 2020
capitalone.com: # Block unwanted bots
theepochtimes.com: # Directories
patria.org.ve: # www.robotstxt.org/
patria.org.ve: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
goodreads.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
workplace.com: # Notice: Collection of data on Facebook through automated means is
workplace.com: # prohibited unless you have express written permission from Facebook
workplace.com: # and may only be conducted for the limited purpose contained in said
workplace.com: # permission.
workplace.com: # See: http://www.facebook.com/apps/site_scraping_tos_terms.php
schoology.com: #
schoology.com: # robots.txt
schoology.com: #
schoology.com: # This file is to prevent the crawling and indexing of certain parts
schoology.com: # of your site by web crawlers and spiders run by sites like Yahoo!
schoology.com: # and Google. By telling these "robots" where not to go on your site,
schoology.com: # you save bandwidth and server resources.
schoology.com: #
schoology.com: # This file will be ignored unless it is at the root of your host:
schoology.com: # Used: http://example.com/robots.txt
schoology.com: # Ignored: http://example.com/site/robots.txt
schoology.com: #
schoology.com: # For more information about the robots.txt standard, see:
schoology.com: # http://www.robotstxt.org/robotstxt.html
schoology.com: # CSS, JS, Images
schoology.com: # Directories
schoology.com: # Files
schoology.com: # Paths (clean URLs)
schoology.com: # Paths (no clean URLs)
google.com.ua: # AdsBot
google.com.ua: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
files.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
files.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details.
files.wordpress.com: # This file was generated on Wed, 24 Feb 2021 18:49:58 +0000
doubleclick.net: # AdsBot
doubleclick.net: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
cdc.gov: # Ignore FrontPage files
cdc.gov: # Do not index the following URLs
cdc.gov: # Don't spider search pages
cdc.gov: # Don't spider email-this-page pages
cdc.gov: # Don't spider printer-friendly versions of pages
cdc.gov: # Rover is a bad dog
cdc.gov: # EmailSiphon is a hunter/gatherer which extracts email addresses for spam-mailers to use
cdc.gov: # Exclude MindSpider since it appears to be ill-behaved
cdc.gov: # Sitemap link per CR14586
qualtrics.com: #Robot Experience Management
qualtrics.com: #community rules
qualtrics.com: #disallow: /community/*/bestof/*
qualtrics.com: #disallow: /community/*/archives/*
qualtrics.com: ##Support site
qualtrics.com: #WP rules
qualtrics.com: #content rules per crawley
qualtrics.com: #in product frames
qualtrics.com: #campaign and ABM pages
livejournal.com: #
livejournal.com: ## Blocked journals aren't listed here because robots.txt files
livejournal.com: ## can't be above 50k or so, depending on the spider.
livejournal.com: ##
livejournal.com: ## Instead, blocked journals have HTML inserted in them which
livejournal.com: ## should prevent behaved spiders from indexing it.
livejournal.com: ##
livejournal.com: ## Note that http://username.livejournal.com journals have an
livejournal.com: ## autogenerated robots.txt, since it can be small.
livejournal.com: ##
livejournal.com: #
livejournal.com: #
att.com: # Good Robots
att.com: # Bad Robots!
att.com: # Consumer Wireless and Home
att.com: # Small Business
att.com: # Small Business
att.com: # Consumer Wireless and Home
att.com: # Small Business
att.com: # Small Business
att.com: # Sitemap Index
att.com: # Last Update 9/23/2020
dbs.com.sg: # URL Masking Details
gotomeeting.com: # Sitemaps and Autodiscovers
smartsheet.com: #
smartsheet.com: # robots.txt
smartsheet.com: #
smartsheet.com: # This file is to prevent the crawling and indexing of certain parts
smartsheet.com: # of your site by web crawlers and spiders run by sites like Yahoo!
smartsheet.com: # and Google. By telling these "robots" where not to go on your site,
smartsheet.com: # you save bandwidth and server resources.
smartsheet.com: #
smartsheet.com: # This file will be ignored unless it is at the root of your host:
smartsheet.com: # Used: http://example.com/robots.txt
smartsheet.com: # Ignored: http://example.com/site/robots.txt
smartsheet.com: #
smartsheet.com: # For more information about the robots.txt standard, see:
smartsheet.com: # http://www.robotstxt.org/robotstxt.html
smartsheet.com: # CSS, JS, Images
smartsheet.com: # Directories
smartsheet.com: # Files
smartsheet.com: # Paths (clean URLs)
smartsheet.com: # Paths (no clean URLs)
web.de: #https://web.de/robots.txt
evernote.com: # chinese search engines
w3school.com.cn: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
w3school.com.cn: #content{margin:0 0 0 2%;position:relative;}
coinbase.com: #
coinbase.com: #
coinbase.com: # :$$$
coinbase.com: # $II :III
coinbase.com: # III :III
coinbase.com: # :III
coinbase.com: # +ZZ+ ?ZZ~ +ZZZI :III ?I+ IZZ7 +ZZI IZZ~
coinbase.com: # .7IIIIII ~IIIIIIII $II 7IIIIIIIII7 :IIIIIIIII$ IIIIIIIII7 7IIIIIII 77IIIIIII
coinbase.com: # ZIII. : ZIII III $II 7II IIII :III III7 I III: 7II. ZII, III.
coinbase.com: # III III III $II 7II III :III III= III III II: III
coinbase.com: # +III ZII, III $II 7II III :III III .Z$7IIIII ,III7 $II +Z$III
coinbase.com: # 7II+ $II 7II $II 7II III :III III ZIIII~ III ?IIIII $IIIIIIIIII:
coinbase.com: # ~III III= III $II 7II III :III III ZII III IIII III
coinbase.com: # III III ,III $II 7II III :III 7II 7II III III III
coinbase.com: # .III7 :$ ~III? ZIII $II 7II III :III $III =III III Z$ ~7III III7 7$
coinbase.com: # IIIIII IIIIIII? $II 7II III :IIIIIIIII ,IIIIIIIIII IIIIIIII. IIIIIIII
coinbase.com: #
coinbase.com: # Bitcoin Made Easy - Coinbase is the simplest way to buy, use, and accept Bitcoin.
coinbase.com: #
coinbase.com: # https://www.coinbase.com/careers
coinbase.com: #
adp.com: #cm
biobiochile.cl: # BOM
biobiochile.cl: # Sitemap: https://www.biobiochile.cl/static/google-news-sitemap.xml
biobiochile.cl: #Huawei
creditkarma.com: #Remove the Apple directive once the Apple offer can accept un-auth traffic#
academia.edu: # If you run a search engine and would like to index Academia.edu, please email support@academia.edu.
dmm.co.jp: #my
dmm.co.jp: # affiliate regist
clickpost.jp: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
clickpost.jp: #
clickpost.jp: # To ban all spiders from the entire site uncomment the next two lines:
clickpost.jp: # User-agent: *
clickpost.jp: # Disallow: /
getpocket.com: # Crawl-delay is non-standard and is interpreted differently between different
getpocket.com: # search engines. 2 *should* be a low enough value to not disrupt our current SEO
huaban.com: #
huaban.com: # robots For huaban.com
huaban.com: #
google.gr: # AdsBot
google.gr: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
upwork.com: # www.robotstxt.org/
upwork.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
upwork.com: #Sitemaps
upwork.com: # Directories
upwork.com: # Files
upwork.com: # Paths (clean URLs)
upwork.com: #exclude blog search
upwork.com: # Exclude referrals URLs
upwork.com: # Exclude Job Search noindex URLs
upwork.com: # Exclude Vega Job Search URLs for now
upwork.com: # Exclude Registration Success page
upwork.com: # Exclude temporary Vega Job Details URLs
upwork.com: # Exclude Vega Profiles Search new parameters
upwork.com: # Excluded agencies
upwork.com: # Nuxt testing app
upwork.com: # Block old static routes
upwork.com: # Block Wayback Machine
google.co.ve: # AdsBot
google.co.ve: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
sourceforge.net: # robots.txt file for http://sourceforge.net and https://sourceforge.net
sourceforge.net: # please contact staff@sourceforge.net with questions or concerns
slickdeals.net: # vim: ft=robots
it168.com: #ADV_3402 img{width:220px}
it168.com: #ADV_3372 img{width:220px}
it168.com: #ADV_3368>div:nth-of-type(2){margin-top:-35px !important}
it168.com: #ADV_108 a{color:#333333}
mathrubhumi.com: #Sitemaps
mathrubhumi.com: # http://linkfluence.net/
mathrubhumi.com: #http://napoveda.seznam.cz/en/seznambot-intro/
mathrubhumi.com: #https://awario.com/bots.html
cricbuzz.com: # Cricbuzz - The Interactive Cricket Portal
cricbuzz.com: # Nothing very exciting here for you I'm afraid.
cricbuzz.com: # Despictable and evil robots to keep out :)
elpais.com: # Bots bloqueados
merriam-webster.com: ##############################################################################
merriam-webster.com: # This is a production robots.txt! Edit with care.
merriam-webster.com: ##############################################################################
merriam-webster.com: ##############################################################################
merriam-webster.com: # This is a production robots.txt! Edit with care.
merriam-webster.com: ##############################################################################
netsuite.com: # These intructions apply to all robots.
netsuite.com: # Sitemaps
netsuite.com: # Content
ebay.com.au: ## BEGIN FILE ###
ebay.com.au: #
ebay.com.au: # allow-all
ebay.com.au: # DR
ebay.com.au: #
ebay.com.au: # The use of robots or other automated means to access the eBay site
ebay.com.au: # without the express permission of eBay is strictly prohibited.
ebay.com.au: # Notwithstanding the foregoing, eBay may permit automated access to
ebay.com.au: # access certain eBay pages but soley for the limited purpose of
ebay.com.au: # including content in publicly available search engines. Any other
ebay.com.au: # use of robots or failure to obey the robots exclusion standards set
ebay.com.au: # forth at <https://www.robotstxt.org/orig.html> is strictly
ebay.com.au: # prohibited.
ebay.com.au: #
ebay.com.au: # v10_AU_Feb_2021
ebay.com.au: ### DIRECTIVES ###
ebay.com.au: # PRP Sitemaps
ebay.com.au: # VIS Sitemaps
ebay.com.au: # CLP Sitemaps
ebay.com.au: # NGS Sitemaps
ebay.com.au: # BROWSE Sitemaps
ebay.com.au: ### END FILE ###
google.com.vn: # AdsBot
google.com.vn: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
znds.com: #
znds.com: # robots.txt for Discuz! X3
znds.com: #
timeanddate.com: # http://web.nexor.co.uk/mak/doc/robots/norobots.html
timeanddate.com: #
timeanddate.com: # internal note, this file is in git now!
timeanddate.com: # disallow any urls with ? in
rediff.com: # http://www.rediff.com: robots.txt
rediff.com: #
google.co.za: # AdsBot
google.co.za: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
indianexpress.com: # Sitemap archive
pngtree.com: # Bing Bot
gotowebinar.com: # Sitemaps and Autodiscovers
wiley.com: # For all robots
wiley.com: # Block access to specific groups of pages
wiley.com: # Allow search crawlers to discover the sitemap
wiley.com: # Block CazoodleBot as it does not present correct accept content headers
wiley.com: # Block MJ12bot as it is just noise
wiley.com: # Block dotbot as it cannot parse base urls properly
wiley.com: # Block Gigabot
wiley.com: # Block trendkite-akashic-crawler
google.cl: # AdsBot
google.cl: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
office365.com: #
britannica.com: # /robots.txt file for encyclopaedia britannica
skroutz.gr: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
skroutz.gr: #
skroutz.gr: # To ban all spiders from the entire site uncomment the next two lines:
skroutz.gr: # User-Agent: *
skroutz.gr: # Disallow: /
tripadvisor.com: # Hi there,
tripadvisor.com: #
tripadvisor.com: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself.
tripadvisor.com: #
tripadvisor.com: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet?
tripadvisor.com: #
tripadvisor.com: # Run - don't crawl - to apply to join TripAdvisor's elite SEO team
tripadvisor.com: #
tripadvisor.com: # Email seoRockstar@tripadvisor.com
tripadvisor.com: #
tripadvisor.com: # Or visit https://careers.tripadvisor.com/search-results?keywords=seo
tripadvisor.com: #
tripadvisor.com: #
robinhood.com: #
robinhood.com: # o O
robinhood.com: # __|_____|___
robinhood.com: # | -- |
robinhood.com: # | ( o ) ( o )
robinhood.com: # { | / |
robinhood.com: # | [wwww] < *Exterminate all humans.txt* )
robinhood.com: # [____________|
robinhood.com: # | | /Vvvv/
robinhood.com: # _____|___|____ |___/
robinhood.com: # /______________\_________/ |
robinhood.com: # | | /
robinhood.com: # | ( / ) ( + ) |__|__|__|_|_/
robinhood.com: # | |
robinhood.com: # | [ -vV--vV-] |
robinhood.com: # | |
robinhood.com: # |______________/
robinhood.com: #
google.az: # AdsBot
google.az: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
expedia.com: #
expedia.com: # General bots
expedia.com: #
expedia.com: #hotel
expedia.com: #flight
expedia.com: #package
expedia.com: #car
expedia.com: #activities
expedia.com: #cruise
expedia.com: #other
expedia.com: #
expedia.com: # Google Ads
expedia.com: #
expedia.com: #
expedia.com: #
expedia.com: # Bing Ads
expedia.com: #
expedia.com: #
expedia.com: # SemrushBot
expedia.com: #
atlassian.net: # JIRA:
atlassian.net: # Disallow all SearchRequestViews in the IssueNavigator (Word, XML, RSS,
atlassian.net: # etc), all IssueViews (XML, Printable and Word), all charts and reports.
atlassian.net: # Disallow admin.
atlassian.net: #
atlassian.net: # Confluence:
atlassian.net: # Confluence uses in-page robot exclusion tags for non-indexable pages.
atlassian.net: # Disallow admin explicitly.
atlassian.net: #
atlassian.net: # General:
atlassian.net: # Disallow login, logout
nordstrom.com: #Browse
nordstrom.com: #PDP
nordstrom.com: #Search Results
nordstrom.com: #Account
nordstrom.com: #Anniversary
nordstrom.com: #PrivateSale
nordstrom.com: #Other
newegg.com: ################ Newegg Robots.txt File ################
newegg.com: ################ DESKTOP - START ################
newegg.com: # Original version disallows
newegg.com: # Allows updated 10/26/16
newegg.com: # Page disallows updated 3/26/2018
newegg.com: # blog disallows updated 6/11/18
newegg.com: # updated 8/19/19
newegg.com: # updated 9/9/19
newegg.com: # disallow rss 9/24/2018
newegg.com: # disallow 12/11/2018
newegg.com: ################ DESKTOP - END ################
newegg.com: ################ MOBILE - START ################
newegg.com: ################ MOBILE - END ################
xe.com: # Please refer to the robots.txt spec by Google (https://developers.google.com/search/reference/robots_txt) if you are modifying this file
xe.com: # All crawlers keep out of 8 Day flash directory and flash tutorials
xe.com: # Don't let crawlers into the syndication widgets
xe.com: # Crawlers should stay out of the /api endpoints, and the language variants of those pages
xe.com: # Prevent crawlers from hitting the buggy version of a certain FAQ page
xe.com: # We noticed a series of mysterious homepage URLs being hit by bingbot of the form https://www.xe.com/?0.xxxx...
xe.com: # New sitemap xml except for sitemap-index.xml.
india.com: #Baiduspider
india.com: #Yandex
thefreedictionary.com: #header{border-bottom:1px solid White}
thefreedictionary.com: #main{padding-top:0}
thefreedictionary.com: #uz6{text-align:center;overflow:hidden}
thefreedictionary.com: #tblMatchUp,#tblMismatch{width:100%;font-size:16px}
thefreedictionary.com: #DragContainer{position:absolute;cursor:move;top:0px;left:0px;background-color:white}
thefreedictionary.com: #tfd_hm_a a{margin-right:5px}
thefreedictionary.com: #grammarQuiz .aCr, .aCrU {color:green;font-weight:bold}
thefreedictionary.com: #grammarQuiz .aCr:after {content:" (Correct answer)"}
thefreedictionary.com: #grammarQuiz .aCrU:after {content:" (Correct!)"}
thefreedictionary.com: #grammarQuiz .aWr {color:red}
thefreedictionary.com: #grammarQuiz .aOth {color:gray}
thefreedictionary.com: #wm_mode-btn {cursor:pointer; font-weight: 100;font-size: smaller;display: inline-block;vertical-align: top;height: 27px;}
thefreedictionary.com: #wm_answers{margin-top:18px}
thefreedictionary.com: #wordMaker:focus {outline: 0;}
thefreedictionary.com: #wm_wh_newuser>p {margin-top: 0;margin-bottom: 8px;}
thefreedictionary.com: #wm_wh_results{ margin-top: 8px;font-weight:bold}
thefreedictionary.com: #wm_wordhub-link a{ white-space: nowrap;}
bitly.com: # Welcome to Bitly =)
bitly.com: # robots welcome;
bitly.com: # API documentation can be found at https://dev.bitly.com/
mydrivers.com: # robots.txt for http://www.mydrivers.com/
vanguard.com: # robots.txt for http://www.vanguard.com/
elsevier.com: # Robots.txt file for https://www.elsevier.com
elsevier.com: # Do Not Delete This File
shopify.com: # ,:
shopify.com: # ,' |
shopify.com: # / :
shopify.com: # --' /
shopify.com: # \/ />/
shopify.com: # / <//_\
shopify.com: # __/ /
shopify.com: # )'-. /
shopify.com: # ./ :\
shopify.com: # /.' '
shopify.com: # No need to shop around. Board the rocketship today – great SEO careers to checkout at shopify.com/careers
shopify.com: # robots.txt file for www.shopify.com
walgreens.com: # Robots.txt exclusion for walgreens.com
walgreens.com: # Connection #0 to host wildcard-b.walgreens.com.edgekey.net left intact
marriott.com: # Robots.txt file for HTTPS Marriott.com
marriott.com: #
marriott.com: #
marriott.com: #
dcinside.com: # Ads
dcinside.com: # Search
bloomberg.com: # Bot rules:
bloomberg.com: # 1. A bot may not injure a human being or, through inaction, allow a human being to come to harm.
bloomberg.com: # 2. A bot must obey orders given it by human beings except where such orders would conflict with the First Law.
bloomberg.com: # 3. A bot must protect its own existence as long as such protection does not conflict with the First or Second Law.
bloomberg.com: # If you can read this then you should apply here https://www.bloomberg.com/careers/
google.dz: # AdsBot
google.dz: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
hotjar.com: # Sitemap files
hotjar.com: # Robots allowed
europa.eu: # robots.txt for EUROPA httpd-80 production server
europa.eu: #
europa.eu: # last update on 20/06/2019
europa.eu: #
europa.eu: #
europa.eu: # COMM EUROPA MANAGEMENT - IM0012723685 - 03/04/2014
europa.eu: #
europa.eu: #
europa.eu: # COMM EUROPA MANAGEMENT - IM0017899419 - 20/06/2019
europa.eu: #
europa.eu: # Directories
europa.eu: # Files
europa.eu: # Paths (clean URLs)
europa.eu: # Paths (no clean URLs)
europa.eu: #SUBDIRECTORY ALIASED
europa.eu: # Directories
europa.eu: # Files
europa.eu: # Paths (clean URLs)
europa.eu: # Paths (no clean URLs)
europa.eu: # Custom rules
europa.eu: # Protect user profile data.
europa.eu: # SMT Ticket
olx.pl: # sitecode:olxpl-desktop
linguee.com: # In ANY CASE, you are NOT ALLOWED to train Machine Translation Systems
linguee.com: # on data crawled on Linguee.
linguee.com: #
linguee.com: # Linguee contains fake entries - changes in the wording of sentences,
linguee.com: # complete fake entries.
linguee.com: # These entries can be used to identify even small parts of our material
linguee.com: # if you try to copy it without our permission.
linguee.com: # Machine Translation systems trained on these data will learn these errors
linguee.com: # and can be identified easily. We will take all legal measures against anyone
linguee.com: # training Machine Translation systems on data crawled from this website.
discover.com: #begin directives to prevent crawling of legacy discover magazine links#
discover.com: #begin directives for disallowing of discover website pages#
www.gov.br: # Define access-restrictions for robots/spiders
www.gov.br: # http://www.robotstxt.org/wc/norobots.html
www.gov.br: # By default we allow robots to access all areas of our site
www.gov.br: # already accessible to anonymous users
www.gov.br: # Add Googlebot-specific syntax extension to exclude forms
www.gov.br: # that are repeated for each piece of content in the site
www.gov.br: # the wildcard is only supported by Googlebot
www.gov.br: # http://www.google.com/support/webmasters/bin/answer.py?answer=40367&ctx=sibling
olx.ua: # sitecode:olxua-desktop
wp.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
wp.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details.
wp.com: # This file was generated on Wed, 24 Feb 2021 19:02:06 +0000
google.ro: # AdsBot
google.ro: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
trontv.com: # robotstxt.org
usbank.com: # Welcome to robots.txt on USBank.com -- sit down, relax, and have a cup of coffee while you look around. Have a nice day.
usbank.com: #
usbank.com: #
cdiscount.com: # robots.txt - achat / vente robots.txt pas cher
cdiscount.com: # Archive.org
cdiscount.com: #Règle speciale pour AdsBot qui ne respecte pas le User-agent:*...
cdiscount.com: #allow
cdiscount.com: #pro
cdiscount.com: #mvc
cdiscount.com: #regie
cdiscount.com: #home
cdiscount.com: #order
cdiscount.com: #product
cdiscount.com: #ajax
cdiscount.com: #blacklist
cdiscount.com: #other
cloudflare.com: # .__________________________.
cloudflare.com: # | .___________________. |==|
cloudflare.com: # | | ................. | | |
cloudflare.com: # | | ::[ Dear robot ]: | | |
cloudflare.com: # | | ::::[ be nice ]:: | | |
cloudflare.com: # | | ::::::::::::::::: | | |
cloudflare.com: # | | ::::::::::::::::: | | |
cloudflare.com: # | | ::::::::::::::::: | | |
cloudflare.com: # | | ::::::::::::::::: | | ,|
cloudflare.com: # | !___________________! |(c|
cloudflare.com: # !_______________________!__!
cloudflare.com: # / \
cloudflare.com: # / [][][][][][][][][][][][][] \
cloudflare.com: # / [][][][][][][][][][][][][][] \
cloudflare.com: #( [][][][][____________][][][][] )
cloudflare.com: # \ ------------------------------ /
cloudflare.com: # \______________________________/
cloudflare.com: # lp
cloudflare.com: # feedback
cloudflare.com: # ________
cloudflare.com: # __,_, | |
cloudflare.com: # [_|_/ | OK |
cloudflare.com: # // |________|
cloudflare.com: # _// __ /
cloudflare.com: #(_|) |@@|
cloudflare.com: # \ \__ \--/ __
cloudflare.com: # \o__|----| | __
cloudflare.com: # \ }{ /\ )_ / _\
cloudflare.com: # /\__/\ \__O (__
cloudflare.com: # (--/\--) \__/
cloudflare.com: # _)( )(_
cloudflare.com: # `---''---`
uniswap.org: #gatsby-focus-wrapper{min-height:100vh;width:100%;position:relative;}/*!sc*/
dreamstime.com: ###################################
dreamstime.com: # https://www.dreamstime.com/robots.txt and country subdomains
dreamstime.com: ###################################
dreamstime.com: # Disallow for outdated design pages
dreamstime.com: # Disallow for php pages
dreamstime.com: # Disallow for private pages
dreamstime.com: ###################################
jnu.edu.cn: #限制校外访问的url,禁止收录
medicalnewstoday.com: # Sitemaps
sxyprn.com: # vestacp autogenerated robots.txt
lifo.gr: #
lifo.gr: # robots.txt
lifo.gr: #
lifo.gr: # This file is to prevent the crawling and indexing of certain parts
lifo.gr: # of your site by web crawlers and spiders run by sites like Yahoo!
lifo.gr: # and Google. By telling these "robots" where not to go on your site,
lifo.gr: # you save bandwidth and server resources.
lifo.gr: #
lifo.gr: # This file will be ignored unless it is at the root of your host:
lifo.gr: # Used: http://example.com/robots.txt
lifo.gr: # Ignored: http://example.com/site/robots.txt
lifo.gr: #
lifo.gr: # For more information about the robots.txt standard, see:
lifo.gr: # http://www.robotstxt.org/robotstxt.html
lifo.gr: # CSS, JS, Images
lifo.gr: # Directories
lifo.gr: # Files
lifo.gr: # Paths (clean URLs)
lifo.gr: # Paths (no clean URLs)
znanija.com: #Brainly Robots.txt 31.07.2017
znanija.com: # Disallow Marketing bots
znanija.com: #Disallow exotic search engine crawlers
znanija.com: #Disallow other crawlers
znanija.com: # Good bots whitelisting:
znanija.com: #Other bots
znanija.com: #Neticle Crawler v1.0 ( http://bot.neticle.hu/ ) https://bot.neticle.hu/ - brand monitoring
znanija.com: #Mega https://megaindex.com/crawler - link indexer tool (supports directives in user-agent:*)
znanija.com: #Obot - IBM X-Force service
znanija.com: #SafeDNSBot (https://www.safedns.com/searchbot)
lenovo.com: # For all robots
lenovo.com: # Block access to specific groups of pages
lenovo.com: #global sitemap
lenovo.com: # Allow search crawlers to discover the sitemap
lenovo.com: # Block access to below CA country directories
lenovo.com: # Block access to below private stores
lenovo.com: # Block access to below EMEA country directories
lenovo.com: # Block access to below AU pages
lenovo.com: # Block services URL
lenovo.com: # Block US Cart url
google.com.pe: # AdsBot
google.com.pe: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
playstation.com: # PlayStation Robots.txt
playstation.com: # Sitemaps
apollo.io: # Sitemap file
huffpost.com: # Cambria robots
huffpost.com: # archives
huffpost.com: # huffingtonpost.com archive sitemaps
ionos.com: #print
ionos.com: #terms and conditions
ionos.com: #Popups etc.
ionos.com: #Results
ionos.com: #crawl delay
howtogeek.com: #
howtogeek.com: # Yahoo bot is evil.
howtogeek.com: #
howtogeek.com: #
howtogeek.com: # Wut? 80 legs? Where do I get traffic from this?
howtogeek.com: #
howtogeek.com: #
howtogeek.com: # Yahoo Pipes is for feeds not web pages.
howtogeek.com: #
howtogeek.com: #
howtogeek.com: # There&#039;s no need to scan the forums for images
howtogeek.com: #
yumpu.com: #Disallow urls
yumpu.com: #Disallow urls with index.php
yumpu.com: #Disallow urls with language iso
wattpad.com: # Wattpad is hiring!
wattpad.com: #
wattpad.com: # Check out our available positions at https://wattpad.com/jobs
wattpad.com: # Note: always make sure to test your changes at `Google Robots.txt tester`
wattpad.com: # Last update: 2017-06-06 (plat-6362)
wattpad.com: # Login/Logut
wattpad.com: # Personal pages
wattpad.com: # Campaign pages not maintained regularly
wattpad.com: # Other pages
wattpad.com: # Access denied pages
wattpad.com: # Leading dot in the path
wattpad.com: # Exception for well-known
wattpad.com: # We disallow robot from RankLite
google.com.pk: # AdsBot
google.com.pk: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
agah.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
agah.com: #content{margin:0 0 0 2%;position:relative;}
hespress.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/
onelogin.com: # robots.txt for https://www.onelogin.com
twilio.com: # 'Allow' - nonstandard REP Directive
shopbop.com: #Sitemap updated 08/31/2018
cisco.com: #--------------------------------
cisco.com: # Disallow: /cgi-bin # allow test crawls for TAC support content
cisco.com: #--------------------------------
cisco.com: #--------------------------------
cisco.com: #--------------------------------
cisco.com: #--------------------------------
cisco.com: # All changes to robots.txt need to be approved by search-seo-and-site@cisco.com
cisco.com: #
ebay.fr: ## BEGIN FILE ###
ebay.fr: #
ebay.fr: # allow-all
ebay.fr: # DR
ebay.fr: #
ebay.fr: # The use of robots or other automated means to access the eBay site
ebay.fr: # without the express permission of eBay is strictly prohibited.
ebay.fr: # Notwithstanding the foregoing, eBay may permit automated access to
ebay.fr: # access certain eBay pages but soley for the limited purpose of
ebay.fr: # including content in publicly available search engines. Any other
ebay.fr: # use of robots or failure to obey the robots exclusion standards set
ebay.fr: # forth at <https://www.robotstxt.org/orig.html> is strictly
ebay.fr: # prohibited.
ebay.fr: #
ebay.fr: # v10_ROW_Feb_2021
ebay.fr: ### DIRECTIVES ###
ebay.fr: # VIS Sitemaps
ebay.fr: # PRP Sitemaps
ebay.fr: # CLP Sitemaps
ebay.fr: # BROWSE Sitemaps
ebay.fr: ### END FILE ###
uber.com: # robotstxt.org/
opensooq.com: # Opensooq.com Robots.txt File
opensooq.com: # Last update : 9/December/2020
opensooq.com: # ____
opensooq.com: # / __ \
opensooq.com: # | | | |_ __ ___ _ __ ___ ___ ___ __ _ ___ ___ _ __ ___
opensooq.com: # | | | | '_ \ / _ \ '_ \/ __|/ _ \ / _ \ / _` | / __/ _ \| '_ ` _ \
opensooq.com: # | |__| | |_) | __/ | | \__ \ (_) | (_) | (_| || (_| (_) | | | | | |
opensooq.com: # \____/| .__/ \___|_| |_|___/\___/ \___/ \__, (_)___\___/|_| |_| |_|
opensooq.com: # | | | |
opensooq.com: # |_| |_|
opensooq.com: #
opensooq.com: # () ()
opensooq.com: # \ /
opensooq.com: # __\___________/__
opensooq.com: # / \
opensooq.com: # / ___ ___ \
opensooq.com: # | / \ / \ |
opensooq.com: # | | 0 || 0 | |
opensooq.com: # | \___/ \___/ |
opensooq.com: # | |
opensooq.com: # | \ / |
opensooq.com: # | \___________/ |
opensooq.com: # \ /
opensooq.com: # \_________________/
opensooq.com: # _________|__|_______
opensooq.com: # _| |_
opensooq.com: # / | | \
opensooq.com: # / | O O O | \
opensooq.com: # | | | |
opensooq.com: # | | O O O | |
opensooq.com: # | | | |
opensooq.com: # / | | \
opensooq.com: # | /| |\ |
opensooq.com: # \| | | |/
opensooq.com: # |____________________|
opensooq.com: # | | | |
opensooq.com: # |__| |__|
opensooq.com: # / __ \ / __ \
opensooq.com: # OO OO OO OO
opensooq.com: #
opensooq.com: # URLs
opensooq.com: # Parameters
opensooq.com: # PWA links
opensooq.com: # Blog
opensooq.com: # API
opensooq.com: # DL
opensooq.com: # Crawlers
realestate.com.au: ##
realestate.com.au: # In accessing or using any REA Group Website you agree that you will not use any automated device,
realestate.com.au: # software, process or means to access, retrieve, scrape, or index any REA Group Website or any
realestate.com.au: # content on any REA Group Website. Notwithstanding the foregoing, REA Group may permit automated
realestate.com.au: # access to certain REA Group Website pages strictly for the purpose of including content in publicly
realestate.com.au: # available general search engines. This does not include any such access by websites that
realestate.com.au: # specifically aggregate property listings and/or information as part of their business. REA Group
realestate.com.au: # strictly prohibits any automated access by these types of websites.
realestate.com.au: ##
bild.de: # Bei Fragen zu diesen Regeln oder Aenderungswuenschen koennen sie sich an das SEO-Team wenden, erreichbar unter 030 / 2591 79232
collegeboard.org: #
collegeboard.org: # robots.txt
collegeboard.org: #
collegeboard.org: # This file is to prevent the crawling and indexing of certain parts
collegeboard.org: # of your site by web crawlers and spiders run by sites like Yahoo!
collegeboard.org: # and Google. By telling these "robots" where not to go on your site,
collegeboard.org: # you save bandwidth and server resources.
collegeboard.org: #
collegeboard.org: # This file will be ignored unless it is at the root of your host:
collegeboard.org: # Used: http://example.com/robots.txt
collegeboard.org: # Ignored: http://example.com/site/robots.txt
collegeboard.org: #
collegeboard.org: # For more information about the robots.txt standard, see:
collegeboard.org: # http://www.robotstxt.org/robotstxt.html
collegeboard.org: # CSS, JS, Images
collegeboard.org: # Directories
collegeboard.org: # Files
collegeboard.org: # Paths (clean URLs)
collegeboard.org: # Paths (no clean URLs)
collegeboard.org: # Addition files to block
nerdwallet.com: # Disallow some specific routes we don't want indexed,
nerdwallet.com: # with some exceptions allowed.
nerdwallet.com: # Disallow duggmirror from everything (does anyone know why?).
farfetch.com: # ALL YANDEX BOTS
google.co.ao: # AdsBot
google.co.ao: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
google.com.my: # AdsBot
google.com.my: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
dmm.com: #MP
dmm.com: #ppr
dmm.com: #my
dmm.com: #mono
google.ch: # AdsBot
google.ch: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
houzz.com: #marketing
houzz.com: # Scholarship Pages
houzz.com: #block buttonWidget and imageClipperUpload
houzz.com: #facets
houzz.com: #query/search pages
houzz.com: #email
houzz.com: #old pages
houzz.com: #marketplace filters
houzz.com: #sort filters
houzz.com: #view filters
houzz.com: #pros
houzz.com: #Reviews
houzz.com: #bots
houzz.com: #legacy
houzz.com: #cobrands
houzz.com: #ideabooks
houzz.com: #old pages
houzz.com: #adsbot
commbank.com.au: # /robots.txt file for https://www.commbank.com.au/
commbank.com.au: #Blog
commbank.com.au: #PDFs
commbank.com.au: #.html
commbank.com.au: #Non CMS content
cra-arc.gc.ca: # ID: robots.txt 2006/01/17
cra-arc.gc.ca: # Date Created: 2008-07-11
cra-arc.gc.ca: # Date Modified: 2016-07-18/SB
cra-arc.gc.ca: #
cra-arc.gc.ca: # This is a file retrieved by webwalkers a.k.a. spiders that
cra-arc.gc.ca: # conform to a defacto standard.
cra-arc.gc.ca: # See <URL:http://www.robotstxt.org/wc/exclusion.html#robotstxt>
cra-arc.gc.ca: #
cra-arc.gc.ca: # Any matching one of these patterns will be ignored by Search engine Crawlers.
cra-arc.gc.ca: # Use the Disallow: statement to prevent crawlers from indexing specific directories.
cra-arc.gc.ca: #
cra-arc.gc.ca: # Format is:
cra-arc.gc.ca: # User-agent: <name of spider>
cra-arc.gc.ca: # Disallow: <nothing> | <path>
cra-arc.gc.ca: # -----------------------------------------------------------------------------
cra-arc.gc.ca: #
pinterest.jp: # Pinterest is hiring!
pinterest.jp: #
pinterest.jp: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c
pinterest.jp: #
pinterest.jp: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering
hangseng.com: #block unexpected ins pdf links
hangseng.com: #others
hostgator.com: # Google AdSense
hostgator.com: # Digg mirror
hostgator.com: # Omni Explorer
hostgator.com: # SEO
un.org: #
un.org: # robots.txt
un.org: #
un.org: # This file is to prevent the crawling and indexing of certain parts
un.org: # of your site by web crawlers and spiders run by sites like Yahoo!
un.org: # and Google. By telling these "robots" where not to go on your site,
un.org: # you save bandwidth and server resources.
un.org: #
un.org: # This file will be ignored unless it is at the root of your host:
un.org: # Used: http://example.com/robots.txt
un.org: # Ignored: http://example.com/site/robots.txt
un.org: #
un.org: # For more information about the robots.txt standard, see:
un.org: # http://www.robotstxt.org/robotstxt.html
un.org: # CSS, JS, Images
un.org: # Directories
un.org: # Files
un.org: # Paths (clean URLs)
un.org: # Paths (no clean URLs)
people.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
people.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details.
people.com: # Sitemaps
people.com: #legacy
people.com: #Onecms
people.com: #content
people.com: #legacy
people.com: #Onecms
people.com: #content
people.com: #legacy
people.com: #Onecms
people.com: #content
people.com: #legacy
people.com: #Onecms
people.com: #content
caf.fr: #
caf.fr: # robots.txt
caf.fr: #
caf.fr: # This file is to prevent the crawling and indexing of certain parts
caf.fr: # of your site by web crawlers and spiders run by sites like Yahoo!
caf.fr: # and Google. By telling these "robots" where not to go on your site,
caf.fr: # you save bandwidth and server resources.
caf.fr: #
caf.fr: # This file will be ignored unless it is at the root of your host:
caf.fr: # Used: http://example.com/robots.txt
caf.fr: # Ignored: http://example.com/site/robots.txt
caf.fr: #
caf.fr: # For more information about the robots.txt standard, see:
caf.fr: # http://www.robotstxt.org/robotstxt.html
caf.fr: # CSS, JS, Images
caf.fr: # Directories
caf.fr: # Files
caf.fr: # Paths (clean URLs)
caf.fr: # Paths (no clean URLs)
xiaomi.com: # 2015/12/11
kizlarsoruyor.com: # www.kizlarsoruyor.com Robots.txt file
kizlarsoruyor.com: # Server: Web3
kizlarsoruyor.com: # Last Updated: June 18 2020
pajak.go.id: #
pajak.go.id: # robots.txt
pajak.go.id: #
pajak.go.id: # This file is to prevent the crawling and indexing of certain parts
pajak.go.id: # of your site by web crawlers and spiders run by sites like Yahoo!
pajak.go.id: # and Google. By telling these "robots" where not to go on your site,
pajak.go.id: # you save bandwidth and server resources.
pajak.go.id: #
pajak.go.id: # This file will be ignored unless it is at the root of your host:
pajak.go.id: # Used: http://example.com/robots.txt
pajak.go.id: # Ignored: http://example.com/site/robots.txt
pajak.go.id: #
pajak.go.id: # For more information about the robots.txt standard, see:
pajak.go.id: # http://www.robotstxt.org/robotstxt.html
pajak.go.id: # CSS, JS, Images
pajak.go.id: # Directories
pajak.go.id: # Files
pajak.go.id: # Paths (clean URLs)
pajak.go.id: # Paths (no clean URLs)
gab.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
xhamsterlive.com: # generated automatically
www.gov.uk: # Don't allow indexing of user needs pages
www.gov.uk: # https://ahrefs.com/robot/ crawls the site frequently
www.gov.uk: # https://www.deepcrawl.com/bot/ makes lots of requests. Ideally
www.gov.uk: # we'd slow it down rather than blocking it but it doesn't mention
www.gov.uk: # whether or not it supports crawl-delay.
www.gov.uk: # Complaints of 429 'Too many requests' seem to be coming from SharePoint servers
www.gov.uk: # (https://social.msdn.microsoft.com/Forums/en-US/3ea268ed-58a6-4166-ab40-d3f4fc55fef4)
www.gov.uk: # The robot doesn't recognise its User-Agent string, see the MS support article:
www.gov.uk: # https://support.microsoft.com/en-us/help/3019711/the-sharepoint-server-crawler-ignores-directives-in-robots-txt
www.gov.uk: # Google's crawler was sending requests for each variation of query param for the sectors page of licence-finder
www.gov.uk: # resulting in millions of requests a day.
intel.com: # robots.txt exclusion for www.intel.com/ - US
google.com.co: # AdsBot
google.com.co: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
uscis.gov: #
uscis.gov: # robots.txt
uscis.gov: #
uscis.gov: # This file is to prevent the crawling and indexing of certain parts
uscis.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
uscis.gov: # and Google. By telling these "robots" where not to go on your site,
uscis.gov: # you save bandwidth and server resources.
uscis.gov: #
uscis.gov: # This file will be ignored unless it is at the root of your host:
uscis.gov: # Used: http://example.com/robots.txt
uscis.gov: # Ignored: http://example.com/site/robots.txt
uscis.gov: #
uscis.gov: # For more information about the robots.txt standard, see:
uscis.gov: # http://www.robotstxt.org/robotstxt.html
uscis.gov: # Custom
uscis.gov: # CSS, JS, Images
uscis.gov: # Directories
uscis.gov: # Files
uscis.gov: # Paths (clean URLs)
uscis.gov: # Paths (no clean URLs)
anjuke.com: #
anjuke.com: # robots.txt for anjuke.com
anjuke.com: # The use of robots or other automated means to access the anjuke site
anjuke.com: # without the express permission of anjuke is strictly prohibited.
anjuke.com: # Notwithstanding the foregoing, anjuke may permit automated access to
anjuke.com: # access certain anjuke pages but soley for the limited purpose of
anjuke.com: # including content in publicly available search engines. Any other
anjuke.com: # use of robots or failure to obey the robots exclusion standards set
anjuke.com: # forth at <http://www.robotstxt.org/wc/exclusion.html> is strictly
anjuke.com: # prohibited.
anjuke.com: # v1
teespring.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
mercadolibre.com.ve: #siteId: MLV
mercadolibre.com.ve: #country: venezuela
mercadolibre.com.ve: ##Block - Referidos
mercadolibre.com.ve: ##Block - siteinfo urls
mercadolibre.com.ve: ##Block - Cart
mercadolibre.com.ve: ##Block Checkout
mercadolibre.com.ve: ##Block - User Logged
mercadolibre.com.ve: #Shipping selector
mercadolibre.com.ve: ##Block - last search
mercadolibre.com.ve: ## Block - Profile - By Id
mercadolibre.com.ve: ## Block - Profile - By Id and role (old version)
mercadolibre.com.ve: ## Block - Profile - Leg. Req.
mercadolibre.com.ve: ##Block - noindex
mercadolibre.com.ve: # Mercado-Puntos
mercadolibre.com.ve: # Viejo mundo
mercadolibre.com.ve: ##Block recommendations listing
eluniverso.com: #
eluniverso.com: # robots.txt
eluniverso.com: #
eluniverso.com: # This file is to prevent the crawling and indexing of certain parts
eluniverso.com: # of your site by web crawlers and spiders run by sites like Yahoo!
eluniverso.com: # and Google. By telling these &quot;robots&quot; where not to go on your site,
eluniverso.com: # you save bandwidth and server resources.
eluniverso.com: #
eluniverso.com: # This file will be ignored unless it is at the root of your host:
eluniverso.com: # Used: http://example.com/robots.txt
eluniverso.com: # Ignored: http://example.com/site/robots.txt
eluniverso.com: #
eluniverso.com: # For more information about the robots.txt standard, see:
eluniverso.com: # http://www.robotstxt.org/robotstxt.html
eluniverso.com: # CSS, JS, Images
eluniverso.com: # Directories
eluniverso.com: # Files
eluniverso.com: # Paths (clean URLs)
eluniverso.com: # Paths (no clean URLs)
python.org: # Directions for robots. See this URL:
python.org: # http://www.robotstxt.org/robotstxt.html
python.org: # for a description of the file format.
python.org: # The Krugle web crawler (though based on Nutch) is OK.
python.org: # No one should be crawling us with Nutch.
python.org: # Hide old versions of the documentation and various large sets of files.
ancestry.com: # Domain:[www.ancestry.com]
ancestry.com: #
ancestry.com: # This file should reside in the root directory ancestry.XX/robots.txt
ancestry.com: #
ancestry.com: # Tells Scanning Robots Where They Are And Are Not Welcome
ancestry.com: # User-agent: can also specify by name; "*" is for all bots
ancestry.com: # Disallow: disallow if directive matches first part of requested path
ancestry.com: ## GB Updated 26 May 2020
news.com.au: #Agent Specific Disallowed Sections
eventbrite.com: # http://www.google.com/adsbot.html - AdsBot ignores * wildcard
workable.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
gap.com: # Crafted for https://www.gap.com
credit-agricole.fr: # robots.txt
credit-agricole.fr: # @url: https://www.credit-agricole.fr
credit-agricole.fr: # Version : 2021-01-06
credit-agricole.fr: #Ouverture crawl Inbenta
credit-agricole.fr: #Ouverture crawl Mediapartners
credit-agricole.fr: #Blocage repertoires et parametres techniques
credit-agricole.fr: #Autorisation crawl pagination
credit-agricole.fr: #Autorisation crawl thematique et rubrique du MAG
credit-agricole.fr: #Blocage des store locator marche
credit-agricole.fr: #Blocage des CR
credit-agricole.fr: ## INDEXATION CR SPECIFIQUES ##
credit-agricole.fr: ## DEBUT CADIF ##
credit-agricole.fr: #Ouverture du crawl des pages en index#
credit-agricole.fr: #Fermeture du crawl des pages en noindex#
credit-agricole.fr: ### FIN CADIF ##
credit-agricole.fr: ## DEBUT ANJOU MAINE ##
credit-agricole.fr: ### FIN ANJOU MAINE ##
credit-agricole.fr: ## DEBUT NORD DE FRANCE ##
credit-agricole.fr: ### FIN NORD DE FRANCE ##
credit-agricole.fr: ## DEBUT CENTRE LOIRE ##
credit-agricole.fr: ### FIN CENTRE LOIRE ##
credit-agricole.fr: ## DEBUT CENTRE FRANCE ##
credit-agricole.fr: ### FIN CENTRE FRANCE ##
credit-agricole.fr: ## DEBUT AQUITAINE ##
credit-agricole.fr: ### FIN AQUITAINE ##
credit-agricole.fr: ## DEBUT ALPES PROVENCE ##
credit-agricole.fr: ### FIN ALPES PROVENCE ##
credit-agricole.fr: ## DEBUT CHARENTE PERIGORD ##
credit-agricole.fr: ### FIN CHARENTE PERIGORD ##
credit-agricole.fr: ## DEBUT TOULOUSE 31 ##
credit-agricole.fr: ### FIN TOULOUSE 31 ##
credit-agricole.fr: ## DEBUT LOIRE HAUTE LOIRE ##
credit-agricole.fr: ### FIN LOIRE HAUTE LOIRE ##
credit-agricole.fr: ## DEBUT CMDS ##
credit-agricole.fr: ### FIN CMDS ##
credit-agricole.fr: ## DEBUT LORRAINE ##
credit-agricole.fr: ### FIN LORRAINE ##
credit-agricole.fr: ## DEBUT NORD MIDI PYRENEES ##
credit-agricole.fr: ### FIN NORD MIDI PYRENEES##
credit-agricole.fr: ## DEBUT PROVENCE COTE DAZUR ##
credit-agricole.fr: ### FIN PROVENCE COTE DAZUR##
credit-agricole.fr: ## DEBUT BRIE-PICARDIE ##
credit-agricole.fr: ### FIN BRIE-PICARDIE##
credit-agricole.fr: ## DEBUT CENTRE OUEST ##
credit-agricole.fr: ### FIN CENTRE OUEST##
credit-agricole.fr: ## DEBUT ILLE-ET-VILAINE ##
credit-agricole.fr: ### FIN ILLE-ET-VILAINE ##
credit-agricole.fr: ## DEBUT NORMANDIE ##
credit-agricole.fr: ### FIN NORMANDIE ##
credit-agricole.fr: ## DEBUT PYRENEES GASCOGNE ##
credit-agricole.fr: ### FIN PYRENEES GASCOGNE##
credit-agricole.fr: ## DEBUT SUD MEDITERRANEE ##
credit-agricole.fr: ### FIN SUD MEDITERRANEE ##
credit-agricole.fr: ## DEBUT TOURAINE-POITOU ##
credit-agricole.fr: ### FIN TOURAINE-POITOU ##
credit-agricole.fr: ## DEBUT VAL DE FRANCE ##
credit-agricole.fr: ### FIN VAL DE FRANCE ##
credit-agricole.fr: ## DEBUT ALSACE VOSGES ##
credit-agricole.fr: ### FIN ALSACE VOSGES ##
credit-agricole.fr: ## DEBUT NORMANDIE SEINE ##
credit-agricole.fr: ### FIN NORMANDIE SEINE ##
credit-agricole.fr: ## DEBUT CENTRE EST ##
credit-agricole.fr: ### FIN CENTRE EST ##
credit-agricole.fr: ## DEBUT CHAMPAGNE BOURGOGNE ##
credit-agricole.fr: ### FIN CHAMPAGNE BOURGOGNE ##
credit-agricole.fr: ## DEBUT DES SAVOIE ##
credit-agricole.fr: ### FIN DES SAVOIE ##
credit-agricole.fr: ## DEBUT GUADELOUPE ##
credit-agricole.fr: ### FIN GUADELOUPE ##
credit-agricole.fr: ## DEBUT LANGUEDOC ##
credit-agricole.fr: ### FIN LANGUEDOC ##
credit-agricole.fr: ## DEBUT MARTINIQUE ##
credit-agricole.fr: ### FIN MARTINIQUE ##
credit-agricole.fr: ## DEBUT ATLANTIQUE VENDEE ##
credit-agricole.fr: ### FIN ATLANTIQUE VENDEE##
credit-agricole.fr: ## DEBUT CORSE ##
credit-agricole.fr: ### FIN CORSE##
credit-agricole.fr: ## DEBUT COTES DARMOR ##
credit-agricole.fr: ### FIN COTES DARMOR##
credit-agricole.fr: ## DEBUT FINISTERE ##
credit-agricole.fr: ### FIN FINISTERE##
credit-agricole.fr: ## DEBUT FRANCH COMTE ##
credit-agricole.fr: ### FIN FRANCHE COMTE##
credit-agricole.fr: ## DEBUT MORBIHAN ##
credit-agricole.fr: ### FIN MORBIHAN##
credit-agricole.fr: ## DEBUT NORD EST ##
credit-agricole.fr: ### FIN NORD EST##
credit-agricole.fr: ## DEBUT REUNION ##
credit-agricole.fr: ### FIN REUNION##
credit-agricole.fr: ## DEBUT SUD RHONE ALPES ##
credit-agricole.fr: ### FIN SUD RHONE ALPES##
rahavard365.com: # We use a proprietary dashboard management system for our operations
tgju.org: # Allow all files ending with these extensions
ziprecruiter.com: # Block URLs that are likely added by js clipboard library
ziprecruiter.com: # Block temporary pages of the go seo app
seek.com.au: # Robots.txt file for www.seek.com.au
seek.com.au: # URLs are case sensitive!
seek.com.au: # All other agents will not spider
seek.com.au: # LinkedIn Bot
seek.com.au: # Google Ad Sense
seek.com.au: # Bing Ads
dostor.org: # SYNC 2019
dostor.org: # HTTPS www.dostor.org
iobit.com: # Robots.txt Begin
gamespot.com: # robots.txt for https://www.gamespot.com/
bbb.org: #block all user agents from the following
rsafrwd.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
rsafrwd.com: #content{margin:0 0 0 2%;position:relative;}
net-a-porter.com: #comment line to mitigate potential BOM encoding issues
net-a-porter.com: #new rules
net-a-porter.com: #legacy local sites
net-a-porter.com: #new local sites
sec.gov: #
sec.gov: # robots.txt
sec.gov: #
sec.gov: # This file is to prevent the crawling and indexing of certain parts
sec.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
sec.gov: # and Google. By telling these "robots" where not to go on your site,
sec.gov: # you save bandwidth and server resources.
sec.gov: #
sec.gov: # This file will be ignored unless it is at the root of your host:
sec.gov: # Used: http://example.com/robots.txt
sec.gov: # Ignored: http://example.com/site/robots.txt
sec.gov: #
sec.gov: # For more information about the robots.txt standard, see:
sec.gov: # http://www.robotstxt.org/robotstxt.html
sec.gov: # CSS, JS, Images
sec.gov: # Directories
sec.gov: # Files
sec.gov: # Paths (clean URLs)
sec.gov: #Commented out to support SEC.gov Site Index
sec.gov: # Disallow: /search/
sec.gov: # Paths (no clean URLs)
sec.gov: #SEC
sec.gov: #INVESTOR
google.be: # AdsBot
google.be: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
instacart.com: ### Any bot scraping or crawling this site must abide by
instacart.com: ### Instacart's Terms of Service https://www.instacart.com/terms
instacart.com: ### `-:///:-.
instacart.com: ### /ssssssssso/.
instacart.com: ### :sssssssssssss/
instacart.com: ### +ssssssssssssss`
instacart.com: ### +ssssssssssssso
instacart.com: ### :sssssssssssso`
instacart.com: ### `ossssssssss/`
instacart.com: ### :ssssssss+. `.-.`
instacart.com: ### /sssss+. `:osssso/
instacart.com: ### .:///::-` :+/-` `/ossssssss/
instacart.com: ### `-/+++++++++/:. :osssssssssss`
instacart.com: ### -/++++++++++++++/. `ossssssssssso
instacart.com: ### `/++++++++++++++++++/` .-/++ooso+/`
instacart.com: ### -++++++++++++++++++++++-
instacart.com: ### `/++++++++++++++++++++++++-
instacart.com: ### ./++++++++++++++++++++++++++`
instacart.com: ### -++++++++++++++++++++++++++++.
instacart.com: ### :++++++++++++++++++++++++++++/`
instacart.com: ### :++++++++++++++++++++++++++++:`
instacart.com: ### :+++++++++++++++++++++++++++:`
instacart.com: ### :+++++++++++++++++++++++++/-`
instacart.com: ### :++++++++++++++++++++++++:.
instacart.com: ### :+++++++++++++++++++++/:.
instacart.com: ### -+++++++++++++++++++/:.
instacart.com: ### .+++++++++++++++++:-`
instacart.com: ### `/+++++++++++++/-.
instacart.com: ### :+++++++++/:-.
instacart.com: ### +++++/:-.`
instacart.com: ### `..`
instacart.com: ### If you're not a bot, we're hiring: https://instacart.careers/current-openings/
teamviewer.com: #Valid for all user agents
teamviewer.com: #Disallow Global Website#
teamviewer.com: #Disallow WP#
teamviewer.com: #Allow Exceptions for images, scripts, pdfs#
teamviewer.com: #Sitemaps#
teamviewer.com: #Changed on 2018-12-11 SeSi#
kajabi.com: # _ __ _ _ _
kajabi.com: # | | / / (_) | | (_)
kajabi.com: # | |/ / __ _ _ __ _| |__ _
kajabi.com: # | \ / _` | |/ _` | '_ \| |
kajabi.com: # | |\ \ (_| | | (_| | |_) | |
kajabi.com: # \_| \_/\__,_| |\__,_|_.__/|_|
kajabi.com: # _/ |
kajabi.com: # |__/
oxu.az: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
oxu.az: #
oxu.az: # To ban all spiders from the entire site uncomment the next two lines:
oxu.az: #User-agent: *
oxu.az: #Crawl-delay: 1
staples.com: #Last Modified On 2020-06-17T15:11:23.706Z
staples.com: # allow all robots
staples.com: #block crawling of unexposed pages
staples.com: #2/4/20 remove addt'l 404s
staples.com: #12/06/19 remove problematic 404s
staples.com: #Regular entries
staples.com: #Patterns to disallow for BOPiS
staples.com: #06/25/2014- Additional patterns to remove from indexes
staples.com: #06/11/20 Updated Patterns to disallow for PNI
dawn.com: # test tool
dawn.com: # https://www.google.com/webmasters/tools/ (Crawl > Blocked URLs)
novinky.cz: # dont crawl pagination on article pages
novinky.cz: # dont crawl pagination on article pages
novinky.cz: # dont crawl the same page with opened menu
novinky.cz: # dont crawl the same page with opened gallery
lanacion.com.ar: # Robots.txt (archivo)
hermes.com: #
hermes.com: # prod_hermes_com_robots.txt
hermes.com: #
hermes.com: # This file is to prevent the crawling and indexing of certain parts
hermes.com: # of your site by web crawlers and spiders run by sites like Yahoo!
hermes.com: # and Google. By telling these "robots" where not to go on your site,
hermes.com: # you save bandwidth and server resources.
hermes.com: #
hermes.com: # This file will be ignored unless it is at the root of your host:
hermes.com: # Used: http://example.com/robots.txt
hermes.com: # Ignored: http://example.com/site/robots.txt
hermes.com: #
hermes.com: # For more information about the robots.txt standard, see:
hermes.com: # http://www.robotstxt.org/robotstxt.html
hermes.com: # For Drupal folders and files, wildcards used for directories 1) country 2) language
hermes.com: # Directories
hermes.com: # Files
hermes.com: # Paths (clean URLs)
hermes.com: # Paths (no clean URLs)
hermes.com: # added regarding the actual site structure
hermes.com: # disallow search URL to be indexed
hermes.com: # For Magento folders and files, wildcards used for directories 1) country 2) language
hermes.com: # Directories
hermes.com: # Paths (clean URLs)
hermes.com: # Files
hermes.com: # Paths (no clean URLs)
hermes.com: # Waiting bugfix
hermes.com: #Params
hermes.com: # For China only
hermes.com: # All sitemaps listed below :
ebay.it: ## BEGIN FILE ###
ebay.it: #
ebay.it: # allow-all
ebay.it: # DR
ebay.it: #
ebay.it: # The use of robots or other automated means to access the eBay site
ebay.it: # without the express permission of eBay is strictly prohibited.
ebay.it: # Notwithstanding the foregoing, eBay may permit automated access to
ebay.it: # access certain eBay pages but soley for the limited purpose of
ebay.it: # including content in publicly available search engines. Any other
ebay.it: # use of robots or failure to obey the robots exclusion standards set
ebay.it: # forth at <https://www.robotstxt.org/orig.html> is strictly
ebay.it: # prohibited.
ebay.it: #
ebay.it: # v10_ROW_Feb_2021
ebay.it: ### DIRECTIVES ###
ebay.it: # VIS Sitemaps
ebay.it: # PRP Sitemaps
ebay.it: # CLP Sitemaps
ebay.it: # BROWSE Sitemaps
ebay.it: ### END FILE ###
abc.net.au: # robots.txt for http://www.abc.net.au/ -- ABC Online
abc.net.au: #OPSSD-340 2015/5/5
abc.net.au: #INNG-46: 2014-12-30
abc.net.au: # Added for corporate communications, as they have migrated to a new site
abc.net.au: # Added for Homepage Beta, prevent indexing during public beta
abc.net.au: # Added for WCMS Tennent testing, not a public
abc.net.au: ########################################
gtmetrix.com: # GTmetrix robots.txt file
google.nl: # AdsBot
google.nl: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
jkforum.net: #
jkforum.net: # robots.txt for Discuz! X2
jkforum.net: #
wiktionary.org: #
wiktionary.org: # Please note: There are a lot of pages on this site, and there are
wiktionary.org: # some misbehaved spiders out there that go _way_ too fast. If you're
wiktionary.org: # irresponsible, your access to the site may be blocked.
wiktionary.org: #
wiktionary.org: # Observed spamming large amounts of https://en.wikipedia.org/?curid=NNNNNN
wiktionary.org: # and ignoring 429 ratelimit responses, claims to respect robots:
wiktionary.org: # http://mj12bot.com/
wiktionary.org: # advertising-related bots:
wiktionary.org: # Wikipedia work bots:
wiktionary.org: # Crawlers that are kind enough to obey, but which we'd rather not have
wiktionary.org: # unless they're feeding search engines.
wiktionary.org: # Some bots are known to be trouble, particularly those designed to copy
wiktionary.org: # entire sites. Please obey robots.txt.
wiktionary.org: # Misbehaving: requests much too fast:
wiktionary.org: #
wiktionary.org: # Sorry, wget in its recursive mode is a frequent problem.
wiktionary.org: # Please read the man page and use it properly; there is a
wiktionary.org: # --wait option you can use to set the delay between hits,
wiktionary.org: # for instance.
wiktionary.org: #
wiktionary.org: #
wiktionary.org: # The 'grub' distributed client has been *very* poorly behaved.
wiktionary.org: #
wiktionary.org: #
wiktionary.org: # Doesn't follow robots.txt anyway, but...
wiktionary.org: #
wiktionary.org: #
wiktionary.org: # Hits many times per second, not acceptable
wiktionary.org: # http://www.nameprotect.com/botinfo.html
wiktionary.org: # A capture bot, downloads gazillions of pages with no public benefit
wiktionary.org: # http://www.webreaper.net/
wiktionary.org: #
wiktionary.org: # Friendly, low-speed bots are welcome viewing article pages, but not
wiktionary.org: # dynamically-generated pages please.
wiktionary.org: #
wiktionary.org: # Inktomi's "Slurp" can read a minimum delay between hits; if your
wiktionary.org: # bot supports such a thing using the 'Crawl-delay' or another
wiktionary.org: # instruction, please let us know.
wiktionary.org: #
wiktionary.org: # There is a special exception for API mobileview to allow dynamic
wiktionary.org: # mobile web & app views to load section content.
wiktionary.org: # These views aren't HTTP-cached but use parser cache aggressively
wiktionary.org: # and don't expose special: pages etc.
wiktionary.org: #
wiktionary.org: # Another exception is for REST API documentation, located at
wiktionary.org: # /api/rest_v1/?doc.
wiktionary.org: #
wiktionary.org: #
wiktionary.org: # ar:
wiktionary.org: #
wiktionary.org: # dewiki:
wiktionary.org: # T6937
wiktionary.org: # sensible deletion and meta user discussion pages:
wiktionary.org: # 4937#5
wiktionary.org: # T14111
wiktionary.org: # T15961
wiktionary.org: #
wiktionary.org: # enwiki:
wiktionary.org: # Folks get annoyed when VfD discussions end up the number 1 google hit for
wiktionary.org: # their name. See T6776
wiktionary.org: # T15398
wiktionary.org: # T16075
wiktionary.org: # T13261
wiktionary.org: # T12288
wiktionary.org: # T16793
wiktionary.org: #
wiktionary.org: # eswiki:
wiktionary.org: # T8746
wiktionary.org: #
wiktionary.org: # fiwiki:
wiktionary.org: # T10695
wiktionary.org: #
wiktionary.org: # hewiki:
wiktionary.org: #T11517
wiktionary.org: #
wiktionary.org: # huwiki:
wiktionary.org: #
wiktionary.org: # itwiki:
wiktionary.org: # T7545
wiktionary.org: #
wiktionary.org: # jawiki
wiktionary.org: # T7239
wiktionary.org: # nowiki
wiktionary.org: # T13432
wiktionary.org: #
wiktionary.org: # plwiki
wiktionary.org: # T10067
wiktionary.org: #
wiktionary.org: # ptwiki:
wiktionary.org: # T7394
wiktionary.org: #
wiktionary.org: # rowiki:
wiktionary.org: # T14546
wiktionary.org: #
wiktionary.org: # ruwiki:
wiktionary.org: #
wiktionary.org: # svwiki:
wiktionary.org: # T12229
wiktionary.org: # T13291
wiktionary.org: #
wiktionary.org: # zhwiki:
wiktionary.org: # T7104
wiktionary.org: #
wiktionary.org: # sister projects
wiktionary.org: #
wiktionary.org: # enwikinews:
wiktionary.org: # T7340
wiktionary.org: #
wiktionary.org: # itwikinews
wiktionary.org: # T11138
wiktionary.org: #
wiktionary.org: # enwikiquote:
wiktionary.org: # T17095
wiktionary.org: #
wiktionary.org: # enwikibooks
wiktionary.org: #
wiktionary.org: # working...
wiktionary.org: #
wiktionary.org: #
wiktionary.org: #
wiktionary.org: #----------------------------------------------------------#
wiktionary.org: #
wiktionary.org: #
wiktionary.org: #
who.int: ### Version Information #
who.int: ###################################################
who.int: ### Version: V3.2018.05.828
who.int: ### Updated: Tue May 8 11:37:04 SAST 2018
who.int: ### Bad Bot Count: 527
who.int: ###################################################
who.int: ### Version Information ##
myfitnesspal.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
agoda.com: # ( (
agoda.com: # )\ ( ( )\ ) )
agoda.com: # ((((_)( )\))( ( (()/( ( /(
agoda.com: # )\ _ )\((_))\ )\ ((_)))(_))
agoda.com: # (_)_\(_)(()(_)((_) _| |((_)_
agoda.com: # / _ \ / _` |/ _ \/ _` |/ _` |
agoda.com: # /_/ \_\\__, |\___/\__,_|\__,_|
agoda.com: # |___/
agoda.com: #
agoda.com: #
agoda.com: # If you like bots this much, then why not help us rank for all the things. Email seoPros@agoda.com
agoda.com: #
agoda.com: #
elmundo.es: # version 0.0.1
elmundo.es: # Bloqueo de bots y crawlers poco utiles
n-tv.de: # robots.txt for n-tv.de
mrporter.com: #comment line to mitigate potential BOM encoding issues
mrporter.com: #new rules
mrporter.com: #local sites
cookpad.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
almaghreb24.com: # XML Sitemap & Google News version 5.2.3 - https://status301.net/wordpress-plugins/xml-sitemap-feed/
airbnb.com: # ///////
airbnb.com: # // //
airbnb.com: # // //
airbnb.com: # // // /// /// ///
airbnb.com: # // // /// ///
airbnb.com: # // /// // //// /// /// /// //// /// //// /// //// /// ////
airbnb.com: # // /// /// // ////////// /// ////////// /////////// ////////// ///////////
airbnb.com: # // // // // /// /// /// /// /// /// /// /// /// ///
airbnb.com: # // // // // /// /// /// /// /// /// /// /// /// ///
airbnb.com: # // // // // /// /// /// /// /// /// /// /// /// ///
airbnb.com: # // // // // ////////// /// /// ////////// /// /// //////////
airbnb.com: # // ///// //
airbnb.com: # // ///// //
airbnb.com: # // /// /// //
airbnb.com: # ////// //////
airbnb.com: #
airbnb.com: #
airbnb.com: # We thought you'd never make it!
airbnb.com: # We hope you feel right at home in this file...unless you're a disallowed subfolder.
airbnb.com: # And since you're here, read up on our culture and team: https://www.airbnb.com/careers/departments/engineering
airbnb.com: # There's even a bring your robot to work day.
moneyforward.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to
moneyforward.com: # use the robots.txt file
moneyforward.com: #
moneyforward.com: # To ban all spiders from the entire site uncomment the next two lines:
sabq.org: # www.robotstxt.org/
sabq.org: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
kotak.com: #contxt{width:100%}
brainly.in: #Brainly Robots.txt 31.07.2017
brainly.in: # Disallow Marketing bots
brainly.in: #Disallow exotic search engine crawlers
brainly.in: #Disallow other crawlers
brainly.in: # Good bots whitelisting:
brainly.in: #Other bots
brainly.in: #Neticle Crawler v1.0 ( http://bot.neticle.hu/ ) https://bot.neticle.hu/ - brand monitoring
brainly.in: #Mega https://megaindex.com/crawler - link indexer tool (supports directives in user-agent:*)
brainly.in: #Obot - IBM X-Force service
brainly.in: #SafeDNSBot (https://www.safedns.com/searchbot)
5acbd.com: # Robots.txt file from http://www.5acbd.com
5acbd.com: # All robots will spider the domain1
youth.cn: # robots.txt for youth.cn
google.pt: # AdsBot
google.pt: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
thehindu.com: # Blocked until duplicate profile bylines fixed
benzinga.com: #
benzinga.com: # robots.txt
benzinga.com: #
benzinga.com: # This file is to prevent the crawling and indexing of certain parts
benzinga.com: # of your site by web crawlers and spiders run by sites like Yahoo!
benzinga.com: # and Google. By telling these "robots" where not to go on your site,
benzinga.com: # you save bandwidth and server resources.
benzinga.com: #
benzinga.com: # This file will be ignored unless it is at the root of your host:
benzinga.com: # Used: http://example.com/robots.txt
benzinga.com: # Ignored: http://example.com/site/robots.txt
benzinga.com: #
benzinga.com: # For more information about the robots.txt standard, see:
benzinga.com: # http://www.robotstxt.org/robotstxt.html
benzinga.com: #
benzinga.com: # For syntax checking, see:
benzinga.com: # http://www.frobee.com/robots-txt-check
benzinga.com: # Directories
benzinga.com: # Files
benzinga.com: # Paths (clean URLs)
benzinga.com: # Paths (no clean URLs)
google.at: # AdsBot
google.at: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
brainly.com: #Brainly Robots.txt 31.07.2017
brainly.com: # Disallow Marketing bots
brainly.com: #Disallow exotic search engine crawlers
brainly.com: #Disallow other crawlers
brainly.com: # Good bots whitelisting:
brainly.com: #Other bots
brainly.com: #Neticle Crawler v1.0 ( http://bot.neticle.hu/ ) https://bot.neticle.hu/ - brand monitoring
brainly.com: #Mega https://megaindex.com/crawler - link indexer tool (supports directives in user-agent:*)
brainly.com: #Obot - IBM X-Force service
brainly.com: #SafeDNSBot (https://www.safedns.com/searchbot)
gettyimages.com: # AhrefsBot
kayak.com: #Build version: R555b
kayak.com: #Generated on: Wed Feb 24 01:00:01 EST 2021
namecheap.com: # parameters
namecheap.com: # Sitemap link
cbsnews.com: # www.robotstxt.org/
cbsnews.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
cbsnews.com: # PER CBS-N ENG FINAL ROUTES DOC
cfsbcn.com: # Robots For CFSBCN.CN
tarafdari.com: #
tarafdari.com: # robots.txt
tarafdari.com: #
tarafdari.com: # This file is to prevent the crawling and indexing of certain parts
tarafdari.com: # of your site by web crawlers and spiders run by sites like Yahoo!
tarafdari.com: # and Google. By telling these "robots" where not to go on your site,
tarafdari.com: # you save bandwidth and server resources.
tarafdari.com: #
tarafdari.com: # This file will be ignored unless it is at the root of your host:
tarafdari.com: # Used: http://example.com/robots.txt
tarafdari.com: # Ignored: http://example.com/site/robots.txt
tarafdari.com: #
tarafdari.com: # For more information about the robots.txt standard, see:
tarafdari.com: # http://www.robotstxt.org/robotstxt.html
tarafdari.com: # CSS, JS, Images
tarafdari.com: # Directories
tarafdari.com: # Files
tarafdari.com: # Paths (clean URLs)
tarafdari.com: # Paths (no clean URLs)
lotterypost.com: # robots.txt for https://www.lotterypost.com/
52pojie.cn: #
52pojie.cn: # robots.txt for Discuz! X3.2
52pojie.cn: #
usaa.com: # robots.txt - for USAA
usaa.com: # updated 2/22/2021
usaa.com: # served from ns
dikaiologitika.gr: # If the Joomla site is installed within a folder
dikaiologitika.gr: # eg www.example.com/joomla/ then the robots.txt file
dikaiologitika.gr: # MUST be moved to the site root
dikaiologitika.gr: # eg www.example.com/robots.txt
dikaiologitika.gr: # AND the joomla folder name MUST be prefixed to all of the
dikaiologitika.gr: # paths.
dikaiologitika.gr: # eg the Disallow rule for the /administrator/ folder MUST
dikaiologitika.gr: # be changed to read
dikaiologitika.gr: # Disallow: /joomla/administrator/
dikaiologitika.gr: #
dikaiologitika.gr: # For more information about the robots.txt standard, see:
dikaiologitika.gr: # https://www.robotstxt.org/orig.html
brainly.lat: #Brainly Robots.txt 31.07.2017
brainly.lat: # Disallow Marketing bots
brainly.lat: #Disallow exotic search engine crawlers
brainly.lat: #Disallow other crawlers
brainly.lat: # Good bots whitelisting:
brainly.lat: #Other bots
brainly.lat: #Neticle Crawler v1.0 ( http://bot.neticle.hu/ ) https://bot.neticle.hu/ - brand monitoring
brainly.lat: #Mega https://megaindex.com/crawler - link indexer tool (supports directives in user-agent:*)
brainly.lat: #Obot - IBM X-Force service
brainly.lat: #SafeDNSBot (https://www.safedns.com/searchbot)
sap.com: #
sap.com: # Welcome to www.sap.com
sap.com: #
sap.com: # robots.txt for https://www.sap.com
sap.com: #
sap.com: # Version 2021-01-20
sap.com: #
vitalsource.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
google.hu: # AdsBot
google.hu: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
sephora.com: ########################################################
sephora.com: #
sephora.com: # Sephora.com Robots File
sephora.com: #
sephora.com: ########################################################
mcafee.com: # directory exclusion used for mcafee.com
mcafee.com: #
mcafee.com: #
mcafee.com: ##########################################################
mcafee.com: #Consumer Sitemap Starts
mcafee.com: ##########################################################
mcafee.com: ##########################################################
mcafee.com: #Consumer Sitemap Ends
mcafee.com: ##########################################################
mcafee.com: ##########################################################
mcafee.com: #Enterprise Starts
mcafee.com: ##########################################################
mcafee.com: #
mcafee.com: # Disallow US expired files here (while waiting for regional links to the page to be removed)
mcafee.com: # Disallow: /us/path/file.ext
mcafee.com: #
mcafee.com: #
mcafee.com: # Disallow no_crawl folder
mcafee.com: # Disallow: /no_crawl/
mcafee.com: ##########################################################
mcafee.com: #Consumer
mcafee.com: ##########################################################
mcafee.com: # 2020-05-31T11:52:52.760
mcafee.com: ##########################################################
mcafee.com: # /Consumer
mcafee.com: ##########################################################
ebay.ca: ## BEGIN FILE ###
ebay.ca: #
ebay.ca: # allow-all
ebay.ca: # DR
ebay.ca: #
ebay.ca: # The use of robots or other automated means to access the eBay site
ebay.ca: # without the express permission of eBay is strictly prohibited.
ebay.ca: # Notwithstanding the foregoing, eBay may permit automated access to
ebay.ca: # access certain eBay pages but soley for the limited purpose of
ebay.ca: # including content in publicly available search engines. Any other
ebay.ca: # use of robots or failure to obey the robots exclusion standards set
ebay.ca: # forth at <https://www.robotstxt.org/orig.html> is strictly
ebay.ca: # prohibited.
ebay.ca: #
ebay.ca: # v10_ROW_Feb_2021
ebay.ca: ### DIRECTIVES ###
ebay.ca: # SSRP Sitemaps
ebay.ca: # VIS Sitemaps
ebay.ca: # PRP Sitemaps
ebay.ca: ### END FILE ###
nbcsports.com: #
nbcsports.com: # robots.txt
nbcsports.com: #
nbcsports.com: # This file is to prevent the crawling and indexing of certain parts
nbcsports.com: # of your site by web crawlers and spiders run by sites like Yahoo!
nbcsports.com: # and Google. By telling these "robots" where not to go on your site,
nbcsports.com: # you save bandwidth and server resources.
nbcsports.com: #
nbcsports.com: # This file will be ignored unless it is at the root of your host:
nbcsports.com: # Used: http://example.com/robots.txt
nbcsports.com: # Ignored: http://example.com/site/robots.txt
nbcsports.com: #
nbcsports.com: # For more information about the robots.txt standard, see:
nbcsports.com: # http://www.robotstxt.org/robotstxt.html
nbcsports.com: # JS/CSS
nbcsports.com: # Directories
nbcsports.com: # Files
nbcsports.com: # Paths (clean URLs)
nbcsports.com: # Paths (no clean URLs)
nbcsports.com: # Sitemaps
cleartax.in: # www.robotstxt.org/
cleartax.in: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
fool.com: # $Revision$ 8.22.19
fool.com: # /robots.txt file for http://www.fool.com/ (prod)
fool.com: # Web Application Stress Tool
fool.com: # MauiBot
fool.com: # else
uca.fr: # urls techniques :
semrush.com: # Community rules
semrush.com: # Features new pages
semrush.com: #webinars
semrush.com: #landing
semrush.com: #academy
semrush.com: # Sitemap files
rtl-theme.com: # Google Image
stripchat.com: # generated automatically
iefimerida.gr: #
iefimerida.gr: # robots.txt
iefimerida.gr: #
iefimerida.gr: # This file is to prevent the crawling and indexing of certain parts
iefimerida.gr: # of your site by web crawlers and spiders run by sites like Yahoo!
iefimerida.gr: # and Google. By telling these "robots" where not to go on your site,
iefimerida.gr: # you save bandwidth and server resources.
iefimerida.gr: #
iefimerida.gr: # This file will be ignored unless it is at the root of your host:
iefimerida.gr: # Used: http://example.com/robots.txt
iefimerida.gr: # Ignored: http://example.com/site/robots.txt
iefimerida.gr: #
iefimerida.gr: # For more information about the robots.txt standard, see:
iefimerida.gr: # http://www.robotstxt.org/robotstxt.html
iefimerida.gr: # CSS, JS, Images
iefimerida.gr: # Directories
iefimerida.gr: # Files
iefimerida.gr: # Paths (clean URLs)
iefimerida.gr: # Paths (no clean URLs)
ed.gov: #
ed.gov: # robots.txt
ed.gov: #
ed.gov: # This file is to prevent the crawling and indexing of certain parts
ed.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
ed.gov: # and Google. By telling these "robots" where not to go on your site,
ed.gov: # you save bandwidth and server resources.
ed.gov: #
ed.gov: # This file will be ignored unless it is at the root of your host:
ed.gov: # Used: http://example.com/robots.txt
ed.gov: # Ignored: http://example.com/site/robots.txt
ed.gov: #
ed.gov: # For more information about the robots.txt standard, see:
ed.gov: # http://www.robotstxt.org/robotstxt.html
ed.gov: # CSS, JS, Images
ed.gov: # Directories
ed.gov: # Files
ed.gov: # Paths (clean URLs)
ed.gov: # Paths (no clean URLs)
ally.com: # robots.txt for http://www.ally.com
mobile.de: ###robots.txt www.mobile.de###
mobile.de: ###robots.txt END###
garmin.com: # Allow all agents to get all stuff
garmin.com: # ...except this stuff...
garmin.com: # pointless without POSTed form data:
garmin.com: # not for the general public:
xspdf.com: # Disallow: Sistrix
xspdf.com: # Disallow: Sistrix
xspdf.com: # Disallow: Sistrix
xspdf.com: # Disallow: SEOkicks-Robot
xspdf.com: # Disallow: jobs.de-Robot
xspdf.com: # Backlink Analysis
xspdf.com: # Bot der Leipziger Unister Holding GmbH
xspdf.com: # http://www.opensiteexplorer.org/dotbot
xspdf.com: # http://www.searchmetrics.com
xspdf.com: # http://www.majestic12.co.uk/projects/dsearch/mj12bot.php
xspdf.com: # http://www.domaintools.com/webmasters/surveybot.php
xspdf.com: # http://www.seodiver.com/bot
xspdf.com: # http://openlinkprofiler.org/bot
xspdf.com: # http://www.wotbox.com/bot/
xspdf.com: # http://www.meanpath.com/meanpathbot.html
xspdf.com: # http://www.backlinktest.com/crawler.html
xspdf.com: # http://www.brandwatch.com/magpie-crawler/
xspdf.com: # http://filterdb.iss.net/crawler/
xspdf.com: # http://webmeup-crawler.com
xspdf.com: # https://megaindex.com/crawler
xspdf.com: # http://www.cloudservermarket.com
xspdf.com: # http://www.trendiction.de/de/publisher/bot
xspdf.com: # http://www.exalead.com
xspdf.com: # http://www.career-x.de/bot.html
xspdf.com: # https://www.lipperhey.com/en/about/
xspdf.com: # https://www.lipperhey.com/en/about/
xspdf.com: # https://turnitin.com/robot/crawlerinfo.html
xspdf.com: # http://help.coccoc.com/
xspdf.com: # ubermetrics-technologies.com
xspdf.com: # datenbutler.de
xspdf.com: # http://searchgears.de/uber-uns/crawling-faq.html
xspdf.com: # http://commoncrawl.org/faq/
xspdf.com: # https://www.qwant.com/
xspdf.com: # http://linkfluence.net/
xspdf.com: # http://www.botje.com/plukkie.htm
xspdf.com: # https://www.safedns.com/searchbot
xspdf.com: # http://www.haosou.com/help/help_3_2.html
xspdf.com: # http://www.haosou.com/help/help_3_2.html
xspdf.com: # http://www.moz.com/dp/rogerbot
xspdf.com: # http://www.openhose.org/bot.html
xspdf.com: # http://www.screamingfrog.co.uk/seo-spider/
xspdf.com: # http://thumbsniper.com
xspdf.com: # http://www.radian6.com/crawler
xspdf.com: # http://cliqz.com/company/cliqzbot
xspdf.com: # https://www.aihitdata.com/about
xspdf.com: # http://www.trendiction.com/en/publisher/bot
xspdf.com: # http://seocompany.store
xspdf.com: # https://github.com/yasserg/crawler4j/
xspdf.com: # http://warebay.com/bot.html
xspdf.com: # http://www.website-datenbank.de/
xspdf.com: # http://law.di.unimi.it/BUbiNG.html
xspdf.com: # http://www.linguee.com/bot; bot@linguee.com
xspdf.com: # https://www.semrush.com/bot/
xspdf.com: # www.sentibot.eu
xspdf.com: # http://velen.io
xspdf.com: # https://moz.com/help/guides/moz-procedures/what-is-rogerbot
xspdf.com: # http://www.garlik.com
xspdf.com: # https://www.gosign.de/typo3-extension/typo3-sicherheitsmonitor/
xspdf.com: # http://www.siteliner.com/bot
xspdf.com: # https://sabsim.com
xspdf.com: # http://ltx71.com/
ft.com: # all use of FT content is subject to the Terms & Conditions and Copyright Policy set out on FT.com
groupon.com: # Hi there,
groupon.com: # Now that you're checking out our robots.txt file, and you clearly aren't a robot, you must be interested in Groupon's SEO.
groupon.com: # We just happen to be growing our SEO Team with experienced white-hat SEOs like yourself. So run - don't crawl - and fill out an application today.
groupon.com: # Visit https://jobs.groupon.com/search?keywords=seo
groupon.com: # GSM: https://www.groupon.com
groupon.com: # Jira SEO-11777
24h.com.vn: #User-agent: *
24h.com.vn: #Disallow: /
alsbbora.info: # WebMatrix 1.0
shutterfly.com: # Tells Scanning Robots Where They Are And Are Not Welcome
shutterfly.com: #
shutterfly.com: # User-agent: can also specify by name; "*" is for everyone
shutterfly.com: # Disallow: disallow if this matches first part of requested path
shutterfly.com: #
shutterfly.com: # Disable click for prints
shutterfly.com: # disable creation path crawling
shutterfly.com: # do not allow shares to be indexed
domaintools.com: # Notice: if you would like to crawl DomainTools you can
domaintools.com: # contact us here: https://www.domaintools.com/contact/
domaintools.com: # to apply for white listing.
domaintools.com: # Moz
avg.com: #Nothing interesting to see here, but if you want free antivirus
avg.com: #click here: https://www.avg.com/free-antivirus-download
jumia.com.ng: # Public site
jumia.com.ng: # bot must follow this rules
jumia.com.ng: # Site scaping is permited IF the user-agent is clearly identify it as a bot and
jumia.com.ng: # the bot owner and is using less than 200 request per minute
jumia.com.ng: # Bot identification must have a owner url or contact if we need to contact them
jumia.com.ng: # Bots with fake user-agent will be blocked
jumia.com.ng: # Bots trying to use too many IPs to increase performance may also be blocked.
jumia.com.ng: # If you need more than 200 RPM, please contact the email techops at jumia com
jumia.com.ng: #
jumia.com.ng: # Sitemap files
jumia.com.ng: # multiple brand selectors
jumia.com.ng: # facets
jumia.com.ng: # site search
jumia.com.ng: # paths
jumia.com.ng: #Allow access to product specifications and ratings
jumia.com.ng: #Jumia global bot control
jumia.com.ng: #Bypass "--" Rule
jumia.com.ng: #Block Crawling of CSB pages
jumia.com.ng: #Block MLP folders
inc.com: #Disallow robots
inc.com: # Adsense
inc.com: # Blekko
inc.com: # CommonCrawl
pdf2go.com: # www.robotstxt.org/
pdf2go.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
bseindia.com: # robots.txt for https://www.bseindia.com/
ck12.org: ## Allow UGC 1.x FlexBooks
ck12.org: ## Disallow following patterns
ck12.org: # disallow really old image urls that no longer make sense
google.ae: # AdsBot
google.ae: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
atlassian.com: # Disallow individual pages
atlassian.com: # Sitemap for Blog
sagepub.com: #
sagepub.com: # robots.txt
sagepub.com: #
sagepub.com: # This file is to prevent the crawling and indexing of certain parts
sagepub.com: # of your site by web crawlers and spiders run by sites like Yahoo!
sagepub.com: # and Google. By telling these "robots" where not to go on your site,
sagepub.com: # you save bandwidth and server resources.
sagepub.com: #
sagepub.com: # This file will be ignored unless it is at the root of your host:
sagepub.com: # Used: http://example.com/robots.txt
sagepub.com: # Ignored: http://example.com/site/robots.txt
sagepub.com: #
sagepub.com: # For more information about the robots.txt standard, see:
sagepub.com: # http://www.robotstxt.org/robotstxt.html
sagepub.com: # CSS, JS, Images
sagepub.com: # Directories
sagepub.com: # Files
sagepub.com: # Paths (clean URLs)
sagepub.com: # Paths (no clean URLs)
sagepub.com: #
sagepub.com: # Huawei PetalBot causes site loading issues
kohls.com: # Modified 2/17/21 by Steve Walsh
kohls.com: # Modified 1/21/20 by Gwenn R.
kohls.com: # Modified 11/13/20 by Gwenn R for sustainable PDP test.
kohls.com: # Modified 5/13/20 by Gwenn Reinhart to keep pick-up pass pages out of the index
kohls.com: # Modified 4/2/20 by Gwenn Reinhart to keep stand-alone video pages out of the index
kohls.com: # Modified by Alissa Steingraber 10/21/19. Added s= back into the file because they were flooding the index
kohls.com: # Blocking for temporary truncated catalog URLs
kohls.com: # Disallow: /catalog/catalog.jsp
kohls.com: # Exclude all Print passes as of 11/14/16
kohls.com: # Blocking bots from "tell a friend" email feature
kohls.com: # This page is a test that may not go live year-round
kohls.com: # Disallows as part of a test to see how it affects a similar page.
kohls.com: # Note that these target the URL path without including the query string;
kohls.com: # I couldn't get the Search Console tester to match the URL properly when
kohls.com: # I included the query string, which has something to do with the space
kohls.com: # in it.
kohls.com: # Added 1/9/17 via request by Sara Billmyer
kohls.com: # Attempting to keep these URLs de-indexed
kohls.com: # Second home page, part of personalization experiment
kohls.com: # These are beginning to show up in crawls
infusionsoft.com: # Tell Moz to take off.
mathxl.com: # Sosospider - China
mathxl.com: #Sosospider/2.0 - China
mathxl.com: #Login Pages
mathxl.com: # Global Product
samsclub.com: # robots.txt generated for samsclub.com
samsclub.com: #Paths
samsclub.com: #Files
samsclub.com: #Sitemap
adverdirect.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
adverdirect.com: #content{margin:0 0 0 2%;position:relative;}
cars.com: #Individual Review Crawl Control
cars.com: # DR
figma.com: # robots.txt
figma.com: # Handbook of Robotics, 56th Edition, 2058 A.D.
usmagazine.com: # Sitemap archive
nvidia.com: # Welcome to NVIDIA
nvidia.com: # We like people who read our code!
nvidia.com: # Cruise by our careers section while you're here
nvidia.com: # https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite
nvidia.com: # Or check out our YouTube channel for our latest
nvidia.com: # https://www.youtube.com/user/nvidia
nvidia.com: # Last updated 16th FEB 2021
time.com: # exclude urls of format time.com/page/7456/?search
time.com: # NextAdvisor Sitemap
time.com: # Sitemap archive
time.com: # Video Sitemap archive
bandcamp.com: # the currency data endpoint is required to render pages
bandcamp.com: # pattern matching known to work only with Google and Yahoo
bandcamp.com: # badly-behaving bots
bandcamp.com: # unwanted bots
aloyoga.com: # we use Shopify as our ecommerce platform
aloyoga.com: # Google adsbot ignores robots.txt unless specifically named!
boohoo.com: # Pages
boohoo.com: # Product Filter #
boohoo.com: # Ordering & Product per page #
boohoo.com: # Number of product per page
boohoo.com: # Order By #
boohoo.com: # Price #
boohoo.com: # Faceted Navigation #
boohoo.com: # UK & ALL Search #
boohoo.com: # US Search #
boohoo.com: # AU Search #
boohoo.com: # IE Search #
boohoo.com: # FR Search #
boohoo.com: # Search #
boohoo.com: # Handle Execption for colour/size attribute use as internal link #
boohoo.com: # Additional Rules to handle exception #
boohoo.com: # Ensure no Static Ressources is blocked #
boohoo.com: # Crawl Delay - 5 URL max per second
correios.com.br: # Define access-restrictions for robots/spiders
correios.com.br: # http://www.robotstxt.org/wc/norobots.html
correios.com.br: # By default we allow robots to access all areas of our site
correios.com.br: # already accessible to anonymous users
correios.com.br: # Add Googlebot-specific syntax extension to exclude forms
correios.com.br: # that are repeated for each piece of content in the site
correios.com.br: # the wildcard is only supported by Googlebot
correios.com.br: # http://www.google.com/support/webmasters/bin/answer.py?answer=40367&ctx=sibling
tagesschau.de: # Robots Exclusions for www.tagesschau.de
tagesschau.de: # Sauger wollen wir sperren
auspost.com.au: # auspost.com.au
overleaf.com: # robots.txt for https://www.sharelatex.com/
sakshi.com: #
sakshi.com: # robots.txt
sakshi.com: #
sakshi.com: # This file is to prevent the crawling and indexing of certain parts
sakshi.com: # of your site by web crawlers and spiders run by sites like Yahoo!
sakshi.com: # and Google. By telling these "robots" where not to go on your site,
sakshi.com: # you save bandwidth and server resources.
sakshi.com: #
sakshi.com: # This file will be ignored unless it is at the root of your host:
sakshi.com: # Used: http://example.com/robots.txt
sakshi.com: # Ignored: http://example.com/site/robots.txt
sakshi.com: #
sakshi.com: # For more information about the robots.txt standard, see:
sakshi.com: # http://www.robotstxt.org/robotstxt.html
sakshi.com: # CSS, JS, Images
sakshi.com: # Directories
sakshi.com: # Files
sakshi.com: # Paths (clean URLs)
sakshi.com: # Paths (no clean URLs)
core.ac.uk: # robots.txt for CORE http://core.ac.uk and mirror sites.
core.ac.uk: # We allow access crawlers to access our site, but require unknown crawlers to crawl at a lower frequency. Should you need to crawl at a higher frequency please contact us.
core.ac.uk: # If you need to access or harvest our content, please consider using the CORE API: https://core.ac.uk/services#api instead of crawling the whole website
thehill.com: #
thehill.com: # robots.txt
thehill.com: #
thehill.com: # This file is to prevent the crawling and indexing of certain parts
thehill.com: # of your site by web crawlers and spiders run by sites like Yahoo!
thehill.com: # and Google. By telling these "robots" where not to go on your site,
thehill.com: # you save bandwidth and server resources.
thehill.com: #
thehill.com: # This file will be ignored unless it is at the root of your host:
thehill.com: # Used: http://example.com/robots.txt
thehill.com: # Ignored: http://example.com/site/robots.txt
thehill.com: #
thehill.com: # For more information about the robots.txt standard, see:
thehill.com: # http://www.robotstxt.org/robotstxt.html
thehill.com: # CSS, JS, Images
thehill.com: # Directories
thehill.com: # Files
thehill.com: # Paths (clean URLs)
thehill.com: # Paths (no clean URLs)
businessweekly.com.tw: #
businessweekly.com.tw: # robots.txt for http://www.businessweekly.com.tw/
businessweekly.com.tw: #
businessweekly.com.tw: #PartialView§£¿≥™Ω±µ≥Q∑j¥M®Ï(∑|≥y¶®Missing Title Tags™∫∞TÆß)
shopstyle.com: # Production Robots.txt file
shopstyle.com: # Sitemap
shopstyle.com: # Baidu doesnt support Crawl-delay but added anyways in case they ever do
shopstyle.com: # Allowing checkout experience to be crawlable for google shopping, order doesnt matter it is based on https://developers.google.com/search/reference/robots_txt
topnaz.com: # All Bots
topnaz.com: # Sitemap
olx.com.pk: #Base Filters
olx.com.pk: #Cars Filters
olx.com.pk: #RE Filters
olx.com.pk: # Sitemap
olx.com.pk: # Generated on 2019-12-11T18:12:57.348Z
psychologytoday.com: # Resource Directories
psychologytoday.com: # Static Files
psychologytoday.com: # Static Drupal resources explicitly allowed
psychologytoday.com: # Drupal Paths
psychologytoday.com: #Disallow: /comment/
psychologytoday.com: # Drupal Paths, wildcard prefix
psychologytoday.com: #Disallow: /*/comment/
psychologytoday.com: # Drupal Paths, au prefix
psychologytoday.com: #Disallow: /au/comment/
psychologytoday.com: # Drupal Paths, ca prefix
psychologytoday.com: #Disallow: /ca/comment/
psychologytoday.com: # Drupal Paths, gb prefix
psychologytoday.com: #Disallow: /gb/comment/
psychologytoday.com: # Drupal Paths, intl prefix
psychologytoday.com: #Disallow: /intl/comment/
psychologytoday.com: # Drupal Paths, us prefix
psychologytoday.com: #Disallow: /us/comment/
psychologytoday.com: # Paths (no unclean URLs)
francetvinfo.fr: # KIF-3995: (test) Allow 3 specific ESI
bankrate.com: # directed to all spiders
fnb.co.za: # robots.txt for www.fnb.co.za
prensalibre.com: # Sitemap archive
independent.co.uk: # Files
independent.co.uk: # Paths (clean URLs)
independent.co.uk: # Paths (no clean URLs)
independent.co.uk: # Ignore refresh URLs
autotrader.com: #Disallow: /car-dealers/client/
autotrader.com: #Disallow: /car-payment-calculator
autotrader.com: #Disallow: /car-affordability-calculator
autotrader.com: #Disallow: /car-payment-calculator
autotrader.com: #Disallow: /car-affordability-calculator
anchor.fm: # www.robotstxt.org/
argentina.gob.ar: #
argentina.gob.ar: # robots.txt
argentina.gob.ar: #
argentina.gob.ar: # This file is to prevent the crawling and indexing of certain parts
argentina.gob.ar: # of your site by web crawlers and spiders run by sites like Yahoo!
argentina.gob.ar: # and Google. By telling these "robots" where not to go on your site,
argentina.gob.ar: # you save bandwidth and server resources.
argentina.gob.ar: #
argentina.gob.ar: # This file will be ignored unless it is at the root of your host:
argentina.gob.ar: # Used: http://example.com/robots.txt
argentina.gob.ar: # Ignored: http://example.com/site/robots.txt
argentina.gob.ar: #
argentina.gob.ar: # For more information about the robots.txt standard, see:
argentina.gob.ar: # http://www.robotstxt.org/robotstxt.html
argentina.gob.ar: # CSS, JS, Images
argentina.gob.ar: # Directories
argentina.gob.ar: # Files
argentina.gob.ar: # Paths (clean URLs)
argentina.gob.ar: # Paths (no clean URLs)
programiz.com: #
programiz.com: # robots.txt
programiz.com: #
programiz.com: # This file is to prevent the crawling and indexing of certain parts
programiz.com: # of your site by web crawlers and spiders run by sites like Yahoo!
programiz.com: # and Google. By telling these "robots" where not to go on your site,
programiz.com: # you save bandwidth and server resources.
programiz.com: #
programiz.com: # This file will be ignored unless it is at the root of your host:
programiz.com: # Used: http://example.com/robots.txt
programiz.com: # Ignored: http://example.com/site/robots.txt
programiz.com: #
programiz.com: # For more information about the robots.txt standard, see:
programiz.com: # http://www.robotstxt.org/robotstxt.html
programiz.com: # CSS, JS, Images
programiz.com: # Directories
programiz.com: # Files
programiz.com: # Paths (clean URLs)
programiz.com: # Paths (no clean URLs)
programiz.com: # Disallow: /node
cornell.edu: # SiteImprove should ignore these page particularly because they aren't actually used, but are still linked for historical reasons
egnyte.com: #
egnyte.com: # robots.txt
egnyte.com: #
egnyte.com: # This file is to prevent the crawling and indexing of certain parts
egnyte.com: # of your site by web crawlers and spiders run by sites like Yahoo!
egnyte.com: # and Google. By telling these "robots" where not to go on your site,
egnyte.com: # you save bandwidth and server resources.
egnyte.com: #
egnyte.com: # This file will be ignored unless it is at the root of your host:
egnyte.com: # Used: http://example.com/robots.txt
egnyte.com: # Ignored: http://example.com/site/robots.txt
egnyte.com: #
egnyte.com: # For more information about the robots.txt standard, see:
egnyte.com: # http://www.robotstxt.org/robotstxt.html
egnyte.com: # CSS, JS, Images
egnyte.com: # Directories
egnyte.com: # Files
egnyte.com: # Paths (clean URLs)
egnyte.com: # Paths (no clean URLs)
cvent.com: #
cvent.com: # robots.txt for http://www.cvent.com/
cvent.com: #
cvent.com: # $Id: robots.txt,v 1.00 2003/04/28
cvent.com: #
cvent.com: # exclude all application areas
cvent.com: #event
cvent.com: #emarketing
cvent.com: #csn venue profiles
cvent.com: #destination guide
cvent.com: #microsites
cvent.com: #Destination Guide
sitesell.com: # Do not remove the Crawl-delay directive It is needed to prevent dos
sitesell.com: # conditions caused by certain robots, like msn/bing etc
purdue.edu: #
purdue.edu: # Discovery Park
purdue.edu: #
purdue.edu: # Updated by Jakob Knigga (jknigga) 9/21/2017
purdue.edu: #
purdue.edu: #
purdue.edu: # Gradschool
purdue.edu: #
purdue.edu: #
purdue.edu: # HHS
purdue.edu: #
purdue.edu: # Updated by Lisa Stein 1/27/2017 - FP 764035
purdue.edu: #
purdue.edu: #
purdue.edu: # Vet
purdue.edu: #
purdue.edu: # Updated by Osmar Lopez 5/29/2019 - FP 1114881
purdue.edu: #
purdue.edu: # Updated by Wright Frazier 4/8/2020 - FP 1289068
purdue.edu: #
purdue.edu: # Site Map
purdue.edu: #
almaany.com: # robots.txt for http://www.almaany.com/
almaany.com: # disallow all
almaany.com: # but allow only important bots
twoo.com: # Allow Google AdSense crawler on most pages.
twoo.com: # By default, disallow all crawlers.
twoo.com: # Full url of latest sitemap.
vklass.se: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
vklass.se: #content{margin:0 0 0 2%;position:relative;}
lavanguardia.com: # Temas
lavanguardia.com: # Paths a no indexar
lavanguardia.com: # Paginas LVD a no indexar
lavanguardia.com: # Extensiones de contenidos no indexables
lavanguardia.com: # Agentes nocivos conocidos
podio.com: #
podio.com: # 1. A robot may not injure a human being or, through inaction, allow a
podio.com: # human being to come to harm.
podio.com: #
podio.com: # 2. A robot must obey orders given it by human beings except where such
podio.com: # orders would conflict with the First Law.
podio.com: #
podio.com: # 3. A robot must protect its own existence as long as such protection
podio.com: # does not conflict with the First or Second Law.
podio.com: #
podio.com: # Isaac Asimov, The Zeroth Law of Robotics
wunderground.com: #
wunderground.com: # /robots.txt
wunderground.com: #
wunderground.com: #
wunderground.com: # Last updated by VShrivastava 02/18/2020
wunderground.com: #
wunderground.com: # Disallowed for PhantomJS
wunderground.com: # Crawl-delay: 10
wunderground.com: # App paths
wunderground.com: # Directories
wunderground.com: # Files
wunderground.com: # Paths (clean URLs)
wunderground.com: # Disallow: /migration/
wunderground.com: # Paths (no clean URLs)
colorado.edu: #
colorado.edu: # robots.txt
colorado.edu: #
colorado.edu: # This file is to prevent the crawling and indexing of certain parts
colorado.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
colorado.edu: # and Google. By telling these "robots" where not to go on your site,
colorado.edu: # you save bandwidth and server resources.
colorado.edu: #
colorado.edu: # This file will be ignored unless it is at the root of your host:
colorado.edu: # Used: http://example.com/robots.txt
colorado.edu: # Ignored: http://example.com/site/robots.txt
colorado.edu: #
colorado.edu: # For more information about the robots.txt standard, see:
colorado.edu: # http://www.robotstxt.org/wc/robots.html
colorado.edu: #
colorado.edu: # For syntax checking, see:
colorado.edu: # http://www.sxw.org.uk/computing/robots/check.html
colorado.edu: # Directories
colorado.edu: # Files
colorado.edu: # Paths (clean URLs)
colorado.edu: # Paths (no clean URLs)
colorado.edu: # CUSTOM
colorado.edu: # INC0331010 - 2017-03-02
colorado.edu: # FIT-1785 - 06/06/2016
colorado.edu: # EXP-3960 - 06/15/2016
colorado.edu: # feature/1
colorado.edu: # EXPRESS
ssa.gov: # www.ssa.gov robots.txt
ssa.gov: # 08/07/18
ssa.gov: # 08/06/19 added second sitemap
ssa.gov: # 09/25/20 added 2019 contingency plan PDF + html
ssa.gov: # Eric Brown, Wayne Whitten
ssa.gov: # Disallow: /agency/shutdown/
careerbuilder.com: # ======================
careerbuilder.com: # Directories
careerbuilder.com: # ======================
careerbuilder.com: # Paths (clean URLs)
careerbuilder.com: # ======================
careerbuilder.com: # Disallow: /CSH/JobSkinDetails.aspx
careerbuilder.com: # Disallow: /csh/jobskindetails.aspx
careerbuilder.com: # Disallow: /CSH/Details.aspx
careerbuilder.com: # Disallow: /csh/details.aspx
careerbuilder.com: # Paths (no clean URLs)
careerbuilder.com: # ======================
careerbuilder.com: # Paths (GRRP provided)
careerbuilder.com: # ======================
careerbuilder.com: #
careerbuilder.com: # disallow signup pages
careerbuilder.com: # ======================
careerbuilder.com: #
careerbuilder.com: # just for the Googlebot
careerbuilder.com: # ======================
nsw.gov.au: #
nsw.gov.au: # robots.txt
nsw.gov.au: #
nsw.gov.au: # This file is to prevent the crawling and indexing of certain parts
nsw.gov.au: # of your site by web crawlers and spiders run by sites like Yahoo!
nsw.gov.au: # and Google. By telling these "robots" where not to go on your site,
nsw.gov.au: # you save bandwidth and server resources.
nsw.gov.au: #
nsw.gov.au: # This file will be ignored unless it is at the root of your host:
nsw.gov.au: # Used: http://example.com/robots.txt
nsw.gov.au: # Ignored: http://example.com/site/robots.txt
nsw.gov.au: #
nsw.gov.au: # For more information about the robots.txt standard, see:
nsw.gov.au: # http://www.robotstxt.org/robotstxt.html
nsw.gov.au: # CSS, JS, Images
nsw.gov.au: # Directories
nsw.gov.au: # Files
nsw.gov.au: # Paths (clean URLs)
nsw.gov.au: # Paths (no clean URLs)
mathworks.com: # robots.txt for http://www.mathworks.com and subdomains
mathworks.com: # Please do not update this file without contacting the owner
mathworks.com: # Owner: webops at mathworks.com
mathworks.com: # Note 1 for updating: Please keep list alphabetized by URL.
mathworks.com: # Note 2 for updating: When making an update, it needs to be updated for /de/, /fr/, /en/ sections as well.
mathworks.com: # /de/ below
mathworks.com: # /fr/ below
mathworks.com: # /en/ below
buyma.us: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
buyma.us: #
buyma.us: # To ban all spiders from the entire site uncomment the next two lines:
buyma.us: # site map
dstv.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
dstv.com: #content{margin:0 0 0 2%;position:relative;}
duniagames.co.id: # Allow all URLs (see http://www.robotstxt.org/robotstxt.html)
echo.msk.ru: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
echo.msk.ru: #
echo.msk.ru: # To ban all spiders from the entire site uncomment the next two lines:
echo.msk.ru: # User-agent: *
echo.msk.ru: # Disallow: /
y8.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
nextdirect.com: ##### 500s #####
focus.de: # robots.txt for https://www.focus.de .
focus.de: # Gibt an, welche Unterverzeichnisse nicht durch Crawler durchsucht werden sollen
doodle.com: # Allow Twitterbot in order to read Twitter Cards
doodle.com: # Allow Google Mediabot for AdSense/AdX
kalerkantho.com: # Crawl kalerkantho.com,
ouest-france.fr: #
ouest-france.fr: # robots.txt
ouest-france.fr: #
ouest-france.fr: # This file is to prevent the crawling and indexing of certain parts
ouest-france.fr: # of your site by web crawlers and spiders run by sites like Yahoo!
ouest-france.fr: # and Google. By telling these "robots" where not to go on your site,
ouest-france.fr: # you save bandwidth and server resources.
ouest-france.fr: #
ouest-france.fr: # This file will be ignored unless it is at the root of your host:
ouest-france.fr: # Used: http://example.com/robots.txt
ouest-france.fr: # Ignored: http://example.com/site/robots.txt
ouest-france.fr: #
ouest-france.fr: # For more information about the robots.txt standard, see:
ouest-france.fr: # http://www.robotstxt.org/wc/robots.html
ouest-france.fr: #
ouest-france.fr: # For syntax checking, see:
ouest-france.fr: # http://www.sxw.org.uk/computing/robots/check.html
ouest-france.fr: # Allowed search engines directives
ouest-france.fr: #Sitemaps
ouest-france.fr: # Directories
ouest-france.fr: # Files
ouest-france.fr: # Paths (clean URLs)
ouest-france.fr: # Paths (no clean URLs)
ouest-france.fr: # Ouest-France galaad
ouest-france.fr: # Crawling limitation fixed for low priority bots
ouest-france.fr: # Directories
ouest-france.fr: # Files
ouest-france.fr: # Paths (clean URLs)
ouest-france.fr: # Paths (no clean URLs)
ouest-france.fr: # Ouest-France galaad
usertesting.com: #
usertesting.com: # robots.txt
usertesting.com: #
usertesting.com: # This file is to prevent the crawling and indexing of certain parts
usertesting.com: # of your site by web crawlers and spiders run by sites like Yahoo!
usertesting.com: # and Google. By telling these "robots" where not to go on your site,
usertesting.com: # you save bandwidth and server resources.
usertesting.com: #
usertesting.com: # This file will be ignored unless it is at the root of your host:
usertesting.com: # Used: http://example.com/robots.txt
usertesting.com: # Ignored: http://example.com/site/robots.txt
usertesting.com: #
usertesting.com: # For more information about the robots.txt standard, see:
usertesting.com: # http://www.robotstxt.org/robotstxt.html
usertesting.com: # CSS, JS, Images
usertesting.com: # Directories
usertesting.com: # Files
usertesting.com: # Paths (clean URLs)
usertesting.com: # Paths (no clean URLs)
clever.com: # Don't allow web crawlers to index Craft
klikbca.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
klikbca.com: #content{margin:0 0 0 2%;position:relative;}
kaiserpermanente.org: # Kaiser Permanente: robots.txt
kaiserpermanente.org: # sitemaps - English
kaiserpermanente.org: # region sitemaps - English
kaiserpermanente.org: # facility sitemaps - English
kaiserpermanente.org: # doctor sitemaps - English
kaiserpermanente.org: # sitemaps - Spanish
kaiserpermanente.org: # region sitemaps - Spanish
kaiserpermanente.org: # facility sitemaps - Spanish
kaiserpermanente.org: # doctor sitemaps - Spanish
spigen.com: # we use Shopify as our ecommerce platform
spigen.com: # Google adsbot ignores robots.txt unless specifically named!
subito.it: # It is expressively forbidden to use search robots or other automatic methods
subito.it: # to access Subito.it. Only if Subito.it has given such permission can be accepted.
whattomine.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
whattomine.com: #
whattomine.com: # To ban all spiders from the entire site uncomment the next two lines:
whattomine.com: # User-agent: *
whattomine.com: # Disallow: /
my-best.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
my-best.com: #
my-best.com: # To ban all spiders from the entire site uncomment the next two lines:
pbs.org: #Disallow: /.
lemonde.fr: # 16/08/2019
lemonde.fr: # Il est interdit d'utiliser des robots d'indexation Web ou d'autres méthodes automatiques de feuilletage ou de navigation sur ce site Web.
lemonde.fr: # Nous interdisons de crawler notre site Web en utilisant un agent d'utilisateur volé qui ne correspond pas à votre identité.
lemonde.fr: # « Violation du droit du producteur de base de données - article L 342-1 et suivant le Code de la propriété intellectuelle ».
lemonde.fr: # Nous vous invitons à nous contacter pour contracter une licence d'utilisation. Seuls les partenaires sont habilités à utiliser nos contenus pour un usage autre que strictement individuel.
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: # WordPress
lemonde.fr: #
lemonde.fr: # Sitemaps
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: # Robots exclus de toute indexation.
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
lemonde.fr: #
shiksha.com: # Filename:robots.txt file for https://www.shiksha.com/
abplive.com: #Sitemaps
programme-tv.net: # robots.txt file for Télé Loisirs
programme-tv.net: # desktop
programme-tv.net: # https://www.robotstxt.org/
elespanol.com: # Agentes no deseados conocidos User-agent: ia_archiver
telegraph.co.uk: # Robots.txt file
telegraph.co.uk: # All robots will spider the domain
cettire.com: # we use Shopify as our ecommerce platform
cettire.com: # Google adsbot ignores robots.txt unless specifically named!
feishu.cn: # robots.txt file from https://www.feishu.cn/
feishu.cn: # All robots will spider the domain
google.se: # AdsBot
google.se: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
paysafecard.com: #typo3 Disallow
paysafecard.com: #assets Disallow
paysafecard.com: #Module Disallow
c-sharpcorner.com: #Disallow: /
c-sharpcorner.com: # User-Agent: Mediapartners-Google
c-sharpcorner.com: # User-Agent: Googlebot
c-sharpcorner.com: # User-Agent: Adsbot-Google
c-sharpcorner.com: # User-Agent: Googlebot-Image
post.ir: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
post.ir: #content{margin:0 0 0 2%;position:relative;}
snapdeal.com: # robots.txt for https://www.snapdeal.com/
enstage-sas.com: #small_pop_up .all_content_holder {
enstage-sas.com: #small_pop_up .all_content_holder #close_button{
enstage-sas.com: #small_pop_up .all_content_holder .left_side {
enstage-sas.com: #small_pop_up .all_content_holder .right_side {
enstage-sas.com: #small_pop_up .all_content_holder .right_side p:nth-child(1) {
enstage-sas.com: #small_pop_up .all_content_holder .right_side p:nth-child(2) img {
texas.gov: #faq-price {
texas.gov: #gsc-i-id1::-moz-placeholder {
texas.gov: #gsc-i-id1::-webkit-input-placeholder {
texas.gov: #gsc-i-id1:-ms-input-placeholder {
iteye.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
harveynichols.com: # Sales comms
harveynichols.com: # Account
harveynichols.com: # Checkout
harveynichols.com: # Product Listing Pages & ajax calls
harveynichols.com: # Website Utilities (ajax calls etc.)
harveynichols.com: # Misc
harveynichols.com: # Articles
harveynichols.com: # Tracking
harveynichols.com: # Facets
harveynichols.com: # Brand subcats
harveynichols.com: # Global-e
google.co.il: # AdsBot
google.co.il: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
history.com: # Tempest - history
masterclass.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
masterclass.com: #
masterclass.com: # To ban all spiders from the entire site uncomment the next two lines:
edx.org: #
edx.org: # robots.txt
edx.org: #
edx.org: # This file is to prevent the crawling and indexing of certain parts
edx.org: # of your site by web crawlers and spiders run by sites like Yahoo!
edx.org: # and Google. By telling these "robots" where not to go on your site,
edx.org: # you save bandwidth and server resources.
edx.org: #
edx.org: # This file will be ignored unless it is at the root of your host:
edx.org: # Used: http://example.com/robots.txt
edx.org: # Ignored: http://example.com/site/robots.txt
edx.org: #
edx.org: # For more information about the robots.txt standard, see:
edx.org: # http://www.robotstxt.org/robotstxt.html
edx.org: # CSS, JS, Images
edx.org: # Directories
edx.org: # Files
edx.org: # Paths (clean URLs)
edx.org: # Allowed Spanish Paths (clean URLs)
edx.org: # Disallowed Spanish Paths (all others)
edx.org: # Paths (no clean URLs)
edx.org: # Sitemaps
express.pk: # robots.txt generated at http://www.mcanerin.com
westpac.com.au: # robots.txt generated for www.westpac.com.au
mysql.com: ## ROBOTS.TXT - http://www.robotstxt.org/ ##
rooziato.com: #
rooziato.com: # 15 DEC 2020
rooziato.com: # Author: M.R
rooziato.com: #
sportskeeda.com: # allow adsense bot to parse no-index content
sportskeeda.com: # disallow folders
tripadvisor.in: # Hi there,
tripadvisor.in: #
tripadvisor.in: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself.
tripadvisor.in: #
tripadvisor.in: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet?
tripadvisor.in: #
tripadvisor.in: # Run - don't crawl - to apply to join TripAdvisor's elite SEO team
tripadvisor.in: #
tripadvisor.in: # Email seoRockstar@tripadvisor.com
tripadvisor.in: #
tripadvisor.in: # Or visit https://careers.tripadvisor.com/search-results?keywords=seo
tripadvisor.in: #
tripadvisor.in: #
inbox.lv: # www.robotstxt.org/
inbox.lv: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
euronews.com: # www.robotstxt.org/
euronews.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
euronews.com: # weather
ipsosinteractive.com: #
ipsosinteractive.com: # robots.txt
ipsosinteractive.com: #
ipsosinteractive.com: # This file is to prevent the crawling and indexing of certain parts
ipsosinteractive.com: # of your site by web crawlers and spiders run by sites like Yahoo!
ipsosinteractive.com: # and Google. By telling these "robots" where not to go on your site,
ipsosinteractive.com: # you save bandwidth and server resources.
ipsosinteractive.com: #
ipsosinteractive.com: # This file will be ignored unless it is at the root of your host:
ipsosinteractive.com: # Used: http://example.com/robots.txt
ipsosinteractive.com: # Ignored: http://example.com/site/robots.txt
ipsosinteractive.com: #
ipsosinteractive.com: # For more information about the robots.txt standard, see:
ipsosinteractive.com: # http://www.robotstxt.org/robotstxt.html
ipsosinteractive.com: # Directories
ipsosinteractive.com: # Files
ipsosinteractive.com: # Paths (clean URLs)
ipsosinteractive.com: # Paths (no clean URLs)
ipsosinteractive.com: # Specific files Paths
collegedunia.com: #Disallow: /*?ajax=1
collegedunia.com: #URL parameters blocking for SEO
yandex.com: # yandex.com
virgool.io: # Block MegaIndex.ru
virgool.io: # Block YandexBot
virgool.io: # Block Baidu
virgool.io: # Block Youdao
virgool.io: # Block Majestic
tagged.com: #########################################################################
tagged.com: # /robots.txt file for http://www.tagged.com/
tagged.com: # mail webmaster@tagged.com for constructive criticism
tagged.com: #########################################################################
tagged.com: # Any others
docker.com: #
docker.com: # robots.txt
docker.com: #
docker.com: # This file is to prevent the crawling and indexing of certain parts
docker.com: # of your site by web crawlers and spiders run by sites like Yahoo!
docker.com: # and Google. By telling these "robots" where not to go on your site,
docker.com: # you save bandwidth and server resources.
docker.com: #
docker.com: # This file will be ignored unless it is at the root of your host:
docker.com: # Used: http://example.com/robots.txt
docker.com: # Ignored: http://example.com/site/robots.txt
docker.com: #
docker.com: # For more information about the robots.txt standard, see:
docker.com: # http://www.robotstxt.org/robotstxt.html
docker.com: # CSS, JS, Images
docker.com: # Directories
docker.com: # Files
docker.com: # Paths (clean URLs)
docker.com: # Paths (no clean URLs)
irecommend.ru: #
irecommend.ru: # robots.txt
irecommend.ru: #
irecommend.ru: # This file is to prevent the crawling and indexing of certain parts
irecommend.ru: # of your site by web crawlers and spiders run by sites like Yahoo!
irecommend.ru: # and Google. By telling these "robots" where not to go on your site,
irecommend.ru: # you save bandwidth and server resources.
irecommend.ru: #
irecommend.ru: # This file will be ignored unless it is at the root of your host:
irecommend.ru: # Used: http://example.com/robots.txt
irecommend.ru: # Ignored: http://example.com/site/robots.txt
irecommend.ru: #
irecommend.ru: # For more information about the robots.txt standard, see:
irecommend.ru: # http://www.robotstxt.org/wc/robots.html
irecommend.ru: #
irecommend.ru: # For syntax checking, see:
irecommend.ru: # http://www.sxw.org.uk/computing/robots/check.html
irecommend.ru: # Directories
irecommend.ru: # Files
irecommend.ru: # Paths (clean URLs)
irecommend.ru: # Paths (no clean URLs)
irecommend.ru: # Social auth
irecommend.ru: #misc
tasnimnews.com: #Sitemap: https://www.tasnimnews.com/sitemap-latest.xml
tasnimnews.com: #Sitemap: https://www.tasnimnews.com/sitemap-tags.xml
tasnimnews.com: # Sitemap Archive
tasnimnews.com: #Sitemap: https://www.tasnimnews.com/fa/sitemaps/archive/index.xml
tasnimnews.com: #Sitemap: https://www.tasnimnews.com/en/sitemaps/archive/index.xml
tasnimnews.com: #Sitemap: https://www.tasnimnews.com/ar/sitemaps/archive/index.xml
tasnimnews.com: #Sitemap: https://www.tasnimnews.com/tr/sitemaps/archive/index.xml
tasnimnews.com: #Sitemap: https://www.tasnimnews.com/ur/sitemaps/archive/index.xml
duomai.com: # https://www.robotstxt.org/robotstxt.html
myus.com: # allow all crawlers
myus.com: # images
myus.com: # search
myus.com: # country page tabs
myus.com: # 10-23-2017 Update
myus.com: # Meeting On 6/7
myus.com: # Meeting On 6/21
myus.com: # member reviews by country
myus.com: # Meeting on 3/29
myus.com: # blog and news paging
myus.com: # 6524
myus.com: # Landing pages
myus.com: # allow - not fully supported, add entries to sitemap.xml
myus.com: # block the rest
myus.com: # 7230
myus.com: # block all country-specific landing pages
myus.com: # 7229
myus.com: # 3-15-2016 Meeting
myus.com: # sitemap - Supported by Google, Ask, Bing, Yahoo; defined on sitemaps.org
myus.com: # 8235
myus.com: # 2957
myus.com: # banners
myus.com: #Ajax requests
myus.com: # AddSearchBot
dlsite.com: # noindexを通知するためにクロールを許可する
aarp.org: # _____ _____
aarp.org: # /\ /\ | __ \ | __ \
aarp.org: # / \ / \ | |__) | | |__) |
aarp.org: # / /\ \ / /\ \ | _ / | ___/
aarp.org: # / ____ \ / ____ \ | | \ \ | |
aarp.org: # /_/ \_\ /_/ \_\ |_| \_\ |_|
aarp.org: #
aarp.org: # Robots.txt file created by https://www.aarp.org/
aarp.org: # For domain: https://www.aarp.org/
aarp.org: # Created 09-12-2017 Raymond Deschenes - Updated 1-28-2020 site search relocation
aarp.org: # All robots will spider the domain
zappos.com: # Global robots.txt updated 2020-04-02
rackspace.com: #
rackspace.com: # robots.txt
rackspace.com: #
rackspace.com: # This file is to prevent the crawling and indexing of certain parts
rackspace.com: # of your site by web crawlers and spiders run by sites like Yahoo!
rackspace.com: # and Google. By telling these "robots" where not to go on your site,
rackspace.com: # you save bandwidth and server resources.
rackspace.com: #
rackspace.com: # This file will be ignored unless it is at the root of your host:
rackspace.com: # Used: http://example.com/robots.txt
rackspace.com: # Ignored: http://example.com/site/robots.txt
rackspace.com: #
rackspace.com: # For more information about the robots.txt standard, see:
rackspace.com: # http://www.robotstxt.org/robotstxt.html
rackspace.com: # CSS, JS, Images
rackspace.com: # Directories
rackspace.com: # Files
rackspace.com: # Paths (clean URLs)
rackspace.com: # Paths (no clean URLs)
brainly.com.br: #Brainly Robots.txt 31.07.2017
brainly.com.br: # Disallow Marketing bots
brainly.com.br: #Disallow exotic search engine crawlers
brainly.com.br: #Disallow other crawlers
brainly.com.br: # Good bots whitelisting:
brainly.com.br: #Other bots
brainly.com.br: #Neticle Crawler v1.0 ( http://bot.neticle.hu/ ) https://bot.neticle.hu/ - brand monitoring
brainly.com.br: #Mega https://megaindex.com/crawler - link indexer tool (supports directives in user-agent:*)
brainly.com.br: #Obot - IBM X-Force service
brainly.com.br: #SafeDNSBot (https://www.safedns.com/searchbot)
brainly.co.id: #Brainly Robots.txt 31.07.2017
brainly.co.id: # Disallow Marketing bots
brainly.co.id: #Disallow exotic search engine crawlers
brainly.co.id: #Disallow other crawlers
brainly.co.id: # Good bots whitelisting:
brainly.co.id: #Other bots
brainly.co.id: #Neticle Crawler v1.0 ( http://bot.neticle.hu/ ) https://bot.neticle.hu/ - brand monitoring
brainly.co.id: #Mega https://megaindex.com/crawler - link indexer tool (supports directives in user-agent:*)
brainly.co.id: #Obot - IBM X-Force service
brainly.co.id: #SafeDNSBot (https://www.safedns.com/searchbot)
cna.com.tw: # User-agent: ia_archiver
cna.com.tw: # Disallow: /MakerList/Index?*
cna.com.tw: # Disallow: /MakerContent/Index?*
cna.com.tw: # Disallow: /VideoList/Index?*
cna.com.tw: # Disallow: /VideoContent/Index?*
ig.com: #Site contents Copyright IG Group
ameli.fr: #
ameli.fr: # robots.txt
ameli.fr: #
ameli.fr: # This file is to prevent the crawling and indexing of certain parts
ameli.fr: # of your site by web crawlers and spiders run by sites like Yahoo!
ameli.fr: # and Google. By telling these "robots" where not to go on your site,
ameli.fr: # you save bandwidth and server resources.
ameli.fr: #
ameli.fr: # This file will be ignored unless it is at the root of your host:
ameli.fr: # Used: http://example.com/robots.txt
ameli.fr: # Ignored: http://example.com/site/robots.txt
ameli.fr: #
ameli.fr: # For more information about the robots.txt standard, see:
ameli.fr: # http://www.robotstxt.org/robotstxt.html
ameli.fr: # CSS, JS, Images
ameli.fr: # Directories
ameli.fr: # Files
ameli.fr: # Paths (clean URLs)
ameli.fr: # Paths (no clean URLs)
venturebeat.com: # This file was generated on Wed, 24 Feb 2021 19:10:02 +0000
venturebeat.com: # Sitemap archive
iheart.com: # Production
olx.in: #General Filters
olx.in: #RE Filters
olx.in: #Expired Ads
olx.in: # Generated on 2020-03-11T09:58:35.850Z
commsec.com.au: # /robots.txt file for https://www.commsec.com.au/<br>
domestika.org: # Faceted/Sorting navigation
domestika.org: # Disallow: *area=*
domestika.org: # Disallow: *sorting=*
domestika.org: # Disallow: *date=*
domestika.org: # Disallow: *status=*
domestika.org: # Disallow: /auth
domestika.org: # Disallow: */search
domestika.org: # Virtual URLs - Custom tracking
calculatorsoup.com: #
calculatorsoup.com: #
calculatorsoup.com: # applies to all robots disallow
calculatorsoup.com: # 2019-02-22 remove
calculatorsoup.com: # Disallow: /search.php
calculatorsoup.com: # block Mediapartners from search.php 2017-03-12 because they try many search query's
calculatorsoup.com: # 2019-02-22 remove
calculatorsoup.com: # User-agent: Mediapartners-Google
calculatorsoup.com: # Allow: /
calculatorsoup.com: # Disallow: /search.php
calculatorsoup.com: # do not beleive this is respected
calculatorsoup.com: # From Wiki
calculatorsoup.com: # Crawlers that are kind enough to obey, but which we'd rather not have
calculatorsoup.com: # unless they're feeding search engines.
calculatorsoup.com: # Some bots are known to be trouble, particularly those designed to copy
calculatorsoup.com: # entire sites. Please obey robots.txt.
calculatorsoup.com: #
calculatorsoup.com: # Sorry, wget in its recursive mode is a frequent problem.
calculatorsoup.com: # Please read the man page and use it properly; there is a
calculatorsoup.com: # --wait option you can use to set the delay between hits,
calculatorsoup.com: # for instance.
calculatorsoup.com: #
calculatorsoup.com: #
calculatorsoup.com: # The 'grub' distributed client has been *very* poorly behaved.
calculatorsoup.com: #
calculatorsoup.com: #
calculatorsoup.com: # Doesn't follow robots.txt anyway, but...
calculatorsoup.com: #
calculatorsoup.com: #
calculatorsoup.com: # Hits many times per second, not acceptable
calculatorsoup.com: # http://www.nameprotect.com/botinfo.html
calculatorsoup.com: # A capture bot, downloads gazillions of pages with no public benefit
calculatorsoup.com: # http://www.webreaper.net/
trilltrill.jp: # robotstxt.org/
linguee.fr: # In ANY CASE, you are NOT ALLOWED to train Machine Translation Systems
linguee.fr: # on data crawled on Linguee.
linguee.fr: #
linguee.fr: # Linguee contains fake entries - changes in the wording of sentences,
linguee.fr: # complete fake entries.
linguee.fr: # These entries can be used to identify even small parts of our material
linguee.fr: # if you try to copy it without our permission.
linguee.fr: # Machine Translation systems trained on these data will learn these errors
linguee.fr: # and can be identified easily. We will take all legal measures against anyone
linguee.fr: # training Machine Translation systems on data crawled from this website.
trademe.co.nz: #Classic
trademe.co.nz: #Allow PI
trademe.co.nz: #Allow address
trademe.co.nz: #Disallow Map
trademe.co.nz: #Disallow Classic non Category search
trademe.co.nz: #FrEnd
trademe.co.nz: #Allow FrEnd resources
trademe.co.nz: #Allow new car
trademe.co.nz: #CMS Content
trademe.co.nz: #Property
trademe.co.nz: #Motors
trademe.co.nz: #Jobs
trademe.co.nz: # specific bot behaviour
lg.com: # LGEUS-744, LGEUS-1201
lg.com: # LGEUS-744, LGEUS-1201
lg.com: # Sitemap files
slideserve.com: #Baiduspider
nearpod.com: # Temporary
gazzettadelsud.it: #Disallow: /articoli/ajax/
google.sk: # AdsBot
google.sk: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
mirror.co.uk: #Agent Specific Disallowed Sections
corriere.it: # vecchia ultima ora
corriere.it: #Dizionario della Salute
corriere.it: # item
corriere.it: # Disallow: /cronache/10_marzo_01/La-rete-del-senatore-in-banca-cronache_9f6dee1a-2502-11df-98c5-00144f02aabe.shtml
corriere.it: #richiesta da Ruggiero BG27112011
corriere.it: #CORRIERE-452 2018-10-08
docusign.com: #
docusign.com: # robots.txt
docusign.com: #
docusign.com: # This file is to prevent the crawling and indexing of certain parts
docusign.com: # of your site by web crawlers and spiders run by sites like Yahoo!
docusign.com: # and Google. By telling these "robots" where not to go on your site,
docusign.com: # you save bandwidth and server resources.
docusign.com: #
docusign.com: # This file will be ignored unless it is at the root of your host:
docusign.com: # Used: http://example.com/robots.txt
docusign.com: # Ignored: http://example.com/site/robots.txt
docusign.com: #
docusign.com: # For more information about the robots.txt standard, see:
docusign.com: # http://www.robotstxt.org/robotstxt.html
docusign.com: # Crawl-delay: 10
docusign.com: # CSS, JS, Images
docusign.com: # Directories
docusign.com: # Files
docusign.com: # Paths (clean URLs)
docusign.com: # Paths (no clean URLs)
docusign.com: # Files
docusign.com: # Paths
docusign.com: # Sitemaps
google.com.ph: # AdsBot
google.com.ph: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
leam.com: # Crawlers Setup
leam.com: # Files
leam.com: # Paths (no clean URLs)
leam.com: #Disallow: /*manufacturer=
leam.com: #Disallow: /*color=
cineulagam.com: # Disallow: /*? This is match ? anywhere in the URL
laposte.fr: # www.laposte.fr
laposte.fr: # boutique.laposte.fr
laposte.fr: # pro.boutique.laposte.fr
mangoerp.com: #footer-index{padding-top:64px;background:#2a3139;color:#778495}
mangoerp.com: #footer-index .friend-link{padding-top:20px;}
mangoerp.com: #footer-index .friend-link a{color:#778495}
mangoerp.com: #footer-index .title{position:absolute;font-size:14px;top:-30px}
mangoerp.com: #footer-index .qrcode{border:1px solid #778495;padding:5px}
tessabit.com: # Directories
tessabit.com: # Disallow: /media/ // Allow this folder for google product caching
tessabit.com: #Disallow: /media/catalog/product/cache/
tessabit.com: # Paths (clean URLs)
tessabit.com: # Paths (no clean URLs)
hypebeast.com: # www.robotstxt.org/
hypebeast.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
java.com: # /robots.txt for java.com
utoronto.ca: #
utoronto.ca: # robots.txt
utoronto.ca: #
utoronto.ca: # This file is to prevent the crawling and indexing of certain parts
utoronto.ca: # of your site by web crawlers and spiders run by sites like Yahoo!
utoronto.ca: # and Google. By telling these "robots" where not to go on your site,
utoronto.ca: # you save bandwidth and server resources.
utoronto.ca: #
utoronto.ca: # This file will be ignored unless it is at the root of your host:
utoronto.ca: # Used: http://example.com/robots.txt
utoronto.ca: # Ignored: http://example.com/site/robots.txt
utoronto.ca: #
utoronto.ca: # For more information about the robots.txt standard, see:
utoronto.ca: # http://www.robotstxt.org/robotstxt.html
utoronto.ca: # CSS, JS, Images
utoronto.ca: # Directories
utoronto.ca: # Files
utoronto.ca: # Paths (clean URLs)
utoronto.ca: # Paths (no clean URLs)
pagesjaunes.fr: #Vintage
hinative.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
hinative.com: #
hinative.com: # To ban all spiders from the entire site uncomment the next two lines:
hinative.com: # User-agent: *
hinative.com: # Disallow: /
google.kz: # AdsBot
google.kz: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
licindia.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
licindia.in: #content{margin:0 0 0 2%;position:relative;}
pronews.gr: #
pronews.gr: # robots.txt
pronews.gr: #
pronews.gr: # This file is to prevent the crawling and indexing of certain parts
pronews.gr: # of your site by web crawlers and spiders run by sites like Yahoo!
pronews.gr: # and Google. By telling these "robots" where not to go on your site,
pronews.gr: # you save bandwidth and server resources.
pronews.gr: #
pronews.gr: # This file will be ignored unless it is at the root of your host:
pronews.gr: # Used: http://example.com/robots.txt
pronews.gr: # Ignored: http://example.com/site/robots.txt
pronews.gr: #
pronews.gr: # For more information about the robots.txt standard, see:
pronews.gr: # http://www.robotstxt.org/robotstxt.html
pronews.gr: # CSS, JS, Images
pronews.gr: # Directories
pronews.gr: # Files
pronews.gr: # Paths (clean URLs)
pronews.gr: # Paths (no clean URLs)
worthpoint.com: # always allow adsense
worthpoint.com: # All robots Block
worthpoint.com: # bot-specific
worthpoint.com: # Silly human, robots.txts are for robots
sejda.com: # Don't index API
ieee.org: #IEEE.org Robots Exclusion Rules - Updated October 29, 2018
ieee.org: #Sitemap:https://www.ieee.org/.sitemap.xml
zamzar.com: # ___ __ _ _
zamzar.com: # / __\ __ ___ ___ _ _ ___ _ _ _ __ / _(_) | ___ ___
zamzar.com: # / _\| '__/ _ \/ _ \ | | | |/ _ \| | | | '__| | |_| | |/ _ \/ __|
zamzar.com: # / / | | | __/ __/ | |_| | (_) | |_| | | | _| | | __/\__ \
zamzar.com: # \/ |_| \___|\___| \__, |\___/ \__,_|_| |_| |_|_|\___||___/
zamzar.com: # |___/
france24.com: # France Medias Monde [2019-10-30] - francemediasmonde.com
france24.com: ## FRANCE 24 - france24.com
france24.com: ### Sitemaps
france24.com: ### Sitemaps News
gib.gov.tr: #
gib.gov.tr: # robots.txt
gib.gov.tr: #
gib.gov.tr: # This file is to prevent the crawling and indexing of certain parts
gib.gov.tr: # of your site by web crawlers and spiders run by sites like Yahoo!
gib.gov.tr: # and Google. By telling these "robots" where not to go on your site,
gib.gov.tr: # you save bandwidth and server resources.
gib.gov.tr: #
gib.gov.tr: # This file will be ignored unless it is at the root of your host:
gib.gov.tr: # Used: http://example.com/robots.txt
gib.gov.tr: # Ignored: http://example.com/site/robots.txt
gib.gov.tr: #
gib.gov.tr: # For more information about the robots.txt standard, see:
gib.gov.tr: # http://www.robotstxt.org/robotstxt.html
pch.com: #wrap{
pochta.ru: # –†–∞–∑–±–∏—Ä–∞–µ—Ç–µ—Å—å –≤ –ø—Ä–æ–µ–∫—Ç–∞—Ö –∏ —Ö–æ—Ç–∏—Ç–µ —Å–¥–µ–ª–∞—Ç—å —Ä–µ–∞–ª—å–Ω–æ –ø–æ–ª–µ–∑–Ω—ã–π –ø—Ä–æ–¥—É–∫—Ç?
pochta.ru: # –ë—É–¥–µ–º —Ä–∞–¥—ã –≤–∏–¥–µ—Ç—å –≤–∞—Å –≤ –∫–æ–º–∞–Ω–¥–µ –ü–æ—á—Ç–æ–≤—ã—Ö –¢–µ—Ö–Ω–æ–ª–æ–≥–∏–π
pochta.ru: # https://hr.pochta.tech/
pochta.ru: #
www.gob.pe: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
www.gob.pe: #
www.gob.pe: # To ban all spiders from the entire site uncomment the next two lines:
anz.com: # /robots.txt for http://www.anz.com/
anz.com: # comments to InternetAdministration@anz.com
anz.com: #
secnews.gr: # Block Yandex
paychex.com: # Robots.txt file for http://www.paychex.com
paychex.com: # Disallow all robots from the following directories:
barnesandnoble.com: #robots.txt for https://www.barnesandnoble.com
band.us: # Make changes for all web spiders
band.us: # sitemap.xml
zety.com: # zety.com
livemaster.ru: # Disallow other crawlers
livemaster.ru: # Ezooms and dotbot
livemaster.ru: #User-agent: link checker
livemaster.ru: #Disallow: /
livemaster.ru: #User-agent: linkcheck
livemaster.ru: #Disallow: /
livemaster.ru: #User-agent: Link Sleuth
livemaster.ru: #Disallow: /
last.fm: # Old pages
last.fm: # Shouts
last.fm: # N.B: these are not covered by the above /music/ rule
last.fm: # (shoutbox vs +shoutbox)
last.fm: # AJAX content
kinguin.net: ## Website Sitemap
kinguin.net: ## Enable robots.txt rules for all crawlers
kinguin.net: ## Do not crawl add to cart, checkout, and user account pages
kinguin.net: ## Disallow URL Shortener
kinguin.net: ## Do not crawl seach pages and not-SEO optimized catalog links
kinguin.net: ## Do not crawl not-SEO optimized custom forms
kinguin.net: ## Do not crawl sub category pages that are sorted or filtered.
kinguin.net: ## Do not crawl links with session IDs
fao.org: # robots.txt for http://www.fao.org/
fao.org: # This file is not for hiding content from people. It is no substitue for security
fao.org: # If you are editing the robots.txt file - please COMMENT and DATE reason for every inclusion/exclusion ---nw-OCC-2013
fao.org: # ^^^^^^^ ^^^^
fao.org: #User-agent: 008 # No longer relevant 25/10/2013 nw
fao.org: #Disallow: /
fao.org: #User-Agent: cdlwas_bot # No longer relevant 25/10/2013 nw
fao.org: #Disallow: # No longer relevant 25/10/2013 nw
fao.org: #Google needs to read CSS and JS here - nw 29 Jul 2015 # Disallow: /typo3conf/
fao.org: #Google needs to read CSS and JS here - nw 29 Jul 2015 # Disallow: /typo3temp/
fao.org: #Permitted pending fix (27/05/2014 - nw) Disallow: /figis/vrmf/finder/!/display/vessel/ #generating a lot of errors (30/04/2014 - nw)
fao.org: #START Cleanup the web September 2016.The following are marked as GONE in OCC list of sites - and there are no redirects in place
fao.org: #Requested by CIO-SEC-TEAM 20/03/2018
fao.org: #END Cleanup the web September 2016.The following are marked as GONE in OCC list of sites
fao.org: #Cleanup after Mountain Partnership Migration (2017)
fao.org: #At the request of SO3 team
arxiv.org: # robots.txt for http://arxiv.org/ and mirror sites http://*.arxiv.org/
arxiv.org: # Indiscriminate automated downloads from this site are not permitted
arxiv.org: # See also: http://arxiv.org/help/robots
va.gov: # existing disallow on va.gov (may not be needed)
va.gov: # existing disallow from vets.gov
va.gov: # disallow WIP VAMCs
va.gov: # sitemap index
timesnownews.com: #Baiduspider
timesnownews.com: #Yandex
timesnownews.com: # To block ad codes from being crawled
timesnownews.com: #Sitemaps
hln.be: # Tell robots that the whole site should not be crawled
basketball-reference.com: # Disallow the plagiarism.org robot, www.slysearch.com
google.lk: # AdsBot
google.lk: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
oreilly.com: #ITOPS-10158
oreilly.com: #ITOPS-8392
oreilly.com: #ITOPS-10157
tradeindia.com: # robots.txt 2005/09/1
tradeindia.com: # www.tradeindia.com
tradeindia.com: # Format is:
tradeindia.com: # User-agent: <name of spider>
tradeindia.com: # Disallow: <nothing> | <path>
gds.it: #Disallow: /articoli/ajax/
optimum.net: # robots.txt for optimum.net
yandex.kz: # yandex.kz
madhyamam.com: #
madhyamam.com: # robots.txt
madhyamam.com: #
madhyamam.com: # This file is to prevent the crawling and indexing of certain parts
madhyamam.com: # of your site by web crawlers and spiders run by sites like Yahoo!
madhyamam.com: # and Google. By telling these "robots" where not to go on your site,
madhyamam.com: # you save bandwidth and server resources.
madhyamam.com: #
madhyamam.com: # This file will be ignored unless it is at the root of your host:
madhyamam.com: # Used: http://example.com/robots.txt
madhyamam.com: # Ignored: http://example.com/site/robots.txt
madhyamam.com: #
madhyamam.com: # For more information about the robots.txt standard, see:
madhyamam.com: # http://www.robotstxt.org/robotstxt.html
madhyamam.com: #Crawl-delay: 10
madhyamam.com: # Directories
madhyamam.com: #Disallow: /en/
madhyamam.com: # Files
madhyamam.com: # Paths (clean URLs)
madhyamam.com: # Paths (no clean URLs)
reverb.com: # DO NOT EDIT MANUALLY - see script/robots_txt.rb
reverb.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
reverb.com: #
reverb.com: # Fatbot keeps making bad requests with bad url params, causing errors. There seems to be no good business reason to have them keep scraping the site
reverb.com: # Tell all bots to stay away from these endpoints
reverb.com: # Non-Regional Disallows
reverb.com: # Regional Disallows
priceline.com: # Robots.txt file
priceline.com: #
priceline.com: # Section 1:
priceline.com: # Section 2:
priceline.com: #Disallow: /api/
priceline.com: #Disallow: /pws/
priceline.com: #Disallow: /svcs/
priceline.com: # Section 3:
priceline.com: #Disallow: /vacations/
priceline.com: #Disallow: /Vacations/
faire.com: # Sitemap
extendoffice.com: # If the Joomla site is installed within a folder
extendoffice.com: # eg www.example.com/joomla/ then the robots.txt file
extendoffice.com: # MUST be moved to the site root
extendoffice.com: # eg www.example.com/robots.txt
extendoffice.com: # AND the joomla folder name MUST be prefixed to all of the
extendoffice.com: # paths.
extendoffice.com: # eg the Disallow rule for the /administrator/ folder MUST
extendoffice.com: # be changed to read
extendoffice.com: # Disallow: /joomla/administrator/
extendoffice.com: #
extendoffice.com: # For more information about the robots.txt standard, see:
extendoffice.com: # http://www.robotstxt.org/orig.html
extendoffice.com: #
extendoffice.com: # For syntax checking, see:
extendoffice.com: # http://tool.motoricerca.info/robots-checker.phtml
rawpixel.com: #
rawpixel.com: # robots.txt
rawpixel.com: #
rawpixel.com: # This file is to prevent the crawling and indexing of certain parts
rawpixel.com: # of your site by web crawlers and spiders run by sites like Yahoo!
rawpixel.com: # and Google. By telling these "robots" where not to go on your site,
rawpixel.com: # you save bandwidth and server resources.
rawpixel.com: #
rawpixel.com: # This file will be ignored unless it is at the root of your host:
rawpixel.com: # Used: http://example.com/robots.txt
rawpixel.com: # Ignored: http://example.com/site/robots.txt
rawpixel.com: #
rawpixel.com: # For more information about the robots.txt standard, see:
rawpixel.com: # http://www.robotstxt.org/robotstxt.html
rawpixel.com: # CSS, JS, Images
rawpixel.com: # Directories
rawpixel.com: # Files
rawpixel.com: # Paths (clean URLs)
rawpixel.com: # Paths (no clean URLs)
mindbodyonline.com: #
mindbodyonline.com: # robots.txt
mindbodyonline.com: #
mindbodyonline.com: # This file is to prevent the crawling and indexing of certain parts
mindbodyonline.com: # of your site by web crawlers and spiders run by sites like Yahoo!
mindbodyonline.com: # and Google. By telling these "robots" where not to go on your site,
mindbodyonline.com: # you save bandwidth and server resources.
mindbodyonline.com: #
mindbodyonline.com: # This file will be ignored unless it is at the root of your host:
mindbodyonline.com: # Used: http://example.com/robots.txt
mindbodyonline.com: # Ignored: http://example.com/site/robots.txt
mindbodyonline.com: #
mindbodyonline.com: # For more information about the robots.txt standard, see:
mindbodyonline.com: # http://www.robotstxt.org/robotstxt.html
mindbodyonline.com: # CSS, JS, Images
mindbodyonline.com: # Directories
mindbodyonline.com: # Files
mindbodyonline.com: # Paths (clean URLs)
mindbodyonline.com: # Paths (no clean URLs)
mindbodyonline.com: # Sitemap
astrologyanswers.com: ###
astrologyanswers.com: # robots.txt file created by Nethues
astrologyanswers.com: ###
astrologyanswers.com: ###
astrologyanswers.com: #Unsafe robots to keep away
astrologyanswers.com: ###
rightmove.co.uk: # robots.txt for https://www.rightmove.co.uk
6pm.com: # Global robots.txt updated 2019-08-06
smarttradecoin.com: #cookie_bar p {
smarttradecoin.com: #cookie_bar div{
smarttradecoin.com: #cookie_bar{
smarttradecoin.com: #cc-button{
rae.es: #
rae.es: # robots.txt
rae.es: #
rae.es: # This file is to prevent the crawling and indexing of certain parts
rae.es: # of your site by web crawlers and spiders run by sites like Yahoo!
rae.es: # and Google. By telling these "robots" where not to go on your site,
rae.es: # you save bandwidth and server resources.
rae.es: #
rae.es: # This file will be ignored unless it is at the root of your host:
rae.es: # Used: http://example.com/robots.txt
rae.es: # Ignored: http://example.com/site/robots.txt
rae.es: #
rae.es: # For more information about the robots.txt standard, see:
rae.es: # http://www.robotstxt.org/robotstxt.html
rae.es: # CSS, JS, Images
rae.es: # Directories
rae.es: # Files
rae.es: # Paths (clean URLs)
rae.es: # Paths (no clean URLs)
jmty.jp: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
jmty.jp: #
jmty.jp: # To ban all spiders from the entire site uncomment the next two lines:
jmty.jp: # https://www.trovit.com/bot.html
jmty.jp: # http://www.grapeshot.com/crawler/
sweetwater.com: # /robots.txt file for http://www.sweetwater.com/
sweetwater.com: # mail webmaster@sweetwater.com for specific information
sweetwater.com: # last updated 11-18-2020 JPM
alfavita.gr: #
alfavita.gr: # robots.txt
alfavita.gr: #
alfavita.gr: # This file is to prevent the crawling and indexing of certain parts
alfavita.gr: # of your site by web crawlers and spiders run by sites like Yahoo!
alfavita.gr: # and Google. By telling these "robots" where not to go on your site,
alfavita.gr: # you save bandwidth and server resources.
alfavita.gr: #
alfavita.gr: # This file will be ignored unless it is at the root of your host:
alfavita.gr: # Used: http://example.com/robots.txt
alfavita.gr: # Ignored: http://example.com/site/robots.txt
alfavita.gr: #
alfavita.gr: # For more information about the robots.txt standard, see:
alfavita.gr: # http://www.robotstxt.org/robotstxt.html
alfavita.gr: # CSS, JS, Images
alfavita.gr: # Directories
alfavita.gr: # Files
alfavita.gr: # Paths (clean URLs)
alfavita.gr: # Paths (no clean URLs)
fortune.com: # Google SiteMaps
fortune.com: # Sitemap: https://fortune.com/feed/googlesitemap/articles.xml
fortune.com: # Sitemap: https://fortune.com/news-sitemap.xml
qoo10.jp: # Sitemap files
archiveofourown.org: # See https://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
archiveofourown.org: #
archiveofourown.org: # disallow indexing of search results
archiveofourown.org: # Googlebot is smart and knows pattern matching
ford.com: #robots.txt for www.Ford.com/es.ford.com - KD - 20200729
ford.com: #es.ford.com WIP cart files
ford.com: #Naver bot
flannels.com: # General
flannels.com: # Login
flannels.com: # Checkout
flannels.com: # Search/Products
flannels.com: # Filters
flannels.com: # API
flannels.com: # Utilities
flannels.com: # Blog
tmtpost.com: #Disallow:/user/
sheypoor.com: # Sitemap files
jdsports.co.uk: # JD -- King Of Crawlers
le360.ma: #
le360.ma: # robots.txt
le360.ma: #
le360.ma: # This file is to prevent the crawling and indexing of certain parts
le360.ma: # of your site by web crawlers and spiders run by sites like Yahoo!
le360.ma: # and Google. By telling these "robots" where not to go on your site,
le360.ma: # you save bandwidth and server resources.
le360.ma: #
le360.ma: # This file will be ignored unless it is at the root of your host:
le360.ma: # Used: http://example.com/robots.txt
le360.ma: # Ignored: http://example.com/site/robots.txt
le360.ma: #
le360.ma: # For more information about the robots.txt standard, see:
le360.ma: # http://www.robotstxt.org/wc/robots.html
le360.ma: #
le360.ma: # For syntax checking, see:
le360.ma: # http://www.sxw.org.uk/computing/robots/check.html
le360.ma: # Directories
le360.ma: # Files
le360.ma: # Paths (clean URLs)
le360.ma: # Paths (no clean URLs)
univ-grenoble-alpes.fr: # urls techniques :
sep.gob.mx: # Robot site sep.gob.mx
# Elaborado 03/11/2010
User-agent: *
# Bloquea directorios
Disallow: /doc/
Disallow: /wbutil/
Disallow: /WEB-INF/
Disallow: /admin/
Disallow: /wbadmin/
Disallow: /templates/
Disallow: /images/
# Bloquea contenidos din·micos
Disallow: /*.xls$
Disallow: /*.doc$
Disallow: /*.jsp$
Disallow: /*.asp$
# Mapa de sitio
Sitemap: http://www.sep.gob.mx/sitemap.xml
ixxx.com: # www.robotstxt.org/
ixxx.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
itmedia.co.jp: #masterBody{
itmedia.co.jp: #masterBodyOuter{
itmedia.co.jp: #globalHeaderMiddle{
itmedia.co.jp: #globalSearch{
itmedia.co.jp: #masterSub .colBoxHead h2{
itmedia.co.jp: #masterMain .colBoxSocialButtonTweet{
itmedia.co.jp: #globalFooterCorp{
note.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
note.com: #
note.com: # To ban all spiders from the entire site uncomment the next two lines:
note.com: # User-agent: *
note.com: # Disallow: /
bitflyer.com: # https://www.bitflyer.com robots.txt
openenglishprograms.org: #
openenglishprograms.org: # robots.txt
openenglishprograms.org: #
openenglishprograms.org: # This file is to prevent the crawling and indexing of certain parts
openenglishprograms.org: # of your site by web crawlers and spiders run by sites like Yahoo!
openenglishprograms.org: # and Google. By telling these "robots" where not to go on your site,
openenglishprograms.org: # you save bandwidth and server resources.
openenglishprograms.org: #
openenglishprograms.org: # This file will be ignored unless it is at the root of your host:
openenglishprograms.org: # Used: http://example.com/robots.txt
openenglishprograms.org: # Ignored: http://example.com/site/robots.txt
openenglishprograms.org: #
openenglishprograms.org: # For more information about the robots.txt standard, see:
openenglishprograms.org: # http://www.robotstxt.org/robotstxt.html
openenglishprograms.org: # CSS, JS, Images
openenglishprograms.org: # Directories
openenglishprograms.org: # Files
openenglishprograms.org: # Paths (clean URLs)
openenglishprograms.org: # Paths (no clean URLs)
google.fi: # AdsBot
google.fi: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
asu.edu: # robots.txt for asu.edu
google.lt: # AdsBot
google.lt: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
anthropologie.com: # Sitemap indexes
columbia.edu: # ignore this line - 1
columbia.edu: # for info on robots.txt syntax see
columbia.edu: # http://www.searchtools.com/robots/robots-txt.html
columbia.edu: ## New Homepage ##
columbia.edu: # Directories
columbia.edu: # Files
columbia.edu: # Paths (clean URLs)
columbia.edu: # Paths (no clean URLs)
panjiva.com: # Bots won't visit the links below but they will index them so they can still
panjiva.com: # show up in search results albeit without snippets or cache (and probably at
panjiva.com: # fairly low rank). For google we could use the Noindex: directive instead of
panjiva.com: # disallow, though that's experimental. That'll force pages out very quickly
panjiva.com: # so need to be very careful with that...
panjiva.com: # The matched string is prefix so /help blocks /help.html and /help/foo
panjiva.com: # So if disallowing an action avoid the trailing slash when there's no conflict
panjiva.com: # to avoid parameter/route changes breaking the match.
panjiva.com: # Note you can have multiple sitemaps listed in robots.txt and across domains
panjiva.com: # too. Everything is valid across domains/subdomains to an organic crawler iff
panjiva.com: # for sitemap A on host B that contains a url on host C the robots.txt file
panjiva.com: # on host C point to sitemap A on host B. Submitted sitemaps must generally
panjiva.com: # only contain urls that match the site they are on. We split our sitemaps
panjiva.com: # by sub-domain so we can still submit them separately, but list them all here
panjiva.com: # so that they are easy to find oganically too (much like if they were all
panjiva.com: # combined into one).
panjiva.com: # AdsBot-Google ignores robots.txt unless it's specifically called out, it
panjiva.com: # indexes targets of adwords campaigns (but seems to crawl out since it's
panjiva.com: # hitting pages that are in here that aren't direct adwords targets). We
panjiva.com: # shouldn't be targeting any page in here directly with adwords.
panjiva.com: # Extra profile information
panjiva.com: # Disallow sample companies so they don't show up in google (and complain)
panjiva.com: # Forms - note that we don't really need to exclude the _submit form because
panjiva.com: # no trailing slash in the previous rule covers it and they're all posts anyways
panjiva.com: # Deprecated
panjiva.com: # Excel export
panjiva.com: # Search
panjiva.com: # Block product search and dead search landing pages
panjiva.com: # Shipment search
panjiva.com: # SPPs
panjiva.com: # Trends
panjiva.com: # Trendspotting
panjiva.com: # Mekong Visor
panjiva.com: # Mekong - this needs to be in the robots.text served
panjiva.com: # from china-cdn-proxy.panjiva.com, but added here for
panjiva.com: # completeness
panjiva.com: # Checkout form submission
panjiva.com: # Other
panjiva.com: # These aren't actually urls, but / delimited keys that appear in json
panjiva.com: # Google still pulls them out and tries to call them though
panjiva.com: # We have some challenge problems that use some publicly-visible data.
panjiva.com: # We don't want search engines crawling that data and pointing searches
panjiva.com: # there instead of at our actual site content.
panjiva.com: # specific pages that we want to force out of the indicies (use Noindex:?)
panjiva.com: # allow rules
panjiva.com: # slow down bots that respect the crawl-delay directive (note that google
panjiva.com: # ignores this and is also the only bot we actually care to have crawl faster
panjiva.com: # then this)
si.com: # Tempest - sportsillustrated
cox.com: # Allow Google appliance
cox.com: # added for CB 2017-08-03
cox.com: # MP Sitemaps 12/13/17
thomsonreuters.com: # Global robots config
thomsonreuters.com: # robots.txt for http://thomsonreuters.com/
verizonwireless.com: # General Rules
verizonwireless.com: # Rules for Home/Fios section
verizonwireless.com: # Rules for Mobile/Wireless Section
verizonwireless.com: # PREPAID ATG
verizonwireless.com: # OMNI RELATED
verizonwireless.com: # Rules for Corp/About section
verizonwireless.com: # Rules for Support (home/mobile/kb) section
verizonwireless.com: # Rules from VBG / Business team
verizonwireless.com: # Block Google from doubleclick URL issue
verizonwireless.com: # Sitemap Files
jimdofree.com: # en
jimdofree.com: # de
jimdofree.com: # es
jimdofree.com: # fr
jimdofree.com: # it
jimdofree.com: # jp
jimdofree.com: # nl
skysports.com: # NetStorage
skysports.com: #
skysports.com: #
skysports.com: #
skysports.com: # Ajax & JSON
skysports.com: #
skysports.com: # Sports
skysports.com: #
skysports.com: # ipad
skysports.com: #
skysports.com: # ROI
skysports.com: #
skysports.com: # Backlink Analysis
newsnow.co.uk: # All robots all dirs
ulta.com: #Disallow the following URLs
ulta.com: #Sitemaps
gumtree.com.au: # Parameters
gumtree.com.au: ## Do not crawl any parametered URLs
gumtree.com.au: ## Except for URLs containing only the following parameters (and combinations)
mundodeportivo.com: # Bots nocivos
mundodeportivo.com: # Paths a no indexar
mundodeportivo.com: # New
mundodeportivo.com: # Zona Resultados
economist.com: # robots.txt
economist.com: #
economist.com: # Sitemap
economist.com: # Specific robot directives:
economist.com: # Description : Google AdSense delivers advertisements to a broad network of affiliated sites.
economist.com: # A robot analyses the pages that display the ads in order to target the ads to the page content.
economist.com: # Description : The Grapeshot crawler is an automated robot that visits pages to examine and analyse the content.
economist.com: # This adds an exception to crawl delay while preserving disallows.
economist.com: # No robots are allowed to index private paths:
economist.com: # Directories
economist.com: # Files
economist.com: # Paths (clean URLs)
economist.com: # Paths (no trailing /, beware this will stop file like /admin.html being
economist.com: # indexed if we had any)
economist.com: # Paths (no clean URLs)
economist.com: # Coldfusion paths
economist.com: # Print pages
economist.com: # Hidden articles
economist.com: # Allowed items
economist.com: # Comment urls deprecation
economist.com: # Prevent crawling podcast RSS file
economist.com: # Reading list
olx.ro: # sitecode:olxro-desktop
housing.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
housing.com: #
housing.com: # To ban all spiders from the entire site uncomment the next two lines:
housing.com: # User-Agent: *
housing.com: # Disallow: /
dailythanthi.com: # Sitemap Files
elintransigente.com: # Lana Sitemap version 1.0.3 - http://wp.lanaprojekt.hu/blog/wordpress-plugins/lana-sitemap/
edf.fr: #
edf.fr: # robots.txt
edf.fr: #
edf.fr: # This file is to prevent the crawling and indexing of certain parts
edf.fr: # of your site by web crawlers and spiders run by sites like Yahoo!
edf.fr: # and Google. By telling these "robots" where not to go on your site,
edf.fr: # you save bandwidth and server resources.
edf.fr: #
edf.fr: # This file will be ignored unless it is at the root of your host:
edf.fr: # Used: http://example.com/robots.txt
edf.fr: # Ignored: http://example.com/site/robots.txt
edf.fr: #
edf.fr: # For more information about the robots.txt standard, see:
edf.fr: # http://www.robotstxt.org/robotstxt.html
edf.fr: # Mr Roger Bot
edf.fr: # BLEXBot
edf.fr: # Directories
edf.fr: # Files
edf.fr: # Paths (clean URLs)
edf.fr: # Paths (no clean URLs)
bodybuilding.com: # Adding for Sapient SEO per SEOS-9 ticket
blondieshop.com: # Visita al massimo una pagina ogni 5 secondi
blondieshop.com: # Visita soltanto tra le 24:00 AM e le 6:45 AM UT (GMT)
blondieshop.com: # Directories
blondieshop.com: # Disallow: /media/ // Allow this folder for google product caching
blondieshop.com: # Paths (clean URLs)
blondieshop.com: # Paths (no clean URLs)
mass.gov: #
mass.gov: # robots.txt
mass.gov: #
mass.gov: # This file is to prevent the crawling and indexing of certain parts
mass.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
mass.gov: # and Google. By telling these "robots" where not to go on your site,
mass.gov: # you save bandwidth and server resources.
mass.gov: #
mass.gov: # This file will be ignored unless it is at the root of your host:
mass.gov: # Used: http://example.com/robots.txt
mass.gov: # Ignored: http://example.com/site/robots.txt
mass.gov: #
mass.gov: # For more information about the robots.txt standard, see:
mass.gov: # http://www.robotstxt.org/robotstxt.html
mass.gov: # CSS, JS, Images
mass.gov: # Directories
mass.gov: # Files
mass.gov: # Paths (clean URLs)
mass.gov: # Paths (no clean URLs)
kariyer.net: # Disallow: /WebSite/BasinOdasi/ --- Henuz aktif degil
cryptocompare.com: # robots.txt for Umbraco
typosthes.gr: #
typosthes.gr: # robots.txt
typosthes.gr: #
typosthes.gr: # This file is to prevent the crawling and indexing of certain parts
typosthes.gr: # of your site by web crawlers and spiders run by sites like Yahoo!
typosthes.gr: # and Google. By telling these "robots" where not to go on your site,
typosthes.gr: # you save bandwidth and server resources.
typosthes.gr: #
typosthes.gr: # This file will be ignored unless it is at the root of your host:
typosthes.gr: # Used: http://example.com/robots.txt
typosthes.gr: # Ignored: http://example.com/site/robots.txt
typosthes.gr: #
typosthes.gr: # For more information about the robots.txt standard, see:
typosthes.gr: # http://www.robotstxt.org/robotstxt.html
typosthes.gr: # CSS, JS, Images
typosthes.gr: # Directories
typosthes.gr: # Files
typosthes.gr: # Paths (clean URLs)
typosthes.gr: # Paths (no clean URLs)
logitech.com: # Logitech
logitech.com: # Modified Jan 25. 2021
subhd.com: # Block MegaIndex.ru
pagesix.com: # Sitemap archive
pagesix.com: # Additional sitemaps
google.rs: # AdsBot
google.rs: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
prestashop.com: #
prestashop.com: # robots.txt
prestashop.com: #
prestashop.com: # This file is to prevent the crawling and indexing of certain parts
prestashop.com: # of your site by web crawlers and spiders run by sites like Yahoo!
prestashop.com: # and Google. By telling these "robots" where not to go on your site,
prestashop.com: # you save bandwidth and server resources.
prestashop.com: #
prestashop.com: # This file will be ignored unless it is at the root of your host:
prestashop.com: # Used: http://example.com/robots.txt
prestashop.com: # Ignored: http://example.com/site/robots.txt
prestashop.com: #
prestashop.com: # For more information about the robots.txt standard, see:
prestashop.com: # http://www.robotstxt.org/robotstxt.html
prestashop.com: # CSS, JS, Images
prestashop.com: # Directories
prestashop.com: # Files
prestashop.com: # Paths (clean URLs)
prestashop.com: # Paths (no clean URLs)
prestashop.com: #Forum
bithumb.com: # BITHUMB.com Robots.txt
bithumb.com: # Sitemap
gazzetta.gr: # robots.txt
gazzetta.gr: #
gazzetta.gr: # This file is to prevent the crawling and indexing of certain parts
gazzetta.gr: # of your site by web crawlers and spiders run by sites like Yahoo!
gazzetta.gr: # and Google. By telling these "robots" where not to go on your site,
gazzetta.gr: # you save bandwidth and server resources.
gazzetta.gr: #
gazzetta.gr: # This file will be ignored unless it is at the root of your host:
gazzetta.gr: # Used: http://example.com/robots.txt
gazzetta.gr: # Ignored: http://example.com/site/robots.txt
gazzetta.gr: #
gazzetta.gr: # For more information about the robots.txt standard, see:
gazzetta.gr: # http://www.robotstxt.org/robotstxt.html
gazzetta.gr: #
gazzetta.gr: #
gazzetta.gr: #
gazzetta.gr: #
gazzetta.gr: # CSS, JS, Images
gazzetta.gr: # Custom
gazzetta.gr: #Disallow: /breaking-blog
gazzetta.gr: # Directories
gazzetta.gr: # Files
gazzetta.gr: # Paths (clean URLs)
gazzetta.gr: # Paths (no clean URLs)
google.com.ly: # AdsBot
google.com.ly: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
seedit.com: #diymodule_show_121{ margin-left:-25px;}
seedit.com: #diymodule_show_121 .web_logo img{padding-top:1rem; border:none; }
seedit.com: #diymodule_show_121 .web_logo img:hover{opacity:0.8;}
seedit.com: #top_layout_out #top_layout_inner #layout_top {white-space: nowrap;}
seedit.com: #diymodule_show_119{marign:0px;padding:0px;}
seedit.com: #diymodule_show_119_119_html{marign:0px;padding:0px;border:none; background:none;}
seedit.com: #diymodule_show_119 .module_title{display:none;}
seedit.com: #diymodule_show_119 .content{marign:0px;padding:0px;; }
seedit.com: #diymodule_show_119 img{padding-left:10px; padding-top:42px; border:none;width:80%;}
seedit.com: #diymodule_show_119 img:hover{opacity:0.7;}
seedit.com: #index_navigation_html{}
seedit.com: #index_navigation {}
seedit.com: #index_navigation .nv img{ display:none;}
seedit.com: #index_navigation .nv{ text-align:left; margin:0px; padding:0px; height:50px; height:3.572rem; line-height:50px;line-height:3.572rem; font-size:1rem; white-space:nowrap;}
seedit.com: #index_navigation .nv a{ white-space:nowrap;}
seedit.com: #index_navigation .nv .nv_ul{ width:100%;}
seedit.com: #index_navigation .nv > ul{ white-space:nowrap;}
seedit.com: #index_navigation .nv ul ul {display: none;}
seedit.com: #index_navigation .nv ul li:hover > ul {display: block;}
seedit.com: #index_navigation .nv ul {list-style: none;position: relative;display: inline-block;white-space:nowrap; z-index:9;}
seedit.com: #index_navigation .nv ul:after {content: ""; clear: both; display: block;}
seedit.com: #index_navigation .nv ul > li { display: inline-block;text-align: center; }
seedit.com: #index_navigation .nv ul li:hover {background:#e31939;}
seedit.com: #index_navigation .nv ul li:hover a { color:#fff;}
seedit.com: #index_navigation .nv ul li a {display: block; text-decoration: none; padding-left:1.5rem; padding-right:1.5rem;}
seedit.com: #index_navigation .nv ul li a i{ padding-left:5px;}
seedit.com: #index_navigation .nv ul ul {line-height:40px;line-height:2.85rem;background:#e31939; border-radius: 0px; padding: 0;position: absolute; top: 100%;}
seedit.com: #index_navigation .nv ul ul li { display:block; width:100%;float: none;position: relative; text-align:left;}
seedit.com: #index_navigation .nv ul ul li a {}
seedit.com: #index_navigation .nv ul ul li a i{ float:right;line-height:40px;line-height:2.85rem; margin-left:10px;}
seedit.com: #index_navigation .nv ul ul li a .fa-angle-right{ padding-right:10px; }
seedit.com: #index_navigation .nv ul ul li:hover {background:#ed5f74;}
seedit.com: #index_navigation .nv ul ul ul {width:100%;position: absolute; left: 100%; top:0; }
seedit.com: #index_navigation .nv ul ul ul li:hover{}
seedit.com: #index_navigation .nv ul li:last-child:hover > ul li ul{ left:-100%; text-align:right;}
seedit.com: #index_navigation .nv ul li:last-child:hover > ul li ul a{padding-right: 25px ;}
seedit.com: #index_navigation .nv ul li .have_three { font-weight:bold; margin-bottom:20px; }
seedit.com: #index_navigation .nv ul li .have_three li{ font-weight: normal; }
seedit.com: #index_navigation .nv ul li .have_three:hover{ background:none; }
seedit.com: #index_navigation .nv ul li .have_three > a{ border-bottom:#fff 1px dashed;}
seedit.com: #index_navigation .nv ul li .have_three a:hover{background:#ff9900; }
seedit.com: #index_navigation .nv ul li .have_three ul{background: #ff4a00; }
seedit.com: #index_navigation .nv ul li .have_three i{ display:none; }
seedit.com: #index_navigation .nv ul li .multiple_columns{ width:43rem; height:auto; white-space:normal; text-align:left; }
seedit.com: #index_navigation .nv ul li .multiple_columns li{display:inline-block; height:auto; border-top:none; vertical-align:top; width:200px; padding-left:10px; }
seedit.com: #index_navigation .nv ul li .multiple_columns li a{ display:block; text-align:left;}
seedit.com: #index_navigation .nv ul li .multiple_columns li ul{ display:block; position:static; height:auto; white-space:normal; }
seedit.com: #index_navigation .nv ul li .multiple_columns li ul li{ display:block; width:100%; height:32px; line-height:32px; height:2.3rem; line-height:2.3rem;}
seedit.com: #index_navigation .nv ul li:last-child:hover > ul li ul a{padding-right: 25px ;}
seedit.com: #index_navigation #mall_type_all{ display:inline-block; vertical-align:top; width:17.95rem; text-align:left; font-size:18px; margin-right:20px;background:#e31939; }
seedit.com: #index_navigation #mall_type_all a{ color:#fff;}
seedit.com: #index_navigation #mall_type_all:hover #mall_type_module_two{ display:block;}
seedit.com: #index_navigation #mall_type_all .text{ display:block; padding-left:12px;}
seedit.com: #index_navigation #mall_type_all:hover{ }
seedit.com: #index_navigation #mall_type_module_two{ display:none; position:absolute; top:290px; left:6px; z-index:9999;}
seedit.com: #index_fixed_top{ z-index:99999999; width:100%; background:rgba(255,255,255,0.96); ; position:fixed; top:0px; box-shadow: rgba(0,0,0,.2) 0 1px 5px; height:50px; overflow:hidden; display:none;}
seedit.com: #index_fixed_top_html >div{ display:inline-block; vertical-align:top;}
seedit.com: #index_fixed_top .logo_area{ display:inline-block; vertical-align:top; width:30%; overflow:hidden;}
seedit.com: #index_fixed_top .logo_area img{ width:250px; margin-top:-24px;}
seedit.com: #index_fixed_top .logo_area img:hover{ opacity:0.9;}
seedit.com: #index_fixed_top .search_area{ display:inline-block; vertical-align:top; width:40%; overflow:hidden;}
seedit.com: #index_fixed_top .search_area .search_div{ height:34px; line-height:34px; border:1px solid #e93853; margin-top:8px;}
seedit.com: #index_fixed_top .search_area .search_div input{height:32px; line-height:32px; display:inline-block; vertical-align:top; width:85%; overflow:hidden;}
seedit.com: #index_fixed_top .search_area .search_div a{ height:34px; line-height:34px;display:inline-block; vertical-align:top; width:15%; overflow:hidden; text-align:center; background:#e93853; color:#ffffff; cursor:pointer;}
seedit.com: #index_fixed_top .search_area .search_div a:hover{ opacity:0.9;}
seedit.com: #index_fixed_top .user_aera{width:30%;line-height:45px; text-align:right; padding-right:30px;}
seedit.com: #index_fixed_top .user_aera a{ display:inline-block; vertical-align:top; line-height:45px;}
seedit.com: #index_fixed_top .user_aera #icon_img{ margin-top:10px;line-height:50px;}
seedit.com: #index_fixed_top .user_aera .top_a{ display:none;}
seedit.com: #index_fixed_top .user_aera #icon_a{ }
seedit.com: #index_fixed_top .user_aera #nickname{ }
seedit.com: #index_fixed_top .user_aera #unlogin{ background:#e93853; color:#ffffff; display:inline-block; vertical-align:top; line-height:30px; height:30px; padding-left:1rem; padding-right:1rem; margin-top:8px; border-radius:3px; }
seedit.com: #index_fixed_top .user_aera #unlogin:before{ display:none;}
seedit.com: #index_fixed_top .user_aera #login{ color:#e93853; }
seedit.com: #index_fixed_top .user_aera #login:before{ display:none;}
seedit.com: #index_fixed_top .user_aera #reg_user{ background:#e93853; color:#ffffff; display:inline-block; vertical-align:top; line-height:30px; height:30px; padding-left:1rem; padding-right:1rem; margin-top:8px; border-radius:3px; }
seedit.com: #index_fixed_top .user_aera #reg_user:before{ display:none;}
seedit.com: #ci_type_module_two{font-weight:normal; text-indent:0px; width:17.85rem; height:32.14rem; vertical-align:top; border-top:1px solid #e8e8e8; border-bottom:1px solid #e8e8e8; display:inline-block; vertical-align:top; background-color:#fcfcfc;padding:0px;box-shadow:none;margin:0px;}
seedit.com: #ci_type_module_two_html{}
seedit.com: #ci_type_module_two_html .more{ padding-left:8px; line-height:30px;}
seedit.com: #ci_type_module_two_html .parent{ height:4.9rem; line-height:2rem; border-bottom: #e8e8e8 1px solid; }
seedit.com: #ci_type_module_two_html .parent .level_1_div{ padding-left:10px;}
seedit.com: #ci_type_module_two_html .parent:hover{position: relative;z-index:991;border-left:0.15rem solid #e93853;background-color:#fcfcfc; }
seedit.com: #ci_type_module_two_html .parent:hover .part_b_div{ display:block;}
seedit.com: #ci_type_module_two_html .parent:hover .level_1_div .level_1:after{ display:none;}
seedit.com: #ci_type_module_two_html .parent:hover .level_1_div{ }
seedit.com: #ci_type_module_two_html .parent:hover .level_1_div .level_1{background-image:none;background-color:#fcfcfc; }
seedit.com: #ci_type_module_two_html .parent:hover .part_a_div{background-color:#fcfcfc; }
seedit.com: #ci_type_module_two_html .parent .level_1_div .level_1{ display:block; font-size:16px; position: relative; z-index:991;}
seedit.com: #ci_type_module_two_html .parent .level_1_div .level_1:after{margin-right:8px; float:right; font: normal normal normal 1rem/1 FontAwesome; content:"\f105"; padding-top:5px; color: #999;}
seedit.com: #ci_type_module_two_html .parent .level_1_div .level_1 a{color:#e93853;}
seedit.com: #ci_type_module_two_html .parent .level_1_div .part_a_div a{color:#000;}
seedit.com: #ci_type_module_two_html .parent .level_1_div .part_a_div{position: relative;z-index:991; width:100%; padding-bottom:0.8rem; }
seedit.com: #ci_type_module_two_html .parent .level_1_div .part_a_div .level_2{font-size:15px; display:inline-block; vertical-align:top; height:25px; width:29%;margin-right:4%; overflow:hidden;white-space:nowrap;text-overflow:ellipsis;}
seedit.com: #ci_type_module_two_html .parent .level_1_div .part_a_div:hover{ }
seedit.com: #ci_type_module_two_html .parent .level_1_div .part_a_div .level_2:hover{ display:inline-block;color:#ed5f74;}
seedit.com: #ci_type_module_two_html .parent .part_b_div{ padding-left:20px; display:none; position: relative; padding-top:25px; left:17.5rem; top:-4.85rem; min-height:4.9rem; width:32.14rem; z-index:99; background-color: #fcfcfc; border:1px solid #e8e8e8; white-space:normal;}
seedit.com: #ci_type_module_two_html .parent .part_b_div .level_2{ font-size:15px; display:inline-block; vertical-align:top; width:120px; margin-right:20px; height:35px; line-height:35px; overflow:hidden;white-space:nowrap;text-overflow:ellipsis;}
seedit.com: #ci_type_module_two_html .parent .part_b_div .level_2:hover{color:#ed5f74;}
seedit.com: #slider_show_16_16_html .up_text{ position:relative; top:-100%; color:#fff; }
seedit.com: #slider_show_16_16_html .up_text a{color:#fff;}
seedit.com: #slider_show_16_16_html .up_text .slider_name{ font-size:4rem; font-weight:bold; line-height:7.14rem; height:7.14rem!important; }
seedit.com: #slider_show_16_16_html .up_text .slider_description{ padding:1rem; line-height:4rem; height:13rem!important; font-size:2rem; opacity:0; }
seedit.com: #slider_show_16_16_html .up_text .slider_summary{ line-height:3rem; height:3rem!important; opacity:0; }
seedit.com: #slider_show_16_16_html .up_text .slider_summary a{ display:inline-block; border:1px solid #FFF; border-radius:0.5rem; padding:0.5rem;font-size:2rem;}
seedit.com: #index_index_user { height:320px; width:18%; right:5%; top:225px; border-top:1px solid #ccc; overflow:hidden;white-space:normal; position: absolute; z-index:1; background:rgba(255,255,255,0.97); opacity:0; }
seedit.com: #index_index_user_html >div{}
seedit.com: #index_index_user .r_user_state{ text-align:center; padding-bottom:15px;}
seedit.com: #index_index_user .r_user_state .mall_home{ display:none; }
seedit.com: #index_index_user .r_user_state .my_order{ display:none; }
seedit.com: #index_index_user .r_user_state .my_collection{ display:none; }
seedit.com: #index_index_user .r_user_state #hello{ display:none; }
seedit.com: #index_index_user .r_user_state a{ display:block; }
seedit.com: #index_index_user .r_user_state #icon_img{ display:block; margin:auto; width:60px; height:60px; border-radius:30px; margin-top:1rem; border:2px solid #fff; }
seedit.com: #index_index_user .r_user_state #nickname span{ display:none;}
seedit.com: #index_index_user .r_user_state #nickname{ line-height:2rem; display:block; vertical-align:top; width:100%; text-align:center;}
seedit.com: #index_index_user .r_user_state #unlogin{ display:inline-block; vertical-align:top; width:6rem; text-align:center; background:#ed5f74; color:#fff; border-radius:12px; line-height:1.8rem;}
seedit.com: #index_index_user .r_user_state a{ margin:0px; padding:0px;}
seedit.com: #index_index_user .r_user_state a:hover{ color:#ed5f74; opacity:0.8;}
seedit.com: #index_index_user .r_user_state a:before{ display:none;}
seedit.com: #index_index_user .r_user_state .default_user_icon{display:block; margin:auto; width:60px; height:60px; border-radius:30px; margin-top:1rem; background: rgba(237,237,237,1.00);}
seedit.com: #index_index_user .r_user_state .default_user_icon:before {font: normal normal normal 60px/1 FontAwesome; margin-right: 5px;
seedit.com: #index_index_user .r_user_state #login{ display:inline-block; vertical-align:top; border-radius:12px; box-shadow: 6px 8px 20px rgba(45,45,45,.15); text-align:center; width:5rem; margin-right:5px;}
seedit.com: #index_index_user .r_user_state #reg_user{ display:inline-block; vertical-align:top; background:#ed5f74; color:#fff; width:5rem; margin-left:5px;border-radius:12px;}
seedit.com: #index_index_user .r_user_state .welcome_to_come{ line-height:3rem;}
seedit.com: #index_index_user .r_article{ padding:10px; height:100px; overflow:hidden; margin-left:1rem;}
seedit.com: #index_index_user .r_article a:before {font: normal normal normal 3px/0.3 FontAwesome; margin-right: 5px; content: "\f0c8"; color:#ccc; font-size:3px;}
seedit.com: #index_index_user .r_article a{ display:block;white-space: nowrap; text-overflow: ellipsis; overflow:hidden; line-height:2.5rem; border-bottom:1px dashed #ccc;}
seedit.com: #index_index_user .r_icons{ padding-top:0.5rem; }
seedit.com: #index_index_user .r_icons a{ text-align:center; display:inline-block; vertical-align:top; width:33.33%; overflow:hidden; }
seedit.com: #index_index_user .r_icons a:hover{ opacity:0.7;}
seedit.com: #index_index_user .r_icons a img{ width:40%; }
seedit.com: #index_index_user .r_icons a span{ display:block; font-size:0.8rem; margin-top:0.5rem; margin-bottom:0.5rem; }
seedit.com: #id79938d4fda01e01c413af8d95fa7a423{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px;
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html{ white-space:nowrap;overflow: hidden;height: 90%;}
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;}
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;}
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; }
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";}
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .module_title .more:hover{ float:right; font-size:14px; }
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .cover_image{ display:inline-block; vertical-align:top;width:;}
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);}
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .cover_image img{width:;border:none; padding-top:2px;}
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; }
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;}
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line:hover{ background:#f3f3f3; }
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;}
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .icon img{ width:90%; max-height:65px; border:none;}
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; }
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; }
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;}
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; }
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; }
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;}
seedit.com: #id79938d4fda01e01c413af8d95fa7a423_html .list .line .middle .other .price .number{ font-weight:bold;}
seedit.com: #id628130ebfa004bda81b42fc004b9504e{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px;
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html{ white-space:nowrap;overflow: hidden;height: 90%;}
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;}
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;}
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; }
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";}
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .module_title .more:hover{ float:right; font-size:14px; }
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .cover_image{ display:inline-block; vertical-align:top;width:;}
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);}
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .cover_image img{width:;border:none; padding-top:2px;}
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; }
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;}
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line:hover{ background:#f3f3f3; }
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;}
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .icon img{ width:90%; max-height:65px; border:none;}
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; }
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; }
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;}
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; }
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; }
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;}
seedit.com: #id628130ebfa004bda81b42fc004b9504e_html .list .line .middle .other .price .number{ font-weight:bold;}
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px;
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html{ white-space:nowrap;overflow: hidden;height: 90%;}
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;}
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;}
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; }
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";}
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .module_title .more:hover{ float:right; font-size:14px; }
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .cover_image{ display:inline-block; vertical-align:top;width:;}
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);}
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .cover_image img{width:;border:none; padding-top:2px;}
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; }
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;}
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line:hover{ background:#f3f3f3; }
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;}
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .icon img{ width:90%; max-height:65px; border:none;}
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; }
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; }
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;}
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; }
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; }
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;}
seedit.com: #id8887b60600bbfe6c3c78b4b3147fd584_html .list .line .middle .other .price .number{ font-weight:bold;}
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px;
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html{ white-space:nowrap;overflow: hidden;height: 90%;}
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;}
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;}
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; }
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";}
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .module_title .more:hover{ float:right; font-size:14px; }
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .cover_image{ display:inline-block; vertical-align:top;width:;}
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);}
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .cover_image img{width:;border:none; padding-top:2px;}
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; }
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;}
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line:hover{ background:#f3f3f3; }
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;}
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .icon img{ width:90%; max-height:65px; border:none;}
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; }
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; }
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;}
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; }
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; }
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;}
seedit.com: #id1237ee5aeb476ccde8048ae718023e6d_html .list .line .middle .other .price .number{ font-weight:bold;}
seedit.com: #ide1452116f091fed67c5265f1a976293b{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px;
seedit.com: #ide1452116f091fed67c5265f1a976293b_html{ white-space:nowrap;overflow: hidden;height: 90%;}
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;}
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;}
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; }
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";}
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .module_title .more:hover{ float:right; font-size:14px; }
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .cover_image{ display:inline-block; vertical-align:top;width:;}
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);}
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .cover_image img{width:;border:none; padding-top:2px;}
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; }
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;}
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line:hover{ background:#f3f3f3; }
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;}
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .icon img{ width:90%; max-height:65px; border:none;}
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; }
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; }
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;}
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; }
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; }
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;}
seedit.com: #ide1452116f091fed67c5265f1a976293b_html .list .line .middle .other .price .number{ font-weight:bold;}
seedit.com: #id451f0dc8c18884e1b173ba1827679647{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px;
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html{ white-space:nowrap;overflow: hidden;height: 90%;}
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;}
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;}
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; }
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";}
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .module_title .more:hover{ float:right; font-size:14px; }
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .cover_image{ display:inline-block; vertical-align:top;width:;}
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);}
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .cover_image img{width:;border:none; padding-top:2px;}
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; }
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;}
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line:hover{ background:#f3f3f3; }
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;}
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .icon img{ width:90%; max-height:65px; border:none;}
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; }
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; }
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;}
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; }
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; }
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;}
seedit.com: #id451f0dc8c18884e1b173ba1827679647_html .list .line .middle .other .price .number{ font-weight:bold;}
seedit.com: #idd8070c23b1799c57e3f124f053882df3{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px;
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html{ white-space:nowrap;overflow: hidden;height: 90%;}
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;}
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;}
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; }
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";}
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .module_title .more:hover{ float:right; font-size:14px; }
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .cover_image{ display:inline-block; vertical-align:top;width:;}
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);}
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .cover_image img{width:;border:none; padding-top:2px;}
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; }
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;}
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line:hover{ background:#f3f3f3; }
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;}
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .icon img{ width:90%; max-height:65px; border:none;}
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; }
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; }
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;}
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; }
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; }
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;}
seedit.com: #idd8070c23b1799c57e3f124f053882df3_html .list .line .middle .other .price .number{ font-weight:bold;}
seedit.com: #id899287572864c4218f23003ab9925d55{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px;
seedit.com: #id899287572864c4218f23003ab9925d55_html{ white-space:nowrap;overflow: hidden;height: 90%;}
seedit.com: #id899287572864c4218f23003ab9925d55_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;}
seedit.com: #id899287572864c4218f23003ab9925d55_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;}
seedit.com: #id899287572864c4218f23003ab9925d55_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; }
seedit.com: #id899287572864c4218f23003ab9925d55_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";}
seedit.com: #id899287572864c4218f23003ab9925d55_html .module_title .more:hover{ float:right; font-size:14px; }
seedit.com: #id899287572864c4218f23003ab9925d55_html .cover_image{ display:inline-block; vertical-align:top;width:;}
seedit.com: #id899287572864c4218f23003ab9925d55_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);}
seedit.com: #id899287572864c4218f23003ab9925d55_html .cover_image img{width:;border:none; padding-top:2px;}
seedit.com: #id899287572864c4218f23003ab9925d55_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; }
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;}
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line:hover{ background:#f3f3f3; }
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;}
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .icon img{ width:90%; max-height:65px; border:none;}
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; }
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; }
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;}
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; }
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; }
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;}
seedit.com: #id899287572864c4218f23003ab9925d55_html .list .line .middle .other .price .number{ font-weight:bold;}
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px;
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html{ white-space:nowrap;overflow: hidden;height: 90%;}
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;}
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;}
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; }
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";}
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .module_title .more:hover{ float:right; font-size:14px; }
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .cover_image{ display:inline-block; vertical-align:top;width:;}
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);}
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .cover_image img{width:;border:none; padding-top:2px;}
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; }
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;}
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line:hover{ background:#f3f3f3; }
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;}
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .icon img{ width:90%; max-height:65px; border:none;}
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; }
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; }
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;}
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; }
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; }
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;}
seedit.com: #id87cafef92ce7c67f946ccfaaa9534546_html .list .line .middle .other .price .number{ font-weight:bold;}
seedit.com: #id621aa64062a141b5b9827a96721f2f59{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px;
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html{ white-space:nowrap;overflow: hidden;height: 90%;}
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;}
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;}
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; }
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";}
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .module_title .more:hover{ float:right; font-size:14px; }
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .cover_image{ display:inline-block; vertical-align:top;width:;}
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);}
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .cover_image img{width:;border:none; padding-top:2px;}
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; }
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;}
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line:hover{ background:#f3f3f3; }
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;}
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .icon img{ width:90%; max-height:65px; border:none;}
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; }
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; }
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;}
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; }
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; }
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;}
seedit.com: #id621aa64062a141b5b9827a96721f2f59_html .list .line .middle .other .price .number{ font-weight:bold;}
seedit.com: #idd8060f2431a5e992104d2869f0194994{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px;
seedit.com: #idd8060f2431a5e992104d2869f0194994_html{ white-space:nowrap;overflow: hidden;height: 90%;}
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;}
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;}
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; }
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";}
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .module_title .more:hover{ float:right; font-size:14px; }
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .cover_image{ display:inline-block; vertical-align:top;width:;}
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);}
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .cover_image img{width:;border:none; padding-top:2px;}
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; }
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;}
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line:hover{ background:#f3f3f3; }
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;}
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .icon img{ width:90%; max-height:65px; border:none;}
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; }
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; }
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;}
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; }
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; }
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;}
seedit.com: #idd8060f2431a5e992104d2869f0194994_html .list .line .middle .other .price .number{ font-weight:bold;}
seedit.com: #idf814064342d8104efc1fe872a989a687{ width:24%; height:460px;display:inline-block; vertical-align:top;vertical-align:top; overflow:hidden; margin-bottom: 20px;
seedit.com: #idf814064342d8104efc1fe872a989a687_html{ white-space:nowrap;overflow: hidden;height: 90%;}
seedit.com: #idf814064342d8104efc1fe872a989a687_html .module_title{font-size:20px; height:50px;line-height:50px; border-bottom:3px solid #87B829; padding-right:25px; overflow:hidden;box-shadow:0px 1px 5px #ccc;}
seedit.com: #idf814064342d8104efc1fe872a989a687_html .module_title .name{ float:left;min-width:100px; width:; text-align:center; padding-right:5px;}
seedit.com: #idf814064342d8104efc1fe872a989a687_html .module_title .name span{ font-weight:bold; padding-left:5px; padding-right:5px; margin-right:10px; margin-left:10px;background:#e93853; color:#ffffff; }
seedit.com: #idf814064342d8104efc1fe872a989a687_html .module_title .more{ float:right; font-size:14px; font-family:"微软雅黑", "宋体";}
seedit.com: #idf814064342d8104efc1fe872a989a687_html .module_title .more:hover{ float:right; font-size:14px; }
seedit.com: #idf814064342d8104efc1fe872a989a687_html .cover_image{ display:inline-block; vertical-align:top;width:;}
seedit.com: #idf814064342d8104efc1fe872a989a687_html .cover_image:hover{opacity:0.8; filter:alpha(opacity=80);}
seedit.com: #idf814064342d8104efc1fe872a989a687_html .cover_image img{width:;border:none; padding-top:2px;}
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list{ display:inline-block; vertical-align:top; max-width:100%;height:100%;white-space:normal; }
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line{display:inline-block; border-bottom:dashed 1px #EEEEEE; vertical-align:top; overflow:hidden; padding:10px; width:100% ;}
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line:hover{ background:#f3f3f3; }
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .icon{ display:inline-block; vertical-align:top; width:35%; overflow:hidden;}
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .icon img{ width:90%; max-height:65px; border:none;}
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .middle{ display:inline-block; vertical-align:top; overflow:hidden; }
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .middle .title{ display:block; height:25px; overflow:hidden; font-size:16px;white-space:nowrap;text-overflow: ellipsis; }
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .middle .attributes{ height:22px; font-size:1rem; overflow:hidden;}
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .middle .attributes span{ display:inline-block; padding-right:10px; vertical-align:top;background-image:url(templates/0/ci/default/pc/img/dividing_line.png); background-repeat:no-repeat; background-position:right; }
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .middle .other .reflash{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden; }
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .middle .other .price{ display:inline-block; vertical-align:top; margin-right:10px; overflow:hidden;}
seedit.com: #idf814064342d8104efc1fe872a989a687_html .list .line .middle .other .price .number{ font-weight:bold;}
seedit.com: #return_top{display:none;}
strikingly.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
strikingly.com: #
strikingly.com: # To ban all spiders from the entire site uncomment the next two lines:
strikingly.com: # User-Agent: *
strikingly.com: # Disallow: /
strikingly.com: # Google adsbot ignores robots.txt unless specifically named!
qiqitv.info: # global
vistaprint.com: # Crawling Rules - Last Update on 11/07/2019
google.hr: # AdsBot
google.hr: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
stellamccartney.com: # Disallow tricombot.
verajohn.com: # robots.txt desktop 25/05/18
docusign.net: # go away
getbootstrap.com: # www.robotstxt.org
getbootstrap.com: # Allow crawling of all content
arstechnica.com: # Google Image
arstechnica.com: # Google AdSense
arstechnica.com: # Global
arstechnica.com: # phpBB
eastdane.com: #Sitemap updated 08/31/2018
sketchup.com: #
sketchup.com: # robots.txt
sketchup.com: #
sketchup.com: # This file is to prevent the crawling and indexing of certain parts
sketchup.com: # of your site by web crawlers and spiders run by sites like Yahoo!
sketchup.com: # and Google. By telling these "robots" where not to go on your site,
sketchup.com: # you save bandwidth and server resources.
sketchup.com: #
sketchup.com: # This file will be ignored unless it is at the root of your host:
sketchup.com: # Used: http://example.com/robots.txt
sketchup.com: # Ignored: http://example.com/site/robots.txt
sketchup.com: #
sketchup.com: # For more information about the robots.txt standard, see:
sketchup.com: # http://www.robotstxt.org/robotstxt.html
sketchup.com: # CSS, JS, Images
sketchup.com: # Directories
sketchup.com: # Files
sketchup.com: # Paths (clean URLs)
sketchup.com: # Paths (no clean URLs)
thestreet.com: # Tempest - thestreet
google.ie: # AdsBot
google.ie: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
wikisource.org: #
wikisource.org: # Please note: There are a lot of pages on this site, and there are
wikisource.org: # some misbehaved spiders out there that go _way_ too fast. If you're
wikisource.org: # irresponsible, your access to the site may be blocked.
wikisource.org: #
wikisource.org: # Observed spamming large amounts of https://en.wikipedia.org/?curid=NNNNNN
wikisource.org: # and ignoring 429 ratelimit responses, claims to respect robots:
wikisource.org: # http://mj12bot.com/
wikisource.org: # advertising-related bots:
wikisource.org: # Wikipedia work bots:
wikisource.org: # Crawlers that are kind enough to obey, but which we'd rather not have
wikisource.org: # unless they're feeding search engines.
wikisource.org: # Some bots are known to be trouble, particularly those designed to copy
wikisource.org: # entire sites. Please obey robots.txt.
wikisource.org: # Misbehaving: requests much too fast:
wikisource.org: #
wikisource.org: # Sorry, wget in its recursive mode is a frequent problem.
wikisource.org: # Please read the man page and use it properly; there is a
wikisource.org: # --wait option you can use to set the delay between hits,
wikisource.org: # for instance.
wikisource.org: #
wikisource.org: #
wikisource.org: # The 'grub' distributed client has been *very* poorly behaved.
wikisource.org: #
wikisource.org: #
wikisource.org: # Doesn't follow robots.txt anyway, but...
wikisource.org: #
wikisource.org: #
wikisource.org: # Hits many times per second, not acceptable
wikisource.org: # http://www.nameprotect.com/botinfo.html
wikisource.org: # A capture bot, downloads gazillions of pages with no public benefit
wikisource.org: # http://www.webreaper.net/
wikisource.org: #
wikisource.org: # Friendly, low-speed bots are welcome viewing article pages, but not
wikisource.org: # dynamically-generated pages please.
wikisource.org: #
wikisource.org: # Inktomi's "Slurp" can read a minimum delay between hits; if your
wikisource.org: # bot supports such a thing using the 'Crawl-delay' or another
wikisource.org: # instruction, please let us know.
wikisource.org: #
wikisource.org: # There is a special exception for API mobileview to allow dynamic
wikisource.org: # mobile web & app views to load section content.
wikisource.org: # These views aren't HTTP-cached but use parser cache aggressively
wikisource.org: # and don't expose special: pages etc.
wikisource.org: #
wikisource.org: # Another exception is for REST API documentation, located at
wikisource.org: # /api/rest_v1/?doc.
wikisource.org: #
wikisource.org: #
wikisource.org: # ar:
wikisource.org: #
wikisource.org: # dewiki:
wikisource.org: # T6937
wikisource.org: # sensible deletion and meta user discussion pages:
wikisource.org: # 4937#5
wikisource.org: # T14111
wikisource.org: # T15961
wikisource.org: #
wikisource.org: # enwiki:
wikisource.org: # Folks get annoyed when VfD discussions end up the number 1 google hit for
wikisource.org: # their name. See T6776
wikisource.org: # T15398
wikisource.org: # T16075
wikisource.org: # T13261
wikisource.org: # T12288
wikisource.org: # T16793
wikisource.org: #
wikisource.org: # eswiki:
wikisource.org: # T8746
wikisource.org: #
wikisource.org: # fiwiki:
wikisource.org: # T10695
wikisource.org: #
wikisource.org: # hewiki:
wikisource.org: #T11517
wikisource.org: #
wikisource.org: # huwiki:
wikisource.org: #
wikisource.org: # itwiki:
wikisource.org: # T7545
wikisource.org: #
wikisource.org: # jawiki
wikisource.org: # T7239
wikisource.org: # nowiki
wikisource.org: # T13432
wikisource.org: #
wikisource.org: # plwiki
wikisource.org: # T10067
wikisource.org: #
wikisource.org: # ptwiki:
wikisource.org: # T7394
wikisource.org: #
wikisource.org: # rowiki:
wikisource.org: # T14546
wikisource.org: #
wikisource.org: # ruwiki:
wikisource.org: #
wikisource.org: # svwiki:
wikisource.org: # T12229
wikisource.org: # T13291
wikisource.org: #
wikisource.org: # zhwiki:
wikisource.org: # T7104
wikisource.org: #
wikisource.org: # sister projects
wikisource.org: #
wikisource.org: # enwikinews:
wikisource.org: # T7340
wikisource.org: #
wikisource.org: # itwikinews
wikisource.org: # T11138
wikisource.org: #
wikisource.org: # enwikiquote:
wikisource.org: # T17095
wikisource.org: #
wikisource.org: # enwikibooks
wikisource.org: #
wikisource.org: # working...
wikisource.org: #
wikisource.org: #
wikisource.org: #
wikisource.org: #----------------------------------------------------------#
wikisource.org: #
wikisource.org: #
wikisource.org: #
ikco.ir: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
ikco.ir: #content{margin:0 0 0 2%;position:relative;}
ew.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
ew.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details.
ew.com: # sitemaps
ew.com: # current CMS
ew.com: # ONECMS
ew.com: # Content
ew.com: # ONECMS
ew.com: # Content
ew.com: # ONECMS
ew.com: # Content
anses.gob.ar: #
anses.gob.ar: # robots.txt
anses.gob.ar: #
anses.gob.ar: # This file is to prevent the crawling and indexing of certain parts
anses.gob.ar: # of your site by web crawlers and spiders run by sites like Yahoo!
anses.gob.ar: # and Google. By telling these "robots" where not to go on your site,
anses.gob.ar: # you save bandwidth and server resources.
anses.gob.ar: #
anses.gob.ar: # This file will be ignored unless it is at the root of your host:
anses.gob.ar: # Used: http://example.com/robots.txt
anses.gob.ar: # Ignored: http://example.com/site/robots.txt
anses.gob.ar: #
anses.gob.ar: # For more information about the robots.txt standard, see:
anses.gob.ar: # http://www.robotstxt.org/robotstxt.html
anses.gob.ar: # CSS, JS, Images
anses.gob.ar: # Directories
anses.gob.ar: # Files
anses.gob.ar: # Paths (clean URLs)
anses.gob.ar: # Paths (no clean URLs)
mt.co.kr: # Robots for www.mt.co.kr
mt.co.kr: # ETC
mt.co.kr: # SiteMap
techtudo.com.br: #
techtudo.com.br: # robots.txt
techtudo.com.br: #
panasonic.com: #sitemap
ibm.com: # $Id: robots.txt,v 1.88 2020/07/20 13:41:39 jliao Exp $
ibm.com: #
ibm.com: # This is a file retrieved by webwalkers a.k.a. spiders that
ibm.com: # conform to a defacto standard.
ibm.com: # See <URL:http://www.robotstxt.org/wc/exclusion.html#robotstxt>
ibm.com: #
ibm.com: # Comments to the webmaster should be posted at <URL:http://www.ibm.com/contact>
ibm.com: #
ibm.com: # Format is:
ibm.com: # User-agent: <name of spider>
ibm.com: # Disallow: <nothing> | <path>
ibm.com: # ------------------------------------------------------------------------------
ibm.com: # Disallow: /homepage
ibm.com: # Disallow: /internal
ibm.com: # Added for EI-2179 on 17Apr2020
ibm.com: # Added by JLiao for SD EI-2359,EI-2360 on 23Jun2020
ibm.com: #Added for EI-2216 on 06May2020
ibm.com: # Added for IN4173782 on 7Aug2013
ibm.com: # Added for IN4177562 on 8Aug2013
ibm.com: # Added for IN4177562 on 8Aug2013
ibm.com: # Added to block site mirroring
10010.com: # robots.txt,Baidu&SoSo&sogou&Yodao&Google spider are allowed; /bin/&/e3/&/e4/directory are disallowed.
scmp.com: #
scmp.com: # robots.txt
scmp.com: #
scmp.com: # This file is to prevent the crawling and indexing of certain parts
scmp.com: # of your site by web crawlers and spiders run by sites like Yahoo!
scmp.com: # and Google. By telling these "robots" where not to go on your site,
scmp.com: # you save bandwidth and server resources.
scmp.com: #
scmp.com: # This file will be ignored unless it is at the root of your host:
scmp.com: # Used: http://example.com/robots.txt
scmp.com: # Ignored: http://example.com/site/robots.txt
scmp.com: #
scmp.com: # For more information about the robots.txt standard, see:
scmp.com: # http://www.robotstxt.org/robotstxt.html
scmp.com: #
scmp.com: # For syntax checking, see:
scmp.com: # http://www.sxw.org.uk/computing/robots/check.html
scmp.com: # PWA
scmp.com: # Directories
scmp.com: # Path
scmp.com: # CSS, JS, Image
scmp.com: # Directories
scmp.com: # Files
scmp.com: # Paths (clean URLs)
scmp.com: # Paths (no clean URLs)
scmp.com: # Opebot - For 1plusX
scmp.com: # NewsNow
scmp.com: # GrapeShot
scmp.com: # Ads
scmp.com: # Sitemap
royalmail.com: #
royalmail.com: # robots.txt
royalmail.com: #
royalmail.com: # This file is to prevent the crawling and indexing of certain parts
royalmail.com: # of your site by web crawlers and spiders run by sites like Yahoo!
royalmail.com: # and Google. By telling these "robots" where not to go on your site,
royalmail.com: # you save bandwidth and server resources.
royalmail.com: #
royalmail.com: # This file will be ignored unless it is at the root of your host:
royalmail.com: # Used: http://example.com/robots.txt
royalmail.com: # Ignored: http://example.com/site/robots.txt
royalmail.com: #
royalmail.com: # For more information about the robots.txt standard, see:
royalmail.com: # http://www.robotstxt.org/robotstxt.html
royalmail.com: #
royalmail.com: # For syntax checking, see:
royalmail.com: # http://www.frobee.com/robots-txt-check
royalmail.com: #
royalmail.com: # Core rules
royalmail.com: #
royalmail.com: # CSS, JS, Images
royalmail.com: # Directories
royalmail.com: # Files
royalmail.com: # Paths (clean URLs)
royalmail.com: # Paths (no clean URLs)
royalmail.com: # Files
royalmail.com: # Paths (clean URLs)
royalmail.com: # Paths (no clean URLs)
royalmail.com: #
royalmail.com: # Custom rules
royalmail.com: #
royalmail.com: # Node pages & Welsh-language equivalent pages
royalmail.com: # Quote Journeys
royalmail.com: # Common causes of duplication
pagina12.com.ar: # robots.txt for https://www.pagina12.com.ar/
bd-pratidin.com: # Crawl bd-pratidin.com,
mail.com: #https://www.mail.com/robots.txt
orf.at: # don't index redirects
orf.at: # these robots have been bad once:
google.bg: # AdsBot
google.bg: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
gradescope.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
gradescope.com: #
gradescope.com: # To ban all spiders from the entire site uncomment the next two lines:
gradescope.com: # User-agent: *
gradescope.com: # Disallow: /
chrono24.com: # robots.txt for http://www.chrono24.com
leroymerlin.ru: # UTM cleaning
freepik.es: # Google AdSense
freepik.es: # Adsbot-Google
freepik.es: # Twitter Bot
rakuten.com: Binary file (standard input) matches
soy502.com: #
soy502.com: # robots.txt
soy502.com: #
soy502.com: # This file is to prevent the crawling and indexing of certain parts
soy502.com: # of your site by web crawlers and spiders run by sites like Yahoo!
soy502.com: # and Google. By telling these "robots" where not to go on your site,
soy502.com: # you save bandwidth and server resources.
soy502.com: #
soy502.com: # This file will be ignored unless it is at the root of your host:
soy502.com: # Used: http://example.com/robots.txt
soy502.com: # Ignored: http://example.com/site/robots.txt
soy502.com: #
soy502.com: # For more information about the robots.txt standard, see:
soy502.com: # http://www.robotstxt.org/wc/robots.html
soy502.com: #
soy502.com: # For syntax checking, see:
soy502.com: # http://www.sxw.org.uk/computing/robots/check.html
soy502.com: # Bot de SEO Moz
soy502.com: # Particulares del Sitio
soy502.com: # Files
soy502.com: # Paths (clean URLs)
soy502.com: # Paths (no clean URLs)
soy502.com: # Directories
soy502.com: #Disallow: /misc/
soy502.com: #Disallow: /modules/
soy502.com: # Disallow: /sites/
soy502.com: #Allow: http://www.googleadservices.com/pagead/conversion.js
uwaterloo.ca: #
uwaterloo.ca: # robots.txt
uwaterloo.ca: #
uwaterloo.ca: # This file is to prevent the crawling and indexing of certain parts
uwaterloo.ca: # of your site by web crawlers and spiders run by sites like Yahoo!
uwaterloo.ca: # and Google. By telling these "robots" where not to go on your site,
uwaterloo.ca: # you save bandwidth and server resources.
uwaterloo.ca: #
uwaterloo.ca: # This file will be ignored unless it is at the root of your host:
uwaterloo.ca: # Used: http://example.com/robots.txt
uwaterloo.ca: # Ignored: http://example.com/site/robots.txt
uwaterloo.ca: #
uwaterloo.ca: # For more information about the robots.txt standard, see:
uwaterloo.ca: # http://www.robotstxt.org/wc/robots.html
uwaterloo.ca: #
uwaterloo.ca: # For syntax checking, see:
uwaterloo.ca: # http://www.sxw.org.uk/computing/robots/check.html
uwaterloo.ca: # Directories
uwaterloo.ca: # Files
uwaterloo.ca: # Paths (clean URLs)
uwaterloo.ca: # Paths (no clean URLs)
gaadiwaadi.com: #td-header-search-button-mob {
gaadiwaadi.com: #td-top-mobile-toggle i {
laposte.net: #mainHDBox,
laposte.net: #VSMP .section-promo {
laposte.net: #div-gpt-ad-part-home-banner-0 .regiePub {
familysearch.org: # LAST CHANGED: Mo Nov 2 2020, at 18:00:00 GMT+0000 (GMT)
familysearch.org: # Version 1.0.7
familysearch.org: ## Specific rules for /wiki/
familysearch.org: # Please note: There are a lot of pages on this site, and there are some misbehaved spiders out there
familysearch.org: # that go _way_ too fast. If you're irresponsible, your access to the site may be blocked.
familysearch.org: #
familysearch.org: # advertising-related bots:
familysearch.org: # Wikipedia work bots:
familysearch.org: # Crawlers that are kind enough to obey, but which we'd rather not have
familysearch.org: # unless they're feeding search engines.
familysearch.org: # Some bots are known to be trouble, particularly those designed to copy
familysearch.org: # entire sites. Please obey robots.txt.
familysearch.org: # Misbehaving: requests much too fast:
familysearch.org: #
familysearch.org: # Sorry, wget in its recursive mode is a frequent problem.
familysearch.org: # Please read the man page and use it properly; there is a
familysearch.org: # --wait option you can use to set the delay between hits,
familysearch.org: # for instance.
familysearch.org: #
familysearch.org: #
familysearch.org: # The 'grub' distributed client has been *very* poorly behaved.
familysearch.org: #
familysearch.org: #
familysearch.org: # Doesn't follow robots.txt anyway, but...
familysearch.org: #
familysearch.org: #
familysearch.org: # Hits many times per second, not acceptable
familysearch.org: # http://www.nameprotect.com/botinfo.html
familysearch.org: # A capture bot, downloads gazillions of pages with no public benefit
familysearch.org: # http://www.webreaper.net/
familysearch.org: # Wayback Machine
familysearch.org: # User-agent: archive.org_bot
familysearch.org: # Treated like anyone else
familysearch.org: # Allow the Internet Archiver to index action=raw and thereby store the raw wikitext of pages
familysearch.org: #
familysearch.org: # Friendly, low-speed bots are welcome viewing article pages, but not
familysearch.org: # dynamically-generated pages please.
familysearch.org: #
familysearch.org: # Inktomi's "Slurp" can read a minimum delay between hits; if your
familysearch.org: # bot supports such a thing using the 'Crawl-delay' or another
familysearch.org: # instruction, please let us know.
familysearch.org: #
familysearch.org: # There is a special exception for API mobileview to allow dynamic
familysearch.org: # mobile web & app views to load section content.
familysearch.org: # These views aren't HTTP-cached but use parser cache aggressively
familysearch.org: # and don't expose special: pages etc.
familysearch.org: #
familysearch.org: # Another exception is for REST API documentation, located at
familysearch.org: # /api/rest_v1/?doc.
familysearch.org: #
familysearch.org: # Disallow indexing of non-article content
familysearch.org: #
rivals.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
zhuwang.cc: #
zhuwang.cc: # robots.txt for PHPCMS v9
zhuwang.cc: #
windowscentral.com: #
windowscentral.com: # robots.txt
windowscentral.com: #
windowscentral.com: # This file is to prevent the crawling and indexing of certain parts
windowscentral.com: # of your site by web crawlers and spiders run by sites like Yahoo!
windowscentral.com: # and Google. By telling these &quot;robots&quot; where not to go on your site,
windowscentral.com: # you save bandwidth and server resources.
windowscentral.com: #
windowscentral.com: # This file will be ignored unless it is at the root of your host:
windowscentral.com: # Used: http://example.com/robots.txt
windowscentral.com: # Ignored: http://example.com/site/robots.txt
windowscentral.com: #
windowscentral.com: # For more information about the robots.txt standard, see:
windowscentral.com: # http://www.robotstxt.org/robotstxt.html
windowscentral.com: #
windowscentral.com: # For syntax checking, see:
windowscentral.com: # http://www.frobee.com/robots-txt-check
windowscentral.com: # Directories
windowscentral.com: # Files
windowscentral.com: # Paths (clean URLs)
windowscentral.com: # Paths (no clean URLs)
goldprice.org: #
goldprice.org: # robots.txt
goldprice.org: #
goldprice.org: # This file is to prevent the crawling and indexing of certain parts
goldprice.org: # of your site by web crawlers and spiders run by sites like Yahoo!
goldprice.org: # and Google. By telling these "robots" where not to go on your site,
goldprice.org: # you save bandwidth and server resources.
goldprice.org: #
goldprice.org: # This file will be ignored unless it is at the root of your host:
goldprice.org: # Used: http://example.com/robots.txt
goldprice.org: # Ignored: http://example.com/site/robots.txt
goldprice.org: #
goldprice.org: # For more information about the robots.txt standard, see:
goldprice.org: # http://www.robotstxt.org/robotstxt.html
goldprice.org: # CSS, JS, Images
goldprice.org: # Directories
goldprice.org: # Files
goldprice.org: # Paths (clean URLs)
goldprice.org: # Paths (no clean URLs)
lequipe.fr: #V6 - Tous les moteurs sont concern?s
lequipe.fr: #
lequipe.fr: #
lequipe.fr: # le terme suivant nous permet de limiter les temps de passage des robots ? 120 secondes
lequipe.fr: #
lequipe.fr: #
lequipe.fr: # les fichiers suivants ne seront pas index?s par les moteurs
lequipe.fr: #
lequipe.fr: #ActualitesId < 1000000
lequipe.fr: #Allow key news
lequipe.fr: #Old lives
yandex.net: # yandex.ru
officedepot.com: # Robots.txt file for http://www.officedepot.com
tn.gov.in: #
tn.gov.in: # robots.txt
tn.gov.in: #
tn.gov.in: # This file is to prevent the crawling and indexing of certain parts
tn.gov.in: # of your site by web crawlers and spiders run by sites like Yahoo!
tn.gov.in: # and Google. By telling these "robots" where not to go on your site,
tn.gov.in: # you save bandwidth and server resources.
tn.gov.in: #
tn.gov.in: # This file will be ignored unless it is at the root of your host:
tn.gov.in: # Used: http://example.com/robots.txt
tn.gov.in: # Ignored: http://example.com/site/robots.txt
tn.gov.in: #
tn.gov.in: # For more information about the robots.txt standard, see:
tn.gov.in: # http://www.robotstxt.org/robotstxt.html
tn.gov.in: # CSS, JS, Images
tn.gov.in: # Directories
tn.gov.in: # Files
tn.gov.in: # Paths (clean URLs)
tn.gov.in: # Paths (no clean URLs)
techsmith.com: #09 May 2019
techsmith.com: # Wrapped Pages
techsmith.com: #FTP and other Pages
nydailynews.com: # Googlebot
nydailynews.com: # MSN bot
nydailynews.com: # MSNbot media
nydailynews.com: # Yahoo bot
nydailynews.com: # Alexa IA Archiver bot
nydailynews.com: # MJ12bot
nydailynews.com: # Proximic bot
alnilin.com: # enabled this >> may slow down server
alnilin.com: #end of enabled this >> may slow down server
alnilin.com: # Google Image
alnilin.com: # Google AdSense
alnilin.com: # digg mirror
alnilin.com: # global
ranker.com: # Sitemap files
yandex.by: # yandex.by
1und1.de: # Sitemap
lapatilla.com: # Sitemap archive
ignou.ac.in: # $Id: robots.txt,v 1.9.2.1 2008/12/10 20:12:19 goba Exp $
ignou.ac.in: #
ignou.ac.in: # robots.txt
ignou.ac.in: #
ignou.ac.in: # This file is to prevent the crawling and indexing of certain parts
ignou.ac.in: # of your site by web crawlers and spiders run by sites like Yahoo!
ignou.ac.in: # and Google. By telling these "robots" where not to go on your site,
ignou.ac.in: # you save bandwidth and server resources.
ignou.ac.in: #
ignou.ac.in: # This file will be ignored unless it is at the root of your host:
ignou.ac.in: # Used: http://example.com/robots.txt
ignou.ac.in: # Ignored: http://example.com/site/robots.txt
ignou.ac.in: #
ignou.ac.in: # For more information about the robots.txt standard, see:
ignou.ac.in: # http://www.robotstxt.org/wc/robots.html
ignou.ac.in: #
ignou.ac.in: # For syntax checking, see:
ignou.ac.in: # http://www.sxw.org.uk/computing/robots/check.html
ignou.ac.in: # Directories
ignou.ac.in: # Files
ignou.ac.in: # Paths (clean URLs)
ignou.ac.in: # Paths (no clean URLs)
rarible.com: # https://www.robotstxt.org/robotstxt.html
healthcare.gov: # robots.txt for healthcare.gov
healthcare.gov: # Directories
healthcare.gov: # Directories
healthcare.gov: # Directories
healthcare.gov: # dynamic posts
qeqeqe.com: #
qeqeqe.com: # robots.txt for PHPCMS v9
qeqeqe.com: #
up.gov.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
up.gov.in: #content{margin:0 0 0 2%;position:relative;}
chanel.com: #PRESSRELEASE
chanel.com: #MY-ACCOUNT
chanel.com: #CORPORATE
chanel.com: #SITEMAP
chanel.com: #NEW-WFJ
chanel.com: #ONE
chanel.com: #SRQ0140051
chanel.com: #FF LATAM
nespresso.com: # Sitemap
nespresso.com: # Russia
nespresso.com: # Quickview
nespresso.com: # Mosaic + # Index_Ecapi
nespresso.com: # At work
nespresso.com: # Scrappers
nespresso.com: # Responsive
unza.zm: #
unza.zm: # robots.txt
unza.zm: #
unza.zm: # This file is to prevent the crawling and indexing of certain parts
unza.zm: # of your site by web crawlers and spiders run by sites like Yahoo!
unza.zm: # and Google. By telling these "robots" where not to go on your site,
unza.zm: # you save bandwidth and server resources.
unza.zm: #
unza.zm: # This file will be ignored unless it is at the root of your host:
unza.zm: # Used: http://example.com/robots.txt
unza.zm: # Ignored: http://example.com/site/robots.txt
unza.zm: #
unza.zm: # For more information about the robots.txt standard, see:
unza.zm: # http://www.robotstxt.org/robotstxt.html
unza.zm: # CSS, JS, Images
unza.zm: # Directories
unza.zm: # Files
unza.zm: # Paths (clean URLs)
unza.zm: # Paths (no clean URLs)
videvo.net: # Dissalow common language variations
logmeininc.com: # Sitemaps and Autodiscovers
pccomponentes.com: #
pccomponentes.com: ###################################################################################################################################
pccomponentes.com: #
pccomponentes.com: # Bienvenido a Robots.txt de PcComponentes :)
pccomponentes.com: #
pccomponentes.com: ###################################################################################################################################
pccomponentes.com: # _____ _ _
pccomponentes.com: # / ____| | | | |
pccomponentes.com: # | | ___ _ __ ___| |_ _ __ _ _ _ _ ___ _ __ __| | ___
pccomponentes.com: # | | / _ \| '_ \/ __| __| '__| | | | | | |/ _ \ '_ \ / _` |/ _ \
pccomponentes.com: # | |___| (_) | | | \__ \ |_| | | |_| | |_| | __/ | | | (_| | (_) |
pccomponentes.com: # \_____\___/|_| |_|___/\__|_| \__,_|\__, |\___|_| |_|\__,_|\___/
pccomponentes.com: # | | __/ |
pccomponentes.com: # __ _| | __ _ ___ |___/
pccomponentes.com: # / _` | |/ _` |/ _ \
pccomponentes.com: # | (_| | | (_| | (_) |
pccomponentes.com: # \__,_|_|\__, |\___/ _ __
pccomponentes.com: # __/ | | | \ \
pccomponentes.com: # __ _ _ |___/ _ _ __ __| | ___ (_) |
pccomponentes.com: # / _` | '__/ _` | '_ \ / _` |/ _ \ | |
pccomponentes.com: # | (_| | | | (_| | | | | (_| | __/ _| |
pccomponentes.com: # \__, |_| \__,_|_| |_|\__,_|\___| (_) |
pccomponentes.com: # __/ | /_/
pccomponentes.com: # |___/
pccomponentes.com: #
pccomponentes.com: ###################################################################################################################################
pccomponentes.com: ## GENERAL SETTINGS
pccomponentes.com: ## SITEMAPS
pccomponentes.com: # SITEMAP INDEX
pccomponentes.com: # SITEMAP PRODUCTS INDEX
pccomponentes.com: # SITEMAP PRODUCTS BY PARENT
pccomponentes.com: # SITEMAP CATEGORIES
pccomponentes.com: # SITEMAP AMP PRODUCTS INDEX
pccomponentes.com: # SITEMAP AMP PRODUCTS BY PARENT
pccomponentes.com: # SITEMAP BLOG
pccomponentes.com: ## PRIVATE URLS
pccomponentes.com: # Disallow: /login
pccomponentes.com: ## CORPORATIVE PAGES
pccomponentes.com: # STATICS
pccomponentes.com: ## CRAWL BUDGET OPTIMIZATION
pccomponentes.com: # 3+ FILTERS
pccomponentes.com: # QUERIES
pccomponentes.com: # AVAILABILITY UX
pccomponentes.com: # PRICE HISTORY
pccomponentes.com: # COMPLEMENTS
pccomponentes.com: #Disallow: /a/complements/*
pccomponentes.com: #RECOMMENDER
pccomponentes.com: # OLD IMAGE LOCATIONS
pccomponentes.com: # TECHNICAL RESOURCES
pccomponentes.com: # TECHNICAL ISSUES
pccomponentes.com: # REFURBISHED PRODUCTS - Commented on 12-14-2018
pccomponentes.com: #Disallow: /*reacondicionado$
pccomponentes.com: #Disallow: /*/reacondicionado$
pccomponentes.com: #Disallow: /*reacondicionado*
pccomponentes.com: #Allow: /portatiles/*reacondicionado
pccomponentes.com: #Disallow: /*refurbished*
pccomponentes.com: #Disallow: /*-cpo-libre$
pccomponentes.com: #Disallow: /*-recertified$
pccomponentes.com: #Disallow: /rastrillo/*
pccomponentes.com: # DOCS, PDF AND MEDIA
pccomponentes.com: # USELESS RESOURCES
pccomponentes.com: # GROUNDWORK
pccomponentes.com: #Disallow: /ofertas/*
pccomponentes.com: # TRACKING CODES
pccomponentes.com: # BRANDED PAGES AS DUPLICATES
pccomponentes.com: ## SPECIFIC BOTS SETTINGS
nation.africa: # robots.txt for https://nation.africa/ -- Nation
ask.fm: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
ask.fm: #
ask.fm: # To ban all spiders from the entire site uncomment the next two lines:
ask.fm: # User-agent: *
ask.fm: # Disallow: /
sears.com: # 2020_04_22-B
sears.com: # Sears SEO Team
sears.com: # www.sears.com
sears.com: #Disallow: *sid=IDx20141117x00001xlpla*
sears.com: #Lumen #17857110
sears.com: #Legal #04012019
sears.com: #Lumen #18359173
sears.com: # Category
sears.com: # Product
sears.com: # Misc
sears.com: # Marketplace Sellers
sears.com: # Brands Extended
sears.com: # Images
sears.com: #Sitemap: http://www.sears.com/Sitemap_Index_Image_1.xml
sears.com: #Sitemap: http://www.sears.com/Sitemap_Index_Image_MP_1.xml
smugmug.com: # If you're reading this, you belong at a job you love: https://www.smugmug.com/jobs/
smugmug.com: # See https://secure.smugmug.com/help/contact if you'd like to apply to be whitelisted for crawling SmugMug
seneweb.com: #
seneweb.com: # Begin Standard Rules
seneweb.com: #
seneweb.com: # Allow Adsense
seneweb.com: #
seneweb.com: #
seneweb.com: # Sitemap Files
google.com.do: # AdsBot
google.com.do: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
axios.com: # 8fkux4mqab196bbs
bancadigitalbod.com: #robots.txt for all our sites
eldorado.ru: #2020-12-29
appcast.io: # https://protect-de.mimecast.com/s/2FukCpZ4L4TvloxYcPCF4b?domain=robotstxt.org
appcast.io: # Allow crawling of all content
sodapdf.com: # This file can be used to affect how search engines and other web site crawlers see your site.
sodapdf.com: # For more information, please see http://www.w3.org/TR/html4/appendix/notes.html#h-B.4.1.1
sodapdf.com: # WebMatrix 2.0
sodapdf.com: #
sodapdf.com: #
sodapdf.com: # production server: sodapdf.com
sodapdf.com: # Last modified on: 2014-01-27
sodapdf.com: #
sodapdf.com: #
sodapdf.com: #
sodapdf.com: # homesite resources:
sodapdf.com: #
sodapdf.com: # Disallow: /*/join
sodapdf.com: # Disallow: /join
sodapdf.com: #
sodapdf.com: #
sodapdf.com: # external resources:
sodapdf.com: #
jimdo.com: # en
jimdo.com: # de
jimdo.com: # es
jimdo.com: # fr
jimdo.com: # it
jimdo.com: # jp
jimdo.com: # nl
win-rar.com: # sitemap
blog.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
blog.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details.
blog.wordpress.com: # This file was generated on Thu, 28 Jan 2021 13:30:28 +0000
finn.no: # Notice: Crawling FINN.no is prohibited unless you have written permission.
finn.no: # See the terms and conditions in the footer:
finn.no: # Innholdet er beskyttet etter åndsverksloven. Bruk av automatiserte tjenester (roboter, spidere, indeksering m.m.)
finn.no: # samt andre fremgangsmåter for systematisk eller regelmessig bruk er ikke tillatt uten eksplisitt samtykke fra FINN.no.
finn.no: #
finn.no: # Outdated CMS articles.
finn.no: # Don't index searches for old gallery urls
finn.no: # FAS shortcut links:
finn.no: # If googlebot respects that these 301 to a disallowed page, we can remove these.
finn.no: # 50k weekly. Many google referrers
finn.no: # 20k weekly. Many google referrers
finn.no: # 4k weekly. Many google referrers
finn.no: # Deprecated, and should be removed
finn.no: # If googlebot respects that these 301 to a disallowed page, we can remove these.
finn.no: # 8k weekly
finn.no: # 363 weekly
finn.no: # 5k weekly
finn.no: # 324 weekly
finn.no: # motor rules
finn.no: # disallow all car, nyttekj√∏rety, mc and boat FAS and ad pages:
finn.no: # bil vertical
finn.no: # ...but allow the landing pages. Pages with parameters are blocked with the meta tag
finn.no: # nyttekj√∏ret√∏y
finn.no: # vanused and vanimport already handled in bil
finn.no: # mc
finn.no: # båt
finn.no: # Eiendom
finn.no: # Bolig til leie - result page. Filtered search is blocked by meta tag
finn.no: # disallow all realestate FAS and ad pages:
finn.no: # BAP
finn.no: # Indexable, with some exceptions, see end of file
finn.no: # JOB
finn.no: # Indexable, with some exceptions, see end of file
finn.no: # PAL (PArtnerL√∏sning)
finn.no: # disallow ad and search pages
finn.no: # REISE
finn.no: # disallow flight search results
finn.no: # Personal Finance
finn.no: # iAd - these are outdated
finn.no: # Shopping was discontinued 2019.
finn.no: # Misc
finn.no: # Exceptions:
finn.no: # Twitterbot is allowed for Twitter Cards to work
webteb.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
webteb.com: #content{margin:0 0 0 2%;position:relative;}
jarir.com: # # Robots.txt for Magento Community and Enterprise
jarir.com: # # GENERAL SETTINGS
jarir.com: # Enables robots.txt rules for all crawlers
jarir.com: # # Crawl-delay parameter: the number of seconds you want to wait between successful requests to the same server.
jarir.com: # # Magento sitemap: URL to your sitemap file in Magento
jarir.com: # # Settings that relate to the UNDER CONSTRUCTION
jarir.com: # # GENERAL SETTINGS For MAGENTO
jarir.com: # # General technical Magento directory
jarir.com: # # Do not index the shared files Magento
jarir.com: # # MAGENTO SEA IMPROVEMENT
jarir.com: # # Do not index the page subcategories that are sorted or filtered.
jarir.com: # # Do not index the second copy of the home page (example.com / index.php /). Un-comment only if you have activated Magento SEO URLs.
jarir.com: # # Disallow: /index.php/
jarir.com: # # Do not index the link from the session ID
jarir.com: # # Do not index the page checkout and user account
jarir.com: # # Server Settings
jarir.com: # # Do not index the general technical directories and files on a server
jarir.com: # stop not needed bots from site crawl
jarir.com: # TurnitinBot/1.5 (http://www.turnitin.com/robot/crawlerinfo.html)
jarir.com: # NPBot-1/2.0 (http://www.nameprotect.com/botinfo.html)
jarir.com: # sitecheck.internetseer.com (For more info see: http://sitecheck.internetseer.com)
jarir.com: # Rumours-Agent
jarir.com: # larbin_2.6.2 (larbin2.6.2@unspecified.mail)
jarir.com: # http://www.almaden.ibm.com/cs/crawler [wf160]
jarir.com: # http://www.almaden.ibm.com/cs/crawler [c01]
jarir.com: # Teleport Pro (http://www.tenmax.com/teleport/pro/home.htm)
jarir.com: # WebCopier
jarir.com: # WebStripper (http://www.webstripper.net/)
jarir.com: # MSIECrawler - added 12/02/04
jarir.com: # Openbot (http://www.openfind.com.tw/robot.html) - added 12/02/04
jarir.com: # WebZIP (http://www.spidersoft.com/webzip/default.asp) - added 08/03/04
jarir.com: # QuepasaCreep - added 28/06/04
jarir.com: # WebReaper (http://www.webreaper.net/) - added 20/07/04
jarir.com: # SuperBot (aka Website Downloader ?) - added 02/08/04
jarir.com: # wget - added 16/08/04
jarir.com: # Web Downloader - added 29/10/04
jarir.com: # ShopWiki bot - added 18/08/06
jarir.com: # MSRBOT - added 29/09/06
jarir.com: # DugMirror - added 07/01/07
jarir.com: # Twiceler-0.9 http://www.cuill.com/twiceler/robot.html - added 31/07/07
jarir.com: # pagenest.com - added 01/06/08
jarir.com: # dotbot - added 29/07/08
jarir.com: # discobot - added 26/12/08
jarir.com: # SimilarPages Nutch Crawler
jarir.com: # added 07/06/09
jarir.com: # added 17/01/10
jarir.com: # added 29/06/10
jarir.com: # added 26/11/10
jarir.com: # added 12/12/10
jarir.com: # added 13/02/11
jarir.com: # added 15/04/11
jarir.com: # added 02/06/11
jarir.com: # added 09/06/11
jarir.com: # added 10/07/11
jarir.com: # added 28/07/11
jarir.com: # added 09/08/11
jarir.com: # added 24/09/11
jarir.com: # added 05/10/11
jarir.com: # added 08/10/11
jarir.com: # added 18/12/11
jarir.com: # added 15/01/12
jarir.com: # added 26/01/12
jarir.com: # added 30/01/12
jarir.com: # added 01/03/12
jarir.com: # added 09/06/12
jarir.com: # added 10/06/12
jarir.com: # added 30/07/12
jarir.com: # added 28/08/12
jarir.com: # added 05/10/12
jarir.com: # added 21/11/12
jarir.com: # added 11/02/13
jarir.com: # added 22/02/13
jarir.com: # added 26/06/13
jarir.com: # added 17/07/13
jarir.com: # added 24/11/13
jarir.com: # added 11/12/13
jarir.com: # added 25/01/14
jarir.com: # added 11/02/14
jarir.com: # added 26/03/14
jarir.com: # added 27/03/14
jarir.com: # added 29/03/14
jarir.com: # added 06/04/14
jarir.com: # added 11/04/14
jarir.com: # added 05/10/14
jarir.com: # added 08/09/15
jarir.com: # added 10/12/15
jarir.com: # added 22/03/16
jarir.com: # added 21/05/16
jarir.com: #Block YandexBots 07/08/19
jarir.com: # Block SemrushBot 07/08/19
allstate.com: # Disallow: /blog/wp-admin/
allstate.com: # Disallow: /blog/wp-includes/
work.ua: # robots.txt
balenciaga.com: # Pages
balenciaga.com: # Product
balenciaga.com: # PLP
balenciaga.com: #sitemaps
bom.gov.au: #################################
bom.gov.au: # robots.txt for www.bom.gov.au #
bom.gov.au: #################################
bom.gov.au: #####################
bom.gov.au: # Rules for Radian6 #
bom.gov.au: #####################
bom.gov.au: ########################
bom.gov.au: # Rules for all robots #
bom.gov.au: # except Googlebot #
bom.gov.au: ########################
bom.gov.au: ######################
bom.gov.au: # Removed 24/02/2016 #
bom.gov.au: ######################
bom.gov.au: # Disallow: /clim_data/
bom.gov.au: # Disallow: /climate/annual_sum/
bom.gov.au: ######################
bom.gov.au: # Added azv 2017/12/13
bom.gov.au: # Disallow: /water/newEvents/document/
bom.gov.au: # Disallow: /water/designRainfalls/document/
bom.gov.au: # Disallow: /cyclone/history/pdf/
bom.gov.au: #######################
bom.gov.au: # Scripts and styling #
bom.gov.au: #######################
bom.gov.au: ################################
bom.gov.au: # Document directories #
bom.gov.au: # REMOVED 24/09/2020 ASK226937 #
bom.gov.au: ################################
bom.gov.au: # Disallow: /docs/
bom.gov.au: # Disallow: /Docs/
bom.gov.au: # Disallow: /document/
bom.gov.au: # Disallow: /DOCUMENT/
bom.gov.au: # Disallow: /documents/
bom.gov.au: # Disallow: /DOCUMENTS/
bom.gov.au: # Disallow: /pdf/
bom.gov.au: # Disallow: /PDF/
bom.gov.au: # Disallow: /pdfs/
bom.gov.au: ######################
bom.gov.au: # Removed 24/02/2016 #
bom.gov.au: ######################
bom.gov.au: # Disallow: /clim_data/
bom.gov.au: # Disallow: /climate/averages/climatology/relhum/
bom.gov.au: # Disallow: /climate/averages/climatology/sunshine_hours/
bom.gov.au: # Disallow: /climate/averages/climatology/windroses/
bom.gov.au: # Disallow: /climate/averages/wind/
bom.gov.au: # Disallow: /climate/change/
bom.gov.au: # Disallow: /climate/enso/archive/
bom.gov.au: # Disallow: /climate/extremes/ # Removed 24/02/2016 stefanw
bom.gov.au: # Disallow: /climate/forms/map_forms/
bom.gov.au: # Disallow: /climate/map/anual_rainfall/
bom.gov.au: # Disallow: /climate/map/graphs/monthly_rain/idl_graphs/
bom.gov.au: # Disallow: /climate/map/pics/
bom.gov.au: # Disallow: /climate/pccsp/
bom.gov.au: ######################
bom.gov.au: #######################
bom.gov.au: # Rules for Googlebot #
bom.gov.au: #######################
bom.gov.au: ###########################
bom.gov.au: # Index State based pages #
bom.gov.au: ###########################
bom.gov.au: ###############################
bom.gov.au: # Don't index these filetypes #
bom.gov.au: ###############################
bom.gov.au: # Disallow: /*.pdf$
bom.gov.au: #Disallow: /*.txt$
bom.gov.au: # Added azv 2017/12/13
bom.gov.au: #######################
bom.gov.au: # Scripts and styling #
bom.gov.au: #######################
bom.gov.au: ################################
bom.gov.au: # Document directories #
bom.gov.au: # REMOVED 24/09/2020 ASK226937 #
bom.gov.au: ################################
bom.gov.au: # Disallow: /docs/
bom.gov.au: # Disallow: /Docs/
bom.gov.au: # Disallow: /document/
bom.gov.au: # Disallow: /DOCUMENT/
bom.gov.au: # Disallow: /documents/
bom.gov.au: # Disallow: /DOCUMENTS/
bom.gov.au: # Disallow: /pdf/
bom.gov.au: # Disallow: /PDF/
bom.gov.au: # Disallow: /pdfs/
bom.gov.au: # Disallow: /amm/ # Removed 24/02/2016
bom.gov.au: # Disallow: /careers/docs/
bom.gov.au: ######################
bom.gov.au: # Removed 24/02/2016 #
bom.gov.au: ######################
bom.gov.au: # Disallow: /climate/averages/climatology/relhum/
bom.gov.au: # Disallow: /climate/averages/climatology/sunshine_hours/
bom.gov.au: # Disallow: /climate/averages/climatology/windroses/
bom.gov.au: # Disallow: /climate/averages/wind/
bom.gov.au: # Disallow: /climate/change/
bom.gov.au: # Disallow: /climate/enso/archive/
bom.gov.au: # Disallow: /climate/extremes/
bom.gov.au: # Disallow: /climate/forms/map_forms/
bom.gov.au: # Disallow: /climate/map/anual_rainfall/
bom.gov.au: # Disallow: /climate/map/graphs/monthly_rain/idl_graphs/
bom.gov.au: # Disallow: /climate/map/pics/
bom.gov.au: # Disallow: /climate/pccsp/
bom.gov.au: ##############################
bom.gov.au: ###############################
bom.gov.au: # Allowed Wildcard Exceptions #
bom.gov.au: ###############################
google.co.nz: # AdsBot
google.co.nz: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
jiji.ng: #─███─███─█───█
jiji.ng: #─█───█───█───█
jiji.ng: #─███─███─█───█
jiji.ng: #───█─█───█───█
jiji.ng: #─███─███─███─███
jiji.ng: #─███─███
jiji.ng: #──█───█
jiji.ng: #──█───█
jiji.ng: #──█───█
jiji.ng: #─███──█
jiji.ng: #───██─███───██─███
jiji.ng: #────█──█─────█──█
jiji.ng: #────█──█─────█──█
jiji.ng: #─█──█──█──█──█──█
jiji.ng: #─████─███─████─███
jiji.ng: #─███─███
jiji.ng: #──█───█
jiji.ng: #──█───█
jiji.ng: #──█───█
jiji.ng: #─███──█
acs.org: # robots.txt for http://www.acs.org/
ekstrabladet.dk: # robots.txt, ekstrabladet.dk
ekstrabladet.dk: # adsense
isbank.com.tr: # sitemaps
ladilsa.com: # ladilsa.com (Fri Oct 27 14:25:58 2017)
brobible.com: # Sitemap archive
zoho.in: # ------------------------------------------
zoho.in: # ZOHO Corp. -- http://www.zoho.com
zoho.in: # Robot Exclusion File -- robots.txt
zoho.in: # Author: Zoho Creative
zoho.in: # Last Updated: 24/12/2020
zoho.in: # ------------------------------------------
zoho.in: # unwanted list taken from zoho search list
zoho.in: # unwanted list taken from zoho search list
zoho.in: # unwanted list taken from zoho search for zoholics
zoho.in: # unwanted list taken from zoho search for zoho
newsmax.com: # Disallow: /archives/
computerhope.com: # robots.txt file for https://www.computerhope.com
computerhope.com: # Send comments about this file to <URL:https://www.computerhope.com/contact/>
computerhope.com: # Disobeying the rules of the robots.txt will cause your IP to be banned.
computerhope.com: # Last updated: 5/23/2017
computerhope.com: #Internet Archive doesn't need to archive cgi-bin
computerhope.com: #Other bots not allowed
rpp.pe: #########################
rpp.pe: ## ##
rpp.pe: ## Grupo RPP ##
rpp.pe: ## Sitio: rpp.pe ##
rpp.pe: ## robots.txt ##
rpp.pe: ## ##
rpp.pe: #########################
rpp.pe: # no entry
rpp.pe: #
rpp.pe: # Guia
rpp.pe: #
vip.wordpress.com: # Sitemap archive
google.com.ng: # AdsBot
google.com.ng: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
usda.gov: #
usda.gov: # robots.txt
usda.gov: #
usda.gov: # This file is to prevent the crawling and indexing of certain parts
usda.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
usda.gov: # and Google. By telling these "robots" where not to go on your site,
usda.gov: # you save bandwidth and server resources.
usda.gov: #
usda.gov: # This file will be ignored unless it is at the root of your host:
usda.gov: # Used: http://example.com/robots.txt
usda.gov: # Ignored: http://example.com/site/robots.txt
usda.gov: #
usda.gov: # For more information about the robots.txt standard, see:
usda.gov: # http://www.robotstxt.org/robotstxt.html
usda.gov: # CSS, JS, Images
usda.gov: # Directories
usda.gov: # Files
usda.gov: # Paths (clean URLs)
usda.gov: # Paths (no clean URLs)
dubizzle.com: # Block undesirable pages
dubizzle.com: # Rules for adsense bot
dubizzle.com: # Block crawling software
msnbc.com: # Directories
msnbc.com: # Files
msnbc.com: # Paths (clean URLs)
msnbc.com: # Paths (no clean URLs)
msnbc.com: # DFP Ad slot urls
burberry.com: # robots.txt for https://www.burberry.com
google.com.bd: # AdsBot
google.com.bd: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
the-sun.com: #Archive Sitemaps
the-sun.com: # Sitemap archive
banamex.com: # robots.txt file for https://www.banamex.com
banamex.com: #.htm /.html
banamex.com: #PDFs
banamex.com: #Old content in SWF
banamex.com: #Sitemap files
forums.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
forums.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details.
forums.wordpress.com: # This file was generated on Thu, 19 Mar 2020 19:23:28 +0000
kathimerini.gr: # Admin Pages
kathimerini.gr: # Allow Those
kathimerini.gr: # Ads
nickis.com: # www.robotstxt.org/
nickis.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
elcomercio.pe: #
elcomercio.pe: # la mayoria de veces causa problemas
elcomercio.pe: #
20min.ch: #
20min.ch: # robots.txt www.20min.ch
20min.ch: #
20min.ch: # V2.0.0, 04.08.2020
20min.ch: #
pantheon.io: #
pantheon.io: # robots.txt
pantheon.io: #
pantheon.io: # This file is to prevent the crawling and indexing of certain parts
pantheon.io: # of your site by web crawlers and spiders run by sites like Yahoo!
pantheon.io: # and Google. By telling these "robots" where not to go on your site,
pantheon.io: # you save bandwidth and server resources.
pantheon.io: #
pantheon.io: # This file will be ignored unless it is at the root of your host:
pantheon.io: # Used: http://example.com/robots.txt
pantheon.io: # Ignored: http://example.com/site/robots.txt
pantheon.io: #
pantheon.io: # For more information about the robots.txt standard, see:
pantheon.io: # http://www.robotstxt.org/robotstxt.html
pantheon.io: # CSS, JS, Images
pantheon.io: # Directories
pantheon.io: # Files
pantheon.io: # Paths (clean URLs)
pantheon.io: # Paths (no clean URLs)
pantheon.io: # Resource Confirmation Paths
pantheon.io: # Miscelaneous Paths
yale.edu: #
yale.edu: # robots.txt
yale.edu: #
yale.edu: # This file is to prevent the crawling and indexing of certain parts
yale.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
yale.edu: # and Google. By telling these "robots" where not to go on your site,
yale.edu: # you save bandwidth and server resources.
yale.edu: #
yale.edu: # This file will be ignored unless it is at the root of your host:
yale.edu: # Used: http://example.com/robots.txt
yale.edu: # Ignored: http://example.com/site/robots.txt
yale.edu: #
yale.edu: # For more information about the robots.txt standard, see:
yale.edu: # http://www.robotstxt.org/robotstxt.html
yale.edu: # CSS, JS, Images
yale.edu: # Directories
yale.edu: # Files
yale.edu: # Paths (clean URLs)
yale.edu: # Paths (no clean URLs)
yale.edu: # Biblio module rules to prevent recursive searches by bots.
upmusics.com: # Google Image
google.com.ec: # AdsBot
google.com.ec: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
peardeck.com: # Squarespace Robots Txt
willhaben.at: # It is expressively forbidden to use spiders, search robots or other automatic methods
willhaben.at: # to access willhaben.at. Only if willhaben.at has given such access is allowed.
paytm.com: # robotstxt.org
statefarm.com: # Disallow
statefarm.com: # Sitemaps
capital.gr: # robots.txt for https://www.capital.gr
animoto.com: # /projects requires you to be logged in
manhuagui.com: # robots.txt generated at http://www.mcanerin.com
kemenag.go.id: #User-agent: *
kemenag.go.id: #Disallow: /
kemenag.go.id: # Group 1
kemenag.go.id: # Group 2
kemenag.go.id: # Group 3
whimsical.com: # www.robotstxt.org/
whimsical.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
mercadolibre.com.pe: #siteId: MPE
mercadolibre.com.pe: #country: peru
mercadolibre.com.pe: ##Block - Referidos
mercadolibre.com.pe: ##Block - siteinfo urls
mercadolibre.com.pe: ##Block - Cart
mercadolibre.com.pe: ##Block Checkout
mercadolibre.com.pe: ##Block - User Logged
mercadolibre.com.pe: #Shipping selector
mercadolibre.com.pe: ##Block - last search
mercadolibre.com.pe: ## Block - Profile - By Id
mercadolibre.com.pe: ## Block - Profile - By Id and role (old version)
mercadolibre.com.pe: ## Block - Profile - Leg. Req.
mercadolibre.com.pe: ##Block - noindex
mercadolibre.com.pe: # Mercado-Puntos
mercadolibre.com.pe: # Viejo mundo
mercadolibre.com.pe: ##Block recommendations listing
justice.gov: #
justice.gov: # robots.txt
justice.gov: #
justice.gov: # This file is to prevent the crawling and indexing of certain parts
justice.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
justice.gov: # and Google. By telling these "robots" where not to go on your site,
justice.gov: # you save bandwidth and server resources.
justice.gov: #
justice.gov: # This file will be ignored unless it is at the root of your host:
justice.gov: # Used: http://example.com/robots.txt
justice.gov: # Ignored: http://example.com/site/robots.txt
justice.gov: #
justice.gov: # For more information about the robots.txt standard, see:
justice.gov: # http://www.robotstxt.org/robotstxt.html
justice.gov: #
justice.gov: # For syntax checking, see:
justice.gov: # http://www.frobee.com/robots-txt-check
justice.gov: # Directories
justice.gov: # Files
justice.gov: # Paths (clean URLs)
justice.gov: # Paths (no clean URLs)
justice.gov: # Paths from legacy origin
car.gr: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
car.gr: #
car.gr: # To ban all spiders from the entire site uncomment the next two lines:
car.gr: # User-Agent: *
car.gr: # Disallow: /
greenhouse.io: # robots.txt for https://www.greenhouse.io/
greenhouse.io: # live - don't allow web crawlers to index cpresources/ or vendor/
brainyquote.com: # --------------------------------------------------------------------------------------
brainyquote.com: # Using bots or scrapers?
brainyquote.com: # Please read the 'BrainyQuote Terms Of Service'
brainyquote.com: # (specifically the 'License to Access and Use' section) at:
brainyquote.com: # https://www.brainyquote.com/about/terms
brainyquote.com: # https://www.brainyquote.com/es/sobre-nosotros/t%C3%A9rminos-de-servicio
brainyquote.com: # https://www.brainyquote.com/fr/%C3%A0-propos/conditions-d-utilisation
brainyquote.com: # --------------------------------------------------------------------------------------
getrevue.co: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
getrevue.co: #
getrevue.co: # To ban all spiders from the entire site uncomment the next two lines:
getrevue.co: # User-agent: *
getrevue.co: # Disallow: /
alaraby.co.uk: #
alaraby.co.uk: # robots.txt
alaraby.co.uk: #
alaraby.co.uk: # This file is to prevent the crawling and indexing of certain parts
alaraby.co.uk: # of your site by web crawlers and spiders run by sites like Yahoo!
alaraby.co.uk: # and Google. By telling these "robots" where not to go on your site,
alaraby.co.uk: # you save bandwidth and server resources.
alaraby.co.uk: #
alaraby.co.uk: # This file will be ignored unless it is at the root of your host:
alaraby.co.uk: # Used: http://example.com/robots.txt
alaraby.co.uk: # Ignored: http://example.com/site/robots.txt
alaraby.co.uk: #
alaraby.co.uk: # For more information about the robots.txt standard, see:
alaraby.co.uk: # http://www.robotstxt.org/robotstxt.html
alaraby.co.uk: # CSS, JS, Images
alaraby.co.uk: # Directories
alaraby.co.uk: # Files
alaraby.co.uk: # Paths (clean URLs)
alaraby.co.uk: # Paths (no clean URLs)
alaraby.co.uk: # search
centurylink.com: #SITEMAP
unity3d.com: #
unity3d.com: # robots.txt
unity3d.com: #
unity3d.com: # This file is to prevent the crawling and indexing of certain parts
unity3d.com: # of your site by web crawlers and spiders run by sites like Yahoo!
unity3d.com: # and Google. By telling these "robots" where not to go on your site,
unity3d.com: # you save bandwidth and server resources.
unity3d.com: #
unity3d.com: # This file will be ignored unless it is at the root of your host:
unity3d.com: # Used: http://example.com/robots.txt
unity3d.com: # Ignored: http://example.com/site/robots.txt
unity3d.com: #
unity3d.com: # For more information about the robots.txt standard, see:
unity3d.com: # http://www.robotstxt.org/robotstxt.html
unity3d.com: # CSS, JS, Images
unity3d.com: # Directories
unity3d.com: # Files
unity3d.com: # Paths (clean URLs)
unity3d.com: # Paths (no clean URLs)
unity3d.com: # Chinese Search Engines
stepstone.de: #
stepstone.de: # Any other use of robots or failure to obey the robots exclusion standards
stepstone.de: # set forth at <http://www.robotstxt.org/wc/exclusion.html> is strictly
stepstone.de: # prohibited.
stepstone.de: #
stepstone.de: # StepStone
therecipecritic.com: # Sitemap
eyny.com: # Disallow: /search.php
eyny.com: # Disallow: /*mobile=yes*
zbj.com: #update:2020-02-10
careers360.com: #
careers360.com: # robots.txt
careers360.com: #
careers360.com: # This file is to prevent the crawling and indexing of certain parts
careers360.com: # of your site by web crawlers and spiders run by sites like Yahoo!
careers360.com: # and Google. By telling these "robots" where not to go on your site,
careers360.com: # you save bandwidth and server resources.
careers360.com: #
careers360.com: # This file will be ignored unless it is at the root of your host:
careers360.com: # Used: http://example.com/robots.txt
careers360.com: # Ignored: http://example.com/site/robots.txt
careers360.com: #
careers360.com: # For more information about the robots.txt standard, see:
careers360.com: # http://www.robotstxt.org/robotstxt.html
careers360.com: #
careers360.com: # For syntax checking, see:
careers360.com: # http://www.frobee.com/robots-txt-check
careers360.com: # Directories
careers360.com: # Files
careers360.com: # Paths (clean URLs)
careers360.com: # Paths (no clean URLs)
indiatoday.in: # Disallow directive
indiatoday.in: # Directories
indiatoday.in: # Files
indiatoday.in: # Paths (clean URLs)
indiatoday.in: # Paths (no clean URLs)
commentcamarche.net: # https://commentcamarche.net
michigan.gov: # robots.txt for https://www.michigan.gov
michigan.gov: # SOM - WEX IBM Watson Explorer: VSE/1.0 (thompsonj@michigan.gov)
michigan.gov: # Yahoo!
michigan.gov: # MSN
michigan.gov: # BingBot
michigan.gov: # 80legs Crawler
michigan.gov: # Yandex
michigan.gov: # discoveryengine.com
michigan.gov: # Ahrefs.com
michigan.gov: # kalooga.com
michigan.gov: # blekko.com
michigan.gov: # changedetection.com
michigan.gov: # paper.li
michigan.gov: # Google
lankasri.com: # Disallow: /*? This is match ? anywhere in the URL
csc.gov.in: #container {
weightwatchers.com: #
weightwatchers.com: # robots.txt
weightwatchers.com: #
weightwatchers.com: # This file is to prevent the crawling and indexing of certain parts
weightwatchers.com: # of your site by web crawlers and spiders run by sites like Yahoo!
weightwatchers.com: # and Google. By telling these "robots" where not to go on your site,
weightwatchers.com: # you save bandwidth and server resources.
weightwatchers.com: #
weightwatchers.com: # This file will be ignored unless it is at the root of your host:
weightwatchers.com: # Used: http://example.com/robots.txt
weightwatchers.com: # Ignored: http://example.com/site/robots.txt
weightwatchers.com: #
weightwatchers.com: # For more information about the robots.txt standard, see:
weightwatchers.com: # http://www.robotstxt.org/robotstxt.html
weightwatchers.com: # Directories
weightwatchers.com: # Files
weightwatchers.com: # Paths (clean URLs)
weightwatchers.com: # Paths (no clean URLs)
weightwatchers.com: # Checkout
weightwatchers.com: #GPC
weightwatchers.com: # Sitemap
thrillist.com: # www.robotstxt.org/
thrillist.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
thrillist.com: # Directories
cibercuba.com: # Directories
cibercuba.com: # Files
cibercuba.com: # Paths (clean URLs)
cibercuba.com: # Paths (no clean URLs)
unsw.edu.au: #
unsw.edu.au: # robots.txt
unsw.edu.au: #
unsw.edu.au: # This file is to prevent the crawling and indexing of certain parts
unsw.edu.au: # of your site by web crawlers and spiders run by sites like Yahoo!
unsw.edu.au: # and Google. By telling these "robots" where not to go on your site,
unsw.edu.au: # you save bandwidth and server resources.
unsw.edu.au: #
unsw.edu.au: # This file will be ignored unless it is at the root of your host:
unsw.edu.au: # Used: http://example.com/robots.txt
unsw.edu.au: # Ignored: http://example.com/site/robots.txt
unsw.edu.au: #
unsw.edu.au: # For more information about the robots.txt standard, see:
unsw.edu.au: # http://www.robotstxt.org/robotstxt.html
unsw.edu.au: # CSS, JS, Images
unsw.edu.au: # Directories
unsw.edu.au: # Files
unsw.edu.au: # Paths (clean URLs)
unsw.edu.au: # Paths (no clean URLs)
nrk.no: # Served by akamai
stuff.co.nz: # robots for http://www.stuff.co.nz
stuff.co.nz: #Disallowed paths
stuff.co.nz: # Site Scrapers and bots that are not desirable
matterport.com: #
matterport.com: # robots.txt
matterport.com: #
matterport.com: # This file is to prevent the crawling and indexing of certain parts
matterport.com: # of your site by web crawlers and spiders run by sites like Yahoo!
matterport.com: # and Google. By telling these "robots" where not to go on your site,
matterport.com: # you save bandwidth and server resources.
matterport.com: #
matterport.com: # This file will be ignored unless it is at the root of your host:
matterport.com: # Used: http://example.com/robots.txt
matterport.com: # Ignored: http://example.com/site/robots.txt
matterport.com: #
matterport.com: # For more information about the robots.txt standard, see:
matterport.com: # http://www.robotstxt.org/robotstxt.html
matterport.com: # CSS, JS, Images
matterport.com: # Directories
matterport.com: # Files
matterport.com: # Paths (clean URLs)
matterport.com: # Paths (no clean URLs)
domain.com.au: ### BEGIN FILE ###
domain.com.au: #
domain.com.au: # allow access to off-market landing page, and NOT individual off-market property pages
domain.com.au: # early-access was re-branded to off-market, but we currently support both
domain.com.au: # Block dugg mirror
domain.com.au: # Block trovit bot
netacad.com: #
netacad.com: # robots.txt
netacad.com: #
netacad.com: # This file is to prevent the crawling and indexing of certain parts
netacad.com: # of your site by web crawlers and spiders run by sites like Yahoo!
netacad.com: # and Google. By telling these "robots" where not to go on your site,
netacad.com: # you save bandwidth and server resources.
netacad.com: #
netacad.com: # This file will be ignored unless it is at the root of your host:
netacad.com: # Used: http://example.com/robots.txt
netacad.com: # Ignored: http://example.com/site/robots.txt
netacad.com: #
netacad.com: # For more information about the robots.txt standard, see:
netacad.com: # http://www.robotstxt.org/robotstxt.html
netacad.com: # CSS, JS, Images
netacad.com: # Directories
netacad.com: # Files
netacad.com: # Paths (clean URLs)
netacad.com: # Paths (no clean URLs)
imore.com: #
imore.com: # robots.txt
imore.com: #
imore.com: # This file is to prevent the crawling and indexing of certain parts
imore.com: # of your site by web crawlers and spiders run by sites like Yahoo!
imore.com: # and Google. By telling these "robots" where not to go on your site,
imore.com: # you save bandwidth and server resources.
imore.com: #
imore.com: # This file will be ignored unless it is at the root of your host:
imore.com: # Used: http://example.com/robots.txt
imore.com: # Ignored: http://example.com/site/robots.txt
imore.com: #
imore.com: # For more information about the robots.txt standard, see:
imore.com: # http://www.robotstxt.org/robotstxt.html
imore.com: #
imore.com: # For syntax checking, see:
imore.com: # http://www.frobee.com/robots-txt-check
imore.com: # Directories
imore.com: # Files
imore.com: # Paths (clean URLs)
imore.com: # Paths (no clean URLs)
google.cz: # AdsBot
google.cz: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
gymshark.com: # we use Shopify as our ecommerce platform
gymshark.com: # Google adsbot ignores robots.txt unless specifically named!
joemonster.org: # www.robotstxt.org/
joemonster.org: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
ufmg.br: # www.robotstxt.org/
ufmg.br: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
onlinejobs.ph: # Then we start disallowing stuff
onlinejobs.ph: # Directories
onlinejobs.ph: # Disallow bots and crawlers
ox.ac.uk: #
ox.ac.uk: # robots.txt
ox.ac.uk: #
ox.ac.uk: # This file is to prevent the crawling and indexing of certain parts
ox.ac.uk: # of your site by web crawlers and spiders run by sites like Yahoo!
ox.ac.uk: # and Google. By telling these "robots" where not to go on your site,
ox.ac.uk: # you save bandwidth and server resources.
ox.ac.uk: #
ox.ac.uk: # This file will be ignored unless it is at the root of your host:
ox.ac.uk: # Used: http://example.com/robots.txt
ox.ac.uk: # Ignored: http://example.com/site/robots.txt
ox.ac.uk: #
ox.ac.uk: # For more information about the robots.txt standard, see:
ox.ac.uk: # http://www.robotstxt.org/wc/robots.html
ox.ac.uk: #
ox.ac.uk: # For syntax checking, see:
ox.ac.uk: # http://www.sxw.org.uk/computing/robots/check.html
ox.ac.uk: # Directories
ox.ac.uk: # Files
ox.ac.uk: # Paths (clean URLs)
ox.ac.uk: # Paths (no clean URLs)
ox.ac.uk: # Stop access to videos used in overlays
mi.com: # 2015/12/11
cnnbrasil.com.br: #
cnnbrasil.com.br: # robots.txt
cnnbrasil.com.br: #
gitlab.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
gitlab.com: #
gitlab.com: # To ban all spiders from the entire site uncomment the next two lines:
gitlab.com: # User-Agent: *
gitlab.com: # Disallow: /
gitlab.com: # Add a 1 second delay between successive requests to the same server, limits resources used by crawler
gitlab.com: # Only some crawlers respect this setting, e.g. Googlebot does not
gitlab.com: # Crawl-delay: 1
gitlab.com: # Based on details in https://gitlab.com/gitlab-org/gitlab/blob/master/config/routes.rb,
gitlab.com: # https://gitlab.com/gitlab-org/gitlab/blob/master/spec/routing, and using application
gitlab.com: # Global routes
gitlab.com: # Restrict allowed routes to avoid very ugly search results
gitlab.com: # Generic resource routes like new, edit, raw
gitlab.com: # This will block routes like:
gitlab.com: # - /projects/new
gitlab.com: # - /gitlab-org/gitlab-foss/issues/123/-/edit
gitlab.com: # Group details
gitlab.com: # Project details
homeadvisor.com: # robots.txt for http://www.homeadvisor.com/
abb.com: # start robots.txt
abb.com: # PIS Product detail pages
abb.com: # Robotics:
abb.com: # Image Bank:
abb.com: # CCD:
abb.com: # Other:
abb.com: # End manual section
abb.com: #Finished OK
moj.gov.sa: #ctl00_PlaceHolderMain_SiteMapPath1 span:nth-child(2)
teepublic.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
teepublic.com: #
teepublic.com: # To ban all spiders from the entire site uncomment the next two lines:
teepublic.com: # User-Agent: *
teepublic.com: # Disallow: /
saraba1st.com: #
saraba1st.com: # robots.txt for Discuz! X3
saraba1st.com: #
ethnos.gr: #
ethnos.gr: # robots.txt
ethnos.gr: #
ethnos.gr: # This file is to prevent the crawling and indexing of certain parts
ethnos.gr: # of your site by web crawlers and spiders run by sites like Yahoo!
ethnos.gr: # and Google. By telling these "robots" where not to go on your site,
ethnos.gr: # you save bandwidth and server resources.
ethnos.gr: #
ethnos.gr: # This file will be ignored unless it is at the root of your host:
ethnos.gr: # Used: http://example.com/robots.txt
ethnos.gr: # Ignored: http://example.com/site/robots.txt
ethnos.gr: #
ethnos.gr: # For more information about the robots.txt standard, see:
ethnos.gr: # http://www.robotstxt.org/robotstxt.html
ethnos.gr: # CSS, JS, Images
ethnos.gr: # Directories
ethnos.gr: # Files
ethnos.gr: # Paths (clean URLs)
ethnos.gr: # Paths (no clean URLs)
nyu.edu: # Disallow: /registrar/ -- Commented out by Jim on 2020-01-16
filehippo.com: # v.1.4
filehippo.com: # /// .//.
filehippo.com: # ///////////// *////////////
filehippo.com: # ///////////// /, ////////////,
filehippo.com: # ./////////// *// ////////////
filehippo.com: # ///////////. ///* ///////////. .... .. ,. ... * * , ... ... */*
filehippo.com: # /// //// .////// //// //// ./*,, // // //,,. /* // // //,,// //,*// /// ,//
filehippo.com: # .//////// /////////. ///////// .//// // // ///// /////// // // // // .// ,/* //
filehippo.com: # //////, /////////////. //////. ./, // // // /* // // //.. //.. // *//
filehippo.com: #/////. /////////////////// ///// /, // ///// ///// /, */ // // // /////.
filehippo.com: #//// *////////////////////// ////
filehippo.com: # // * /////////////// /* //
filehippo.com: # /// *///////////* ///
filehippo.com: # *///////////*
filehippo.com: #
filehippo.com: # Disallow: /search?q=*
filehippo.com: # Disallow: */search?q=*
michaels.com: ##############################
michaels.com: # Welcome to Michaels Robots.txt File #
michaels.com: # Last Updated 11/08/2018 #
michaels.com: ##############################
metacritic.com: # Google is crawling the Ad defineSlot() parameters. Exclude them so we don't get a bunch of 404s.
techbang.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
techbang.com: #
techbang.com: # To ban all spiders from the entire site uncomment the next two lines:
techbang.com: # User-Agent: *
techbang.com: # Disallow: /
prada.com: # Sitemaps
openclassrooms.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
france.tv: # www.robotstxt.org/
france.tv: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
uhc.com: # Exclude DAM nodes serving pdfs for /shop. iex, landing and statedpls
uhc.com: # Exclude site paths
aawsat.com: #allow amp key
aawsat.com: # Paths (clean URLs)
aawsat.com: # Paths (no clean URLs)
aawsat.com: #HybridAuth paths
khaleejtimes.com: # Updated: 2009-10-19
khaleejtimes.com: # Robots.txt
khaleejtimes.com: # Block Nocache=1
khaleejtimes.com: # Block Static Update
khaleejtimes.com: # Block Folders
khaleejtimes.com: #Disallow: /assets/
khaleejtimes.com: #Disallow: /images/
khaleejtimes.com: #Block certain actions
khaleejtimes.com: #Disallow: /apps/pbcsi.dll
khaleejtimes.com: #Force certain actions
khaleejtimes.com: # Sitemap files
team-bhp.com: #
team-bhp.com: # robots.txt
team-bhp.com: #
team-bhp.com: # This file is to prevent the crawling and indexing of certain parts
team-bhp.com: # of your site by web crawlers and spiders run by sites like Yahoo!
team-bhp.com: # and Google. By telling these "robots" where not to go on your site,
team-bhp.com: # you save bandwidth and server resources.
team-bhp.com: #
team-bhp.com: # This file will be ignored unless it is at the root of your host:
team-bhp.com: # Used: http://example.com/robots.txt
team-bhp.com: # Ignored: http://example.com/site/robots.txt
team-bhp.com: #
team-bhp.com: # For more information about the robots.txt standard, see:
team-bhp.com: # http://www.robotstxt.org/wc/robots.html
team-bhp.com: #
team-bhp.com: # For syntax checking, see:
team-bhp.com: # http://www.sxw.org.uk/computing/robots/check.html
team-bhp.com: # Directories
team-bhp.com: # Disallow: /misc/
team-bhp.com: # Files
team-bhp.com: # Paths (clean URLs)
team-bhp.com: # Paths (no clean URLs)
team-bhp.com: # Forum (was present before 14th Feb 2013 new portal, few more added later)
team-bhp.com: # Sitemaps have also been submitted directly in GSC, but left here for other search engines
rtings.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
rtings.com: #
rtings.com: # To ban all spiders from the entire site uncomment the next two lines:
easeus.com: # Robots.txt file from https://www.easeus.com
easeus.com: #
easeus.com: # All robots will spider the domain
jetbrains.com: # Global
jetbrains.com: # Allow crawling: these are blocked using the robots meta tag instead
jetbrains.com: # disallow: */promo/
jetbrains.com: # disallow: */download-thanks
jetbrains.com: # disallow: */download/download_thanks.jsp
jetbrains.com: # Sitemaps
jetbrains.com: # AppCode
jetbrains.com: # CLion
jetbrains.com: # DataGrip
jetbrains.com: # dotCover
jetbrains.com: # dotMemory
jetbrains.com: # dotPeak
jetbrains.com: # dotTrace
jetbrains.com: # Hub
jetbrains.com: # Idea
jetbrains.com: # MPS
jetbrains.com: # PhpStorm
jetbrains.com: # PyCharm
jetbrains.com: # PyCharm Edu
jetbrains.com: # ReSharper
jetbrains.com: # ReSharper –°++
jetbrains.com: # Research
jetbrains.com: # Rider
jetbrains.com: # RubyMine
jetbrains.com: # TeamCity
jetbrains.com: # Upsource
jetbrains.com: # WebStorm
jetbrains.com: # YouTrack
cjn.cn: # Robots.txt file from http://www.cjn.cn
cjn.cn: # All robots will spider the domain
videolan.org: # $Id$
videolan.org: # Do not crawl CVS and .svn directories
videolan.org: # "This robot collects content from the Internet for the sole purpose of
videolan.org: # helping educational institutions prevent plagiarism. [...] we compare
videolan.org: # student papers against the content we find on the Internet to see if we
videolan.org: # can find similarities." (http://www.turnitin.com/robot/crawlerinfo.html)
videolan.org: # --> fuck off.
videolan.org: # "NameProtect engages in crawling activity in search of a wide range of
videolan.org: # brand and other intellectual property violations that may be of interest
videolan.org: # to our clients." (http://www.nameprotect.com/botinfo.html)
videolan.org: # --> fuck off.
videolan.org: # "iThenticate® is a new service we have developed to combat the piracy
videolan.org: # of intellectual property and ensure the originality of written work for#
videolan.org: # publishers, non-profit agencies, corporations, and newspapers."
videolan.org: # (http://www.slysearch.com/)
videolan.org: # --> fuck off.
mailshake.com: # robotstxt.org
straitstimes.com: #
straitstimes.com: # robots.txt
straitstimes.com: #
straitstimes.com: # This file is to prevent the crawling and indexing of certain parts
straitstimes.com: # of your site by web crawlers and spiders run by sites like Yahoo!
straitstimes.com: # and Google. By telling these "robots" where not to go on your site,
straitstimes.com: # you save bandwidth and server resources.
straitstimes.com: #
straitstimes.com: # This file will be ignored unless it is at the root of your host:
straitstimes.com: # Used: http://example.com/robots.txt
straitstimes.com: # Ignored: http://example.com/site/robots.txt
straitstimes.com: #
straitstimes.com: # For more information about the robots.txt standard, see:
straitstimes.com: # http://www.robotstxt.org/robotstxt.html
straitstimes.com: # Directories
straitstimes.com: # Files
straitstimes.com: # Paths (clean URLs)
straitstimes.com: # Paths (no clean URLs)
shopclues.com: #Baiduspider
shopclues.com: #Yandex
shopclues.com: #Sosospider
shopclues.com: #Ezooms
shopclues.com: #Sogou
shopclues.com: #80legs.com
kenyans.co.ke: #
kenyans.co.ke: # robots.txt
kenyans.co.ke: #
kenyans.co.ke: # This file is to prevent the crawling and indexing of certain parts
kenyans.co.ke: # of your site by web crawlers and spiders run by sites like Yahoo!
kenyans.co.ke: # and Google. By telling these "robots" where not to go on your site,
kenyans.co.ke: # you save bandwidth and server resources.
kenyans.co.ke: #
kenyans.co.ke: # This file will be ignored unless it is at the root of your host:
kenyans.co.ke: # Used: http://example.com/robots.txt
kenyans.co.ke: # Ignored: http://example.com/site/robots.txt
kenyans.co.ke: #
kenyans.co.ke: # For more information about the robots.txt standard, see:
kenyans.co.ke: # http://www.robotstxt.org/robotstxt.html
kenyans.co.ke: # CSS, JS, Images
kenyans.co.ke: # Directories
kenyans.co.ke: # Files
kenyans.co.ke: # Paths (clean URLs)
kenyans.co.ke: # Paths (no clean URLs)
thepointsguy.com: # Sitemap archive
uci.edu: # www.robotstxt.org/
uci.edu: # http://code.google.com/web/controlcrawlindex/
penademo.wordpress.com: # This file was generated on Mon, 15 Feb 2021 00:08:23 +0000
utexas.edu: #
utexas.edu: # robots.txt
utexas.edu: #
utexas.edu: # This file is to prevent the crawling and indexing of certain parts
utexas.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
utexas.edu: # and Google. By telling these "robots" where not to go on your site,
utexas.edu: # you save bandwidth and server resources.
utexas.edu: #
utexas.edu: # This file will be ignored unless it is at the root of your host:
utexas.edu: # Used: http://example.com/robots.txt
utexas.edu: # Ignored: http://example.com/site/robots.txt
utexas.edu: #
utexas.edu: # For more information about the robots.txt standard, see:
utexas.edu: # http://www.robotstxt.org/robotstxt.html
utexas.edu: # CSS, JS, Images
utexas.edu: # Directories
utexas.edu: # Files
utexas.edu: # Paths (clean URLs)
utexas.edu: # Paths (no clean URLs)
departementfeminin.com: # Sitemap
departementfeminin.com: #URL Parameters
fashionnova.com: # we use Shopify as our ecommerce platform
fashionnova.com: # Google adsbot ignores robots.txt unless specifically named!
neimanmarcus.com: # Updated 03-19-2020
diplomatie.gouv.fr: # robots.txt
diplomatie.gouv.fr: # @url: https://www.diplomatie.gouv.fr
diplomatie.gouv.fr: # @generator: SPIP
ionos.de: #print
ionos.de: #terms and conditions
ionos.de: #Popups etc.
ionos.de: #Results
ionos.de: #crawl delay
saglik.gov.tr: #header {
pinterest.de: # Pinterest is hiring!
pinterest.de: #
pinterest.de: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c
pinterest.de: #
pinterest.de: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering
bt.dk: # www.robotstxt.org/
bt.dk: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
newsbreak.com: # New crawlers to block 2016
pointtown.com: # sitemap url
walla.co.il: # robots.txt - 2018-03-13
nielsen.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/
liberal.gr: #Alexa
liberal.gr: #All others
topuniversities.com: #
topuniversities.com: # robots.txt
topuniversities.com: #
topuniversities.com: # This file is to prevent the crawling and indexing of certain parts
topuniversities.com: # of your site by web crawlers and spiders run by sites like Yahoo!
topuniversities.com: # and Google. By telling these "robots" where not to go on your site,
topuniversities.com: # you save bandwidth and server resources.
topuniversities.com: #
topuniversities.com: # This file will be ignored unless it is at the root of your host:
topuniversities.com: # Used: http://example.com/robots.txt
topuniversities.com: # Ignored: http://example.com/site/robots.txt
topuniversities.com: #
topuniversities.com: # For more information about the robots.txt standard, see:
topuniversities.com: # http://www.robotstxt.org/robotstxt.html
topuniversities.com: # CSS, JS, Images
topuniversities.com: # Directories
topuniversities.com: # Files
topuniversities.com: # Paths (clean URLs)
topuniversities.com: # Paths (no clean URLs)
topuniversities.com: # Paths (others)
colourpop.com: # we use Shopify as our ecommerce platform
colourpop.com: # Google adsbot ignores robots.txt unless specifically named!
google.com.kw: # AdsBot
google.com.kw: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
vhlcentral.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
vhlcentral.com: #
vhlcentral.com: # To ban all spiders from the entire site uncomment the next two lines:
vhlcentral.com: # User-Agent: *
vhlcentral.com: # Disallow: /
polimi.it: # Da: http://www.typo3blog.nl/seo/what-robotstxt-to-use-with-typo3.html
polimi.it: #Disallow: /uploads/
fsu.edu: # robots.txt for http://www.fsu.edu/
fsu.edu: # see http://info.webcrawler.com/mak/projects/robots/norobots.html
fsu.edu: # see http://www.robotstxt.org/wc/exclusion.html
fsu.edu: # see http://www.robotstxt.org/wc/norobots.html
fsu.edu: # see http://www.robotstxt.org/wc/norobots-rfc.html
fsu.edu: #Disallow: /Books/
fsu.edu: #Disallow: /Phones/
fsu.edu: #Disallow: /directories/
fsu.edu: #Disallow: /Jobs/
fsu.edu: #Disallow: /Campus/
fsu.edu: #Disallow: /Links/
fsu.edu: #Disallow: /Employee/
fsu.edu: #Disallow: /Students/
covid19india.org: # https://www.robotstxt.org/robotstxt.html
seemorgh.com: # JSitemap entries
gov.bc.ca: # robots.txt
gov.bc.ca: #HRSS
healthgrades.com: # Robots.txt file HealthGrades.com
healthgrades.com: # July 27, 2020
healthgrades.com: # XML Sitemap Root File
healthgrades.com: #trying noindex vs disallow to avoid SEO blocking of versioned node page resources
healthgrades.com: # Disallow certain directories for b2b site
healthgrades.com: # Disallow consumer javascript and css folders
healthgrades.com: # Commented out to test impact on ability of Google to crawl / rankings
healthgrades.com: # Disallow: /Consumer/styles/
healthgrades.com: # Disallow: /consumer/styles/
healthgrades.com: # Disallow: /Consumer/scripts/
healthgrades.com: # Disallow: /consumer/scripts/
healthgrades.com: # End robots.txt file
ricardo.ch: # robots.txt for https://www.ricardo.ch/
ricardo.ch: # English Pages until release
ricardo.ch: # 20th anniversary dedicated pages
ricardo.ch: # Selling Form
ricardo.ch: # 28.04.2016
ricardo.ch: # CMS Pages
ricardo.ch: # Archived Article
ricardo.ch: # Category XML Pages
ricardo.ch: # Legacy and new online shop
ricardo.ch: # New ratings pages
ricardo.ch: # Legacy
ricardo.ch: # Legacy French Pages
ricardo.ch: # Disallow commercial bots we don't like
ricardo.ch: # Disallow static resources and API endpoints crawling
ricardo.ch: # User agent names for Google AdsBot can be found here : https://support.google.com/webmasters/answer/1061943?hl=en
ricardo.ch: # Instruction for OnCrawl bot can be found here : http://help.oncrawl.com/en/articles/2767653-oncrawl-crawler-how-does-the-oncrawl-bot-find-and-crawl-pages#:~:text=OnCrawl%20follows%20all%20instructions%20to,will%20apply%20to%20your%20crawl.
serpro.gov.br: # Define access-restrictions for robots/spiders
serpro.gov.br: # http://www.robotstxt.org/wc/norobots.html
serpro.gov.br: # By default we allow robots to access all areas of our site
serpro.gov.br: # already accessible to anonymous users
serpro.gov.br: # Add Googlebot-specific syntax extension to exclude forms
serpro.gov.br: # that are repeated for each piece of content in the site
serpro.gov.br: # the wildcard is only supported by Googlebot
serpro.gov.br: # http://www.google.com/support/webmasters/bin/answer.py?answer=40367&ctx=sibling
karenmillen.com: # Pages
karenmillen.com: # Product Filter #
karenmillen.com: # Ordering & Product per page #
karenmillen.com: # Number of product per page - Default 60
karenmillen.com: # Order By
karenmillen.com: # Price
karenmillen.com: # Faceted Navigation #
karenmillen.com: # Search #
karenmillen.com: # Ensure no Static Ressources is blocked #
karenmillen.com: # Crawl Delay - 5 URL max per second
showroomprive.com: # Site Desktop FR
showroomprive.com: # BackgroundAcqui
showroomprive.com: # Accueil
showroomprive.com: # Erreur
showroomprive.com: # MonCompte
showroomprive.com: # Boutique
showroomprive.com: # NousContacter
showroomprive.com: # PagesP
showroomprive.com: # JeuOpe
showroomprive.com: # Voyages
showroomprive.com: # Livraison
showroomprive.com: # ErreursExploration
showroomprive.com: # Cms
thedoublef.com: # Sitemap files
glassdoor.co.in: # India
glassdoor.co.in: # Greetings, human beings!,
glassdoor.co.in: #
glassdoor.co.in: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself.
glassdoor.co.in: #
glassdoor.co.in: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet, and help improve the way people everywhere find jobs?
glassdoor.co.in: #
glassdoor.co.in: # Run - don't crawl - to apply to join Glassdoor's SEO team here http://jobs.glassdoor.com
glassdoor.co.in: #
glassdoor.co.in: #
glassdoor.co.in: #logging related
glassdoor.co.in: # Blocking track urls (ACQ-2468)
glassdoor.co.in: #Blocking non standard job view and job search URLs, and paginated job SERP URLs (TRFC-2831)
glassdoor.co.in: # Blocking bots from crawling DoubleClick for Publisher and Google Analytics related URL's (which aren't real URL's)
glassdoor.co.in: # TRFC-4037 Block page from being indexed
glassdoor.co.in: # TRFC-4037 Block page from being indexed
glassdoor.co.in: #
glassdoor.co.in: # Note that this file has the extension '.text' rather than the more-standard '.txt'
glassdoor.co.in: # to keep it from being pre-compiled as a servlet. (*.txt files are precompiled, and
glassdoor.co.in: # there doesn't seem to be a way to turn this off.)
glassdoor.co.in: #
dhl.de: # robots.txt for /content/de/de
ipindiaonline.gov.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
ipindiaonline.gov.in: #content{margin:0 0 0 2%;position:relative;}
guru99.com: # If the Joomla site is installed within a folder such as at
guru99.com: # e.g. www.example.com/joomla/ the robots.txt file MUST be
guru99.com: # moved to the site root at e.g. www.example.com/robots.txt
guru99.com: # AND the joomla folder name MUST be prefixed to the disallowed
guru99.com: # path, e.g. the Disallow rule for the /administrator/ folder
guru99.com: # MUST be changed to read Disallow: /joomla/administrator/
guru99.com: #
guru99.com: # For more information about the robots.txt standard, see:
guru99.com: # http://www.robotstxt.org/orig.html
guru99.com: #
guru99.com: # For syntax checking, see:
guru99.com: # http://tool.motoricerca.info/robots-checker.phtml
ga.gov: #
ga.gov: # robots.txt
ga.gov: #
ga.gov: # This file is to prevent the crawling and indexing of certain parts
ga.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
ga.gov: # and Google. By telling these "robots" where not to go on your site,
ga.gov: # you save bandwidth and server resources.
ga.gov: #
ga.gov: # This file will be ignored unless it is at the root of your host:
ga.gov: # Used: http://example.com/robots.txt
ga.gov: # Ignored: http://example.com/site/robots.txt
ga.gov: #
ga.gov: # For more information about the robots.txt standard, see:
ga.gov: # http://www.robotstxt.org/robotstxt.html
ga.gov: # CSS, JS, Images
ga.gov: # Directories
ga.gov: # Files
ga.gov: # Paths (clean URLs)
ga.gov: # Paths (no clean URLs)
ga.gov: # Book printer-friendly pages
beinsports.com: #Disallow: /$
beinsports.com: #Allow: /*/*/news/*/*$
beinsports.com: #Disallow: /*/news/*$
beinsports.com: #Disallow: /*/videos/*$
beinsports.com: #Allow: /us/*/video/*/*$
beinsports.com: #Disallow: /us/*/video/*$
beinsports.com: #Disallow: /us/soccer/video$
beinsports.com: #team&player
beinsports.com: #Disallow: /*/*/team/player/*
beinsports.com: #Disallow: /*/*/team/2018/*
beinsports.com: #Disallow: /*/*/team/2017/*
beinsports.com: #Disallow: /*/*/team/2016/*
beinsports.com: #Disallow: /*/*/team/2015/*
beinsports.com: #tags
beinsports.com: #Disallow: /*/tag/*/*
beinsports.com: #Disallow: /*/search*q*
beinsports.com: #Disallow: /*/search/
beinsports.com: # data-pages - galleries
logi.com: # Logitech
logi.com: # Modified 9.2.2009
voanews.com: #
voanews.com: # robots.txt
voanews.com: #
voanews.com: # This file is to prevent the crawling and indexing of certain parts
voanews.com: # of your site by web crawlers and spiders run by sites like Yahoo!
voanews.com: # and Google. By telling these "robots" where not to go on your site,
voanews.com: # you save bandwidth and server resources.
voanews.com: #
voanews.com: # This file will be ignored unless it is at the root of your host:
voanews.com: # Used: http://example.com/robots.txt
voanews.com: # Ignored: http://example.com/site/robots.txt
voanews.com: #
voanews.com: # For more information about the robots.txt standard, see:
voanews.com: # http://www.robotstxt.org/robotstxt.html
voanews.com: # CSS, JS, Images
voanews.com: # Directories
voanews.com: # Files
voanews.com: # Paths (clean URLs)
voanews.com: # Paths (no clean URLs)
luminpdf.com: # robots.txt generated by atozseotools.com
gumtree.pl: #Sitemaps
gumtree.pl: #Sorting parameters
gumtree.pl: #Other comments:
gumtree.pl: #Sorting parameters
gumtree.pl: #Other comments:
gumtree.pl: #Sorting parameters
gumtree.pl: #Other comments:
gumtree.pl: #Sorting parameters
gumtree.pl: #Other comments:
next.co.uk: ##### 500s #####
coltortiboutique.com: ##Disallow: /*?
coltortiboutique.com: # Disable checkout & customer account
coltortiboutique.com: # Disable Search pages
coltortiboutique.com: # Disable common folders
coltortiboutique.com: # Disable Tag & Review (Avoid duplicate content)
coltortiboutique.com: # Common files
coltortiboutique.com: # Disable sorting (Avoid duplicate content)
coltortiboutique.com: # Disable version control folders and others
coltortiboutique.com: #Disable Bitcoin
taojindi.com: # file: home robots.txt, 2012/09/13
taojindi.com: #
game.co.uk: # __PUBLIC_IP_ADDR__ - Internet facing IP Address or Domain name.
creately.com: #
creately.com: # robots.txt
creately.com: #
creately.com: # This file is to prevent the crawling and indexing of certain parts
creately.com: # of your site by web crawlers and spiders run by sites like Yahoo!
creately.com: # and Google. By telling these "robots" where not to go on your site,
creately.com: # you save bandwidth and server resources.
creately.com: #
creately.com: # This file will be ignored unless it is at the root of your host:
creately.com: # Used: http://example.com/robots.txt
creately.com: # Ignored: http://example.com/site/robots.txt
creately.com: #
creately.com: # For more information about the robots.txt standard, see:
creately.com: # http://www.robotstxt.org/wc/robots.html
creately.com: #
creately.com: # For syntax checking, see:
creately.com: # http://www.sxw.org.uk/computing/robots/check.html
creately.com: # Directories
creately.com: # Files
creately.com: # Paths (clean URLs)
creately.com: # Paths (no clean URLs)
creately.com: # Directories
creately.com: # Files
creately.com: # Paths (clean URLs)
creately.com: # Paths (no clean URLs)
creately.com: # ========================================= #
articulate.com: # ***********************************************************************
articulate.com: # ***********************************************************************
articulate.com: # ***********************************************************************
articulate.com: # *************** ****************
articulate.com: # *************** ****************
articulate.com: # *************** ****************
articulate.com: # *************** ****************
articulate.com: # ***********************************************************************
articulate.com: # ***********************************************************************
articulate.com: # *****************************. .*****************************
articulate.com: # ************************* *************************
articulate.com: # ********************** **********************
articulate.com: # ******************** ********************
articulate.com: # ****************** ,***********. ******************
articulate.com: # ***************** ***************** *****************
articulate.com: # **************** ,*******************, ****************
articulate.com: # ***************. ,*********************, ****************
articulate.com: # *************** .*********************** ***************
articulate.com: # *************** ************************. ***************
articulate.com: # *************** ************************, ***************
articulate.com: # *************** ,***********************. ***************
articulate.com: # *************** *********************** ***************
articulate.com: # **************** ********************* ***************
articulate.com: # ***************** ******************* ***************
articulate.com: # *****************, *************** ***************
articulate.com: # ******************* ,*****, ***************
articulate.com: # ********************* ***************
articulate.com: # *********************** *, ***************
articulate.com: # **************************. ****, ***************
articulate.com: # ***********************************************************************
articulate.com: # ***********************************************************************
articulate.com: # ***********************************************************************
gittigidiyor.com: ###################################################################
gittigidiyor.com: # #
gittigidiyor.com: # //. #
gittigidiyor.com: # /***/ #
gittigidiyor.com: # /*****/ #
gittigidiyor.com: # /*****/ #
gittigidiyor.com: # //****/ #
gittigidiyor.com: # ////*// #
gittigidiyor.com: # /############/*/************ #
gittigidiyor.com: # #########%### *//*********** #
gittigidiyor.com: # #########%%####************. #
gittigidiyor.com: # ##########%%%##/************ #
gittigidiyor.com: # ####### #
gittigidiyor.com: # ####### #
gittigidiyor.com: # ####### #
gittigidiyor.com: # ####### #
gittigidiyor.com: # #### #
gittigidiyor.com: # # #
gittigidiyor.com: # #
gittigidiyor.com: ############ Türkiye'nin Lider Alışveriş Sitesi ##############
mnml.la: # we use Shopify as our ecommerce platform
mnml.la: # Google adsbot ignores robots.txt unless specifically named!
kith.com: # we use Shopify as our ecommerce platform
kith.com: # Google adsbot ignores robots.txt unless specifically named!
welt.de: # The Facebook Crawler
welt.de: # Audisto Scraping Tool
airbnb.ca: # ///////
airbnb.ca: # // //
airbnb.ca: # // //
airbnb.ca: # // // /// /// ///
airbnb.ca: # // // /// ///
airbnb.ca: # // /// // //// /// /// /// //// /// //// /// //// /// ////
airbnb.ca: # // /// /// // ////////// /// ////////// /////////// ////////// ///////////
airbnb.ca: # // // // // /// /// /// /// /// /// /// /// /// ///
airbnb.ca: # // // // // /// /// /// /// /// /// /// /// /// ///
airbnb.ca: # // // // // /// /// /// /// /// /// /// /// /// ///
airbnb.ca: # // // // // ////////// /// /// ////////// /// /// //////////
airbnb.ca: # // ///// //
airbnb.ca: # // ///// //
airbnb.ca: # // /// /// //
airbnb.ca: # ////// //////
airbnb.ca: #
airbnb.ca: #
airbnb.ca: # We thought you'd never make it!
airbnb.ca: # We hope you feel right at home in this file...unless you're a disallowed subfolder.
airbnb.ca: # And since you're here, read up on our culture and team: https://www.airbnb.com/careers/departments/engineering
airbnb.ca: # There's even a bring your robot to work day.
paris.fr: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
scielo.br: # Allow only major search spiders
scielo.br: # Block all other spiders
authentisign.com: # go away
arabnews.com: #
arabnews.com: # robots.txt
arabnews.com: #
arabnews.com: # This file is to prevent the crawling and indexing of certain parts
arabnews.com: # of your site by web crawlers and spiders run by sites like Yahoo!
arabnews.com: # and Google. By telling these "robots" where not to go on your site,
arabnews.com: # you save bandwidth and server resources.
arabnews.com: #
arabnews.com: # This file will be ignored unless it is at the root of your host:
arabnews.com: # Used: http://example.com/robots.txt
arabnews.com: # Ignored: http://example.com/site/robots.txt
arabnews.com: #
arabnews.com: # For more information about the robots.txt standard, see:
arabnews.com: # http://www.robotstxt.org/robotstxt.html
arabnews.com: # CSS, JS, Images
arabnews.com: # Directories
arabnews.com: # Files
arabnews.com: # Paths (clean URLs)
arabnews.com: # Paths (no clean URLs)
tickertape.in: #sitemaps
ucdavis.edu: #
ucdavis.edu: # robots.txt
ucdavis.edu: #
ucdavis.edu: # This file is to prevent the crawling and indexing of certain parts
ucdavis.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
ucdavis.edu: # and Google. By telling these "robots" where not to go on your site,
ucdavis.edu: # you save bandwidth and server resources.
ucdavis.edu: #
ucdavis.edu: # This file will be ignored unless it is at the root of your host:
ucdavis.edu: # Used: http://example.com/robots.txt
ucdavis.edu: # Ignored: http://example.com/site/robots.txt
ucdavis.edu: #
ucdavis.edu: # For more information about the robots.txt standard, see:
ucdavis.edu: # http://www.robotstxt.org/robotstxt.html
ucdavis.edu: # CSS, JS, Images
ucdavis.edu: # Directories
ucdavis.edu: # Files
ucdavis.edu: # Paths (clean URLs)
ucdavis.edu: # Paths (no clean URLs)
sir.kr: # 200305 생성
sir.kr: # 200403 iptables 차단을 했으나 막히지 않아 nginx 에서 강제로 막음
sir.kr: ### 5
sir.kr: ### 10
nzherald.co.nz: # robots.txt for https://www.nzherald.co.nz
nzherald.co.nz: #
nzherald.co.nz: #
nzherald.co.nz: # Good Bots allowed
nzherald.co.nz: #
nzherald.co.nz: # User Account Pages
nzherald.co.nz: # User Account Pages
nzherald.co.nz: # User Account Pages
nzherald.co.nz: # User Account Pages
nzherald.co.nz: # User Account Pages
nzherald.co.nz: # Prevent Google from incorrectly indexing ad link vars and non-screen pages
nzherald.co.nz: # User Account Pages
nzherald.co.nz: # Prevent Google from incorrectly indexing ad link vars and non-screen pages
nzherald.co.nz: # User Account Pages
nzherald.co.nz: # Prevent Google from incorrectly indexing ad link vars and non-screen pages
nzherald.co.nz: # User Account Pages
nzherald.co.nz: # Prevent Google from incorrectly indexing ad link vars and non-screen pages
nzherald.co.nz: # User Account Pages
nzherald.co.nz: # Prevent Google from incorrectly indexing ad link vars and non-screen pages
nzherald.co.nz: # User Account Pages
nzherald.co.nz: #
nzherald.co.nz: # Restrictions to all bots
nzherald.co.nz: #
nzherald.co.nz: # User Account Pages
nzherald.co.nz: #
nzherald.co.nz: # Image Bots
nzherald.co.nz: #
nzherald.co.nz: # User Account Pages
nzherald.co.nz: #
nzherald.co.nz: # Site scrapers & other known bad bots that are completely disallowed
nzherald.co.nz: #
google.com.qa: # AdsBot
google.com.qa: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
ontario.ca: #
ontario.ca: # robots.txt
ontario.ca: #
ontario.ca: # This file is to prevent the crawling and indexing of certain parts
ontario.ca: # of your site by web crawlers and spiders run by sites like Yahoo!
ontario.ca: # and Google. By telling these "robots" where not to go on your site,
ontario.ca: # you save bandwidth and server resources.
ontario.ca: #
ontario.ca: # This file will be ignored unless it is at the root of your host:
ontario.ca: # Used: http://example.com/robots.txt
ontario.ca: # Ignored: http://example.com/site/robots.txt
ontario.ca: #
ontario.ca: # For more information about the robots.txt standard, see:
ontario.ca: # http://www.robotstxt.org/wc/robots.html
ontario.ca: #
ontario.ca: # For syntax checking, see:
ontario.ca: # http://www.sxw.org.uk/computing/robots/check.html
ontario.ca: # Directories
ontario.ca: # Files
ontario.ca: # Paths (clean URLs)
ontario.ca: # Paths (no clean URLs)
ontario.ca: # Ontario.ca
skai.gr: #
skai.gr: # robots.txt
skai.gr: #
skai.gr: # This file is to prevent the crawling and indexing of certain parts
skai.gr: # of your site by web crawlers and spiders run by sites like Yahoo!
skai.gr: # and Google. By telling these "robots" where not to go on your site,
skai.gr: # you save bandwidth and server resources.
skai.gr: #
skai.gr: # This file will be ignored unless it is at the root of your host:
skai.gr: # Used: http://example.com/robots.txt
skai.gr: # Ignored: http://example.com/site/robots.txt
skai.gr: #
skai.gr: # For more information about the robots.txt standard, see:
skai.gr: # http://www.robotstxt.org/robotstxt.html
skai.gr: #
skai.gr: # CSS, JS, Images
skai.gr: # Directories
skai.gr: # Files
skai.gr: # Paths (clean URLs)
skai.gr: # Paths (no clean URLs)
ceneo.pl: #Disallow: /*clr$
ceneo.pl: #User-agent: ia_archiver
ceneo.pl: #Crawl-delay: 30
ceneo.pl: #User-agent: Slurp
ceneo.pl: #Crawl-delay: 30
ceneo.pl: #User-agent: Yandex
ceneo.pl: #Crawl-delay: 30
ceneo.pl: #Disallow: /Click/
ceneo.pl: #User-agent: NetSprint
ceneo.pl: #Crawl-delay: 120
ceneo.pl: #User-agent: Speedy
ceneo.pl: #Crawl-Delay: 120
ceneo.pl: #zablokowane
indiatyping.com: # If the Joomla site is installed within a folder such as at
indiatyping.com: # e.g. www.example.com/joomla/ the robots.txt file MUST be
indiatyping.com: # moved to the site root at e.g. www.example.com/robots.txt
indiatyping.com: # AND the joomla folder name MUST be prefixed to the disallowed
indiatyping.com: # path, e.g. the Disallow rule for the /administrator/ folder
indiatyping.com: # MUST be changed to read Disallow: /joomla/administrator/
indiatyping.com: #
indiatyping.com: # For more information about the robots.txt standard, see:
indiatyping.com: # http://www.robotstxt.org/orig.html
indiatyping.com: #
indiatyping.com: # For syntax checking, see:
indiatyping.com: # http://tool.motoricerca.info/robots-checker.phtml
getflywheel.com: # Default Flywheel robots file
orbitz.com: #
orbitz.com: # General bots
orbitz.com: #
orbitz.com: #hotel
orbitz.com: #flight
orbitz.com: #package
orbitz.com: #car
orbitz.com: #activities
orbitz.com: #cruise
orbitz.com: #other
orbitz.com: #
orbitz.com: # Google Ads
orbitz.com: #
orbitz.com: #
orbitz.com: #
orbitz.com: # Bing Ads
orbitz.com: #
orbitz.com: #
orbitz.com: # SemrushBot
orbitz.com: #
besoccer.com: # Google AdSense
besoccer.com: # Adsbot-Google
glassdoor.ca: # Canada
glassdoor.ca: # Greetings, human beings!,
glassdoor.ca: #
glassdoor.ca: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself.
glassdoor.ca: #
glassdoor.ca: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet, and help improve the way people everywhere find jobs?
glassdoor.ca: #
glassdoor.ca: # Run - don't crawl - to apply to join Glassdoor's SEO team here http://jobs.glassdoor.com
glassdoor.ca: #
glassdoor.ca: #
glassdoor.ca: #logging related
glassdoor.ca: # Blocking track urls (ACQ-2468)
glassdoor.ca: #Blocking non standard job view and job search URLs, and paginated job SERP URLs (TRFC-2831)
glassdoor.ca: # Blocking bots from crawling DoubleClick for Publisher and Google Analytics related URL's (which aren't real URL's)
glassdoor.ca: # TRFC-4037 Block page from being indexed
glassdoor.ca: #
glassdoor.ca: # Note that this file has the extension '.text' rather than the more-standard '.txt'
glassdoor.ca: # to keep it from being pre-compiled as a servlet. (*.txt files are precompiled, and
glassdoor.ca: # there doesn't seem to be a way to turn this off.)
glassdoor.ca: #
ekaie.com: #
ekaie.com: # robots.txt for Discuz! X3
ekaie.com: #
jining.com: #
jining.com: # robots.txt for Discuz! X3
jining.com: #
jisilu.cn: #
jisilu.cn: # robots.txt for WeCenter
jisilu.cn: #
ebay.es: ## BEGIN FILE ###
ebay.es: #
ebay.es: # allow-all
ebay.es: # DR
ebay.es: #
ebay.es: # The use of robots or other automated means to access the eBay site
ebay.es: # without the express permission of eBay is strictly prohibited.
ebay.es: # Notwithstanding the foregoing, eBay may permit automated access to
ebay.es: # access certain eBay pages but soley for the limited purpose of
ebay.es: # including content in publicly available search engines. Any other
ebay.es: # use of robots or failure to obey the robots exclusion standards set
ebay.es: # forth at <https://www.robotstxt.org/orig.html> is strictly
ebay.es: # prohibited.
ebay.es: #
ebay.es: # v10_ROW_Feb_2021
ebay.es: ### DIRECTIVES ###
ebay.es: # VIS Sitemaps
ebay.es: # PRP Sitemaps
ebay.es: # CLP Sitemaps
ebay.es: # BROWSE Sitemaps
ebay.es: ### END FILE ###
jusbrasil.com.br: # Disable search pagination
jusbrasil.com.br: ## Specific Artifact SERP
jusbrasil.com.br: # Lawsuits
jusbrasil.com.br: # Should be allowed
jusbrasil.com.br: # https://www.jusbrasil.com.br/processos/nome/45622652/francisco-costa-peixoto-guimaraes
jusbrasil.com.br: # https://www.jusbrasil.com.br/processos/nome/45622652/francisco-costa-peixoto-guimaraes/
jusbrasil.com.br: # Shouldn't be allowed:
jusbrasil.com.br: # https://www.jusbrasil.com.br/processos/nome/45622652/francisco-costa-peixoto-guimaraes/artigos
jusbrasil.com.br: # https://www.jusbrasil.com.br/processos/nome/45622652/francisco-costa-peixoto-guimaraes/artigos/
lucidchart.com: #
lucidchart.com: # robots.txt
lucidchart.com: #
lucidchart.com: # This file is to prevent the crawling and indexing of certain parts
lucidchart.com: # of your site by web crawlers and spiders run by sites like Yahoo!
lucidchart.com: # and Google. By telling these "robots" where not to go on your site,
lucidchart.com: # you save bandwidth and server resources.
lucidchart.com: #
lucidchart.com: # This file will be ignored unless it is at the root of your host:
lucidchart.com: # Used: http://example.com/robots.txt
lucidchart.com: # Ignored: http://example.com/site/robots.txt
lucidchart.com: #
lucidchart.com: # For more information about the robots.txt standard, see:
lucidchart.com: # http://www.robotstxt.org/wc/robots.html
lucidchart.com: #
lucidchart.com: # For syntax checking, see:
lucidchart.com: # http://www.sxw.org.uk/computing/robots/check.html
lucidchart.com: # Directories
lucidchart.com: # Paths (no clean URLs)
lucidchart.com: #####
lucidchart.com: # Drupal
lucidchart.com: #####
lucidchart.com: # Directories
lucidchart.com: # Allow some content from /pages/misc
lucidchart.com: # Files
lucidchart.com: # Paths (clean URLs)
lucidchart.com: # Paths (no clean URLs)
lucidchart.com: # Noindex i18n Pages
lucidchart.com: #####
lucidchart.com: # Code-Base
lucidchart.com: #
lucidchart.com: # The following URL's are defined in our routing files,
lucidchart.com: # but have no value for indexing. Several of them should
lucidchart.com: # definitely NOT be indexed.
lucidchart.com: #####
yuque.com: # If you would like to crawl Yuque contact us at support@yuque.com.
yuque.com: # We also provide an extensive API: https://yuque.com/yuque/developer
ratopati.com: # vestacp autogenerated robots.txt
vc.ru: # .-----------------------------------.
vc.ru: # ( –ò—Ç—é, —á—Ç–æ –ø–æ–∫–∞–∑–∞—Ç—å —Ä–æ–±–æ—Ç–∞–º, –ø–æ–¥–æ–¥–∏—Ç–µ )
vc.ru: # ,-----------------------------------'
vc.ru: # -'
vc.ru: # ,
vc.ru: # ,-. _,---._ __ / \
vc.ru: # / ) .-' `./ / \
vc.ru: # ( ( ,' `/ /|
vc.ru: # \ `-" \'\ / |
vc.ru: # `. , \ \ / |
vc.ru: # /`. ,'-`----Y |
vc.ru: # ( ; | '
vc.ru: # | ,-. ,-' | /
vc.ru: # | | ( | CMTT.RU | /
vc.ru: # ) | \ `.___________|/
vc.ru: # `--' `--'
almalnews.com: # vestacp autogenerated robots.txt
bolavip.com: # Bloqueo de bots
ysl.com: # Pages
ysl.com: # Product
ysl.com: #sitemaps
ycharts.com: # Disallow data URLs, allow everything else
grailed.com: # _____ _____ _____ _ ______ _____
grailed.com: # / ____|| __ \ /\ |_ _|| | | ____|| __ \
grailed.com: # | | __ | |__) | / \ | | | | | |__ | | | |
grailed.com: # | | |_ || _ / / /\ \ | | | | | __| | | | |
grailed.com: # | |__| || | \ \ / ____ \ _| |_ | |____ | |____ | |__| |
grailed.com: # \_____||_| \_\/_/ \_\|_____||______||______||_____/
grailed.com: #
grailed.com: # Hello Robot,
grailed.com: #
grailed.com: # Very nice to e-meet you. We've been waiting for you. There are some cookies
grailed.com: # next to the sitemap, if you're hungry of course.
grailed.com: #
grailed.com: # With love,
grailed.com: # Grailed
grailed.com: #
fubo.tv: # robotstxt.org
lanzous.com: #link{background: #0088ff;color: #fff;padding: 10px 30px;border-radius: 3px;text-decoration: none;display: block;width: 100px;margin: 30px auto;}
upenn.edu: #
upenn.edu: # robots.txt
upenn.edu: #
upenn.edu: # This file is to prevent the crawling and indexing of certain parts
upenn.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
upenn.edu: # and Google. By telling these "robots" where not to go on your site,
upenn.edu: # you save bandwidth and server resources.
upenn.edu: #
upenn.edu: # This file will be ignored unless it is at the root of your host:
upenn.edu: # Used: http://example.com/robots.txt
upenn.edu: # Ignored: http://example.com/site/robots.txt
upenn.edu: #
upenn.edu: # For more information about the robots.txt standard, see:
upenn.edu: # http://www.robotstxt.org/robotstxt.html
upenn.edu: # CSS, JS, Images
upenn.edu: # Directories
upenn.edu: # Files
upenn.edu: # Paths (clean URLs)
upenn.edu: # Paths (no clean URLs)
qhdsny.com: Binary file (standard input) matches
desjardins.com: # Allow all
datareportal.com: # Squarespace Robots Txt
viettishop.com: ## robots.txt for Magento Community and Enterprise
viettishop.com: ## GENERAL SETTINGS
viettishop.com: ## Enable robots.txt rules for all crawlers
viettishop.com: ## Crawl-delay parameter: number of seconds to wait between successive requests to the same server.
viettishop.com: ## Set a custom crawl rate if you're experiencing traffic problems with your server.
viettishop.com: # Crawl-delay: 30
viettishop.com: ## Vietti sitemap:
viettishop.com: ## DEVELOPMENT RELATED SETTINGS
viettishop.com: ## Do not crawl development files and folders: CVS, svn directories and dump files
viettishop.com: ## GENERAL MAGENTO SETTINGS
viettishop.com: ## Do not crawl Magento admin page
viettishop.com: ## Do not crawl common Magento technical folders
viettishop.com: ## Do not crawl common Magento files
viettishop.com: ## MAGENTO SEO IMPROVEMENTS
viettishop.com: ## Do not crawl sub category pages that are sorted or filtered.
viettishop.com: ## Do not crawl 2-nd home page copy (example.com/index.php/). Uncomment it only if you activated Magento SEO URLs.
viettishop.com: ## Disallow: /index.php/
viettishop.com: ## Do not crawl links with session IDs
viettishop.com: ## Do not crawl checkout and user account pages
viettishop.com: ## Do not crawl seach pages and not-SEO optimized catalog links
viettishop.com: ## SERVER SETTINGS
viettishop.com: ## Do not crawl common server technical folders and files
viettishop.com: ## IMAGE CRAWLERS SETTINGS
viettishop.com: ## Extra: Uncomment if you do not wish Google and Bing to index your images
viettishop.com: # User-agent: Googlebot-Image
viettishop.com: # Disallow: /
viettishop.com: # User-agent: msnbot-media
viettishop.com: # Disallow: /
mangools.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
gradeup.co: # Block all user pages, hindi pages
gradeup.co: # Block urls with query params
gradeup.co: #Disallow: /user/*
gradeup.co: #Disallow: /hindi/*
gradeup.co: #Disallow: /post-i-*
gradeup.co: #Disallow: /query-i-*
gradeup.co: #Disallow: /shared-info-i-*
gradeup.co: #Disallow: /mcq-i-*
gradeup.co: #Sitemap: https://s3.amazonaws.com/sitemaps-gradeup/sitemap_index.xml
gradeup.co: #sitemap
standardbank.co.za: # robots.txt for Sites
standardbank.co.za: # Do Not delete this file.
rutgers.edu: #
rutgers.edu: # robots.txt
rutgers.edu: #
rutgers.edu: # This file is to prevent the crawling and indexing of certain parts
rutgers.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
rutgers.edu: # and Google. By telling these "robots" where not to go on your site,
rutgers.edu: # you save bandwidth and server resources.
rutgers.edu: #
rutgers.edu: # This file will be ignored unless it is at the root of your host:
rutgers.edu: # Used: http://example.com/robots.txt
rutgers.edu: # Ignored: http://example.com/site/robots.txt
rutgers.edu: #
rutgers.edu: # For more information about the robots.txt standard, see:
rutgers.edu: # http://www.robotstxt.org/robotstxt.html
rutgers.edu: # CSS, JS, Images
rutgers.edu: # Directories
rutgers.edu: # Files
rutgers.edu: # Paths (clean URLs)
rutgers.edu: # Paths (no clean URLs)
smile.io: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
smile.io: #
smile.io: # To ban all spiders from the entire site uncomment the next two lines:
smile.io: # User-agent: *
smile.io: # Disallow: /
cebraspe.org.br: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
cebraspe.org.br: #content{margin:0 0 0 2%;position:relative;}
ilduomo.it: # robots.txt automatically generated by PrestaShop e-commerce open-source solution
ilduomo.it: # http://www.prestashop.com - http://www.prestashop.com/forums
ilduomo.it: # This file is to prevent the crawling and indexing of certain parts
ilduomo.it: # of your site by web crawlers and spiders run by sites like Yahoo!
ilduomo.it: # and Google. By telling these "robots" where not to go on your site,
ilduomo.it: # you save bandwidth and server resources.
ilduomo.it: # For more information about the robots.txt standard, see:
ilduomo.it: # http://www.robotstxt.org/robotstxt.html
ilduomo.it: # Allow Directives
ilduomo.it: # Private pages
ilduomo.it: # Directories
ilduomo.it: # Files
browserstack.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
browserstack.com: #
browserstack.com: # To ban all spiders from the entire site uncomment the next two lines:
cheezburger.com: # Moist
colg.cn: #
colg.cn: # robots.txt for Discuz! X3
colg.cn: #
colg.cn: #Disallow: /forum.php?mod=redirect*
colg.cn: #Disallow: /forum.php?mod=post*
semanticscholar.org: # We are a non-profit research institute. If you would like to collaborate with us,
semanticscholar.org: # please contact us at: ai2-info@allenai.org
semanticscholar.org: # Or check out our public API http://api.semanticscholar.org/
sme.sk: # www.robotstxt.org/
sme.sk: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
monash.edu: # www.monash.edu
monash.edu: # Added for Dey Alexander. Templates no be indexed. RK dec 2003
monash.edu: #Added for migration access issue 10/9/03 sms#
monash.edu: #Disallow: /library/ # Removed 21/11/2012 by DMa - Google needs to index library's site
monash.edu: #Disallow: /arts/ # removed on 20/11/19 - after domain change for arts
monash.edu: ##INS555156 HJiang, will be on 1st of Oct
monash.edu: # INC000001675021
monash.edu: # INC000001890005
monash.edu: # Disallow: /library/search
monash.edu: # INC000001918907
monash.edu: ##### Don't index web server statistics
monash.edu: ##### Don't index user disks - they should be accessed as ~username
monash.edu: #lls site moved to celts server
monash.edu: #Disallow: /lls/
monash.edu: #Disallow: /orientation/
monash.edu: #fixing issue with mrgs progress reports indexing
monash.edu: #HEAT 450520 - removing /alumni./assets/images/ from Google image search
monash.edu: #User-agent: Googlebot-Image
monash.edu: #for Gary Gopinathan REMEDY INC200916 by HM 21 Feb 2012
monash.edu: #for Rachel Zelada REMEDY INC378928 by DMa 14 Dec 2012
monash.edu: #added for Derek Brown REMEDY INC400190 - 30 Jan 2013 by HM
monash.edu: #REMEDY INC513114 - 1 Aug 2013 by DMa
monash.edu: #REMEDY INC542349 - 11 Nov by DMa
monash.edu: #REMEDY INC693434 - 15 May 2014 by DMa
monash.edu: #REMEDY INC842290 - 12/01/2015 by MathewR
monash.edu: #Squiz Zd 38741
monash.edu: # INC000002051888
monash.edu: # Don't index Internal news folder as requested by Internal communications team - 2 June 2017 - done by Shefali Joshi
monash.edu: # MS-81 - Move Study to monash.edu domain - 19 Aug 2017 by dcook
monash.edu: # SDVIC-607 - Prevent crawl of old majors
monash.edu: #added for Fiona McQueen by SMC digital team 09/18
monash.edu: # INC000001972784 added for Greg McKeown by HDo 14 Sep 2018
monash.edu: # INC000002074170
monash.edu: # Disallow MUMA's Design Files
monash.edu: # SDVIC-4380 Prevent crawling of Funnelback search queries or facets.
monash.edu: # ### OTHER SETTINGS ###
monash.edu: # INC000002334371 - Disallow MADA's /artdes/ paths
monash.edu: #added by Simiao Luo from SMC, requested by Harriosn Gist, 11/12/2019#
monash.edu: #added by Wilson for Jenny Legg INC000002497957
monash.edu: #added by Angelene Wong, 22/10/2020
monash.edu: #added by Angelene Wong, 27/10/2020
ylsw.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
ylsw.com: #content{margin:0 0 0 2%;position:relative;}
in.gov: # robots.txt for http://www.IN.gov/
bol.com: # Sitemap
bol.com: # SEO-3529
bol.com: # Shop
bol.com: # SEB-1632
bol.com: # SEB-1013
bol.com: # SEB-2874
bol.com: # SEB-2131
bol.com: # SEB-2022
bol.com: # Excluding double list and category pages created through CMS
bol.com: # Excluding links to review tools
bol.com: # Excluding links to reponsive review, q&a forms and ajax calls supporting them
bol.com: # Excluding all /catalog/ urls (link to e-mail a product, recommendations, compare, tab-content, etc.)
bol.com: # Excluding non-relevant ATG links
bol.com: # Excluding non-relevant brand pages
bol.com: # Excluding non-relevant prijsoverzicht urls
bol.com: # Excluding error urls
bol.com: # Track and trace page
bol.com: # SEB-1294
bol.com: # SEB-1822
bol.com: # SEB-1574
bol.com: # Partner Service
bol.com: # MAR-2754
bol.com: # SEB-1916
bol.com: # SEB-1574
anthem.com: # Below two items are to exclude bad URLs from Google Bot as of 6/2014
anthem.com: #microsites
anthem.com: # bcbsga.com
anthem.com: # unicare.com
anthem.com: # Internal Search Bots
officeworks.com.au: # Non-seo URL
officeworks.com.au: # Customer specific urls
officeworks.com.au: # Excluded seo URLs
officeworks.com.au: # Old mobile & desktop urls
officeworks.com.au: # Old campaigns
officeworks.com.au: # business terms temporary
officeworks.com.au: # Sitemap
cnn.gr: # If the Joomla site is installed within a folder such as at
cnn.gr: # e.g. www.example.com/joomla/ the robots.txt file MUST be
cnn.gr: # moved to the site root at e.g. www.example.com/robots.txt
cnn.gr: # AND the joomla folder name MUST be prefixed to the disallowed
cnn.gr: # path, e.g. the Disallow rule for the /administrator/ folder
cnn.gr: # MUST be changed to read Disallow: /joomla/administrator/
cnn.gr: #
cnn.gr: # For more information about the robots.txt standard, see:
cnn.gr: # http://www.robotstxt.org/orig.html
cnn.gr: #
cnn.gr: # For syntax checking, see:
cnn.gr: # http://tool.motoricerca.info/robots-checker.phtml
ryerson.ca: # /robots.txt file for http://ryerson.ca/
ryerson.ca: # mail webmaster@ryerson.ca
support.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
support.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details.
support.wordpress.com: # This file was generated on Thu, 28 Jan 2021 13:33:58 +0000
google.dk: # AdsBot
google.dk: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
belgium.be: #
belgium.be: # robots.txt
belgium.be: #
belgium.be: # This file is to prevent the crawling and indexing of certain parts
belgium.be: # of your site by web crawlers and spiders run by sites like Yahoo!
belgium.be: # and Google. By telling these "robots" where not to go on your site,
belgium.be: # you save bandwidth and server resources.
belgium.be: #
belgium.be: # This file will be ignored unless it is at the root of your host:
belgium.be: # Used: http://example.com/robots.txt
belgium.be: # Ignored: http://example.com/site/robots.txt
belgium.be: #
belgium.be: # For more information about the robots.txt standard, see:
belgium.be: # http://www.robotstxt.org/robotstxt.html
belgium.be: # CSS, JS, Images
belgium.be: # Directories
belgium.be: # Files
belgium.be: # Paths (clean URLs)
belgium.be: # Paths (no clean URLs)
brightedge.com: # CSS, JS, Images
brightedge.com: # Directories
brightedge.com: # Files
brightedge.com: # Paths (clean URLs)
brightedge.com: # Paths (no clean URLs)
brightedge.com: # Paths (thank you pages)
ixl.com: # -----------------------------------------------------------------------------
ixl.com: #
ixl.com: # Areas that search robots should avoid
ixl.com: # (c) 2011 IXL Learning. All rights reserved.
ixl.com: #
ixl.com: # created by jkent on 8 Mar 2002
ixl.com: #
ixl.com: # Site-friendly search robots use this file to determine where _not_
ixl.com: # to go. Some URL spaces are simply counterproductive.
ixl.com: # -----------------------------------------------------------------------------
musixmatch.com: # Allow only major search spiders
musixmatch.com: # Block all other spiders
musixmatch.com: # Block Directories for all spiders
facebookblueprint.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
facebookblueprint.com: # Required for activities shared to Twitter, see https://dev.twitter.com/cards/getting-started "URL Crawling & Caching"
jomashop.com: # Directories
jomashop.com: # Session ID
restream.io: # Those landings are used by the marketing to track email campaigns
tasteofhome.com: # This virtual robots.txt file was created by the Virtual Robots.txt WordPress plugin: https://www.wordpress.org/plugins/pc-robotstxt/
proprofs.com: # Sitemap files
centrum24.pl: #error_title > div {
express.co.uk: #170820-DXD-6728
nolo.com: #Crawl-delay: 10
nolo.com: # Directories
nolo.com: # Nolo urls:
nolo.com: # Ehub Paths
nolo.com: # Paths (clean URLs)
nolo.com: # Paths (no clean URLs)
nolo.com: # NCMS
denverpost.com: # Sitemap archive
36kr.com: # robots.txt
aetna.com: # robots.txt for http://www.aetna.com
aetna.com: #
aetna.com: # Owner - Aetna.com User Interface Design and Development Team / AIS ADS Web Services
aetna.com: #
aetna.com: # List of Orphan urls - not linked from site - not part of search:
aetna.com: # http://www.aetna.com/employer/AetnaLink/
aetna.com: # http://www.aetna.com/producer/marsh_broker.html
aetna.com: #
aetna.com: # http://www.aetna.com/TIBRF.html
aetna.com: # http://www.aetna.com/about/AetnaHealthFund/
aetna.com: # http://www.aetna.com/about/AetnaHealthFund/before_fund_deductible/
aetna.com: # http://www.aetna.com/about/MemberRights/
aetna.com: # http://www.aetna.com/about/pdf/draft_privacy_notice.pdf
aetna.com: # http://www.aetna.com/about/pdf/Aetna_MCP.pdf
aetna.com: # http://www.aetna.com/about/dolregs.html
aetna.com: #
aetna.com: # http://www.aetna.com/help/logo/index.html
aetna.com: # http://www.aetna.com/info/nextel.html
aetna.com: # http://www.aetna.com/info/citibusiness.html
aetna.com: #
aetna.com: # http://www.aetna.com/provider/eob/
aetna.com: #
aetna.com: #
aetna.com: # keep these allows out of all main catalogs
aetna.com: # Allow: /inyourstate/employer.html
aetna.com: # Allow: /inyourstate/member.html
aetna.com: # Allow: /inyourstate/producer.html
ucsd.edu: # Block all google tag manager tracking links
khabarfarsi.com: #
khabarfarsi.com: # robots.txt
khabarfarsi.com: #
khabarfarsi.com: # This file is to prevent the crawling and indexing of certain parts
khabarfarsi.com: # of your site by web crawlers and spiders run by sites like Yahoo!
khabarfarsi.com: # and Google. By telling these "robots" where not to go on your site,
khabarfarsi.com: # you save bandwidth and server resources.
khabarfarsi.com: #
khabarfarsi.com: # This file will be ignored unless it is at the root of your host:
khabarfarsi.com: # Used: http://example.com/robots.txt
khabarfarsi.com: # Ignored: http://example.com/site/robots.txt
khabarfarsi.com: #
khabarfarsi.com: # For more information about the robots.txt standard, see:
khabarfarsi.com: # http://www.robotstxt.org/wc/robots.html
khabarfarsi.com: #
khabarfarsi.com: # For syntax checking, see:
khabarfarsi.com: # http://www.sxw.org.uk/computing/robots/check.html
khabarfarsi.com: # Files
khabarfarsi.com: # Paths (clean URLs)
khabarfarsi.com: # Paths (no clean URLs)
pluto.tv: # Hidden channels
ballotpedia.org: #Disallow: /wiki/skins/
ballotpedia.org: #Disallow: /wiki/index.php/
ballotpedia.org: #Crawl-delay: 5
ballotpedia.org: #Request-rate: 1/5 # maximum rate is one page every 5 seconds
ballotpedia.org: #Visit-time: 0600-0845 # only visit between 06:00 and 08:45 UTC (GMT)
ballotpedia.org: #User-agent: Slurp
ballotpedia.org: #Disallow: /
ballotpedia.org: #--------------------------
google.co.ma: # AdsBot
google.co.ma: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
diariopanorama.com: # robots.txt for http://www.elliberal.com.ar
diariopanorama.com: # Last modified: 2014-12-30 T15:00:00 -0300
techacademy.jp: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
techacademy.jp: #
techacademy.jp: # To ban all spiders from the entire site uncomment the next two lines:
zoho.eu: # ------------------------------------------
zoho.eu: # ZOHO Corp. -- http://www.zoho.com
zoho.eu: # Robot Exclusion File -- robots.txt
zoho.eu: # Author: Zoho Creative
zoho.eu: # Last Updated: 24/12/2020
zoho.eu: # ------------------------------------------
zoho.eu: # unwanted list taken from zoho search list
zoho.eu: # unwanted list taken from zoho search list
zoho.eu: # unwanted list taken from zoho search for zoholics
zoho.eu: # unwanted list taken from zoho search for zoho
bu.edu: # Directions for robots. See this URL:
bu.edu: # http://info.webcrawler.com/mak/projects/robots/norobots.html
bu.edu: # for a description of the file format.
bu.edu: # 2008-08-21
bu.edu: #####
bu.edu: # Here is where we override the default action
bu.edu: ## Due to a bug in linklint, must first specify a disallow in order for
bu.edu: ## for all other directories to be allowed. Feel free to add other
bu.edu: ## disallows below the first disallow line.
bu.edu: #####
bu.edu: # Allow W3C link Validator for /dev/ and /nisdev/
bu.edu: # skipping other dynamic content or private areas
bu.edu: # 2004-08-27 gaudette
bu.edu: #
bu.edu: #####
bu.edu: # default action - currently it allows access to most of the site
bu.edu: # skipping dynamic content or private areas
bu.edu: #
bu.edu: # BUniverse exclusions added by kgrin on 2010-04-26
bu.edu: ###
bu.edu: # Emergency change 2012-02-14 bfenster, in response to incident
bu.edu: ###
bu.edu: # Emergency change 2014-11-17 bfenster, in response to incident
bu.edu: #####
bu.edu: # default action - currently it allows access to most of the site
bu.edu: # skipping dynamic content or private areas
bu.edu: #
bu.edu: # BUniverse exclusions added by kgrin on 2010-04-21
bu.edu: # academics/summer archive exclusions added by kgrin on 2011-07-17
local.com: #robots.txt for all our sites
kartable.fr: # www.robotstxt.org/
kartable.fr: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
kartable.fr: # https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
kartable.fr: # old site urls
bookdepository.com: # START: Temporarily SEO Experiment; Ticket: WEBOPS-1445
bookdepository.com: # END: Temporarily SEO Experiment
bookdepository.com: # Semrushbot does not have disallow rule types implemented - WEBOPS-2925
macrotrends.net: #
macrotrends.net: # robots.txt for https://www.macrotrends.net
macrotrends.net: #
macrotrends.net: # Allow MOZ to crawl the site
macrotrends.net: # advertising-related bots:
macrotrends.net: # Wikipedia work bots:
macrotrends.net: # Crawlers that are kind enough to obey, but which we'd rather not have
macrotrends.net: # unless they're feeding search engines.
macrotrends.net: # Some bots are known to be trouble, particularly those designed to copy
macrotrends.net: # entire sites. Please obey robots.txt.
macrotrends.net: # Misbehaving: requests much too fast:
macrotrends.net: #
macrotrends.net: # Sorry, wget in its recursive mode is a frequent problem.
macrotrends.net: # Please read the man page and use it properly; there is a
macrotrends.net: # --wait option you can use to set the delay between hits,
macrotrends.net: # for instance.
macrotrends.net: #
macrotrends.net: #
macrotrends.net: # The 'grub' distributed client has been *very* poorly behaved.
macrotrends.net: #
macrotrends.net: #
macrotrends.net: # Doesn't follow robots.txt anyway, but...
macrotrends.net: #
macrotrends.net: #
macrotrends.net: # Hits many times per second, not acceptable
macrotrends.net: # http://www.nameprotect.com/botinfo.html
macrotrends.net: # A capture bot, downloads gazillions of pages with no public benefit
macrotrends.net: # http://www.webreaper.net/
bell.ca: #Disallow: /Business/Mobility
bell.ca: #Disallow: /Entreprise/Mobilite
bell.ca: # Sitemap files
boohooman.com: # Pages
boohooman.com: # Product Filter #
boohooman.com: # Ordering & Product per page #
boohooman.com: # Number of product per page
boohooman.com: # Order By
boohooman.com: # Price
boohooman.com: # Faceted Navigation #
boohooman.com: # UK & ALL Search #
boohooman.com: # EU Search #
boohooman.com: # Search #
boohooman.com: # Ensure no Static Ressources is blocked #
boohooman.com: # Crawl Delay - 5 URL max per second
baseball-reference.com: # Disallow all robots on the sandbox for now.
baseball-reference.com: # Allow only specific directories
baseball-reference.com: # talk to me if you want us to unblock this. $$$$$
baseball-reference.com: # tris
canon.com: # robots.txt for http://www.canon.com/
vodafone.de: # robots.txt for www.vodafone.de 05.11.2020
vodafone.de: # Sitemap
siemens.com: #
gumtree.co.za: #Sitemaps
gumtree.co.za: #Sorting parameters
gumtree.co.za: #Other comments:
gumtree.co.za: #Sorting parameters
gumtree.co.za: #Other comments:
gumtree.co.za: #Sorting parameters
gumtree.co.za: #Other comments:
gumtree.co.za: #Sorting parameters
gumtree.co.za: #Other comments:
google.tn: # AdsBot
google.tn: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
iaai.com: #Sitemap: https://www.iaai.com/sitemap.xml
signifyd.com: # Permanent redirects - Added 10-17-18 - modified 08-18-19
signifyd.com: # Added 11-21-18
signifyd.com: # Added 10-18-18
signifyd.com: # Added 08-18-19
signifyd.com: # Added 09-06-19
signifyd.com: # Added 11-16-19
signifyd.com: # Added 12-5-19
signifyd.com: # Sitemaps
shangxueba.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
shangxueba.com: #content{margin:0 0 0 2%;position:relative;}
skyeng.ru: #Sitemap: https://skyeng.ru/sitemap/sitemap-videos.xml
skyeng.ru: #Sitemap: https://skyeng.ru/sitemap/sitemap-videos.xml
skyeng.ru: #Sitemap: https://skyeng.ru/sitemap/sitemap-videos-for-yandex.xml
fontspace.com: # Sitemap
discover.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
discover.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details.
discover.wordpress.com: # This file was generated on Wed, 24 Feb 2021 20:13:01 +0000
usaddress.com: # robots.txt - 15/12/2016
eventbrite.co.uk: # http://www.google.co.uk/adsbot.html - AdsBot ignores * wildcard
dicionarioinformal.com.br: # fill the form contact on dicionarioinformal.com.br/contato.php for constructive criticism
surfline.com: # Robots.txt file for https://www.surfline.com
surfline.com: # Wikipedia work bots:
blick.ch: #Special Areas of the Page
blick.ch: #Special parameters
blick.ch: #Special File Endings:
blick.ch: #Bots which make unnecessary 10% of our (non search) bot traffic
blick.ch: #Sitemap URLs
codal.ir: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
codal.ir: #content{margin:0 0 0 2%;position:relative;}
club-k.net: # If the Joomla site is installed within a folder
club-k.net: # eg www.example.com/joomla/ then the robots.txt file
club-k.net: # MUST be moved to the site root
club-k.net: # eg www.example.com/robots.txt
club-k.net: # AND the joomla folder name MUST be prefixed to all of the
club-k.net: # paths.
club-k.net: # eg the Disallow rule for the /administrator/ folder MUST
club-k.net: # be changed to read
club-k.net: # Disallow: /joomla/administrator/
club-k.net: #
club-k.net: # For more information about the robots.txt standard, see:
club-k.net: # http://www.robotstxt.org/orig.html
club-k.net: #
club-k.net: # For syntax checking, see:
club-k.net: # http://tool.motoricerca.info/robots-checker.phtml
google.com.uy: # AdsBot
google.com.uy: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
sciencemag.org: #
sciencemag.org: # robots.txt
sciencemag.org: #
sciencemag.org: # This file is to prevent the crawling and indexing of certain parts
sciencemag.org: # of your site by web crawlers and spiders run by sites like Yahoo!
sciencemag.org: # and Google. By telling these "robots" where not to go on your site,
sciencemag.org: # you save bandwidth and server resources.
sciencemag.org: #
sciencemag.org: # This file will be ignored unless it is at the root of your host:
sciencemag.org: # Used: http://example.com/robots.txt
sciencemag.org: # Ignored: http://example.com/site/robots.txt
sciencemag.org: #
sciencemag.org: # For more information about the robots.txt standard, see:
sciencemag.org: # http://www.robotstxt.org/robotstxt.html
sciencemag.org: # CSS, JS, Images
sciencemag.org: # Directories
sciencemag.org: # Files
sciencemag.org: # Paths (clean URLs)
sciencemag.org: # Paths (no clean URLs)
medicinenet.com: #
medicinenet.com: # robots.txt for MedicineNet, Inc. Properties
medicinenet.com: #
society6.com: # Robots.txt file for https://society6.com
society6.com: # November 14, 2017
comcast.com: # Comcast
comcast.com: # robots.txt for www.comcast.com
comcast.com: # Modified on 1/25/17
ipaddress.com: #Disallow: /jstream/
ipaddress.com: #Disallow: /vote/
cuelinks.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
cuelinks.com: #
cuelinks.com: # To ban all spiders from the entire site uncomment the next two lines:
airbnb.co.in: # ///////
airbnb.co.in: # // //
airbnb.co.in: # // //
airbnb.co.in: # // // /// /// ///
airbnb.co.in: # // // /// ///
airbnb.co.in: # // /// // //// /// /// /// //// /// //// /// //// /// ////
airbnb.co.in: # // /// /// // ////////// /// ////////// /////////// ////////// ///////////
airbnb.co.in: # // // // // /// /// /// /// /// /// /// /// /// ///
airbnb.co.in: # // // // // /// /// /// /// /// /// /// /// /// ///
airbnb.co.in: # // // // // /// /// /// /// /// /// /// /// /// ///
airbnb.co.in: # // // // // ////////// /// /// ////////// /// /// //////////
airbnb.co.in: # // ///// //
airbnb.co.in: # // ///// //
airbnb.co.in: # // /// /// //
airbnb.co.in: # ////// //////
airbnb.co.in: #
airbnb.co.in: #
airbnb.co.in: # We thought you'd never make it!
airbnb.co.in: # We hope you feel right at home in this file...unless you're a disallowed subfolder.
airbnb.co.in: # And since you're here, read up on our culture and team: https://www.airbnb.com/careers/departments/engineering
airbnb.co.in: # There's even a bring your robot to work day.
starbucks.com: # Slow an overly aggressive MJ12bot from the UK
faz.net: # robots.txt updated 2018-12-13
goethe.de: #robots.txt for http://www.goethe.de/
virginmedia.com: #This message has been scanned for viruses
maisonmargiela.com: # Disallow tricombot.
cjol.com: # robots.txt for http://cjol.com/
olx.co.id: #Base Filters
olx.co.id: #Cars Filters
olx.co.id: #RE Filters
olx.co.id: # Generated on 2019-12-12T23:22:18.976Z
tripadvisor.com.br: # Hi there,
tripadvisor.com.br: #
tripadvisor.com.br: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself.
tripadvisor.com.br: #
tripadvisor.com.br: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet?
tripadvisor.com.br: #
tripadvisor.com.br: # Run - don't crawl - to apply to join TripAdvisor's elite SEO team
tripadvisor.com.br: #
tripadvisor.com.br: # Email seoRockstar@tripadvisor.com
tripadvisor.com.br: #
tripadvisor.com.br: # Or visit https://careers.tripadvisor.com/search-results?keywords=seo
tripadvisor.com.br: #
tripadvisor.com.br: #
ucas.com: #
ucas.com: # robots.txt
ucas.com: #
ucas.com: # This file is to prevent the crawling and indexing of certain parts
ucas.com: # of your site by web crawlers and spiders run by sites like Yahoo!
ucas.com: # and Google. By telling these "robots" where not to go on your site,
ucas.com: # you save bandwidth and server resources.
ucas.com: #
ucas.com: # This file will be ignored unless it is at the root of your host:
ucas.com: # Used: http://example.com/robots.txt
ucas.com: # Ignored: http://example.com/site/robots.txt
ucas.com: #
ucas.com: # For more information about the robots.txt standard, see:
ucas.com: # http://www.robotstxt.org/robotstxt.html
ucas.com: # CSS, JS, Images
ucas.com: # Directories
ucas.com: # Files
ucas.com: # Paths (clean URLs)
ucas.com: # Paths (no clean URLs)
nextdoor.com: # Twitter specifies this format here https://dev.twitter.com/cards/getting-started#crawling
lexpress.mu: #
lexpress.mu: # robots.txt
lexpress.mu: #
lexpress.mu: # This file is to prevent the crawling and indexing of certain parts
lexpress.mu: # of your site by web crawlers and spiders run by sites like Yahoo!
lexpress.mu: # and Google. By telling these "robots" where not to go on your site,
lexpress.mu: # you save bandwidth and server resources.
lexpress.mu: #
lexpress.mu: # This file will be ignored unless it is at the root of your host:
lexpress.mu: # Used: http://example.com/robots.txt
lexpress.mu: # Ignored: http://example.com/site/robots.txt
lexpress.mu: #
lexpress.mu: # For more information about the robots.txt standard, see:
lexpress.mu: # http://www.robotstxt.org/robotstxt.html
lexpress.mu: # CSS, JS, Images
lexpress.mu: # Directories
lexpress.mu: # Files
lexpress.mu: # Paths (clean URLs)
lexpress.mu: # Paths (no clean URLs)
adobelogin.com: # The use of robots or other automated means to access the Adobe site
adobelogin.com: # without the express permission of Adobe is strictly prohibited.
adobelogin.com: # Notwithstanding the foregoing, Adobe may permit automated access to
adobelogin.com: # access certain Adobe pages but solely for the limited purpose of
adobelogin.com: # including content in publicly available search engines. Any other
adobelogin.com: # use of robots or failure to obey the robots exclusion standards set
adobelogin.com: # forth at http://www.robotstxt.org/ is strictly prohibited.
adobelogin.com: # Details about Googlebot available at: http://www.google.com/bot.html
adobelogin.com: # The Google search engine can see everything
adobelogin.com: # The Omniture search engine can see everything
adobelogin.com: # XML sitemaps updates per SH10272020
adobelogin.com: # XML sitemaps updates per BW10202020
adobelogin.com: # Hreflang sitemap
adobelogin.com: # Hreflang sitemap updates per SH10122020
adobelogin.com: # PSFl individual sitemaps HS07082020
agora.io: #mega-menu-wrap-primary #mega-menu-primary &gt; li.mega-menu-item.flrigt.btn &gt; a:hover {
agora.io: #mega-menu-item-23839.mobile-btn a.mega-menu-link {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu &gt; li.mega-mobile-btn {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu &gt; .mega-menu-item.mega-mobile-btn {
agora.io: #mega-menu-wrap-primary #mega-menu-primary &gt; li.mega-menu-flyout ul.mega-sub-menu li.mega-menu-item a.mega-menu-link sup {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu &gt; li.mega-menu-item.mega-toggle-on &gt; a.mega-menu-link,
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu &gt; li.mega-menu-item &gt; a.mega-menu-link:hover {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-megamenu &gt; ul.mega-sub-menu &gt; li.mega-menu-row &gt; ul.mega-sub-menu &gt; li.mega-menu-columns-5-of-12,
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-megamenu &gt; ul.mega-sub-menu &gt; li.mega-menu-row &gt; ul.mega-sub-menu &gt; li.mega-menu-columns-6-of-12,
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-megamenu &gt; ul.mega-sub-menu &gt; li.mega-menu-row &gt; ul.mega-sub-menu &gt; li.mega-menu-columns-7-of-12 {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu &gt; li.mega-menu-item &gt; a.mega-menu-link {
agora.io: #menu-1 {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-item.mega-menu-item.advantage-button {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-item.mega-menu-item.advantage-button > a.mega-menu-link {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-item.mega-menu-item.advantage-button > a.mega-menu-link:hover {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu li.mega-menu-item.advantage-button > a.mega-menu-link {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu li.mega-menu-item.advantage-button > a.mega-menu-link:hover {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item.flrigt.btn > a:hover {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-mobile-btn {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a.mega-menu-link {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li a
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu .mobile-btn > a.mega-menu-link
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item.flrigt {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item.flrigt.btn > a {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu li > a {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu:after { content: "";
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item > a.mega-menu-link:hover {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu li.mega-menu-item a.mega-menu-link {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu li.mega-menu-item a.mega-menu-link:hover, #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu li.mega-menu-item a.mega-menu-link:focus {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout.mega-toggle-on ul.mega-sub-menu {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-flyout ul.mega-sub-menu {
agora.io: #footer-widgets h3 {
agora.io: #footer-widgets a {
agora.io: #footer-widgets ul {
agora.io: #column-one a {
agora.io: #column-one {
agora.io: #footer-widgets .elementor-image {
agora.io: #footer-widgets .elementor-image img {
agora.io: #social-icon-container {
agora.io: #mega-menu-wrap-primary #mega-menu-primary {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu a.mega-menu-link .mega-description-group .mega-menu-description {
agora.io: #mega-menu-item-22860.mobile-btn a.mega-menu-link {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-megamenu > ul.mega-sub-menu > li.mega-menu-item, #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-megamenu > ul.mega-sub-menu li.mega-menu-column > ul.mega-sub-menu > li.mega-menu-item {
agora.io: #mega-menu-wrap-primary #mega-menu-primary[data-effect="fade_up"] li.mega-menu-item.mega-menu-megamenu > ul.mega-sub-menu, #mega-menu-wrap-primary #mega-menu-primary[data-effect="fade_up"] li.mega-menu-item.mega-menu-flyout ul.mega-sub-menu {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li#mega-menu-item-5732 ul#menu-customer-stories-mega-menu-1 li {
agora.io: #mega-menu-primary > li.mega-menu-item > a.mega-menu-link,
agora.io: #mega-menu-primary > li.nav-btn-signup > a.mega-menu-link:hover {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item.nav-btn-signup > a.mega-menu-link {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item.nav-btn-signup > a.mega-menu-link:hover {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item > a.mega-menu-link {
agora.io: #mega-menu-wrap-primary #mega-menu-primary > li.mega-menu-item > a.mega-menu-link:hover {
agora.io: #team-thumb img {
agora.io: #team-thumb img.active {
agora.io: #heading-bluebox:after {
agora.io: #heading-graybox:before {
agora.io: #heading-graybox:after {
agora.io: #game-box-shadow {
agora.io: #client-slider.tabs > div {
agora.io: #client-slider.tabs > div span {
agora.io: #client-slider.tabs ul.horizontal {
agora.io: #client-slider.tabs li {
agora.io: #client-slider.tabs li img {
agora.io: #client-slider.tabs a {
agora.io: #client-slider.tabs li:hover,
agora.io: #client-slider.tabs li.active {
agora.io: #client-slider.tabs .prev,
agora.io: #client-slider.tabs .next {
agora.io: #client-slider.tabs .next {
agora.io: #client-slider.tabs .prev:focus,
agora.io: #client-slider.tabs .next:focus {
agora.io: #benefit-center-image {
agora.io: #benefit-featured-one:after,
agora.io: #benefit-featured-two:after,
agora.io: #benefit-featured-three:after,
agora.io: #benefit-featured-four:after {
agora.io: #benefit-featured-one:after {
agora.io: #benefit-featured-two:after {
agora.io: #benefit-featured-three:after {
agora.io: #benefit-featured-four:after {
agora.io: #original-audio,
agora.io: #agora-audio {
agora.io: #swiper-slide01 .button-primary {
agora.io: #scrollTop.show {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item.mega-toggle-on > a + ul.mega-sub-menu {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a:hover {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-item.mega-menu-megamenu ul.mega-sub-menu ul.mega-sub-menu {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item.mega-toggle-on > a.mega-menu-link {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item.mega-toggle-on > a.mega-menu-link span:after {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu li.mega-menu-item-has-children > a.mega-menu-link > span.mega-indicator{
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item.nav-btn-sales a {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item.nav-btn-signup a {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a.cst-html {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a.cst-html i.menu-featured-icon {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a.cst-html .menu-featured-content {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a.cst-html h3 {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a.cst-html p {
agora.io: #mega-menu-wrap-mobile_menu #mega-menu-mobile_menu > li.mega-menu-item > a + ul.mega-sub-menu li a {
archives-ouvertes.fr: # HAL robots.txt
archives-ouvertes.fr: # If you want to download lots of metadata, please use our API at https://api.archives-ouvertes.fr/
archives-ouvertes.fr: # The API is far more efficient for metadata harvesting
archives-ouvertes.fr: # To learn more, please contact hal-support@ccsd.cnrs.fr
archives-ouvertes.fr: # Sitemap
spectrum.com: # Allowed Paths
spectrum.com: # Excluded Pages
spectrum.com: # Excluded Tags
spectrum.com: # Excluded Paths
turbosquid.com: #
turbosquid.com: # robots.txt
turbosquid.com: #
turbosquid.com: # Excludes
turbosquid.com: # XML Sitemap
trafficfactory.biz: #raven-field-group-e22bac2, #raven-field-group-d8b42b7{
trafficfactory.biz: #raven-field-group-e22bac2, #raven-field-group-d8b42b7{
patagonia.com: # patagonia.com robots.txt
patagonia.com: #
patagonia.com: #
patagonia.com: #
patagonia.com: #
patagonia.com: # __ _______ .___________. ______ __ __ .______ .______ ______ .______ ______ .___________. _______.
patagonia.com: # | | | ____|| | / __ \ | | | | | _ \ | _ \ / __ \ | _ \ / __ \ | | / |
patagonia.com: # | | | |__ `---| |----` | | | | | | | | | |_) | | |_) | | | | | | |_) | | | | | `---| |----` | (----`
patagonia.com: # | | | __| | | | | | | | | | | | / | / | | | | | _ < | | | | | | \ \
patagonia.com: # | `----.| |____ | | | `--' | | `--' | | |\ \----. | |\ \----.| `--' | | |_) | | `--' | | | .----) |
patagonia.com: # |_______||_______| |__| \______/ \______/ | _| `._____| | _| `._____| \______/ |______/ \______/ |__| |_______/
patagonia.com: # _______ ______ _______. __ __ .______ _______ __ .__ __. _______
patagonia.com: # / _____| / __ \ / || | | | | _ \ | ____|| | | \ | | / _____|
patagonia.com: # | | __ | | | | | (----`| | | | | |_) | | |__ | | | \| | | | __
patagonia.com: # | | |_ | | | | | \ \ | | | | | / | __| | | | . ` | | | |_ |
patagonia.com: # | |__| | | `--' | .----) | | `--' | | |\ \----.| | | | | |\ | | |__| |
patagonia.com: # \______| \______/ |_______/ \______/ | _| `._____||__| |__| |__| \__| \______|
patagonia.com: #
patagonia.com: #
patagonia.com: #
patagonia.com: #
patagonia.com: #
slideteam.net: ## robots.txt for Magento Community and Enterprise
slideteam.net: ## GENERAL SETTINGS
slideteam.net: ## Enable robots.txt rules for selected crawlers
slideteam.net: ## Crawl-delay parameter: number of seconds to wait between successive requests to the same server.
slideteam.net: ## Set a custom crawl rate if you're experiencing traffic problems with your server.
slideteam.net: # Crawl-delay: 30
slideteam.net: ## Magento sitemap: uncomment and replace the URL to your Magento sitemap file
slideteam.net: # Sitemap: http://www.example.com/sitemap/sitemap.xml
slideteam.net: ## DEVELOPMENT RELATED SETTINGS
slideteam.net: ## Do not crawl development files and folders: CVS, svn directories and dump files
slideteam.net: ## GENERAL MAGENTO SETTINGS
slideteam.net: ## Do not crawl Magento admin page
slideteam.net: ## Do not crawl common Magento technical folders
slideteam.net: ## Do not crawl common Magento files
slideteam.net: ## MAGENTO2 disallowed URLs Begins
slideteam.net: ## MAGENTO2 disallowed URLs Ends
slideteam.net: ##SLI Configuration Begins
slideteam.net: ##SLI Configuration Ends
slideteam.net: ## MAGENTO SEO IMPROVEMENTS
slideteam.net: ## Do not crawl sub category pages that are sorted or filtered.
slideteam.net: #Disallow: /*?dir*
slideteam.net: #Disallow: /*?dir=desc
slideteam.net: #Disallow: /*?dir=asc
slideteam.net: #Disallow: /*?limit=all
slideteam.net: #Disallow: /*?mode*
slideteam.net: ## Do not crawl 2-nd home page copy (example.com/index.php/). Uncomment it only if you activated Magento SEO URLs.
slideteam.net: ## Disallow: /index.php/
slideteam.net: ## Do not crawl links with session IDs
slideteam.net: ## Do not crawl checkout and user account pages
slideteam.net: #Disallow: /onestepcheckout/
slideteam.net: #Disallow: /customer/
slideteam.net: #Disallow: /customer/account/
slideteam.net: #Disallow: /customer/account/login/
slideteam.net: ## Do not crawl seach pages and not-SEO optimized catalog links
slideteam.net: ## SERVER SETTINGS
slideteam.net: ## Do not crawl common server technical folders and files
slideteam.net: ## IMAGE CRAWLERS SETTINGS
slideteam.net: ## Extra: Uncomment if you do not wish Google and Bing to index your images
slideteam.net: # User-agent: Googlebot-Image
slideteam.net: # Disallow: /
slideteam.net: # User-agent: msnbot-media
slideteam.net: # Disallow: /
slideteam.net: ##FOR OTHER CRAWLERS DISALLOW ALL
digicenter.pt: # sitemap generated by the Jumpseller ecommerce platform
colorado.gov: # robots.txt for http://www.colorado.gov/
nbc.com: #
nbc.com: # robots.txt
nbc.com: #
nbc.com: # This file is to prevent the crawling and indexing of certain parts
nbc.com: # of your site by web crawlers and spiders run by sites like Yahoo!
nbc.com: # and Google. By telling these "robots" where not to go on your site,
nbc.com: # you save bandwidth and server resources.
nbc.com: #
nbc.com: # This file will be ignored unless it is at the root of your host:
nbc.com: # Used: http://example.com/robots.txt
nbc.com: # Ignored: http://example.com/site/robots.txt
nbc.com: #
nbc.com: # For more information about the robots.txt standard, see:
nbc.com: # http://www.robotstxt.org/robotstxt.html
nbc.com: # Directories
nbc.com: # Files
nbc.com: # Paths (clean URLs)
nbc.com: # Paths (no clean URLs)
nbc.com: # Disallow users paths
nbc.com: # USA - Shows
nbc.com: # USA - Movies
nbc.com: # Sitemap details.
nbc.com: # Sitemap for the Google PlayGuide.
anz.com.au: # /robots.txt for http://www.anz.com/
anz.com.au: # comments to InternetAdministration@anz.com
anz.com.au: #
collegedekho.com: # robots.txt for https://www.collegedekho.com/
collegedekho.com: #Study Abroad Url Start
collegedekho.com: #Study Abroad Url End
collegedekho.com: #User-agent: Screaming Frog SEO Spider
collegedekho.com: #Disallow: /
dealmoon.com: # disable 2019.02.27
dealmoon.com: # Disallow: /gift/
dealmoon.com: #sitemap
bitbucket.org: # Google Site Link Exclusions
37signals.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
hhs.gov: #
hhs.gov: # robots.txt
hhs.gov: #
hhs.gov: # This file is to prevent the crawling and indexing of certain parts
hhs.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
hhs.gov: # and Google. By telling these "robots" where not to go on your site,
hhs.gov: # you save bandwidth and server resources.
hhs.gov: #
hhs.gov: # This file will be ignored unless it is at the root of your host:
hhs.gov: # Used: http://example.com/robots.txt
hhs.gov: # Ignored: http://example.com/site/robots.txt
hhs.gov: #
hhs.gov: # For more information about the robots.txt standard, see:
hhs.gov: # http://www.robotstxt.org/robotstxt.html
hhs.gov: # CSS, JS, Images
hhs.gov: # Directories
hhs.gov: # Files
hhs.gov: # Paths (clean URLs)
hhs.gov: # Paths (no clean URLs)
helsinki.fi: #
helsinki.fi: # robots.txt
helsinki.fi: #
helsinki.fi: # This file is to prevent the crawling and indexing of certain parts
helsinki.fi: # of your site by web crawlers and spiders run by sites like Yahoo!
helsinki.fi: # and Google. By telling these "robots" where not to go on your site,
helsinki.fi: # you save bandwidth and server resources.
helsinki.fi: #
helsinki.fi: # This file will be ignored unless it is at the root of your host:
helsinki.fi: # Used: http://example.com/robots.txt
helsinki.fi: # Ignored: http://example.com/site/robots.txt
helsinki.fi: #
helsinki.fi: # For more information about the robots.txt standard, see:
helsinki.fi: # http://www.robotstxt.org/robotstxt.html
helsinki.fi: #
helsinki.fi: # For syntax checking, see:
helsinki.fi: # http://www.frobee.com/robots-txt-check
helsinki.fi: # Directories
helsinki.fi: # Files
helsinki.fi: # Paths (clean URLs)
helsinki.fi: # Paths (no clean URLs)
helsinki.fi: #vanha www.helsinki.fi
defimedia.info: #
defimedia.info: # robots.txt
defimedia.info: #
defimedia.info: # This file is to prevent the crawling and indexing of certain parts
defimedia.info: # of your site by web crawlers and spiders run by sites like Yahoo!
defimedia.info: # and Google. By telling these "robots" where not to go on your site,
defimedia.info: # you save bandwidth and server resources.
defimedia.info: #
defimedia.info: # This file will be ignored unless it is at the root of your host:
defimedia.info: # Used: http://example.com/robots.txt
defimedia.info: # Ignored: http://example.com/site/robots.txt
defimedia.info: #
defimedia.info: # For more information about the robots.txt standard, see:
defimedia.info: # http://www.robotstxt.org/robotstxt.html
defimedia.info: # CSS, JS, Images
defimedia.info: # Directories
defimedia.info: # Files
defimedia.info: # Paths (clean URLs)
defimedia.info: # Paths (no clean URLs)
marktplaats.nl: # Here is our sitemap (this line is independent of UA blocks, per the spec)
marktplaats.nl: #Please keep blocking of all URLs in place for at least 2 years after removing a specific module
marktplaats.nl: #internal URLs
marktplaats.nl: #SOI subpage
marktplaats.nl: # login, confirm and forgot password pages
marktplaats.nl: # mymp pages
marktplaats.nl: # ASQ pages
marktplaats.nl: # SYI Pages
marktplaats.nl: # Flagging/tipping ads
marktplaats.nl: # bidding on ads
marktplaats.nl: # external url redirects
marktplaats.nl: # google analytics
marktplaats.nl: #korean spam
marktplaats.nl: # widgets
marktplaats.nl: # prevent unnecessary crawling
marktplaats.nl: # New vip
marktplaats.nl: # Block VIPs with parameters
marktplaats.nl: #block homepage feeds
bit.ly: # Welcome to Bitly =)
bit.ly: # robots welcome;
bit.ly: # API documentation can be found at https://dev.bitly.com/
timeout.com: # robots.txt file for Time Out .com
timeout.com: # updated 14th May 2020
db.com: # Favicon /docroot/favicon.ico
vidnami.com: # Google Image
vidnami.com: # Google AdSense
vidnami.com: # digg mirror
vidnami.com: # global
asianetnews.com: # Robots Text for AN production portals
zaubacorp.com: #
zaubacorp.com: # robots.txt
zaubacorp.com: #
zaubacorp.com: # This file is to prevent the crawling and indexing of certain parts
zaubacorp.com: # of your site by web crawlers and spiders run by sites like Yahoo!
zaubacorp.com: # and Google. By telling these "robots" where not to go on your site,
zaubacorp.com: # you save bandwidth and server resources.
zaubacorp.com: #
zaubacorp.com: # This file will be ignored unless it is at the root of your host:
zaubacorp.com: # Used: http://example.com/robots.txt
zaubacorp.com: # Ignored: http://example.com/site/robots.txt
zaubacorp.com: #
zaubacorp.com: # For more information about the robots.txt standard, see:
zaubacorp.com: # http://www.robotstxt.org/robotstxt.html
zaubacorp.com: # CSS, JS, Images
zaubacorp.com: # Directories
zaubacorp.com: # Files
zaubacorp.com: # Paths (clean URLs)
zaubacorp.com: # Paths (no clean URLs)
abril.com.br: # Sitemap archive
ithome.com.tw: #div_cube3d_pic { width: 100% !important}
ithome.com.tw: #div_cube3d_exp {left:0 !important; position: relative !important; margin: 0 auto}
ithome.com.tw: #headerbar .container-fluid {max-width: 1350px !important}
ithome.com.tw: #block-block-41 {background-color:#f7f7f7}
ithome.com.tw: #block-block-41 .wrap {text-align: center; margin: 0 auto 14px;}
ithome.com.tw: #hpad930up {margin-bottom:15px !important}
rfi.fr: # France Medias Monde [2020-10-21] - francemediasmonde.com
rfi.fr: ## RFI - rfi.fr - HTTPS
rfi.fr: ### Sitemaps
rfi.fr: ### Sitemaps News
therealreal.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file.
therealreal.com: #
zameen.com: # Disallow crawling relative links for the most populous cities in Pakistan
zameen.com: # Disallow crawling relative links for other locations
zameen.com: # Spiders added on date 2014-06-17
zameen.com: # Baidu
zameen.com: # http://www.baidu.com/search/spider.html
zameen.com: # EasouSpider
zameen.com: # http://www.easou.com/search/spider.html
zameen.com: # Exabot
zameen.com: # http://www.exabot.com/go/robot
banimode.com: # robots.txt automaticaly generated by PrestaShop e-commerce open-source solution
banimode.com: # http://www.prestashop.com - http://www.prestashop.com/forums
banimode.com: # This file is to prevent the crawling and indexing of certain parts
banimode.com: # of your site by web crawlers and spiders run by sites like Yahoo!
banimode.com: # and Google. By telling these "robots" where not to go on your site,
banimode.com: # you save bandwidth and server resources.
banimode.com: # For more information about the robots.txt standard, see:
banimode.com: # http://www.robotstxt.org/robotstxt.html
banimode.com: # Allow Directives
banimode.com: # Private pages
banimode.com: # Directories
banimode.com: # Files
banimode.com: # Persian
banimode.com: # Disallow: /*discount
banimode.com: # Disallow query string
banimode.com: #Disallow: /*?*
banimode.com: # Disallow query string
banimode.com: #Disallow: /*?sort*
banimode.com: #SiteMap
job5156.com: # robots.txt for https://www.job5156.com/
motherless.com: # All other robots will spider the domain
motherless.com: # Don't let spiders report stuff
mediamarkt.es: # Disallow: /MultiChannelMA
mediamarkt.es: # Disallow: /MultiChannelLocal
mediamarkt.es: # Disallow: /MultiChannelSearch
mediamarkt.es: # Disallow: /MultiChannelRedirect
mediamarkt.es: # Disallow: /MultiChannelMARegister
mediamarkt.es: # Disallow: /MultiChannelMyStoreEvents
mediamarkt.es: # Disallow: /MultiChannelAutoCompletion
mediamarkt.es: # Disallow: /MultiChannelCatalogEntryPrint
mediamarkt.es: # Disallow: /MultiChannelPrintCompProducts
mediamarkt.es: # Disallow: /MultiChannelMyStoreAdvertising
mediamarkt.es: # Disallow: /MultiChannelMyStoreSpecialitems
mediamarkt.es: # Disallow: /MultiMultiChannelMAWishlistPrint
mediamarkt.es: # Disallow: /mediapedia
mediamarkt.es: # Disallow: /error404.html
mediamarkt.es: # Disallow: /error500.html
mediamarkt.es: # Disallow: /*storeId=*
mediamarkt.es: # Allow: /*storeId=19601*
mensjournal.com: # Sitemap archive
justanswer.com: # Directories
justanswer.com: # Files
justanswer.com: # Paths (clean URLs)
justanswer.com: # Paths (no clean URLs)
justanswer.com: # Secure and Error pages
justanswer.com: # 404s
justanswer.com: # Sitemaps
justanswer.com: # Exceptions
sendpulse.com: # If the Joomla site is installed within a folder such as at
sendpulse.com: # e.g. www.example.com/joomla/ the robots.txt file MUST be
sendpulse.com: # moved to the site root at e.g. www.example.com/robots.txt
sendpulse.com: # AND the joomla folder name MUST be prefixed to the disallowed
sendpulse.com: # path, e.g. the Disallow rule for the /administrator/ folder
sendpulse.com: # MUST be changed to read Disallow: /joomla/administrator/
sendpulse.com: #
sendpulse.com: # For more information about the robots.txt standard, see:
sendpulse.com: # http://www.robotstxt.org/orig.html
sendpulse.com: #
sendpulse.com: # For syntax checking, see:
sendpulse.com: # http://tool.motoricerca.info/robots-checker.phtml
scribbr.com: # Google Image
scribbr.com: # Google AdSense
scribbr.com: # Internet Archiver Wayback Machine
scribbr.com: # digg mirror
tesla.com: #
tesla.com: # robots.txt
tesla.com: #
tesla.com: # This file is to prevent the crawling and indexing of certain parts
tesla.com: # of your site by web crawlers and spiders run by sites like Yahoo!
tesla.com: # and Google. By telling these "robots" where not to go on your site,
tesla.com: # you save bandwidth and server resources.
tesla.com: #
tesla.com: # This file will be ignored unless it is at the root of your host:
tesla.com: # Used: http://example.com/robots.txt
tesla.com: # Ignored: http://example.com/site/robots.txt
tesla.com: #
tesla.com: # For more information about the robots.txt standard, see:
tesla.com: # http://www.robotstxt.org/robotstxt.html
tesla.com: # CSS, JS, Images
tesla.com: # Directories
tesla.com: # Files
tesla.com: # Paths (clean URLs)
tesla.com: # Paths (no clean URLs)
tesla.com: ##############################
tesla.com: # START TESLA CONTENT.
tesla.com: ##############################
tesla.com: # Tesla content landing pages.
tesla.com: ##############################
tesla.com: # STOP TESLA CONTENT.
tesla.com: ##############################
addic7ed.com: # Robots file for Addic7ed.com.
mercadolibre.com.uy: #siteId: MLU
mercadolibre.com.uy: #country: uruguay
mercadolibre.com.uy: ##Block - Referidos
mercadolibre.com.uy: ##Block - siteinfo urls
mercadolibre.com.uy: ##Block - Cart
mercadolibre.com.uy: ##Block Checkout
mercadolibre.com.uy: ##Block - User Logged
mercadolibre.com.uy: #Shipping selector
mercadolibre.com.uy: ##Block - last search
mercadolibre.com.uy: ## Block - Profile - By Id
mercadolibre.com.uy: ## Block - Profile - By Id and role (old version)
mercadolibre.com.uy: ## Block - Profile - Leg. Req.
mercadolibre.com.uy: ##Block - noindex
mercadolibre.com.uy: # Mercado-Puntos
mercadolibre.com.uy: # Viejo mundo
mercadolibre.com.uy: ##Block recommendations listing
delfi.ee: # robots.txt for http://www.delfi.ee/
delfi.ee: #
delfi.ee: # http://www.robotstxt.org/wc/norobots-rfc.txt
delfi.ee: # $Revision: 1.19 $ $Date: 2015/05/20 14:31:07 $
nest.com: # Realm nest.com
google.com.py: # AdsBot
google.com.py: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
wikibooks.org: #
wikibooks.org: # Please note: There are a lot of pages on this site, and there are
wikibooks.org: # some misbehaved spiders out there that go _way_ too fast. If you're
wikibooks.org: # irresponsible, your access to the site may be blocked.
wikibooks.org: #
wikibooks.org: # Observed spamming large amounts of https://en.wikipedia.org/?curid=NNNNNN
wikibooks.org: # and ignoring 429 ratelimit responses, claims to respect robots:
wikibooks.org: # http://mj12bot.com/
wikibooks.org: # advertising-related bots:
wikibooks.org: # Wikipedia work bots:
wikibooks.org: # Crawlers that are kind enough to obey, but which we'd rather not have
wikibooks.org: # unless they're feeding search engines.
wikibooks.org: # Some bots are known to be trouble, particularly those designed to copy
wikibooks.org: # entire sites. Please obey robots.txt.
wikibooks.org: # Misbehaving: requests much too fast:
wikibooks.org: #
wikibooks.org: # Sorry, wget in its recursive mode is a frequent problem.
wikibooks.org: # Please read the man page and use it properly; there is a
wikibooks.org: # --wait option you can use to set the delay between hits,
wikibooks.org: # for instance.
wikibooks.org: #
wikibooks.org: #
wikibooks.org: # The 'grub' distributed client has been *very* poorly behaved.
wikibooks.org: #
wikibooks.org: #
wikibooks.org: # Doesn't follow robots.txt anyway, but...
wikibooks.org: #
wikibooks.org: #
wikibooks.org: # Hits many times per second, not acceptable
wikibooks.org: # http://www.nameprotect.com/botinfo.html
wikibooks.org: # A capture bot, downloads gazillions of pages with no public benefit
wikibooks.org: # http://www.webreaper.net/
wikibooks.org: #
wikibooks.org: # Friendly, low-speed bots are welcome viewing article pages, but not
wikibooks.org: # dynamically-generated pages please.
wikibooks.org: #
wikibooks.org: # Inktomi's "Slurp" can read a minimum delay between hits; if your
wikibooks.org: # bot supports such a thing using the 'Crawl-delay' or another
wikibooks.org: # instruction, please let us know.
wikibooks.org: #
wikibooks.org: # There is a special exception for API mobileview to allow dynamic
wikibooks.org: # mobile web & app views to load section content.
wikibooks.org: # These views aren't HTTP-cached but use parser cache aggressively
wikibooks.org: # and don't expose special: pages etc.
wikibooks.org: #
wikibooks.org: # Another exception is for REST API documentation, located at
wikibooks.org: # /api/rest_v1/?doc.
wikibooks.org: #
wikibooks.org: #
wikibooks.org: # ar:
wikibooks.org: #
wikibooks.org: # dewiki:
wikibooks.org: # T6937
wikibooks.org: # sensible deletion and meta user discussion pages:
wikibooks.org: # 4937#5
wikibooks.org: # T14111
wikibooks.org: # T15961
wikibooks.org: #
wikibooks.org: # enwiki:
wikibooks.org: # Folks get annoyed when VfD discussions end up the number 1 google hit for
wikibooks.org: # their name. See T6776
wikibooks.org: # T15398
wikibooks.org: # T16075
wikibooks.org: # T13261
wikibooks.org: # T12288
wikibooks.org: # T16793
wikibooks.org: #
wikibooks.org: # eswiki:
wikibooks.org: # T8746
wikibooks.org: #
wikibooks.org: # fiwiki:
wikibooks.org: # T10695
wikibooks.org: #
wikibooks.org: # hewiki:
wikibooks.org: #T11517
wikibooks.org: #
wikibooks.org: # huwiki:
wikibooks.org: #
wikibooks.org: # itwiki:
wikibooks.org: # T7545
wikibooks.org: #
wikibooks.org: # jawiki
wikibooks.org: # T7239
wikibooks.org: # nowiki
wikibooks.org: # T13432
wikibooks.org: #
wikibooks.org: # plwiki
wikibooks.org: # T10067
wikibooks.org: #
wikibooks.org: # ptwiki:
wikibooks.org: # T7394
wikibooks.org: #
wikibooks.org: # rowiki:
wikibooks.org: # T14546
wikibooks.org: #
wikibooks.org: # ruwiki:
wikibooks.org: #
wikibooks.org: # svwiki:
wikibooks.org: # T12229
wikibooks.org: # T13291
wikibooks.org: #
wikibooks.org: # zhwiki:
wikibooks.org: # T7104
wikibooks.org: #
wikibooks.org: # sister projects
wikibooks.org: #
wikibooks.org: # enwikinews:
wikibooks.org: # T7340
wikibooks.org: #
wikibooks.org: # itwikinews
wikibooks.org: # T11138
wikibooks.org: #
wikibooks.org: # enwikiquote:
wikibooks.org: # T17095
wikibooks.org: #
wikibooks.org: # enwikibooks
wikibooks.org: #
wikibooks.org: # working...
wikibooks.org: #
wikibooks.org: #
wikibooks.org: #
wikibooks.org: #----------------------------------------------------------#
wikibooks.org: #
wikibooks.org: #
wikibooks.org: #
wikibooks.org: # robots.txt for http://en.wikibooks.org/
wikibooks.org: # Edit at http://en.wikibooks.org/w/index.php?title=MediaWiki:Robots.txt&action=edit
wikibooks.org: # Don't add newlines here. All rules set here are active for every user-agent.
wikibooks.org: #
wikibooks.org: # Please check any changes using a syntax validator such as http://tool.motoricerca.info/robots-checker.phtml
wikibooks.org: # Enter http://en.wikibooks.org/robots.txt as the URL to check.
wikibooks.org: #
wikibooks.org: # Don't index anything in the MediaWiki namespace
wikibooks.org: #
wikibooks.org: # Don't index anything in the Transwiki namespace
wikibooks.org: #
wikibooks.org: # Don't index discussions
wikibooks.org: #
wikibooks.org: #
wikibooks.org: # </source><!--leave this line alone-->
sportsmansoutdoorsuperstore.com: # Begin robots.txt file
sportsmansoutdoorsuperstore.com: # End robots.txt file
fritz.box: ## Disallows all robots
coolchasgamer.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
coolchasgamer.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details.
coolchasgamer.wordpress.com: # This file was generated on Wed, 09 Dec 2020 18:39:59 +0000
invertironline.com: # Alexa
invertironline.com: # Ask
invertironline.com: # Google
invertironline.com: # MSN
invertironline.com: # Yahoo!
invertironline.com: # Others
auto.ru: #part2
auto.ru: #part3
auto.ru: #after
auto.ru: #parts
auto.ru: #part2
auto.ru: #part3
auto.ru: #after
auto.ru: #parts
auto.ru: #part2
auto.ru: #part3
auto.ru: #after
auto.ru: #parts
customs.gov.az: # robots.txt
customs.gov.az: # DGK 1.0.0
pingidentity.com: # Updated 12.11.20 (www.pingidentity.com)
pingidentity.com: # For all bots
whoishostingthis.com: # This rule means it applies to all user-agents
whoishostingthis.com: # wordpress blog
whoishostingthis.com: # The Googlebot is the main search bot for google
whoishostingthis.com: # feed urls
whoishostingthis.com: # Disallow all files ending with these extensions
whoishostingthis.com: # Disallow Google from parsing indididual post feeds and trackbacks..
whoishostingthis.com: # Disallow all files with ? in url
whoishostingthis.com: #block access to internal search result pages
yandex.ua: # yandex.ua
bangbros.com: # www.robotstxt.org/
bangbros.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
banorte.com: #pulseChatButton{
banorte.com: #start-chat{
banorte.com: #meta-title .homepage_circles_container div{
yoomark.com: #
yoomark.com: # robots.txt
yoomark.com: #
yoomark.com: # This file is to prevent the crawling and indexing of certain parts
yoomark.com: # of your site by web crawlers and spiders run by sites like Yahoo!
yoomark.com: # and Google. By telling these "robots" where not to go on your site,
yoomark.com: # you save bandwidth and server resources.
yoomark.com: #
yoomark.com: # This file will be ignored unless it is at the root of your host:
yoomark.com: # Used: http://example.com/robots.txt
yoomark.com: # Ignored: http://example.com/site/robots.txt
yoomark.com: #
yoomark.com: # For more information about the robots.txt standard, see:
yoomark.com: # http://www.robotstxt.org/robotstxt.html
yoomark.com: # CSS, JS, Images
yoomark.com: # Directories
yoomark.com: # Files
yoomark.com: # Paths (clean URLs)
yoomark.com: # Paths (no clean URLs)
nbps.org: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
nbps.org: #content{margin:0 0 0 2%;position:relative;}
musescore.com: #
musescore.com: # robots.txt
musescore.com: #
musescore.com: # This file is to prevent the crawling and indexing of certain parts
musescore.com: # of your site by web crawlers and spiders run by sites like Yahoo!
musescore.com: # and Google. By telling these "robots" where not to go on your site,
musescore.com: # you save bandwidth and server resources.
musescore.com: #
musescore.com: # This file will be ignored unless it is at the root of your host:
musescore.com: # Used: http://example.com/robots.txt
musescore.com: # Ignored: http://example.com/site/robots.txt
musescore.com: #
musescore.com: # For more information about the robots.txt standard, see:
musescore.com: # http://www.robotstxt.org/robotstxt.html
musescore.com: # CSS, JS, Images
musescore.com: # Directories
musescore.com: # Files
musescore.com: # Paths (clean URLs)
musescore.com: # Paths (no clean URLs)
musescore.com: # Musescore.com
dailystar.co.uk: #Agent Specific Disallowed Sections
kensaq.com: ## Default robots.txt
mentimeter.com: # If you are not a robot, please stop reading
mentimeter.com: # If you are human, please go to //humans.txt
imprint5.com: #****************************************************************************
imprint5.com: # robots.txt
imprint5.com: # : Robots, spiders, and search engines use this file to detmine which
imprint5.com: # content they should *not* crawl while indexing your website.
imprint5.com: # : This system is called "The Robots Exclusion Standard."
imprint5.com: # : It is strongly encouraged to use a robots.txt validator to check
imprint5.com: # for valid syntax before any robots read it!
imprint5.com: #
imprint5.com: # Examples:
imprint5.com: #
imprint5.com: # Instruct all robots to stay out of the admin area.
imprint5.com: # : User-agent: *
imprint5.com: # : Disallow: /admin/
imprint5.com: #
imprint5.com: # Restrict Google and MSN from indexing your images.
imprint5.com: # : User-agent: Googlebot
imprint5.com: # : Disallow: /images/
imprint5.com: # : User-agent: MSNBot
imprint5.com: # : Disallow: /images/
imprint5.com: #****************************************************************************
imprint5.com: # Website Sitemap
imprint5.com: # Crawlers Setup
imprint5.com: # Allowable Index
imprint5.com: # Mind that Allow is not an official standard
imprint5.com: # Directories
imprint5.com: #Disallow: /js/
imprint5.com: #Disallow: /lib/
imprint5.com: # Disallow: /media/
imprint5.com: #Disallow: /media/catalog/
imprint5.com: #Disallow: /media/css/
imprint5.com: #Disallow: /media/css_secure/
imprint5.com: #Disallow: /media/js/
imprint5.com: #Disallow: /media/wysiwyg/
imprint5.com: #Disallow: /media/po_compressor/
imprint5.com: #Disallow: /skin/
imprint5.com: # Paths (clean URLs)
imprint5.com: #Disallow: /checkout/
imprint5.com: # Files
imprint5.com: # Paths (no clean URLs)
imprint5.com: #Disallow: /*.js$
imprint5.com: #Disallow: /*.css$
imprint5.com: #Disallow: *?price=*
imprint5.com: #Disallow: *capacity=*
imprint5.com: #Disallow: *?material=*
imprint5.com: #Disallow: *?decoration=*
imprint5.com: # Pre-existing robots rule
imprint5.com: # # SETTINGS Image indexing
imprint5.com: # # Optional: If you do not want to Google and Bing to index your images
imprint5.com: # User-agent: Googlebot-Image
imprint5.com: # Disallow: /
imprint5.com: # User-agent: msnbot-media
imprint5.com: # Disallow: /
alexa.com: # The crawlers listed below are allowed on the Alexa site.
alexa.com: # Alexa allows other crawlers on a case by case basis.
alexa.com: #
alexa.com: # Alexa provides access to traffic ranking data via Amazon Web Services.
alexa.com: # More information here: <URL: http://aws.amazon.com/awis>
alexa.com: # Disallow all other crawlers
ifixit.com: # iFixit robots.txt
underarmour.com: # Block Multiple Refinement Buckets
underarmour.com: # Block Sort Parameters
underarmour.com: # Block Price Parameters
underarmour.com: # Block Search Refinement Parameters
underarmour.com: # Block Site Search Parameters
underarmour.com: # Block URLS containing pipes
underarmour.com: # Block Pipelines
underarmour.com: # Block Misc Utility Pipelines
underarmour.com: # Sitemaps
underarmour.com: # International Sitemaps
prepscholar.com: # robotstxt.org/
qatarliving.com: #
qatarliving.com: # robots.txt
qatarliving.com: #
qatarliving.com: # This file is to prevent the crawling and indexing of certain parts
qatarliving.com: # of your site by web crawlers and spiders run by sites like Yahoo!
qatarliving.com: # and Google. By telling these "robots" where not to go on your site,
qatarliving.com: # you save bandwidth and server resources.
qatarliving.com: #
qatarliving.com: # This file will be ignored unless it is at the root of your host:
qatarliving.com: # Used: http://example.com/robots.txt
qatarliving.com: # Ignored: http://example.com/site/robots.txt
qatarliving.com: #
qatarliving.com: # For more information about the robots.txt standard, see:
qatarliving.com: # http://www.robotstxt.org/robotstxt.html
qatarliving.com: # CSS, JS, Images
qatarliving.com: # Directories
qatarliving.com: # Files
qatarliving.com: # Paths (clean URLs)
qatarliving.com: #Disallow: /admin/
qatarliving.com: #Disallow: /comment/reply/
qatarliving.com: #Disallow: /filter/tips/
qatarliving.com: #Disallow: /node/add/
qatarliving.com: #Disallow: /search/
qatarliving.com: #Disallow: /user/register/
qatarliving.com: #Disallow: /user/password/
qatarliving.com: #Disallow: /user/login/
qatarliving.com: #Disallow: /user/logout/
qatarliving.com: # Paths (no clean URLs)
qatarliving.com: #Disallow: /?q=admin/
qatarliving.com: #Disallow: /?q=comment/reply/
qatarliving.com: #Disallow: /?q=filter/tips/
qatarliving.com: #Disallow: /?q=node/add/
qatarliving.com: #Disallow: /?q=search/
qatarliving.com: #Disallow: /messages/new/
qatarliving.com: #Disallow: /?destination=user/
qatarliving.com: #Disallow: /?q=user/password/
qatarliving.com: #Disallow: /?q=user/register/
qatarliving.com: #Disallow: /?q=user/login/
qatarliving.com: #Disallow: /?q=user/logout/
qatarliving.com: # Disallow URLs with destination parameter
qatarliving.com: #Disallow: /user/login?destination=*
qatarliving.com: #Disallow: /user/register?destination=*
qatarliving.com: #Disallow: /user?destination=*
qatarliving.com: # Disallow individual user content
qatarliving.com: #Disallow: /user/*/groups
qatarliving.com: #Disallow: /user/*/posts
qatarliving.com: #Disallow: /user/*/pages
qatarliving.com: #Disallow: /user/*/comments
qatarliving.com: #Disallow: /user/*/classifieds
qatarliving.com: #Disallow: /user/*/jobs
qatarliving.com: #Disallow: /user/*/jobs/*
qatarliving.com: #Disallow: /user/*/wishlist
qatarliving.com: #Disallow: /events?type=*
qatarliving.com: #Disallow: /community-group/*
qatarliving.com: #Disallow: /email/node/*/field_email
qatarliving.com: #Disallow: /email/node/*
qatarliving.com: #Disallow: /email/*/*/*
qatarliving.com: #Disallow: /forum/*?page=*
qatarliving.com: #Disallow: /api/*
fb.com: # Notice: Collection of data on Facebook through automated means is
fb.com: # prohibited unless you have express written permission from Facebook
fb.com: # and may only be conducted for the limited purpose contained in said
fb.com: # permission.
fb.com: # See: http://www.facebook.com/apps/site_scraping_tos_terms.php
mcgill.ca: #
mcgill.ca: # robots.txt
mcgill.ca: #
mcgill.ca: # This file is to prevent the crawling and indexing of certain parts
mcgill.ca: # of your site by web crawlers and spiders run by sites like Yahoo!
mcgill.ca: # and Google. By telling these "robots" where not to go on your site,
mcgill.ca: # you save bandwidth and server resources.
mcgill.ca: #
mcgill.ca: # This file will be ignored unless it is at the root of your host:
mcgill.ca: # Used: http://example.com/robots.txt
mcgill.ca: # Ignored: http://example.com/site/robots.txt
mcgill.ca: #
mcgill.ca: # For more information about the robots.txt standard, see:
mcgill.ca: # http://www.robotstxt.org/robotstxt.html
mcgill.ca: # CSS, JS, Images
mcgill.ca: # Directories
mcgill.ca: # Files
mcgill.ca: # Paths (clean URLs)
mcgill.ca: # Paths (no clean URLs)
getepic.com: #updated 12/15/2020
zoosk.com: # robots.txt
torontomls.net: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
torontomls.net: #content{margin:0 0 0 2%;position:relative;}
cedcommerce.com: # robots.txt
cedcommerce.com: #path(Clean URLs)
cedcommerce.com: # Stop Inxeding from search engine
cedcommerce.com: # Directories
cedcommerce.com: # Stop Crawling user account and checkout pages
sdna.gr: #
sdna.gr: # robots.txt
sdna.gr: #
sdna.gr: # This file is to prevent the crawling and indexing of certain parts
sdna.gr: # of your site by web crawlers and spiders run by sites like Yahoo!
sdna.gr: # and Google. By telling these "robots" where not to go on your site,
sdna.gr: # you save bandwidth and server resources.
sdna.gr: #
sdna.gr: # This file will be ignored unless it is at the root of your host:
sdna.gr: # Used: http://example.com/robots.txt
sdna.gr: # Ignored: http://example.com/site/robots.txt
sdna.gr: #
sdna.gr: # For more information about the robots.txt standard, see:
sdna.gr: # http://www.robotstxt.org/robotstxt.html
sdna.gr: # CSS, JS, Images
sdna.gr: # Directories
sdna.gr: # Files
sdna.gr: # Paths (clean URLs)
sdna.gr: # Paths (no clean URLs)
uservoice.com: # Tell MSN to simmer down
uservoice.com: # Tell 80legs to get bent
uservoice.com: # Same for TurnitinBot
uservoice.com: # Fuck off WareBay
elboticarioencasa.com: # robots.txt automatically generated by PrestaShop e-commerce open-source solution
elboticarioencasa.com: # http://www.prestashop.com - http://www.prestashop.com/forums
elboticarioencasa.com: # This file is to prevent the crawling and indexing of certain parts
elboticarioencasa.com: # of your site by web crawlers and spiders run by sites like Yahoo!
elboticarioencasa.com: # and Google. By telling these "robots" where not to go on your site,
elboticarioencasa.com: # you save bandwidth and server resources.
elboticarioencasa.com: # For more information about the robots.txt standard, see:
elboticarioencasa.com: # http://www.robotstxt.org/robotstxt.html
elboticarioencasa.com: # Allow Directives
elboticarioencasa.com: # Private pages
elboticarioencasa.com: # Directories for elboticarioencasa.com
elboticarioencasa.com: # Files
wwe.com: #
wwe.com: # robots.txt
wwe.com: #
wwe.com: # This file is to prevent the crawling and indexing of certain parts
wwe.com: # of your site by web crawlers and spiders run by sites like Yahoo!
wwe.com: # and Google. By telling these "robots" where not to go on your site,
wwe.com: # you save bandwidth and server resources.
wwe.com: #
wwe.com: # This file will be ignored unless it is at the root of your host:
wwe.com: # Used: http://example.com/robots.txt
wwe.com: # Ignored: http://example.com/site/robots.txt
wwe.com: #
wwe.com: # For more information about the robots.txt standard, see:
wwe.com: # http://www.robotstxt.org/robotstxt.html
wwe.com: # Directories
wwe.com: # Paths (clean URLs)
wwe.com: # Paths (no clean URLs)
evaluate.market: # https://www.robotstxt.org/robotstxt.html
truecaller.com: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
truecaller.com: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@, ,@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
truecaller.com: # @@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@@@@@
truecaller.com: # @@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@
truecaller.com: # @@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@
truecaller.com: # @@@@@@@@@@@@@@* *@@@@@@@@@@@@@@
truecaller.com: # @@@@@@@@@@@@ @@@@@@@@@@@@
truecaller.com: # @@@@@@@@@@* *@@@@@@@@@@
truecaller.com: # @@@@@@@@@ (((((( @@@@@@@@@
truecaller.com: # @@@@@@@ ((((((((((( *@@@@@@@
truecaller.com: # @@@@@@ ((((((((((((( @@@@@@
truecaller.com: # @@@@@ ((((((((((((((( @@@@@
truecaller.com: # @@@@ (((((((((((((((( @@@@
truecaller.com: # @@@, (((((((((((((((( ,@@@
truecaller.com: # @@@ ((((((((((((((( @@@
truecaller.com: # @@ (((((((((((((( @@
truecaller.com: # @@ (((((((((((, @@
truecaller.com: # @& ((((((((( &@
truecaller.com: # @ ((((((((/ @
truecaller.com: # @ (((((//// @
truecaller.com: # @ ((/////// @
truecaller.com: # @ /////////* @
truecaller.com: # @& ///////// &@
truecaller.com: # @@ ////////// @@
truecaller.com: # @@ *////////// ****, @@
truecaller.com: # @@@ //////////// ************ @@@
truecaller.com: # @@@. ////////////// ****************** .@@@
truecaller.com: # @@@@ .//////////********************** @@@@
truecaller.com: # @@@@@ */////************************ @@@@@
truecaller.com: # @@@@@@ ************************* @@@@@@
truecaller.com: # @@@@@@@ ******************** ,@@@@@@@
truecaller.com: # @@@@@@@@@ *************. @@@@@@@@@
truecaller.com: # @@@@@@@@@@. .@@@@@@@@@@
truecaller.com: # @@@@@@@@@@@@ @@@@@@@@@@@@
truecaller.com: # @@@@@@@@@@@@@@. .@@@@@@@@@@@@@@
truecaller.com: # @@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@
truecaller.com: # @@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@
truecaller.com: # @@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@@@@@
truecaller.com: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@, ,@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
truecaller.com: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
boss.az: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
boss.az: #
boss.az: # To ban all spiders from the entire site uncomment the next two lines:
boss.az: # User-Agent: *
boss.az: # Disallow: /
koreatimes.co.kr: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
koreatimes.co.kr: #content{margin:0 0 0 2%;position:relative;}
cutestat.com: # BEGIN XML Sitemap
cutestat.com: # END XML Sitemap
uba.ar: #container {
abebooks.com: # Sitemap files
pakwheels.com: # /robots.txt for https://www.pakwheels.com
pakwheels.com: # block nofollow
pakwheels.com: # block nofollow
thrillophilia.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
thrillophilia.com: #
thrillophilia.com: # To ban all spiders from the entire site uncomment the next two lines:
thrillophilia.com: # User-Agent: *
thrillophilia.com: # Disallow:/
thrillophilia.com: # Blog wp-contents
thrillophilia.com: # Disallow /tags
google.iq: # AdsBot
google.iq: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
seo.com.cn: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
seo.com.cn: #content{margin:0 0 0 2%;position:relative;}
elschool.ru: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
elschool.ru: #content{margin:0 0 0 2%;position:relative;}
tiki.vn: # Disallow all crawlers access to certain pages.
nta.nic.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
nta.nic.in: #content{margin:0 0 0 2%;position:relative;}
bunjang.co.kr: # www.robotstxt.org/
bunjang.co.kr: # Allow crawling of all content
bunjang.co.kr: # Google Search Engine Sitemap
luluhypermarket.com: #For all robots
luluhypermarket.com: # Block access to specific groups of pages
udg.mx: #
udg.mx: # robots.txt
udg.mx: #
udg.mx: # This file is to prevent the crawling and indexing of certain parts
udg.mx: # of your site by web crawlers and spiders run by sites like Yahoo!
udg.mx: # and Google. By telling these "robots" where not to go on your site,
udg.mx: # you save bandwidth and server resources.
udg.mx: #
udg.mx: # This file will be ignored unless it is at the root of your host:
udg.mx: # Used: http://example.com/robots.txt
udg.mx: # Ignored: http://example.com/site/robots.txt
udg.mx: #
udg.mx: # For more information about the robots.txt standard, see:
udg.mx: # http://www.robotstxt.org/robotstxt.html
udg.mx: # CSS, JS, Images
udg.mx: # Directories
udg.mx: # Files
udg.mx: # Paths (clean URLs)
udg.mx: # Paths (no clean URLs)
queensu.ca: #
queensu.ca: # robots.txt
queensu.ca: #
queensu.ca: # This file is to prevent the crawling and indexing of certain parts
queensu.ca: # of your site by web crawlers and spiders run by sites like Yahoo!
queensu.ca: # and Google. By telling these "robots" where not to go on your site,
queensu.ca: # you save bandwidth and server resources.
queensu.ca: #
queensu.ca: # This file will be ignored unless it is at the root of your host:
queensu.ca: # Used: http://example.com/robots.txt
queensu.ca: # Ignored: http://example.com/site/robots.txt
queensu.ca: #
queensu.ca: # For more information about the robots.txt standard, see:
queensu.ca: # http://www.robotstxt.org/robotstxt.html
queensu.ca: # Directories
queensu.ca: # Files
queensu.ca: # Paths (clean URLs)
queensu.ca: # Paths (no clean URLs)
queensu.ca: #
queensu.ca: # Sites going away
queensu.ca: ##Disallow: /calendars/artsci/
sxl.cn: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
sxl.cn: #
sxl.cn: # To ban all spiders from the entire site uncomment the next two lines:
sxl.cn: # User-Agent: *
sxl.cn: # Disallow: /
sxl.cn: # Google adsbot ignores robots.txt unless specifically named!
amna.gr: #User-agent: Googlebot
amna.gr: #Disallow: /feeds
fstoppers.com: #
fstoppers.com: # robots.txt
fstoppers.com: #
fstoppers.com: # CSS, JS, Images
fstoppers.com: # Directories
fstoppers.com: # Files
fstoppers.com: # Paths (clean URLs)
fstoppers.com: #Disallow: /node/
fstoppers.com: # Paths (no clean URLs)
fstoppers.com: # No access for quicktabs in the URL
fstoppers.com: #Disallow: *?quicktabs_*
fstoppers.com: #Disallow: *&quicktabs_*
northwestern.edu: # robots.txt generated by cron job
northwestern.edu: # Produced 02/24/2021 at 03:48
northwestern.edu: # Directives generated from .NOINDEX files
northwestern.edu: #
zaobao.com: #
zaobao.com: # robots.txt
zaobao.com: #
zaobao.com: # This file is to prevent the crawling and indexing of certain parts
zaobao.com: # of your site by web crawlers and spiders run by sites like Yahoo!
zaobao.com: # and Google. By telling these "robots" where not to go on your site,
zaobao.com: # you save bandwidth and server resources.
zaobao.com: #
zaobao.com: # This file will be ignored unless it is at the root of your host:
zaobao.com: # Used: http://example.com/robots.txt
zaobao.com: # Ignored: http://example.com/site/robots.txt
zaobao.com: #
zaobao.com: # For more information about the robots.txt standard, see:
zaobao.com: # http://www.robotstxt.org/wc/robots.html
zaobao.com: #
zaobao.com: # For syntax checking, see:
zaobao.com: # http://www.sxw.org.uk/computing/robots/check.html
zaobao.com: # CSS, JS, Images
zaobao.com: #Disallow Sogou spider crawling - 17-01-2017 by huy
zaobao.com: # Directories
zaobao.com: # Disallow: /sites/all
zaobao.com: # Files
zaobao.com: # Paths (clean URLs)
zaobao.com: # Paths (no clean URLs)
zaobao.com: # disallow all files ending in specific extension
zaobao.com: # block freemium paywall Images
reklama5.mk: # Group 1
reklama5.mk: # Group 2
reklama5.mk: # Group 3
reklama5.mk: # Group 4
tehran.ir: # Begin robots.txt file
tehran.ir: #/-----------------------------------------------\
tehran.ir: #| In single portal/domain situations, uncomment the sitmap line and enter domain name
tehran.ir: #\-----------------------------------------------/
tehran.ir: #Sitemap: http://www.DomainNamehere.com/sitemap.aspx
tehran.ir: # End of robots.txt file
mushroommarket.net: # Robots For MushroomMarket.net B2C
xht888.com: #myFocus img{ width:100%; height:338px;}
xht888.com: #myFocus{ width:100%; height:338px;}
xht888.com: #gwea td { padding-left:10px}
msdmanuals.com: #Baiduspider
business.com: # www.robotstxt.org/
business.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
edpuzzle.com: #
edpuzzle.com: # _ _
edpuzzle.com: # ___ __| | ___ _ _ ____ ____ | | ___
edpuzzle.com: # / _ \/ _' | _ \| | | |__ ||__ || | / _ \
edpuzzle.com: # | __/ (_| | (_) | |_| | / /_ / /_| || __/
edpuzzle.com: # \___|\__,_| __/|_____|/____|/____|_| \___|
edpuzzle.com: # |_|
edpuzzle.com: #
edpuzzle.com: #
edpuzzle.com: # Allow all robots
novojornal.co.ao: # robots.txt
klook.com: # Hi, we're KLOOK tech team, Nice to meet you.
klook.com: # If you're an engineer, we'd be interested to have a chat with you.
klook.com: # Our tech team base on Shenzhen, China.
klook.com: # You can find our positions in the link below
klook.com: #
klook.com: # https://klook.com/careers?department=Engineering%20%26%20Technology
klook.com: ## ## ## ####### ####### ## ##
klook.com: ## ## ## ## ## ## ## ## ##
klook.com: ## ## ## ## ## ## ## ## ##
klook.com: ##### ## ## ## ## ## #####
klook.com: ## ## ## ## ## ## ## ## ##
klook.com: ## ## ## ## ## ## ## ## ##
klook.com: ## ## ######## ####### ####### ## ##
klook.com: # klook.com
klook.com: #Naver Setting separate
klook.com: #block JP site & Wifi vertical
klook.com: # block test activities
klook.com: # block some activities : AU team has onboarded a merchant (Australia Zoo)
klook.com: # wait mobile
klook.com: # sitemap
popular.com.kh: # vestacp autogenerated robots.txt
google.jo: # AdsBot
google.jo: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
mercadolibre.com.co: #siteId: MCO
mercadolibre.com.co: #country: colombia
mercadolibre.com.co: ##Block - Referidos
mercadolibre.com.co: ##Block - siteinfo urls
mercadolibre.com.co: ##Block - Cart
mercadolibre.com.co: ##Block Checkout
mercadolibre.com.co: ##Block - User Logged
mercadolibre.com.co: #Shipping selector
mercadolibre.com.co: ##Block - last search
mercadolibre.com.co: ## Block - Profile - By Id
mercadolibre.com.co: ## Block - Profile - By Id and role (old version)
mercadolibre.com.co: ## Block - Profile - Leg. Req.
mercadolibre.com.co: ##Block - noindex
mercadolibre.com.co: # Mercado-Puntos
mercadolibre.com.co: # Viejo mundo
mercadolibre.com.co: ##Block recommendations listing
pinterest.ch: # Pinterest is hiring!
pinterest.ch: #
pinterest.ch: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c
pinterest.ch: #
pinterest.ch: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering
joomla.org: ##Please don't remove folders from disallow.
joomla.org: ##The allows at the top allow any of the mimetypes listed to be crawled within any folder
joomla.org: ##using long-tail wildcards, these ignore the disallows for the folders below.
joomla.org: ##This gives full render for the search engines whilst preventing full crawls of system
joomla.org: ##folders
joomla.org: #THIS ALLOWS FULL RENDER AT ENGINES
joomla.org: #THESE FOLDERS SHOULD NEVER BE CRAWLED
larepublica.pe: # 19/02/2021
amobbs.com: #
amobbs.com: # robots.txt for Discuz! X3
amobbs.com: #
wipo.int: # robots.txt for https://www.wipo.int/
telerik.com: # All robots will spider the domain
telerik.com: #Image Sitemap
telerik.com: #Video Sitemap
canarabank.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
canarabank.in: #content{margin:0 0 0 2%;position:relative;}
goodrx.com: # Hey there! You don't look like a robot...
goodrx.com: #
goodrx.com: # We'd love to hear from curious humans such as yourself.
goodrx.com: #
goodrx.com: # While you're poking around, why not check out our open positions at GoodRx?
goodrx.com: #
goodrx.com: # https://www.goodrx.com/jobs
goodrx.com: #
goodrx.com: # Or reach out directly via seo@goodrx.com
goodrx.com: #
goodrx.com: # If you are in fact a robot, pardon my presumptions. Here's what you came for:
scholastic.com: # Directories
scholastic.com: # Files
scholastic.com: # Paths (clean URLs)
scholastic.com: # Paths (no clean URLs)
scholastic.com: # AEM
leprogres.fr: # Boutique
leprogres.fr: # Elections
leprogres.fr: # Examen
jutarnji.hr: # If the Joomla site is installed within a folder
jutarnji.hr: # eg www.example.com/joomla/ then the robots.txt file
jutarnji.hr: # MUST be moved to the site root
jutarnji.hr: # eg www.example.com/robots.txt
jutarnji.hr: # AND the joomla folder name MUST be prefixed to all of the
jutarnji.hr: # paths.
jutarnji.hr: # eg the Disallow rule for the /administrator/ folder MUST
jutarnji.hr: # be changed to read
jutarnji.hr: # Disallow: /joomla/administrator/
jutarnji.hr: #
jutarnji.hr: # For more information about the robots.txt standard, see:
jutarnji.hr: # http://www.robotstxt.org/orig.html
jutarnji.hr: #
jutarnji.hr: # For syntax checking, see:
jutarnji.hr: # http://tool.motoricerca.info/robots-checker.phtml
downxia.com: #
downxia.com: #
global-free-classified-ads.com: #User-agent: ia_archiver
global-free-classified-ads.com: #User-agent: ia_archiver/1.6
global-free-classified-ads.com: #User-agent: sogou
global-free-classified-ads.com: ##User-agent: proximic
pitneybowes.com: # Robots.txt file for http://www.pitneybowes.com/
pitneybowes.com: #
pitneybowes.com: # 2020.04.22
pitneybowes.com: # YYYY.MM.DD
pitneybowes.com: # --------------------------------------------------------------------------
pitneybowes.com: # Global Directives
pitneybowes.com: # --------------------------------------------------------------------------
pitneybowes.com: # --------------------------------------------------------------------------
pitneybowes.com: # SEO Disallows
pitneybowes.com: # --------------------------------------------------------------------------
pitneybowes.com: # --------------------------------------------------------------------------
pitneybowes.com: # XML Sitemaps
pitneybowes.com: # --------------------------------------------------------------------------
tueren-fachhandel.de: ## Crawl-delay parameter: number of seconds to wait between successive requests to the same server.
tueren-fachhandel.de: ## Set a custom crawl rate if you're experiencing traffic problems with your server.
tueren-fachhandel.de: # Crawl-delay: 30
tueren-fachhandel.de: ## Do not crawl development files and folders: CVS, svn directories and dump files
tueren-fachhandel.de: ## Allow: /*?p=
tueren-fachhandel.de: ## Do not crawl common Magento technical folders
tueren-fachhandel.de: ## Do not crawl common Magento technical folders
tueren-fachhandel.de: ## Do not crawl common Magento files
tueren-fachhandel.de: ## Do not crawl sub category pages that are sorted or filtered.
tueren-fachhandel.de: ## Do not crawl links with session IDs
tueren-fachhandel.de: ## Do not crawl links with filetypes
tueren-fachhandel.de: ## Do not crawl checkout and user account pages
tueren-fachhandel.de: # Secific pages (comment to allow indexing)
tueren-fachhandel.de: #Disallow: /*contacts/
tueren-fachhandel.de: ## Do not crawl seach pages and not-SEO optimized catalog links
tueren-fachhandel.de: ## Do not crawl attributes
tueren-fachhandel.de: ## SERVER SETTINGS
tueren-fachhandel.de: ## Do not crawl common server technical folders and files
tueren-fachhandel.de: ## IMAGE CRAWLERS SETTINGS
tueren-fachhandel.de: ## Extra: Uncomment if you do not wish Google and Bing to index your images
tueren-fachhandel.de: # User-agent: Googlebot-Image
tueren-fachhandel.de: # Disallow: /
tueren-fachhandel.de: # User-agent: msnbot-media
tueren-fachhandel.de: # Disallow: /
tueren-fachhandel.de: # Romove product filter URL
tueren-fachhandel.de: # Remove specific pages from index
hemnet.se: # Kul att du hittat hit!
hemnet.se: # Vill du också jobba på en av Sveriges största sajter med miljontals unika besökare?
hemnet.se: # Gör en spontanansökan på https://jobba.hemnet.se/
1001freefonts.com: #Sitemap: /sitemap.xml
lingojam.com: # robotstxt.org/
tesla.cn: #
tesla.cn: # robots.txt
tesla.cn: #
tesla.cn: # This file is to prevent the crawling and indexing of certain parts
tesla.cn: # of your site by web crawlers and spiders run by sites like Yahoo!
tesla.cn: # and Google. By telling these "robots" where not to go on your site,
tesla.cn: # you save bandwidth and server resources.
tesla.cn: #
tesla.cn: # This file will be ignored unless it is at the root of your host:
tesla.cn: # Used: http://example.com/robots.txt
tesla.cn: # Ignored: http://example.com/site/robots.txt
tesla.cn: #
tesla.cn: # For more information about the robots.txt standard, see:
tesla.cn: # http://www.robotstxt.org/robotstxt.html
tesla.cn: # CSS, JS, Images
tesla.cn: # Directories
tesla.cn: # Files
tesla.cn: # Paths (clean URLs)
tesla.cn: # Paths (no clean URLs)
tesla.cn: ##############################
tesla.cn: # START TESLA CONTENT.
tesla.cn: ##############################
tesla.cn: # Tesla content landing pages.
tesla.cn: ##############################
tesla.cn: # STOP TESLA CONTENT.
tesla.cn: ##############################
pingboard.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
pingboard.com: #
pingboard.com: # To ban all spiders from the entire site uncomment the next two lines:
dhs.gov: #
dhs.gov: # robots.txt
dhs.gov: #
dhs.gov: # This file is to prevent the crawling and indexing of certain parts
dhs.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
dhs.gov: # and Google. By telling these "robots" where not to go on your site,
dhs.gov: # you save bandwidth and server resources.
dhs.gov: #
dhs.gov: # This file will be ignored unless it is at the root of your host:
dhs.gov: # Used: http://example.com/robots.txt
dhs.gov: # Ignored: http://example.com/site/robots.txt
dhs.gov: #
dhs.gov: # For more information about the robots.txt standard, see:
dhs.gov: # http://www.robotstxt.org/robotstxt.html
dhs.gov: # CSS, JS, Images
dhs.gov: # Directories
dhs.gov: # Files
dhs.gov: # Paths (clean URLs)
dhs.gov: # Paths (no clean URLs)
cosmote.gr: # Allow crawlers
cosmote.gr: # Disallow all fixed URLs
cosmote.gr: # Disallow all fixed URLs
cosmote.gr: # Disallow WCS business URLs (content migrated under /cs/business URL)
cosmote.gr: # Disallow specific mobile URLs
cosmote.gr: # Disallow all old OTEGroup URLs
cosmote.gr: # Disallow old Pricelist
cosmote.gr: # NoIndex
kmib.co.kr: # robots.txt generated at http://www.adop.cc
onlinecv.it: # This virtual robots.txt file was created by the Virtual Robots.txt WordPress plugin: https://www.wordpress.org/plugins/pc-robotstxt/
bible.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
bible.com: #
bible.com: # To ban all spiders from the entire site uncomment the next two lines:
bible.com: # User-Agent: *
bible.com: # Disallow: /
easybib.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
usgs.gov: #-----GLOBALS-------------
usgs.gov: #Global non-indexing of any item matching paths below. Do not add individual elsewhere.
usgs.gov: # Paths (clean URLs)
usgs.gov: # Paths (no clean URLs)
usgs.gov: #-----END GLOBALS-------
usgs.gov: #SCIENCE EXPLORER
usgs.gov: #PUBS
usgs.gov: #ECOSYSTEMS
usgs.gov: #ENVIRONMENTS PROGRAM
usgs.gov: #ENERGY AND WILDLIFE PROGRAM
usgs.gov: #FISHERIES
usgs.gov: #FISH AND WILDLIFE DISEASE
usgs.gov: #INVASIVE SPECIES PROGRAM
usgs.gov: #SSSP AKA SAGE GROUSE AND SAGEBRUSH ECOSYSTEM
usgs.gov: #STP STATUS AND TRENDS PROGRAM
usgs.gov: #WILDLIFE PROGRAM
usgs.gov: #------
usgs.gov: #WRET site for classroom
usgs.gov: #------
usgs.gov: #SPECIAL TOPICS
usgs.gov: #FLOODS
usgs.gov: #BIG SUR
usgs.gov: #MICROBIOME
usgs.gov: #MISSISSIPPI
usgs.gov: #------
usgs.gov: #MERGED NOT DELETED YET
usgs.gov: #ILIA-WATER
usgs.gov: #INKY
usgs.gov: #MI-OH
usgs.gov: #---------------------
usgs.gov: #Sites allow ie now live
usgs.gov: #---------------------
usgs.gov: #ASC
usgs.gov: #ASTRO
usgs.gov: #AZ-WATER
usgs.gov: #CA-WATER
usgs.gov: #CA WATER LS
usgs.gov: #CAR-FL-WATER
usgs.gov: #CASC PROGRAM
usgs.gov: #CASC CENTER
usgs.gov: #CBA
usgs.gov: #CDI
usgs.gov: #CBP CONTAMINANT BIOLOGY PROGRAM
usgs.gov: #CERC
usgs.gov: #CERSC
usgs.gov: #CGGSC
usgs.gov: #CM-WATER
usgs.gov: #CMERSC
usgs.gov: #CMGP AKA COASTAL MARINE HAZARDS AND RESOURCES PROGRAM
usgs.gov: #CO-WATER
usgs.gov: #DAKOTA
usgs.gov: #DML
usgs.gov: #EERSC
usgs.gov: #EGGL
usgs.gov: #EARTHMRI
usgs.gov: #EHMA
usgs.gov: #EHP EARTHQUAKE HAZARDS PROGRAM
usgs.gov: #EMERSC
usgs.gov: #EMP ENERGY AND MINERALS PROGRAM
usgs.gov: #ENERGY & MINERALS MA
usgs.gov: #EROS
usgs.gov: #ERP ENERGY RESOURCES PROGRAM
usgs.gov: #FIRE
usgs.gov: #FBGC
usgs.gov: #FORT
usgs.gov: #FRESC
usgs.gov: #GAP GAP ANALYSIS PROJECT
usgs.gov: #GAPP
usgs.gov: #GECSC
usgs.gov: #GEOHAZARDS
usgs.gov: #GEOMAGNETISM
usgs.gov: #GLR
usgs.gov: #GLSC
usgs.gov: #GMEG
usgs.gov: #GOM
usgs.gov: #GWSIP GROUNDWATER AND STREAMFLOW INFORMATION PROGRAM
usgs.gov: #HURRICANE FLORENCE
usgs.gov: #HURRICANE MICHAEL
usgs.gov: #ID-WATER
usgs.gov: #INNOVATION CENTER
usgs.gov: #LEETOWN
usgs.gov: #LHP LANDSLIDE HAZARDS PROGRAM
usgs.gov: #LRMA LAND RESOURCES MA
usgs.gov: #LSP LAND CHANGE SCIENCE PROGRAM
usgs.gov: #MENDENHALL
usgs.gov: #MRL
usgs.gov: #MRP MINERAL RESOURCES PROGRAM
usgs.gov: #NCGMP NATIONAL COOPERATIVE GEOLOGIC MAPPING PROGRAM
usgs.gov: #NE-WATER
usgs.gov: #NEHP
usgs.gov: #NGGDPP NATIONAL GEOGRAPHIC AND GEOSPATIAL DATA PRESERVATION PROGRAM
usgs.gov: #NHMA
usgs.gov: #NJ-WATER
usgs.gov: #NPWRC
usgs.gov: #NWHC
usgs.gov: #NV-WATER
usgs.gov: #NWQL
usgs.gov: #NWQP NATIONAL WATER QUALITY PROGRAM
usgs.gov: #PA-WATER
usgs.gov: #PCMSC
usgs.gov: #PIWSC
usgs.gov: #POWELL CENTER
usgs.gov: #MD-DE-DC
usgs.gov: #OR-WATER
usgs.gov: #ORGANIC MATTER RESEARCH LAB OMRL
usgs.gov: #PWRC
usgs.gov: #RBPGL
usgs.gov: #RSP REMOTE SENSING PHENOLOGY
usgs.gov: #SAFRR
usgs.gov: #SDC
usgs.gov: #SPCMSC
usgs.gov: #SPECLAB
usgs.gov: #TX-WATER
usgs.gov: #TOXIC SUBSTANCES HYDROLOGY PROGRAM
usgs.gov: #UMID-WATER
usgs.gov: #UT-WATER
usgs.gov: #VA-WV
usgs.gov: #WA-WATER
usgs.gov: #WAUSP WATER AVAILABILITY AND USE SCIENCE PROGRAM
usgs.gov: #WATER SCIENCE SCHOOL
usgs.gov: #WERC
usgs.gov: #WFRC
usgs.gov: #WGSC
usgs.gov: #WHCMSC
usgs.gov: #WMA
usgs.gov: #WRRP AKA WRRI
usgs.gov: #WY-MT-WATER
usgs.gov: #FOIA-FAQ
metro.co.uk: # News Sitemap
metro.co.uk: # Sitemap archive
metro.co.uk: # Sitemap archive
exame.com: # Sitemap archive
toy-people.com: # sitemap
toy-people.com: # Disallow: /toy-people
hostloc.com: #
hostloc.com: # robots.txt for Discuz! X3
hostloc.com: #
uisdc.com: # robots.txt for http://www.uisdc.com
vistaprint.in: # Crawling Rules - Last Update on 11/07/2019
logicworld.co.za: # Sitemap is also available on /sitemap.xml
sebrae-sc.com.br: # remova os diretorios
sebrae-sc.com.br: #Sitemap
winit.com.cn: # If the Joomla site is installed within a folder such as at
winit.com.cn: # e.g. www.example.com/joomla/ the robots.txt file MUST be
winit.com.cn: # moved to the site root at e.g. www.example.com/robots.txt
winit.com.cn: # AND the joomla folder name MUST be prefixed to the disallowed
winit.com.cn: # path, e.g. the Disallow rule for the /administrator/ folder
winit.com.cn: # MUST be changed to read Disallow: /joomla/administrator/
winit.com.cn: #
winit.com.cn: # For more information about the robots.txt standard, see:
winit.com.cn: # http://www.robotstxt.org/orig.html
winit.com.cn: #
winit.com.cn: # For syntax checking, see:
winit.com.cn: # http://tool.motoricerca.info/robots-checker.phtml
vivino.com: # Sitemap files
thedailystar.net: #
thedailystar.net: # robots.txt
thedailystar.net: #
thedailystar.net: # This file is to prevent the crawling and indexing of certain parts
thedailystar.net: # of your site by web crawlers and spiders run by sites like Yahoo!
thedailystar.net: # and Google. By telling these "robots" where not to go on your site,
thedailystar.net: # you save bandwidth and server resources.
thedailystar.net: #
thedailystar.net: # This file will be ignored unless it is at the root of your host:
thedailystar.net: # Used: http://example.com/robots.txt
thedailystar.net: # Ignored: http://example.com/site/robots.txt
thedailystar.net: #
thedailystar.net: # For more information about the robots.txt standard, see:
thedailystar.net: # http://www.robotstxt.org/robotstxt.html
thedailystar.net: # CSS, JS, Images
thedailystar.net: # Directories
thedailystar.net: # Files
thedailystar.net: # Paths (clean URLs)
thedailystar.net: # Paths (no clean URLs)
efsyn.gr: #
efsyn.gr: # robots.txt
efsyn.gr: #
efsyn.gr: # This file is to prevent the crawling and indexing of certain parts
efsyn.gr: # of your site by web crawlers and spiders run by sites like Yahoo!
efsyn.gr: # and Google. By telling these "robots" where not to go on your site,
efsyn.gr: # you save bandwidth and server resources.
efsyn.gr: #
efsyn.gr: # This file will be ignored unless it is at the root of your host:
efsyn.gr: # Used: http://example.com/robots.txt
efsyn.gr: # Ignored: http://example.com/site/robots.txt
efsyn.gr: #
efsyn.gr: # For more information about the robots.txt standard, see:
efsyn.gr: # http://www.robotstxt.org/robotstxt.html
efsyn.gr: # CSS, JS, Images
efsyn.gr: # Directories
efsyn.gr: # Files
efsyn.gr: # Paths (clean URLs)
efsyn.gr: # Paths (no clean URLs)
elcorreo.com: ### Robots www.elcorreo.com ###
elcorreo.com: ## Sitemaps ##
elcorreo.com: ## User Agents ##
elcorreo.com: #redi2014 #
elcorreo.com: #temp #
elcorreo.com: #25/10/17
jobsdb.com: # Robots.txt file for www.jobsdb.com
jobsdb.com: # URLs are case sensitive!!
jobsdb.com: # Bingbot
jobsdb.com: # LinkedIn Bot
gitee.com: ### BEGIN FILE ###
gitee.com: #
gitee.com: # allow-all
gitee.com: #
gitee.com: #
gitee.com: ### END FILE ###
eldiariomontanes.es: ## Sitemaps ##
eldiariomontanes.es: ## User Agents ##
eldiariomontanes.es: #redi14#
eldiariomontanes.es: #mobile#
eldiariomontanes.es: #temp#
angieslist.com: #
angieslist.com: # robots.txt
angieslist.com: #
angieslist.com: # This file is to prevent the crawling and indexing of certain parts
angieslist.com: # of your site by web crawlers and spiders run by sites like Yahoo!
angieslist.com: # and Google. By telling these "robots" where not to go on your site,
angieslist.com: # you save bandwidth and server resources.
angieslist.com: #
angieslist.com: # This file will be ignored unless it is at the root of your host:
angieslist.com: # Used: http://example.com/robots.txt
angieslist.com: # Ignored: http://example.com/site/robots.txt
angieslist.com: #
angieslist.com: # For more information about the robots.txt standard, see:
angieslist.com: # http://www.robotstxt.org/robotstxt.html
angieslist.com: # CSS, JS, Images
angieslist.com: # Directories
angieslist.com: # Files
angieslist.com: # Paths (clean URLs)
angieslist.com: # Paths (no clean URLs)
angieslist.com: # almodule endpoints
angieslist.com: # alapi endpoints
angieslist.com: # Near ME CMT content
angieslist.com: # nothing under sites
angieslist.com: # Favicon
yandex.com.tr: # yandex.com.tr
giuliofashion.com: # we use Shopify as our ecommerce platform
giuliofashion.com: # Google adsbot ignores robots.txt unless specifically named!
health.com: # Sitemaps
health.com: #legacy
health.com: #Onecms
health.com: #content
health.com: #legacy
health.com: #Onecms
health.com: #content
indiainfoline.com: #Disallow: /search/
indiainfoline.com: #User-agent: AhrefsBot
indiainfoline.com: #Crawl-Delay: 86400
univie.ac.at: #Baiduspider
univie.ac.at: # for: http://www.univie.ac.at/
univie.ac.at: # contact: webmaster@univie.ac.at
univie.ac.at: # for: http://www.univie.ac.at/ZID/
univie.ac.at: # contact: webmaster.zid@univie.ac.at
univie.ac.at: #
support.squarespace.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
fotojet.com: # If the Joomla site is installed within a folder
fotojet.com: # eg www.example.com/joomla/ then the robots.txt file
fotojet.com: # MUST be moved to the site root
fotojet.com: # eg www.example.com/robots.txt
fotojet.com: # AND the joomla folder name MUST be prefixed to all of the
fotojet.com: # paths.
fotojet.com: # eg the Disallow rule for the /administrator/ folder MUST
fotojet.com: # be changed to read
fotojet.com: # Disallow: /joomla/administrator/
fotojet.com: #
fotojet.com: # For more information about the robots.txt standard, see:
fotojet.com: # http://www.robotstxt.org/orig.html
fotojet.com: #
fotojet.com: # For syntax checking, see:
fotojet.com: # http://tool.motoricerca.info/robots-checker.phtml
decider.com: # Sitemap archive
decider.com: # Additional sitemaps
asrv.com: # we use Shopify as our ecommerce platform
asrv.com: # Google adsbot ignores robots.txt unless specifically named!
zumiez.com: # Zumiez prod <www.zumiez.com>
league-funny.com: # sitemap
league-funny.com: #Disallow: /league-funny
worldmarket.com: #
worldmarket.com: # robots.txt - Cost Plus World Market https://www.worldmarket.com
worldmarket.com: #
worldmarket.com: #Sitemaps
worldpay.com: #termly-code-snippet-support label[for="performance"] {
worldpay.com: #termly-code-snippet-support label > input:checked + label {
worldpay.com: #termly-code-snippet-support label > input + label {
etam.com: # HTTP Exclusion Rules
etam.com: # Last Mod: 07/2015
etam.com: # GG UA reminder: https://support.google.com/webmasters/answer/1061943?hl=en
etam.com: # Authorised UA
etam.com: # GWT robots tester : https://www.google.com/webmasters/tools/robots-testing-tool
etam.com: # Disallow: /accessoires/pret-a-porter* # dup sous-cat dans ACCESSOIRES
etam.com: # Disallow: /lingerie-de-nuit/accessoires*
etam.com: # Disallow: /pret-a-porter/les-collants*
etam.com: #Disallow: /nuit/nuisettes-et-chemises-de-nuit*
etam.com: #Disallow: /nuit/kimonos-et-deshabilles*
etam.com: # paramètres à bloquer
etam.com: # Allow: *?*prefn1=refinementColor
etam.com: # Disallow: *?*prefv1=*
etam.com: # Disallow: *?*prefn1=*
etam.com: # Disallow: *?*prefn2=*
etam.com: # répertoires à bloquer
etam.com: # Produit Bally
etam.com: #Disallow: /soldes/*
etam.com: #Disallow: /promos/*
etam.com: #Disallow: /bonnes-affaires/*
etam.com: # Marketing campains Authorised UA
etam.com: # END
flipboard.com: # robots.txt for http://flipboard.com
flipboard.com: # Some references on why 2 and the duplication
flipboard.com: # https://searchengineland.com/robots-txt-tip-from-bing-include-all-relevant-directives-if-you-have-a-bingbot-section-309970
flipboard.com: # https://blogs.bing.com/webmaster/2012/05/03/to-crawl-or-not-to-crawl-that-is-bingbots-question/
hkgolden.com: #It's for search engine indexes, aka Google
hyundai.com: #Disallow: /pa/
hyundai.com: # Disallow: /*?*page=
hyundai.com: # Sitemap files
watchmovies5.com.pk: #Begin Attracta SEO Tools Sitemap. Do not remove
watchmovies5.com.pk: #End Attracta SEO Tools Sitemap. Do not remove
naukrigulf.com: # Created September, 01, 2006.
naukrigulf.com: # Author: Jai P Sharma
naukrigulf.com: # Email : jai.sharma[at]naukri.com
naukrigulf.com: # Edited : Mar 27, 2018
redbull.com: #PCS
redbull.com: #Wingfinder
redbull.com: #update04–07-2018#https
postfinance.ch: #robots.txt for PostFinance
supercoloring.com: #
supercoloring.com: # robots.txt
supercoloring.com: #
supercoloring.com: # This file is to prevent the crawling and indexing of certain parts
supercoloring.com: # of your site by web crawlers and spiders run by sites like Yahoo!
supercoloring.com: # and Google. By telling these "robots" where not to go on your site,
supercoloring.com: # you save bandwidth and server resources.
supercoloring.com: #
supercoloring.com: # This file will be ignored unless it is at the root of your host:
supercoloring.com: # Used: http://example.com/robots.txt
supercoloring.com: # Ignored: http://example.com/site/robots.txt
supercoloring.com: #
supercoloring.com: # For more information about the robots.txt standard, see:
supercoloring.com: # http://www.robotstxt.org/robotstxt.html
supercoloring.com: #
supercoloring.com: # For syntax checking, see:
supercoloring.com: # http://www.frobee.com/robots-txt-check
supercoloring.com: # Directories
supercoloring.com: # Files
supercoloring.com: # Paths (clean URLs)
supercoloring.com: # Paths (no clean URLs)
globalclassified.net: # Blocks robots from specific folders / directories
getharvest.com: # http://www.robotstxt.org/
zeczec.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
zeczec.com: #
zeczec.com: # To ban all spiders from the entire site uncomment the next two lines:
zeczec.com: # User-Agent: *
zeczec.com: # Disallow: /
gmx.at: #https://www.gmx.ch/robots.txt
ludwig.guru: # Disallow: /dictionary/*
ludwig.guru: # Disallow: /it/dictionary/*
ludwig.guru: # Disallow: /ru/dictionary/*
ludwig.guru: # Disallow: /en/dictionary/*
ludwig.guru: # Disallow: /pt/dictionary/*
ludwig.guru: # Disallow: /zh/dictionary/*
ludwig.guru: # Disallow: /tr/dictionary/*
ludwig.guru: # Disallow: /share
ludwig.guru: # Disallow: /share/*
ludwig.guru: # Disallow: /it/share
ludwig.guru: # Disallow: /ru/share
ludwig.guru: # Disallow: /en/share
ludwig.guru: # Disallow: /zh/share
ludwig.guru: # Disallow: /pt/share
ludwig.guru: # Disallow: /tr/share
ludwig.guru: # Disallow: /it/share/*
ludwig.guru: # Disallow: /ru/share/*
ludwig.guru: # Disallow: /en/share/*
ludwig.guru: # Disallow: /zh/share/*
ludwig.guru: # Disallow: /pt/share/*
ludwig.guru: # Disallow: /tr/share/*
ludwig.guru: # Disallow: /dictionary/*
ludwig.guru: # Disallow: /it/dictionary/*
ludwig.guru: # Disallow: /ru/dictionary/*
ludwig.guru: # Disallow: /en/dictionary/*
ludwig.guru: # Disallow: /pt/dictionary/*
ludwig.guru: # Disallow: /zh/dictionary/*
ludwig.guru: # Disallow: /tr/dictionary/*
ludwig.guru: # Disallow: /share
ludwig.guru: # Disallow: /share/*
ludwig.guru: # Disallow: /it/share
ludwig.guru: # Disallow: /ru/share
ludwig.guru: # Disallow: /en/share
ludwig.guru: # Disallow: /zh/share
ludwig.guru: # Disallow: /pt/share
ludwig.guru: # Disallow: /tr/share
ludwig.guru: # Disallow: /it/share/*
ludwig.guru: # Disallow: /ru/share/*
ludwig.guru: # Disallow: /en/share/*
ludwig.guru: # Disallow: /zh/share/*
ludwig.guru: # Disallow: /pt/share/*
ludwig.guru: # Disallow: /tr/share/*
ludwig.guru: # Disallow: /dictionary/*
ludwig.guru: # Disallow: /it/dictionary/*
ludwig.guru: # Disallow: /ru/dictionary/*
ludwig.guru: # Disallow: /en/dictionary/*
ludwig.guru: # Disallow: /pt/dictionary/*
ludwig.guru: # Disallow: /zh/dictionary/*
ludwig.guru: # Disallow: /tr/dictionary/*
ludwig.guru: # Disallow: /share
ludwig.guru: # Disallow: /share/*
ludwig.guru: # Disallow: /it/share
ludwig.guru: # Disallow: /ru/share
ludwig.guru: # Disallow: /en/share
ludwig.guru: # Disallow: /zh/share
ludwig.guru: # Disallow: /pt/share
ludwig.guru: # Disallow: /tr/share
ludwig.guru: # Disallow: /it/share/*
ludwig.guru: # Disallow: /ru/share/*
ludwig.guru: # Disallow: /en/share/*
ludwig.guru: # Disallow: /zh/share/*
ludwig.guru: # Disallow: /pt/share/*
ludwig.guru: # Disallow: /tr/share/*
delfi.lt: # $Revision: 1.25 $ $Date: 2020-01-31 08:44:56 $
bmail.uol.com.br: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
sketch.com: # If you’re trying to hide a page from search results use the 'noindex' instead:
sketch.com: # https://developers.google.com/search/docs/advanced/crawling/block-indexing
sketch.com: #
sketch.com: # Please ensure you have gone through the documentation before editing:
sketch.com: # https://developers.google.com/search/reference/robots_txt
sketch.com: #
sketch.com: # In case of conflicts, the less restrictive rules will prevail.
sudouest.fr: # Allowed search engines directives
sudouest.fr: #Sitemaps
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
sudouest.fr: #
juooo.com: # robots.txt generated at http://tool.chinaz.com/robots/
lexisnexis.com: # Ignore FrontPage files
lexisnexis.com: # Ignore Other Files
lexisnexis.com: # Ignore some forms
lexisnexis.com: # Ignore Law School test dir
lexisnexis.com: # Ignore Other folders
lexisnexis.com: # Ignore clients
lexisnexis.com: # Ignore au communities
lexisnexis.com: # Include sitemap
lexisnexis.com: # Ignore search.aspx
lexisnexis.com: # Ignore Martindale-Hubbell
lexisnexis.com: # Ignore flash
lexisnexis.com: # Ignore support
lexisnexis.com: #store pages
lexisnexis.com: # Ignore Webcasting
lexisnexis.com: # Ignore LSBO
lexisnexis.com: #Ignore Accurint
lexisnexis.com: #Ignore ppc pages
lexisnexis.com: #Ignore downloads
lexisnexis.com: #Ignore lexisONE
lexisnexis.com: #Ignore campaign Ravel View
lexisnexis.com: #Ignore old newsroom
lexisnexis.com: #Ignore old lawschool page
lexisnexis.com: #Ignore members pages
lexisnexis.com: #Ignore test pages
fema.gov: #
fema.gov: # robots.txt
fema.gov: #
fema.gov: # This file is to prevent the crawling and indexing of certain parts
fema.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
fema.gov: # and Google. By telling these "robots" where not to go on your site,
fema.gov: # you save bandwidth and server resources.
fema.gov: #
fema.gov: # This file will be ignored unless it is at the root of your host:
fema.gov: # Used: http://example.com/robots.txt
fema.gov: # Ignored: http://example.com/site/robots.txt
fema.gov: #
fema.gov: # For more information about the robots.txt standard, see:
fema.gov: # http://www.robotstxt.org/robotstxt.html
fema.gov: # CSS, JS, Images
fema.gov: # Directories
fema.gov: # Files
fema.gov: # Paths (clean URLs)
fema.gov: # Paths (no clean URLs)
desidime.com: # Hello
desidime.com: #
desidime.com: # If you are a Human and reading this, It means you eat, sleep, dream SEO.
desidime.com: #
desidime.com: # We are implementing white-hat SEO growth hacking techniques on our site.
desidime.com: #
desidime.com: # If you are a growth hacker and technical aspects of SEO makes you excited, you have found a right team. Apply to us at jobs@desidime.com and dont forget to mention that you found us via Robots.txt for bonus points. ;)
desidime.com: #
programme.tv: # robots.txt file for Télé 2 Semaines
programme.tv: # desktop & mobile
programme.tv: # https://www.robotstxt.org/
yunexpress.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
yunexpress.com: #content{margin:0 0 0 2%;position:relative;}
rsc.org: ##ACAP version=1.0
rsc.org: # allow contracted search
rsc.org: # block GuideBot
rsc.org: # block robots
rsc.org: # Editors symposium files
rsc.org: # allow contracted search
rsc.org: # User-agent: gsa-crawler
rsc.org: # block GuideBot
rsc.org: # User-agent: Guidebot
rsc.org: # Disallow: /
rsc.org: # block robots
rsc.org: # User-agent: *
rsc.org: # Disallow: /Membership/Memberzone/
rsc.org: # Disallow: /is/
rsc.org: # Disallow: /publishing/journals/rssfeed.asp
rsc.org: # Yahoo crawl
rsc.org: #Conference Pages
rsc.org: #Exam File for Robert Bowles
rsc.org: # e-Membership
rsc.org: #HD75942 11:11 08/02/2012
rsc.org: # Denial URLs
cruisefashion.com: #divMobSearch {
cruisefashion.com: #mp-menu {
cruisefashion.com: #BodyWrap.headerFix header.HeaderWrap {
cruisefashion.com: #BodyWrap.headerFix #divMobSearch {
cruisefashion.com: #BodyWrap #divMobSearch {
cruisefashion.com: #BodyWrap.headerFix header.HeaderWrap {
cruisefashion.com: #BodyWrap.headerFix #divMobSearch {
cruisefashion.com: #BodyWrap.headerFix .HeaderTopCrus {
cruisefashion.com: #BodyWrap #divMobSearch {
cruisefashion.com: #BodyWrap.headerFix #divMobSearch {
cruisefashion.com: #mp-menu {
oregonstate.edu: #
oregonstate.edu: # robots.txt
oregonstate.edu: #
oregonstate.edu: # This file is to prevent the crawling and indexing of certain parts
oregonstate.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
oregonstate.edu: # and Google. By telling these "robots" where not to go on your site,
oregonstate.edu: # you save bandwidth and server resources.
oregonstate.edu: #
oregonstate.edu: # This file will be ignored unless it is at the root of your host:
oregonstate.edu: # Used: http://example.com/robots.txt
oregonstate.edu: # Ignored: http://example.com/site/robots.txt
oregonstate.edu: #
oregonstate.edu: # For more information about the robots.txt standard, see:
oregonstate.edu: # http://www.robotstxt.org/robotstxt.html
oregonstate.edu: # CSS, JS, Images
oregonstate.edu: # Directories
oregonstate.edu: # Files
oregonstate.edu: # Paths (clean URLs)
oregonstate.edu: # Paths (no clean URLs)
buenosaires.gob.ar: #
buenosaires.gob.ar: # robots.txt
buenosaires.gob.ar: #
buenosaires.gob.ar: # This file is to prevent the crawling and indexing of certain parts
buenosaires.gob.ar: # of your site by web crawlers and spiders run by sites like Yahoo!
buenosaires.gob.ar: # and Google. By telling these "robots" where not to go on your site,
buenosaires.gob.ar: # you save bandwidth and server resources.
buenosaires.gob.ar: #
buenosaires.gob.ar: # This file will be ignored unless it is at the root of your host:
buenosaires.gob.ar: # Used: http://example.com/robots.txt
buenosaires.gob.ar: # Ignored: http://example.com/site/robots.txt
buenosaires.gob.ar: #
buenosaires.gob.ar: # For more information about the robots.txt standard, see:
buenosaires.gob.ar: # http://www.robotstxt.org/robotstxt.html
buenosaires.gob.ar: # Directories
buenosaires.gob.ar: # Files
buenosaires.gob.ar: # Paths (clean URLs)
buenosaires.gob.ar: # Paths (no clean URLs)
buenosaires.gob.ar: #Mantis 82858
form.run: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
form.run: #
form.run: # To ban all spiders from the entire site uncomment the next two lines:
form.run: # User-agent: *
form.run: # Disallow: /
osvita.ua: # robots.txt for http://osvita.ua
doxy.me: # www.robotstxt.org/
doxy.me: # Allow crawling of all content
the-qrcode-generator.com: # robotstxt.org/
varsitytutors.com: # production site robots.txt
varsitytutors.com: # for site: www.varsitytutors.com
puravidabracelets.com: # we use Shopify as our ecommerce platform
puravidabracelets.com: # Google adsbot ignores robots.txt unless specifically named!
alberta.ca: # robot exclusion file for www.gov.ab.ca/www2.gov.ab.ca
alberta.ca: # see http://www.robotstxt.org/wc/exclusion-admin.html for format
affiliates.one: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
affiliates.one: #
affiliates.one: # To ban all spiders from the entire site uncomment the next two lines:
affiliates.one: # User-Agent: *
affiliates.one: # Disallow: /
islamweb.net: # Rule 1
islamweb.net: # Rule 2 (indexing on new)
islamweb.net: # Rule 3 (old pages - indexing on new)
planoly.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
gla.ac.uk: #added by dg to stop google indexing old sites [2011/08/15]
gla.ac.uk: #added 2011/08/18 SG
gla.ac.uk: #from access.conf
gla.ac.uk: #Disallow: /services/library/
gla.ac.uk: #Disallow: /undergraduate/prospectus/
gla.ac.uk: #aliases from httpd.conf
gla.ac.uk: #Disallow: /Media/
google.lv: # AdsBot
google.lv: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
tecnoblog.net: # All robots
tecnoblog.net: # remova os diretorios
tecnoblog.net: # Comunidade
tecnoblog.net: #Produtos
tecnoblog.net: # Multilang
tecnoblog.net: # remover scrips css e afins
tecnoblog.net: # Bloqueando URLs dinĆ¢micas
tecnoblog.net: # Robôs Diversos
tecnoblog.net: # Yandex
tecnoblog.net: # Bingbot
tecnoblog.net: # Sogou
tecnoblog.net: #Adsense
linkin.bio: # http://www.robotstxt.org
fzg360.com: #
fzg360.com: # robots.txt for fzg360.com
fzg360.com: # Version v2018
fzg360.com: #
fzg360.com: # Allow
fzg360.com: # Disallow
huamu.com: ## -----------------------------------------------------------------------------
huamu.com: ## fileEncoding = UTF-8
huamu.com: ## 禁止爬虫爬取无效URL,提升网站核心静态资源抓取及索引效率。
huamu.com: ## 无效URL包含:已下线产品线的URL,全动态URL,需权限验证的URL,存在问题的旧静态URL
huamu.com: ## 等各种无需被搜索引擎收录的URL。
huamu.com: ## -----------------------------------------------------------------------------
huamu.com: # robots.txt for careless3 2016.01.13
ironcladapp.com: # robotstxt.org/
newsis.com: # robots.txt generated at http://www.adop.cc
jbhifi.com.au: # we use Shopify as our ecommerce platform
jbhifi.com.au: #modified using Cloudflare Workers
jbhifi.com.au: # Google adsbot ignores robots.txt unless specifically named!
alza.cz: # robots.txt for https://www.alza.cz/
qvc.com: # Throttle bingbot
qvc.com: # HTML Pages
qvc.com: # Affiliates
qvc.com: # Internal Pages
qvc.com: # HTML and PDF Includes
qvc.com: #Legacy
qvc.com: #AEM CHECKOUT
build.com: #Sitemaps
princeton.edu: #
princeton.edu: # robots.txt
princeton.edu: #
princeton.edu: # This file is to prevent the crawling and indexing of certain parts
princeton.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
princeton.edu: # and Google. By telling these "robots" where not to go on your site,
princeton.edu: # you save bandwidth and server resources.
princeton.edu: #
princeton.edu: # This file will be ignored unless it is at the root of your host:
princeton.edu: # Used: http://example.com/robots.txt
princeton.edu: # Ignored: http://example.com/site/robots.txt
princeton.edu: #
princeton.edu: # For more information about the robots.txt standard, see:
princeton.edu: # http://www.robotstxt.org/robotstxt.html
princeton.edu: # CSS, JS, Images
princeton.edu: # Directories
princeton.edu: # Files
princeton.edu: # Paths (clean URLs)
princeton.edu: # Paths (no clean URLs)
pacsun.com: # Specifically allow search result pages to be crawled and indexed
pacsun.com: # Allow: *Search-Show*q=*
pacsun.com: # Prevent indexing of category-specific searches
pacsun.com: # Disallow crawling of specific pages and resources
pacsun.com: # Disallow: /*demandware.static*
pacsun.com: # Prevent indexing of specific pages and resources
pacsun.com: # Noindex: *country=*
toluna.com: #Disallow: /Content/
toluna.com: #Disallow: /
justmote.me: # https://www.robotstxt.org/robotstxt.html
meteofrance.com: #
meteofrance.com: # robots.txt
meteofrance.com: #
meteofrance.com: # This file is to prevent the crawling and indexing of certain parts
meteofrance.com: # of your site by web crawlers and spiders run by sites like Yahoo!
meteofrance.com: # and Google. By telling these "robots" where not to go on your site,
meteofrance.com: # you save bandwidth and server resources.
meteofrance.com: #
meteofrance.com: # This file will be ignored unless it is at the root of your host:
meteofrance.com: # Used: http://example.com/robots.txt
meteofrance.com: # Ignored: http://example.com/site/robots.txt
meteofrance.com: #
meteofrance.com: # For more information about the robots.txt standard, see:
meteofrance.com: # http://www.robotstxt.org/robotstxt.html
meteofrance.com: # CSS, JS, Images
meteofrance.com: # Directories
meteofrance.com: # Files
meteofrance.com: # Paths (clean URLs)
meteofrance.com: # Paths (no clean URLs)
meteofrance.com: # URL
newspicks.com: # robotstxt.org
riverisland.com: #UK
riverisland.com: #additions {31/10/16}
riverisland.com: # price - disallow any URL with price
riverisland.com: # sizes - disallow URLs with any size
riverisland.com: # combination of four facets
piazza.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
piazza.com: #
piazza.com: # To ban all spiders from the entire site uncomment the next two lines:
umass.edu: # Directories
umass.edu: # Files
umass.edu: # Paths (clean URLs)
umass.edu: # Paths (no clean URLs)
pinterest.ca: # Pinterest is hiring!
pinterest.ca: #
pinterest.ca: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c
pinterest.ca: #
pinterest.ca: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering
shrm.org: # System Disallow Urls:
shrm.org: # User defined configurations:
ikman.lk: # Sitemap
ikman.lk: # Excludes
ikman.lk: # Blog
ikman.lk: # Promotions
ikman.lk: # msn
transactiondesk.com: # go away
telerama.fr: # robots.txt
telerama.fr: #Directories
telerama.fr: #Disallow: /sites/tr_master/
telerama.fr: #Files
telerama.fr: #Paths (clean URLs)
telerama.fr: #Paths (no clean URLs)
telerama.fr: #Ne pas indexer la recherche
telerama.fr: #CSS, JS, Images
telerama.fr: # Sitemaps
expensify.com: # For all crawlers
expensify.com: # Whitelist specific pages
expensify.com: # Allow: /$ is to prevent us from blocking our root domain as is since we are doing Disallow: /
expensify.com: # Disallow everything else
bbcgoodfood.com: #Member Sitemap
bbcgoodfood.com: # News Sitemap
bbcgoodfood.com: # Sitemap archive
bbcgoodfood.com: # Sitemap archive
gatech.edu: #
gatech.edu: # robots.txt
gatech.edu: #
gatech.edu: # This file is to prevent the crawling and indexing of certain parts
gatech.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
gatech.edu: # and Google. By telling these "robots" where not to go on your site,
gatech.edu: # you save bandwidth and server resources.
gatech.edu: #
gatech.edu: # This file will be ignored unless it is at the root of your host:
gatech.edu: # Used: http://example.com/robots.txt
gatech.edu: # Ignored: http://example.com/site/robots.txt
gatech.edu: #
gatech.edu: # For more information about the robots.txt standard, see:
gatech.edu: # http://www.robotstxt.org/wc/robots.html
gatech.edu: #
gatech.edu: # For syntax checking, see:
gatech.edu: # http://www.sxw.org.uk/computing/robots/check.html
gatech.edu: # Directories
gatech.edu: # Files
gatech.edu: # Paths (clean URLs)
gatech.edu: # Paths (no clean URLs)
selfgrowth.com: # $Id: robots.txt,v 1.9.2.1 2008/12/10 20:12:19 goba Exp $
selfgrowth.com: #
selfgrowth.com: # robots.txt
selfgrowth.com: #
selfgrowth.com: # This file is to prevent the crawling and indexing of certain parts
selfgrowth.com: # of your site by web crawlers and spiders run by sites like Yahoo!
selfgrowth.com: # and Google. By telling these "robots" where not to go on your site,
selfgrowth.com: # you save bandwidth and server resources.
selfgrowth.com: #
selfgrowth.com: # This file will be ignored unless it is at the root of your host:
selfgrowth.com: # Used: http://example.com/robots.txt
selfgrowth.com: # Ignored: http://example.com/site/robots.txt
selfgrowth.com: #
selfgrowth.com: # For more information about the robots.txt standard, see:
selfgrowth.com: # http://www.robotstxt.org/wc/robots.html
selfgrowth.com: #
selfgrowth.com: # For syntax checking, see:
selfgrowth.com: # http://www.sxw.org.uk/computing/robots/check.html
selfgrowth.com: # Directories
selfgrowth.com: # Files
selfgrowth.com: # Paths (clean URLs)
selfgrowth.com: # Paths (no clean URLs)
cdslindia.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
cdslindia.com: #content{margin:0 0 0 2%;position:relative;}
keywordtool.io: #
keywordtool.io: # robots.txt
keywordtool.io: #
keywordtool.io: # This file is to prevent the crawling and indexing of certain parts
keywordtool.io: # of your site by web crawlers and spiders run by sites like Yahoo!
keywordtool.io: # and Google. By telling these "robots" where not to go on your site,
keywordtool.io: # you save bandwidth and server resources.
keywordtool.io: #
keywordtool.io: # This file will be ignored unless it is at the root of your host:
keywordtool.io: # Used: http://example.com/robots.txt
keywordtool.io: # Ignored: http://example.com/site/robots.txt
keywordtool.io: #
keywordtool.io: # For more information about the robots.txt standard, see:
keywordtool.io: # http://www.robotstxt.org/robotstxt.html
keywordtool.io: # Crawl-delay: 10
keywordtool.io: # CSS, JS, Images
keywordtool.io: # Directories
keywordtool.io: # Files
keywordtool.io: # Paths (clean URLs)
keywordtool.io: # Paths (no clean URLs)
keywordtool.io: # Wordpress Blog
roberthalf.com: #
roberthalf.com: # robots.txt
roberthalf.com: #
roberthalf.com: # This file is to prevent the crawling and indexing of certain parts
roberthalf.com: # of your site by web crawlers and spiders run by sites like Yahoo!
roberthalf.com: # and Google. By telling these "robots" where not to go on your site,
roberthalf.com: # you save bandwidth and server resources.
roberthalf.com: #
roberthalf.com: # This file will be ignored unless it is at the root of your host:
roberthalf.com: # Used: http://example.com/robots.txt
roberthalf.com: # Ignored: http://example.com/site/robots.txt
roberthalf.com: #
roberthalf.com: # For more information about the robots.txt standard, see:
roberthalf.com: # http://www.robotstxt.org/robotstxt.html
roberthalf.com: # CSS, JS, Images
roberthalf.com: # Directories
roberthalf.com: # Files
roberthalf.com: ## Allow rh-job-search.xml to be crawled
roberthalf.com: # Paths (clean URLs)
roberthalf.com: # Paths (no clean URLs)
roberthalf.com: # XML sitemap
expansion.com: # Diciembre 2020
pinterest.ru: # Pinterest is hiring!
pinterest.ru: #
pinterest.ru: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c
pinterest.ru: #
pinterest.ru: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering
pinterest.cl: # Pinterest is hiring!
pinterest.cl: #
pinterest.cl: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c
pinterest.cl: #
pinterest.cl: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering
epa.gov: #
epa.gov: # robots.txt
epa.gov: #
epa.gov: # This file is to prevent the crawling and indexing of certain parts
epa.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
epa.gov: # and Google. By telling these "robots" where not to go on your site,
epa.gov: # you save bandwidth and server resources.
epa.gov: #
epa.gov: # This file will be ignored unless it is at the root of your host:
epa.gov: # Used: http://example.com/robots.txt
epa.gov: # Ignored: http://example.com/site/robots.txt
epa.gov: #
epa.gov: # For more information about the robots.txt standard, see:
epa.gov: # http://www.robotstxt.org/robotstxt.html
epa.gov: # CSS, JS, Images
epa.gov: # Directories
epa.gov: # Files
epa.gov: # Paths (clean URLs)
epa.gov: # Paths (no clean URLs)
epa.gov: # 15Jul2020 pbuch replaced dynamically inserted sitemap links
epa.gov: # with single static sitemap index
syncfusion.com: # All robots will spider the domain
syncfusion.com: # To disallow kb tags listing pages
syncfusion.com: # To disallow the retired products
dietdoctor.com: ## DD + global rules
dietdoctor.com: # Avoid duplicate content based on comment query arguments
dietdoctor.com: # Avoid unnecessary crawling of news archives with from_post argument
dietdoctor.com: # Archives, internal search and similar
dietdoctor.com: # Members only content
dietdoctor.com: # News archive beyond first two pages
dietdoctor.com: # Old date archive
dietdoctor.com: # New date archive
dietdoctor.com: # Sitemap
dietdoctor.com: ## SE
dietdoctor.com: # Avoid duplicate content based on comment query arguments
dietdoctor.com: # Archives, and similar
dietdoctor.com: # Members only content
dietdoctor.com: # News archive beyond first two pages
dietdoctor.com: # Old date archive
dietdoctor.com: # New date archive
dietdoctor.com: # Sitemap
dietdoctor.com: ## ES
dietdoctor.com: # News archive beyond first two pages
dietdoctor.com: # New date archive
dietdoctor.com: # Members only content
dietdoctor.com: # Sitemap
cretalive.gr: #
cretalive.gr: # robots.txt
cretalive.gr: #
cretalive.gr: # This file is to prevent the crawling and indexing of certain parts
cretalive.gr: # of your site by web crawlers and spiders run by sites like Yahoo!
cretalive.gr: # and Google. By telling these "robots" where not to go on your site,
cretalive.gr: # you save bandwidth and server resources.
cretalive.gr: #
cretalive.gr: # This file will be ignored unless it is at the root of your host:
cretalive.gr: # Used: http://example.com/robots.txt
cretalive.gr: # Ignored: http://example.com/site/robots.txt
cretalive.gr: #
cretalive.gr: # For more information about the robots.txt standard, see:
cretalive.gr: # http://www.robotstxt.org/robotstxt.html
cretalive.gr: # CSS, JS, Images
cretalive.gr: # Directories
cretalive.gr: # Files
cretalive.gr: # Paths (clean URLs)
cretalive.gr: # Paths (no clean URLs)
447651.com: #robots.txt for all our sites
wines-info.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
wines-info.com: #content{margin:0 0 0 2%;position:relative;}
minepi.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
bankofindia.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
bankofindia.com: #content{margin:0 0 0 2%;position:relative;}
nv.gov: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
nv.gov: #content{margin:0 0 0 2%;position:relative;}
thestar.com: # Allow Mediapartners-Google
thestar.com: # Disallow Specific Robots
thenorthface.com: # robots.txt for www.thenorthface.com
thenorthface.com: #added 1-19-18
thenorthface.com: #added 7-31-19
thenorthface.com: #sitemaps
travelandleisure.com: # Sitemaps
travelandleisure.com: #legacy
travelandleisure.com: #Onecms
travelandleisure.com: #content
travelandleisure.com: #legacy
travelandleisure.com: #Onecms
travelandleisure.com: #content
androidcentral.com: # $Id: robots.txt,v 1.4.4.3 2008/11/04 09:14:25 hass Exp $
androidcentral.com: #
androidcentral.com: # robots.txt
androidcentral.com: #
androidcentral.com: # This file is to prevent the crawling and indexing of certain parts
androidcentral.com: # of your site by web crawlers and spiders run by sites like Yahoo!
androidcentral.com: # and Google. By telling these &quot;robots&quot; where not to go on your site,
androidcentral.com: # you save bandwidth and server resources.
androidcentral.com: #
androidcentral.com: # This file will be ignored unless it is at the root of your host:
androidcentral.com: # Used: http://example.com/robots.txt
androidcentral.com: # Ignored: http://example.com/site/robots.txt
androidcentral.com: #
androidcentral.com: # For more information about the robots.txt standard, see:
androidcentral.com: # http://www.robotstxt.org/robotstxt.html
androidcentral.com: #
androidcentral.com: # For syntax checking, see:
androidcentral.com: # http://www.frobee.com/robots-txt-check
androidcentral.com: # Directories
androidcentral.com: # Files
androidcentral.com: # Paths (clean URLs)
androidcentral.com: # Paths (no clean URLs)
fintel.io: # LinkFluence
ad.nl: # Tell robots that the webview pages are not very interesting
ad.nl: # Articles which should not be listed in google search index:
ad.nl: # tu-e-zet-directeur-op-non-actief~ab9e5892/
bbva.es: # Directorios:
bbva.es: # URL:
bbva.es: # Buscador interno:
timesofisrael.com: # Google Image
timesofisrael.com: # Google AdSense
timesofisrael.com: # digg mirror
timesofisrael.com: # Twiiter
timesofisrael.com: # Google News
timesofisrael.com: # MSN
timesofisrael.com: # global
lloydsbank.co.uk: # v 1.1
lloydsbank.co.uk: # www.lloydsbank.com
duden.de: #
duden.de: # robots.txt
duden.de: #
duden.de: # This file is to prevent the crawling and indexing of certain parts
duden.de: # of your site by web crawlers and spiders run by sites like Yahoo!
duden.de: # and Google. By telling these "robots" where not to go on your site,
duden.de: # you save bandwidth and server resources.
duden.de: #
duden.de: # This file will be ignored unless it is at the root of your host:
duden.de: # Used: http://example.com/robots.txt
duden.de: # Ignored: http://example.com/site/robots.txt
duden.de: #
duden.de: # For more information about the robots.txt standard, see:
duden.de: # http://www.robotstxt.org/robotstxt.html
duden.de: # CSS, JS, Images
duden.de: # Directories
duden.de: # Files
duden.de: # Paths (clean URLs)
duden.de: # Paths (no clean URLs)
starfall.com: # Rule 1
starfall.com: #
starfall.com: # Exclusions
starfall.com: #
starfall.com: #
starfall.com: # Begin special list for /n/
starfall.com: #
starfall.com: # Allow: /n/N-info
starfall.com: #
starfall.com: # Begin special list for /n/level-*
starfall.com: #
starfall.com: #
starfall.com: # Continue more general /n/
starfall.com: #
starfall.com: #
starfall.com: # End special list for /n/
starfall.com: #
starfall.com: #
starfall.com: # Special addition since these were previously disallowed and are now allowed.
starfall.com: # 20170421 - RBW
starfall.com: #
starfall.com: #
starfall.com: # Sitemap addition for HTML5 content
starfall.com: #
autotrader.co.uk: # This is the robots.txt for autotrader.co.uk
autotrader.co.uk: # _ ___________ _
autotrader.co.uk: # ////////// / \ |____ ____| | |
autotrader.co.uk: # /////////// / _ \ _ _ _ ____ | | _ ____ ___| | ____ _
autotrader.co.uk: # //////////// / /_\ \ | | | | / \__ / \ | | | \__ / \ / | / _ \ | \___
autotrader.co.uk: # / _____ \ | | | | | _/ | /\ | | | | __/ | /\ | | /\ | | [_] \ | __/
autotrader.co.uk: # //////////// / / \ \ | | | | | | | | | | | | | | | | | | | | | | | ____| | |
autotrader.co.uk: # /////////// / / \ \ | \/ | | |__/\ | \/ | | | | | | \/ _ \ | \/ | | \____ | |
autotrader.co.uk: # ////////// /_/ \_\ \____/ \_____/ \____/ |_| |_| \___/ \/ \___/\/ \_____/ |_|
autotrader.co.uk: #
autotrader.co.uk: # ========================================================================================================
autotrader.co.uk: # | Auto Trader are hiring - Check out our jobs at https://careers.autotrader.co.uk/jobs |
autotrader.co.uk: # ========================================================================================================
visa.com: #logo { position: absolute; top: 20px; left: 16px; }
visa.com: #content { position: absolute; top: 146px; left: 96px; color: #000000; width: 623px; }
visa.com: #footer { position: absolute; top: 384px; left: 2px; width: 500px; height: 76px; margin: 45px 0 0 13px; float: left; font-size: 0.85em; color: #003399; overflow: hidden; }
visa.com: #copyright { color: #999999; margin-top: 5px; }
visa.com: #footer a { text-decoration: none; }
visa.com: #footer a:hover { text-decoration: underline;}
proposify.com: # robots.txt for https://www.proposify.com/
proposify.com: # live - don't allow web crawlers to index cpresources/ or vendor/
india.gov.in: #searchForm label { display: none;}
bdjobs.com: # robots.txt file for www.bdjobs.com
bdjobs.com: # All other agents will not spider
bdjobs.com: # Google will not spider
bdjobs.com: # Google Ad Sense
bdjobs.com: # Yahoo
bdjobs.com: # Bing
bdjobs.com: # GA Checker
bdjobs.com: # Screaming Frog SEO Spider
bdjobs.com: # Visual SEO Studio
bdjobs.com: # LinkedInBot
bdjobs.com: # All other agents will not spider
halifax.co.uk: # v 1.1
halifax.co.uk: # www.halifax.co.uk
realsimple.com: # Sitemaps
realsimple.com: # legacy
realsimple.com: #Onecms
realsimple.com: #Content
realsimple.com: # legacy
realsimple.com: #Onecms
realsimple.com: #Content
brainly.pl: #Brainly Robots.txt 31.07.2017
brainly.pl: # Disallow Marketing bots
brainly.pl: #Disallow exotic search engine crawlers
brainly.pl: #Disallow other crawlers
brainly.pl: # Good bots whitelisting:
brainly.pl: #Other bots
brainly.pl: #Neticle Crawler v1.0 ( http://bot.neticle.hu/ ) https://bot.neticle.hu/ - brand monitoring
brainly.pl: #Mega https://megaindex.com/crawler - link indexer tool (supports directives in user-agent:*)
brainly.pl: #Obot - IBM X-Force service
brainly.pl: #SafeDNSBot (https://www.safedns.com/searchbot)
getgo.com: # Sitemaps and Autodiscovers
exportersindia.com: #Robots.txt for ExportersIndia.com
surokkha.gov.bd: # https://www.robotstxt.org/robotstxt.html
zola.com: # Sitemaps
zola.com: # Disallows
zola.com: # Allow Overrides
zola.com: # Grant access for Find a Couple
abc.com: # Block trendkite-akashic-crawler
pinterest.it: # Pinterest is hiring!
pinterest.it: #
pinterest.it: # Learn about the SEO work that we're doing at https://medium.com/@Pinterest_Engineering/demystifying-seo-with-experiments-a183b325cf4c
pinterest.it: #
pinterest.it: # Check out some of our available positions at https://careers.pinterest.com/careers/engineering
truecar.com: # prod
chengdun.com: #1 /application/nginx-1.12.2/html/www/chengdunbaozhang/ThinkPHP/Library/Think/App.class.php(38): Think\Dispatcher::dispatch()<br />
chengdun.com: #2 /application/nginx-1.12.2/html/www/chengdunbaozhang/ThinkPHP/Library/Think/App.class.php(195): Think\App::init()<br />
chengdun.com: #3 /application/nginx-1.12.2/html/www/chengdunbaozhang/ThinkPHP/Library/Think/Think.class.php(120): Think\App::run()<br />
chengdun.com: #4 /application/nginx-1.12.2/html/www/chengdunbaozhang/ThinkPHP/ThinkPHP.php(97): Think\Think::start()<br />
chengdun.com: #5 /application/nginx-1.12.2/html/www/chengdunbaozhang/web/index.php(37): require('/application/ng...')<br />
chengdun.com: #6 {main}</p>
expedia.com.hk: #
expedia.com.hk: # General bots
expedia.com.hk: #
expedia.com.hk: #hotel
expedia.com.hk: #flight
expedia.com.hk: #package
expedia.com.hk: #car
expedia.com.hk: #activities
expedia.com.hk: #cruise
expedia.com.hk: #other
expedia.com.hk: #
expedia.com.hk: # Google Ads
expedia.com.hk: #
expedia.com.hk: #
expedia.com.hk: #
expedia.com.hk: # Bing Ads
expedia.com.hk: #
expedia.com.hk: #
expedia.com.hk: # SemrushBot
expedia.com.hk: #
im286.net: #
im286.net: # robots.txt for Discuz! X3
im286.net: #
okdiario.com: #Permitir rastreo en paginaciones de contenidos evergreen
okdiario.com: # Paginaciones limitadas
okdiario.com: #Paginaciones LOOK
okdiario.com: #Directorios bloqueados
okdiario.com: #Extensiones de contenidos no indexables
okdiario.com: #Sitemaps Okdiario
okdiario.com: #Sitemaps Look
okdiario.com: #Bloqueo de agentes
directhit.com: ## Default robots.txt
ontraport.com: # www.robotstxt.org/
ontraport.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
fark.com: # $Id: robotsalt.txt 10113 2010-10-25 19:37:18Z mandrews $
oregon.gov: # Last Updated: 11/13/19 by MCS, NIC
oregon.gov: # ---------------------- Statewide ----------------------
oregon.gov: # Note: Site-stored robots.txt are not honored, except on host-header sites / subdomains
oregon.gov: # All directives for www.oregon.gov are to be stored here
oregon.gov: # Remove an indexed SERP and/or submit a 'Remove URL' request in Webmaster Tools
oregon.gov: # Note: The Allow directives are added for many Google-specific Mobile Tests to fully render the page.
oregon.gov: # Without these directives, the state sites could get poor grades for mobile-friendliness which
oregon.gov: # can result is a lower Page Rank and other SEO scores, as well as incorrect analytics in
oregon.gov: # the Google Analytics product.
oregon.gov: # ---------------------- DCBS ----------------------
oregon.gov: # Enables DCBS's Google Search Applicance to index paths otherwise blocked (e.g. Orders)
georgia.gov: #
georgia.gov: # robots.txt
georgia.gov: #
georgia.gov: # This file is to prevent the crawling and indexing of certain parts
georgia.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
georgia.gov: # and Google. By telling these "robots" where not to go on your site,
georgia.gov: # you save bandwidth and server resources.
georgia.gov: #
georgia.gov: # This file will be ignored unless it is at the root of your host:
georgia.gov: # Used: http://example.com/robots.txt
georgia.gov: # Ignored: http://example.com/site/robots.txt
georgia.gov: #
georgia.gov: # For more information about the robots.txt standard, see:
georgia.gov: # http://www.robotstxt.org/robotstxt.html
georgia.gov: # CSS, JS, Images
georgia.gov: # Directories
georgia.gov: # Files
georgia.gov: # Paths (clean URLs)
georgia.gov: # Paths (no clean URLs)
georgia.gov: # Book printer-friendly pages
infojobs.com.br: #
infojobs.com.br: # robots.txt Infojobs
infojobs.com.br: #
infojobs.com.br: # $ID: robots.txt,v 1.0 2006/05/17 17:14:00 Exp $
infojobs.com.br: #
infojobs.com.br: # Web site: infojobs.com.br
infojobs.com.br: # Descomentar esto cuando tengamos sitemaps y links internos en mobile.
infojobs.com.br: # User-agent: Googlebot-Mobile
infojobs.com.br: # User-Agent: YahooSeeker/M1A1-R2D2
infojobs.com.br: # User-Agent: MSNBOT_Mobile
infojobs.com.br: # Disallow: /
k5learning.com: #
k5learning.com: # robots.txt
k5learning.com: #
k5learning.com: # This file is to prevent the crawling and indexing of certain parts
k5learning.com: # of your site by web crawlers and spiders run by sites like Yahoo!
k5learning.com: # and Google. By telling these "robots" where not to go on your site,
k5learning.com: # you save bandwidth and server resources.
k5learning.com: #
k5learning.com: # This file will be ignored unless it is at the root of your host:
k5learning.com: # Used: http://example.com/robots.txt
k5learning.com: # Ignored: http://example.com/site/robots.txt
k5learning.com: #
k5learning.com: # For more information about the robots.txt standard, see:
k5learning.com: # http://www.robotstxt.org/robotstxt.html
k5learning.com: # CSS, JS, Images
k5learning.com: # Directories
k5learning.com: # Files
k5learning.com: # Paths (clean URLs)
k5learning.com: # Paths (no clean URLs)
k5learning.com: # MJ12bot
easypost.com: # robotstxt.org
easypost.com: #
easypost.com: # Access to easypost.com is pursuant to our terms of service, located at
easypost.com: # https://www.easypost.com/privacy
easypost.com: #
easypost.com: # .
easypost.com: # c.
easypost.com: # ;,
easypost.com: # ,c
easypost.com: # .. ':;:,... .
easypost.com: # ,;;',,,;;; .,;,,,,,,,,;. '' :,;.''
easypost.com: # ;,,;c;:l,; .;,,,,,,,,,,,:. .;,;:;c:,:
easypost.com: # :::;,,,:, :,,,,;;:::;;;;;. ,c:;;,;::
easypost.com: # :,,,,,: .:,;;'.lcdddNOdkOO; ;,,,,,;.
easypost.com: # :cccc, c,:..,. ' ' Ox' ;ccccc
easypost.com: # ,cccc, :,:...'....;....d:. occcc
easypost.com: # 'llll: :,;;''''''''''''.. olloo
easypost.com: # dlcco ;,,,,,,,,,,,,,,c :ccco
easypost.com: # ccccl. ,,,,;:,.;..;.;c; collx
easypost.com: # ,lollc ',,:,;;'c'':':. :cccd
easypost.com: # dcccl. .;,:;.' '..,..' ollod
easypost.com: # ,cclol c,,,;;;;;,,,,c. .occcl
easypost.com: # lolcc; ;;;;,,,,''''';.. cllol;
easypost.com: # .lcccl; .'...............'.. .lccco.
easypost.com: # .lclooc. '.....'',,,,,,,,,,,,,;..ololc;
easypost.com: # locccl;. .';',,;;;,,,,,,,,,,,,,,,,lclccol
easypost.com: # 'lccooolc,,::,,,,,;::::::;;:::::,c,ocl:
easypost.com: # ,odcccco;,c,,:::;,,,,,,,,,,,,;cc;oc.
easypost.com: # .;lccl:,c,,c;,,,,,,,,,,,,,,;ccc.
easypost.com: # .lc:,;:,,;c,,,,,,,,,,,,,,;c:
easypost.com: # .'c;,,,,c,,,,,,,,,,,,,,;c;
easypost.com: # ;,,,,,c,,,,,,,,,,,,,,;c;
easypost.com: # ,,,,,,c,,,,,,,,,,,,;l:c;
easypost.com: #Baiduspider
pymnts.com: #bad bots#
trailblazer.me: #
trailblazer.me: # default robots.txt for sfdc communities sites
trailblazer.me: #
trailblazer.me: # For use by salesforce.com
trailblazer.me: #
nc.gov: #
nc.gov: # robots.txt
nc.gov: #
nc.gov: # This file is to prevent the crawling and indexing of certain parts
nc.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
nc.gov: # and Google. By telling these "robots" where not to go on your site,
nc.gov: # you save bandwidth and server resources.
nc.gov: #
nc.gov: # This file will be ignored unless it is at the root of your host:
nc.gov: # Used: http://example.com/robots.txt
nc.gov: # Ignored: http://example.com/site/robots.txt
nc.gov: #
nc.gov: # For more information about the robots.txt standard, see:
nc.gov: # http://www.robotstxt.org/robotstxt.html
nc.gov: # CSS, JS, Images
nc.gov: # Directories
nc.gov: # Files
nc.gov: # Paths (clean URLs)
nc.gov: # Paths (no clean URLs)
nc.gov: # AWS WAF Honeypot Endpoint Trap
coles.com.au: # /robots.txt for coles.com.au
livecoinwatch.com: # no, thank you
livecoinwatch.com: # everyone else, welcome... for now
livecoinwatch.com: # always fresh
gigazine.net: # /robots.txt file for Disallow: /
gigazine.net: # 2008/04/07 11:51
gigazine.net: # 2013/11/05 10:17 add ia_archiver by takaki
gigazine.net: # 2016/04/13 12:30 modify
gigazine.net: # 2018/12/26 09:42 refactored by log1d
gigazine.net: # 2020/04/16 10:26 modify by log1d
adthrive.com: # This space intentionally left blank
optum.com: #optum
getapp.com: # Blocks crawlers that are kind enough to obey robots
star-clicks.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
star-clicks.com: #content{margin:0 0 0 2%;position:relative;}
klix.ba: # robotstxt.org
upmedia.mg: #Googlebot
upmedia.mg: #Googlebot-Mobile
upmedia.mg: #Googlebot-News
upmedia.mg: #Googlebot-Image
upmedia.mg: #Facebot
upmedia.mg: #Twitterbot
upmedia.mg: #Bingbot
upmedia.mg: #Yahoo
upmedia.mg: #Alexa
upmedia.mg: #Baidu
poetryfoundation.org: # robots.txt for https://www.poetryfoundation.org/
poetryfoundation.org: # live - don't allow web crawlers to index cpresources/ or vendor/
digit.in: # www.robotstxt.org/
digit.in: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
brightmls.com: # https://www.robotstxt.org/robotstxt.html
digitec.ch: # @/ @/
digitec.ch: # @/ @/ Hello, fellow humans!
digitec.ch: # @/ @/
digitec.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@
digitec.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@
digitec.ch: # @@@@@@ @@@@@@@@@@@@ @@@@@@ @@@% @
digitec.ch: # @@@@@ /@@@@@@@@@@ @@@@@@ @@@ @@
digitec.ch: # @@@@@@ @@@@@@@@@@@, @@@@@@ @@@@ @@@@
digitec.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@
digitec.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@
digitec.ch: # @@@@@@@@@@@@@ @@@@@@@@@@@@@@ @@@@@@@@
digitec.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@
digitec.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@
sanfoundry.com: # Allow the following useragents for crawling the site
finra.org: #
finra.org: # robots.txt
finra.org: #
finra.org: # This file is to prevent the crawling and indexing of certain parts
finra.org: # of your site by web crawlers and spiders run by sites like Yahoo!
finra.org: # and Google. By telling these "robots" where not to go on your site,
finra.org: # you save bandwidth and server resources.
finra.org: #
finra.org: # This file will be ignored unless it is at the root of your host:
finra.org: # Used: http://example.com/robots.txt
finra.org: # Ignored: http://example.com/site/robots.txt
finra.org: #
finra.org: # For more information about the robots.txt standard, see:
finra.org: # http://www.robotstxt.org/robotstxt.html
finra.org: # CSS, JS, Images
finra.org: # Directories
finra.org: # Files
finra.org: # Paths (clean URLs)
finra.org: # Paths (no clean URLs)
milenio.com: #
milenio.com: # robots.txt
milenio.com: #
milenio.com: # This file is to prevent the crawling and indexing of certain parts
milenio.com: # of your site by web crawlers and spiders run by sites like Yahoo!
milenio.com: # and Google. By telling these "robots" where not to go on your site,
milenio.com: # you save bandwidth and server resources.
milenio.com: #
milenio.com: # This file will be ignored unless it is at the root of your host:
milenio.com: # Used: http://example.com/robots.txt
milenio.com: # Ignored: http://example.com/site/robots.txt
milenio.com: #
milenio.com: # For more information about the robots.txt standard, see:
milenio.com: # http://www.robotstxt.org/wc/robots.html
milenio.com: #
milenio.com: # For syntax checking, see:
milenio.com: # http://www.sxw.org.uk/computing/robots/check.html
ring.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
classkick.com: # Squarespace Robots Txt
20minutos.es: # Agentes permitidos explicitamente
20minutos.es: # Agentes bloqueados por idioma
20minutos.es: # Agentes nocivos
tkgm.gov.tr: #
tkgm.gov.tr: # robots.txt
tkgm.gov.tr: #
tkgm.gov.tr: # This file is to prevent the crawling and indexing of certain parts
tkgm.gov.tr: # of your site by web crawlers and spiders run by sites like Yahoo!
tkgm.gov.tr: # and Google. By telling these "robots" where not to go on your site,
tkgm.gov.tr: # you save bandwidth and server resources.
tkgm.gov.tr: #
tkgm.gov.tr: # This file will be ignored unless it is at the root of your host:
tkgm.gov.tr: # Used: http://example.com/robots.txt
tkgm.gov.tr: # Ignored: http://example.com/site/robots.txt
tkgm.gov.tr: #
tkgm.gov.tr: # For more information about the robots.txt standard, see:
tkgm.gov.tr: # http://www.robotstxt.org/robotstxt.html
tkgm.gov.tr: # CSS, JS, Images
tkgm.gov.tr: # Directories
tkgm.gov.tr: # Files
tkgm.gov.tr: # Paths (clean URLs)
tkgm.gov.tr: # Paths (no clean URLs)
scopus.com: # /robots.txt file for http://www.scopus.com/
ahoramismo.com: # Sitemap archive
jigsawplanet.com: # --- allow ad bots
jigsawplanet.com: # ---
diigo.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
diigo.com: #Disallow: /user
coindeskjapan.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/
alltrails.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
alltrails.com: #
123moviesfree.net: #logo {
123moviesfree.net: #menu {
123moviesfree.net: #menu ul.top-menu {
123moviesfree.net: #menu ul.top-menu li {
123moviesfree.net: #menu ul.top-menu li a {
123moviesfree.net: #menu ul.top-menu li:hover a,
123moviesfree.net: #menu ul.top-menu li.active a {
123moviesfree.net: #menu ul.top-menu li.active a {
123moviesfree.net: #menu .sub-container {
123moviesfree.net: #menu .sub-container ul.sub-menu {
123moviesfree.net: #menu .sub-container ul.sub-menu li {
123moviesfree.net: #menu .sub-container ul.sub-menu li a {
123moviesfree.net: #menu .sub-container ul.sub-menu li:hover a {
123moviesfree.net: #menu ul.top-menu li:hover .sub-container ul.sub-menu li a {
123moviesfree.net: #search {
123moviesfree.net: #search input.search-input {
123moviesfree.net: #search .search-submit {
123moviesfree.net: #search .search-submit i {
123moviesfree.net: #top-user {
123moviesfree.net: #top-user .top-user-content.guest {
123moviesfree.net: #main {
123moviesfree.net: #slider {
123moviesfree.net: #slider .swiper-slide {
123moviesfree.net: #slider .swiper-slide .slide-link {
123moviesfree.net: #slider .slide-caption {
123moviesfree.net: #slider:hover .slide-caption {
123moviesfree.net: #slider .slide-caption h2 {
123moviesfree.net: #slider .slide-caption .slide-caption-info {
123moviesfree.net: #slider .slide-caption .slide-caption-info .block {
123moviesfree.net: #slider .slide-caption .slide-caption-info .block strong {
123moviesfree.net: #top-news {
123moviesfree.net: #top-news .nav {
123moviesfree.net: #top-news .nav li {
123moviesfree.net: #top-news .nav li a {
123moviesfree.net: #top-news .nav li.active a,
123moviesfree.net: #top-news .nav li:hover a {
123moviesfree.net: #top-news .top-news {
123moviesfree.net: #top-news .top-news-content {
123moviesfree.net: #top-news .top-news-content .tn-news {
123moviesfree.net: #top-news .top-news-content .tn-notice {
123moviesfree.net: #top-news .top-news-content ul {
123moviesfree.net: #top-news .top-news-content ul.tn-news li {
123moviesfree.net: #top-news .top-news-content ul.tn-news li:hover {
123moviesfree.net: #top-news .top-news-content ul.tn-news li:hover .tnc-info h4 a {
123moviesfree.net: #top-news .top-news-content ul.tn-news li:hover .news-thumb {
123moviesfree.net: #top-news .top-news-content ul.tn-news li .news-thumb {
123moviesfree.net: #top-news .top-news-content ul.tn-news li .tnc-info {
123moviesfree.net: #top-news .top-news-content ul.tn-news li .tnc-info h4 {
123moviesfree.net: #top-news .top-news-content ul.tn-news li .tnc-info h4 a {
123moviesfree.net: #top-news .top-news-content ul.tn-news li.view-more {
123moviesfree.net: #top-news .top-news-content ul.tn-news li.view-more a {
123moviesfree.net: #top-news .top-news-content ul.tn-news li.view-more a i {
123moviesfree.net: #top-news .top-news-content ul.tn-notice li {
123moviesfree.net: #top-news .top-news-content ul.tn-notice li:hover {
123moviesfree.net: #top-news .top-news-content ul.tn-notice li a {
123moviesfree.net: #top-news .top-news-content ul.tn-notice li a span {
123moviesfree.net: #top-news .top-news-content ul.tn-notice li a span i {
123moviesfree.net: #top-news .top-news-content .tab-pane {
123moviesfree.net: #top-news .top-news-content .tab-pane .tnc-apps {
123moviesfree.net: #top-news .top-news-content .tab-pane .tnc-apps .tnca-block {
123moviesfree.net: #top-news .top-news-content .tab-pane .tnc-apps .tnca-block span {
123moviesfree.net: #top-news .top-news-content .tab-pane .tnc-apps .tnca-block i {
123moviesfree.net: #top-news .top-news-content .tab-pane .tnc-apps .tnca-block:hover {
123moviesfree.net: #top-news .top-news-content .tab-pane .tnc-apps .tnca-block.ios:hover i {
123moviesfree.net: #top-news .top-news-content .tab-pane .tnc-apps .tnca-block.android:hover i {
123moviesfree.net: #top-news .top-news-content ul.tn-premium {
123moviesfree.net: #top-news .top-news-content ul.tn-premium li {
123moviesfree.net: #top-news .top-news-content ul.tn-premium li a {
123moviesfree.net: #top-news .top-news-content ul.tn-premium li a:hover {
123moviesfree.net: #top-news .top-news-content ul.tn-premium li a .price {
123moviesfree.net: #top-news .top-news-content ul.tn-premium li a .btn {
123moviesfree.net: #bread .breadcrumb {
123moviesfree.net: #bread .breadcrumb a {
123moviesfree.net: #mv-info {
123moviesfree.net: #mv-info .mvi-cover {
123moviesfree.net: #mv-info .mvi-cover:after {
123moviesfree.net: #mv-info .mvi-cover:before {
123moviesfree.net: #mv-info .mvi-cover:hover:before {
123moviesfree.net: #mv-info .mvi-cover:hover:after {
123moviesfree.net: #mv-info .mvi-view {
123moviesfree.net: #mv-info .mvi-content {
123moviesfree.net: #mv-info .mvi-content h3 {
123moviesfree.net: #mv-info .mvi-content .block-trailer {
123moviesfree.net: #mv-info .mvi-content .block-trailer a {
123moviesfree.net: #mv-info .mvi-content .mvic-desc {
123moviesfree.net: #mv-info .mvi-content .mvic-desc .desc {
123moviesfree.net: #mv-info .mvi-content .mvic-info {
123moviesfree.net: #mv-info .mvi-content .mvic-info p {
123moviesfree.net: #mv-info .mvi-content .mvic-info .mvici-left {
123moviesfree.net: #mv-info .mvi-content .mvic-info .mvici-right {
123moviesfree.net: #mv-info .mvi-content .mvic-thumb {
123moviesfree.net: #mv-info .mvi-content .mvic-btn {
123moviesfree.net: #mv-info .mvi-content .mvic-btn .btn {
123moviesfree.net: #mv-info .mvi-content .quality {
123moviesfree.net: #mv-info .mvi-content .block-social {
123moviesfree.net: #mv-keywords {
123moviesfree.net: #mv-keywords a {
123moviesfree.net: #mv-keywords a:hover {
123moviesfree.net: #mv-keywords a h5 {
123moviesfree.net: #mv-keywords a h5:before {
123moviesfree.net: #media-player,
123moviesfree.net: #content-embed {
123moviesfree.net: #media-player.active,
123moviesfree.net: #content-embed.active {
123moviesfree.net: #bar-player {
123moviesfree.net: #bar-player .bp-view {
123moviesfree.net: #bar-player .bp-btn-light span:after {
123moviesfree.net: #bar-player .bp-btn-light.active span:after {
123moviesfree.net: #bar-player .bp-btn-auto span:after {
123moviesfree.net: #bar-player .bp-btn-auto.active span:after {
123moviesfree.net: #bar-player .btn {
123moviesfree.net: #bar-player .btn:hover {
123moviesfree.net: #bar-player .btn.active {
123moviesfree.net: #bar-player .bp-btn-light.active {
123moviesfree.net: #bar-player .btn i {
123moviesfree.net: #overlay {
123moviesfree.net: #overlay.active {
123moviesfree.net: #comment-area {
123moviesfree.net: #comment-area.active {
123moviesfree.net: #comment-area #toggle {
123moviesfree.net: #comment-area #comment {
123moviesfree.net: #comment-area #comment.active {
123moviesfree.net: #comment-area #comment .content {
123moviesfree.net: #comment-area #comment .cac-close {
123moviesfree.net: #comment-area #comment .cac-close i {
123moviesfree.net: #footer .footer-link {
123moviesfree.net: #footer .footer-link.end {
123moviesfree.net: #footer .footer-link-head {
123moviesfree.net: #footer .footer-link {
123moviesfree.net: #footer .footer-link.end {
123moviesfree.net: #footer .footer-link-head {
123moviesfree.net: #copyright {
123moviesfree.net: #footer .heading {
123moviesfree.net: #footer a {
123moviesfree.net: #footer a:hover {
123moviesfree.net: #footer b,
123moviesfree.net: #footer strong {
123moviesfree.net: #footer .links a {
123moviesfree.net: #footer .text-lighter {
123moviesfree.net: #footer .desc {
123moviesfree.net: #commentfb {
123moviesfree.net: #pop-login .modal-dialog,
123moviesfree.net: #pop-register .modal-dialog,
123moviesfree.net: #pop-forgot .modal-dialog {
123moviesfree.net: #pagination {
123moviesfree.net: #open-forgot {
123moviesfree.net: #menu.active {
123moviesfree.net: #search.active {
123moviesfree.net: #filter {
123moviesfree.net: #filter.active {
123moviesfree.net: #filter .fc-title {
123moviesfree.net: #filter ul {
123moviesfree.net: #filter ul li {
123moviesfree.net: #filter ul li.active {
123moviesfree.net: #filter ul li label {
123moviesfree.net: #filter ul li label input {
123moviesfree.net: #filter ul.fc-main-list {
123moviesfree.net: #filter ul.fc-main-list li {
123moviesfree.net: #filter ul.fc-main-list li a {
123moviesfree.net: #filter ul.fc-main-list li a.active {
123moviesfree.net: #filter ul.fc-main-list li a:hover {
123moviesfree.net: #filter .filter-btn {
123moviesfree.net: #filter .cs10-top .fc-filmtype {
123moviesfree.net: #filter .cs10-top .fc-quality {
123moviesfree.net: #list-eps {
123moviesfree.net: #list-eps .le-server {
123moviesfree.net: #list-eps .le-server:last-of-type {
123moviesfree.net: #list-eps .le-server .les-title {
123moviesfree.net: #list-eps .le-server .les-content {
123moviesfree.net: #list-eps .le-server .les-content .btn-eps {
123moviesfree.net: #list-eps .le-server .les-content .btn-eps.active {
123moviesfree.net: #list-eps .le-server .les-content .btn-eps.active:before {
123moviesfree.net: #list-eps .le-server .les-content .btn-eps:hover {
123moviesfree.net: #donate-paypal .modal-body form {
123moviesfree.net: #donate-paypal .modal-body form input[type=image] {
123moviesfree.net: #schedule-eps {
123moviesfree.net: #schedule-eps .se-next {
123moviesfree.net: #schedule-eps .se-next .fa-close {
123moviesfree.net: #schedule-eps .se-left {
123moviesfree.net: #schedule-eps .se-right {
123moviesfree.net: #schedule-eps .se-right a {
123moviesfree.net: #schedule-eps .se-list {
123moviesfree.net: #schedule-eps .se-list li {
123moviesfree.net: #schedule-eps .se-list li:hover {
123moviesfree.net: #schedule-eps .se-list li .se-left {
123moviesfree.net: #toggle-schedule {
123moviesfree.net: #toggle-schedule.active {
123moviesfree.net: #toggle-schedule.active .fa-close {
123moviesfree.net: #install-app {
123moviesfree.net: #install-app .container {
123moviesfree.net: #install-app .ia-icon {
123moviesfree.net: #install-app .ia-icon img {
123moviesfree.net: #install-app .ia-info {
123moviesfree.net: #install-app .ia-info .ia-title {
123moviesfree.net: #install-app .ia-info p {
123moviesfree.net: #install-app .ia-close {
123moviesfree.net: #watch-alert {}
123moviesfree.net: #watch-alert .alert {
123moviesfree.net: #switch-mode {
123moviesfree.net: #switch-mode .sm-icon {
123moviesfree.net: #switch-mode .sm-text {
123moviesfree.net: #switch-mode .sm-button {
123moviesfree.net: #switch-mode .sm-button span {
123moviesfree.net: #switch-mode.active .sm-button span {
123moviesfree.net: #switch-mode.active .sm-button {
123moviesfree.net: #homenews {
123moviesfree.net: #homenews h2 {
123moviesfree.net: #media-player{position:relative;}
elnabaa.net: # WebMatrix 1.0
shopdisney.com: # WKS 20190802 12:34
fontawesome.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
fontawesome.com: #
fontawesome.com: # To ban all spiders from the entire site uncomment the next two lines:
fontawesome.com: # User-agent: *
fontawesome.com: # Disallow: /
k73.com: #
k73.com: # robots.txt for PHPcom
k73.com: #
hdfc.com: #
hdfc.com: # robots.txt
hdfc.com: #
hdfc.com: # This file is to prevent the crawling and indexing of certain parts
hdfc.com: # of your site by web crawlers and spiders run by sites like Yahoo!
hdfc.com: # and Google. By telling these "robots" where not to go on your site,
hdfc.com: # you save bandwidth and server resources.
hdfc.com: #
hdfc.com: # This file will be ignored unless it is at the root of your host:
hdfc.com: # Used: http://example.com/robots.txt
hdfc.com: # Ignored: http://example.com/site/robots.txt
hdfc.com: #
hdfc.com: # For more information about the robots.txt standard, see:
hdfc.com: # http://www.robotstxt.org/robotstxt.html
hdfc.com: # CSS, JS, Images
hdfc.com: # Directories
hdfc.com: # Paths (clean URLs)
hdfc.com: # Sitemap
erickson.it: # Admin section
erickson.it: # CSS e JS
iminent.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
iminent.com: #content{margin:0 0 0 2%;position:relative;}
bportugal.pt: #sliding-popup.sliding-popup-bottom,#sliding-popup.sliding-popup-bottom .eu-cookie-withdraw-banner,.eu-cookie-withdraw-tab{background:#023F5A;}#sliding-popup.sliding-popup-bottom.eu-cookie-withdraw-wrapper{background:transparent}#sliding-popup .popup-content #popup-text h1,#sliding-popup .popup-content #popup-text h2,#sliding-popup .popup-content #popup-text h3,#sliding-popup .popup-content #popup-text p,.eu-cookie-compliance-secondary-button,.eu-cookie-withdraw-tab{color:#fff !important;}.eu-cookie-withdraw-tab{border-color:#fff;}.eu-cookie-compliance-more-button{color:#fff !important;}
si.edu: #
si.edu: # robots.txt
si.edu: #
si.edu: # This file is to prevent the crawling and indexing of certain parts
si.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
si.edu: # and Google. By telling these "robots" where not to go on your site,
si.edu: # you save bandwidth and server resources.
si.edu: #
si.edu: # This file will be ignored unless it is at the root of your host:
si.edu: # Used: http://example.com/robots.txt
si.edu: # Ignored: http://example.com/site/robots.txt
si.edu: #
si.edu: # For more information about the robots.txt standard, see:
si.edu: # http://www.robotstxt.org/robotstxt.html
si.edu: # CSS, JS, Images
si.edu: # Directories
si.edu: # Files
si.edu: # Paths (clean URLs)
si.edu: # Paths (no clean URLs)
pcone.com.tw: #Allow: /
pcone.com.tw: #User-agent: bingbot
pcone.com.tw: #Crawl-Delay: 1
pcone.com.tw: #User-agent: Googlebot
pcone.com.tw: #Crawl-Delay: 1
cdkeys.com: # Google Image Crawler Setup
cdkeys.com: #User-agent: Googlebot-Image
cdkeys.com: #Crawl-delay:10
cdkeys.com: # Bing Image Crawler Setup
cdkeys.com: # Crawlers Setup
cdkeys.com: # Directories
cdkeys.com: # Paths (clean URLs)
cdkeys.com: # Paths (no clean URLs)
cdkeys.com: # ga
cdkeys.com: #Disallow: /*utm_*
cdkeys.com: # Extras
jobplanet.co.kr: # Yeti
jobplanet.co.kr: # NaverBot
jobplanet.co.kr: # https://mj12bot.com/
jobplanet.co.kr: # huawei
jobplanet.co.kr: # https://aspiegel.com/petalbot
jobplanet.co.kr: # Pin code for daum 'web master tool'
shopshashi.com: # we use Shopify as our ecommerce platform
shopshashi.com: # Google adsbot ignores robots.txt unless specifically named!
ibaotu.com: #2019-08-09–fi∏ƒ∫Û#
lernsax.de: ########################################################
lernsax.de: # #
lernsax.de: # ACHTUNG: Diese Datei wird automatisch generiert. #
lernsax.de: # Manuelle Aenderungen werden ueberschrieben! #
lernsax.de: # #
lernsax.de: ########################################################
zynga.com: # Default Flywheel robots file
sophos.com: # robots.txt for www.sophos.com
sophos.com: # web server 121
sophos.com: # Sitemaps Pre
sophos.com: # Requests for Previous Versions
sophos.com: # Requested ML
sophos.com: # Requests 20-21
sophos.com: # IX Removals
sophos.com: # Company Removals
sophos.com: # Translated Removals
sophos.com: # Investors
sophos.com: # KB
sophos.com: # Request Regional Migrations
sophos.com: # Requests
sophos.com: # PDF Issues
sophos.com: # Special
sophos.com: # Search
sophos.com: # Sophos Home Microsites
sophos.com: # New Requests 20-21
sophos.com: # New Requests 21
sophos.com: # Bot Requests
cyberpuerta.mx: #Baiduspider
cyberpuerta.mx: #Yandex
cyberpuerta.mx: #20150623
cyberpuerta.mx: #Crawl-delay: 5
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: # wildcards at the end, because of some crawlers see it as errors
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #
cyberpuerta.mx: #CMS pages
cyberpuerta.mx: #Sitemap
cyberpuerta.mx: #No follow sitemap parts
nettruyen.com: # robots.txt
nationwide.co.uk: #pageBody{position:relative;z-index:1}
levi.com: #For all robots
levi.com: #Block access to specific groups of pages
levi.com: #EU markets - blocks over 2 facet combinations
levi.com: #Phase 1 SEEU - blocks all facets. Short URLs are the vanity facets, to be re-opened upon further research.
levi.com: #Phase 2 SEEU - blocks all facets. Short URLs are the vanity facets, to be re-opened upon further research.
levi.com: #Levi GB Robots.txt Test
levi.com: #US - Allow up to 3 facet combinations
levi.com: #CA - Allow up to 2 facet combinations
levi.com: #Block colorgroup facet for all EU products that aren't jeans, excludes US & CA
levi.com: #Phase 1 SEEU - block all colourgroup facets regardless of other rules
levi.com: #Phase 2 SEEU - block all colourgroup facets regardless of other rules
levi.com: #Allow colorgroup facet navigation with specific conditions for each market
levi.com: #ES - Allow for jeans only
levi.com: #PL - Allow for jeans only
levi.com: #RU - Allow for jeans only
levi.com: #Allow colorgroup facet for all markets - jean products only, except for US & CA
levi.com: #Block search facets for colourgroup
levi.com: #Block over two or more facet combinations for vaqueros
levi.com: #Block the stretch feature
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Block size facet for all markets
levi.com: #Block feature-size_group facet for all markets
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Blocks jeans product item type facet nav for EU markets
levi.com: #Phase 1 SEEU
levi.com: #Phase 2 SEEU
levi.com: #Block feature-fit facet for all markets where it includes a combination of two or more facets
levi.com: #Block feature-fit_name facet for all markets where it includes a combination of two or more facets
levi.com: #Block feature-rise facet for all markets where it includes a combination of two or more facets
levi.com: #Block plusbottoms facet for all markets
levi.com: #Block tops facet for all markets
levi.com: #Block plustops facet for all markets
levi.com: #Block bigandtalltops facet for all markets
levi.com: #Block custom facet for all markets
levi.com: #Block dressesandjumpsuits facet for all markets
levi.com: #Block feature-sustainability facet for all markets
levi.com: #Block shoes facet for all markets
levi.com: #Block underwear facet for all markets
levi.com: #Block void facet for all markets until generation can be stopped
levi.com: #Block privacy policy excess rules for all markets until generation can be stopped
levi.com: #Block averageoverallrating facet for all markets
levi.com: #Block waist length and price facets in US
levi.com: #Allow fit + color in the US
levi.com: #SEEU Phase 1
levi.com: #SEEU Phase 2
levi.com: #TBD for removal
levi.com: # Block CazoodleBot as it does not present correct accept content headers
levi.com: # Block MJ12bot as it is just noise
levi.com: # Block dotbot as it cannot parse base urls properly
levi.com: # Block Gigabot
levi.com: # Block Social Boost
www.gov.za: #
www.gov.za: # robots.txt
www.gov.za: #
www.gov.za: # This file is to prevent the crawling and indexing of certain parts
www.gov.za: # of your site by web crawlers and spiders run by sites like Yahoo!
www.gov.za: # and Google. By telling these "robots" where not to go on your site,
www.gov.za: # you save bandwidth and server resources.
www.gov.za: #
www.gov.za: # This file will be ignored unless it is at the root of your host:
www.gov.za: # Used: http://example.com/robots.txt
www.gov.za: # Ignored: http://example.com/site/robots.txt
www.gov.za: #
www.gov.za: # For more information about the robots.txt standard, see:
www.gov.za: # http://www.robotstxt.org/robotstxt.html
www.gov.za: # CSS, JS, Images
www.gov.za: # Directories
www.gov.za: # Files
www.gov.za: # Paths (clean URLs)
www.gov.za: # Paths (no clean URLs)
hasznaltauto.hu: # robots.txt, www.hasznaltauto.hu
deccanherald.com: # Directories
deccanherald.com: # Files
deccanherald.com: # Paths (clean URLs)
deccanherald.com: # Paths (no clean URLs)
deccanherald.com: # Custom paths
tripadvisor.ru: # Hi there,
tripadvisor.ru: #
tripadvisor.ru: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself.
tripadvisor.ru: #
tripadvisor.ru: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet?
tripadvisor.ru: #
tripadvisor.ru: # Run - don't crawl - to apply to join TripAdvisor's elite SEO team
tripadvisor.ru: #
tripadvisor.ru: # Email seoRockstar@tripadvisor.com
tripadvisor.ru: #
tripadvisor.ru: # Or visit https://careers.tripadvisor.com/search-results?keywords=seo
tripadvisor.ru: #
tripadvisor.ru: #
ecuavisa.com: #
ecuavisa.com: # robots.txt
ecuavisa.com: #
ecuavisa.com: # This file is to prevent the crawling and indexing of certain parts
ecuavisa.com: # of your site by web crawlers and spiders run by sites like Yahoo!
ecuavisa.com: # and Google. By telling these "robots" where not to go on your site,
ecuavisa.com: # you save bandwidth and server resources.
ecuavisa.com: #
ecuavisa.com: # This file will be ignored unless it is at the root of your host:
ecuavisa.com: # Used: http://example.com/robots.txt
ecuavisa.com: # Ignored: http://example.com/site/robots.txt
ecuavisa.com: #
ecuavisa.com: # For more information about the robots.txt standard, see:
ecuavisa.com: # http://www.robotstxt.org/robotstxt.html
ecuavisa.com: # CSS, JS, Images
ecuavisa.com: # Directories
ecuavisa.com: # Files
ecuavisa.com: # Paths (clean URLs)
ecuavisa.com: # Paths (URLs Alexa)
ecuavisa.com: # Disallow: /busqueda
ecuavisa.com: # Disallow: /taxonomy/term/*
ecuavisa.com: # Paths (URLs Adsense)
ecuavisa.com: # Paths (no clean URLs)
ecuavisa.com: # Paths not bots
ecuavisa.com: # Disallow: /taxonomy/term/*
ecuavisa.com: # Disallow: /tags/*
ecuavisa.com: # Disallow: /lo-mas-visto-de-televistazo/*
ecuavisa.com: # Disallow: /fotogaleria/todos/*
ecuavisa.com: # Disallow: /categoria/noticias/*
ecuavisa.com: # Disallow: /categoria/internacionales/*
ecuavisa.com: # Disallow: /categoria/espectaculo/*
ecuavisa.com: # Disallow: /categoria/actualidad/*
ecuavisa.com: # Fix search console
ecuavisa.com: # Disallow: /taxonomy/term/75920/all/feed
ecuavisa.com: # Disallow: /busqueda?*
ecuavisa.com: # Disallow: /taxonomy/term/
ecuavisa.com: # Disallow: /fotogaleria/todos*
ecuavisa.com: # Disallow: /lo-mas-leido/*
heureka.sk: # Webmasters contact: seo@heureka.cz
heureka.sk: #SearchRelated
heureka.sk: #Bugs
heureka.sk: # Filters
heureka.sk: # Rating
heureka.sk: # Ordering
heureka.sk: ###
heureka.sk: # Wait list
heureka.sk: ###
heureka.sk: #
heureka.sk: # User-agent: PetalBot (Huawei search engine)
heureka.sk: # Disallow: /
heureka.sk: #
heureka.sk: ###
igihe.com: # robots.txt
igihe.com: # @url: http://igihe.com
igihe.com: # @generator: SPIP 3.1.10 [24286]
igihe.com: # @template: squelettes-dist/robots.txt.html
cmcmarkets.com: # instrument changes start
cmcmarkets.com: # instruments changes end
kiwilimon.com: # Chefs autorizados
kiwilimon.com: # Fin Chefs autorizados
heureka.cz: # Webmasters contact: seo@heureka.cz
heureka.cz: #SearchRelated
heureka.cz: #Bugs
heureka.cz: # Filters
heureka.cz: # Rating
heureka.cz: # Ordering
heureka.cz: ###
heureka.cz: # Wait list
heureka.cz: ###
heureka.cz: #
heureka.cz: # User-agent: PetalBot (Huawei search engine)
heureka.cz: # Disallow: /
heureka.cz: #
heureka.cz: ###
cnrs.fr: #
cnrs.fr: # robots.txt
cnrs.fr: #
cnrs.fr: # This file is to prevent the crawling and indexing of certain parts
cnrs.fr: # of your site by web crawlers and spiders run by sites like Yahoo!
cnrs.fr: # and Google. By telling these "robots" where not to go on your site,
cnrs.fr: # you save bandwidth and server resources.
cnrs.fr: #
cnrs.fr: # This file will be ignored unless it is at the root of your host:
cnrs.fr: # Used: http://example.com/robots.txt
cnrs.fr: # Ignored: http://example.com/site/robots.txt
cnrs.fr: #
cnrs.fr: # For more information about the robots.txt standard, see:
cnrs.fr: # http://www.robotstxt.org/robotstxt.html
cnrs.fr: # CSS, JS, Images
cnrs.fr: # Directories
cnrs.fr: # Files
cnrs.fr: # Paths (clean URLs)
cnrs.fr: # Paths (no clean URLs)
atterley.com: # Google Image Crawler Setup
goosedefi.com: # https://www.robotstxt.org/robotstxt.html
enedis.fr: #
enedis.fr: # robots.txt
enedis.fr: #
enedis.fr: # This file is to prevent the crawling and indexing of certain parts
enedis.fr: # of your site by web crawlers and spiders run by sites like Yahoo!
enedis.fr: # and Google. By telling these "robots" where not to go on your site,
enedis.fr: # you save bandwidth and server resources.
enedis.fr: #
enedis.fr: # This file will be ignored unless it is at the root of your host:
enedis.fr: # Used: http://example.com/robots.txt
enedis.fr: # Ignored: http://example.com/site/robots.txt
enedis.fr: #
enedis.fr: # For more information about the robots.txt standard, see:
enedis.fr: # http://www.robotstxt.org/robotstxt.html
enedis.fr: # CSS, JS, Images
enedis.fr: # Directories
enedis.fr: # Files
enedis.fr: # Paths (clean URLs)
enedis.fr: # Paths (no clean URLs)
vertex42.com: #User-agent: *
vertex42.com: #Disallow: /blog/wp-admin
vertex42.com: #Disallow: /blog/trackback
vertex42.com: #Disallow: /blog/cgi-bin
vertex42.com: #Disallow: /blog/search
vertex42.com: #Disallow: /blog/rss
vertex42.com: #Disallow: /blog/tag/*
vertex42.com: #Disallow: /blog/tag
vertex42.com: #Disallow: /blog/comments/feed
vertex42.com: #Disallow: /blog/comments
vertex42.com: #Disallow: /blog/login/
vertex42.com: #Disallow: /blog/feed
vertex42.com: #Disallow: /blog/feed/$
vertex42.com: #Disallow: /blog/*/feed/$
vertex42.com: #Disallow: /blog/*/feed/rss/$
vertex42.com: #Disallow: /blog/*/trackback/$
vertex42.com: #Disallow: /blog/wp-login.php
vertex42.com: #Disallow: /blog/*wp-login.php*
vertex42.com: # Disallow Collectors and Spam
vertex42.com: # Disallow Offline Browsers
acnestudios.com: #robots.txt for http://www.acnestudios.com
acnestudios.com: #My Account section
acnestudios.com: #Cart page
acnestudios.com: #Checkout pages
acnestudios.com: #Gift pages
acnestudios.com: #Old collections
acnestudios.com: #Old peterschlesinger
acnestudios.com: #Sale pages
acnestudios.com: #Old Personalisation page
acnestudios.com: #Old refinements
acnestudios.com: #Old homepage
acnestudios.com: #site switcher pages
acnestudios.com: #search pages
acnestudios.com: #locales
acnestudios.com: #Sitemap files
sensortower.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
umd.edu: #
umd.edu: # robots.txt
umd.edu: #
umd.edu: # This file is to prevent the crawling and indexing of certain parts
umd.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
umd.edu: # and Google. By telling these "robots" where not to go on your site,
umd.edu: # you save bandwidth and server resources.
umd.edu: #
umd.edu: # This file will be ignored unless it is at the root of your host:
umd.edu: # Used: http://example.com/robots.txt
umd.edu: # Ignored: http://example.com/site/robots.txt
umd.edu: #
umd.edu: # For more information about the robots.txt standard, see:
umd.edu: # http://www.robotstxt.org/robotstxt.html
umd.edu: # CSS, JS, Images
umd.edu: # Directories
umd.edu: # Files
umd.edu: # Paths (clean URLs)
umd.edu: # Paths (no clean URLs)
home.kpmg: # Version 2020.10.22
home.kpmg: # home.kpmg
arbetsformedlingen.se: #Disallow: /*91.*
vivaaerobus.com: # Exclude Files From All Robots:
vivaaerobus.com: # SPANISH SITE
vivaaerobus.com: # ENGLISH SITE
vivaaerobus.com: # SITEMAPS
vivaaerobus.com: # End robots.txt file
jobhero.com: #COST-2205
fuq.com: # www.robotstxt.org/
fuq.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
time.is: # Frequent, automatic reloading of Time.is is not allowed!
time.is: # If you want to reload Time.is automatically, please use a refresh interval of 1 hour or more.
time.is: # Time.is is made for humans. Automatic refresh and any usage from within scripts and apps is forbidden.
time.is: # If you need time synchronization for your app, please contact us about our API.
time.is: #Disallow: /*_2010*
time.is: #Disallow: /*Jan_2011*
time.is: #Disallow: /0*
time.is: #Disallow: /1*
time.is: #Disallow: /200*
time.is: #Disallow: *.js
time.is: #Disallow: /*/facts/
time.is: # maximum rate is one page every 5 seconds
time.is: #
time.is: # Yahoo Pipes is for feeds not web pages.
time.is: #
depop.com: #Prevent Bot Crawl of applied search filters
juzimi.com: # $Id: robots.txt,v 1.9.2.2 2010/09/06 10:37:16 goba Exp $
juzimi.com: #
juzimi.com: # robots.txt
juzimi.com: #
juzimi.com: # This file is to prevent the crawling and indexing of certain parts
juzimi.com: # of your site by web crawlers and spiders run by sites like Yahoo!
juzimi.com: # and Google. By telling these "robots" where not to go on your site,
juzimi.com: # you save bandwidth and server resources.
juzimi.com: #
juzimi.com: # This file will be ignored unless it is at the root of your host:
juzimi.com: # Used: http://example.com/robots.txt
juzimi.com: # Ignored: http://example.com/site/robots.txt
juzimi.com: #
juzimi.com: # For more information about the robots.txt standard, see:
juzimi.com: # http://www.robotstxt.org/wc/robots.html
juzimi.com: #
juzimi.com: # For syntax checking, see:
juzimi.com: # http://www.sxw.org.uk/computing/robots/check.html
juzimi.com: # Directories
juzimi.com: # Files
juzimi.com: # Paths (clean URLs)
juzimi.com: # Paths (no clean URLs)
nesine.com: # robots.txt for https://www.nesine.com/
galaxus.ch: # @/ @/
galaxus.ch: # @/ @/ Hello, fellow humans!
galaxus.ch: # @/ @/
galaxus.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@
galaxus.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@
galaxus.ch: # @@@@@@ @@@@@@@@@@@@ @@@@@@ @@@% @
galaxus.ch: # @@@@@ /@@@@@@@@@@ @@@@@@ @@@ @@
galaxus.ch: # @@@@@@ @@@@@@@@@@@, @@@@@@ @@@@ @@@@
galaxus.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@
galaxus.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@
galaxus.ch: # @@@@@@@@@@@@@ @@@@@@@@@@@@@@ @@@@@@@@
galaxus.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@
galaxus.ch: # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @@@@@@@@@
bestprice.gr: # Ref: pricegrabber.com/robots.txt
extra.com: # For all robots
extra.com: # Sitemap files
extra.com: # 530506 / 2019
extra.com: ##743301 / 2019
extra.com: #CS20200000366474
extra.com: #Blocking Base URL
mudah.my: # It is expressly forbidden to use spiders or other
mudah.my: # automated methods to access mudah.my. Only if mudah.my
mudah.my: # has given special permit such access is allowed.
mudah.my: ## Google
mudah.my: # all
mudah.my: #Disallow: /aw
mudah.my: #Disallow: /st
mudah.my: #Google Doubleclick gpt network ID
mudah.my: #Visit-time: 2000-2359 # 04:00-08:00 in Malaysia Time (non-peak time)
mudah.my: # all
mudah.my: #Disallow: /aw
mudah.my: #Disallow: /st
mudah.my: # all
mudah.my: #Disallow: /aw
mudah.my: #Disallow: /st
mudah.my: # all
mudah.my: #Disallow: /aw
mudah.my: #Disallow: /st
mudah.my: #Visit-time: 2000-2359 # 04:00-08:00 in Malaysia Time (non-peak time)
mudah.my: # all
mudah.my: #Disallow: /aw
mudah.my: #Disallow: /st
mudah.my: ## Yahoo
mudah.my: # all
mudah.my: #Disallow: /aw
mudah.my: #Disallow: /st
mudah.my: #Visit-time: 2000-2359 # 04:00-08:00 in Malaysia Time (non-peak time)
mudah.my: # all
mudah.my: #Disallow: /aw
mudah.my: #Disallow: /st
mudah.my: #Visit-time: 2000-2359 # 04:00-08:00 in Malaysia Time (non-peak time)
demae-can.com: #testchains
demae-can.com: #testshops
demae-can.com: #testshop_detail
tctelevision.com: # robots.txt for https://www.tctelevision.com/
tctelevision.com: # live - don't allow web crawlers to index cpresources/ or vendor/
consolegameswiki.com: #Bad Bots
consolegameswiki.com: # Crawlers that are kind enough to obey, but which we'd rather not have
consolegameswiki.com: # unless they're feeding search engines.
consolegameswiki.com: # Some bots are known to be trouble, particularly those designed to copy
consolegameswiki.com: # entire sites. Please obey robots.txt.
consolegameswiki.com: #
consolegameswiki.com: # Sorry, wget in its recursive mode is a frequent problem.
consolegameswiki.com: # Please read the man page and use it properly; there is a
consolegameswiki.com: # --wait option you can use to set the delay between hits,
consolegameswiki.com: # for instance.
consolegameswiki.com: #
consolegameswiki.com: #
consolegameswiki.com: # Doesn't follow robots.txt anyway, but...
consolegameswiki.com: #
consolegameswiki.com: #
consolegameswiki.com: # Hits many times per second, not acceptable
consolegameswiki.com: # http://www.nameprotect.com/botinfo.html
consolegameswiki.com: # A capture bot, downloads gazillions of pages with no public benefit
consolegameswiki.com: # http://www.webreaper.net/
consolegameswiki.com: #Allow!
athenahealth.com: #
athenahealth.com: # robots.txt
athenahealth.com: #
athenahealth.com: # This file is to prevent the crawling and indexing of certain parts
athenahealth.com: # of your site by web crawlers and spiders run by sites like Yahoo!
athenahealth.com: # and Google. By telling these "robots" where not to go on your site,
athenahealth.com: # you save bandwidth and server resources.
athenahealth.com: #
athenahealth.com: # This file will be ignored unless it is at the root of your host:
athenahealth.com: # Used: http://example.com/robots.txt
athenahealth.com: # Ignored: http://example.com/site/robots.txt
athenahealth.com: #
athenahealth.com: # For more information about the robots.txt standard, see:
athenahealth.com: # http://www.robotstxt.org/robotstxt.html
athenahealth.com: # CSS, JS, Images
athenahealth.com: #Allow: /core/*.css?
athenahealth.com: #Allow: /core/*.css$
athenahealth.com: #Allow: /core/*.js$
athenahealth.com: #Allow: /core/*.js?
athenahealth.com: #Allow: /core/*.gif
athenahealth.com: #Allow: /core/*.jpg
athenahealth.com: #Allow: /core/*.jpeg
athenahealth.com: #Allow: /core/*.png
athenahealth.com: #Allow: /core/*.svg
athenahealth.com: #Allow: /profiles/*.css$
athenahealth.com: #Allow: /profiles/*.css?
athenahealth.com: #Allow: /profiles/*.js$
athenahealth.com: #Allow: /profiles/*.js?
athenahealth.com: #Allow: /profiles/*.gif
athenahealth.com: #Allow: /profiles/*.jpg
athenahealth.com: #Allow: /profiles/*.jpeg
athenahealth.com: #Allow: /profiles/*.png
athenahealth.com: #Allow: /profiles/*.svg
athenahealth.com: # Directories
athenahealth.com: # Files
athenahealth.com: # Paths (clean URLs)
athenahealth.com: # Paths (no clean URLs)
athenahealth.com: # D7 Paths
athenahealth.com: # /robots.txt file for http://www.athenahealth.com
athenahealth.com: # User Agent Exclusion (Legacy site)
dataquest.io: # Sitemap for pages (landing pages)
dataquest.io: # Sitemap for Blog posts
al-maktaba.org: #container {
mendeley.com: # Careers: CSS, JS, Images
mendeley.com: # Careers: Directories
mendeley.com: # Careers: Files
mendeley.com: # Careers: Paths (clean URLs)
mendeley.com: # Careers: Paths (no clean URLs)
rxlist.com: #
rxlist.com: # robots.txt for MedicineNet, Inc. Properties
rxlist.com: #
perfect-english-grammar.com: # perfect-english-grammar.com (Fri Oct 27 12:46:12 2017)
kth.se: #
kth.se: # robots.txt from 17.392
kth.se: #
loc.gov: #Baiduspider
rentcafe.com: #robots.txt document for http://www.rentcafe.com/robots.txt
olx.kz: # sitecode:olxkz-desktop
film2movie.asia: # BEGIN XML-SITEMAP-PLUGIN
film2movie.asia: # END XML-SITEMAP-PLUGIN
shorouknews.com: #Baiduspider
shorouknews.com: #User-agent: Baiduspider
shorouknews.com: #Disallow: /
redalyc.org: # Google Image
redalyc.org: # Google AdSense
getvideo.org: #
getvideo.org: # robots.txt
getvideo.org: #
getvideo.org: # This file is to prevent the crawling and indexing of certain parts
getvideo.org: # of your site by web crawlers and spiders run by sites like Yahoo!
getvideo.org: # and Google. By telling these "robots" where not to go on your site,
getvideo.org: # you save bandwidth and server resources.
getvideo.org: #
getvideo.org: # This file will be ignored unless it is at the root of your host:
getvideo.org: # Used: http://example.com/robots.txt
getvideo.org: # Ignored: http://example.com/site/robots.txt
getvideo.org: #
getvideo.org: # For more information about the robots.txt standard, see:
getvideo.org: # http://www.robotstxt.org/wc/robots.html
getvideo.org: #
getvideo.org: # For syntax checking, see:
getvideo.org: # http://www.sxw.org.uk/computing/robots/check.html
getvideo.org: # Directories
getvideo.org: # Directories
iwara.tv: #
iwara.tv: # robots.txt
iwara.tv: #
iwara.tv: # This file is to prevent the crawling and indexing of certain parts
iwara.tv: # of your site by web crawlers and spiders run by sites like Yahoo!
iwara.tv: # and Google. By telling these "robots" where not to go on your site,
iwara.tv: # you save bandwidth and server resources.
iwara.tv: #
iwara.tv: # This file will be ignored unless it is at the root of your host:
iwara.tv: # Used: http://example.com/robots.txt
iwara.tv: # Ignored: http://example.com/site/robots.txt
iwara.tv: #
iwara.tv: # For more information about the robots.txt standard, see:
iwara.tv: # http://www.robotstxt.org/robotstxt.html
iwara.tv: # CSS, JS, Images
iwara.tv: # Directories
iwara.tv: # Files
iwara.tv: # Paths (clean URLs)
iwara.tv: # Paths (no clean URLs)
lucid.app: #
lucid.app: # robots.txt
lucid.app: #
lucid.app: # This file is to prevent the crawling and indexing of certain parts
lucid.app: # of your site by web crawlers and spiders run by sites like Yahoo!
lucid.app: # and Google. By telling these "robots" where not to go on your site,
lucid.app: # you save bandwidth and server resources.
lucid.app: #
lucid.app: # This file will be ignored unless it is at the root of your host:
lucid.app: # Used: http://example.com/robots.txt
lucid.app: # Ignored: http://example.com/site/robots.txt
lucid.app: #
lucid.app: # For more information about the robots.txt standard, see:
lucid.app: # http://www.robotstxt.org/wc/robots.html
lucid.app: #
lucid.app: # For syntax checking, see:
lucid.app: # http://www.sxw.org.uk/computing/robots/check.html
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/bulletins_index.xml
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/coupon_index.xml
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/domainhub_index.xml
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/lists_index.xml
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/neighbor_index.xml
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/neighborhood_index.xml
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/newsletter_index.xml
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/pictures_index.xml
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/product_index.xml
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/promotion_index.xml
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/shoutout_index.xml
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/static_index.xml
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/vanity_index.xml
merchantcircle.com: #Sitemap: https://www.merchantcircle.com/autos_index.xml
141jav.com: #Disallow: /hot/
yoast.com: # This space intentionally left blank
yoast.com: # If you want to learn about why our robots.txt looks like this, read this post: https://yoa.st/robots-txt
travian.com: # robots.txt für travian.com
csun.edu: #
csun.edu: # robots.txt
csun.edu: #
csun.edu: # This file is to prevent the crawling and indexing of certain parts
csun.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
csun.edu: # and Google. By telling these "robots" where not to go on your site,
csun.edu: # you save bandwidth and server resources.
csun.edu: #
csun.edu: # This file will be ignored unless it is at the root of your host:
csun.edu: # Used: http://example.com/robots.txt
csun.edu: # Ignored: http://example.com/site/robots.txt
csun.edu: #
csun.edu: # For more information about the robots.txt standard, see:
csun.edu: # http://www.robotstxt.org/robotstxt.html
csun.edu: # CSS, JS, Images
csun.edu: # Custom Disallows
csun.edu: # Directories
csun.edu: # Files
csun.edu: # Paths (clean URLs)
csun.edu: # Paths (no clean URLs)
builtin.com: #
builtin.com: # robots.txt
builtin.com: #
builtin.com: # This file is to prevent the crawling and indexing of certain parts
builtin.com: # of your site by web crawlers and spiders run by sites like Yahoo!
builtin.com: # and Google. By telling these "robots" where not to go on your site,
builtin.com: # you save bandwidth and server resources.
builtin.com: #
builtin.com: # This file will be ignored unless it is at the root of your host:
builtin.com: # Used: http://example.com/robots.txt
builtin.com: # Ignored: http://example.com/site/robots.txt
builtin.com: #
builtin.com: # For more information about the robots.txt standard, see:
builtin.com: # http://www.robotstxt.org/robotstxt.html
builtin.com: # CSS, JS, Images
builtin.com: # Directories
builtin.com: # Files
builtin.com: # Paths (clean URLs)
builtin.com: # Paths (no clean URLs)
builtin.com: # Company Directory Paths
builtin.com: # Mirrored Company Profiles on builtin.com
darademo.wordpress.com: # This file was generated on Sat, 10 Oct 2020 02:19:50 +0000
home.blog: # This file was generated on Tue, 31 Mar 2020 18:03:55 +0000
vivo.com.br: #
vivo.com.br: #
vivo.com.br: # robots.txt
vivo.com.br: #
vivo.com.br: # This file is to prevent the crawling and indexing of certain parts
vivo.com.br: # of your site by web crawlers and spiders run by sites like Yahoo!
vivo.com.br: # and Google. By telling these "robots" where not to go on your site,
vivo.com.br: # you save bandwidth and server resources.
vivo.com.br: #
vivo.com.br: # This file will be ignored unless it is at the root of your host:
vivo.com.br: # Used: http://example.com/robots.txt
vivo.com.br: # Ignored: http://example.com/site/robots.txt
vivo.com.br: #
vivo.com.br: # For more information about the robots.txt standard, see:
vivo.com.br: # http://www.robotstxt.org/robotstxt.html
vivo.com.br: # Directories
vivo.com.br: # Files
vivo.com.br: # Paths (clean URLs)
vivo.com.br: # Paths (no clean URLs)
jira.com: # JIRA:
jira.com: # Disallow all SearchRequestViews in the IssueNavigator (Word, XML, RSS,
jira.com: # etc), all IssueViews (XML, Printable and Word), all charts and reports.
jira.com: # Disallow admin.
jira.com: #
jira.com: # Confluence:
jira.com: # Confluence uses in-page robot exclusion tags for non-indexable pages.
jira.com: # Disallow admin explicitly.
jira.com: #
jira.com: # General:
jira.com: # Disallow login, logout
manualslib.com: #Baiduspider
manualslib.com: ## Added by PTN
eprice.com.tw: #allow: /ad/redir.html
travelocity.com: #
travelocity.com: # General bots
travelocity.com: #
travelocity.com: #hotel
travelocity.com: #flight
travelocity.com: #package
travelocity.com: #car
travelocity.com: #activities
travelocity.com: #cruise
travelocity.com: #other
travelocity.com: #
travelocity.com: # Google Ads
travelocity.com: #
travelocity.com: #
travelocity.com: #
travelocity.com: # Bing Ads
travelocity.com: #
travelocity.com: #
travelocity.com: # SemrushBot
travelocity.com: #
boulanger.com: # BOULANGER.COM
boulanger.com: # Robot Exclusion File -- robots.txt
boulanger.com: # Last Updated: 22/02/2021
boulanger.com: # Disallow
boulanger.com: # Fichiers & Scripts
boulanger.com: #Mon compte
boulanger.com: # Sitemap files
upbit.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
president.az: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
president.az: #
president.az: # To ban all spiders from the entire site uncomment the next two lines:
president.az: # User-Agent: *
president.az: # Disallow: /
bejson.com: # robots.txt generated at http://www.bejson.com
poxiao.com: #
poxiao.com: # robots.txt for EmpireCMS
poxiao.com: #
avvo.com: # :@HA##@@@@@@s S@@@HHHAGGAhAMHA&&AB@@
avvo.com: # i@AHHH#@@@@@@@i. ;@@@#BA&HH&X23B#B&&HAH@@
avvo.com: # ##99hAHH#@@@@@@@2, , ,::. . ,::. ,, &@@@#BHAA&A&AGSX##HBBHB@#
avvo.com: # #3hh&hXXA@@@@@@@@@S :rSis;:;isrsssirrsriSs: :@@@@#MAGGAA&Ahh9sA@##BB#@s
avvo.com: # sG2Ssi2322A#@@@@@@@@ ,si;, ,: .,. ...:s5iSr 3@@@##M&322X339AGh5X#@MB#@@
avvo.com: # sHhSissiSisS53M#@AXA25i,..... . .. . ..,:,s5rrr9@@@###AhhX5SSiS5GhGG@@MM#@@
avvo.com: # .A3i2irrrrsiisis:;::;, ,..:,. .:, ,,:..:.;rrssAAX#H&AH#M##BA&hi59G##MMMM@r
avvo.com: # ;SsiS523XSsr;,;, ,,.. .,:;r: ..,., .;r:.,,,,;r;;rrr;2M##@@@###@BXM#MBHBM@#
avvo.com: # ii3hh9XSri;,. ., ....,,;rr, . ,. r,.:s;:,,,,:,,::,,sSsShAAHMMMMMBM######@.
avvo.com: # rAHAAGSr, . .....,.,::r;,,:,r;;;::;Ss;:,::,,,.,,;;,,;;iGAM####BHB###M@2
avvo.com: # A#&ir,.. .,,:,:::;;si.,;;srr;:rsir;r;:;:::::::,,:,:ri5GHM@@##MA&A@@
avvo.com: # @@#9s. ...,,,;r;:rr;srii,;:;2rrir22S;;rrrr;rr;;::::::;;;sis5M@@##MBB@S
avvo.com: # #@#Bi, .,,:;;;;srr22iiri3i;rr5iihii3X5isiisSisr;;:;::::::::,;G#@##@@#@s
avvo.com: # @@#s.,:;r;rrsSirXXh2irSX#X:r;r2&AXS2hGA2iSXSrrr;;;::::::::;;2#@@@@##@@,
avvo.com: # #i::;sisiS25ssXG#Bir53A#2,:sSGBAXXS&MB925iisisrr;;;;;:;;;;rS3#@###@@X
avvo.com: # ri;:s52Siii232hB##HS293MH;:s,2@HH93X3#MA322AGXirrrr;;;;;;rrrr9MB#@@@r
avvo.com: # ;X;r2XX2Siiii52M@@@B593hMi,Sh ;#A93hHAM@@BGABGX5iiisrrrr;rrrrs&@@@@@@:
avvo.com: # .3rri3G&X2XXXXS2&@@#AhAh#2.rGG; iB3G9M#@@@M3AG93933X3225ssrrr;;r3M@@@@s
avvo.com: # rrs2ABB23M2;X@@H9A#B&AAH3,,i2i, :@#HM#@@@##B&h&h2Xh9hGAGG2rrrr;::iGM@@@
avvo.com: # ;;5B@#HrAB,,#@@@h :A#B99i ,;;. ;#@#@@@#@@@@@@MBAGhHAA#@#95srr;;::;iB@@
avvo.com: # :rhMG93i##i9@@@@5 9H#MM2 . .,5@@#@@@#@BA@@@@@#3:9@MH@@@@H2ir;;:::r&@
avvo.com: # :;i22X2S&@#AA@@2 3G9@@@X. .,;iH@@@@@##A2G@@@@@@&sX@hAHB@@@#hisr;;;;rX2
avvo.com: # .,:r53hhXXA#M&9hXShBSMAHs ,:2B#@@@MAH2;i#@@@@@AG#HhHHG&ABBHXSirrrrrs3
avvo.com: # ;;,rXABH&9X5X93XGH#hs#hr ..,,;2&@@#MhAAi;r9##B&H#BAAA&GGGG93X2irrrrrs5:
avvo.com: # .:,r5GHBAA9XSi29G&M##Br ...,;:, :s#@@#M###&2239GBBA&HA&GAGGG2iS55irrr;;S@.
avvo.com: # .,.:rS9AA&AAhh&A&AHM#S ,,,,::... ,2@@@@@@M#MHB#BBBMMM##MHAA92isiiiirr;:rH;
avvo.com: # .;,,:;rs239hh9&h9h99h; .::,::,,,.,:X@@@BMHBHBB#BB####MBHGX225irrrrrrr;:;X3
avvo.com: # .;.,,:rssiiSSiissssr: .:;;:;:,,,. ,2AHHHHHBAHHHHAHHA&h9933X2irr;rr;;;::SH,
avvo.com: # ,:.,::;rrssrrr;r;, ;:rS;ir;2Xr;5S ;XAh3&AAG&A&GG&&&9X222Sisrr;rr;;:::r&.
avvo.com: # r:.,,,,::;;;;;:, .;AXsis@Hsrr: .rXGh92SXh32SSS2XX5Siisr;r;;;:::,:SB
avvo.com: # ;:. ...,:,:::, .iAh;, .. ,;XMHhXXX22555225Ssssrrr;;;:::,.:29;
avvo.com: # rr. ..,::;: .....,;rGh:..,.,:,:,,, .rS9A325SS5iisrsssssr;;;:::,. .rA5
avvo.com: # :r,... ..,,,:, ..,,,;rrrriX25srr;s;;;::,.. ,;rssrrriisrrrsiiisr;::,,,....;Gh
avvo.com: # r,. .,:::,. .,,:r;,.,,,;rsXhh2ir;r;:,:,,:;rr;::;;rrr;rrrrr;::,,,.. ..:2A
avvo.com: # r, .::::,.,,;r,. .::;:;ri3AXssrr;:::;rrrr;;sssiis;;:,,,,,,,.. . .;As
avvo.com: # r: .,;:. . ,:::;;:;rXXir;;;;;rsr;rssrssrrr;::,,,.... .....,i#r
avvo.com: # ,i,. .:,,,.. .:;::;r;;;sis;:;:r;:;risrr;;:,,::,,,,. .......,rAA
avvo.com: # :i:. .,:;;;:::. .,,,,;::::;;::;:,;;:;r;;::::,,.,,,........,...,,:rA9
avvo.com: # 2r,. .,,:;;;r;;, ..,,,,::;;::;;:::::::,,,,.......,.......,,,:rA;
avvo.com: # .AX;:.. ...,::;;;;::,. .,,,::::;,:::,,:,,:,,,................,,,,r2
avvo.com: # ,A9r:,,.. ..,..,,,::::;;::,.... , .,,,,,.,,.,,.,.,,....,..... ......,,,,:i9.
avvo.com: # You're not a robot. Why are you snooping around here?
avvo.com: # This might be a better use of your time, human: avvo.com/about_avvo/jobs
avvo.com: # If you've ever built a great product, started a bustling community, or told an intriguing story, we want to hear from you.
avvo.com: # kbai,
avvo.com: # Team Avvo
hessen.de: #
hessen.de: # robots.txt
hessen.de: #
hessen.de: # This file is to prevent the crawling and indexing of certain parts
hessen.de: # of your site by web crawlers and spiders run by sites like Yahoo!
hessen.de: # and Google. By telling these "robots" where not to go on your site,
hessen.de: # you save bandwidth and server resources.
hessen.de: #
hessen.de: # This file will be ignored unless it is at the root of your host:
hessen.de: # Used: http://example.com/robots.txt
hessen.de: # Ignored: http://example.com/site/robots.txt
hessen.de: #
hessen.de: # For more information about the robots.txt standard, see:
hessen.de: # http://www.robotstxt.org/wc/robots.html
hessen.de: #
hessen.de: # For syntax checking, see:
hessen.de: # http://www.sxw.org.uk/computing/robots/check.html
hessen.de: # Directories
hessen.de: # Files
hessen.de: # Paths (clean URLs)
hessen.de: # Paths (no clean URLs)
hessen.de: #ChM-00000411067
ftc.gov: #
ftc.gov: # robots.txt
ftc.gov: #
ftc.gov: # This file is to prevent the crawling and indexing of certain parts
ftc.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
ftc.gov: # and Google. By telling these "robots" where not to go on your site,
ftc.gov: # you save bandwidth and server resources.
ftc.gov: #
ftc.gov: # This file will be ignored unless it is at the root of your host:
ftc.gov: # Used: http://example.com/robots.txt
ftc.gov: # Ignored: http://example.com/site/robots.txt
ftc.gov: #
ftc.gov: # For more information about the robots.txt standard, see:
ftc.gov: # http://www.robotstxt.org/robotstxt.html
ftc.gov: # CSS, JS, Images
ftc.gov: # Directories
ftc.gov: # Files
ftc.gov: # Paths (clean URLs)
ftc.gov: # Paths (no clean URLs)
ftc.gov: # For link-checking site crawlers.
comic-walker.com: # Google News Robot
comic-walker.com: # Google Search Engine Robot
comic-walker.com: # Yahoo! Search Engine Robot
comic-walker.com: # Microsoft Search Engine Robot
comic-walker.com: # Yandex Search Engine Robot
comic-walker.com: # Other crawller or bot that might possibly access or crawling respect below
comic-walker.com: # Sitemap
fbi.gov: # Define access-restrictions for robots/spiders
fbi.gov: # http://www.robotstxt.org/wc/norobots.html
fbi.gov: # By default we allow robots to access all areas of our site
fbi.gov: # already accessible to anonymous users
fbi.gov: # Add Googlebot-specific syntax extension to exclude forms
fbi.gov: # that are repeated for each piece of content in the site
fbi.gov: # the wildcard is only supported by Googlebot
fbi.gov: # http://www.google.com/support/webmasters/bin/answer.py?answer=40367&ctx=sibling
concursolutions.com: # robots.txt for myouttask.com - there is nothing here for a search engine
img2go.com: # www.robotstxt.org/
img2go.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
airarabia.com: #
airarabia.com: # robots.txt
airarabia.com: #
airarabia.com: # This file is to prevent the crawling and indexing of certain parts
airarabia.com: # of your site by web crawlers and spiders run by sites like Yahoo!
airarabia.com: # and Google. By telling these "robots" where not to go on your site,
airarabia.com: # you save bandwidth and server resources.
airarabia.com: #
airarabia.com: # This file will be ignored unless it is at the root of your host:
airarabia.com: # Used: http://example.com/robots.txt
airarabia.com: # Ignored: http://example.com/site/robots.txt
airarabia.com: #
airarabia.com: # For more information about the robots.txt standard, see:
airarabia.com: # http://www.robotstxt.org/robotstxt.html
airarabia.com: # CSS, JS, Images
airarabia.com: # Directories
airarabia.com: # Files
airarabia.com: # Paths (clean URLs)
airarabia.com: # Paths (no clean URLs)
biography.com: # Tempest - biography
trinasolar.com: #
trinasolar.com: # robots.txt
trinasolar.com: #
trinasolar.com: # This file is to prevent the crawling and indexing of certain parts
trinasolar.com: # of your site by web crawlers and spiders run by sites like Yahoo!
trinasolar.com: # and Google. By telling these "robots" where not to go on your site,
trinasolar.com: # you save bandwidth and server resources.
trinasolar.com: #
trinasolar.com: # This file will be ignored unless it is at the root of your host:
trinasolar.com: # Used: http://example.com/robots.txt
trinasolar.com: # Ignored: http://example.com/site/robots.txt
trinasolar.com: #
trinasolar.com: # For more information about the robots.txt standard, see:
trinasolar.com: # http://www.robotstxt.org/robotstxt.html
trinasolar.com: # Directories
trinasolar.com: # Files
trinasolar.com: # Paths (clean URLs)
trinasolar.com: # Paths (no clean URLs)
freebcc.org: #notfound {
freebcc.org: #notfound .notfound {
zanerobe.com: # we use Shopify as our ecommerce platform
zanerobe.com: # Google adsbot ignores robots.txt unless specifically named!
coolmathgames.com: #
coolmathgames.com: # robots.txt
coolmathgames.com: #
coolmathgames.com: # This file is to prevent the crawling and indexing of certain parts
coolmathgames.com: # of your site by web crawlers and spiders run by sites like Yahoo!
coolmathgames.com: # and Google. By telling these "robots" where not to go on your site,
coolmathgames.com: # you save bandwidth and server resources.
coolmathgames.com: #
coolmathgames.com: # This file will be ignored unless it is at the root of your host:
coolmathgames.com: # Used: http://example.com/robots.txt
coolmathgames.com: # Ignored: http://example.com/site/robots.txt
coolmathgames.com: #
coolmathgames.com: # For more information about the robots.txt standard, see:
coolmathgames.com: # http://www.robotstxt.org/robotstxt.html
coolmathgames.com: # CSS, JS, Images
coolmathgames.com: # Directories
coolmathgames.com: # Files
coolmathgames.com: # Paths (clean URLs)
coolmathgames.com: # Paths (no clean URLs)
digikey.com: # Google-Adsbot
digikey.com: # all crawlers
digikey.com: # Sitemaps
braze.com: # robots.txt for https://www.braze.com/
braze.com: # live - don't allow web crawlers to index cpresources/ or vendor/
linguee.es: # In ANY CASE, you are NOT ALLOWED to train Machine Translation Systems
linguee.es: # on data crawled on Linguee.
linguee.es: #
linguee.es: # Linguee contains fake entries - changes in the wording of sentences,
linguee.es: # complete fake entries.
linguee.es: # These entries can be used to identify even small parts of our material
linguee.es: # if you try to copy it without our permission.
linguee.es: # Machine Translation systems trained on these data will learn these errors
linguee.es: # and can be identified easily. We will take all legal measures against anyone
linguee.es: # training Machine Translation systems on data crawled from this website.
lamiareport.gr: # If the Joomla site is installed within a folder such as at
lamiareport.gr: # e.g. www.example.com/joomla/ the robots.txt file MUST be
lamiareport.gr: # moved to the site root at e.g. www.example.com/robots.txt
lamiareport.gr: # AND the joomla folder name MUST be prefixed to the disallowed
lamiareport.gr: # path, e.g. the Disallow rule for the /administrator/ folder
lamiareport.gr: # MUST be changed to read Disallow: /joomla/administrator/
lamiareport.gr: #
lamiareport.gr: # For more information about the robots.txt standard, see:
lamiareport.gr: # http://www.robotstxt.org/orig.html
lamiareport.gr: #
lamiareport.gr: # For syntax checking, see:
lamiareport.gr: # http://www.sxw.org.uk/computing/robots/check.html
lamiareport.gr: #Disallow: /images/
lamiareport.gr: #Disallow: /media/
lamiareport.gr: #Disallow: /templates/
flightaware.com: #
flightaware.com: # robots.txt for flightaware.com hosted by ahock.hou.flightaware.com
flightaware.com: #
flightaware.com: #
flightaware.com: # Specific unwanted clients
flightaware.com: #
flightaware.com: #
flightaware.com: # Command line recursive requests as well as automated fetching from the non-
flightaware.com: # exportable data is not acceptable.
flightaware.com: #
flightaware.com: # See:
flightaware.com: # https://flightaware.com/about/termsofuse
flightaware.com: # https://flightaware.com/commercial/flightxml/
flightaware.com: #
flightaware.com: #
flightaware.com: # General robot rules
flightaware.com: #
flightaware.com: #
flightaware.com: # Stop Applebot from beating the crap out of ajax endpoints (specifically the
flightaware.com: # static flight map one)
flightaware.com: #
flightaware.com: # Allow Twitter to grab article and careers blobs
bexio.com: # robots.txt for https://www.bexio.com/de-CH/
bexio.com: # live - don't allow web crawlers to index cpresources/ or vendor/
bexio.com: #Baiduspider
bexio.com: #Sogou
xtube.com: # Twitterbot
blogmura.com: # 2020-10-19 追加
carparts-cat.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
carparts-cat.com: #content{margin:0 0 0 2%;position:relative;}
open.ac.uk: #
open.ac.uk: # This file is to prevent the crawling and indexing of certain parts
open.ac.uk: # of our site by web crawlers and spiders run by sites like Google.
open.ac.uk: # By telling these "robots" where not to go on the site,
open.ac.uk: # we save bandwidth and server resources.
open.ac.uk: #
open.ac.uk: # For more information about the robots.txt standard, see:
open.ac.uk: # http://www.robotstxt.org/wc/robots.html
open.ac.uk: # feeds
open.ac.uk: # search results
open.ac.uk: # Paths
open.ac.uk: # parameters
open.ac.uk: # wikis
economipedia.com: # Bloqueo basico para todos los bots y crawlers
economipedia.com: # puede dar problemas por bloqueo de recursos en Google Search Console
economipedia.com: # Disallow: /author/
economipedia.com: # Desindexar p√°ginas y etiquetas
economipedia.com: # Bloqueo de las URL dinamicas
economipedia.com: #Bloqueo de busquedas
economipedia.com: # Bloqueo de trackbacks
economipedia.com: # Bloqueo de feeds para crawlers
economipedia.com: # Ralentizamos algunos bots que se suelen volver locos
economipedia.com: # Previene problemas de recursos bloqueados en Google Webmaster Tools
economipedia.com: # Bloqueo de bots y crawlers poco utiles
bauhaus.info: ###
bauhaus.info: # For all robots
bauhaus.info: # Block access to specific groups of pages
bauhaus.info: # Allow search crawlers to discover the sitemap
bauhaus.info: # Block CazoodleBot as it does not present correct accept content headers
bauhaus.info: # Block MJ12bot as it is just noise
bauhaus.info: # Block dotbot as it cannot parse base urls properly
bauhaus.info: # Block Gigabot
bauhaus.info: # Block Internet Archives
bauhaus.info: # PPS-12633, block SEOkicks crawler
bauhaus.info: # PPS-69657: block Semrush-Bot
g2g.com: # Adsense
g2g.com: # Blekko
g2g.com: # CommonCrawl
beisen.com: # -----------------------------------------------------------------------------
beisen.com: # robots.txt for beisen.com
beisen.com: # -----------------------------------------------------------------------------
eoffcn.com: #robots.txt generated at http://www.eoffcn.com/
gbgame.com.tw: # Robots.txt file from http://www.gbgame.com.tw
hdpfans.com: #
hdpfans.com: # robots.txt for Discuz! X3
hdpfans.com: #
shangc.net: # “‘œ¬Œ™20180826Ã̺”
shangc.net: # 20190101Ã̺”
shangc.net: # 20190522Ã̺”
capgemini.com: # Media
capgemini.com: # Restricted media*
capgemini.com: # Robots.txt Manager
softwareadvice.com: # robots.txt for https://www.softwareadvice.com
softwareadvice.com: # GDM ajax data and template
softwareadvice.com: # Blocks crawlers that are kind enough to obey robots
ambitionbox.com: #Korean search engine
ambitionbox.com: #Czech Republic search engine
ambitionbox.com: #Yahoo
ambitionbox.com: #Ask Jeeves, a U.S.-based search engine
tigerdirect.com: #modalMapNoPrice .modal-body p{font-size:14px;color:#000;font-family:Arial,Helvetica,sans-serif;text-align:left;padding:5px 10px;margin:0}
tigerdirect.com: #modalMapNoPrice .modal-body h5{font-size:16px;text-align:left;font-weight:bold;font-family:Arial,Helvetica,sans-serif;padding:5px 10px;margin:0}
tigerdirect.com: #dc_container{margin:0 auto;width:960px;}
tigerdirect.com: #dc_container iframe{margin:2px 0;width:100%;}
tigerdirect.com: #mast_nav{clear: both;background-repeat: no-repeat;padding: 0;position: relative;text-align: left;margin: 0 auto!important;margin-left: -1px;z-index: 500;width:960px;}
tigerdirect.com: #mast_nav .navItem{float:left;zoom:1;}
tigerdirect.com: #mast_nav, #mast_nav .mastNav-link{background-image:url(https://cdn-eu-ec.yottaa.net/56abbca0312e5815f5000542/e42d88e0d50401335179123dfe2baf36.yottaa.net/v~4b.4c/td/masthead_v2/masthead-nav-vert-5.jpg?yocs=2m_2E_);height: 37px;}
tigerdirect.com: #mast_nav .mastNav-link{display: block;text-indent: -9999px;cursor:default;}
tigerdirect.com: #mast_nav .jHover .mastNav-pop{display:block;}
tigerdirect.com: #navInsiderMesg{font-size:17px;font-weight:bold;line-height:20px;}
tigerdirect.com: #smsMobileWrapper {margin-right:10px;width:220px;}
tigerdirect.com: #txtMobileNav_imgWrap{height:74px;}
tigerdirect.com: #txtMobileNav_imgWrap img{width:84px; float:left;}
tigerdirect.com: #smsMobileWrapper .navInsiderInput {border: 1px solid #ccc;line-height:24px;height:24px;font:normal 15px/1 arial;padding:3px;width:210px;margin-bottom: 5px; display:block;}
tigerdirect.com: #vip-login a{
tigerdirect.com: #vip-login a:hover{
uproxx.com: # Uproxx Start
uproxx.com: # Uproxx End
uproxx.com: # Sitemap archive
iranestekhdam.ir: # Sitemap
instantdomainsearch.com: # *
instantdomainsearch.com: # Host
instantdomainsearch.com: # Sitemaps
rio.rj.gov.br: #button::after {
rio.rj.gov.br: #button:hover {
rio.rj.gov.br: #button:active {
rio.rj.gov.br: #button.show {
ucl.ac.uk: #
ucl.ac.uk: # robots.txt
ucl.ac.uk: #
ucl.ac.uk: # This file is to prevent the crawling and indexing of certain parts
ucl.ac.uk: # of your site by web crawlers and spiders run by sites like Yahoo!
ucl.ac.uk: # and Google. By telling these "robots" where not to go on your site,
ucl.ac.uk: # you save bandwidth and server resources.
ucl.ac.uk: #
ucl.ac.uk: # This file will be ignored unless it is at the root of your host:
ucl.ac.uk: # Used: http://example.com/robots.txt
ucl.ac.uk: # Ignored: http://example.com/site/robots.txt
ucl.ac.uk: #
ucl.ac.uk: # For more information about the robots.txt standard, see:
ucl.ac.uk: # http://www.robotstxt.org/robotstxt.html
ucl.ac.uk: #Drupal default
ucl.ac.uk: # CSS, JS, Images
ucl.ac.uk: # Directories
ucl.ac.uk: # Files
ucl.ac.uk: # Paths (clean URLs)
ucl.ac.uk: # Paths (no clean URLs)
ucl.ac.uk: # Paths (clean URLs) - fixed
ucl.ac.uk: # Paths (no clean URLs) - fixed
ucl.ac.uk: # Sites
ucl.ac.uk: # Sites - fixed
paycor.com: # production
blockchair.com: # Russian localization
blockchair.com: # Chineese localization
blockchair.com: # Spanish localization
blockchair.com: # Portuguese localization
daveramsey.com: # robots.txt for https://www.daveramsey.com/
daveramsey.com: # Disallow all crawlers access to certain folders.
arizona.edu: #
arizona.edu: # robots.txt
arizona.edu: #
arizona.edu: # This file is to prevent the crawling and indexing of certain parts
arizona.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
arizona.edu: # and Google. By telling these "robots" where not to go on your site,
arizona.edu: # you save bandwidth and server resources.
arizona.edu: #
arizona.edu: # This file will be ignored unless it is at the root of your host:
arizona.edu: # Used: http://example.com/robots.txt
arizona.edu: # Ignored: http://example.com/site/robots.txt
arizona.edu: #
arizona.edu: # For more information about the robots.txt standard, see:
arizona.edu: # http://www.robotstxt.org/robotstxt.html
arizona.edu: # CSS, JS, Images
arizona.edu: # Directories
arizona.edu: # Files
arizona.edu: # Paths (clean URLs)
arizona.edu: # Paths (no clean URLs)
theaustralian.com.au: #Agent Specific Disallowed Sections
emagister.com: # Nuevos
emagister.com: # Blog
emagister.com: # Respuestas
emagister.com: # Express
clover.com: # If you are human and can read this, you should apply for a job at Clover.
clover.com: # https://www.clover.com/careers
viator.com: # Hi, we're Viator, Nice to meet you.
viator.com: #
viator.com: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself.
viator.com: #
viator.com: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet?
viator.com: #
viator.com: # Run - don't crawl - to apply to join Viator's elite SEO team
viator.com: #
viator.com: # Visit https://careers.tripadvisor.com/search-results?keywords=seo
viator.com: #
viator.com: #
viator.com: # viator.com
bonprix.ru: # Company: bonprix.ru
bonprix.ru: # Author: bonprix.ru
bonprix.ru: # URL: https://www.bonprix.ru
bonprix.ru: # Disallow all crawlers access to certain pages.
bonprix.ru: # block special parameters
bonprix.ru: # block NITRO Tracking (added 2021-01-05)
bonprix.ru: # block Internal Search Suggestions (added 2021-01-05)
bonprix.ru: # block viewed products (added 2021-01-05)
bonprix.ru: # block update product variations (added 2021-01-05)
bonprix.ru: # block glossary content in productdetails (added 2021-01-05)
bonprix.ru: # block Wishlist (added 2021-01-05)
bonprix.ru: # Disallow Yandex access to certain parameters
bonprix.ru: # block special parameters
bonprix.ru: # block NITRO Tracking (added 2021-01-11)
bonprix.ru: # block Internal Search Suggestions (added 2021-01-11)
bonprix.ru: # block viewed products (added 2021-01-11)
bonprix.ru: # block update product variations (added 2021-01-11)
bonprix.ru: # block glossary content in productdetails (added 2021-01-11)
bonprix.ru: # block Wishlist (added 2021-01-11)
bonprix.ru: # Sitemap files
2dehands.be: # Here is our sitemap (this line is independent of UA blocks, per the spec)
2dehands.be: #Please keep blocking of all URLs in place for at least 2 years after removing a specific module
2dehands.be: #SOI subpage
2dehands.be: # login, confirm and forgot password pages
2dehands.be: # mymp pages
2dehands.be: # ASQ pages
2dehands.be: # SYI Pages
2dehands.be: # Flagging/tipping ads
2dehands.be: # bidding on ads
2dehands.be: # external url redirects
2dehands.be: # google analytics
2dehands.be: #korean spam
2dehands.be: #legacy
2dehands.be: # prevent unnecessary crawling
2dehands.be: # New vip
2dehands.be: # Block VIPs with parameters
2dehands.be: #block homepage feeds
vidaextra.com: #
vidaextra.com: # robots.txt
vidaextra.com: #
vidaextra.com: # Crawlers that are kind enough to obey, but which we'd rather not have
vidaextra.com: # unless they're feeding search engines.
vidaextra.com: # Some bots are known to be trouble, particularly those designed to copy
vidaextra.com: # entire sites. Please obey robots.txt.
vidaextra.com: # Sorry, wget in its recursive mode is a frequent problem.
vidaextra.com: # Please read the man page and use it properly; there is a
vidaextra.com: # --wait option you can use to set the delay between hits,
vidaextra.com: # for instance.
vidaextra.com: #
vidaextra.com: #
vidaextra.com: # The 'grub' distributed client has been *very* poorly behaved.
vidaextra.com: #
vidaextra.com: #
vidaextra.com: # Doesn't follow robots.txt anyway, but...
vidaextra.com: #
vidaextra.com: #
vidaextra.com: # Hits many times per second, not acceptable
vidaextra.com: # http://www.nameprotect.com/botinfo.html
vidaextra.com: # A capture bot, downloads gazillions of pages with no public benefit
vidaextra.com: # http://www.webreaper.net/
ibctamil.com: # Disallow: /*? This is match ? anywhere in the URL
optus.com.au: # Temporary campaigns that are excluded from organic results
bw-bank.de: # Refuse all robots from these directories:
bw-bank.de: # Blocked Bots
bw-bank.de: # Disallow content
bw-bank.de: # Sitemap URL
zibal.ir: # https://www.robotstxt.org/robotstxt.html
elespectador.com: # CSS, JS, Images
elespectador.com: # Paths (clean URLs)
elespectador.com: # Sitemap:
thenetnaija.com: # hestiacp autogenerated robots.txt
ftvnews.com.tw: #User-agent: SearchmetricsBot
ftvnews.com.tw: #Disallow: /
remotasks.com: # www.robotstxt.org/
remotasks.com: # Allow crawling of all content
clearbit.com: # all
volvocars.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
volvocars.com: #content{margin:0 0 0 2%;position:relative;}
trckapp.com: #header{
trckapp.com: #header>div{
trckapp.com: #contentfull {
trckapp.com: #contentbox {
trckapp.com: #contentbox:before, #contentbox:after {
trckapp.com: #contentbox:after {
trckapp.com: #contentbox blockquote{
aaa.com: # For domain: http://www.aaa.com
indiapostgdsonline.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
indiapostgdsonline.in: #content{margin:0 0 0 2%;position:relative;}
audiusa.com: # functional links
audiusa.com: # editorial links
google.com.ni: # AdsBot
google.com.ni: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
w3.org: #
w3.org: # robots.txt for http://www.w3.org/
w3.org: #
w3.org: # $Id: robots.txt,v 1.85 2020/11/06 21:15:53 gerald Exp $
w3.org: #
w3.org: # For use by search.w3.org
w3.org: # W3C Link checker
w3.org: # Applebot continues to make hundreds of thousands of reqs/day for this area
w3.org: # even though it has been returning permanent redirects for years
w3.org: # the following settings apply to all bots
w3.org: # Blogs - WordPress
w3.org: # https://codex.wordpress.org/Search_Engine_Optimization_for_WordPress#Robots.txt_Optimization
w3.org: # Wikis - Mediawiki
w3.org: # https://www.mediawiki.org/wiki/Manual:Robots.txt
w3.org: # various other access-controlled or expensive areas
w3.org: # WAI indexing
w3.org: # Disallow: /WAI/EO/Drafts/
birdeye.com: # Allow specific URLs for all bots
bitpay.com: # www.robotstxt.org/
bitpay.com: # Allow crawling of all content
education.gouv.fr: #
education.gouv.fr: # robots.txt
education.gouv.fr: #
education.gouv.fr: # This file is to prevent the crawling and indexing of certain parts
education.gouv.fr: # of your site by web crawlers and spiders run by sites like Yahoo!
education.gouv.fr: # and Google. By telling these "robots" where not to go on your site,
education.gouv.fr: # you save bandwidth and server resources.
education.gouv.fr: #
education.gouv.fr: # This file will be ignored unless it is at the root of your host:
education.gouv.fr: # Used: http://example.com/robots.txt
education.gouv.fr: # Ignored: http://example.com/site/robots.txt
education.gouv.fr: #
education.gouv.fr: # For more information about the robots.txt standard, see:
education.gouv.fr: # http://www.robotstxt.org/robotstxt.html
education.gouv.fr: # CSS, JS, Images
education.gouv.fr: # Directories
education.gouv.fr: # Files
education.gouv.fr: # Paths (clean URLs)
education.gouv.fr: # Paths (no clean URLs)
education.gouv.fr: # XML sitemap
closermag.fr: # SPECIFIC
liveauctioneers.com: #Start of Parameters
liveauctioneers.com: #Bingbot
liveauctioneers.com: #Start of Parameters
hi5.com: #########################################################################
hi5.com: # /robots.txt file for http://www.tagged.com/
hi5.com: # mail webmaster@tagged.com for constructive criticism
hi5.com: #########################################################################
hi5.com: # Any others
invisionapp.com: # www.robotstxt.org/
linustechtips.com: # Sogou does not behave correctly. Let this be a warning to all the other bots out there.
sjsu.edu: #########################################
sjsu.edu: # Welcome to San Jose State University
sjsu.edu: #
sjsu.edu: # Note: Please do not over load the servers
sjsu.edu: # http://its.sjsu.edu
sjsu.edu: #
sjsu.edu: # Disallow: /ecampus/
sjsu.edu: # Site Map Listing
hktvmall.com: # For all robots
hktvmall.com: # Allow search crawlers to discover the sitemap
hktvmall.com: # Block CazoodleBot as it does not present correct accept content headers
hktvmall.com: # Block MJ12bot as it is just noise
hktvmall.com: # Exclude evil bots
unity.com: #
unity.com: # robots.txt
unity.com: #
unity.com: # This file is to prevent the crawling and indexing of certain parts
unity.com: # of your site by web crawlers and spiders run by sites like Yahoo!
unity.com: # and Google. By telling these "robots" where not to go on your site,
unity.com: # you save bandwidth and server resources.
unity.com: #
unity.com: # This file will be ignored unless it is at the root of your host:
unity.com: # Used: http://example.com/robots.txt
unity.com: # Ignored: http://example.com/site/robots.txt
unity.com: #
unity.com: # For more information about the robots.txt standard, see:
unity.com: # http://www.robotstxt.org/robotstxt.html
unity.com: # CSS, JS, Images
unity.com: # Directories
unity.com: # Files
unity.com: # Paths (clean URLs)
unity.com: # Paths (no clean URLs)
unity.com: # Chinese Search Engines
cnews.fr: # Directories
cnews.fr: # Paths (clean URLs)
cnews.fr: # Paths (no clean URLs)
virginia.edu: #
virginia.edu: # robots.txt
virginia.edu: #
virginia.edu: # This file is to prevent the crawling and indexing of certain parts
virginia.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
virginia.edu: # and Google. By telling these "robots" where not to go on your site,
virginia.edu: # you save bandwidth and server resources.
virginia.edu: #
virginia.edu: # This file will be ignored unless it is at the root of your host:
virginia.edu: # Used: http://example.com/robots.txt
virginia.edu: # Ignored: http://example.com/site/robots.txt
virginia.edu: #
virginia.edu: # For more information about the robots.txt standard, see:
virginia.edu: # http://www.robotstxt.org/robotstxt.html
virginia.edu: # CSS, JS, Images
virginia.edu: # Directories
virginia.edu: # Files
virginia.edu: # Paths (clean URLs)
virginia.edu: # Paths (no clean URLs)
euskadi.eus: # disallow partial files and contents of type serv_proc_*
oricon.co.jp: # Baidu chinese search engine
oricon.co.jp: # RUSSIA search engine
oricon.co.jp: # sogou.com chinese search engine
oricon.co.jp: # User-agent: Sogou web spider
oricon.co.jp: # Disallow: /
oricon.co.jp: # Grapeshot Allow
tuchong.com: # Robots.txt file from http://www.tuchong.com
tuchong.com: # All robots will spider the domain
sepe.es: # Regla 1
lidiashopping.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
lidiashopping.com: #
lidiashopping.com: # To ban all spiders from the entire site uncomment the next two lines:
lidiashopping.com: # User-agent: *
lidiashopping.com: # Disallow: /
lidiashopping.com: # https://blogs.bing.com/webmaster/2009/08/10/crawl-delay-and-the-bing-crawler-msnbot/
pancakeswap.finance: # https://www.robotstxt.org/robotstxt.html
iitb.ac.in: #
iitb.ac.in: # robots.txt
iitb.ac.in: #
iitb.ac.in: # This file is to prevent the crawling and indexing of certain parts
iitb.ac.in: # of your site by web crawlers and spiders run by sites like Yahoo!
iitb.ac.in: # and Google. By telling these "robots" where not to go on your site,
iitb.ac.in: # you save bandwidth and server resources.
iitb.ac.in: #
iitb.ac.in: # This file will be ignored unless it is at the root of your host:
iitb.ac.in: # Used: http://example.com/robots.txt
iitb.ac.in: # Ignored: http://example.com/site/robots.txt
iitb.ac.in: #
iitb.ac.in: # For more information about the robots.txt standard, see:
iitb.ac.in: # http://www.robotstxt.org/robotstxt.html
iitb.ac.in: # CSS, JS, Images
iitb.ac.in: # Directories
iitb.ac.in: # Files
iitb.ac.in: # Paths (clean URLs)
iitb.ac.in: # Paths (no clean URLs)
syria.tv: #
syria.tv: # robots.txt
syria.tv: #
syria.tv: # This file is to prevent the crawling and indexing of certain parts
syria.tv: # of your site by web crawlers and spiders run by sites like Yahoo!
syria.tv: # and Google. By telling these "robots" where not to go on your site,
syria.tv: # you save bandwidth and server resources.
syria.tv: #
syria.tv: # This file will be ignored unless it is at the root of your host:
syria.tv: # Used: http://example.com/robots.txt
syria.tv: # Ignored: http://example.com/site/robots.txt
syria.tv: #
syria.tv: # For more information about the robots.txt standard, see:
syria.tv: # http://www.robotstxt.org/robotstxt.html
syria.tv: # CSS, JS, Images
syria.tv: # Directories
syria.tv: # Files
syria.tv: # Paths (clean URLs)
syria.tv: # Paths (no clean URLs)
locanto.com: ##############################
locanto.com: # robots.txt file
locanto.com: # based on webmasterworld.com
locanto.com: # and searchengineworld.com
locanto.com: # Please, we do NOT allow nonauthorized robots any longer.
locanto.com: # Yes, feel free to copy and use the following.
locanto.com: # desktop
locanto.com: # mobile
locanto.com: # desktop
locanto.com: # mobile
locanto.com: # desktop
locanto.com: # mobile
locanto.com: # desktop
locanto.com: # mobile
locanto.com: ####################################
ionos.mx: #print
ionos.mx: #terms and conditions
ionos.mx: #Popups etc.
ionos.mx: #Results
ionos.mx: #crawl delay
tripleclicks.com: #dropmenudiv{
tripleclicks.com: #dropmenudiv a{
tripleclicks.com: #dropmenudiv a:hover{ /*hover background color*/
tripleclicks.com: #f1_upload_process{
tripleclicks.com: #footer2012{width:930px; padding:15px; line-height:1.3em;background:#fff;border:1px solid #dedede;border-radius:5px 5px 0 0;color:#666;font-size:.8em;font-family:"Helvetica Neue-Light", "Helvetica Neue Light", "Helvetica Neue", Helvetica, Arial, "Lucida Grande", sans-serif;margin:10px auto 0}
tripleclicks.com: #footer2012 .make_money { background:#eeeeee; padding:10px; margin-top:10px }
tripleclicks.com: #footer2012 .footerHeader{font-family:Lato, sans-serif;font-size:16px;color:#690;font-weight:300;margin:0;padding:0}
tripleclicks.com: #footer2012 #customerCare2{float:left;width:120px; margin-right:15px;}
tripleclicks.com: #footer2012 #giving2{float:left;width:150px; margin-right:15px;}
tripleclicks.com: #footer2012 #safeSecure2{float:left;width:310px; margin-right:15px;}
tripleclicks.com: #footer2012 #intl2{float:left;width:170px; margin-right:15px;}
tripleclicks.com: #footer2012 #sponsor2{float:left;width:90px;}
tripleclicks.com: #zackpot_bar { width:962px; padding:0 10px 0 75px; margin:0 auto; height:40px; line-height:40px; background:#666666; border-radius:0 0 5px 5px; color:#CCCCCC; box-sizing:border-box; -moz-box-sizing:border-box; position:relative; font-size:1.1em }
tripleclicks.com: #zackpot_bar a.play { color:#FFF; text-decoration:none; padding:0px 16px; background:#7826a1; border-radius:20px; text-align:center; float:right; line-height:26px; border:1px solid #b042e7; margin-top:6px; font-weight:700; font-size:1.15em; letter-spacing:1px }
tripleclicks.com: #zackpot_bar img { position:absolute; bottom:0px; left:6px }
observer.com: # Sitemap archive
observer.com: ## Disallow search strings.
edesk.com: #
edesk.com: # robots.txt
edesk.com: #
edesk.com: # This file is to prevent the crawling and indexing of certain parts
edesk.com: # of your site by web crawlers and spiders run by sites like Yahoo!
edesk.com: # and Google. By telling these "robots" where not to go on your site,
edesk.com: # you save bandwidth and server resources.
edesk.com: #
edesk.com: # This file will be ignored unless it is at the root of your host:
edesk.com: # Used: http://example.com/robots.txt
edesk.com: # Ignored: http://example.com/site/robots.txt
edesk.com: #
edesk.com: # For more information about the robots.txt standard, see:
edesk.com: # http://www.robotstxt.org/robotstxt.html
edesk.com: # CSS, JS, Images
edesk.com: # Directories
edesk.com: # Files
edesk.com: # Paths (clean URLs)
edesk.com: # Paths (no clean URLs)
ucs.br: #Referencias
ucs.br: #http://www.robotstxt.org/
ucs.br: #http://www.google.com/support/webmasters/bin/answer.py?hl=br&answer=156449
ucs.br: #http://g1.globo.com/robots.txt
ucs.br: #http://en.wikipedia.org/robots.txt
ucs.br: #http://www.terra.com.br/robots.txt
ucs.br: #http://www.google.com.br/robots.txt
ucs.br: #http://www.livejournal.com/robots.txt
ucs.br: #http://www.ubuntu.com/robots.txt
ucs.br: #Disallow: /ucs/*
ucs.br: # Portal antigo da especializacao
ucs.br: # foram mantidas apenas as páginas estáticas abaixo
ucs.br: # advertising-related bots:
ucs.br: # Wikipedia work bots:
ucs.br: # Crawlers that are kind enough to obey, but which we'd rather not have
ucs.br: # unless they're feeding search engines.
ucs.br: # Some bots are known to be trouble, particularly those designed to copy
ucs.br: # entire sites. Please obey robots.txt.
ucs.br: # Hits many times per second, not acceptable
ucs.br: # http://www.nameprotect.com/botinfo.html
ucs.br: # A capture bot, downloads gazillions of pages with no public benefit
ucs.br: # http://www.webreaper.net/
tradervue.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
tradervue.com: #
tradervue.com: # To ban all spiders from the entire site uncomment the next two lines:
tradervue.com: # User-Agent: *
tradervue.com: # Disallow: /
hsbc.co.uk: #Introduce Sitemaps
sitejabber.com: #favorite routes
sitejabber.com: #forum routes
sitejabber.com: #page routes
sitejabber.com: #partner routes
sitejabber.com: #plugin routes
sitejabber.com: #review routes
sitejabber.com: #url routes (non-pages)
sitejabber.com: #user routes
sitejabber.com: # misc
sitejabber.com: #adult content
blocket.se: # Det är uttryckligen förbjudet att använda sökrobotar eller andra
blocket.se: # automatiska metoder för att tillgå blocket.se. Endast om blocket.se
blocket.se: # givit särskilt tillstånd får sådan access ske.
blocket.se: # TODO: fix so links in sitemap.xml points to cdn
blocket.se: # Sitemap: https://assets.blocketcdn.se/adout/public/static/sitemap.xml
microworkers.com: # www.robotstxt.org/
microworkers.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
athoselectronics.com: # Spam Backlink Blocker
athoselectronics.com: # Allow/Disallow Ads.txt
athoselectronics.com: # Allow/Disallow App-ads.txt
athoselectronics.com: # This robots.txt file was created by Better Robots.txt (Index & Rank Booster by Pagup) Plugin. https://www.better-robots.com/
cursou.com.br: # Google AdSense
forever.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
forever.com: #
forever.com: # To ban all spiders from the entire site uncomment the next two lines:
forever.com: # User-agent: *
forever.com: # Disallow: /
imocandidaturas.co.ao: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
imocandidaturas.co.ao: #content{margin:0 0 0 2%;position:relative;}
lucascassianouploader.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
lucascassianouploader.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details.
lucascassianouploader.wordpress.com: # This file was generated on Wed, 24 Feb 2021 00:48:35 +0000
melhorcambio.com: # remove directories
mundodaeletrica.com.br: # robots.txt file for https://www.mundodaeletrica.com.br/
mundodaeletrica.com.br: # Template version: 20171218
mundodaeletrica.com.br: # Last update of this robots.txt file: 15/02/21 at 20:40
mundodaeletrica.com.br: # Avoid indexing some directories
mundodaeletrica.com.br: # Allow others
mundodaeletrica.com.br: # Avoid indexing somes file extensions
mundodaeletrica.com.br: # Sitemap
mundodaeletrica.com.br: # e32af6360eaa0df255079000158e386710afbf08826790547681c4419158f955
official.ao: # Optimization for Google Ads Bot
portaldeangola.com: # BEGIN WBCPBlocker
portaldeangola.com: # END WBCPBlocker
procenter.co.ao: #Begin Attracta SEO Tools Sitemap. Do not remove
procenter.co.ao: #End Attracta SEO Tools Sitemap. Do not remove
shopify.com.br: # ,:
shopify.com.br: # ,' |
shopify.com.br: # / :
shopify.com.br: # --' /
shopify.com.br: # \/ />/
shopify.com.br: # / <//_\
shopify.com.br: # __/ /
shopify.com.br: # )'-. /
shopify.com.br: # ./ :\
shopify.com.br: # /.' '
shopify.com.br: # No need to shop around. Board the rocketship today – great SEO careers to checkout at shopify.com/careers
shopify.com.br: # robots.txt file for www.shopify.com.br
tutorialmonsters.com: # Primero el contenido adjunto.
tutorialmonsters.com: # TambiÈn podemos desindexar todo lo que empiece
tutorialmonsters.com: # por wp-. Es lo mismo que los Disallow de arriba pero
tutorialmonsters.com: # incluye cosas como wp-rss.php
tutorialmonsters.com: #
tutorialmonsters.com: # Sitemap permitido, b˙squedas no.
tutorialmonsters.com: #
tutorialmonsters.com: # Sitemap: http://tutorialmonsters.com/sitemap.xml
tutorialmonsters.com: #
tutorialmonsters.com: # Permitimos el feed general para Google Blogsearch.
tutorialmonsters.com: #
tutorialmonsters.com: # Impedimos que permalink/feed/ sea indexado ya que el
tutorialmonsters.com: # feed con los comentarios suele posicionarse en lugar de
tutorialmonsters.com: # la entrada y desorienta a los usuarios.
tutorialmonsters.com: #
tutorialmonsters.com: # Lo mismo con URLs terminadas en /trackback/ que sÛlo
tutorialmonsters.com: # sirven como Trackback URI (y son contenido duplicado).
tutorialmonsters.com: #
tutorialmonsters.com: #
tutorialmonsters.com: # A partir de aquÌ es opcional pero recomendado.
tutorialmonsters.com: #
tutorialmonsters.com: # Lista de bots que suelen respetar el robots.txt pero rara
tutorialmonsters.com: # vez hacen un buen uso del sitio y abusan bastanteÖ
tutorialmonsters.com: # AÒadir al gusto del consumidorÖ
tutorialmonsters.com: #
tutorialmonsters.com: # Slurp (Yahoo!), Noxtrum y el bot de MSN a veces tienen
tutorialmonsters.com: # idas de pinza, toca decirles que reduzcan la marcha.
tutorialmonsters.com: # El valor es en segundos y podÈis dejarlo bajo e ir
tutorialmonsters.com: # subiendo hasta el punto Ûptimo.
tutorialmonsters.com: #
tutorialmonsters.com: # robots.txt automaticaly generated by PrestaShop e-commerce open-source solution
tutorialmonsters.com: # http://www.prestashop.com - http://www.prestashop.com/forums
tutorialmonsters.com: # This file is to prevent the crawling and indexing of certain parts
tutorialmonsters.com: # of your site by web crawlers and spiders run by sites like Yahoo!
tutorialmonsters.com: # and Google. By telling these "robots" where not to go on your site,
tutorialmonsters.com: # you save bandwidth and server resources.
tutorialmonsters.com: # For more information about the robots.txt standard, see:
tutorialmonsters.com: # http://www.robotstxt.org/wc/robots.html
tutorialmonsters.com: # GoogleBot specific
tutorialmonsters.com: # All bots
tutorialmonsters.com: # Directories
tutorialmonsters.com: # Files
tutorialmonsters.com: # Sitemap
huckberry.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
encar.com: #"Encar" prohibits unauthorized data collection activities (crawling, scraping) using manual or automated tools.
gba.gob.ar: #
gba.gob.ar: # robots.txt
gba.gob.ar: #
gba.gob.ar: # This file is to prevent the crawling and indexing of certain parts
gba.gob.ar: # of your site by web crawlers and spiders run by sites like Yahoo!
gba.gob.ar: # and Google. By telling these "robots" where not to go on your site,
gba.gob.ar: # you save bandwidth and server resources.
gba.gob.ar: #
gba.gob.ar: # This file will be ignored unless it is at the root of your host:
gba.gob.ar: # Used: http://example.com/robots.txt
gba.gob.ar: # Ignored: http://example.com/site/robots.txt
gba.gob.ar: #
gba.gob.ar: # For more information about the robots.txt standard, see:
gba.gob.ar: # http://www.robotstxt.org/robotstxt.html
gba.gob.ar: # CSS, JS, Images
gba.gob.ar: # Directories
gba.gob.ar: # Files
gba.gob.ar: # Paths (clean URLs)
gba.gob.ar: # Paths (no clean URLs)
airbnb.com.au: # ///////
airbnb.com.au: # // //
airbnb.com.au: # // //
airbnb.com.au: # // // /// /// ///
airbnb.com.au: # // // /// ///
airbnb.com.au: # // /// // //// /// /// /// //// /// //// /// //// /// ////
airbnb.com.au: # // /// /// // ////////// /// ////////// /////////// ////////// ///////////
airbnb.com.au: # // // // // /// /// /// /// /// /// /// /// /// ///
airbnb.com.au: # // // // // /// /// /// /// /// /// /// /// /// ///
airbnb.com.au: # // // // // /// /// /// /// /// /// /// /// /// ///
airbnb.com.au: # // // // // ////////// /// /// ////////// /// /// //////////
airbnb.com.au: # // ///// //
airbnb.com.au: # // ///// //
airbnb.com.au: # // /// /// //
airbnb.com.au: # ////// //////
airbnb.com.au: #
airbnb.com.au: #
airbnb.com.au: # We thought you'd never make it!
airbnb.com.au: # We hope you feel right at home in this file...unless you're a disallowed subfolder.
airbnb.com.au: # And since you're here, read up on our culture and team: https://www.airbnb.com/careers/departments/engineering
airbnb.com.au: # There's even a bring your robot to work day.
forextime.com: #
forextime.com: # robots.txt
forextime.com: #
forextime.com: # This file is to prevent the crawling and indexing of certain parts
forextime.com: # of your site by web crawlers and spiders run by sites like Yahoo!
forextime.com: # and Google. By telling these "robots" where not to go on your site,
forextime.com: # you save bandwidth and server resources.
forextime.com: #
forextime.com: # This file will be ignored unless it is at the root of your host:
forextime.com: # Used: http://example.com/robots.txt
forextime.com: # Ignored: http://example.com/site/robots.txt
forextime.com: #
forextime.com: # For more information about the robots.txt standard, see:
forextime.com: # http://www.robotstxt.org/robotstxt.html
forextime.com: # CSS, JS, Images
forextime.com: # Directories
forextime.com: # Files
forextime.com: # Paths (clean URLs)
forextime.com: # Paths (no clean URLs)
whatmobile.com.pk: ###
whatmobile.com.pk: # robots.txt file created at http://www.whatmobile.com.pk
whatmobile.com.pk: # For domain: http://www.whatmobile.com.pk
whatmobile.com.pk: ###
whatmobile.com.pk: #Begin Attracta SEO Tools Sitemap. Do not remove
whatmobile.com.pk: #End Attracta SEO Tools Sitemap. Do not remove
rangeme.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
rangeme.com: #
rangeme.com: # To ban all spiders from the entire site uncomment the next two lines:
rangeme.com: # User-agent: *
rangeme.com: # Disallow: /
developpement-durable.gouv.fr: #
developpement-durable.gouv.fr: # robots.txt
developpement-durable.gouv.fr: #
developpement-durable.gouv.fr: # This file is to prevent the crawling and indexing of certain parts
developpement-durable.gouv.fr: # of your site by web crawlers and spiders run by sites like Yahoo!
developpement-durable.gouv.fr: # and Google. By telling these "robots" where not to go on your site,
developpement-durable.gouv.fr: # you save bandwidth and server resources.
developpement-durable.gouv.fr: #
developpement-durable.gouv.fr: # This file will be ignored unless it is at the root of your host:
developpement-durable.gouv.fr: # Used: http://example.com/robots.txt
developpement-durable.gouv.fr: # Ignored: http://example.com/site/robots.txt
developpement-durable.gouv.fr: #
developpement-durable.gouv.fr: # For more information about the robots.txt standard, see:
developpement-durable.gouv.fr: # http://www.robotstxt.org/robotstxt.html
developpement-durable.gouv.fr: # CSS, JS, Images
developpement-durable.gouv.fr: # Directories
developpement-durable.gouv.fr: # Files
developpement-durable.gouv.fr: # Paths (clean URLs)
developpement-durable.gouv.fr: # Paths (no clean URLs)
unicef.org: # Drupal sites
unicef.org: # For Main site
unicef.org: # CSS, JS, Images
unicef.org: # Directories
unicef.org: # Files
unicef.org: # Paths (clean URLs)
unicef.org: # Paths (no clean URLs)
unicef.org: # For ROCO sites
unicef.org: # CSS, JS, Images
unicef.org: # Directories
unicef.org: # Files
unicef.org: # Paths (clean URLs)
unicef.org: # Paths (no clean URLs)
rusvesna.su: #
rusvesna.su: # robots.txt
rusvesna.su: #
rusvesna.su: # This file is to prevent the crawling and indexing of certain parts
rusvesna.su: # of your site by web crawlers and spiders run by sites like Yahoo!
rusvesna.su: # and Google. By telling these "robots" where not to go on your site,
rusvesna.su: # you save bandwidth and server resources.
rusvesna.su: #
rusvesna.su: # This file will be ignored unless it is at the root of your host:
rusvesna.su: # Used: http://example.com/robots.txt
rusvesna.su: # Ignored: http://example.com/site/robots.txt
rusvesna.su: #
rusvesna.su: # For more information about the robots.txt standard, see:
rusvesna.su: # http://www.robotstxt.org/robotstxt.html
rusvesna.su: # Directories
rusvesna.su: # Files
rusvesna.su: # Paths (clean URLs)
rusvesna.su: # Paths (no clean URLs)
huatu.com: #
huatu.com: # robots.txt for huatu.com
huatu.com: #
honeybook.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
nexcess.net: #****************************************************************************
nexcess.net: # robots.txt
nexcess.net: # : Robots, spiders, and search engines use this file to detmine which
nexcess.net: # content they should *not* crawl while indexing your website.
nexcess.net: # : This system is called "The Robots Exclusion Standard."
nexcess.net: # : It is strongly encouraged to use a robots.txt validator to check
nexcess.net: # for valid syntax before any robots read it!
nexcess.net: #
nexcess.net: # Examples:
nexcess.net: #
nexcess.net: # Instruct all robots to stay out of the admin area.
nexcess.net: # : User-agent: *
nexcess.net: # : Disallow: /admin/
nexcess.net: #
nexcess.net: # Restrict Google and MSN from indexing your images.
nexcess.net: # : User-agent: Googlebot
nexcess.net: # : Disallow: /images/
nexcess.net: # : User-agent: MSNBot
nexcess.net: # : Disallow: /images/
nexcess.net: #****************************************************************************
softwaresuggest.com: # Block Uptime robot
uct.ac.za: #
uct.ac.za: # robots.txt
uct.ac.za: #
uct.ac.za: # This file is to prevent the crawling and indexing of certain parts
uct.ac.za: # of your site by web crawlers and spiders run by sites like Yahoo!
uct.ac.za: # and Google. By telling these "robots" where not to go on your site,
uct.ac.za: # you save bandwidth and server resources.
uct.ac.za: #
uct.ac.za: # This file will be ignored unless it is at the root of your host:
uct.ac.za: # Used: http://example.com/robots.txt
uct.ac.za: # Ignored: http://example.com/site/robots.txt
uct.ac.za: #
uct.ac.za: # For more information about the robots.txt standard, see:
uct.ac.za: # http://www.robotstxt.org/robotstxt.html
uct.ac.za: # CSS, JS, Images
uct.ac.za: # Directories
uct.ac.za: # Files
uct.ac.za: # Paths (clean URLs)
uct.ac.za: # Paths (no clean URLs)
lit.link: # https://www.robotstxt.org/robotstxt.html
picsart.com: # Disallow.
picsart.com: # For time being
picsart.com: # Sitemaps.
bravotv.com: #
bravotv.com: # robots.txt
bravotv.com: #
bravotv.com: # This file is to prevent the crawling and indexing of certain parts
bravotv.com: # of your site by web crawlers and spiders run by sites like Yahoo!
bravotv.com: # and Google. By telling these "robots" where not to go on your site,
bravotv.com: # you save bandwidth and server resources.
bravotv.com: #
bravotv.com: # This file will be ignored unless it is at the root of your host:
bravotv.com: # Used: http://example.com/robots.txt
bravotv.com: # Ignored: http://example.com/site/robots.txt
bravotv.com: #
bravotv.com: # For more information about the robots.txt standard, see:
bravotv.com: # http://www.robotstxt.org/robotstxt.html
bravotv.com: # CSS, JS, Images
bravotv.com: # Directories
bravotv.com: # Files
bravotv.com: # Paths (clean URLs)
bravotv.com: # Paths (no clean URLs)
bravotv.com: # Ads, see https://bravotv.atlassian.net/browse/BO-537
theme.co: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
theme.co: # This robots.txt file is not used. Please append the content below in the robots.txt file located at the root
theme.co: #
resh.edu.ru: # www.robotstxt.org/
resh.edu.ru: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
jcrew.com: #
jcrew.com: # Site: factory.jcrew.com WWW
jcrew.com: #
jcrew.com: # This file is retrieved automatically by crawlers conforming to
jcrew.com: # the Robots.txt standard. It defines what URLs should/shouldn't
jcrew.com: # be indexed.
jcrew.com: # See <URL:http://www.robotstxt.org/wc/exclusion.html#robotstxt>
jcrew.com: #
jcrew.com: # Format:
jcrew.com: # User-agent: <agent-string>
jcrew.com: # Disallow: <nothing> | <path>
jcrew.com: # -----------------------------------------------------------------------------
jcrew.com: # All User Agent Exclusions
influence.co: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
influence.co: #
influence.co: # To ban all spiders from the entire site uncomment the next two lines:
vajehyab.com: #container {
rfa.org: # Define access-restrictions for robots/spiders
rfa.org: # http://www.robotstxt.org/wc/norobots.html
rfa.org: # By default we allow robots to access all areas of our site
rfa.org: # already accessible to anonymous users
rfa.org: # Add Googlebot-specific syntax extension to exclude forms
rfa.org: # that are repeated for each piece of content in the site
rfa.org: # the wildcard is only supported by Googlebot
rfa.org: # http://www.google.com/support/webmasters/bin/answer.py?answer=40367&ctx=sibling
modern.az: # Sitemap files
manageengine.com: # --------------------------------------------------
manageengine.com: # Robots.txt file for https://www.manageengine.com
manageengine.com: # Author: Webmaster
manageengine.com: # Last Updated Date: 08/02/2021
manageengine.com: # --------------------------------------------------
globes.co.il: # Robots.txt file
globes.co.il: #
globes.co.il: # All robots will spider the domain
turkcell.com.tr: # only access
businessofapps.com: #Googlebot
businessofapps.com: # Global
searchandshopping.org: ## Default robots.txt
donedeal.ie: # added 22/09/2009
donedeal.ie: # added 9/4/2011 by Fred (trying to block Donkiz, but not sure if this works)
donedeal.ie: # added 23/05/2011 by Declan (trying to block Sightup)
donedeal.ie: # added 16/10/2014 by Pete
donedeal.ie: # added 20/02/2015 by Pete
donedeal.ie: # added 16/09/2016 by Pete
uscourts.gov: #
uscourts.gov: # robots.txt
uscourts.gov: #
uscourts.gov: # This file is to prevent the crawling and indexing of certain parts
uscourts.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
uscourts.gov: # and Google. By telling these "robots" where not to go on your site,
uscourts.gov: # you save bandwidth and server resources.
uscourts.gov: #
uscourts.gov: # This file will be ignored unless it is at the root of your host:
uscourts.gov: # Used: http://example.com/robots.txt
uscourts.gov: # Ignored: http://example.com/site/robots.txt
uscourts.gov: #
uscourts.gov: # For more information about the robots.txt standard, see:
uscourts.gov: # http://www.robotstxt.org/robotstxt.html
uscourts.gov: # CSS, JS, Images
uscourts.gov: # Directories
uscourts.gov: # Files
uscourts.gov: # Paths (clean URLs)
uscourts.gov: # Paths (no clean URLs)
pangzitv.com: #notfound {
pangzitv.com: #notfound .notfound {
thetimes.co.uk: #Agent Specific Disallowed Sections
rabobank.nl: # Robots.txt voor www.rabobank.nl
rabobank.nl: # Directories die niet geindexeerd hoeven te worden door externe search
rabobank.nl: # engines #
sendle.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
tamilwin.com: # Disallow: /*? This is match ? anywhere in the URL
xactlycorp.com: #
xactlycorp.com: # robots.txt
xactlycorp.com: #
xactlycorp.com: # This file is to prevent the crawling and indexing of certain parts
xactlycorp.com: # of your site by web crawlers and spiders run by sites like Yahoo!
xactlycorp.com: # and Google. By telling these "robots" where not to go on your site,
xactlycorp.com: # you save bandwidth and server resources.
xactlycorp.com: #
xactlycorp.com: # This file will be ignored unless it is at the root of your host:
xactlycorp.com: # Used: http://example.com/robots.txt
xactlycorp.com: # Ignored: http://example.com/site/robots.txt
xactlycorp.com: #
xactlycorp.com: # For more information about the robots.txt standard, see:
xactlycorp.com: # http://www.robotstxt.org/robotstxt.html
xactlycorp.com: # CSS, JS, Images
xactlycorp.com: # Directories
xactlycorp.com: # Files
xactlycorp.com: # Paths (clean URLs)
xactlycorp.com: # Paths (no clean URLs)
xactlycorp.com: # Query URLs
here.com: #
here.com: # robots.txt
here.com: #
here.com: # This file is to prevent the crawling and indexing of certain parts
here.com: # of your site by web crawlers and spiders run by sites like Yahoo!
here.com: # and Google. By telling these "robots" where not to go on your site,
here.com: # you save bandwidth and server resources.
here.com: #
here.com: # This file will be ignored unless it is at the root of your host:
here.com: # Used: http://example.com/robots.txt
here.com: # Ignored: http://example.com/site/robots.txt
here.com: #
here.com: # For more information about the robots.txt standard, see:
here.com: # http://www.robotstxt.org/robotstxt.html
here.com: # CSS, JS, Images
here.com: # Directories
here.com: # Files
here.com: # Paths (clean URLs)
here.com: # Paths (no clean URLs)
xiaomi.net: # 2015/12/11
alaskaair.com: #Update 4/16/2019 - 9:27AM SLH
alaskaair.com: #Sites
alaskaair.com: #REMARK: to allow SE to de-indices url, remove/uncomment after SE de-indexed all these urls using on Accuwork #
alaskaair.com: #Disallow: /contents.asp
alaskaair.com: #Disallow: /Home.asp
alaskaair.com: #Disallow: /mileageplan/AboutMP.asp
alaskaair.com: #Disallow: /mileageplan/awardsAustralia.asp
alaskaair.com: #Disallow: /mileageplan/awardsSAmerica.asp
alaskaair.com: #Disallow: /mileageplan/CustomerComments.asp
alaskaair.com: #Disallow: /mileageplan/definitions.asp
alaskaair.com: #Disallow: /mileageplan/MemberGuide.asp
alaskaair.com: #Disallow: /mileageplan/MileagePartners.asp
alaskaair.com: #Disallow: /mileageplan/MileagePartners_Airline.asp
alaskaair.com: #Disallow: /mileageplan/MileagePartners_Car.asp
alaskaair.com: #Disallow: /mileageplan/MileagePartners_Dining.asp
alaskaair.com: #Disallow: /mileageplan/MileagePartners_Financial.asp
alaskaair.com: #Disallow: /mileageplan/MileagePartners_Hotel.asp
alaskaair.com: #Disallow: /mileageplan/MileagePartners_Specialty.asp
alaskaair.com: #Disallow: /mileageplan/MileagePartners_Telecom.asp
alaskaair.com: #Disallow: /mileageplan/MPpremiums.asp
alaskaair.com: #Disallow: /mileageplan/MVPStatus.asp
alaskaair.com: #Disallow: /mileageplan/OnlineAwardChart.asp
alaskaair.com: #Disallow: /mileageplan/PartnerMilesOps.asp
alaskaair.com: #Disallow: /mileageplan/faqs/Awards.asp
alaskaair.com: #Disallow: /mileageplan/faqs/Credit.asp
alaskaair.com: #Disallow: /mileageplan/faqs/EStatements.asp
alaskaair.com: #Disallow: /mileageplan/faqs/mpfaq.asp
alaskaair.com: #Disallow: /mileageplan/faqs/MVP.asp
alaskaair.com: #Disallow: /mileageplan/faqs/OtherMP.asp
alaskaair.com: #Disallow: /mileageplan/faqs/PDRes.asp
alaskaair.com: #Disallow: /mileageplan/faqs/Upgrades.asp
alaskaair.com: #Disallow: /mileageplan/ssl/partner/PartnerForm.asp
alaskaair.com: #Disallow: /shared/tips/AboutCompanyFares.asp
alaskaair.com: #Disallow: /shared/tips/AboutECertTip.asp
alaskaair.com: #Disallow: /shared/tips/AboutFareOptions.asp
alaskaair.com: #Old Content
alaskaair.com: #site core or as.com partial Content
alaskaair.com: #PDFs
alaskaair.com: #Disallow: /mileageplan/ExpressionOfThanks.pdf
alaskaair.com: #images
alaskaair.com: #parameters
alaskaair.com: #support files
alaskaair.com: #web services
nos.nl: # www.robotstxt.org/
nos.nl: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
mediapost.com: # robots.txt
mediapost.com: # Tell "bitlybot" not to come here at all
mediapost.com: # From NYT.com - nobody seems to like this bot
mediapost.com: # Crawlers that are kind enough to obey, but which we'd rather not have
mediapost.com: # unless they're feeding search engines.
mediapost.com: # Some bots are known to be trouble, particularly those designed to copy
mediapost.com: # entire sites. Please obey robots.txt.
mediapost.com: #
mediapost.com: # Sorry, wget in its recursive mode is a frequent problem.
mediapost.com: # Please read the man page and use it properly; there is a
mediapost.com: # --wait option you can use to set the delay between hits,
mediapost.com: # for instance.
mediapost.com: #
mediapost.com: #
mediapost.com: # The 'grub' distributed client has been *very* poorly behaved.
mediapost.com: #
mediapost.com: #
mediapost.com: # Doesn't follow robots.txt anyway, but...
mediapost.com: #
mediapost.com: #
mediapost.com: # Hits many times per second, not acceptable
mediapost.com: # http://www.nameprotect.com/botinfo.html
mediapost.com: # A capture bot, downloads gazillions of pages with no public benefit
mediapost.com: # http://www.webreaper.net/
mediapost.com: #
mediapost.com: # Friendly, low-speed bots are welcome viewing pages.
mediapost.com: #
mediapost.com: #
mediapost.com: # GoogleBot
mediapost.com: #
mediapost.com: #
mediapost.com: # MSN Bot listens to Crawl-Delay
mediapost.com: #
mediapost.com: #
mediapost.com: # Yahoo/Inktomi listens to Crawl-Delay
mediapost.com: #
mediapost.com: #Baiduspider
mediapost.com: #Yandex
alldatasheet.com: # User-agent: *
alldatasheet.com: # Disallow: /img
alldatasheet.com: # Disallow: /view_distributor2.jsp
alldatasheet.com: # Disallow: /manufacturer/companylist.jsp?list=675
alldatasheet.com: # Disallow: /manufacturer/companylist.jsp?list=836
alldatasheet.com: # Disallow: /manufacturer/companylist.jsp?list=929
alldatasheet.com: # Disallow: /manufacturer/companylist.jsp?list=247
alldatasheet.com: # Disallow: /manufacturer/companylist.jsp?list=164
alldatasheet.com: # Disallow: /manufacturer/companylist.jsp?list=411
alldatasheet.com: #User-agent: Googlebot
alldatasheet.com: #Disallow: /
alldatasheet.com: #Crawl-delay: 1
alldatasheet.com: #User-agent: MSNBot
alldatasheet.com: #Disallow: /
alldatasheet.com: #Crawl-delay: 20
alldatasheet.com: #User-agent: BaiDuSpider
alldatasheet.com: #Disallow: /
alldatasheet.com: #Crawl-delay: 20
alldatasheet.com: #User-agent: bingbot
alldatasheet.com: #Disallow: /
alldatasheet.com: #Crawl-delay: 20
alldatasheet.com: #User-agent: YandexBot
alldatasheet.com: #Disallow: /
alldatasheet.com: #Crawl-delay: 20
stackblitz.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
moneyhouse.ch: #Prevent bots from crawling IT-Person Profiles to boost the hitlist
imf.org: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
imf.org: #content{margin:0 0 0 2%;position:relative;}
weedmaps.com: # WP Defaults
weedmaps.com: # WP Learn
weedmaps.com: # WP Open CA
weedmaps.com: # WP Review guidelines
weedmaps.com: # WP Sports
weedmaps.com: # WP Verified
weedmaps.com: # WP Weed Facts
kotlinlang.org: # Sitemaps
gamepedia.jp: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/
leafly.com: # Modified: 01/12/2021
salary.com: # robots.txt for https://www.salary.com/
wuzzuf.net: #Disallow access to iframes
epson.com: # For all robots
epson.com: # Block access to specific groups of pages
epson.com: # Allow search crawlers to discover the sitemap
epson.com: # Block CazoodleBot as it does not present correct accept content headers
epson.com: # Block MJ12bot as it is just noise
epson.com: # Block dotbot as it cannot parse base urls properly
epson.com: # Block Gigabot
xoom.com: # Per robots.txt directives, blank lines are not good in robots.txt
xoom.com: # Please see www.robotstxt.org and the wikipedia page for proper syntax
clinicaltrials.gov: # robots.txt - robot exclusion file - back-end server version - no robots!
clinicaltrials.gov: # ========================================================================
vmware.com: # List folders crawlers are not allowed to Index.
vmware.com: # List PDFs crawlers are not allowed to Index.
foodandwine.com: # Sitemaps
foodandwine.com: #Onecms
foodandwine.com: # Content
foodandwine.com: #Onecms
foodandwine.com: # Content
estrategiaconcursos.com.br: # Robots.txt file from http://www.estrategiaconcursos.com.br
tnt.com: # If you are not a robot sniffing around in this file,
tnt.com: # We might be looking for you to join our SEO team
tnt.com: # Contact us here: search.advertising@tnt.com
tnt.com: #
tnt.com: # _____ _ _ _____
tnt.com: # |_ _| \| |_ _|
tnt.com: # | | | .` | | |
tnt.com: # |_| |_|\_| |_|
tnt.com: #
nist.gov: #
nist.gov: # robots.txt
nist.gov: #
nist.gov: # This file is to prevent the crawling and indexing of certain parts
nist.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
nist.gov: # and Google. By telling these "robots" where not to go on your site,
nist.gov: # you save bandwidth and server resources.
nist.gov: #
nist.gov: # This file will be ignored unless it is at the root of your host:
nist.gov: # Used: http://example.com/robots.txt
nist.gov: # Ignored: http://example.com/site/robots.txt
nist.gov: #
nist.gov: # For more information about the robots.txt standard, see:
nist.gov: # http://www.robotstxt.org/robotstxt.html
nist.gov: # CSS, JS, Images
nist.gov: # Directories
nist.gov: # Files
nist.gov: # Paths (clean URLs)
nist.gov: # Paths (no clean URLs)
nist.gov: # Noindex files
raiffeisen.ch: # Exclude Special Areas
raiffeisen.ch: # Exclude Special Microsites
raiffeisen.ch: # Exclude Special Filetypes
raiffeisen.ch: # Stop Wasting Crawlbudget
raiffeisen.ch: # Exclude old casa urls
raiffeisen.ch: # Exclude livestream
sankei.com: # sitemap
sankei.com: # not contents
sankei.com: # not crawl target
sankei.com: # old contents
shatel.ir: # www.robotstxt.org/
shatel.ir: # Allow crawling of all content
sociopost.com: #
sociopost.com: # robots.txt
sociopost.com: #
sociopost.com: # This file is to prevent the crawling and indexing of certain parts
sociopost.com: # of your site by web crawlers and spiders run by sites like Yahoo!
sociopost.com: # and Google. By telling these "robots" where not to go on your site,
sociopost.com: # you save bandwidth and server resources.
sociopost.com: #
sociopost.com: # This file will be ignored unless it is at the root of your host:
sociopost.com: # Used: http://example.com/robots.txt
sociopost.com: # Ignored: http://example.com/site/robots.txt
sociopost.com: #
sociopost.com: # For more information about the robots.txt standard, see:
sociopost.com: # http://www.robotstxt.org/wc/robots.html
sociopost.com: #
sociopost.com: # For syntax checking, see:
sociopost.com: # http://www.sxw.org.uk/computing/robots/check.html
sociopost.com: # Directories
sociopost.com: # Files
sociopost.com: # Paths (clean URLs)
sociopost.com: # Paths (no clean URLs)
polkastarter.com: # See https://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
electrek.co: # Sitemap archive
newchic.com: #ÊêúÁ¥¢
newchic.com: #搜索分类
state.ma.us: #
state.ma.us: # robots.txt
state.ma.us: #
state.ma.us: # This file is to prevent the crawling and indexing of certain parts
state.ma.us: # of your site by web crawlers and spiders run by sites like Yahoo!
state.ma.us: # and Google. By telling these "robots" where not to go on your site,
state.ma.us: # you save bandwidth and server resources.
state.ma.us: #
state.ma.us: # This file will be ignored unless it is at the root of your host:
state.ma.us: # Used: http://example.com/robots.txt
state.ma.us: # Ignored: http://example.com/site/robots.txt
state.ma.us: #
state.ma.us: # For more information about the robots.txt standard, see:
state.ma.us: # http://www.robotstxt.org/robotstxt.html
state.ma.us: # CSS, JS, Images
state.ma.us: # Directories
state.ma.us: # Files
state.ma.us: # Paths (clean URLs)
state.ma.us: # Paths (no clean URLs)
lucidpress.com: #
lucidpress.com: # robots.txt
lucidpress.com: #
lucidpress.com: # This file is to prevent the crawling and indexing of certain parts
lucidpress.com: # of your site by web crawlers and spiders run by sites like Yahoo!
lucidpress.com: # and Google. By telling these "robots" where not to go on your site,
lucidpress.com: # you save bandwidth and server resources.
lucidpress.com: #
lucidpress.com: # This file will be ignored unless it is at the root of your host:
lucidpress.com: # Used: http://example.com/robots.txt
lucidpress.com: # Ignored: http://example.com/site/robots.txt
lucidpress.com: #
lucidpress.com: # For more information about the robots.txt standard, see:
lucidpress.com: # http://www.robotstxt.org/wc/robots.html
lucidpress.com: #
lucidpress.com: # For syntax checking, see:
lucidpress.com: # http://www.sxw.org.uk/computing/robots/check.html
lucidpress.com: # Directories
lucidpress.com: # Paths (no clean URLs)
lucidpress.com: #####
lucidpress.com: # Drupal
lucidpress.com: #####
lucidpress.com: # Directories
lucidpress.com: # Allow some content from /pages/misc
lucidpress.com: # Files
lucidpress.com: # Paths (clean URLs)
lucidpress.com: # Paths (no clean URLs)
lucidpress.com: # Rewrites
lucidpress.com: #####
lucidpress.com: # Code-Base
lucidpress.com: #
lucidpress.com: # The following URL's are defined in our routing files,
lucidpress.com: # but have no value for indexing. Several of them should
lucidpress.com: # definitely NOT be indexed.
lucidpress.com: #####
tiava.com: # www.robotstxt.org/
tiava.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
upstream.to: #User-agent: *
upstream.to: #Disallow: /
thestudentroom.co.uk: #Disallow: /w/
thestudentroom.co.uk: #Disallow: /ads.php
thestudentroom.co.uk: #Disallow: /*showthread.php?p=
thestudentroom.co.uk: #Disallow: /m/
opensource.com: #
opensource.com: # robots.txt
opensource.com: #
opensource.com: # This file is to prevent the crawling and indexing of certain parts
opensource.com: # of your site by web crawlers and spiders run by sites like Yahoo!
opensource.com: # and Google. By telling these "robots" where not to go on your site,
opensource.com: # you save bandwidth and server resources.
opensource.com: #
opensource.com: # This file will be ignored unless it is at the root of your host:
opensource.com: # Used: http://example.com/robots.txt
opensource.com: # Ignored: http://example.com/site/robots.txt
opensource.com: #
opensource.com: # For more information about the robots.txt standard, see:
opensource.com: # http://www.robotstxt.org/robotstxt.html
opensource.com: # CSS, JS, Images
opensource.com: # Directories
opensource.com: # Files
opensource.com: # Paths (clean URLs)
opensource.com: # Paths (no clean URLs)
yellowimages.com: # https://megaindex.com/crawler
yellowimages.com: # Screaming Frog SEO Spider
yellowimages.com: # https://www.screamingfrog.co.uk/seo-spider/
yellowimages.com: # http://webmeup-crawler.com
tableau.com: # Directories
tableau.com: # Paths (clean URLs)
tableau.com: # Paths (no clean URLs)
tableau.com: # Tableau
tableau.com: # Email only downloads
tableau.com: # BingBot can be overzealous, calm down.
tableau.com: # Bots without value
tableau.com: # re #7789 - temporarily unblock search vendor
tableau.com: # User-agent: SemrushBot
tableau.com: # Disallow: /
gingersoftware.com: # $Id: robots.txt,v 1.9.2.1 2008/12/10 20:12:19 goba Exp $
gingersoftware.com: #
gingersoftware.com: # robots.txt
gingersoftware.com: #
gingersoftware.com: # This file is to prevent the crawling and indexing of certain parts
gingersoftware.com: # of your site by web crawlers and spiders run by sites like Yahoo!
gingersoftware.com: # and Google. By telling these "robots" where not to go on your site,
gingersoftware.com: # you save bandwidth and server resources.
gingersoftware.com: #
gingersoftware.com: # This file will be ignored unless it is at the root of your host:
gingersoftware.com: # Used: http://example.com/robots.txt
gingersoftware.com: # Ignored: http://example.com/site/robots.txt
gingersoftware.com: #
gingersoftware.com: # For more information about the robots.txt standard, see:
gingersoftware.com: # http://www.robotstxt.org/wc/robots.html
gingersoftware.com: #
gingersoftware.com: # For syntax checking, see:
gingersoftware.com: # http://www.sxw.org.uk/computing/robots/check.html
gingersoftware.com: # Directories
gingersoftware.com: # Files
gingersoftware.com: # Paths (clean URLs)
gingersoftware.com: # Paths (no clean URLs)
rocketmortgage.com: # password reset
assam.gov.in: #
assam.gov.in: # robots.txt
assam.gov.in: #
assam.gov.in: # This file is to prevent the crawling and indexing of certain parts
assam.gov.in: # of your site by web crawlers and spiders run by sites like Yahoo!
assam.gov.in: # and Google. By telling these "robots" where not to go on your site,
assam.gov.in: # you save bandwidth and server resources.
assam.gov.in: #
assam.gov.in: # This file will be ignored unless it is at the root of your host:
assam.gov.in: # Used: http://example.com/robots.txt
assam.gov.in: # Ignored: http://example.com/site/robots.txt
assam.gov.in: #
assam.gov.in: # For more information about the robots.txt standard, see:
assam.gov.in: # http://www.robotstxt.org/robotstxt.html
assam.gov.in: # CSS, JS, Images
assam.gov.in: # Directories
assam.gov.in: # Files
assam.gov.in: # Paths (clean URLs)
assam.gov.in: # Paths (no clean URLs)
better.com: # http://www.robotstxt.org
coach.com: #4/1/2019
coach.com: #2358
sport.pl: #cookieInfoMsgWrapper {margin-bottom: -2px;}
balr.com: # we use Shopify as our ecommerce platform
balr.com: # Google adsbot ignores robots.txt unless specifically named!
zebra.com: #NS
eluniversal.com.mx: #
eluniversal.com.mx: # robots.txt
eluniversal.com.mx: #
eluniversal.com.mx: # This file is to prevent the crawling and indexing of certain parts
eluniversal.com.mx: # of your site by web crawlers and spiders run by sites like Yahoo!
eluniversal.com.mx: # and Google. By telling these "robots" where not to go on your site,
eluniversal.com.mx: # you save bandwidth and server resources.
eluniversal.com.mx: #
eluniversal.com.mx: # This file will be ignored unless it is at the root of your host:
eluniversal.com.mx: # Used: http://example.com/robots.txt
eluniversal.com.mx: # Ignored: http://example.com/site/robots.txt
eluniversal.com.mx: #
eluniversal.com.mx: # For more information about the robots.txt standard, see:
eluniversal.com.mx: # http://www.robotstxt.org/robotstxt.html
eluniversal.com.mx: # CSS, JS, Images
eluniversal.com.mx: # Directories
eluniversal.com.mx: # Files
eluniversal.com.mx: # Paths (clean URLs)
eluniversal.com.mx: # Paths (no clean URLs)
ucr.edu: #
ucr.edu: # robots.txt
ucr.edu: #
ucr.edu: # This file is to prevent the crawling and indexing of certain parts
ucr.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
ucr.edu: # and Google. By telling these "robots" where not to go on your site,
ucr.edu: # you save bandwidth and server resources.
ucr.edu: #
ucr.edu: # This file will be ignored unless it is at the root of your host:
ucr.edu: # Used: http://example.com/robots.txt
ucr.edu: # Ignored: http://example.com/site/robots.txt
ucr.edu: #
ucr.edu: # For more information about the robots.txt standard, see:
ucr.edu: # http://www.robotstxt.org/robotstxt.html
ucr.edu: # CSS, JS, Images
ucr.edu: # Directories
ucr.edu: # Files
ucr.edu: # Paths (clean URLs)
ucr.edu: # Paths (no clean URLs)
ddengle.com: # robots.txt
theadventurechallenge.com: # we use Shopify as our ecommerce platform
theadventurechallenge.com: # Google adsbot ignores robots.txt unless specifically named!
hkej.com: #User-agent: *
hkej.com: #Disallow: /rss/onlinenews.xml
hkej.com: #Disallow: /rss/shopping.xml
hkej.com: #Disallow: /rss/wine.xml
hkej.com: #Disallow: /template/forum/
hkej.com: #Disallow: /template/blog/
hkej.com: #Disallow: /template/xml/
hkej.com: #Disallow: /rss/onlinenews.xml
hkej.com: #Disallow: /rss/shopping.xml
hkej.com: #Disallow: /rss/wine.xml
hkej.com: #Disallow: /template/forum/
hkej.com: #Disallow: /template/blog/
hkej.com: #Disallow: /template/xml/
hkej.com: #Sitemap: http://www.hkej.com/rss/sitemap.xml
bloter.net: # This virtual robots.txt file was created by the Virtual Robots.txt WordPress plugin: https://www.wordpress.org/plugins/pc-robotstxt/
worldofwarships.ru: # General
worldofwarships.ru: # News
worldofwarships.ru: # Media
asianwiki.com: #
asianwiki.com: # Sorry, wget in its recursive mode is a frequent problem.
asianwiki.com: # Please read the man page and use it properly; there is a
asianwiki.com: # --wait option you can use to set the delay between hits,
asianwiki.com: # for instance.
asianwiki.com: #
asianwiki.com: #
asianwiki.com: # Hits many times per second, not acceptable
asianwiki.com: # http://www.nameprotect.com/botinfo.html
asianwiki.com: # A capture bot, downloads gazillions of pages with no public benefit
asianwiki.com: # http://www.webreaper.net/
asianwiki.com: # Don't allow the wayback-maschine to index user-pages
decolar.com: #Robots default PT
decolar.com: #Los siguientes 2 son de las nuevas landings de hoteles en destinos
decolar.com: #Bloquea las landings de hoteles en pais
decolar.com: #Los siguientes 3 son por la estructura de urls de mobile
decolar.com: #Allow para hoteles search
decolar.com: #Bloquea paginas de hoteles version para imprimir
decolar.com: #Paquetes
decolar.com: #Actividades
decolar.com: #Buses
decolar.com: #Transfer
decolar.com: #Special clients
decolar.com: #Bloqueos especificos para el bot de QualityScore de SEM
decolar.com: #Bloqueo Baidu:
decolar.com: #multidestinos
decolar.com: #hermes
rokomari.com: # If you operate a search engine and would like to crawl Rokomari.Com, please
rokomari.com: # email us admin@rokomari.com . Thanks.
rokomari.com: #Disallow: /static/* # Allowed at 2019-11-07_11-17 by the request of Shougat Hossain - Fahad Ahammed
rokomari.com: # Jhokomari Blocked by Shougat Vai
bhg.com: #Sitemaps
bhg.com: # ONECMS
bhg.com: # Content
bhg.com: # ONECMS
bhg.com: # Content - allows syndication
518.com.tw: # Robots.txt file from https://www.518.com.tw
usa.gov: #
usa.gov: # robots.txt
usa.gov: #
usa.gov: # This file is to prevent the crawling and indexing of certain parts
usa.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
usa.gov: # and Google. By telling these "robots" where not to go on your site,
usa.gov: # you save bandwidth and server resources.
usa.gov: #
usa.gov: # This file will be ignored unless it is at the root of your host:
usa.gov: # Used: http://example.com/robots.txt
usa.gov: # Ignored: http://example.com/site/robots.txt
usa.gov: #
usa.gov: # For more information about the robots.txt standard, see:
usa.gov: # http://www.robotstxt.org/robotstxt.html
usa.gov: # Sitemaps
usa.gov: # CSS, JS, Images
usa.gov: # Directories
usa.gov: # Files
usa.gov: # Paths (clean URLs)
usa.gov: # Paths (no clean URLs)
usa.gov: # Specific pathing blocks for USA.gov and GobiernoUSA.gov
apple.com.cn: # robots.txt for https://www.apple.com.cn/
specialized.com: # For all robots
specialized.com: # Block access to specific groups of pages
specialized.com: # Block access to URL filters
specialized.com: # Allow search crawlers to discover the sitemap
specialized.com: # Block CazoodleBot as it does not present correct accept content headers
specialized.com: # Block MJ12bot as it is just noise
specialized.com: # Block dotbot as it cannot parse base urls properly
specialized.com: # Block Gigabot
avvocatoandreani.it: #
avvocatoandreani.it: # robots.txt v. 1.02
avvocatoandreani.it: #
umontreal.ca: # Accepter l'indexation
thread.com: # The following are temporary disallows to exclude noindex'ed pages from scraping
thread.com: # before they are hit. In the long term we should add nofollow to links to any of
thread.com: # pages, which requires efficent computation of whether the linked page is
thread.com: # indexable.
thread.com: # Updating these? You should also update filterset_indexable().
thread.com: # Filters:
thread.com: # Other:
thread.com: # Item sources which are being indexed for an unknown reason (they redirect).
thread.com: # Temporary until we switch to query parameters.
radiojavan.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
somuch.com: # Robots need homes too
cfi.cn: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
cfi.cn: #content{margin:0 0 0 2%;position:relative;}
btctrademart.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
btctrademart.com: #content{margin:0 0 0 2%;position:relative;}
javmost.com: #container {
reforma.com: # Primero deben especificarse los Allow y despues los Disallow
reforma.com: # Para el caso de que twitter pueda llegar al ImageTransformer
reforma.com: # robots.txt es case sensitive. Un url en bajas es difernte a un URL en altas
mubasher.info: # Integration
mubasher.info: # Static resources
mubasher.info: # Chart Files
mubasher.info: # API
mubasher.info: # # # #
mubasher.info: # MIX ORM URLs FI
mubasher.info: # MIX ORM URLs IPOS
olx.uz: # sitecode:olxuz-desktop
wallstreetoasis.com: #
wallstreetoasis.com: # robots.txt
wallstreetoasis.com: #
wallstreetoasis.com: # This file is to prevent the crawling and indexing of certain parts
wallstreetoasis.com: # of your site by web crawlers and spiders run by sites like Yahoo!
wallstreetoasis.com: # and Google. By telling these "robots" where not to go on your site,
wallstreetoasis.com: # you save bandwidth and server resources.
wallstreetoasis.com: #
wallstreetoasis.com: # This file will be ignored unless it is at the root of your host:
wallstreetoasis.com: # Used: http://example.com/robots.txt
wallstreetoasis.com: # Ignored: http://example.com/site/robots.txt
wallstreetoasis.com: #
wallstreetoasis.com: # For more information about the robots.txt standard, see:
wallstreetoasis.com: # http://www.robotstxt.org/robotstxt.html
wallstreetoasis.com: #
wallstreetoasis.com: # For syntax checking, see:
wallstreetoasis.com: # http://www.frobee.com/robots-txt-check
wallstreetoasis.com: # CSS, JS, Images
wallstreetoasis.com: # Directories
wallstreetoasis.com: # Files
wallstreetoasis.com: # Paths (clean URLs)
wallstreetoasis.com: # Paths (no clean URLs)
wallstreetoasis.com: # No access for table sorting paths or any paths that have parameters.
wallstreetoasis.com: # No access for quicktabs in the URL
wallstreetoasis.com: # Disallow URLs with destination parameter
wallstreetoasis.com: # Added by Khalid, 2012-02-27 per Patrick's request
wallstreetoasis.com: # Added by Khalid, 2012-04-02 per Joao's request
wallstreetoasis.com: # Added by Vitor, 2012-05-02 per Joao's request
wallstreetoasis.com: # Added by Vitor, 2012-05-02 per Khalid's request
wallstreetoasis.com: # Added by Vitor, 2012-05-16
wallstreetoasis.com: # Added by Vitor, 2012-05-30
wallstreetoasis.com: # Added by Vitor, 2012-09-04 per Joao's request
wallstreetoasis.com: # Added by Joao, 2013-02-21
wallstreetoasis.com: # Added by Joao, 2013-03-12
wallstreetoasis.com: #Disallow: /company/*/review
wallstreetoasis.com: #Disallow: /?q=company/*/review
wallstreetoasis.com: # comments by jgsantos on 2016-02-11
wallstreetoasis.com: #Disallow: /company/*/interview
wallstreetoasis.com: #Disallow: /?q=company/*/interview
wallstreetoasis.com: #Disallow: /company/*/compensation
wallstreetoasis.com: #Disallow: /?q=company/*/compensation
wallstreetoasis.com: #Allows added by jgsantos 2017-10-19
wallstreetoasis.com: #Allow: /company/*/interview/compensation
wallstreetoasis.com: #Allow: /?q=company/*/interview/compensation
wallstreetoasis.com: #Allow: /company/*/interview/review
wallstreetoasis.com: #Allow: /?q=company/*/interview/review
wallstreetoasis.com: # Added by Joao, 2016-09-02
wallstreetoasis.com: #Added by jgsantos, 2017-10-23
wallstreetoasis.com: # Crawl Delay To Slow Down Rogerbot
wallstreetoasis.com: # @see: https://moz.com/help/moz-procedures/crawlers/rogerbot#crawl-delay-to-slow-down-rogerbot
yamaha.com: # robots.txt
thebump.com: # Updated on 11/16/17
ilfattoquotidiano.it: # BEGIN XML-SITEMAP-PLUGIN
ilfattoquotidiano.it: # END XML-SITEMAP-PLUGIN
dekiru.net: # robots.txt for https://dekiru.net/
freepeople.com: # Sitemap indexes
cimanow.tv: # Google Image
cimanow.tv: # Google AdSense
cimanow.tv: # digg mirror
cimanow.tv: # global
bigstockphoto.com: #Domain: www.bigstockphoto.com
aileensoul.com: #Disallow: /business-profile
aileensoul.com: #Disallow: */profile
stgeorge.com.au: # /robots.txt file for http://www.stgeorge.com.au/
getadsonline.com: # Blocks robots from specific folders / directories
ntvbd.com: #
ntvbd.com: # robots.txt
ntvbd.com: #
ntvbd.com: # This file is to prevent the crawling and indexing of certain parts
ntvbd.com: # of your site by web crawlers and spiders run by sites like Yahoo!
ntvbd.com: # and Google. By telling these "robots" where not to go on your site,
ntvbd.com: # you save bandwidth and server resources.
ntvbd.com: #
ntvbd.com: # This file will be ignored unless it is at the root of your host:
ntvbd.com: # Used: http://example.com/robots.txt
ntvbd.com: # Ignored: http://example.com/site/robots.txt
ntvbd.com: #
ntvbd.com: # For more information about the robots.txt standard, see:
ntvbd.com: # http://www.robotstxt.org/robotstxt.html
ntvbd.com: # CSS, JS, Images
ntvbd.com: # Directories
ntvbd.com: # Files
ntvbd.com: # Paths (clean URLs)
ntvbd.com: # Paths (no clean URLs)
findlaw.com: # Findlaw robots.txt file
uoregon.edu: #
uoregon.edu: # robots.txt
uoregon.edu: #
uoregon.edu: # This file is to prevent the crawling and indexing of certain parts
uoregon.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
uoregon.edu: # and Google. By telling these "robots" where not to go on your site,
uoregon.edu: # you save bandwidth and server resources.
uoregon.edu: #
uoregon.edu: # This file will be ignored unless it is at the root of your host:
uoregon.edu: # Used: http://example.com/robots.txt
uoregon.edu: # Ignored: http://example.com/site/robots.txt
uoregon.edu: #
uoregon.edu: # For more information about the robots.txt standard, see:
uoregon.edu: # http://www.robotstxt.org/robotstxt.html
uoregon.edu: # CSS, JS, Images
uoregon.edu: # Directories
uoregon.edu: # Files
uoregon.edu: # Paths (clean URLs)
uoregon.edu: # Paths (no clean URLs)
gutenberg.org: # User-agent: Baiduspider
gutenberg.org: # Disallow: /
gutenberg.org: # User-agent: Yandex
gutenberg.org: # Disallow: /
google.com.cu: # AdsBot
google.com.cu: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
miansai.com: # we use Shopify as our ecommerce platform
miansai.com: # Google adsbot ignores robots.txt unless specifically named!
zoopla.co.uk: # ___ ___ ___ ___ ___
zoopla.co.uk: # / /\ / /\ / /\ / /\ / /\
zoopla.co.uk: # / /::| / /::\ / /::\ / /::\ / /::\
zoopla.co.uk: # / /:/:| / /:/\:\ / /:/\:\ / /:/\:\ ___ ___ / /:/\:\
zoopla.co.uk: # / /:/|:|__ / /:/ \:\ / /:/ \:\ / /:/~/:/ /__/\ / /\ / /:/~/::\
zoopla.co.uk: #/__/:/ |:| /\ /__/:/ \__\:\ /__/:/ \__\:\ /__/:/ /:/ \ \:\ / /:/ /__/:/ /:/\:\
zoopla.co.uk: #\__\/ |:|/:/ \ \:\ / /:/ \ \:\ / /:/ \ \:\/:/ \ \:\ /:/ \ \:\/:/__\/
zoopla.co.uk: # | |:/:/ \ \:\ /:/ \ \:\ /:/ \ \::/ \ \:\/:/ \ \::/
zoopla.co.uk: # | |::/ \ \:\/:/ \ \:\/:/ \ \:\ \ \::/ \ \:\
zoopla.co.uk: # | |:/ \ \::/ \ \::/ \ \:\ \__\/ \ \:\
zoopla.co.uk: # |__|/ \__\/ \__\/ \__\/ \__\/
zoopla.co.uk: # Disallow: /property/location/edit/*
zoopla.co.uk: # Disallow: /property/edit/
zoopla.co.uk: # Baidu restricted to for sale and new homes
zoopla.co.uk: # Slurp (Still slurping why?)
zoopla.co.uk: # Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)
zoopla.co.uk: # blocked as they are making incorrect requests
zoopla.co.uk: # Let Google Ads crawl everything
fastspring.com: # robotstxt.org/
ebc.com.br: #
ebc.com.br: # robots.txt
ebc.com.br: #
ebc.com.br: # This file is to prevent the crawling and indexing of certain parts
ebc.com.br: # of your site by web crawlers and spiders run by sites like Yahoo!
ebc.com.br: # and Google. By telling these "robots" where not to go on your site,
ebc.com.br: # you save bandwidth and server resources.
ebc.com.br: #
ebc.com.br: # This file will be ignored unless it is at the root of your host:
ebc.com.br: # Used: http://example.com/robots.txt
ebc.com.br: # Ignored: http://example.com/site/robots.txt
ebc.com.br: #
ebc.com.br: # For more information about the robots.txt standard, see:
ebc.com.br: # http://www.robotstxt.org/robotstxt.html
ebc.com.br: # CSS, JS, Images
ebc.com.br: # Directories
ebc.com.br: # Files
ebc.com.br: # Paths (clean URLs)
ebc.com.br: # Paths (no clean URLs)
trysnow.com: # we use Shopify as our ecommerce platform
trysnow.com: # Google adsbot ignores robots.txt unless specifically named!
mavenlink.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
mavenlink.com: #
mavenlink.com: # To ban all spiders from the entire site uncomment the next two lines:
mavenlink.com: # User-Agent: *
mavenlink.com: # Disallow: /
elte.hu: # www.robotstxt.org/
elte.hu: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
wordtracker.com: # Robots file
verizonmedia.com: # http://www.robotstxt.org/robotstxt.html
verizonmedia.com: # TODO: disallow till we go to production
aphrodite1994.com: # Blocking CMS and other Directories
aphrodite1994.com: # Paths (clean URLs)
aphrodite1994.com: #Stop crawling user account and checkout pages by search engine robot
aphrodite1994.com: #Blocking native catalog and search pages:
aphrodite1994.com: # Files
aphrodite1994.com: # Do not index pages that are sorted or filtered.
aphrodite1994.com: # Do not index session ID
aphrodite1994.com: #Disallow: /*?
aphrodite1994.com: # CVS, SVN directory and dump files
aphrodite1994.com: #Webmasters block pages with filters..
aphrodite1994.com: #Host: aphrodite1994.com
aphrodite1994.com: #Sitemap: https://www.aphrodite1994.com/sitemaps/sitemap.xml
mercadolibre.com.ec: #siteId: MEC
mercadolibre.com.ec: #country: ecuador
mercadolibre.com.ec: ##Block - Referidos
mercadolibre.com.ec: ##Block - siteinfo urls
mercadolibre.com.ec: ##Block - Cart
mercadolibre.com.ec: ##Block Checkout
mercadolibre.com.ec: ##Block - User Logged
mercadolibre.com.ec: #Shipping selector
mercadolibre.com.ec: ##Block - last search
mercadolibre.com.ec: ## Block - Profile - By Id
mercadolibre.com.ec: ## Block - Profile - By Id and role (old version)
mercadolibre.com.ec: ## Block - Profile - Leg. Req.
mercadolibre.com.ec: ##Block - noindex
mercadolibre.com.ec: # Mercado-Puntos
mercadolibre.com.ec: # Viejo mundo
mercadolibre.com.ec: ##Block recommendations listing
91wii.com: #
91wii.com: # robots.txt for Discuz! X3
91wii.com: #
cmd5.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
cmd5.com: #content{margin:0 0 0 2%;position:relative;}
pptok.com: #
pptok.com: # robots.txt for EmpireCMS
pptok.com: #
splice.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
splice.com: #
splice.com: # To ban all spiders from the entire site uncomment the next two lines:
splice.com: # User-agent: *
splice.com: # Disallow: /
yummly.com: #
yummly.com: # Misbehaving bot
yummly.com: #
longdo.com: #
longdo.com: # robots.txt
longdo.com: #
longdo.com: # This file is to prevent the crawling and indexing of certain parts
longdo.com: # of your site by web crawlers and spiders run by sites like Yahoo!
longdo.com: # and Google. By telling these "robots" where not to go on your site,
longdo.com: # you save bandwidth and server resources.
longdo.com: #
longdo.com: # This file will be ignored unless it is at the root of your host:
longdo.com: # Used: http://example.com/robots.txt
longdo.com: # Ignored: http://example.com/site/robots.txt
longdo.com: #
longdo.com: # For more information about the robots.txt standard, see:
longdo.com: # http://www.robotstxt.org/robotstxt.html
longdo.com: #
longdo.com: # For syntax checking, see:
longdo.com: # http://www.frobee.com/robots-txt-check
longdo.com: # Directories
longdo.com: # Files
longdo.com: # Paths (clean URLs)
longdo.com: # Paths (no clean URLs)
neighborwebsj.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/
bark.com: #
bark.com: # Bark is built by a small team based in London.
bark.com: # We're always looking for clever people.
bark.com: # If you'd like to help us out email team@bark.com
bark.com: # WOOF!
bark.com: # ."";._ _.---._ _.-"".
bark.com: # /_.'_ '-' /`-` \_ \
bark.com: # .' / `\ \ /` \ '.
bark.com: # .' / ; _ _ '-; \ ;'.
bark.com: # _.' ; /\ / \ \ \ ; '._;._
bark.com: # .-'.--. | / | \0|0/ \ | '-.
bark.com: # / /` \ | / .' \ | .---. \
bark.com: # | | | / /--' .-"""-. \ \/ \ |
bark.com: # \ \ / / / ( , , ) /\ \ | /
bark.com: # \ '----' .' | '-(_)-' | | '. / /
bark.com: # `'----'` | '. | `'----'`
bark.com: # \ `/
bark.com: # '. , .'
bark.com: # `-.____.' '.____.-'
bark.com: # \ /
bark.com: # '-'
bark.com: #
espn.com.mx: # robots.txt for deportes
bradsdeals.com: # SEO/SEM Competitor Tool Bot Block
bradsdeals.com: # Bots that obey Robots.txt block
bradsdeals.com: # Original robots disallows
bradsdeals.com: # Special Merchant Requests
bradsdeals.com: #Sitemap
technipages.com: # Google Image
technipages.com: # Google AdSense
technipages.com: # global
ideal.es: ## Sitemaps ##
ideal.es: ## User Agents ##
ideal.es: #redi14 #
ideal.es: #mob #
ideal.es: #temp #
zbporn.com: # Block AhrefsBot
reference.com: ## Reference robots.txt
madarsho.com: #
madarsho.com: # robots.txt
madarsho.com: #
madarsho.com: # This file is to prevent the crawling and indexing of certain parts
madarsho.com: # of your site by web crawlers and spiders run by sites like Yahoo!
madarsho.com: # and Google. By telling these "robots" where not to go on your site,
madarsho.com: # you save bandwidth and server resources.
madarsho.com: #
madarsho.com: # This file will be ignored unless it is at the root of your host:
madarsho.com: # Used: http://example.com/robots.txt
madarsho.com: # Ignored: http://example.com/site/robots.txt
madarsho.com: #
madarsho.com: # For more information about the robots.txt standard, see:
madarsho.com: # http://www.robotstxt.org/robotstxt.html
madarsho.com: # CSS, JS, Images
madarsho.com: # Directories
madarsho.com: # Files
madarsho.com: # Paths (clean URLs)
madarsho.com: # Paths (no clean URLs)
thetoc.gr: #Disallow: /Api/*
thetoc.gr: #Disallow: /api/*
thetoc.gr: #Disallow: /Search*
thetoc.gr: #Disallow: /search*
flvs.net: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
flvs.net: #content{margin:0 0 0 2%;position:relative;}
caracoltv.com: # Bloqueo de URL
caracoltv.com: # Agentes nocivos conocidos
hover.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
hover.com: #
hover.com: # To ban all spiders from the entire site uncomment the next two lines:
hover.com: # User-Agent: *
hover.com: # Disallow: /
hover.com: # Block search engines from the order thankyou pages
hover.com: # Block search engines from the welcome landers
ciudad.com.ar: #
ciudad.com.ar: # robots.txt
ciudad.com.ar: #
ciudad.com.ar: # This file is to prevent the crawling and indexing of certain parts
ciudad.com.ar: # of your site by web crawlers and spiders run by sites like Yahoo!
ciudad.com.ar: # and Google. By telling these "robots" where not to go on your site,
ciudad.com.ar: # you save bandwidth and server resources.
ciudad.com.ar: #
ciudad.com.ar: # This file will be ignored unless it is at the root of your host:
ciudad.com.ar: # Used: http://example.com/robots.txt
ciudad.com.ar: # Ignored: http://example.com/site/robots.txt
ciudad.com.ar: #
ciudad.com.ar: # For more information about the robots.txt standard, see:
ciudad.com.ar: # http://www.robotstxt.org/robotstxt.html
ciudad.com.ar: # CSS, JS, Images
ciudad.com.ar: # Directories
ciudad.com.ar: # Files
ciudad.com.ar: # Paths (clean URLs)
ciudad.com.ar: # Paths (no clean URLs)
prlog.org: # Please keep 10 seconds between requests
mercurynews.com: # Sitemap archive
0818tuan.com: #
0818tuan.com: # robots.txt for EmpireCMS
0818tuan.com: #
mclabels.com: # we use Shopify as our ecommerce platform
mclabels.com: # Google adsbot ignores robots.txt unless specifically named!
coinpot.co: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
coinpot.co: #content{margin:0 0 0 2%;position:relative;}
xatakandroid.com: # Crawlers that are kind enough to obey, but which we'd rather not have
xatakandroid.com: # unless they're feeding search engines.
xatakandroid.com: # Some bots are known to be trouble, particularly those designed to copy
xatakandroid.com: # entire sites. Please obey robots.txt.
xatakandroid.com: # Sorry, wget in its recursive mode is a frequent problem.
xatakandroid.com: # Please read the man page and use it properly; there is a
xatakandroid.com: # --wait option you can use to set the delay between hits,
xatakandroid.com: # for instance.
xatakandroid.com: #
xatakandroid.com: #
xatakandroid.com: # The 'grub' distributed client has been *very* poorly behaved.
xatakandroid.com: #
xatakandroid.com: #
xatakandroid.com: # Doesn't follow robots.txt anyway, but...
xatakandroid.com: #
xatakandroid.com: #
xatakandroid.com: # Hits many times per second, not acceptable
xatakandroid.com: # http://www.nameprotect.com/botinfo.html
xatakandroid.com: # A capture bot, downloads gazillions of pages with no public benefit
xatakandroid.com: # http://www.webreaper.net/
itsmyurls.com: # Rule 1
itsmyurls.com: # Rule 2
mercatoday.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/
ktmdainik.com: #Google Search Engine Robot
bullmarketbrokers.com: # Sitemap
public.gr: # allow crawlers plus delaying each successive spider request
public.gr: #Crawl-delay: 10
public.gr: #Disallow: /assets/
public.gr: # Disallow all ../?parentCategoryID=cat...
public.gr: #Disallow: *parentCategoryId*
public.gr: # Sitemap files
public.gr: #Specific URLs
public.gr: # Blocking bad link checker robots
gifi.fr: # https://www.robotstxt.org/robotstxt.html
instyle.com: # Sitemaps
instyle.com: # CMS FE
instyle.com: #OCMS
instyle.com: #content
instyle.com: # CMS FE
instyle.com: #OCMS
instyle.com: #content
hardees.qa: # Search Pages
hardees.qa: # Cart Pages
hardees.qa: # User Pages
hardees.qa: # Other Pages
hardees.qa: # Misc Pages
jisho.org: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
jisho.org: #
jisho.org: # To ban all spiders from the entire site uncomment the next two lines:
jisho.org: #User-Agent: *
jisho.org: #Disallow: /
synonymo.fr: #content {
phonearena.com: # www.robotstxt.org/
phonearena.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
phonearena.com: #
buzzorange.com: # www.robotstxt.org/
buzzorange.com: # Allow crawling of all content
buzzorange.com: # Directories
buzzorange.com: # Files
moovitapp.com: # Block crawling of the web trip planner
youtubekids.com: # robots.txt file for YouTube Kids
cineca.it: #
cineca.it: # robots.txt
cineca.it: #
cineca.it: # This file is to prevent the crawling and indexing of certain parts
cineca.it: # of your site by web crawlers and spiders run by sites like Yahoo!
cineca.it: # and Google. By telling these "robots" where not to go on your site,
cineca.it: # you save bandwidth and server resources.
cineca.it: #
cineca.it: # This file will be ignored unless it is at the root of your host:
cineca.it: # Used: http://example.com/robots.txt
cineca.it: # Ignored: http://example.com/site/robots.txt
cineca.it: #
cineca.it: # For more information about the robots.txt standard, see:
cineca.it: # http://www.robotstxt.org/robotstxt.html
cineca.it: # CSS, JS, Images
cineca.it: # Directories
cineca.it: # Files
cineca.it: # Paths (clean URLs)
cineca.it: # Paths (no clean URLs)
qiita.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
ubagroup.com: # robots.txt for https://www.ubagroup.com/
webcasts.com: #content:before{content:"768";position:absolute;overflow:hidden;opacity:0;visibility:hidden;}@media (max-width:768px){.single.ast-separate-container .ast-author-meta{padding:1.5em 2.14em;}.single .ast-author-meta .post-author-avatar{margin-bottom:1em;}.ast-separate-container .ast-grid-2 .ast-article-post,.ast-separate-container .ast-grid-3 .ast-article-post,.ast-separate-container .ast-grid-4 .ast-article-post{width:100%;}.blog-layout-1 .post-content,.blog-layout-1 .ast-blog-featured-section{float:none;}.ast-separate-container .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .square .posted-on{margin-top:0;}.ast-separate-container .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on{margin-top:1em;}.ast-separate-container .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-content .ast-blog-featured-section:first-child .post-thumb-img-content{margin-top:-1.5em;}.ast-separate-container .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content{margin-left:-2.14em;margin-right:-2.14em;}.ast-separate-container .ast-article-single.remove-featured-img-padding .single-layout-1 .entry-header .post-thumb-img-content:first-child{margin-top:-1.5em;}.ast-separate-container .ast-article-single.remove-featured-img-padding .single-layout-1 .post-thumb-img-content{margin-left:-2.14em;margin-right:-2.14em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on{margin-left:-1.5em;margin-right:-1.5em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on{margin-left:-0.5em;margin-right:-0.5em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .square .posted-on,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .square .posted-on,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .square .posted-on{margin-top:0;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on{margin-top:1em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-content .ast-blog-featured-section:first-child .post-thumb-img-content,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-content .ast-blog-featured-section:first-child .post-thumb-img-content,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-content .ast-blog-featured-section:first-child .post-thumb-img-content{margin-top:-1.5em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content{margin-left:-1.5em;margin-right:-1.5em;}.blog-layout-2{display:flex;flex-direction:column-reverse;}.ast-separate-container .blog-layout-3,.ast-separate-container .blog-layout-1{display:block;}.ast-plain-container .ast-grid-2 .ast-article-post,.ast-plain-container .ast-grid-3 .ast-article-post,.ast-plain-container .ast-grid-4 .ast-article-post,.ast-page-builder-template .ast-grid-2 .ast-article-post,.ast-page-builder-template .ast-grid-3 .ast-article-post,.ast-page-builder-template .ast-grid-4 .ast-article-post{width:100%;}}@media (max-width:768px){.ast-separate-container .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on{margin-top:0;margin-left:-2.14em;}.ast-separate-container .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on{margin-top:0;margin-left:-1.14em;}}@media (min-width:769px){.single .ast-author-meta .ast-author-details{display:flex;}.ast-separate-container.ast-blog-grid-2 .ast-archive-description,.ast-separate-container.ast-blog-grid-3 .ast-archive-description,.ast-separate-container.ast-blog-grid-4 .ast-archive-description{margin-bottom:1.33333em;}.blog-layout-2.ast-no-thumb .post-content,.blog-layout-3.ast-no-thumb .post-content{width:calc(100% - 5.714285714em);}.blog-layout-2.ast-no-thumb.ast-no-date-box .post-content,.blog-layout-3.ast-no-thumb.ast-no-date-box .post-content{width:100%;}.ast-separate-container .ast-grid-2 .ast-article-post.ast-separate-posts,.ast-separate-container .ast-grid-3 .ast-article-post.ast-separate-posts,.ast-separate-container .ast-grid-4 .ast-article-post.ast-separate-posts{border-bottom:0;}.ast-separate-container .ast-grid-2 > .site-main > .ast-row,.ast-separate-container .ast-grid-3 > .site-main > .ast-row,.ast-separate-container .ast-grid-4 > .site-main > .ast-row{margin-left:-1em;margin-right:-1em;display:flex;flex-flow:row wrap;align-items:stretch;}.ast-separate-container .ast-grid-2 > .site-main > .ast-row:before,.ast-separate-container .ast-grid-2 > .site-main > .ast-row:after,.ast-separate-container .ast-grid-3 > .site-main > .ast-row:before,.ast-separate-container .ast-grid-3 > .site-main > .ast-row:after,.ast-separate-container .ast-grid-4 > .site-main > .ast-row:before,.ast-separate-container .ast-grid-4 > .site-main > .ast-row:after{flex-basis:0;width:0;}.ast-separate-container .ast-grid-2 .ast-article-post,.ast-separate-container .ast-grid-3 .ast-article-post,.ast-separate-container .ast-grid-4 .ast-article-post{display:flex;padding:0;}.ast-plain-container .ast-grid-2 > .site-main > .ast-row,.ast-plain-container .ast-grid-3 > .site-main > .ast-row,.ast-plain-container .ast-grid-4 > .site-main > .ast-row,.ast-page-builder-template .ast-grid-2 > .site-main > .ast-row,.ast-page-builder-template .ast-grid-3 > .site-main > .ast-row,.ast-page-builder-template .ast-grid-4 > .site-main > .ast-row{margin-left:-1em;margin-right:-1em;display:flex;flex-flow:row wrap;align-items:stretch;}.ast-plain-container .ast-grid-2 > .site-main > .ast-row:before,.ast-plain-container .ast-grid-2 > .site-main > .ast-row:after,.ast-plain-container .ast-grid-3 > .site-main > .ast-row:before,.ast-plain-container .ast-grid-3 > .site-main > .ast-row:after,.ast-plain-container .ast-grid-4 > .site-main > .ast-row:before,.ast-plain-container .ast-grid-4 > .site-main > .ast-row:after,.ast-page-builder-template .ast-grid-2 > .site-main > .ast-row:before,.ast-page-builder-template .ast-grid-2 > .site-main > .ast-row:after,.ast-page-builder-template .ast-grid-3 > .site-main > .ast-row:before,.ast-page-builder-template .ast-grid-3 > .site-main > .ast-row:after,.ast-page-builder-template .ast-grid-4 > .site-main > .ast-row:before,.ast-page-builder-template .ast-grid-4 > .site-main > .ast-row:after{flex-basis:0;width:0;}.ast-plain-container .ast-grid-2 .ast-article-post,.ast-plain-container .ast-grid-3 .ast-article-post,.ast-plain-container .ast-grid-4 .ast-article-post,.ast-page-builder-template .ast-grid-2 .ast-article-post,.ast-page-builder-template .ast-grid-3 .ast-article-post,.ast-page-builder-template .ast-grid-4 .ast-article-post{display:flex;}.ast-plain-container .ast-grid-2 .ast-article-post:last-child,.ast-plain-container .ast-grid-3 .ast-article-post:last-child,.ast-plain-container .ast-grid-4 .ast-article-post:last-child,.ast-page-builder-template .ast-grid-2 .ast-article-post:last-child,.ast-page-builder-template .ast-grid-3 .ast-article-post:last-child,.ast-page-builder-template .ast-grid-4 .ast-article-post:last-child{margin-bottom:2.5em;}}@media (min-width:769px){.single .post-author-avatar,.single .post-author-bio{float:left;clear:right;}.single .ast-author-meta .post-author-avatar{margin-right:1.33333em;}.single .ast-author-meta .about-author-title-wrapper,.single .ast-author-meta .post-author-bio{text-align:left;}.blog-layout-2 .post-content{padding-right:2em;}.blog-layout-2.ast-no-date-box.ast-no-thumb .post-content{padding-right:0;}.blog-layout-3 .post-content{padding-left:2em;}.blog-layout-3.ast-no-date-box.ast-no-thumb .post-content{padding-left:0;}.ast-separate-container .ast-grid-2 .ast-article-post.ast-separate-posts:nth-child(2n+0),.ast-separate-container .ast-grid-2 .ast-article-post.ast-separate-posts:nth-child(2n+1),.ast-separate-container .ast-grid-3 .ast-article-post.ast-separate-posts:nth-child(2n+0),.ast-separate-container .ast-grid-3 .ast-article-post.ast-separate-posts:nth-child(2n+1),.ast-separate-container .ast-grid-4 .ast-article-post.ast-separate-posts:nth-child(2n+0),.ast-separate-container .ast-grid-4 .ast-article-post.ast-separate-posts:nth-child(2n+1){padding:0 1em 0;}}@media (max-width:544px){.ast-separate-container .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on{margin-top:0.5em;}.ast-separate-container .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content,.ast-separate-container .ast-article-single.remove-featured-img-padding .single-layout-1 .post-thumb-img-content,.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on{margin-left:-1em;margin-right:-1em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on{margin-left:-0.5em;margin-right:-0.5em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section:first-child .circle .posted-on{margin-top:0.5em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-content .ast-blog-featured-section:first-child .post-thumb-img-content,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-content .ast-blog-featured-section:first-child .post-thumb-img-content,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-content .ast-blog-featured-section:first-child .post-thumb-img-content{margin-top:-1.33333em;}.ast-separate-container.ast-blog-grid-2 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content,.ast-separate-container.ast-blog-grid-3 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content,.ast-separate-container.ast-blog-grid-4 .ast-article-post.remove-featured-img-padding .blog-layout-1 .post-thumb-img-content{margin-left:-1em;margin-right:-1em;}.ast-separate-container .ast-grid-2 .ast-article-post .blog-layout-1,.ast-separate-container .ast-grid-2 .ast-article-post .blog-layout-2,.ast-separate-container .ast-grid-2 .ast-article-post .blog-layout-3{padding:1.33333em 1em;}.ast-separate-container .ast-grid-3 .ast-article-post .blog-layout-1,.ast-separate-container .ast-grid-4 .ast-article-post .blog-layout-1{padding:1.33333em 1em;}.single.ast-separate-container .ast-author-meta{padding:1.5em 1em;}}@media (max-width:544px){.ast-separate-container .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .square .posted-on{margin-left:-1em;}.ast-separate-container .ast-article-post.remove-featured-img-padding.has-post-thumbnail .blog-layout-1 .post-content .ast-blog-featured-section .circle .posted-on{margin-left:-0.5em;}}.ast-article-post .ast-date-meta .posted-on,.ast-article-post .ast-date-meta .posted-on *{background:#00609c;color:#ffffff;}.ast-article-post .ast-date-meta .posted-on .date-month,.ast-article-post .ast-date-meta .posted-on .date-year{color:#ffffff;}.ast-load-more:hover{color:#ffffff;border-color:#00609c;background-color:#00609c;}.ast-loader > div{background-color:#00609c;}.ast-small-footer{color:#bbbbbd;}.ast-small-footer a{color:#77777b;}.ast-small-footer a:hover{color:#00609c;}.ast-separate-container .blog-layout-1,.ast-separate-container .blog-layout-2,.ast-separate-container .blog-layout-3{background-color:transparent;background-image:none;}.ast-separate-container .ast-article-post{background-color:#ffffff;;}.ast-separate-container .ast-article-single,.ast-separate-container .comment-respond,.ast-separate-container .ast-comment-list li,.ast-separate-container .ast-woocommerce-container,.ast-separate-container .error-404,.ast-separate-container .no-results,.single.ast-separate-container .ast-author-meta,.ast-separate-container .related-posts-title-wrapper,.ast-separate-container.ast-two-container #secondary .widget,.ast-separate-container .comments-count-wrapper,.ast-box-layout.ast-plain-container .site-content,.ast-padded-layout.ast-plain-container .site-content{background-color:#ffffff;;}.footer-adv .widget-title,.footer-adv .widget-title a{color:#434758;}.footer-adv{color:#434758;}.footer-adv a{color:#434758;}.footer-adv .tagcloud a:hover,.footer-adv .tagcloud a.current-item{border-color:#434758;background-color:#434758;}.footer-adv a:hover,.footer-adv .no-widget-text a:hover,.footer-adv a:focus,.footer-adv .no-widget-text a:focus{color:#00609c;}.footer-adv .calendar_wrap #today,.footer-adv a:hover + .post-count{background-color:#434758;}.footer-adv .widget-title,.footer-adv .widget-title a.rsswidget,.ast-no-widget-row .widget-title{font-family:'Open Sans',sans-serif;text-transform:inherit;}.footer-adv .widget > *:not(.widget-title){font-family:'Open Sans',sans-serif;}.footer-adv .tagcloud a:hover,.footer-adv .tagcloud a.current-item{color:#ffffff;}.footer-adv .calendar_wrap #today{color:#ffffff;}@media (min-width:769px){.ast-container{max-width:1240px;}}@media (min-width:993px){.ast-container{max-width:1240px;}}@media (min-width:1201px){.ast-container{max-width:1240px;}}.ast-separate-container .ast-article-post,.ast-separate-container .ast-article-single,.ast-separate-container .ast-comment-list li.depth-1,.ast-separate-container .comment-respond,.single.ast-separate-container .ast-author-details,.ast-separate-container .ast-related-posts-wrap,.ast-separate-container .ast-woocommerce-container{padding-top:0px;padding-bottom:0px;}.ast-separate-container .ast-article-post,.ast-separate-container .ast-article-single,.ast-separate-container .comments-count-wrapper,.ast-separate-container .ast-comment-list li.depth-1,.ast-separate-container .comment-respond,.ast-separate-container .related-posts-title-wrapper,.ast-separate-container .related-posts-title-wrapper,.single.ast-separate-container .ast-author-details,.single.ast-separate-container .about-author-title-wrapper,.ast-separate-container .ast-related-posts-wrap,.ast-separate-container .ast-woocommerce-container{padding-right:0px;padding-left:0px;}.ast-separate-container.ast-right-sidebar #primary,.ast-separate-container.ast-left-sidebar #primary,.ast-separate-container #primary,.ast-plain-container #primary{margin-top:0px;margin-bottom:0px;}.ast-left-sidebar #primary,.ast-right-sidebar #primary,.ast-separate-container.ast-right-sidebar #primary,.ast-separate-container.ast-left-sidebar #primary,.ast-separate-container #primary{padding-left:0px;padding-right:0px;}.ast-no-sidebar.ast-separate-container .entry-content .alignfull{margin-right:-0px;margin-left:-0px;}@media (max-width:768px){.ast-separate-container .ast-article-post,.ast-separate-container .ast-article-single,.ast-separate-container .ast-comment-list li.depth-1,.ast-separate-container .comment-respond,.single.ast-separate-container .ast-author-details,.ast-separate-container .ast-related-posts-wrap,.ast-separate-container .ast-woocommerce-container{padding-top:1.5em;padding-bottom:1.5em;}.ast-separate-container .ast-article-post,.ast-separate-container .ast-article-single,.ast-separate-container .comments-count-wrapper,.ast-separate-container .ast-comment-list li.depth-1,.ast-separate-container .comment-respond,.ast-separate-container .related-posts-title-wrapper,.ast-separate-container .related-posts-title-wrapper,.single.ast-separate-container .ast-author-details,.single.ast-separate-container .about-author-title-wrapper,.ast-separate-container .ast-related-posts-wrap,.ast-separate-container .ast-woocommerce-container{padding-right:2.14em;padding-left:2.14em;}.ast-separate-container.ast-right-sidebar #primary,.ast-separate-container.ast-left-sidebar #primary,.ast-separate-container #primary,.ast-plain-container #primary{margin-top:1.5em;margin-bottom:1.5em;}.ast-left-sidebar #primary,.ast-right-sidebar #primary,.ast-separate-container.ast-right-sidebar #primary,.ast-separate-container.ast-left-sidebar #primary,.ast-separate-container #primary{padding-left:0em;padding-right:0em;}.ast-no-sidebar.ast-separate-container .entry-content .alignfull{margin-right:-2.14em;margin-left:-2.14em;}}@media (max-width:544px){.ast-separate-container .ast-article-post,.ast-separate-container .ast-article-single,.ast-separate-container .ast-comment-list li.depth-1,.ast-separate-container .comment-respond,.single.ast-separate-container .ast-author-details,.ast-separate-container .ast-related-posts-wrap,.ast-separate-container .ast-woocommerce-container{padding-top:1.5em;padding-bottom:1.5em;}.ast-separate-container .ast-article-post,.ast-separate-container .ast-article-single,.ast-separate-container .comments-count-wrapper,.ast-separate-container .ast-comment-list li.depth-1,.ast-separate-container .comment-respond,.ast-separate-container .related-posts-title-wrapper,.ast-separate-container .related-posts-title-wrapper,.single.ast-separate-container .ast-author-details,.single.ast-separate-container .about-author-title-wrapper,.ast-separate-container .ast-related-posts-wrap,.ast-separate-container .ast-woocommerce-container{padding-right:1em;padding-left:1em;}.ast-no-sidebar.ast-separate-container .entry-content .alignfull{margin-right:-1em;margin-left:-1em;}}@media (max-width:768px){.ast-header-break-point .main-header-bar .main-header-bar-navigation .menu-item-has-children > .ast-menu-toggle{top:0px;right:calc( 20px - 0.907em );}.ast-flyout-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .main-header-menu > .menu-item-has-children > .ast-menu-toggle{right:calc( 20px - 0.907em );}}@media (max-width:544px){.ast-header-break-point .header-main-layout-2 .site-branding,.ast-header-break-point .ast-mobile-header-stack .ast-mobile-menu-buttons{padding-bottom:0;}}@media (max-width:768px){.ast-separate-container.ast-two-container #secondary .widget,.ast-separate-container #secondary .widget{margin-bottom:1.5em;}}.ast-separate-container #primary{padding-top:0;}@media (max-width:768px){.ast-separate-container #primary{padding-top:0;}}.ast-separate-container #primary{padding-bottom:0;}@media (max-width:768px){.ast-separate-container #primary{padding-bottom:0;}}.ast-default-menu-enable.ast-main-header-nav-open.ast-header-break-point .main-header-bar,.ast-main-header-nav-open .main-header-bar{padding-bottom:0;}.ast-fullscreen-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .main-header-menu > .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .sub-menu .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;}.ast-fullscreen-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;}.ast-fullscreen-below-menu-enable.ast-header-break-point .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-below-menu-enable.ast-header-break-point .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-below-menu-enable.ast-header-break-point .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;}.ast-fullscreen-below-menu-enable.ast-header-break-point .ast-below-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-fullscreen-below-menu-enable.ast-header-break-point .ast-below-header-menu-items .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-below-menu-enable .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{right:0;}.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-above-menu-enable.ast-header-break-point .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-above-menu-enable.ast-header-break-point .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;}.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-menu-items .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-above-menu-enable .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{right:0;}@media (max-width:768px){.main-header-bar,.ast-header-break-point .main-header-bar,.ast-header-break-point .header-main-layout-2 .main-header-bar{padding-top:1.5em;padding-bottom:1.5em;}.ast-default-menu-enable.ast-main-header-nav-open.ast-header-break-point .main-header-bar,.ast-main-header-nav-open .main-header-bar{padding-bottom:0;}.main-navigation ul .menu-item .menu-link,.ast-header-break-point .main-navigation ul .menu-item .menu-link,.ast-header-break-point li.ast-masthead-custom-menu-items,li.ast-masthead-custom-menu-items{padding-top:0px;padding-right:20px;padding-bottom:0px;padding-left:20px;}.ast-fullscreen-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .main-header-menu > .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-flyout-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .menu-item-has-children > .ast-menu-toggle{top:0px;}.ast-desktop .main-navigation .ast-mm-template-content,.ast-desktop .main-navigation .ast-mm-custom-content,.ast-desktop .main-navigation .ast-mm-custom-text-content,.main-navigation .sub-menu .menu-item .menu-link,.ast-header-break-point .main-navigation .sub-menu .menu-item .menu-link{padding-top:0px;padding-right:0;padding-bottom:0px;padding-left:30px;}.ast-header-break-point .main-navigation .sub-menu .menu-item .menu-item .menu-link{padding-left:calc( 30px + 10px );}.ast-header-break-point .main-navigation .sub-menu .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 30px + 20px );}.ast-header-break-point .main-navigation .sub-menu .menu-item .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 30px + 30px );}.ast-header-break-point .main-navigation .sub-menu .menu-item .menu-item .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 30px + 40px );}.ast-header-break-point .main-header-bar .main-header-bar-navigation .sub-menu .menu-item-has-children > .ast-menu-toggle{top:0px;right:calc( 20px - 0.907em );}.ast-fullscreen-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .sub-menu .menu-item-has-children > .ast-menu-toggle{margin-right:20px;right:0;}.ast-flyout-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .sub-menu .menu-item-has-children > .ast-menu-toggle{right:calc( 20px - 0.907em );}.ast-flyout-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .menu-item-has-children .sub-menu .ast-menu-toggle{top:0px;}.ast-fullscreen-menu-enable.ast-header-break-point .main-navigation .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-menu-enable.ast-header-break-point .main-navigation .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-menu-enable.ast-header-break-point .main-navigation .sub-menu .menu-item.menu-item-has-children > .menu-link{padding-top:0px;padding-bottom:0px;padding-left:30px;}.ast-fullscreen-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;padding-top:0px;padding-bottom:0px;padding-left:30px;}.ast-fullscreen-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;padding-top:0px;padding-bottom:0px;padding-left:30px;}.ast-fullscreen-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-link,.ast-fullscreen-menu-enable.ast-header-break-point .ast-header-break-point .ast-below-header-actual-nav .sub-menu .menu-item .menu-link,.ast-fullscreen-menu-enable.ast-header-break-point .ast-below-header-navigation .sub-menu .menu-item .menu-link,.ast-fullscreen-menu-enable.ast-header-break-point .ast-below-header-menu-items .sub-menu .menu-item .menu-link,.ast-fullscreen-menu-enable.ast-header-break-point .main-navigation .sub-menu .menu-item .menu-link{padding-top:0px;padding-bottom:0px;padding-left:30px;}.ast-below-header,.ast-header-break-point .ast-below-header{padding-top:1em;padding-bottom:1em;}.ast-below-header-menu .menu-link,.below-header-nav-padding-support .below-header-section-1 .below-header-menu > .menu-item > .menu-link,.below-header-nav-padding-support .below-header-section-2 .below-header-menu > .menu-item > .menu-link,.ast-header-break-point .ast-below-header-actual-nav > .ast-below-header-menu > .menu-item > .menu-link{padding-top:0px;padding-right:20px;padding-bottom:0px;padding-left:20px;}.ast-desktop .ast-below-header-menu .ast-mm-template-content,.ast-desktop .ast-below-header-menu .ast-mm-custom-text-content,.ast-below-header-menu .sub-menu .menu-link,.ast-header-break-point .ast-below-header-actual-nav .sub-menu .menu-item .menu-link{padding-top:0px;padding-right:20px;padding-bottom:0px;padding-left:20px;}.ast-header-break-point .ast-below-header-actual-nav .sub-menu .menu-item .menu-item .menu-link,.ast-header-break-point .ast-below-header-menu-items .sub-menu .menu-item .menu-item .menu-link{padding-left:calc( 20px + 10px );}.ast-header-break-point .ast-below-header-actual-nav .sub-menu .menu-item .menu-item .menu-item .menu-link,.ast-header-break-point .ast-below-header-menu-items .sub-menu .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 20px + 20px );}.ast-header-break-point .ast-below-header-actual-nav .sub-menu .menu-item .menu-item .menu-item .menu-item .menu-link,.ast-header-break-point .ast-below-header-menu-items .sub-menu .menu-item .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 20px + 30px );}.ast-header-break-point .ast-below-header-actual-nav .sub-menu .menu-item .menu-item .menu-item .menu-item .menu-item .menu-link,.ast-header-break-point .ast-below-header-menu-items .sub-menu .menu-item .menu-item .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 20px + 40px );}.ast-default-below-menu-enable.ast-header-break-point .ast-below-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-default-below-menu-enable.ast-header-break-point .ast-below-header-menu-items .menu-item-has-children > .ast-menu-toggle,.ast-flyout-below-menu-enable.ast-header-break-point .ast-below-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-flyout-below-menu-enable.ast-header-break-point .ast-below-header-menu-items .menu-item-has-children > .ast-menu-toggle{top:0px;right:calc( 20px - 0.907em );}.ast-default-below-menu-enable .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle,.ast-flyout-below-menu-enable .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{top:0px;right:calc( 20px - 0.907em );}.ast-fullscreen-below-menu-enable.ast-header-break-point .ast-below-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-fullscreen-below-menu-enable.ast-header-break-point .ast-below-header-menu-items .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-below-menu-enable .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{right:0;}.ast-above-header{padding-top:0px;padding-bottom:0px;}.ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu > .menu-item > .menu-link,.ast-header-break-point .ast-above-header-enabled .ast-above-header-menu > .menu-item:first-child > .menu-link,.ast-header-break-point .ast-above-header-enabled .ast-above-header-menu > .menu-item:last-child > .menu-link{padding-top:0px;padding-right:20px;padding-bottom:0px;padding-left:20px;}.ast-header-break-point .ast-above-header-navigation > ul > .menu-item-has-children > .ast-menu-toggle{top:0px;}.ast-desktop .ast-above-header-navigation .ast-mm-custom-text-content,.ast-desktop .ast-above-header-navigation .ast-mm-template-content,.ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item .sub-menu .menu-link,.ast-header-break-point .ast-above-header-enabled .ast-above-header-menu .menu-item .sub-menu .menu-link,.ast-above-header-enabled .ast-above-header-menu > .menu-item:first-child .sub-menu .menu-item .menu-link{padding-top:0px;padding-right:20px;padding-bottom:0px;padding-left:20px;}.ast-header-break-point .ast-above-header-enabled .ast-above-header-menu .menu-item .sub-menu .menu-item .menu-link{padding-left:calc( 20px + 10px );}.ast-header-break-point .ast-above-header-enabled .ast-above-header-menu .menu-item .sub-menu .menu-item .menu-item .menu-link{padding-left:calc( 20px + 20px );}.ast-header-break-point .ast-above-header-enabled .ast-above-header-menu .menu-item .sub-menu .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 20px + 30px );}.ast-header-break-point .ast-above-header-enabled .ast-above-header-menu .menu-item .sub-menu .menu-item .menu-item .menu-item .menu-item .menu-link{padding-left:calc( 20px + 40px );}.ast-default-above-menu-enable.ast-header-break-point .ast-above-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-default-above-menu-enable.ast-header-break-point .ast-above-header-menu-items .menu-item-has-children > .ast-menu-toggle,.ast-flyout-above-menu-enable.ast-header-break-point .ast-above-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-flyout-above-menu-enable.ast-header-break-point .ast-above-header-menu-items .menu-item-has-children > .ast-menu-toggle{top:0px;right:calc( 20px - 0.907em );}.ast-default-above-menu-enable .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle,.ast-flyout-above-menu-enable .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{top:0px;right:calc( 20px - 0.907em );}.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-menu-items .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-above-menu-enable .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{margin-right:20px;right:0;}.ast-footer-overlay{padding-top:2em;padding-bottom:2em;}.ast-small-footer .nav-menu a,.footer-sml-layout-2 .ast-small-footer-section-1 .menu-item a,.footer-sml-layout-2 .ast-small-footer-section-2 .menu-item a{padding-top:0em;padding-right:.5em;padding-bottom:0em;padding-left:.5em;}}@media (max-width:544px){.main-header-bar,.ast-header-break-point .main-header-bar,.ast-header-break-point .header-main-layout-2 .main-header-bar,.ast-header-break-point .ast-mobile-header-stack .main-header-bar{padding-top:1em;padding-bottom:1em;}.ast-default-menu-enable.ast-main-header-nav-open.ast-header-break-point .main-header-bar,.ast-main-header-nav-open .main-header-bar{padding-bottom:0;}.ast-fullscreen-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .main-header-menu > .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-desktop .main-navigation .ast-mm-template-content,.ast-desktop .main-navigation .ast-mm-custom-content,.ast-desktop .main-navigation .ast-mm-custom-text-content,.main-navigation .sub-menu .menu-item .menu-link,.ast-header-break-point .main-navigation .sub-menu .menu-item .menu-link{padding-right:0;}.ast-fullscreen-menu-enable.ast-header-break-point .main-header-bar .main-header-bar-navigation .sub-menu .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-menu-enable.ast-header-break-point .ast-above-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;}.ast-fullscreen-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-menu-enable.ast-header-break-point .ast-below-header-menu .sub-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;}.ast-fullscreen-below-menu-enable.ast-header-break-point .ast-below-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-fullscreen-below-menu-enable.ast-header-break-point .ast-below-header-menu-items .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-below-menu-enable .ast-below-header-enabled .ast-below-header-navigation .ast-below-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{right:0;}.ast-above-header{padding-top:0.5em;}.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children > .menu-link,.ast-default-above-menu-enable.ast-header-break-point .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children > .menu-link,.ast-flyout-above-menu-enable.ast-header-break-point .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children > .menu-link{padding-right:0;}.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-navigation .menu-item-has-children > .ast-menu-toggle,.ast-fullscreen-above-menu-enable.ast-header-break-point .ast-above-header-menu-items .menu-item-has-children > .ast-menu-toggle{right:0;}.ast-fullscreen-above-menu-enable .ast-above-header-enabled .ast-above-header-navigation .ast-above-header-menu .menu-item.menu-item-has-children .sub-menu .ast-menu-toggle{right:0;}}@media (max-width:544px){.ast-header-break-point .header-main-layout-2 .site-branding,.ast-header-break-point .ast-mobile-header-stack .ast-mobile-menu-buttons{padding-bottom:0;}}.site-title,.site-title a{font-family:'Roboto Slab',serif;text-transform:inherit;}.site-header .site-description{text-transform:inherit;}.secondary .widget-title{font-family:'Roboto Slab',serif;text-transform:inherit;}.secondary .widget > *:not(.widget-title){font-family:'Open Sans',sans-serif;}.ast-single-post .entry-title,.page-title{font-family:'Roboto Slab',serif;text-transform:inherit;}.ast-archive-description .ast-archive-title{font-family:'Roboto Slab',serif;text-transform:inherit;}.blog .entry-title,.blog .entry-title a,.archive .entry-title,.archive .entry-title a,.search .entry-title,.search .entry-title a {font-family:'Roboto Slab',serif;text-transform:inherit;}h1,.entry-content h1{font-weight:300;font-family:'Roboto Slab',serif;text-transform:inherit;}h2,.entry-content h2{font-weight:300;font-family:'Roboto Slab',serif;text-transform:inherit;}h3,.entry-content h3{font-weight:300;font-family:'Roboto Slab',serif;text-transform:inherit;}h4,.entry-content h4{font-weight:300;font-family:'Roboto Slab',serif;text-transform:inherit;}h5,.entry-content h5{font-weight:300;font-family:'Roboto Slab',serif;text-transform:inherit;}h6,.entry-content h6{font-family:'Roboto Slab',serif;text-transform:inherit;}.ast-desktop .ast-mega-menu-enabled.ast-below-header-menu .menu-item .menu-link:hover,.ast-desktop .ast-mega-menu-enabled.ast-below-header-menu .menu-item .menu-link:focus{background-color:#575757;}.ast-desktop .ast-below-header-navigation .astra-megamenu-li .menu-item .menu-link:hover,.ast-desktop .ast-below-header-navigation .astra-megamenu-li .menu-item .menu-link:focus{color:#ffffff;}.ast-above-header-menu .astra-full-megamenu-wrapper{box-shadow:0 5px 20px rgba(0,0,0,0.06);}.ast-above-header-menu .astra-full-megamenu-wrapper .sub-menu,.ast-above-header-menu .astra-megamenu .sub-menu{box-shadow:none;}.ast-below-header-menu.ast-mega-menu-enabled.submenu-with-border .astra-full-megamenu-wrapper{border-color:#ffffff;}.ast-below-header-menu .astra-full-megamenu-wrapper{box-shadow:0 5px 20px rgba(0,0,0,0.06);}.ast-below-header-menu .astra-full-megamenu-wrapper .sub-menu,.ast-below-header-menu .astra-megamenu .sub-menu{box-shadow:none;}.ast-desktop .main-header-menu.submenu-with-border .astra-megamenu,.ast-desktop .main-header-menu.ast-mega-menu-enabled.submenu-with-border .astra-full-megamenu-wrapper{border-top-width:2px;border-left-width:0px;border-right-width:0px;border-bottom-width:0px;border-style:solid;}.ast-desktop .ast-mega-menu-enabled.main-header-menu .menu-item-heading > .menu-link{font-weight:700;font-size:1.1em;}.ast-desktop .ast-above-header .submenu-with-border .astra-full-megamenu-wrapper{border-top-width:2px;border-left-width:0px;border-right-width:0px;border-bottom-width:0px;border-style:solid;}.ast-desktop .ast-below-header .submenu-with-border .astra-full-megamenu-wrapper{border-top-width:2px;border-left-width:0px;border-right-width:0px;border-bottom-width:0px;border-style:solid;}.ast-advanced-headers-different-logo .advanced-header-logo,.ast-header-break-point .ast-has-mobile-header-logo .advanced-header-logo{display:inline-block;}.ast-header-break-point.ast-advanced-headers-different-logo .ast-has-mobile-header-logo .ast-mobile-header-logo{display:none;}.ast-advanced-headers-layout{width:100%;}.ast-header-break-point .ast-advanced-headers-parallax{background-attachment:fixed;}
elearning.edu.sa: #
elearning.edu.sa: # robots.txt
elearning.edu.sa: #
elearning.edu.sa: # This file is to prevent the crawling and indexing of certain parts
elearning.edu.sa: # of your site by web crawlers and spiders run by sites like Yahoo!
elearning.edu.sa: # and Google. By telling these "robots" where not to go on your site,
elearning.edu.sa: # you save bandwidth and server resources.
elearning.edu.sa: #
elearning.edu.sa: # This file will be ignored unless it is at the root of your host:
elearning.edu.sa: # Used: http://example.com/robots.txt
elearning.edu.sa: # Ignored: http://example.com/site/robots.txt
elearning.edu.sa: #
elearning.edu.sa: # For more information about the robots.txt standard, see:
elearning.edu.sa: # http://www.robotstxt.org/robotstxt.html
elearning.edu.sa: # CSS, JS, Images
elearning.edu.sa: # Directories
elearning.edu.sa: # Files
elearning.edu.sa: # Paths (clean URLs)
elearning.edu.sa: # Paths (no clean URLs)
sensibull.com: # www.robotstxt.org/
sensibull.com: # Allow crawling of all content
finder.com.au: # Prevent crawling searches.
finder.com.au: # https://finder.atlassian.net/browse/CWS-452
finder.com.au: # Prevent crawling additional searches.
finder.com.au: # https://finder.atlassian.net/browse/CWS-497
finder.com.au: # Allow Google AdsBot to crawl anything
finder.com.au: # Allow Twitterbot to crawl anything
finder.com.au: # https://finder.atlassian.net/browse/OPS-915
finder.com.au: # Don't crawl twitter links with encoded URLs
finder.com.au: # https://finder.atlassian.net/browse/FD-5667
finder.com.au: # NBN Tracker
finder.com.au: # https://finder.atlassian.net/browse/FD-6467
finder.com.au: # https://finder.atlassian.net/browse/GXUSR-37
finder.com.au: # Crawl image versions
finder.com.au: # https://finder.atlassian.net/browse/FD-7310
finder.com.au: # https://finder.atlassian.net/browse/FD-9630
finder.com.au: # https://finder.atlassian.net/browse/GXFO-34
finder.com.au: # https://finder.atlassian.net/browse/OPS-498
finder.com.au: # https://finder.atlassian.net/browse/PROJ-174
finder.com.au: # https://www.deepcrawl.com/bot/
finder.com.au: # Block pages from appearing in Google News
finder.com.au: # Main sitemap
javfree.me: # XML Sitemap & Google News version 5.0.6 - https://status301.net/wordpress-plugins/xml-sitemap-feed/
iubenda.com: # Url space used for testing
iubenda.com: # Disallow PP generator
iubenda.com: # Disallow misc folders
iubenda.com: # Following urls cause exception if crawled by bots
iubenda.com: # CS configurator
iubenda.com: # Google Image
iubenda.com: # Google AdSense
iubenda.com: # Sitemap: https://www.iubenda.com/sitemap_index.xml.gz
iubenda.com: ## Various exclusions
iubenda.com: # Trovacigusto
iubenda.com: # BravoReisen
iubenda.com: # U4PET
iubenda.com: # IDSCAN
iubenda.com: # www.omc2diesel.it
crnobelo.com: # If the Joomla site is installed within a folder such as at
crnobelo.com: # e.g. www.example.com/joomla/ the robots.txt file MUST be
crnobelo.com: # moved to the site root at e.g. www.example.com/robots.txt
crnobelo.com: # AND the joomla folder name MUST be prefixed to the disallowed
crnobelo.com: # path, e.g. the Disallow rule for the /administrator/ folder
crnobelo.com: # MUST be changed to read Disallow: /joomla/administrator/
crnobelo.com: #
crnobelo.com: # For more information about the robots.txt standard, see:
crnobelo.com: # http://www.robotstxt.org/orig.html
crnobelo.com: #
crnobelo.com: # For syntax checking, see:
crnobelo.com: # http://tool.motoricerca.info/robots-checker.phtml
estadao.com.br: # Support directories
estadao.com.br: # Sitemaps
estadao.com.br: #User-agent: Googlebot-News
estadao.com.br: #Disallow: /
delhigovt.nic.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
delhigovt.nic.in: #content{margin:0 0 0 2%;position:relative;}
calpoly.edu: #
calpoly.edu: # robots.txt
calpoly.edu: #
calpoly.edu: # This file is to prevent the crawling and indexing of certain parts
calpoly.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
calpoly.edu: # and Google. By telling these "robots" where not to go on your site,
calpoly.edu: # you save bandwidth and server resources.
calpoly.edu: #
calpoly.edu: # This file will be ignored unless it is at the root of your host:
calpoly.edu: # Used: http://example.com/robots.txt
calpoly.edu: # Ignored: http://example.com/site/robots.txt
calpoly.edu: #
calpoly.edu: # For more information about the robots.txt standard, see:
calpoly.edu: # http://www.robotstxt.org/robotstxt.html
calpoly.edu: # CSS, JS, Images
calpoly.edu: # Directories
calpoly.edu: # Files
calpoly.edu: # Paths (clean URLs)
calpoly.edu: # Paths (no clean URLs)
adam4adam.com: # www.robotstxt.org/
adam4adam.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
porn300.com: # www.robotstxt.org/
porn300.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
etnews.com: # Grapeshot º≥¡§ #
efka.gov.gr: #
efka.gov.gr: # robots.txt
efka.gov.gr: #
efka.gov.gr: # This file is to prevent the crawling and indexing of certain parts
efka.gov.gr: # of your site by web crawlers and spiders run by sites like Yahoo!
efka.gov.gr: # and Google. By telling these "robots" where not to go on your site,
efka.gov.gr: # you save bandwidth and server resources.
efka.gov.gr: #
efka.gov.gr: # This file will be ignored unless it is at the root of your host:
efka.gov.gr: # Used: http://example.com/robots.txt
efka.gov.gr: # Ignored: http://example.com/site/robots.txt
efka.gov.gr: #
efka.gov.gr: # For more information about the robots.txt standard, see:
efka.gov.gr: # http://www.robotstxt.org/robotstxt.html
efka.gov.gr: # CSS, JS, Images
efka.gov.gr: # Directories
efka.gov.gr: # Files
efka.gov.gr: # Paths (clean URLs)
efka.gov.gr: # Paths (no clean URLs)
suratmunicipal.gov.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
suratmunicipal.gov.in: #content{margin:0 0 0 2%;position:relative;}
managebuilding.com: #mktoForm_2388 .mktoButton:hover {
managebuilding.com: #mktoForm_2388 .mktoRadioList {
managebuilding.com: #mktoForm_2388 label#LblmarketingField3 {
managebuilding.com: #mktoForm_2388 label#LblmktoRadio_29854_0, #LblmktoRadio_29854_1 {
managebuilding.com: #mkto-form-wrapper #mktoForm_1295 .mktoRequiredField .mktoLabel:after {
managebuilding.com: #mktoForm_2031 #LblzoomEventDay {
managebuilding.com: #industry-report span.mktoButtonWrap.mktoInset {
managebuilding.com: #industry-report .mktoButton {
managebuilding.com: #mktoForm_2198 .mktoFormRow .mktoFormCol:nth-child(2) {
managebuilding.com: #mktoForm_2198 :-ms-input-placeholder {
managebuilding.com: #mc-embedded-subscribe-form .mktoLabel {
managebuilding.com: #ot-ccpa-banner {
managebuilding.com: #ot-ccpa-banner .ot-ccpa-icon {
managebuilding.com: #ot-ccpa-banner .ot-ccpa-icon img{
bni.co.id: # Begin robots.txt file
bni.co.id: #/-----------------------------------------------\
bni.co.id: #| In single portal/domain situations, uncomment the sitmap line and enter domain name
bni.co.id: #\-----------------------------------------------/
bni.co.id: #Sitemap: http://www.DomainNamehere.com/sitemap.aspx
bni.co.id: # End of robots.txt file
ldlc.com: # Blocage section
ldlc.com: # Blocage agent
daniweb.com: # DaniWeb
daniweb.com: # DaniWeb Connect
daniweb.com: # Legacy
genbeta.com: #
genbeta.com: # robots.txt
genbeta.com: #
genbeta.com: # Crawlers that are kind enough to obey, but which we'd rather not have
genbeta.com: # unless they're feeding search engines.
genbeta.com: # Some bots are known to be trouble, particularly those designed to copy
genbeta.com: # entire sites. Please obey robots.txt.
genbeta.com: # Sorry, wget in its recursive mode is a frequent problem.
genbeta.com: # Please read the man page and use it properly; there is a
genbeta.com: # --wait option you can use to set the delay between hits,
genbeta.com: # for instance.
genbeta.com: #
genbeta.com: #
genbeta.com: # The 'grub' distributed client has been *very* poorly behaved.
genbeta.com: #
genbeta.com: #
genbeta.com: # Doesn't follow robots.txt anyway, but...
genbeta.com: #
genbeta.com: #
genbeta.com: # Hits many times per second, not acceptable
genbeta.com: # http://www.nameprotect.com/botinfo.html
genbeta.com: # A capture bot, downloads gazillions of pages with no public benefit
genbeta.com: # http://www.webreaper.net/
rd.com: # This virtual robots.txt file was created by the Virtual Robots.txt WordPress plugin: https://www.wordpress.org/plugins/pc-robotstxt/
tutpub.com: # 1) this filename (robots.txt) must stay lowercase
tutpub.com: # 2) this file must be in the servers root directory
tutpub.com: # ex: http://www.mydomain.com/pliklisubfolder/ -- you must move the robots.txt from
tutpub.com: # /pliklisubfolder/ to the root folder for http://www.mydomain.com/
tutpub.com: # you must then add your subfolder to each 'Disallow' below
tutpub.com: # ex: Disallow: /cache/ becomes Disallow: /pliklisubfolder/cache/
u-bordeaux.fr: # Handled by rewrite rule, based on the domain.
mygov.in: #
mygov.in: # robots.txt
mygov.in: #
mygov.in: # This file is to prevent the crawling and indexing of certain parts
mygov.in: # of your site by web crawlers and spiders run by sites like Yahoo!
mygov.in: # and Google. By telling these "robots" where not to go on your site,
mygov.in: # you save bandwidth and server resources.
mygov.in: #
mygov.in: # Images
mygov.in: # Directories
mygov.in: # Directories without slash
mygov.in: # Files
mygov.in: # Files without slash
mygov.in: # Files
mygov.in: # Paths (no clean URLs)
mygov.in: # Paths (no clean URLs without slash)
cratejoy.com: #
cratejoy.com: # robots.txt
cratejoy.com: #
uwo.ca: # robots.txt for http://www.uwo.ca/
uwo.ca: #
uwo.ca: # Inktomi's web robot will obey the first record in the robots.txt file with a User-Agent containing "UWO-InktomiSearch".
uwo.ca: # If there is no such record, It will obey the first entry with a User-Agent of "*".
uwo.ca: # Because nothing is disallowed, everything is allowed
uwo.ca: # specifies that no robots should visit
uwo.ca: # any URL starting with "/ccs/export/"
dailykos.com: #Disallow: /
dailykos.com: #Disallow: /
dailykos.com: # Alexa Archver, allow them
dailykos.com: # Internet Archives open source crawler
dailykos.com: # Has gone nuts on us before.
dailykos.com: # topsy.com's bot
thermofisher.com: # Added 8/20/2014 \/
thermofisher.com: # compensate for subdirectories that do need to be blocked: discussions from 6/3/2014
thermofisher.com: # all of this content get's 301 redirected to regional URL and search bots can't update if they are not followed
thermofisher.com: # Updated 10/5/2014/\
thermofisher.com: # Added 8/20/2014 /\
thermofisher.com: # Added 3/28/2015 \/
thermofisher.com: # Added 3/28/2015 /\
thermofisher.com: # requested by 7/28/2014 \/
thermofisher.com: # requested by 7/28/2014
thermofisher.com: #requested by 3/30/2015 \/
thermofisher.com: #requested by 3/30/2015 /\
outofthesandbox.com: # we use Shopify as our ecommerce platform
outofthesandbox.com: # Google adsbot ignores robots.txt unless specifically named!
stylecaster.com: # Sitemap archive
elcinema.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
elcinema.com: #
elcinema.com: # To ban all spiders from the entire site uncomment the next two lines:
commonsensemedia.org: #
commonsensemedia.org: # robots.txt
commonsensemedia.org: #
commonsensemedia.org: # This file is to prevent the crawling and indexing of certain parts
commonsensemedia.org: # of your site by web crawlers and spiders run by sites like Yahoo!
commonsensemedia.org: # and Google. By telling these "robots" where not to go on your site,
commonsensemedia.org: # you save bandwidth and server resources.
commonsensemedia.org: #
commonsensemedia.org: # This file will be ignored unless it is at the root of your host:
commonsensemedia.org: # Used: http://example.com/robots.txt
commonsensemedia.org: # Ignored: http://example.com/site/robots.txt
commonsensemedia.org: #
commonsensemedia.org: # For more information about the robots.txt standard, see:
commonsensemedia.org: # http://www.robotstxt.org/robotstxt.html
commonsensemedia.org: #
commonsensemedia.org: # For syntax checking, see:
commonsensemedia.org: # http://www.frobee.com/robots-txt-check
commonsensemedia.org: # CSS, JS, Images
commonsensemedia.org: # Directories
commonsensemedia.org: # Files
commonsensemedia.org: # Paths (clean URLs)
commonsensemedia.org: # Paths (no clean URLs)
commonsensemedia.org: # Help with dupe content?
commonsensemedia.org: # Allow images to be indexed? (google)
commonsensemedia.org: # Disallow CP related traffic links
commonsensemedia.org: #HybridAuth paths
baby-kingdom.com: #
baby-kingdom.com: # robots.txt for Discuz! X1.5
baby-kingdom.com: #
newspim.com: # robots.txt generated at http://www.adop.cc
kmcert.com: # robots.txt
shopify.com.au: # ,:
shopify.com.au: # ,' |
shopify.com.au: # / :
shopify.com.au: # --' /
shopify.com.au: # \/ />/
shopify.com.au: # / <//_\
shopify.com.au: # __/ /
shopify.com.au: # )'-. /
shopify.com.au: # ./ :\
shopify.com.au: # /.' '
shopify.com.au: # No need to shop around. Board the rocketship today – great SEO careers to checkout at shopify.com/careers
shopify.com.au: # robots.txt file for www.shopify.com.au
kanazawa-u.ac.jp: #fb-root{
98zudisw.xyz: #
98zudisw.xyz: # robots.txt for Discuz! X3
98zudisw.xyz: #
riselinkedu.com: # www.robotstxt.org/
riselinkedu.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
downcc.com: #
downcc.com: # robots.txt for www.downcc.com
downcc.com: #
empower-retirement.com: #
empower-retirement.com: # robots.txt
empower-retirement.com: #
empower-retirement.com: # This file is to prevent the crawling and indexing of certain parts
empower-retirement.com: # of your site by web crawlers and spiders run by sites like Yahoo!
empower-retirement.com: # and Google. By telling these "robots" where not to go on your site,
empower-retirement.com: # you save bandwidth and server resources.
empower-retirement.com: #
empower-retirement.com: # This file will be ignored unless it is at the root of your host:
empower-retirement.com: # Used: http://example.com/robots.txt
empower-retirement.com: # Ignored: http://example.com/site/robots.txt
empower-retirement.com: #
empower-retirement.com: # For more information about the robots.txt standard, see:
empower-retirement.com: # http://www.robotstxt.org/robotstxt.html
empower-retirement.com: # CSS, JS, Images
empower-retirement.com: # Directories
empower-retirement.com: # Files
empower-retirement.com: # Paths (clean URLs)
empower-retirement.com: # Paths (no clean URLs)
parents.com: # Sitemaps
parents.com: # current CMS
parents.com: # ONECMS
parents.com: # Content
parents.com: # current CMS
parents.com: # ONECMS
parents.com: # Content
plainenglish.io: # https://www.robotstxt.org/robotstxt.html
campo-golf.de: # ##############################
campo-golf.de: # ##################################
campo-golf.de: # ####################################
campo-golf.de: # #####################################
campo-golf.de: # ############# ############# ##########@ ########### ###### @######## ######## ###### ######## ###########
campo-golf.de: # ########### ########### ##############& ############### ############################# ################# ###############
campo-golf.de: # ########## #### ########## ####### (###### ####### ####### #######( ######### ####### ######## ####### ####### #######&
campo-golf.de: # ######### ###### ######### ###### ###### ###### ###### ####### ###### ###### ######% ####### ######
campo-golf.de: # ######### ###################### ###### ############ ###### %###### ###### ###### ###### ###### ######
campo-golf.de: # ######### ###################### ###### ################ ###### %###### ###### ###### ###### ###### ######
campo-golf.de: # ######### ###################### ###### #### ###### ###### ###### %###### ###### ###### ####### ###### ######
campo-golf.de: # ########## ##### ######### ####### ######& ###### ###### ###### %###### ###### ####### ####### ####### #######
campo-golf.de: # ########## ########## ################ ################# ###### %###### ###### #################( (################
campo-golf.de: # ############ ############ ############ ########## ###### ###### %###### ###### ################ (############
campo-golf.de: # ################ ############### ######
campo-golf.de: # ##################################### ######
campo-golf.de: # ################ ############## ######
campo-golf.de: # ################ ############## ######
campo-golf.de: # Directories
digiskills.pk: # Group 1
gaia.com: #Search
gaia.com: #Random Paths
gaia.com: #Cart
gaia.com: #Disallow Affiliates, Ambassadors & Hosts
gaia.com: #Disallow Go Handler
gaia.com: # Language Queries
gaia.com: #Migrated Aomm.TV URLs
gaia.com: #Tercer Milenio URLs
gaia.com: #German Language URLs
gaia.com: #Twitter sharing exemptions
onesignal.com: # robots.txt for https://onesignal.com/
onesignal.com: # live - don't allow web crawlers to index cpresources/ or vendor/
onesignal.com: # Copied from old website
okta-emea.com: #
okta-emea.com: # robots.txt
okta-emea.com: #
okta-emea.com: # This file is to prevent the crawling and indexing of certain parts
okta-emea.com: # of your site by web crawlers and spiders run by sites like Yahoo!
okta-emea.com: # and Google. By telling these "robots" where not to go on your site,
okta-emea.com: # you save bandwidth and server resources.
okta-emea.com: #
okta-emea.com: # This file will be ignored unless it is at the root of your host:
okta-emea.com: # Used: http://example.com/robots.txt
okta-emea.com: # Ignored: http://example.com/site/robots.txt
okta-emea.com: #
okta-emea.com: # For more information about the robots.txt standard, see:
okta-emea.com: # http://www.robotstxt.org/robotstxt.html
okta-emea.com: # CSS, JS, Images
okta-emea.com: # Directories
okta-emea.com: # Files
okta-emea.com: # Paths (clean URLs)
okta-emea.com: # Paths (no clean URLs)
enphaseenergy.com: #
enphaseenergy.com: # robots.txt
enphaseenergy.com: #
enphaseenergy.com: # This file is to prevent the crawling and indexing of certain parts
enphaseenergy.com: # of your site by web crawlers and spiders run by sites like Yahoo!
enphaseenergy.com: # and Google. By telling these "robots" where not to go on your site,
enphaseenergy.com: # you save bandwidth and server resources.
enphaseenergy.com: #
enphaseenergy.com: # This file will be ignored unless it is at the root of your host:
enphaseenergy.com: # Used: http://example.com/robots.txt
enphaseenergy.com: # Ignored: http://example.com/site/robots.txt
enphaseenergy.com: #
enphaseenergy.com: # For more information about the robots.txt standard, see:
enphaseenergy.com: # http://www.robotstxt.org/robotstxt.html
enphaseenergy.com: # CSS, JS, Images
enphaseenergy.com: # Directories
enphaseenergy.com: # Files
enphaseenergy.com: # Paths (clean URLs)
enphaseenergy.com: # Paths (no clean URLs)
enphaseenergy.com: # Vanity Paths
enphaseenergy.com: # Taxonomy Term listing Page
enphaseenergy.com: # NL SItemap
enphaseenergy.com: #Disallow Files
enphaseenergy.com: #Disallow search Page
enphaseenergy.com: # BCMT-547 EN-US
enphaseenergy.com: # BCMT-547 NL-NL
enphaseenergy.com: # BCMT-547 EN-AU
enphaseenergy.com: # External URLs
vapejuicedepot.com: # we use Shopify as our ecommerce platform
vapejuicedepot.com: # Google adsbot ignores robots.txt unless specifically named!
google.com.bo: # AdsBot
google.com.bo: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
google.ba: # AdsBot
google.ba: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
honda.com: #Blank robots.txt
coachoutlet.com: #2020.10.19
puzzle-english.com: # XML Sitemap & Google News Feeds version 4.3.2 - http://status301.net/wordpress-plugins/xml-sitemap-feed/
principal.com: #
principal.com: # robots.txt
principal.com: #
principal.com: # This file is to prevent the crawling and indexing of certain parts
principal.com: # of your site by web crawlers and spiders run by sites like Yahoo!
principal.com: # and Google. By telling these "robots" where not to go on your site,
principal.com: # you save bandwidth and server resources.
principal.com: #
principal.com: # This file will be ignored unless it is at the root of your host:
principal.com: # Used: http://example.com/robots.txt
principal.com: # Ignored: http://example.com/site/robots.txt
principal.com: #
principal.com: # For more information about the robots.txt standard, see:
principal.com: # http://www.robotstxt.org/robotstxt.html
principal.com: # Directories
principal.com: # Files
principal.com: # Paths (clean URLs)
principal.com: # Paths (no clean URLs)
playblackdesert.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
playblackdesert.com: #content{margin:0 0 0 2%;position:relative;}
busuu.com: #Grav
busuu.com: # Backend Symfony
busuu.com: # Frontend
busuu.com: # Specific paths
busuu.com: #Sitemap
reseau-canope.fr: #On empeche l'indexation de la page de resultats de recherche
reseau-canope.fr: #Disallow: /resultats-de-recherche/ ces pages contiennent déjà un balise meta robots =>noindex, nofollow.
reseau-canope.fr: #supprimer le nofollow
reseau-canope.fr: #On empeche l'indexation des fichers typoscript
reseau-canope.fr: #Disallow: /*.ts$ => désactivé avec /fileadmin/template/ts/
seconnecter.org: #
seconnecter.org: # robots.txt
seconnecter.org: #
seconnecter.org: # This file is to prevent the crawling and indexing of certain parts
seconnecter.org: # of your site by web crawlers and spiders run by sites like Yahoo!
seconnecter.org: # and Google. By telling these "robots" where not to go on your site,
seconnecter.org: # you save bandwidth and server resources.
seconnecter.org: #
seconnecter.org: # This file will be ignored unless it is at the root of your host:
seconnecter.org: # Used: http://example.com/robots.txt
seconnecter.org: # Ignored: http://example.com/site/robots.txt
seconnecter.org: #
seconnecter.org: # For more information about the robots.txt standard, see:
seconnecter.org: # http://www.robotstxt.org/wc/robots.html
seconnecter.org: #
seconnecter.org: # For syntax checking, see:
seconnecter.org: # http://www.sxw.org.uk/computing/robots/check.html
seconnecter.org: # Directories
seconnecter.org: # Files
seconnecter.org: # Paths (clean URLs)
seconnecter.org: # Paths (no clean URLs)
devpost.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
devpost.com: #
devpost.com: # To ban all spiders from the entire site uncomment the next two lines:
devpost.com: # User-agent: *
devpost.com: # Disallow: /
streeteasy.com: # robots.txt
eventbrite.fr: # http://www.google.fr/adsbot.html - AdsBot ignores * wildcard
azerforum.com: # Sitemap files
tvm.com.mt: # slow down dot
98asedwwq.xyz: #
98asedwwq.xyz: # robots.txt for Discuz! X3
98asedwwq.xyz: #
snyk.io: #Sitemap: https://snyk.io/search-sitemaps/sitemap_index.xml
snyk.io: #sitemap: http://a213584.sitemaphosting4.com/4168338/sitemap.xml
snyk.io: #Sitemap: https://snyk.io/search-sitemaps/test-sitemap-1-03122020.xml
rubtc.top: #container {
actu.fr: # Lana Sitemap version 1.0.0 - http://wp.lanaprojekt.hu/blog/wordpress-plugins/lana-sitemap/
centrify.com: #
centrify.com: # robots.txt
centrify.com: #
centrify.com: # This file is to prevent the crawling and indexing of certain parts
centrify.com: # of your site by web crawlers and spiders run by sites like Yahoo!
centrify.com: # and Google. By telling these "robots" where not to go on your site,
centrify.com: # you save bandwidth and server resources.
centrify.com: #
centrify.com: # This file will be ignored unless it is at the root of your host:
centrify.com: # Used: http://example.com/robots.txt
centrify.com: # Ignored: http://example.com/site/robots.txt
centrify.com: #
centrify.com: # For more information about the robots.txt standard, see:
centrify.com: # http://www.robotstxt.org/robotstxt.html
centrify.com: # CSS, JS, Images
centrify.com: # Directories
centrify.com: # Files
centrify.com: # Paths (clean URLs)
centrify.com: # Paths (no clean URLs)
centrify.com: # Other Paths
foodpanda.com.tw: # www.robotstxt.org/
foodpanda.com.tw: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
comparably.com: # Disallow: Yandex
comparably.com: # Disallow: Sistrix
comparably.com: # Disallow: Sistrix
comparably.com: # Disallow: Sistrix
comparably.com: # Disallow: SEOkicks-Robot
comparably.com: # Disallow: jobs.de-Robot
comparably.com: # Backlink Analysis
comparably.com: # Bot der Leipziger Unister Holding GmbH
comparably.com: # http://moz.com/products
comparably.com: # http://www.searchmetrics.com
comparably.com: # http://www.majestic12.co.uk/projects/dsearch/mj12bot.php
comparably.com: # http://www.domaintools.com/webmasters/surveybot.php
comparably.com: # http://www.seodiver.com/bot
comparably.com: # http://openlinkprofiler.org/bot
comparably.com: # http://www.wotbox.com/bot/
comparably.com: # http://www.opensiteexplorer.org/dotbot
comparably.com: # http://moz.com/researchtools/ose/dotbot
comparably.com: # http://www.meanpath.com/meanpathbot.html
comparably.com: # http://www.backlinktest.com/crawler.html
comparably.com: # http://www.brandwatch.com/magpie-crawler/
comparably.com: # http://filterdb.iss.net/crawler/
comparably.com: # http://webmeup-crawler.com
comparably.com: # https://megaindex.com/crawler
comparably.com: # http://www.cloudservermarket.com
comparably.com: # http://www.trendiction.de/de/publisher/bot
comparably.com: # http://www.exalead.com
comparably.com: # http://www.career-x.de/bot.html
comparably.com: # https://www.lipperhey.com/en/about/
comparably.com: # https://www.lipperhey.com/en/about/
comparably.com: # https://turnitin.com/robot/crawlerinfo.html
comparably.com: # http://help.coccoc.com/
comparably.com: # ubermetrics-technologies.com
comparably.com: # datenbutler.de
comparably.com: # http://searchgears.de/uber-uns/crawling-faq.html
comparably.com: # http://commoncrawl.org/faq/
comparably.com: # https://www.qwant.com/
comparably.com: # http://linkfluence.net/
comparably.com: # http://www.botje.com/plukkie.htm
comparably.com: # https://www.safedns.com/searchbot
comparably.com: # http://www.haosou.com/help/help_3_2.html
comparably.com: # http://www.haosou.com/help/help_3_2.html
comparably.com: # http://www.moz.com/dp/rogerbot
comparably.com: # http://www.openhose.org/bot.html
comparably.com: # http://www.screamingfrog.co.uk/seo-spider/
comparably.com: # http://thumbsniper.com
comparably.com: # http://www.radian6.com/crawler
comparably.com: # http://cliqz.com/company/cliqzbot
comparably.com: # https://www.aihitdata.com/about
comparably.com: # http://www.trendiction.com/en/publisher/bot
comparably.com: # http://warebay.com/bot.html
civilica.com: # https://www.robotstxt.org/robotstxt.html
thehut.com: # Sitemap files
teratail.com: # robotstxt.org/
screamingfrog.co.uk: # Protection of frog team
screamingfrog.co.uk: # Protection of frog teams sanity
screamingfrog.co.uk: # Screaming Frog - Search Engine Marketing
screamingfrog.co.uk: # If you're looking at our robots.txt then you might well be interested in our current SEO vacancies :-)
screamingfrog.co.uk: # https://www.screamingfrog.co.uk/careers/
onvista.de: #robots.txt for www.onvista.de
onvista.de: #Robots.txt File
onvista.de: #Version: 0.3
onvista.de: #Last updated: 20/06/2018
onvista.de: #Please note our terms and conditions "http://www.onvista.de/agb.html"
onvista.de: #Spidering is not allowed by our terms and conditions
onvista.de: #Authorised spidering is subject to permission
onvista.de: #For authorisation please contact us - see "http://www.onvista.de/impressum.html"
nazk.gov.ua: #ajax_ac_widget th{background:none repeat scroll 0 0 #457cbf;color:#FFF;font-weight:400;padding:5px 1px;text-align:center;font-size:16px}
nazk.gov.ua: #ajax_ac_widget td{text-align:center}
nazk.gov.ua: #ajax_ac_widget table tbody tr{
nazk.gov.ua: #my-calendar a{background:none repeat scroll 0 0 #BEE6FD;color:#2B4261;display:block;padding:7px 0;width:100%!important}
nazk.gov.ua: #my-calendar a:hover{background: none repeat scroll 0 0 #ffd232;}
nazk.gov.ua: #my-calendar{width:100%}
nazk.gov.ua: #my_calender span{display:block;padding:7px 0;width:100%!important}
nazk.gov.ua: #today a,#today span{background:none repeat scroll 0 0 #ffd232!important;color:#1A1A22}
nazk.gov.ua: #ajax_ac_widget #my_year{float:right}
nazk.gov.ua: #my_accessibility{
nazk.gov.ua: #stat .swiper-slide{cursor:pointer;display:-ms-flexbox;display:flex;-ms-flex-pack:justify;justify-content:space-evenly;-ms-flex-align:baseline;align-items:center;padding:6px 20px 3px}
nazk.gov.ua: #stat .swiper-wrapper{justify-content:center}
nazk.gov.ua: #stat .swiper-wrapper{justify-content:initial}
nazk.gov.ua: #stat .swiper-slide .section1__block{width:auto;padding:6px 20px 3px;}
nazk.gov.ua: #stat .swiper-slide.active{display:flex}
nazk.gov.ua: #stat .section1__top{flex-wrap:wrap}
nazk.gov.ua: #stat .swiper-slide.active{display:flex;align-items:flex-start;justify-content:flex-start}
nazk.gov.ua: #ui-datepicker-div{max-width:235px}
greatist.com: #
greatist.com: # robots.txt
greatist.com: #
greatist.com: # This file is to prevent the crawling and indexing of certain parts
greatist.com: # of your site by web crawlers and spiders run by sites like Yahoo!
greatist.com: # and Google. By telling these "robots" where not to go on your site,
greatist.com: # you save bandwidth and server resources.
greatist.com: #
greatist.com: # This file will be ignored unless it is at the root of your host:
greatist.com: # Used: http://example.com/robots.txt
greatist.com: # Ignored: http://example.com/site/robots.txt
greatist.com: #
greatist.com: # For more information about the robots.txt standard, see:
greatist.com: # http://www.robotstxt.org/robotstxt.html
greatist.com: # CSS, JS, Images
greatist.com: # Directories
greatist.com: # Files
greatist.com: # Paths (clean URLs)
greatist.com: # Paths (no clean URLs)
greatist.com: # Sitemaps
greatist.com: #SEO recommendation
google.com.mm: # AdsBot
google.com.mm: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
metropolia.fi: #
metropolia.fi: # robots.txt
metropolia.fi: #
metropolia.fi: # This file is to prevent the crawling and indexing of certain parts
metropolia.fi: # of your site by web crawlers and spiders run by sites like Yahoo!
metropolia.fi: # and Google. By telling these "robots" where not to go on your site,
metropolia.fi: # you save bandwidth and server resources.
metropolia.fi: #
metropolia.fi: # This file will be ignored unless it is at the root of your host:
metropolia.fi: # Used: http://example.com/robots.txt
metropolia.fi: # Ignored: http://example.com/site/robots.txt
metropolia.fi: #
metropolia.fi: # For more information about the robots.txt standard, see:
metropolia.fi: # http://www.robotstxt.org/robotstxt.html
metropolia.fi: # CSS, JS, Images
metropolia.fi: # Directories
metropolia.fi: # Files
metropolia.fi: # Paths (clean URLs)
metropolia.fi: # Paths (no clean URLs)
eatingwell.com: # Sitemaps
eatingwell.com: # current CMS
eatingwell.com: # ONECMS
eatingwell.com: # Content
eatingwell.com: # current CMS
eatingwell.com: # ONECMS
eatingwell.com: # Content
moe.gov.om: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
moe.gov.om: #content{margin:0 0 0 2%;position:relative;}
renweb.com: # Default Flywheel robots file
bibliocommons.com: # Squarespace Robots Txt
labcorp.com: #
labcorp.com: # robots.txt
labcorp.com: #
labcorp.com: # This file is to prevent the crawling and indexing of certain parts
labcorp.com: # of your site by web crawlers and spiders run by sites like Yahoo!
labcorp.com: # and Google. By telling these "robots" where not to go on your site,
labcorp.com: # you save bandwidth and server resources.
labcorp.com: #
labcorp.com: # This file will be ignored unless it is at the root of your host:
labcorp.com: # Used: http://example.com/robots.txt
labcorp.com: # Ignored: http://example.com/site/robots.txt
labcorp.com: #
labcorp.com: # For more information about the robots.txt standard, see:
labcorp.com: # http://www.robotstxt.org/robotstxt.html
labcorp.com: # CSS, JS, Images
labcorp.com: # Directories
labcorp.com: # Files
labcorp.com: # Paths (clean URLs)
labcorp.com: # Paths (no clean URLs)
labcorp.com: # Custom Entries
labcorp.com: # LC22517-267
labcorp.com: # Disallow: /account-setup-international-providers
labcorp.com: # Disallow: /account-setup-japan
labcorp.com: # Disallow: /account-setup-providers
getsmarter.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
getsmarter.com: #
getsmarter.com: # To ban all spiders from the entire site uncomment the next two lines:
getsmarter.com: # User-agent: *
getsmarter.com: # Disallow: /
getsmarter.com: # User-agent: *
getsmarter.com: # Disallow: /checkout
getsmarter.com: # Disallow: /cart
getsmarter.com: # Disallow: /orders
getsmarter.com: # Disallow: /user
getsmarter.com: # Disallow: /account
getsmarter.com: # Disallow: /api
getsmarter.com: # Disallow: /password
mailtester.com: #User-agent: Mediapartners-Google
cirqlive.com: # Optimization for Google Ads Bot
chapman.edu: # robots.txt for Chapman University http://www.chapman.edu (maintained in Cascade)
simplypsychology.org: ####################################################
simplypsychology.org: # ALLOW MEDIA BOT TO CRAWL ANYWHERE
simplypsychology.org: #####
simplypsychology.org: ####################################################
simplypsychology.org: # ALLOW IMAGE BOT TO CRAWL ANYWHERE
simplypsychology.org: #####
simplypsychology.org: ####################################################
simplypsychology.org: # ALLOW GOOGLE BOT TO CRAWL ANYWHERE
simplypsychology.org: #####
simplypsychology.org: ####################################################
simplypsychology.org: # ALLOW GOOGLE IPHONE AD BOT TO CRAWL ANYWHERE
simplypsychology.org: #####
simplypsychology.org: # Some bots are known to be trouble, particularly those designed to copy
simplypsychology.org: # entire sites. Please obey robots.txt.
simplypsychology.org: # Misbehaving: requests much too fast:
simplypsychology.org: #
simplypsychology.org: # Sorry, wget in its recursive mode is a frequent problem.
simplypsychology.org: # Please read the man page and use it properly; there is a
simplypsychology.org: # --wait option you can use to set the delay between hits,
simplypsychology.org: # for instance.
simplypsychology.org: #
simplypsychology.org: #
simplypsychology.org: # The 'grub' distributed client has been *very* poorly behaved.
simplypsychology.org: #
simplypsychology.org: #
simplypsychology.org: # Doesn't follow robots.txt anyway, but...
simplypsychology.org: #
simplypsychology.org: #
simplypsychology.org: # Hits many times per second, not acceptable
simplypsychology.org: # http://www.nameprotect.com/botinfo.html
simplypsychology.org: # A capture bot, downloads gazillions of pages with no public benefit
simplypsychology.org: # http://www.webreaper.net/
simplypsychology.org: # Wayback Machine: defaults and whether to index user-pages
simplypsychology.org: # FIXME: Complete the removal of this block, per T7582.
simplypsychology.org: # User-agent: archive.org_bot
simplypsychology.org: # Allow: /
hrdc-drhc.gc.ca: #esdc_gc_ca
archives.gov: #
archives.gov: # robots.txt
archives.gov: #
archives.gov: # This file is to prevent the crawling and indexing of certain parts
archives.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
archives.gov: # and Google. By telling these "robots" where not to go on your site,
archives.gov: # you save bandwidth and server resources.
archives.gov: #
archives.gov: # This file will be ignored unless it is at the root of your host:
archives.gov: # Used: http://example.com/robots.txt
archives.gov: # Ignored: http://example.com/site/robots.txt
archives.gov: #
archives.gov: # For more information about the robots.txt standard, see:
archives.gov: # http://www.robotstxt.org/robotstxt.html
trendlyne.com: # Block trendkite-akashic-crawler
xtb.com: # www.robotstxt.org/
xtb.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
jobomas.com: #Baiduspider
mailtrap.io: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
mailtrap.io: #
mailtrap.io: # To ban all spiders from the entire site uncomment the next two lines:
mailtrap.io: # User-agent: *
mailtrap.io: # Disallow: /
trekbikes.com: # For all robots Block access to specific groups of pages
akhbarak.net: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
akhbarak.net: #
akhbarak.net: # To ban all spiders from the entire site uncomment the next two lines:
akhbarak.net: #Disallow: /*.js$
akhbarak.net: #User-Agent: *
akhbarak.net: #Disallow: /
abokifx.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
abokifx.com: #
abokifx.com: # To ban all spiders from the entire site uncomment the next two lines:
abokifx.com: # User-agent: *
abokifx.com: # Disallow: /
saatvesaat.com.tr: # Google Image Crawler Setup
google.com.sv: # AdsBot
google.com.sv: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
zeropark.com: # Fix for Redundant Hostnames in Universal Analytics
okmall.com: # Disallow: Sistrix
okmall.com: # Disallow: Sistrix
okmall.com: # Disallow: Sistrix
okmall.com: # Disallow: SEOkicks-Robot
okmall.com: # Disallow: jobs.de-Robot
okmall.com: # Bot der Leipziger Unister Holding GmbH
okmall.com: # http://www.searchmetrics.com
okmall.com: # http://www.domaintools.com/webmasters/surveybot.php
okmall.com: # http://www.seodiver.com/bot
okmall.com: # http://openlinkprofiler.org/bot
okmall.com: # http://www.wotbox.com/bot/
okmall.com: # http://www.opensiteexplorer.org/dotbot
okmall.com: # http://moz.com/researchtools/ose/dotbot
okmall.com: # http://www.meanpath.com/meanpathbot.html
okmall.com: # http://www.backlinktest.com/crawler.html
okmall.com: # http://www.brandwatch.com/magpie-crawler/
okmall.com: # http://filterdb.iss.net/crawler/
okmall.com: # http://webmeup-crawler.com
okmall.com: # https://megaindex.com/crawler
okmall.com: # http://www.cloudservermarket.com
okmall.com: # http://www.trendiction.de/de/publisher/bot
okmall.com: # http://www.exalead.com
okmall.com: # http://www.career-x.de/bot.html
okmall.com: # https://www.lipperhey.com/en/about/
okmall.com: # https://www.lipperhey.com/en/about/
okmall.com: # https://turnitin.com/robot/crawlerinfo.html
okmall.com: # http://help.coccoc.com/
okmall.com: # ubermetrics-technologies.com
okmall.com: # datenbutler.de
okmall.com: # http://searchgears.de/uber-uns/crawling-faq.html
okmall.com: # http://commoncrawl.org/faq/
okmall.com: # https://www.qwant.com/
okmall.com: # http://linkfluence.net/
okmall.com: # http://www.botje.com/plukkie.htm
okmall.com: # https://www.safedns.com/searchbot
okmall.com: # http://www.haosou.com/help/help_3_2.html
okmall.com: # http://www.haosou.com/help/help_3_2.html
okmall.com: # http://www.moz.com/dp/rogerbot
okmall.com: # http://www.openhose.org/bot.html
okmall.com: # http://www.screamingfrog.co.uk/seo-spider/
okmall.com: # http://thumbsniper.com
okmall.com: # http://www.radian6.com/crawler
okmall.com: # http://cliqz.com/company/cliqzbot
okmall.com: # https://www.aihitdata.com/about
okmall.com: # http://www.trendiction.com/en/publisher/bot
okmall.com: # http://warebay.com/bot.html
okmall.com: # http://www.website-datenbank.de/
okmall.com: # http://law.di.unimi.it/BUbiNG.html
okmall.com: # http://www.linguee.com/bot; bot@linguee.com
okmall.com: # www.sentibot.eu
okmall.com: # http://velen.io
okmall.com: # https://moz.com/help/guides/moz-procedures/what-is-rogerbot
okmall.com: # http://www.garlik.com
okmall.com: # https://www.gosign.de/typo3-extension/typo3-sicherheitsmonitor/
okmall.com: # http://www.siteliner.com/bot
okmall.com: # https://sabsim.com
okmall.com: # http://ltx71.com/
designcrowd.com: # robots.txt
hoy.com.do: #robots para Hoy
hoy.com.do: #Por Kenneth Burgos
hoy.com.do: # Bloqueo basico para todos los bots y crawlers
hoy.com.do: # puede dar problemas por bloqueo de recursos en GWT
hoy.com.do: # Bloqueo de las URL dinamicas
hoy.com.do: #Bloqueo de busquedas
hoy.com.do: # Bloqueo de trackbacks
hoy.com.do: # Bloqueo de feeds para crawlers
hoy.com.do: # Ralentizamos algunos bots que se suelen volver locos
hoy.com.do: # Haran peticiones cada 20 segundos para bajar la carga de request al hosting
hoy.com.do: # Bloqueando algunos bots adicionales de Google
hoy.com.do: # Activar despues que baje la carga del hosting
hoy.com.do: # Bloqueo de bots y crawlers poco utiles
hoy.com.do: # Previene problemas de recursos bloqueados en Google Webmaster Tools
hoy.com.do: # Crawl-delay: 20
hoy.com.do: # En condiciones normales este es el sitemap pero El Nacional no tiene
hoy.com.do: # Sitemap: https://eldia.com.do/sitemap.xml
hoy.com.do: # Si utiliza Yoast SEO estos son los sitemaps principales
hoy.com.do: # Sitemap: https://eldia.com.do/sitemap_index.xml
hoy.com.do: # Sitemap: https://eldia.com.do/category-sitemap.xml
hoy.com.do: # Sitemap: https://eldia.com.do/page-sitemap.xml
hoy.com.do: # Sitemap: http://eldia.com.do/sitemap_index.xml
tripadvisor.es: # Hi there,
tripadvisor.es: #
tripadvisor.es: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself.
tripadvisor.es: #
tripadvisor.es: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet?
tripadvisor.es: #
tripadvisor.es: # Run - don't crawl - to apply to join TripAdvisor's elite SEO team
tripadvisor.es: #
tripadvisor.es: # Email seoRockstar@tripadvisor.com
tripadvisor.es: #
tripadvisor.es: # Or visit https://careers.tripadvisor.com/search-results?keywords=seo
tripadvisor.es: #
tripadvisor.es: #
lu.se: #
lu.se: # robots.txt
lu.se: #
lu.se: # This file is to prevent the crawling and indexing of certain parts
lu.se: # of your site by web crawlers and spiders run by sites like Yahoo!
lu.se: # and Google. By telling these "robots" where not to go on your site,
lu.se: # you save bandwidth and server resources.
lu.se: #
lu.se: # This file will be ignored unless it is at the root of your host:
lu.se: # Used: http://example.com/robots.txt
lu.se: # Ignored: http://example.com/site/robots.txt
lu.se: #
lu.se: # For more information about the robots.txt standard, see:
lu.se: # http://www.robotstxt.org/robotstxt.html
lu.se: # CSS, JS, Images
lu.se: # Directories
lu.se: # Files
lu.se: # Paths (clean URLs)
lu.se: # Paths (no clean URLs)
lu.se: # Search
lu.se: # Disallow all on index.php
nokia.com: #
nokia.com: # robots.txt
nokia.com: #
nokia.com: # This file is to prevent the crawling and indexing of certain parts
nokia.com: # of your site by web crawlers and spiders run by sites like Yahoo!
nokia.com: # and Google. By telling these "robots" where not to go on your site,
nokia.com: # you save bandwidth and server resources.
nokia.com: #
nokia.com: # This file will be ignored unless it is at the root of your host:
nokia.com: # Used: http://example.com/robots.txt
nokia.com: # Ignored: http://example.com/site/robots.txt
nokia.com: #
nokia.com: # For more information about the robots.txt standard, see:
nokia.com: # http://www.robotstxt.org/robotstxt.html
nokia.com: # CSS, JS, Images
nokia.com: # Directories
nokia.com: # Files
nokia.com: # Paths (clean URLs)
nokia.com: # Paths (no clean URLs)
nokia.com: # Disallow tax terms (and language based also)
merrilledge.com: #dvFooter .disclaimer {
merrilledge.com: #dvDisclaimer .disclaimer > .disclaimer { padding:0 0 15px 0; }
merrilledge.com: #dvFooter #jdBold {
gossiplankanews.com: # Blogger Sitemap generated on 2013.05.28
slashdot.org: # robots.txt for Slashdot.org
slashdot.org: # $Id$
slashdot.org: # "Any empty [Disallow] value, indicates that all URLs can be retrieved.
slashdot.org: # At least one Disallow field needs to be present in a record."
saashub.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
unibo.it: # User-agent: MegaIndex.ru/2.0
unibo.it: # Disallow: /
unibo.it: # User-agent: MegaIndex.ru/
unibo.it: # Disallow: /
bluejeans.com: #
bluejeans.com: # robots.txt
bluejeans.com: #
bluejeans.com: # This file is to prevent the crawling and indexing of certain parts
bluejeans.com: # of your site by web crawlers and spiders run by sites like Yahoo!
bluejeans.com: # and Google. By telling these "robots" where not to go on your site,
bluejeans.com: # you save bandwidth and server resources.
bluejeans.com: #
bluejeans.com: # This file will be ignored unless it is at the root of your host:
bluejeans.com: # Used: http://example.com/robots.txt
bluejeans.com: # Ignored: http://example.com/site/robots.txt
bluejeans.com: #
bluejeans.com: # For more information about the robots.txt standard, see:
bluejeans.com: # http://www.robotstxt.org/robotstxt.html
bluejeans.com: # CSS, JS, Images
bluejeans.com: # Directories
bluejeans.com: # Files
bluejeans.com: # Paths (clean URLs)
bluejeans.com: # Paths (no clean URLs)
freetaxusa.com: # /robots.txt file for https://www.freetaxusa.com/
freetaxusa.com: # mail webmaster@freetaxusa.com with any comments
etour.com: ## Default robots.txt
thenationalnews.com: # Updated: 2021-02-21
thenationalnews.com: # Robots.txt
simonparkes.org: # Optimization for Google Ads Bot
golf.com: #WP Import Export Rule
quia.com: # -----------------------------------------------------------------------------
quia.com: #
quia.com: # Areas that search robots should avoid
quia.com: # (c) 2011 IXL Learning. All rights reserved.
quia.com: #
quia.com: # created by jkent on 8 Mar 2002
quia.com: #
quia.com: # Site-friendly search robots use this file to determine where _not_
quia.com: # to go. Some URL spaces are simply counterproductive.
quia.com: #
quia.com: # -----------------------------------------------------------------------------
bancofalabella.cl: # Incluye todos los bots
sammobile.com: # This is a tag that is defined for Analytics Tag manager, and not a path
reamaze.com: # Do not allow bot access to private conversation pages
zapimoveis.com.br: # Amenities shall not pass!
zapimoveis.com.br: #Crawl Budget test https://github.com/grupozap/squad-growth/issues/1153
booth.pm: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
google.com.et: # AdsBot
google.com.et: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
bazaarvoice.com: # www.robotstxt.org/
bazaarvoice.com: # Stop heritrix from crawling SEI
bazaarvoice.com: # Default WP
bazaarvoice.com: # Disallow Search Queries
eadaily.com: # robots.txt for https://eadaily.com/
solopos.com: #User-agent: ia_archiver-web.archive.org
solopos.com: #Disallow: /
solopos.com: #Sitemap: http://www.askapache.com/sitemap.xml
solopos.com: #Sitemap: https://www.solopos.com/sitemap_index.xml
solopos.com: #Sitemap: http://www.askapache.com/sitemap.xml
solopos.com: #Sitemap: https://m.solopos.com/sitemap_index.xml
solopos.com: # __ __
solopos.com: # ____ ______/ /______ _____ ____ ______/ /_ ___
solopos.com: # / __ `/ ___/ //_/ __ `/ __ \/ __ `/ ___/ __ \/ _ \
solopos.com: # / /_/ (__ ) ,&lt; / /_/ / /_/ / /_/ / /__/ / / / __/
solopos.com: # \__,_/____/_/|_|\__,_/ .___/\__,_/\___/_/ /_/\___/
solopos.com: # /_/
solopos.com: #
dy2018.com: #1
dy2018.com: # robots.txt for EmpireCMS
dy2018.com: #
imomoe.ai: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
imomoe.ai: #content{margin:0 0 0 2%;position:relative;}
puercn.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
puercn.com: #
puercn.com: # To ban all spiders from the entire site uncomment the next two lines:
puercn.com: #User-agent: *
puercn.com: #Disallow: /jiu/
yn.gov.cn: #btn_tz5 li{
bewakoof.com: # robotstxt.org
watanserb.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/
paypalobjects.com: ### BEGIN FILE ###
paypalobjects.com: # PayPal robots.txt file
google.by: # AdsBot
google.by: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
handbrake.fr: # Required to let Google show relevant ads
bdsimg.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
bdsimg.com: #content{margin:0 0 0 2%;position:relative;}
javdb5.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
nofraud.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
nofraud.com: #
nofraud.com: # To ban all spiders from the entire site uncomment the next two lines:
nofraud.com: # User-agent: *
nofraud.com: # Disallow: /
tracker.gg: # Robots!
youcanbook.me: # Hello Spiders
youcanbook.me: # Sorry Baidu - you just don't play nicely
claimfreecoins.io: #adcopy-outer table{ background: #fff;color:#999;}
swaggerhub.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
swaggerhub.com: #content{margin:0 0 0 2%;position:relative;}
vqfit.com: # we use Shopify as our ecommerce platform
vqfit.com: # Google adsbot ignores robots.txt unless specifically named!
findagrave.com: # robots.txt file for Find A Grave
findagrave.com: ## Below disallows are to accomodate user requests to remove a name from search results. ##
findagrave.com: #Updated 2/25/2019
ecoledirecte.com: # robotstxt.org
themanifest.com: #
themanifest.com: # robots.txt
themanifest.com: #
themanifest.com: # This file is to prevent the crawling and indexing of certain parts
themanifest.com: # of your site by web crawlers and spiders run by sites like Yahoo!
themanifest.com: # and Google. By telling these "robots" where not to go on your site,
themanifest.com: # you save bandwidth and server resources.
themanifest.com: #
themanifest.com: # This file will be ignored unless it is at the root of your host:
themanifest.com: # Used: http://example.com/robots.txt
themanifest.com: # Ignored: http://example.com/site/robots.txt
themanifest.com: #
themanifest.com: # For more information about the robots.txt standard, see:
themanifest.com: # http://www.robotstxt.org/robotstxt.html
themanifest.com: # CSS, JS, Images
themanifest.com: # Directories
themanifest.com: # Files
themanifest.com: # Paths (clean URLs)
themanifest.com: # Paths (no clean URLs)
brainscape.com: # blocking bad bots
brandbucket.com: # $Id: robots.txt,v 1.9.2.1 2008/12/10 20:12:19 goba Exp $
brandbucket.com: #
brandbucket.com: # robots.txt
brandbucket.com: #
brandbucket.com: # This file is to prevent the crawling and indexing of certain parts
brandbucket.com: # of your site by web crawlers and spiders run by sites like Yahoo!
brandbucket.com: # and Google. By telling these "robots" where not to go on your site,
brandbucket.com: # you save bandwidth and server resources.
brandbucket.com: #
brandbucket.com: # This file will be ignored unless it is at the root of your host:
brandbucket.com: # Used: http://example.com/robots.txt
brandbucket.com: # Ignored: http://example.com/site/robots.txt
brandbucket.com: #
brandbucket.com: # For more information about the robots.txt standard, see:
brandbucket.com: # http://www.robotstxt.org/wc/robots.html
brandbucket.com: #
brandbucket.com: # For syntax checking, see:
brandbucket.com: # http://www.sxw.org.uk/computing/robots/check.html
brandbucket.com: # Crawl-delay: 10
brandbucket.com: # Directories
brandbucket.com: # Files
brandbucket.com: # Paths (clean URLs)
brandbucket.com: # Paths (no clean URLs)
radioformula.com.mx: # WP Refugees
fh-aachen.de: #-------------------------------------------
fh-aachen.de: # robots.txt zu http://www.fh-aachen.de
fh-aachen.de: # 28.2.2021 ML
fh-aachen.de: #-------------------------------------------
freshersworld.com: # Filename:robots.txt file for https://www.freshersworld.com
freshersworld.com: # Created Dec, 09, 2015.
freshersworld.com: # Author: Bijeesh
freshersworld.com: # Email: info@freshersworld.com
freshersworld.com: # Edited : Jan, 29, 2019
freshersworld.com: # GoogleMedia Partners
freshersworld.com: # Google Adsbot
dermstore.com: #
dermstore.com: # DermStore.com: robots.txt
dermstore.com: # Please, we do NOT allow nonauthorized robots any longer.
film.ru: # Directories
film.ru: # Paths (clean URLs)
film.ru: # Paths (no clean URLs)
film.ru: # Directories
film.ru: # Paths (clean URLs)
film.ru: # Paths (no clean URLs)
film.ru: # Directories
film.ru: # Paths (clean URLs)
film.ru: # Paths (no clean URLs)
chime.com: # This robots.txt file was created by Robots.txt Rewrite plugin: https://wordpress.org/plugins/robotstxt-rewrite/
wuerth.de: # robots.txt for www.wuerth.de
wuerth.de: # Disallow: /web/media/system/
wuerth.de: # Disallow: /web/media/system/
sportybet.com: # wap
sportybet.com: # pc
vendasta.com: # We're hiring! https://www.vendasta.com/devjobs
npmjs.com: #
npmjs.com: #
npmjs.com: # _____
npmjs.com: # | |
npmjs.com: # | | | |
npmjs.com: # |_____|
npmjs.com: # ____ ___|_|___ ____
npmjs.com: # ()___) ()___)
npmjs.com: # // /| |\ \\
npmjs.com: # // / | | \ \\
npmjs.com: # (___) |___________| (___)
npmjs.com: # (___) (_______) (___)
npmjs.com: # (___) (___) (___)
npmjs.com: # (___) |_| (___)
npmjs.com: # (___) ___/___\___ | |
npmjs.com: # | | | | | |
npmjs.com: # | | |___________| /___\
npmjs.com: # /___\ ||| ||| // \\
npmjs.com: # // \\ ||| ||| \\ //
npmjs.com: # \\ // ||| ||| \\ //
npmjs.com: # \\ // ()__) (__()
npmjs.com: # /// \\\
npmjs.com: # /// \\\
npmjs.com: # _///___ ___\\\_
npmjs.com: # |_______| |_______|
npmjs.com: #
npmjs.com: #
npmjs.com: #
lloydsbank.com: # v 1.1
lloydsbank.com: # www.lloydsbank.com
state.mn.us: # Disallow everything until we want to expose the site to external search
state.mn.us: # engines.
state.mn.us: # 0000-1200 GMT is 6PM to 6AM here
state.mn.us: # 4/14/14 Updated for DataExplorer to crawl all state sites.
film2serial.ir: # Sitemap
food52.com: # Production
food52.com: # Search
food52.com: #404
emory.edu: # robots.txt for http://www.emory.edu/
emory.edu: #removed /CARTER_CENTER and replaced with a redirect to cartercenter.org. 8/28/2104 --jm
emory.edu: #Disallow: /CARTER_CENTER/
emory.edu: #Disallow: /CARTER--CENTER/
emory.edu: #Disallow: /CARTER-CENTER/
hdfcergo.com: #LokPalPopup .btn-red{ background: #E21F26;
wongnai.com: # If you are interested in our data, please visit https://business.wongnai.com/restaurants-data-service/ for more detail.
express.com.pk: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
express.com.pk: #content{margin:0 0 0 2%;position:relative;}
socalgas.com: #
socalgas.com: # robots.txt
socalgas.com: #
socalgas.com: # This file is to prevent the crawling and indexing of certain parts
socalgas.com: # of your site by web crawlers and spiders run by sites like Yahoo!
socalgas.com: # and Google. By telling these "robots" where not to go on your site,
socalgas.com: # you save bandwidth and server resources.
socalgas.com: #
socalgas.com: # This file will be ignored unless it is at the root of your host:
socalgas.com: # Used: http://example.com/robots.txt
socalgas.com: # Ignored: http://example.com/site/robots.txt
socalgas.com: #
socalgas.com: # For more information about the robots.txt standard, see:
socalgas.com: # http://www.robotstxt.org/robotstxt.html
socalgas.com: # CSS, JS, Images
socalgas.com: # Directories
socalgas.com: # Files
socalgas.com: # Paths (clean URLs)
socalgas.com: # Paths (no clean URLs)
socalgas.com: # Paths (clean URLs) - fixed!
socalgas.com: # Paths (no clean URLs) - fixed!
socalgas.com: # Sitemap
retaildive.com: #
retaildive.com: # ..;coxkOOOOOOkxoc;'.
retaildive.com: # .:d0NWMMMMMMMMMMMMMMWN0xc'
retaildive.com: # .:kXMMMMMMMMMMMMMMMMMMMMMMMXl.
retaildive.com: # .c0WMMMMMMMMMMMMMMMMMMMMMMMXd'
retaildive.com: # ,OWMMMMMMMMMMMMMMMMMMMMMMMXo' ..
retaildive.com: # cXMMMMMMXo::::::::::::::col. .lKXl.
retaildive.com: # lNMMMMMMM0' .lKWMMNo
retaildive.com: # :XMMMMMMMM0' .l0WMMMMMNc
retaildive.com: # .OMMMMMMMMM0' .ccccccc;. ,KMMMMMMMMO.
retaildive.com: # :NMMMMMMMMM0' oWMMMMMMWKc. oWMMMMMMMN:
retaildive.com: # lWMMMMMMMMM0' oWMMMMMMMMX: ,KMMMMMMMMo
retaildive.com: # oMMMMMMMMMM0' oWMMMMMMMMNc ,KMMMMMMMMd
retaildive.com: # cNMMMMMMMMM0' oWMMMMMMMNd. lWMMMMMMMWl
retaildive.com: # '0MMMMMMMMWk. ,oooooooc' ,0MMMMMMMMK,
retaildive.com: # oWMMMMMMXo. ,0MMMMMMMMWo
retaildive.com: # .xWMMMXd' ,dXMMMMMMMMWk.
retaildive.com: # .xWNx' .',''''''',,;coONMMMMMMMMMWk.
retaildive.com: # .:, .l0WWWWWWWWWWWMMMMMMMMMMMMMNd.
retaildive.com: # .lKWMMMMMMMMMMMMMMMMMMMMMMMWk;
retaildive.com: # .lKWMMMMMMMMMMMMMMMMMMMMMMMNk;.
retaildive.com: # .ckXWMMMMMMMMMMMMMMMMMMWXkl'
retaildive.com: # .;ldO0XNWWWWWWNXKOxl;.
retaildive.com: # ..'',,,,''..
retaildive.com: #
retaildive.com: #
retaildive.com: # NOTE: Allow is a non-standard directive for robots.txt. It is allowed by Google bots. See https://developers.google.com/search/reference/robots_txt#allow
retaildive.com: # no deep queries to search
retaildive.com: # don't index our dynamic images
retaildive.com: # hide old-school trend report
retaildive.com: #
retaildive.com: # Rules for specific crawlers below. Note these replace and override the '*' rules above.
retaildive.com: #
retaildive.com: # Allow Twitter to see all links
retaildive.com: # Allow Googlebot-News to see header images and favicons, BUT make it follow all the directives from our * group
retaildive.com: # See below link for why we have to repeat these directives
retaildive.com: # https://developers.google.com/search/reference/robots_txt#order-of-precedence-for-user-agents
retaildive.com: # no deep queries to search
retaildive.com: # hide old-school trend report
retaildive.com: # Allow Google News to see header images and favicons
retaildive.com: # Don't let Google Images crawler see anything at all
retaildive.com: # Don't let PetalBot crawl at all
retaildive.com: # All Facebook crawler user-agent to see all
retaildive.com: # Allow swiftbot custom search to see all, but with a delay
retaildive.com: # We want this bot to crawl way slower http://ahrefs.com/robot/
retaildive.com: # And be more aggressive on what not to allow
careerfoundry.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
careerfoundry.com: #
careerfoundry.com: # To ban all spiders from the entire site uncomment the next two lines:
ksusentinel.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/
homes.com: # /robots.txt file for http://www.homes.com
homes.com: # @@ROBOTS-PROD@@
homes.com: # e-mail web@homes.com for issues
subdl.com: # robots.txt generated at http://www.mcanerin.com
reedsy.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
reedsy.com: #
reedsy.com: # To ban all spiders from the entire site uncomment the next two lines:
reedsy.com: # User-agent: *
reedsy.com: # Disallow: /
genshin.gg: # https://www.robotstxt.org/robotstxt.html
bdo.com.ph: #
bdo.com.ph: # robots.txt
bdo.com.ph: #
bdo.com.ph: # This file is to prevent the crawling and indexing of certain parts
bdo.com.ph: # of your site by web crawlers and spiders run by sites like Yahoo!
bdo.com.ph: # and Google. By telling these "robots" where not to go on your site,
bdo.com.ph: # you save bandwidth and server resources.
bdo.com.ph: #
bdo.com.ph: # This file will be ignored unless it is at the root of your host:
bdo.com.ph: # Used: http://example.com/robots.txt
bdo.com.ph: # Ignored: http://example.com/site/robots.txt
bdo.com.ph: #
bdo.com.ph: # For more information about the robots.txt standard, see:
bdo.com.ph: # http://www.robotstxt.org/robotstxt.html
bdo.com.ph: # CSS, JS, Images
bdo.com.ph: # Directories
bdo.com.ph: # Files
bdo.com.ph: # Paths (clean URLs)
bdo.com.ph: # Paths (no clean URLs)
exactseek.com: # Allow only specific directories
pnu.ac.ir: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
pnu.ac.ir: #content{margin:0 0 0 2%;position:relative;}
politpuzzle.ru: # This virtual robots.txt file was created by the Virtual Robots.txt WordPress plugin: https://www.wordpress.org/plugins/pc-robotstxt/
mccourier.com: # XML Sitemap & Google News version 5.2.7 - https://status301.net/wordpress-plugins/xml-sitemap-feed/
uts.edu.au: #
uts.edu.au: # robots.txt
uts.edu.au: #
uts.edu.au: # This file is to prevent the crawling and indexing of certain parts
uts.edu.au: # of your site by web crawlers and spiders run by sites like Yahoo!
uts.edu.au: # and Google. By telling these "robots" where not to go on your site,
uts.edu.au: # you save bandwidth and server resources.
uts.edu.au: #
uts.edu.au: # This file will be ignored unless it is at the root of your host:
uts.edu.au: # Used: http://example.com/robots.txt
uts.edu.au: # Ignored: http://example.com/site/robots.txt
uts.edu.au: #
uts.edu.au: # For more information about the robots.txt standard, see:
uts.edu.au: # http://www.robotstxt.org/robotstxt.html
uts.edu.au: # CSS, JS, Images
uts.edu.au: # Directories
uts.edu.au: # Files
uts.edu.au: # Paths (clean URLs)
uts.edu.au: # Paths (no clean URLs)
moxtra.com: # Squarespace Robots Txt
mulesoft.com: # $Id: robots.txt,v 1.9.2.1 2008/12/10 20:12:19 goba Exp $
mulesoft.com: #
mulesoft.com: # robots.txt
mulesoft.com: #
mulesoft.com: # This file is to prevent the crawling and indexing of certain parts
mulesoft.com: # of your site by web crawlers and spiders run by sites like Yahoo!
mulesoft.com: # and Google. By telling these "robots" where not to go on your site,
mulesoft.com: # you save bandwidth and server resources.
mulesoft.com: #
mulesoft.com: # This file will be ignored unless it is at the root of your host:
mulesoft.com: # Used: http://example.com/robots.txt
mulesoft.com: # Ignored: http://example.com/site/robots.txt
mulesoft.com: #
mulesoft.com: # For more information about the robots.txt standard, see:
mulesoft.com: # http://www.robotstxt.org/wc/robots.html
mulesoft.com: #
mulesoft.com: # For syntax checking, see:
mulesoft.com: # http://www.sxw.org.uk/computing/robots/check.html
mulesoft.com: # Directories
mulesoft.com: # Translated pages, origin
mulesoft.com: # Files
mulesoft.com: # Paths (clean URLs)
mulesoft.com: # Localization
mulesoft.com: # Paths (no clean URLs)
tufts.edu: #
tufts.edu: # robots.txt
tufts.edu: #
tufts.edu: # This file is to prevent the crawling and indexing of certain parts
tufts.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
tufts.edu: # and Google. By telling these "robots" where not to go on your site,
tufts.edu: # you save bandwidth and server resources.
tufts.edu: #
tufts.edu: # This file will be ignored unless it is at the root of your host:
tufts.edu: # Used: http://example.com/robots.txt
tufts.edu: # Ignored: http://example.com/site/robots.txt
tufts.edu: #
tufts.edu: # For more information about the robots.txt standard, see:
tufts.edu: # http://www.robotstxt.org/robotstxt.html
tufts.edu: # CSS, JS, Images
tufts.edu: # Directories
tufts.edu: # Files
tufts.edu: # Paths (clean URLs)
tufts.edu: # Paths (no clean URLs)
amasty.com: # Directories
amasty.com: # Paths (clean URLs)
amasty.com: # Files
amasty.com: # Paths (no clean URLs)
amasty.com: # Disallow: *do=
amasty.com: # Blog
sba.gov: #
sba.gov: # robots.txt
sba.gov: #
sba.gov: # CSS, JS, Images
sba.gov: # Directories
sba.gov: # Files
sba.gov: # Paths (clean URLs)
sba.gov: # Paths (no clean URLs)
kenhub.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
kenhub.com: #
kenhub.com: # To ban all spiders from the entire site uncomment the next two lines:
mix.com: # User-agent: Googlebot
mix.com: # User-agent: Bingbot
mix.com: # User-agent: baiduspider
mix.com: # User-agent: Applebot
mix.com: # User-agent: Yandex
mix.com: # Microsoft Search Engine Robot
mix.com: # User-agent: msnbot
mix.com: # Yahoo! Search Engine Robot
mix.com: # User-agent: Slurp
mix.com: # Stuff that search engines seem to pick up from wiggin.routes:
green-japan.com: # See http://www.robotstxt.org/wc/norobots for documentation on how to use the robots.txt file
peopleperhour.com: # Allow all robots to access our site
peopleperhour.com: # Real user monitoring causing errors
peopleperhour.com: # Disallowed pages
peopleperhour.com: # Disallowed Terms
peopleperhour.com: # Disallowed GET parameters
peopleperhour.com: # Disallow WordPress admin section
peopleperhour.com: # Sitemaps
topstarnews.net: #
topstarnews.net: # Other Bot, crawlers all disallow
topstarnews.net: #
staples.ca: # we use Shopify as our ecommerce platform
staples.ca: # Google adsbot ignores robots.txt unless specifically named!
exist.ru: # https://www.exist.ru
exist.ru: # Crawl-delay: 1
sponichi.co.jp: Binary file (standard input) matches
ixigo.com: # Hi there! Since you are here, we assume you are either a bot or a geek. In either case, drop us an email at [careers@ixigo.com]. We would love to have a conversation with you ;)
ixigo.com: #sitemaps
animenewsnetwork.com: # disallowed for ALL robots due to impact on impressions/click tracking
animenewsnetwork.com: # deprecated
animenewsnetwork.com: # TODO: add nofollow to such links because not all bots understand wildcards
animenewsnetwork.com: # disallowed for search engines because redundant
animenewsnetwork.com: # only for authorized users
animenewsnetwork.com: ################################################################################
animenewsnetwork.com: ################################################################################
animenewsnetwork.com: ################################################################################
animenewsnetwork.com: # block useless bot
raychat.io: # The robots.txt file is used to control how search engines index your live URLs.
raychat.io: # See http://sailsjs.org/documentation/anatomy/my-app/assets/robots-txt for more information.
raychat.io: # To prevent search engines from seeing the site altogether, uncomment the next two lines:
raychat.io: # User-Agent: *
raychat.io: # Disallow: /
ali213.net: # file: robots.txt,v 1.0 2015/03/06 created by ali213
ali213.net: # robots.txt for www.ali213.net <URL:http://www.robotstxt.org>
ali213.net: # -----------------------------------------------------------------------------
go4worldbusiness.com: # www.robotstxt.org/
go4worldbusiness.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
go4worldbusiness.com: #User-agent: *
go4worldbusiness.com: #Disallow: /
go4worldbusiness.com: # Enable when the pages aren't in google's index anymore
go4worldbusiness.com: #User-agent: *
go4worldbusiness.com: #Disallow: /inquiries/send
go4worldbusiness.com: #Disallow: /report/complaint
go4worldbusiness.com: # Temporarily allowing as per Nikhil's request
go4worldbusiness.com: #User-agent: Xenu's
go4worldbusiness.com: #Disallow: /
radio-canada.ca: # Pages qui n'existe plus (erreur 404)
radio-canada.ca: # Pages Problèmatiques : wildcards
radio-canada.ca: # Vieilles pages de nouvelles régionales du Québec (Ticket 21836)
radio-canada.ca: #Cas des calendriers qui peuvent reculer jusqu'au début des temps
radio-canada.ca: # Disallow: /*calendrier.as*
radio-canada.ca: #Cas des pages avec des directives cache et nocache
radio-canada.ca: # Disallow: /*cache=*
radio-canada.ca: # Disallow: /regions/*/Dossiers/detail.asp?Pk_Dossiers_regionaux=*&Pk_Dossiers_regionaux_page=*&VCh=*
radio-canada.ca: # Disallow: /regions/*/emissions/emission.asp?pk=*&date=*
radio-canada.ca: # Pages Problèmatiques : cas par cas
radio-canada.ca: #
radio-canada.ca: # Ticket 17686
radio-canada.ca: # Disallow: /sujet/monfleuvemonhistoire/
radio-canada.ca: # Disallow: /sujet-complements/monfleuvemonhistoire/
purewow.com: # robots.jsp
roguefitness.com: #CVS, SVN directories and dump files
roguefitness.com: # Magento Technical Folders
roguefitness.com: #Magento admin page
roguefitness.com: # Paths (clean URLs) Use if URLs are rewritten
roguefitness.com: # Checkout and user account - ensure proper checkout directory is used
roguefitness.com: # Magento Files
roguefitness.com: # Misc
anz.co.nz: # /robots.txt for https://www.anz.co.nz/
anz.co.nz: #
smartbizloans.com: # robots.txt generated at http://www.mcanerin.com
smartbizloans.com: # Session new
cervantesvirtual.com: # go away
sueddeutsche.de: # Robots.txt for sueddeutsche.de
sueddeutsche.de: # www.robotstxt.org/
sueddeutsche.de: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
sueddeutsche.de: # Exclude all other stuff for CRE tracking
sueddeutsche.de: # Exclude SEO-Tools & SPAM-Bots
sueddeutsche.de: # Uber Metrics
sueddeutsche.de: #Heidorn
tinuiti.com: # Disallow all crawlers from the following list updated 2019.10.02
hualongxiang.com: #
hualongxiang.com: # robots.txt for PHPWind
hualongxiang.com: # Version 8.0
hualongxiang.com: #
autoscout24.de: # Some bots are known to be trouble, particularly those designed to copy
autoscout24.de: # entire sites. Please obey robots.txt.
autoscout24.de: # Michael H, 17.12.19
optimisemedia.com: # www.robotstxt.org/
optimisemedia.com: # Allow crawling of all content
bookmyhsrp.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
bookmyhsrp.com: #content{margin:0 0 0 2%;position:relative;}
neighborhoodscout.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
neighborhoodscout.com: #
neighborhoodscout.com: # To ban all spiders from the entire site uncomment the next two lines:
neighborhoodscout.com: # User-agent: *
neighborhoodscout.com: # Disallow: /
avanza.se: # Avanza Bank Robots
lavoz.com.ar: # robots.txt La Voz
lavoz.com.ar: # Sitemaps
lavoz.com.ar: #Sitemap: https://www.lavoz.com.ar/sites/default/files/xmlsitemap/todos_sitemap_desktop.xml
lavoz.com.ar: # Tests
lavoz.com.ar: # API
lavoz.com.ar: # Denuncias
tripadvisor.co.uk: # Hi there,
tripadvisor.co.uk: #
tripadvisor.co.uk: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself.
tripadvisor.co.uk: #
tripadvisor.co.uk: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet?
tripadvisor.co.uk: #
tripadvisor.co.uk: # Run - don't crawl - to apply to join TripAdvisor's elite SEO team
tripadvisor.co.uk: #
tripadvisor.co.uk: # Email seoRockstar@tripadvisor.com
tripadvisor.co.uk: #
tripadvisor.co.uk: # Or visit https://careers.tripadvisor.com/search-results?keywords=seo
tripadvisor.co.uk: #
tripadvisor.co.uk: #
cellphones.com.vn: ## robots.txt for Magento Community and Enterprise
cellphones.com.vn: ## GENERAL SETTINGS
cellphones.com.vn: ## Enable robots.txt rules for all crawlers
cellphones.com.vn: ## Crawl-delay parameter: number of seconds to wait between successive requests to the same server.
cellphones.com.vn: ## Set a custom crawl rate if you're experiencing traffic problems with your server.
cellphones.com.vn: # Crawl-delay: 30
cellphones.com.vn: ## Magento sitemap: uncomment and replace the URL to your Magento sitemap file
cellphones.com.vn: # Sitemap: http://www.example.com/sitemap/sitemap.xml
cellphones.com.vn: ## DEVELOPMENT RELATED SETTINGS
cellphones.com.vn: ## Do not crawl development files and folders: CVS, svn directories and dump files
cellphones.com.vn: ## GENERAL MAGENTO SETTINGS
cellphones.com.vn: ## Do not crawl Magento admin page
cellphones.com.vn: ## Do not crawl common Magento technical folders
cellphones.com.vn: ## Do not crawl common Magento files
cellphones.com.vn: ## MAGENTO SEO IMPROVEMENTS
cellphones.com.vn: ## Do not crawl sub category pages that are sorted or filtered.
cellphones.com.vn: ## Do not crawl 2-nd home page copy (example.com/index.php/). Uncomment it only if you activated Magento SEO URLs.
cellphones.com.vn: ## Do not crawl links with session IDs
cellphones.com.vn: ## Do not crawl checkout and user account pages
cellphones.com.vn: ## Do not crawl seach pages and not-SEO optimized catalog links
cellphones.com.vn: ## SERVER SETTINGS
cellphones.com.vn: ## Do not crawl common server technical folders and files
cellphones.com.vn: ## IMAGE CRAWLERS SETTINGS
cellphones.com.vn: ## Extra: Uncomment if you do not wish Google and Bing to index your images
cellphones.com.vn: # User-agent: Googlebot-Image
cellphones.com.vn: # Disallow: /
cellphones.com.vn: # User-agent: msnbot-media
cellphones.com.vn: # Disallow: /
cellphones.com.vn: ## Cellphones Sitemap
google.com.gt: # AdsBot
google.com.gt: # Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.
cookieandkate.com: # flywheel permissions test
letemps.ch: #
letemps.ch: # robots.txt
letemps.ch: #
letemps.ch: # This file is to prevent the crawling and indexing of certain parts
letemps.ch: # of your site by web crawlers and spiders run by sites like Yahoo!
letemps.ch: # and Google. By telling these "robots" where not to go on your site,
letemps.ch: # you save bandwidth and server resources.
letemps.ch: #
letemps.ch: # This file will be ignored unless it is at the root of your host:
letemps.ch: # Used: http://example.com/robots.txt
letemps.ch: # Ignored: http://example.com/site/robots.txt
letemps.ch: #
letemps.ch: # For more information about the robots.txt standard, see:
letemps.ch: # http://www.robotstxt.org/robotstxt.html
letemps.ch: # CSS, JS, Images
letemps.ch: # Directories
letemps.ch: # Files
letemps.ch: # Paths (clean URLs)
letemps.ch: # Paths (no clean URLs)
takprosto.cc: #Disallow: /tag
takprosto.cc: #Disallow: /tag
mofa.go.kr: #
mofa.go.kr: # robots.txt
mofa.go.kr: #
mofa.go.kr: # This file is to prevent the crawling and indexing of certain parts
mofa.go.kr: # of your site by web crawlers and spiders run by sites like Yahoo!
mofa.go.kr: # and Google. By telling these "robots" where not to go on your site,
mofa.go.kr: # you save bandwidth and server resources.
mofa.go.kr: #
mofa.go.kr: # This file will be ignored unless it is at the root of your host:
mofa.go.kr: # Used: http://example.com/robots.txt
mofa.go.kr: # Ignored: http://example.com/site/robots.txt
mofa.go.kr: #
mofa.go.kr: # For more information about the robots.txt standard, see:
mofa.go.kr: # http://www.robotstxt.org/wc/robots.html
mofa.go.kr: #
mofa.go.kr: # For syntax checking, see:
mofa.go.kr: # http://www.sxw.org.uk/computing/robots/check.html
mofa.go.kr: # Paths (no clean URLs)
fgv.br: #
fgv.br: # robots.txt
fgv.br: #
fgv.br: # This file is to prevent the crawling and indexing of certain parts
fgv.br: # of your site by web crawlers and spiders run by sites like Yahoo!
fgv.br: # and Google. By telling these "robots" where not to go on your site,
fgv.br: # you save bandwidth and server resources.
fgv.br: #
fgv.br: # This file will be ignored unless it is at the root of your host:
fgv.br: # Used: http://example.com/robots.txt
fgv.br: # Ignored: http://example.com/site/robots.txt
fgv.br: #
fgv.br: # For more information about the robots.txt standard, see:
fgv.br: # http://www.robotstxt.org/robotstxt.html
fgv.br: # Directories CPDOC
fgv.br: # Directories TIC
fgv.br: # Directories DICOM
handshake.com: # robots.txt file for www.handshake.com
efe.com: # Begin block Bad-Robots from robots.txt
efe.com: # SEO-related bots
vivanuncios.com.mx: #Sitemaps
vivanuncios.com.mx: #Sorting parameters
vivanuncios.com.mx: #Other comments:
vivanuncios.com.mx: #Sorting parameters
vivanuncios.com.mx: #Other comments:
vivanuncios.com.mx: #Sorting parameters
vivanuncios.com.mx: #Other comments:
baccredomatic.com: #
baccredomatic.com: # robots.txt
baccredomatic.com: #
baccredomatic.com: # This file is to prevent the crawling and indexing of certain parts
baccredomatic.com: # of your site by web crawlers and spiders run by sites like Yahoo!
baccredomatic.com: # and Google. By telling these "robots" where not to go on your site,
baccredomatic.com: # you save bandwidth and server resources.
baccredomatic.com: #
baccredomatic.com: # This file will be ignored unless it is at the root of your host:
baccredomatic.com: # Used: http://example.com/robots.txt
baccredomatic.com: # Ignored: http://example.com/site/robots.txt
baccredomatic.com: #
baccredomatic.com: # For more information about the robots.txt standard, see:
baccredomatic.com: # http://www.robotstxt.org/robotstxt.html
baccredomatic.com: # CSS, JS, Images
baccredomatic.com: # Directories
baccredomatic.com: # Files
baccredomatic.com: # Paths (clean URLs)
baccredomatic.com: # Paths (no clean URLs)
bitearns.com: # User-agent: *
bitearns.com: #User-agent: DeepCrawl
bitearns.com: #Disallow: /
ucsb.edu: #
ucsb.edu: # robots.txt
ucsb.edu: #
ucsb.edu: # This file is to prevent the crawling and indexing of certain parts
ucsb.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
ucsb.edu: # and Google. By telling these "robots" where not to go on your site,
ucsb.edu: # you save bandwidth and server resources.
ucsb.edu: #
ucsb.edu: # This file will be ignored unless it is at the root of your host:
ucsb.edu: # Used: http://example.com/robots.txt
ucsb.edu: # Ignored: http://example.com/site/robots.txt
ucsb.edu: #
ucsb.edu: # For more information about the robots.txt standard, see:
ucsb.edu: # http://www.robotstxt.org/robotstxt.html
ucsb.edu: # CSS, JS, Images
ucsb.edu: # Directories
ucsb.edu: # Files
ucsb.edu: # Paths (clean URLs)
ucsb.edu: # Paths (no clean URLs)
ucsb.edu: # No A to Z items
prettylittlething.com: #****************************************************************************
prettylittlething.com: # robots.txt
prettylittlething.com: # : Robots, spiders, and search engines use this file to detmine which
prettylittlething.com: # content they should *not* crawl while indexing your website.
prettylittlething.com: # : This system is called "The Robots Exclusion Standard."
prettylittlething.com: # : It is strongly encouraged to use a robots.txt validator to check
prettylittlething.com: # for valid syntax before any robots read it!
prettylittlething.com: #
prettylittlething.com: # Examples:
prettylittlething.com: #
prettylittlething.com: # Instruct all robots to stay out of the admin area.
prettylittlething.com: # : User-agent: *
prettylittlething.com: # : Disallow: /admin/
prettylittlething.com: #
prettylittlething.com: # Restrict Google and MSN from indexing your images.
prettylittlething.com: # : User-agent: Googlebot
prettylittlething.com: # : Disallow: /images/
prettylittlething.com: # : User-agent: MSNBot
prettylittlething.com: # : Disallow: /images/
prettylittlething.com: #****************************************************************************
viatorrents.com: #linksdownload a{
viatorrents.com: #lista_download{
viatorrents.com: #lista_download a{
viatorrents.com: #lista_download strong a{
viatorrents.com: #lista_download img{
viatorrents.com: #menu_direito a{
viatorrents.com: #menu_direito li{
viatorrents.com: #informacoes{
viatorrents.com: #elenco{
viatorrents.com: #sinopse{
viatorrents.com: #capas_pequenas a{
viatorrents.com: #capas_pequenas{
viatorrents.com: #capas_pequenas h3{
viatorrents.com: #capas_pequenas img{
viatorrents.com: #capas_pequenas p{
viatorrents.com: #inicio img{
viatorrents.com: #inicio .col-sm-3{
viatorrents.com: #inicio h2{
viatorrents.com: #inicio .col-12:hover h2{
viatorrents.com: #pesquisa{
viatorrents.com: #rodape {
mobareco.jp: # This virtual robots.txt file was created by the Virtual Robots.txt WordPress plugin: https://www.wordpress.org/plugins/pc-robotstxt/
nikkei225jp.com: #body1{width:1053px;overflow:hidden;border-right:#690 solid 2px;background:#fff}
nikkei225jp.com: #outline{padding-top:5px}
nikkei225jp.com: #main1{padding-left:5px;float:left}
nikkei225jp.com: #main2{width:732px}
nikkei225jp.com: #main3{padding-left:5px;float:left}
nikkei225jp.com: #main4{width:732px}
nikkei225jp.com: #side1{padding-right:5px;float:right}
nikkei225jp.com: #side2{width:305px}
nikkei225jp.com: #main1 .win2{width:728px;overflow:hidden}
nikkei225jp.com: #main3 .win2{width:728px;overflow:hidden}
nikkei225jp.com: #topmenu{background:#FeFbe3;border-bottom:#cb9 solid 1px;width:1053px;height:48px;overflow:hidden;}
nikkei225jp.com: #topmenu a{background:#FeFbe3;border-color:#ffd #cb9 #FeFbe3 #ffd;border-style:solid;border-width:1px;color:#825839;display:blaock;float:left;font-size:15px;font-weight:700;height:44px;line-height:46px;position:relative;text-align:center;text-decoration:none}
nikkei225jp.com: #topmenu a:before{border:6px transparent solid;border-left-color:#ec9;border-right-width:0;content:'';height:0;left:3px;position:absolute;top:16px;width:0}
nikkei225jp.com: #topmenu a:hover{background:#FFFABF;color:#bb3333}
nikkei225jp.com: #topmenu a{height:46px;line-height:48px;}
nikkei225jp.com: #topmenu .flag{margin:14px 0 0 4px}
nikkei225jp.com: #topmenu .topF{padding:0 8px 0 10px;}
nikkei225jp.com: #topmenu .topF .flag{border:1px solid#f7dfaf;}
nikkei225jp.com: #nkLink a{background:#FeFbe3;border-color:#ffe #eda #cb9 #ffe;border-style:solid;border-width:1px;color:#853;display:block;font-size:13px;font-weight:600;padding:0;height:33.3px;line-height:37px;position:relative;text-align:center;text-decoration:none;width:137px;float:left;overflow:hidden}
nikkei225jp.com: #nkLink a:before{border:6px transparent solid;border-left-color:#eda;border-right-width:0;content:'';height:0;position:absolute;left:3px;top:12px;width:0}
nikkei225jp.com: #nkLink a:hover{color:#b33}
nikkei225jp.com: #rankLink a{background:#FeFbe3;border-color:#ffe #eda #cb9 #ffe;border-style:solid;border-width:1px;color:#853;display:block;font-size:13px;font-weight:600;margin:0;padding-left:16px;height:22.7px;line-height:26px;position:relative;text-decoration:none;width:185px;float:right;overflow:hidden}
nikkei225jp.com: #rankLink a:before{border:5px transparent solid;border-left-color:#eda;border-right-width:0;content:'';height:0;position:absolute;left:3px;top:7px;width:0}
nikkei225jp.com: #rankLink a:hover{color:#b33}
nikkei225jp.com: #rankLink span{background:#FeF9ec;border-color:#ffe #eda #eda #ffe;border-style:solid;border-width:1px;color:#853;display:block;font-size:13px;font-weight:500;margin:0;padding:0;height:22.7px;line-height:26px;position:relative;text-align:center;width:44px;float:left}
nikkei225jp.com: #rankLink .tit7,#nkLink .tit7{padding:5px;border-left:1px solid #eda}
nikkei225jp.com: #rankLink span{border-left:1px solid #eda}
nikkei225jp.com: #wtime{margin-bottom:0px;}
nikkei225jp.com: #if_con11 a.title,#if_con33 a,.if_tit a{color:#ccc;text-decoration:none}
nikkei225jp.com: #if_con11 a:hover.title,#if_con33 a:hover,.if_tit a:hover{color:#ffcc00}
nikkei225jp.com: #if_con{background:#ffffff;border:1px solid #ffffff;font:normal 12px Helvetica,Arial}
nikkei225jp.com: #if_con11{background:#ffffff;font:bold 10px Helvetica;height:22px;line-height:22px;text-align:center;width:158px}
nikkei225jp.com: #if_con2{border-top:1px solid #ddd;padding-bottom:0px;padding-top:4px}
nikkei225jp.com: #if_con22{height:130px;width:158p;padding-left:8px;}
nikkei225jp.com: #if_con3{border-top:1px solid #ddd}
nikkei225jp.com: #if_con33{background:#ffffff;clear:both;color:#999;font:normal 11px Helvetica;height:18px;line-height:18px;text-align:center;width:158px}
nikkei225jp.com: #if_con3{display:none}
nikkei225jp.com: #eveS{font-size:13px;line-height:165%;padding:5px}
nikkei225jp.com: #headline tt,.eve tt,li tt{color:#bbb;font-family:Arial;font-size:14px}
nikkei225jp.com: #headline font{color:#444;font-size:12px}
nikkei225jp.com: #headline a{color:#777;font-size:12px;padding-left:7px}
nikkei225jp.com: #Suke span{padding:2px 3px 2px 4px;}
nikkei225jp.com: #Suke .day font{padding:2px 3px 2px 4px;}
nikkei225jp.com: #sideLink a.glink{width:140px;text-align:left;float:left;height:18px;padding:0;margin:3px;clear:both}
nikkei225jp.com: #sideLink a.glink span{margin-right:4px;}
nikkei225jp.com: #sideLink span.linkTX{width:100px;float:left;display:block;height:20px;line-height:24px;padding:0 0 0 3px;margin:0;font-size:11px;color:#aaa;}
nikkei225jp.com: #reload{background:#c33;color:#fff;display:none;font-weight:bold;height:30px;left:0;line-height:30px;position:fixed;text-align:center;top:0;width:1053px}
nikkei225jp.com: #datatbl{width:100%}
nikkei225jp.com: #dhtmltooltip{position:absolute;left:-300px;width:150px;border:1px solid black;visibility:hidden;z-index:100;filter:progid:DXImageTransform.Microsoft.Shadow(color=gray,direction=135);padding:2px;background:lightyellow}
nikkei225jp.com: #dhtmlpointer{position:absolute;left:-300px;z-index:101;visibility:hidden}
nikkei225jp.com: #snsSiteTwS,#snsSiteTwF,#snsChtTw,#Tw111,#Tw717{background-position:0 0}
nikkei225jp.com: #snsSiteFbS,#snsSiteFbF{background-position:0 -32px}
nikkei225jp.com: #snsChtFb,#snsTblFb,#snsChtFb2,#snsTblFb1,#snsTblFb2,#snsTblFb3,#snsTblFb4,#snsTblFb5{background-position:0 -24px}
nikkei225jp.com: #snsSiteTwS,#snsSiteFbS{margin:0 0 8px 22px;}
nikkei225jp.com: #snsSiteTwS{position:absolute;top:3px;left:-12px;}
nikkei225jp.com: #snsSiteFbS{position:absolute;top:3px;left:30px;}
nikkei225jp.com: #stbl td{white-space:nowrap}
shopify.jp: # ,:
shopify.jp: # ,' |
shopify.jp: # / :
shopify.jp: # --' /
shopify.jp: # \/ />/
shopify.jp: # / <//_\
shopify.jp: # __/ /
shopify.jp: # )'-. /
shopify.jp: # ./ :\
shopify.jp: # /.' '
shopify.jp: # No need to shop around. Board the rocketship today – great SEO careers to checkout at shopify.com/careers
shopify.jp: # robots.txt file for www.shopify.jp
jornalcontabil.com.br: #robots.txt by ServerDo.in -- www.jornalcontabil.com.br
ku.edu: #
ku.edu: # robots.txt
ku.edu: #
ku.edu: # This file is to prevent the crawling and indexing of certain parts
ku.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
ku.edu: # and Google. By telling these "robots" where not to go on your site,
ku.edu: # you save bandwidth and server resources.
ku.edu: #
ku.edu: # This file will be ignored unless it is at the root of your host:
ku.edu: # Used: http://example.com/robots.txt
ku.edu: # Ignored: http://example.com/site/robots.txt
ku.edu: #
ku.edu: # For more information about the robots.txt standard, see:
ku.edu: # http://www.robotstxt.org/robotstxt.html
ku.edu: # CSS, JS, Images
ku.edu: # Directories
ku.edu: # Files
ku.edu: # Paths (clean URLs)
ku.edu: # Paths (no clean URLs)
codechef.com: # $Id: robots.txt,v 1.9.2.1 2008/12/10 20:12:19 goba Exp $
codechef.com: #
codechef.com: # robots.txt
codechef.com: #
codechef.com: # This file is to prevent the crawling and indexing of certain parts
codechef.com: # of your site by web crawlers and spiders run by sites like Yahoo!
codechef.com: # and Google. By telling these "robots" where not to go on your site,
codechef.com: # you save bandwidth and server resources.
codechef.com: #
codechef.com: # This file will be ignored unless it is at the root of your host:
codechef.com: # Used: http://example.com/robots.txt
codechef.com: # Ignored: http://example.com/site/robots.txt
codechef.com: #
codechef.com: # For more information about the robots.txt standard, see:
codechef.com: # http://www.robotstxt.org/wc/robots.html
codechef.com: #
codechef.com: # For syntax checking, see:
codechef.com: # http://www.sxw.org.uk/computing/robots/check.html
codechef.com: # Allowing css, js and images
codechef.com: # Directories
codechef.com: # Files
codechef.com: # Paths (clean URLs)
codechef.com: # Paths (no clean URLs)
codechef.com: # Add Sitemap
loltoy.myshopify.com: # we use Shopify as our ecommerce platform
loltoy.myshopify.com: # Google adsbot ignores robots.txt unless specifically named!
viz.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
viz.com: #
viz.com: # To ban all spiders from the entire site uncomment the next two lines:
viz.com: # User-agent: *
viz.com: # Disallow: /
tv21.tv: #Websavers Bot Protection
tv21.tv: #Timely Events Calendar
tv21.tv: #ecwd Calendar
tv21.tv: #Tribe Events Calendar
tv21.tv: #Comment Mail Plugin
tv21.tv: #Search, sharing, sorting not needed for bots
jmbullion.com: # Added by SEO Ultimate's Link Mask Generator module
jmbullion.com: # End Link Mask Generator output
volusion.com: # robots.txt for https://www.volusion.com/
volusion.com: # Directories
volusion.com: # Erin Directories
volusion.com: # Files
volusion.com: # Erin Files
volusion.com: # Paths (clean URLs)
volusion.com: # Erin Paths (clean URLs)
volusion.com: # Paths (no clean URLs)
volusion.com: ## Erin Paths (no clean URLs)
matplotlib.org: # Docs: https://developers.google.com/search/docs/advanced/robots/intro
matplotlib.org: # Note old files will still be indexed if they have links to them,
matplotlib.org: # hopefully they are weighted less...
matplotlib.org: # do not search root directory by default.
matplotlib.org: # files at top level:
matplotlib.org: # tell robots this is sitemap
nsportal.ru: #
nsportal.ru: # robots.txt
nsportal.ru: #
nsportal.ru: # This file is to prevent the crawling and indexing of certain parts
nsportal.ru: # of your site by web crawlers and spiders run by sites like Yahoo!
nsportal.ru: # and Google. By telling these "robots" where not to go on your site,
nsportal.ru: # you save bandwidth and server resources.
nsportal.ru: #
nsportal.ru: # This file will be ignored unless it is at the root of your host:
nsportal.ru: # Used: http://example.com/robots.txt
nsportal.ru: # Ignored: http://example.com/site/robots.txt
nsportal.ru: #
nsportal.ru: # For more information about the robots.txt standard, see:
nsportal.ru: # http://www.robotstxt.org/robotstxt.html
nsportal.ru: # CSS, JS, Images
nsportal.ru: # Directories
nsportal.ru: # Files
nsportal.ru: # Paths (clean URLs)
nsportal.ru: # Paths (no clean URLs)
bobvila.com: # allow boomtrain bot on entire sites
maalaimalar.com: # Sitemap Files
thewindowsclub.com: # global
techsmith.fr: # Robots.txt for www.techsmith.fr
techsmith.fr: #02 July 2018
bitwarden.com: # .__________________________.
bitwarden.com: # | .___________________. |==|
bitwarden.com: # | | ................. | | |
bitwarden.com: # | | ::[ Dear robot ]: | | |
bitwarden.com: # | | ::::[ be nice ]:: | | |
bitwarden.com: # | | ::::::::::::::::: | | |
bitwarden.com: # | | ::::::::::::::::: | | |
bitwarden.com: # | | ::::::::::::::::: | | |
bitwarden.com: # | | ::::::::::::::::: | | ,|
bitwarden.com: # | !___________________! |(c|
bitwarden.com: # !_______________________!__!
bitwarden.com: # / \
bitwarden.com: # / [][][][][][][][][][][][][] \
bitwarden.com: # / [][][][][][][][][][][][][][] \
bitwarden.com: #( [][][][][____________][][][][] )
bitwarden.com: # \ ------------------------------ /
bitwarden.com: # \______________________________/
bitwarden.com: # ________
bitwarden.com: # __,_, | |
bitwarden.com: # [_|_/ | OK |
bitwarden.com: # // |________|
bitwarden.com: # _// __ /
bitwarden.com: #(_|) |@@|
bitwarden.com: # \ \__ \--/ __
bitwarden.com: # \o__|----| | __
bitwarden.com: # \ }{ /\ )_ / _\
bitwarden.com: # /\__/\ \__O (__
bitwarden.com: # (--/\--) \__/
bitwarden.com: # _)( )(_
bitwarden.com: # `---''---`
powr.io: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
powr.io: #
powr.io: # To ban all spiders from the entire site uncomment the next two lines:
powr.io: # User-agent: *
powr.io: # Disallow: /
mypetads.com: # Blocks robots from specific folders / directories
mypetads.com: # Crawl-delay: 80
eventbrite.com.au: # http://www.google.com.au/adsbot.html - AdsBot ignores * wildcard
bloggingwizard.com: # block bots
bloggingwizard.com: # slow down bots
mercadolibre.cl: #siteId: MLC
mercadolibre.cl: #country: chile
mercadolibre.cl: ##Block - Referidos
mercadolibre.cl: ##Block - siteinfo urls
mercadolibre.cl: ##Block - Cart
mercadolibre.cl: ##Block Checkout
mercadolibre.cl: ##Block - User Logged
mercadolibre.cl: #Shipping selector
mercadolibre.cl: ##Block - last search
mercadolibre.cl: ## Block - Profile - By Id
mercadolibre.cl: ## Block - Profile - By Id and role (old version)
mercadolibre.cl: ## Block - Profile - Leg. Req.
mercadolibre.cl: ##Block - noindex
mercadolibre.cl: # Mercado-Puntos
mercadolibre.cl: # Viejo mundo
mercadolibre.cl: ##Block recommendations listing
secretsearchenginelabs.com: #Don't spider our own search results
uphf.fr: # Directories
uphf.fr: # Files
uphf.fr: # Paths (clean URLs)
uphf.fr: # Paths (no clean URLs)
deodap.com: # we use Shopify as our ecommerce platform
deodap.com: # Google adsbot ignores robots.txt unless specifically named!
decathlon.com: # we use Shopify as our ecommerce platform
decathlon.com: # Google adsbot ignores robots.txt unless specifically named!
depaul.edu: #AjaxDelta1 i {
manithan.com: # Disallow: /*? This is match ? anywhere in the URL
sololearn.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
sololearn.com: #content{margin:0 0 0 2%;position:relative;}
midocean.com: # dynamic url's
midocean.com: # no hammering
oneplus.in: # robots.txt for https://www.oneplus.in/
ravelry.com: ## Below entries are from Wikipedia's robots.txt :)
ravelry.com: #
ravelry.com: # recursive wget
ravelry.com: #
ravelry.com: #
ravelry.com: # The 'grub' distributed client has been *very* poorly behaved.
ravelry.com: #
ravelry.com: #
ravelry.com: # Doesn't follow robots.txt anyway, but...
ravelry.com: #
ravelry.com: #
ravelry.com: # Hits many times per second, not acceptable
ravelry.com: # http://www.nameprotect.com/botinfo.html
ravelry.com: # A capture bot, downloads gazillions of pages with no public benefit
ravelry.com: # http://www.webreaper.net/
akbank.com: ### Start ###
akbank.com: # global rules
akbank.com: # sitemaps
akbank.com: ### Stop ###
bookmarking.info: # 1) this filename (robots.txt) must stay lowercase
bookmarking.info: # 2) this file must be in the servers root directory
bookmarking.info: # ex: http://www.mydomain.com/kliqqisubfolder/ -- you must move the robots.txt from
bookmarking.info: # /kliqqisubfolder/ to the root folder for http://www.mydomain.com/
bookmarking.info: # you must then add your subfolder to each 'Disallow' below
bookmarking.info: # ex: Disallow: /cache/ becomes Disallow: /kliqqisubfolder/cache/
photoshelter.com: # ROBOTS.TXT FOR PHOTOSHELTER.COM
photoshelter.com: # Was disallowed because it was overly aggressive
photoshelter.com: # access re-enabled on May 30, 2013
photoshelter.com: # User-agent: ia_archiver
photoshelter.com: # Disallow: /
capital.fr: # from orphanPage robots
capital.fr: # www.robotstxt.org/
capital.fr: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
capital.fr: ##############
capital.fr: # Blocage URL NR
capital.fr: ##############
capital.fr: ###############
capital.fr: # URLs inutiles 2014
capital.fr: ###############
capital.fr: ######
capital.fr: # A CORRIGER
capital.fr: #####
capital.fr: ######
capital.fr: # 17-08-2017
capital.fr: ######
capital.fr: #Disallow: /bourse/communiques/
vikingswap.finance: # https://www.robotstxt.org/robotstxt.html
binghamton.edu: ### User-agent: Baiduspider
binghamton.edu: ### Disallow: /
binghamton.edu: ### User-agent: "Sogou web spider"
binghamton.edu: ### Disallow: /
rankwatch.com: # robots.txt for /
jitunews.com: # robots.txt generated by Jay (jaykeren@gmail.com)
fangraphs.com: # robots.txt for http://www.fangraphs.com/
converse.com: # https://www.converse.com robots.txt
designbundles.net: # www.robotstxt.org/
designbundles.net: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
walmart.ca: # Prevent refined browse pages from being crawled, avoiding millions of near-duplicate entries. MG IG
walmart.ca: # Now allowing assets to be crawled. MG
walmart.ca: #Disallow: /assets/*
walmart.ca: # Prevents Financial Services One and Done page from being hit. 11-16-2016 NV
walmart.ca: # Prevents Financial WM MC page from being hit, contains promocodes. 11-16-2016 NV
walmart.ca: # Always include index sitemaps below rules. MB
walmart.ca: #Ending of robots.txt
granma.cu: #14-08-2019
granma.cu: # Directories
granma.cu: # Files
granma.cu: # Paths (clean URLs)
granma.cu: # Paths (no clean URLs)
granma.cu: #Bots
progress.com: #Image Sitemap
progress.com: #Video Sitemap
coolblue.nl: # On all coolblue. domains
coolblue.nl: # Old, on shops
coolblue.nl: # No translation known or needed for
coolblue.nl: # Only on Coolblue.nl and .be - Dutch language (on coolblue.nl as a safeguard)
coolblue.nl: # Only on Coolblue. domains - English language
coolblue.nl: # The URL behind the # mark is the Dutch equivalent (just for reference, doesn't block anything), sorted alphabetically in Dutch
coolblue.nl: # If a line is behind a #, the translation still needs to be added
coolblue.nl: # Disallow: /en/??? # /nl/mailafriend
coolblue.nl: # Disallow: /en/??? # /nl/questionnaire
coolblue.nl: # Disallow: /en/??? # /nl/*/voor-de$
coolblue.nl: # Disallow: /en/??? # /nl/*/voor-de/*
coolblue.nl: # Only on Coolblue. domains - English language
coolblue.nl: # Only on Coolblue.nl - Dutch only
coolblue.nl: # Only on Coolblue.nl - English only
coolblue.nl: # For specific bots (on all domains)
coolblue.nl: # Hi! Trying to reverse engineer something?
coolblue.nl: # Maybe you should come work with us.
coolblue.nl: # Apply at www.careersatcoolblue.com and mention this comment.
wnacg.org: #instantclick-bar{background:#d22;}
sreality.cz: # Better safe than sorry
biz2credit.com: # For more information about the robots.txt standard, see:
biz2credit.com: # http://www.robotstxt.org/orig.html
biz2credit.com: #
biz2credit.com: # For syntax checking, see:
biz2credit.com: # http://tool.motoricerca.info/robots-checker.phtml
hipertextual.com: #
hipertextual.com: # robots.txt
hipertextual.com: #
chmotor.cn: #
chmotor.cn: # robots.txt for chmotor.cn
chmotor.cn: #
simplenote.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
simplenote.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details.
simplenote.com: # This file was generated on Tue, 31 Mar 2020 18:02:18 +0000
mq.edu.au: # Limit Bingbot
mq.edu.au: # Allow CSS and JS
mq.edu.au: # Allow: /_design/css/*
mq.edu.au: # Allow: /_design/js/*
mq.edu.au: # Disallow some matrix defaults
mq.edu.au: # Disallow UAT and sandbox pages
mq.edu.au: # Disallow search page
mq.edu.au: # sitemap
mq.edu.au: # Keep deliberately duplicated pages out of the Google index
mp3party.net: #Disallow: /online/*
mp3party.net: #Disallow: /search*
mp3party.net: #Disallow: /play/
mp3party.net: #Disallow: *?*sort=
mp3party.net: #Disallow: /artist/*/new
mp3party.net: #Disallow: /artist/*/pop
retty.me: #User-agent: bingbot
retty.me: #Crawl-delay: 5
retty.me: #User-agent: msnbot
retty.me: #Crawl-delay: 5
retty.me: #User-agent: baiduspider
retty.me: #Crawl-delay: 5
mailtrack.io: # www.robotstxt.org/
weworkremotely.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
bollywoodhungama.com: #Baiduspider
xataka.com: #
xataka.com: # robots.txt
xataka.com: #
xataka.com: # Crawlers that are kind enough to obey, but which we'd rather not have
xataka.com: # unless they're feeding search engines.
xataka.com: # Some bots are known to be trouble, particularly those designed to copy
xataka.com: # entire sites. Please obey robots.txt.
xataka.com: # Sorry, wget in its recursive mode is a frequent problem.
xataka.com: # Please read the man page and use it properly; there is a
xataka.com: # --wait option you can use to set the delay between hits,
xataka.com: # for instance.
xataka.com: #
xataka.com: #
xataka.com: # The 'grub' distributed client has been *very* poorly behaved.
xataka.com: #
xataka.com: #
xataka.com: # Doesn't follow robots.txt anyway, but...
xataka.com: #
xataka.com: #
xataka.com: # Hits many times per second, not acceptable
xataka.com: # http://www.nameprotect.com/botinfo.html
xataka.com: # A capture bot, downloads gazillions of pages with no public benefit
xataka.com: # http://www.webreaper.net/
teleamazonas.com: #Disallow: /wp-admin/
teleamazonas.com: #Disallow: /wp-includes/
temple.edu: #
temple.edu: # robots.txt
temple.edu: #
temple.edu: # This file is to prevent the crawling and indexing of certain parts
temple.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
temple.edu: # and Google. By telling these "robots" where not to go on your site,
temple.edu: # you save bandwidth and server resources.
temple.edu: #
temple.edu: # This file will be ignored unless it is at the root of your host:
temple.edu: # Used: http://example.com/robots.txt
temple.edu: # Ignored: http://example.com/site/robots.txt
temple.edu: #
temple.edu: # For more information about the robots.txt standard, see:
temple.edu: # http://www.robotstxt.org/robotstxt.html
temple.edu: # CSS, JS, Images
temple.edu: # Directories
temple.edu: # Files
temple.edu: # Paths (clean URLs)
temple.edu: # Paths (no clean URLs)
radio.co: # robots.txt for https://radio.co/
radio.co: # live - don't allow web crawlers to index cpresources/ or vendor/
escortinparis.info: #linkpad.ru
escortinparis.info: #majestic.com
escortinparis.info: #ahrefs.com
escortinparis.info: #moz.com
escortinparis.info: #semrush.com
4over.com: # robotstxt.org
roofstock.com: # http://www.robotstxt.org
valenciacollege.edu: # robots.txt for http://valenciacollege.edu/
lifelock.com: #Global Allow
lifelock.com: # General disallow rules
productreview.com.au: # Bad bots
productreview.com.au: # No query string or inner pages
productreview.com.au: # No query string
productreview.com.au: # Blocking bots from crawling DoubleClick for Publisher and Google Analytics related URL's (which aren't real URL's)
gnu.org: # robots.txt for http://www.gnu.org/
gnu.org: # RT #1298215.
gnu.org: # RT #1298215.
gnu.org: # RT #1638325.
usatuan.com: # we use Shopify as our ecommerce platform
usatuan.com: # Google adsbot ignores robots.txt unless specifically named!
highlow.net: # www.robotstxt.org/
highlow.net: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
ey.com: # Allow all robots complete access
riotpixels.com: # HOW TO USE THIS FILE:
riotpixels.com: # 1) Edit this file to change "/forum/" to the correct relative path from your base URL, for example if your forum was at "domain.com/sites/community", then you'd use "/sites/community/"
riotpixels.com: # 2) Rename the file to 'robots.txt' and move it to your web root (public_html, www, or htdocs)
riotpixels.com: # 3) Edit the file to remove this comment (anything above the dashed line, including the dashed line
riotpixels.com: #
riotpixels.com: # NOTES:
riotpixels.com: # Even though wild cards and pattern matching are not part of the robots.txt specification, many search bots understand and make use of them
riotpixels.com: #------------------------ REMOVE THIS LINE AND EVERYTHING ABOVE SO THAT User-agent: * IS THE FIRST LINE ------------------------------------------
powershow.com: #Disallow: /js/
powershow.com: # Old pages
powershow.com: # templates
powershow.com: # slides
powershow.com: # stock-photos
powershow.com: #books
infopicked.com: # Allow Google robots to crawl adServe
nuskin.com: # robots.txt for nuskin - Fastly
integrationsfonds.at: # folders
integrationsfonds.at: # parameters
wko.at: # 07/2007 hide mitarbeiterlistings and obsolete personal info
wko.at: # 08/2007 detailindex
wko.at: # 10/2018
dorar.net: # www.robotstxt.org/
dorar.net: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
ilsole24ore.com: # robots.txt www.ilsole24ore.com
ilsole24ore.com: # 20/06/2019 v. 1.0
pocket-lint.com: # www.robotstxt.org/
pocket-lint.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
healthunlocked.com: # User pages
healthunlocked.com: # Community pages
healthunlocked.com: # Private Programs
healthunlocked.com: # javascript files
healthunlocked.com: # Overrides
healthunlocked.com: # Private groups via API
healthunlocked.com: # Bot specific control
rapmls.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
rapmls.com: #content{margin:0 0 0 2%;position:relative;}
sosocar.cn: # robotstxt.org/
ptrack1.com: # `..--:::::::::--.``
ptrack1.com: # ``..-::///////////////:.`
ptrack1.com: # `..-:://////////////////-
ptrack1.com: # ```````..-----------::://+/////-
ptrack1.com: # .....```...............```.:+////
ptrack1.com: # `----....-:-............`````.+///`
ptrack1.com: # `::/:---smmNy.........-hMNo```:+//
ptrack1.com: # `://:---dNNMd-........+MMMN.``-+/:
ptrack1.com: # -//----:os+-..........+hy:```:+/-
ptrack1.com: # -:/:---------............`.-:+//-
ptrack1.com: # -:/++////////+++++///////+++////:`
ptrack1.com: # `.-:/+++++++++++//////////////////:-.` ````
ptrack1.com: # ``````````.-://++++++++++++++++//////////////::-.....--::/-`
ptrack1.com: # `----------:://++++++++++++++++++++//////////////::::::///////`
ptrack1.com: # :////////////++++++++++++++++++++++++++++++++++///////////////.
ptrack1.com: # /++++++++++++++++++++++++++++++++++++++++++++++++++++++++/////.
ptrack1.com: # :++++++++++++++++++++++++++++++++++++++++++++++++++++++++++///`
ptrack1.com: # `/++++++++++++++++++++++++++++++++++++++++++++++++++++++++++/.
ptrack1.com: # .:////:::::/+++++++++++++++++++++++++++++++++++//://////:-`
ptrack1.com: # -/+++++++++++++++++++++++++++++++/`
ptrack1.com: # `///+++++++++++++++++++++++++++++-
ptrack1.com: # `:////+++++++++++++++++++++++++++.
ptrack1.com: # ::////+++++++++++++++++++++++++/`
ptrack1.com: # -::///+++++++++++++++++++++++++/`
ptrack1.com: # .-:///+++++++++++++++++++++++++/
ptrack1.com: # .-::///++++++++++++++++++++++++/
ptrack1.com: # .-::///++++++++++++++++++++++++/
ptrack1.com: # --:////++++++++++++++++++++++++/
ptrack1.com: # `-::///++++++++:``-+++++++++++++/`
ptrack1.com: # `-:///+++++++++.```+++++++++++++/`
ptrack1.com: # `::///+++++++++:``.+++++++++++++/.````````
ptrack1.com: # ````:///++++++++++/..:+++++++++++++/.``````````````
ptrack1.com: # ` ```````://++++++++++/----/+++++++++++/:...`````````````
ptrack1.com: # `````````.-----------......--::::/::::.``````````````
pinkvilla.com: # Files
pinkvilla.com: # Paths (clean URLs)
pinkvilla.com: # Paths (no clean URLs)
academy.com: # changes for 11/01 release
academy.com: # changes for 10/26 release
academy.com: #Changes for 3/10/18 Kermit R1.1
academy.com: #Changes for 16/10/18 Kermit 1.2
makaan.com: # robots.txt for http://www.makaan.com/
makaan.com: ########api docs########
makaan.com: # Block SEM page
anaconda.com: # robots.txt for https://www.anaconda.com/
anaconda.com: # live - don't allow web crawlers to index cpresources/ or vendor/
vingle.net: # @user/interests/:interest
coolblue.be: # On all coolblue. domains
coolblue.be: # Old, on shops
coolblue.be: # No translation known or needed for
coolblue.be: # Only on Coolblue.nl and .be - Dutch language (on coolblue.nl as a safeguard)
coolblue.be: # Only on Coolblue. domains - English language
coolblue.be: # The URL behind the # mark is the Dutch equivalent (just for reference, doesn't block anything), sorted alphabetically in Dutch
coolblue.be: # If a line is behind a #, the translation still needs to be added
coolblue.be: # Disallow: /en/??? # /nl/mailafriend
coolblue.be: # Disallow: /en/??? # /nl/questionnaire
coolblue.be: # Disallow: /en/??? # /nl/*/voor-de$
coolblue.be: # Disallow: /en/??? # /nl/*/voor-de/*
coolblue.be: # Only on Coolblue. domains - English language
coolblue.be: # Only on Coolblue.be - French language
coolblue.be: # The URL behind the # mark is the Dutch equivalent, sorted alphabetically in Dutch
coolblue.be: # If a line is behind a #, the translation still needs to be added
coolblue.be: # Disallow: /fr/??? # /nl/mailafriend
coolblue.be: # Disallow: /fr/??? # /nl/questionnaire
coolblue.be: # Disallow: /fr/??? # /nl/*/voor-de$
coolblue.be: # Disallow: /fr/??? # /nl/*/voor-de/*
coolblue.be: # Only on Coolblue.be domain - French language
coolblue.be: # For specific bots (on all domains)
coolblue.be: # Hi! Trying to reverse engineer something?
coolblue.be: # Maybe you should come work with us.
coolblue.be: # Apply at www.careersatcoolblue.com and mention this comment.
gdeposylka.ru: # Begin Bad-Robots (DO NOT EDIT AFTER THIS LINE)
gdeposylka.ru: # SEO-related bots
smartdraw.com: # Additional restrictions for MSIECrawler anywhere on Site
sravni.ru: # tech
sravni.ru: # interface pages
sravni.ru: # auto
sravni.ru: # currency
sravni.ru: # news and articles
sravni.ru: #–†–µ–≥–∏–æ–Ω—ã –∏ —Å—É–±—ä–µ–∫—Ç—ã —Ñ–µ–¥–µ—Ä–∞—Ü–∏–∏
sravni.ru: #old urls
sravni.ru: # tech
sravni.ru: # interface pages
sravni.ru: # auto
sravni.ru: # currency
sravni.ru: # news and articles
sravni.ru: #–†–µ–≥–∏–æ–Ω—ã –∏ —Å—É–±—ä–µ–∫—Ç—ã —Ñ–µ–¥–µ—Ä–∞—Ü–∏–∏
sravni.ru: #old urls
unipd.it: #
unipd.it: # robots.txt
unipd.it: #
unipd.it: # This file is to prevent the crawling and indexing of certain parts
unipd.it: # of your site by web crawlers and spiders run by sites like Yahoo!
unipd.it: # and Google. By telling these "robots" where not to go on your site,
unipd.it: # you save bandwidth and server resources.
unipd.it: #
unipd.it: # This file will be ignored unless it is at the root of your host:
unipd.it: # Used: http://example.com/robots.txt
unipd.it: # Ignored: http://example.com/site/robots.txt
unipd.it: #
unipd.it: # For more information about the robots.txt standard, see:
unipd.it: # http://www.robotstxt.org/robotstxt.html
unipd.it: # CSS, JS, Images
unipd.it: # Directories
unipd.it: # Files
unipd.it: # Paths (clean URLs)
unipd.it: # Paths (no clean URLs)
unipd.it: # Directory sites/unipd.it/files
unipd.it: # #######
ut.ac.id: #
ut.ac.id: # robots.txt
ut.ac.id: #
ut.ac.id: # This file is to prevent the crawling and indexing of certain parts
ut.ac.id: # of your site by web crawlers and spiders run by sites like Yahoo!
ut.ac.id: # and Google. By telling these "robots" where not to go on your site,
ut.ac.id: # you save bandwidth and server resources.
ut.ac.id: #
ut.ac.id: # This file will be ignored unless it is at the root of your host:
ut.ac.id: # Used: http://example.com/robots.txt
ut.ac.id: # Ignored: http://example.com/site/robots.txt
ut.ac.id: #
ut.ac.id: # For more information about the robots.txt standard, see:
ut.ac.id: # http://www.robotstxt.org/robotstxt.html
ut.ac.id: # CSS, JS, Images
ut.ac.id: # Directories
ut.ac.id: # Files
ut.ac.id: # Paths (clean URLs)
ut.ac.id: # Paths (no clean URLs)
makro.co.za: # For all robots
makro.co.za: # Block access to specific groups of pages
makro.co.za: # Allow search crawlers to discover the sitemap
makro.co.za: # Block CazoodleBot as it does not present correct accept content headers
makro.co.za: # Block MJ12bot as it is just noise
makro.co.za: # Block dotbot as it cannot parse base urls properly
makro.co.za: # Block Gigabot
doctorsfile.jp: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
doctorsfile.jp: #
doctorsfile.jp: # To ban all spiders from the entire site uncomment the next two lines:
doctorsfile.jp: # User-agent: *
doctorsfile.jp: # Disallow: /
theladders.com: # robots.txt for TheLadders.com
theladders.com: # robots.txt,v 3.1 2020/06/19 10:57:00
classified4free.net: # Blocks robots from specific folders / directories
classified4free.net: # Crawl-delay: 80
greetingsisland.com: #allowing twitter bot so ecard will show in ticket
greetingsisland.com: #disallow 'way back machine'
skiddle.com: #Disallow: /infofeed/
skiddle.com: #Disallow hotel pages, apart from locations and nearby
skiddle.com: #Disallow: /hotels/*/$
skiddle.com: #Allow: /hotels/*near*/$
skiddle.com: #Allow: /hotels/*.html$
skiddle.com: #Disallow restaurant pages, apart from locations and nearby
capterra.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
ic.gc.ca: #User-agent: *
ic.gc.ca: #Disallow: /app/scr/cc/CorporationsCanada/fdrlCrpDtls.html
pevex.hr: # Trive.digital robots.txt 17.9.2020
pevex.hr: # Image Crawler Setup
pevex.hr: # All other bots
pevex.hr: # Directories
pevex.hr: # Paths (clean URLs)
pevex.hr: # Paths (no clean URLs)
pevex.hr: # Pevex specific
sauto.cz: # multichoice
sauto.cz: # 4+ params
sauto.cz: # Homepage categories
sauto.cz: # multichoice
sauto.cz: # 4+ params
sauto.cz: # Homepage categories
sauto.cz: # multichoice
sauto.cz: # 4+ params
sauto.cz: # Homepage categories
sauto.cz: # multichoice
sauto.cz: # 4+ params
sauto.cz: # Homepage categories
sauto.cz: # multichoice
sauto.cz: # 4+ params
sauto.cz: # Homepage categories
sauto.cz: # multichoice
sauto.cz: # 4+ params
sauto.cz: # Homepage categories
sauto.cz: # multichoice
sauto.cz: # 4+ params
sauto.cz: # Homepage categories
sauto.cz: # multichoice
sauto.cz: # 4+ params
sauto.cz: # Homepage categories
sauto.cz: # multichoice
sauto.cz: # 4+ params
sauto.cz: # Homepage categories
sauto.cz: # multichoice
sauto.cz: # 4+ params
sauto.cz: # Homepage categories
sauto.cz: # multichoice
sauto.cz: # 4+ params
sauto.cz: # Homepage categories
sauto.cz: # Better safe than sorry
brookings.edu: # Sitemap archive
cbp.gov: #
cbp.gov: # robots.txt
cbp.gov: #
cbp.gov: # This file is to prevent the crawling and indexing of certain parts
cbp.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
cbp.gov: # and Google. By telling these "robots" where not to go on your site,
cbp.gov: # you save bandwidth and server resources.
cbp.gov: #
cbp.gov: # This file will be ignored unless it is at the root of your host:
cbp.gov: # Used: http://example.com/robots.txt
cbp.gov: # Ignored: http://example.com/site/robots.txt
cbp.gov: #
cbp.gov: # For more information about the robots.txt standard, see:
cbp.gov: # http://www.robotstxt.org/robotstxt.html
cbp.gov: # CSS, JS, Images
cbp.gov: # Directories
cbp.gov: # Files
cbp.gov: # Paths (clean URLs)
cbp.gov: # Paths (no clean URLs)
wikidata.org: #
wikidata.org: # Please note: There are a lot of pages on this site, and there are
wikidata.org: # some misbehaved spiders out there that go _way_ too fast. If you're
wikidata.org: # irresponsible, your access to the site may be blocked.
wikidata.org: #
wikidata.org: # Observed spamming large amounts of https://en.wikipedia.org/?curid=NNNNNN
wikidata.org: # and ignoring 429 ratelimit responses, claims to respect robots:
wikidata.org: # http://mj12bot.com/
wikidata.org: # advertising-related bots:
wikidata.org: # Wikipedia work bots:
wikidata.org: # Crawlers that are kind enough to obey, but which we'd rather not have
wikidata.org: # unless they're feeding search engines.
wikidata.org: # Some bots are known to be trouble, particularly those designed to copy
wikidata.org: # entire sites. Please obey robots.txt.
wikidata.org: # Misbehaving: requests much too fast:
wikidata.org: #
wikidata.org: # Sorry, wget in its recursive mode is a frequent problem.
wikidata.org: # Please read the man page and use it properly; there is a
wikidata.org: # --wait option you can use to set the delay between hits,
wikidata.org: # for instance.
wikidata.org: #
wikidata.org: #
wikidata.org: # The 'grub' distributed client has been *very* poorly behaved.
wikidata.org: #
wikidata.org: #
wikidata.org: # Doesn't follow robots.txt anyway, but...
wikidata.org: #
wikidata.org: #
wikidata.org: # Hits many times per second, not acceptable
wikidata.org: # http://www.nameprotect.com/botinfo.html
wikidata.org: # A capture bot, downloads gazillions of pages with no public benefit
wikidata.org: # http://www.webreaper.net/
wikidata.org: #
wikidata.org: # Friendly, low-speed bots are welcome viewing article pages, but not
wikidata.org: # dynamically-generated pages please.
wikidata.org: #
wikidata.org: # Inktomi's "Slurp" can read a minimum delay between hits; if your
wikidata.org: # bot supports such a thing using the 'Crawl-delay' or another
wikidata.org: # instruction, please let us know.
wikidata.org: #
wikidata.org: # There is a special exception for API mobileview to allow dynamic
wikidata.org: # mobile web & app views to load section content.
wikidata.org: # These views aren't HTTP-cached but use parser cache aggressively
wikidata.org: # and don't expose special: pages etc.
wikidata.org: #
wikidata.org: # Another exception is for REST API documentation, located at
wikidata.org: # /api/rest_v1/?doc.
wikidata.org: #
wikidata.org: #
wikidata.org: # ar:
wikidata.org: #
wikidata.org: # dewiki:
wikidata.org: # T6937
wikidata.org: # sensible deletion and meta user discussion pages:
wikidata.org: # 4937#5
wikidata.org: # T14111
wikidata.org: # T15961
wikidata.org: #
wikidata.org: # enwiki:
wikidata.org: # Folks get annoyed when VfD discussions end up the number 1 google hit for
wikidata.org: # their name. See T6776
wikidata.org: # T15398
wikidata.org: # T16075
wikidata.org: # T13261
wikidata.org: # T12288
wikidata.org: # T16793
wikidata.org: #
wikidata.org: # eswiki:
wikidata.org: # T8746
wikidata.org: #
wikidata.org: # fiwiki:
wikidata.org: # T10695
wikidata.org: #
wikidata.org: # hewiki:
wikidata.org: #T11517
wikidata.org: #
wikidata.org: # huwiki:
wikidata.org: #
wikidata.org: # itwiki:
wikidata.org: # T7545
wikidata.org: #
wikidata.org: # jawiki
wikidata.org: # T7239
wikidata.org: # nowiki
wikidata.org: # T13432
wikidata.org: #
wikidata.org: # plwiki
wikidata.org: # T10067
wikidata.org: #
wikidata.org: # ptwiki:
wikidata.org: # T7394
wikidata.org: #
wikidata.org: # rowiki:
wikidata.org: # T14546
wikidata.org: #
wikidata.org: # ruwiki:
wikidata.org: #
wikidata.org: # svwiki:
wikidata.org: # T12229
wikidata.org: # T13291
wikidata.org: #
wikidata.org: # zhwiki:
wikidata.org: # T7104
wikidata.org: #
wikidata.org: # sister projects
wikidata.org: #
wikidata.org: # enwikinews:
wikidata.org: # T7340
wikidata.org: #
wikidata.org: # itwikinews
wikidata.org: # T11138
wikidata.org: #
wikidata.org: # enwikiquote:
wikidata.org: # T17095
wikidata.org: #
wikidata.org: # enwikibooks
wikidata.org: #
wikidata.org: # working...
wikidata.org: #
wikidata.org: #
wikidata.org: #
wikidata.org: #----------------------------------------------------------#
wikidata.org: #
wikidata.org: #
wikidata.org: #
wikidata.org: # Lines here will be added to the global robots.txt or at least to http://www.wikidata.org/robots.txt
wikidata.org: #</syntaxhighlight>
claroshop.com: #Google Search Engine Robot
claroshop.com: #Yahoo! Search Engine Robot
claroshop.com: #Yandex Search Engine Robot
claroshop.com: #Microsoft Search Engine Robot
claroshop.com: #Twitter Search Engine Robot
claroshop.com: # Every bot that might possibly read and respect this file.
claroshop.com: # the protocol of the sitemap.
money.it: # robots.txt
money.it: # @url: https://www.money.it
money.it: # @generator: SPIP 3.1.8 [23955]
money.it: # @template: money2017/robots.txt.html
motorimagazine.it: #wp stuff
motorimagazine.it: #files
sayidaty.net: #
sayidaty.net: # robots.txt
sayidaty.net: #
sayidaty.net: # This file is to prevent the crawling and indexing of certain parts
sayidaty.net: # of your site by web crawlers and spiders run by sites like Yahoo!
sayidaty.net: # and Google. By telling these "robots" where not to go on your site,
sayidaty.net: # you save bandwidth and server resources.
sayidaty.net: #
sayidaty.net: # This file will be ignored unless it is at the root of your host:
sayidaty.net: # Used: http://example.com/robots.txt
sayidaty.net: # Ignored: http://example.com/site/robots.txt
sayidaty.net: #
sayidaty.net: # For more information about the robots.txt standard, see:
sayidaty.net: # http://www.robotstxt.org/robotstxt.html
sayidaty.net: # CSS, JS, Images
sayidaty.net: # Directories
sayidaty.net: # Files
sayidaty.net: # Paths (clean URLs)
sayidaty.net: # Paths (no clean URLs)
rayamarketing.com: # www.robotstxt.org/
rayamarketing.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
rayamarketing.com: #Sitemap: https://rayamarketing.com/sitemap.xml
travelboutiqueonline.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
travelboutiqueonline.com: #content{margin:0 0 0 2%;position:relative;}
wdsz.org: #
wdsz.org: # robots.txt for PHPWind
wdsz.org: # Version 8.7
wdsz.org: #
degiro.nl: # For sitemaps.xml autodiscovery. Uncomment if you have one:
containerstore.com: #best-selling-solutions .grid-parent, #design-tools .grid-parent, #components .grid-parent, #limited-time-savings .grid-parent, #trending-now .grid-parent, #tips-inspiration .grid-parent {
containerstore.com: #best-selling-solutions, #design-tools, #trending-now {
containerstore.com: #tips-inspiration {
containerstore.com: #limited-time-savings .bem-padding-bottom-40, #trending-now .bem-padding-bottom-40 {
containerstore.com: #best-selling-solutions .ht-closing_cta, #components .ht-closing_cta {
containerstore.com: #best-selling-solutions .ht-closing_cta_link, #components .ht-closing_cta_link{
containerstore.com: #design-tools .ht-tile_overlay {
containerstore.com: #design-tools .ht-tile_label {
containerstore.com: #elfa_sale_header {
containerstore.com: #design-tools .ht-tile_label {
containerstore.com: #elfa_sale_header {
tiffany.com: # These hides directories from search engines
finder.com: # Prevent crawling searches.
finder.com: # https://finder.atlassian.net/browse/CWS-452
finder.com: # Prevent crawling additional searches.
finder.com: # https://finder.atlassian.net/browse/CWS-497
finder.com: # Allow Twitterbot to crawl anything
finder.com: # https://finder.atlassian.net/browse/OPS-915
finder.com: # Block pages from appearing in Google News
myupchar.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
myupchar.com: #
myupchar.com: # To ban all spiders from the entire site uncomment the next two lines:
myupchar.com: #Baiduspider
myupchar.com: #Yandex
eonlineads.com: # Blocks all robots except google and disallow specific folders / directories
buyandship.co.jp: # Date: Wed, 24 Feb 2021 11:21:01 GMT
broadcom.com: # robots.txt for Broadcom.com 11/4/2020
arabicpost.net: #WP Import Export Rule
fr.de: # robots.txt www.fr.de
actcorp.in: # CSS, JS, Images
actcorp.in: # Directories
actcorp.in: # Files
actcorp.in: # Paths (clean URLs)
actcorp.in: # Paths (no clean URLs)
computerhoy.com: #
computerhoy.com: # robots.txt
computerhoy.com: #
computerhoy.com: # This file is to prevent the crawling and indexing of certain parts
computerhoy.com: # of your site by web crawlers and spiders run by sites like Yahoo!
computerhoy.com: # and Google. By telling these "robots" where not to go on your site,
computerhoy.com: # you save bandwidth and server resources.
computerhoy.com: #
computerhoy.com: # This file will be ignored unless it is at the root of your host:
computerhoy.com: # Used: http://example.com/robots.txt
computerhoy.com: # Ignored: http://example.com/site/robots.txt
computerhoy.com: #
computerhoy.com: # For more information about the robots.txt standard, see:
computerhoy.com: # http://www.robotstxt.org/robotstxt.html
computerhoy.com: # Files
computerhoy.com: # Paths (clean URLs)
computerhoy.com: # Paths (no clean URLs)
computerhoy.com: # Paths (url errors)
computerhoy.com: # Sitemaps
woocommerce.com: # Sitemap archive
mtggoldfish.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
mtggoldfish.com: #
mtggoldfish.com: # To ban all spiders from the entire site uncomment the next two lines:
advanceautoparts.com: #tagline {
meraki.com: # www.robotstxt.org/
meraki.com: # support.google.com/webmasters/answer/6062608
pushauction.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
pushauction.com: #content{margin:0 0 0 2%;position:relative;}
glassdoor.sg: # Singapore
glassdoor.sg: # Greetings, human beings!,
glassdoor.sg: #
glassdoor.sg: # If you're sniffing around this file, and you're not a robot, we're looking to meet curious folks such as yourself.
glassdoor.sg: #
glassdoor.sg: # Think you have what it takes to join the best white-hat SEO growth hackers on the planet, and help improve the way people everywhere find jobs?
glassdoor.sg: #
glassdoor.sg: # Run - don't crawl - to apply to join Glassdoor's SEO team here http://jobs.glassdoor.com
glassdoor.sg: #
glassdoor.sg: #
glassdoor.sg: #logging related
glassdoor.sg: # Blocking track urls (ACQ-2468)
glassdoor.sg: #Blocking non standard job view and job search URLs, and paginated job SERP URLs (TRFC-2831)
glassdoor.sg: # Blocking bots from crawling DoubleClick for Publisher and Google Analytics related URL's (which aren't real URL's)
glassdoor.sg: # TRFC-4037 Block page from being indexed
glassdoor.sg: #
glassdoor.sg: # Note that this file has the extension '.text' rather than the more-standard '.txt'
glassdoor.sg: # to keep it from being pre-compiled as a servlet. (*.txt files are precompiled, and
glassdoor.sg: # there doesn't seem to be a way to turn this off.)
glassdoor.sg: #
iamrohit.in: # Block NextGenSearchBot
iamrohit.in: # Block ia-archiver from crawling site
iamrohit.in: # Block archive.org_bot from crawling site
iamrohit.in: # Block Archive.org Bot from crawling site
iamrohit.in: # Block LinkWalker from crawling site
iamrohit.in: # Block GigaBlast Spider from crawling site
iamrohit.in: # Block ia_archiver-web.archive.org_bot from crawling site
iamrohit.in: # Block PicScout Crawler from crawling site
iamrohit.in: # Block BLEXBot Crawler from crawling site
iamrohit.in: # Block TinEye from crawling site
iamrohit.in: # Block SEOkicks
iamrohit.in: # Block BlexBot
iamrohit.in: # Block SISTRIX
iamrohit.in: # Block Uptime robot
iamrohit.in: # Block Ezooms Robot
iamrohit.in: # Block netEstate NE Crawler (+http://www.website-datenbank.de/)
iamrohit.in: # Block WiseGuys Robot
iamrohit.in: # Block Turnitin Robot
iamrohit.in: # Block Heritrix
iamrohit.in: # Block pricepi
iamrohit.in: # Block Eniro
iamrohit.in: # Block Psbot
iamrohit.in: # Block Youdao
iamrohit.in: # BLEXBot
iamrohit.in: # Block NaverBot
iamrohit.in: # Block ZBot
iamrohit.in: # Block Vagabondo
iamrohit.in: # Block LinkWalker
iamrohit.in: # Block SimplePie
iamrohit.in: # Block Wget
iamrohit.in: # Block Pixray-Seeker
iamrohit.in: # Block BoardReader
iamrohit.in: # Block Quantify
iamrohit.in: # Block Plukkie
iamrohit.in: # Block Cuam
iamrohit.in: # https://megaindex.com/crawler
shameless.com: #Disallow: /?*
shameless.com: #Disallow: /videos/*/?*
shameless.com: #Disallow: /models/?*
shameless.com: #Disallow: /models/*/?*
shameless.com: #Disallow: /models/*/*/?*
shameless.com: #Disallow: /categories/?*
shameless.com: #Disallow: /categories/*/?*
shameless.com: #Disallow: /categories/*/*/?*
shameless.com: #Disallow: /tags/?*
shameless.com: #Disallow: /tags/*/?*
shameless.com: #Disallow: /tags/*/*/?*
hardware.fr: # Fichier robots.txt du site HardWare.fr
usf.edu: ###############################
usf.edu: # robots.txt for USF.edu
usf.edu: ###############################
usf.edu: # list folders robots are not allowed to index
usf.edu: #
usf.edu: # list specific files robots are not allowed to index
usf.edu: #
usf.edu: #
usf.edu: #
usf.edu: #
usf.edu: # allow twitter to fetch images for news
usf.edu: #
usf.edu: #
usf.edu: # End of robots.txt file
usf.edu: #
usf.edu: ###############################
epochconverter.com: # Robots.txt for EpochConverter.com
olx.bg: # sitecode:olxbg-desktop
delltechnologies.com: # directory exclusion used for dellemc.com - AEM file
delltechnologies.com: # NOTE all paths must begin with "/*/" in order to apply to all AEM locales automatically
delltechnologies.com: # paths below this line are old and probably invalid for dellemc.com, leaving them for reference only
univ-lyon1.fr: #bandeau_flash{background:#ffffff;}
univ-lyon1.fr: #bandeau_flash_contenu{color:#000000;}
wsb.pl: #
wsb.pl: # robots.txt
wsb.pl: #
wsb.pl: # This file is to prevent the crawling and indexing of certain parts
wsb.pl: # of your site by web crawlers and spiders run by sites like Yahoo!
wsb.pl: # and Google. By telling these "robots" where not to go on your site,
wsb.pl: # you save bandwidth and server resources.
wsb.pl: #
wsb.pl: # This file will be ignored unless it is at the root of your host:
wsb.pl: # Used: http://example.com/robots.txt
wsb.pl: # Ignored: http://example.com/site/robots.txt
wsb.pl: #
wsb.pl: # For more information about the robots.txt standard, see:
wsb.pl: # http://www.robotstxt.org/robotstxt.html
wsb.pl: # CSS, JS, Images
wsb.pl: # Directories
wsb.pl: # Files
wsb.pl: # Paths (clean URLs)
wsb.pl: # Paths (no clean URLs)
edunet.bh: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
edunet.bh: #content{margin:0 0 0 2%;position:relative;}
yourator.co: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
yourator.co: #
yourator.co: # To ban all spiders from the entire site uncomment the next two lines:
yourator.co: # User-agent: *
yourator.co: # Disallow: /
gu-global.com: #footer {
gu-global.com: #footer ul li {
gu-global.com: #footer ul.menu li {
gu-global.com: #gu-footer {
gu-global.com: #gu-footer .uq-footer-innner {
gu-global.com: #gu-footer ul.menu {
gu-global.com: #dynamic-footer.sp {
gu-global.com: #dynamic-footer .footer_tax_bnr {
gu-global.com: #dynamic-footer .footer_tax_bnr div {
gu-global.com: #dynamic-footer .footer_tax_bnr div > span {
gu-global.com: #dynamic-footer .footer_tax_bnr div > span span {
gu-global.com: #dynamic-footer .footer-in {
gu-global.com: #dynamic-footer .footer-in .footer-menu ul {
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li {
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li a,
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li span {
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li svg {
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li .world-gu {
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li .world-gu:after {
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li .active:after {
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li ul {
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li ul li {
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li ul li a {
gu-global.com: #dynamic-footer .footer-in .footer-menu ul li ul li:last-child {
gu-global.com: #dynamic-footer .footer-in .sns-area {
gu-global.com: #dynamic-footer .footer-in .sns-area .sns-title {
gu-global.com: #dynamic-footer .footer-in .sns-area ul {
gu-global.com: #dynamic-footer .footer-in .sns-area ul li {
gu-global.com: #dynamic-footer .footer-in .copyright {
gu-global.com: #dynamic-footer .footer-in .footer_tax_bnr img {
gu-global.com: #dynamic-footer.pc {
gu-global.com: #dynamic-footer footer {
gu-global.com: #dynamic-footer footer .footer-in {
gu-global.com: #dynamic-footer footer .footer-in nav {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-top {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-top li {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-top li::after {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-top li:last-child::after {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-top li a {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-bottom {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-bottom dt {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-bottom dd {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-bottom dd ul li::after {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-bottom dd ul li:last-child::after {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-nav-in .footer-nav-bottom dd ul li a {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright .footer-copyright-in .footer-logo-area .footer-logo {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright .footer-copyright-in .footer-logo-area .footer-logo a {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright .footer-copyright-in .footer-logo-area .footer-logo a:first-child {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright .footer-copyright-in .footer-logo-area .footer-copyright {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright .footer-copyright-in .footer-logo-area .footer-copyright a {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright .footer-copyright-in .footer-logo-area .footer-copyright svg {
gu-global.com: #dynamic-footer footer .footer-in nav .footer-copyright .footer-copyright-in .footer-sns ul li {
gu-global.com: #dynamic-footer .footer_tax_bnr {
gu-global.com: #dynamic-footer .footer_tax_bnr div {
gu-global.com: #pagetop {
appllio.com: # Directories
appllio.com: # Files
appllio.com: # Paths (clean URLs)
appllio.com: # Paths (no clean URLs)
sante.fr: #
sante.fr: # robots.txt
sante.fr: #
sante.fr: # This file is to prevent the crawling and indexing of certain parts
sante.fr: # of your site by web crawlers and spiders run by sites like Yahoo!
sante.fr: # and Google. By telling these "robots" where not to go on your site,
sante.fr: # you save bandwidth and server resources.
sante.fr: #
sante.fr: # This file will be ignored unless it is at the root of your host:
sante.fr: # Used: http://example.com/robots.txt
sante.fr: # Ignored: http://example.com/site/robots.txt
sante.fr: #
sante.fr: # For more information about the robots.txt standard, see:
sante.fr: # http://www.robotstxt.org/robotstxt.html
sante.fr: # CSS, JS, Images
sante.fr: # Directories
sante.fr: # Files
sante.fr: # Paths (clean URLs)
sante.fr: # Paths (no clean URLs)
aut.ac.ir: # robots.txt for https://aut.ac.ir
anu.edu.au: #
anu.edu.au: # robots.txt
anu.edu.au: #
anu.edu.au: # This file is to prevent the crawling and indexing of certain parts
anu.edu.au: # of your site by web crawlers and spiders run by sites like Yahoo!
anu.edu.au: # and Google. By telling these "robots" where not to go on your site,
anu.edu.au: # you save bandwidth and server resources.
anu.edu.au: #
anu.edu.au: # This file will be ignored unless it is at the root of your host:
anu.edu.au: # Used: http://example.com/robots.txt
anu.edu.au: # Ignored: http://example.com/site/robots.txt
anu.edu.au: #
anu.edu.au: # For more information about the robots.txt standard, see:
anu.edu.au: # http://www.robotstxt.org/robotstxt.html
anu.edu.au: #
anu.edu.au: # For syntax checking, see:
anu.edu.au: # http://www.frobee.com/robots-txt-check
anu.edu.au: # Directories
anu.edu.au: # Files
anu.edu.au: # Paths (clean URLs)
anu.edu.au: # Paths (no clean URLs)
anu.edu.au: # Legacy gateway paths
anu.edu.au: # Change of Preference
anu.edu.au: # Spam prevention
feeder.co: # Wieeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee üí©
editorx.com: # by editorx.com
softonic.ru: # ES
softonic.ru: # BR
softonic.ru: # DE
softonic.ru: # NL
softonic.ru: # EN,JP
softonic.ru: # FR
softonic.ru: # IT
softonic.ru: # PL
softonic.ru: #SHARED
softonic.ru: # CATEGORIES
softonic.ru: # EN
softonic.ru: # ES
softonic.ru: # DE
softonic.ru: # FR
softonic.ru: # BR
softonic.ru: # IT
softonic.ru: # PL
softonic.ru: # NL
softonic.ru: # JP
cookinglight.com: # Sitemaps
cookinglight.com: # CMS FE
cookinglight.com: #Content
cookinglight.com: # CMS FE
cookinglight.com: #Content
thevc.kr: # Notice: The use of robots or other automated means to access The VC without
thevc.kr: # the express permission of The VC is strictly prohibited.
thevc.kr: # The VC may, in its discretion, permit certain automated access to certain The VC pages,
thevc.kr: # for the limited purpose of including content in approved publicly available search engines.
costco.com.tw: # For all robots
costco.com.tw: # Block access to specific groups of pages
costco.com.tw: # Allow search crawlers to discover the sitemap
costco.com.tw: # Block CazoodleBot as it does not present correct accept content headers
costco.com.tw: # Block MJ12bot as it is just noise
costco.com.tw: # Block dotbot as it cannot parse base urls properly
costco.com.tw: # Block Gigabot
keilhub.com: # we use Shopify as our ecommerce platform
keilhub.com: # Google adsbot ignores robots.txt unless specifically named!
zbozi.cz: ## 8888888888P 888 d8b
zbozi.cz: ## d88P 888 Y8P
zbozi.cz: ## d88P 888
zbozi.cz: ## d88P 88888b. .d88b. 88888888 888 .d8888b 88888888
zbozi.cz: ## d88P 888 "88b d88""88b d88P 888 d88P" d88P
zbozi.cz: ## d88P 888 888 888 888 d88P 888 888 d88P
zbozi.cz: ## d88P 888 d88P Y88..88P d88P 888 d8b Y88b. d88P
zbozi.cz: ## d8888888888 88888P" "Y88P" 88888888 888 Y8P "Y8888P 88888888
zbozi.cz: ## ###############################################################
zbozi.cz: ## Disallow clicks to shops
zbozi.cz: ## Disallow result pages - search results
zbozi.cz: ## multichoice in left menu
zbozi.cz: ## other
zbozi.cz: ## Disallow pages with 2 and more parameters - seznambot
zbozi.cz: ## kvuli pravidlu Disallow: /*?*&*&*&*
zbozi.cz: ## Disallow pages with specific parameters
zbozi.cz: ## Disallow pages that may directly set/postpone rating (intended for e-mail clickthrus)
zbozi.cz: # Disallow all searchScreen
zbozi.cz: ## nesmyslne rozsahy
zbozi.cz: ## location + parameters
zbozi.cz: ## bordel v URL
zbozi.cz: ## Disallow clicks to shops.
zbozi.cz: ## other
zbozi.cz: ## old pages in Google index
zbozi.cz: ## Disallow pages with 4 and more parameters
zbozi.cz: ## Disallow pages with specific parameters
zbozi.cz: ## product fotogaleries
zbozi.cz: ## multichoice in left menu
zbozi.cz: ## search in categories
zbozi.cz: ## do not index sorting in categories (except first page)
zbozi.cz: ## kvuli pravidlu Disallow: /*?*&*&*&*
zbozi.cz: ## Disallow pages that may directly set/postpone rating (intended for e-mail clickthrus)
zbozi.cz: ## nesmyslne rozsahy
zbozi.cz: ## location + parmeters
zbozi.cz: ## bordel v URL
zbozi.cz: # /?0=%5Bobject%20Object%5D&vyrobce=tommy-hilfiger
zbozi.cz: # 2019-08-03
zbozi.cz: # /?LOCK%20HLE7203S=true&barva=cerna&vyrobce=desigual
zbozi.cz: # 2020-08-06 - fun with sitemaps
zbozi.cz: # 2020-12-02 - Prolinkovani bug
zbozi.cz: # /?amp;barva=modra&barva=bezova&vyrobce=guess
zbozi.cz: # /?amp%3Bstrana=10&barva=zluta&vyrobce=grosso
zbozi.cz: # invalid parameters in /vyrobek/
zbozi.cz: ## Disallow clicks to shops.
zbozi.cz: ## other
zbozi.cz: ## old pages in Google index
zbozi.cz: ## Disallow pages with 4 and more parameters
zbozi.cz: ## Disallow pages with specific parameters
zbozi.cz: ## product fotogaleries
zbozi.cz: ## multichoice in left menu
zbozi.cz: ## search in categories
zbozi.cz: ## do not index sorting in categories (except first page)
zbozi.cz: ## kvuli pravidlu Disallow: /*?*&*&*&*
zbozi.cz: ## Disallow pages that may directly set/postpone rating (intended for e-mail clickthrus)
zbozi.cz: ## nesmyslne rozsahy
zbozi.cz: ## location + parmeters
zbozi.cz: ## bordel v URL
zbozi.cz: # /?0=%5Bobject%20Object%5D&vyrobce=tommy-hilfiger
zbozi.cz: # 2019-08-03
zbozi.cz: # /?LOCK%20HLE7203S=true&barva=cerna&vyrobce=desigual
zbozi.cz: # 2020-08-06 - fun with sitemaps
zbozi.cz: # 2020-12-02 - Prolinkovani bug
zbozi.cz: # /?amp;barva=modra&barva=bezova&vyrobce=guess
zbozi.cz: # /?amp%3Bstrana=10&barva=zluta&vyrobce=grosso
zbozi.cz: # invalid parameters in /vyrobek/
xjzsks.com: #
xjzsks.com: # robots.txt for baidu
xjzsks.com: #
blackbaud.com: # Allow MOZ to crawl everything
blackbaud.com: # Update the path to the file(s) and remove this comment when the site goes live
blackbaud.com: # Allow Siteimprove to access the site while in development
blackbaud.com: # Allow Siteimprove to access the site while in development
blackbaud.com: # Allow Siteimprove to access the site while in development
bankofindia.co.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
bankofindia.co.in: #content{margin:0 0 0 2%;position:relative;}
slate.fr: # CSS, JS, Images
slate.fr: # Directories
slate.fr: # Files
slate.fr: # Paths (clean URLs)
slate.fr: # Paths (no clean URLs)
pakbcn.live: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
pakbcn.live: #content{margin:0 0 0 2%;position:relative;}
sears.com.mx: #Google Search Engine Robot
sears.com.mx: #Yahoo! Search Engine Robot
sears.com.mx: #Yandex Search Engine Robot
sears.com.mx: #Microsoft Search Engine Robot
sears.com.mx: #Twitter Search Engine Robot
sears.com.mx: # Every bot that might possibly read and respect this file.
sears.com.mx: # Wait 1 second between successive requests.
chattanoogastate.edu: #
chattanoogastate.edu: # robots.txt
chattanoogastate.edu: #
chattanoogastate.edu: # This file is to prevent the crawling and indexing of certain parts
chattanoogastate.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
chattanoogastate.edu: # and Google. By telling these "robots" where not to go on your site,
chattanoogastate.edu: # you save bandwidth and server resources.
chattanoogastate.edu: #
chattanoogastate.edu: # This file will be ignored unless it is at the root of your host:
chattanoogastate.edu: # Used: http://example.com/robots.txt
chattanoogastate.edu: # Ignored: http://example.com/site/robots.txt
chattanoogastate.edu: #
chattanoogastate.edu: # For more information about the robots.txt standard, see:
chattanoogastate.edu: # http://www.robotstxt.org/robotstxt.html
chattanoogastate.edu: # CSS, JS, Images
chattanoogastate.edu: # Directories
chattanoogastate.edu: # Files
chattanoogastate.edu: # Paths (clean URLs)
chattanoogastate.edu: # Paths (no clean URLs)
userbenchmark.com: # UserBenchmark Robot.txt
userbenchmark.com: #######################
aast.edu: # robots.txt
aast.edu: #
aast.edu: # This file is to prevent the crawling and indexing of certain parts
aast.edu: # of your site by web crawlers and spiders run by sites like Yahoo!
aast.edu: # and Google. By telling these "robots" where not to go on your site.
aast.edu: # Allow: /
aast.edu: #
videobin.co: #User-agent: *
videobin.co: #Disallow: /
joinindianarmy.nic.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
joinindianarmy.nic.in: #content{margin:0 0 0 2%;position:relative;}
express.com: #Disallow: /twitter-share-submit.jsp
nrttv.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
nrttv.com: #content{margin:0 0 0 2%;position:relative;}
jimmyjazz.com: # we use Shopify as our ecommerce platform
jimmyjazz.com: # Google adsbot ignores robots.txt unless specifically named!
reforge.com: # Squarespace Robots Txt
dailyguides.com: ## Default robots.txt
rakuten.ca: # /robots.txt file for https://www.rakuten.ca/
9to5google.com: # Sitemap archive
missoma.com: # we use Shopify as our ecommerce platform
missoma.com: # Google adsbot ignores robots.txt unless specifically named!
autofarm.network: # https://www.robotstxt.org/robotstxt.html
geekwire.com: ### Version Information #
geekwire.com: ###################################################
geekwire.com: ### Version: V4.2020.04.2051
geekwire.com: ### Updated: Wed Apr 22 22:38:14 SAST 2020
geekwire.com: ### Bad Bot Count: 571
geekwire.com: ###################################################
geekwire.com: ### Version Information ##
moat.com: # http://i.imgur.com/aj9eiII.jpg
bigcommerce.com: # robots.txt for https://www.bigcommerce.com/
bigcommerce.com: # Directories
bigcommerce.com: # Files
bigcommerce.com: # Paths (clean URLs)
bigcommerce.com: # Paths (Don't index any unclean paths)
xn--b1aew.xn--p1ai: # robots.txt for xn--b1aew.xn--p1ai
casetify.com: # robotstxt.org/
casetify.com: # Disallow: /*invite/
casetify.com: # Disallow: /*showcase/
casetify.com: # Disallow: /*controllers*
casetify.com: # Disallow: /*layout_template*
casetify.com: # Disallow: /*redirect*
casetify.com: # Allow Google Ad Spiders
wevrlabs.net: #Begin Attracta SEO Tools Sitemap. Do not remove
wevrlabs.net: #End Attracta SEO Tools Sitemap. Do not remove
mediaalpha.com: # Default Flywheel robots file
reddress.com: # we use Shopify as our ecommerce platform
reddress.com: # Google adsbot ignores robots.txt unless specifically named!
su.edu.sa: #
su.edu.sa: # robots.txt
su.edu.sa: #
su.edu.sa: # This file is to prevent the crawling and indexing of certain parts
su.edu.sa: # of your site by web crawlers and spiders run by sites like Yahoo!
su.edu.sa: # and Google. By telling these "robots" where not to go on your site,
su.edu.sa: # you save bandwidth and server resources.
su.edu.sa: #
su.edu.sa: # This file will be ignored unless it is at the root of your host:
su.edu.sa: # Used: http://example.com/robots.txt
su.edu.sa: # Ignored: http://example.com/site/robots.txt
su.edu.sa: #
su.edu.sa: # For more information about the robots.txt standard, see:
su.edu.sa: # http://www.robotstxt.org/robotstxt.html
su.edu.sa: # CSS, JS, Images
su.edu.sa: # Directories
su.edu.sa: # Files
su.edu.sa: # Paths (clean URLs)
su.edu.sa: # Paths (no clean URLs)
allbirds.com: # we use Shopify as our ecommerce platform
allbirds.com: # Google adsbot ignores robots.txt unless specifically named!
kmart.com: # 20190428
kmart.com: # www.kmart.com
kmart.com: #Lumen #18359173
kmart.com: # Category
kmart.com: # Product
kmart.com: #Sitemap: https://www.kmart.com/Sitemap_Index_Product_MP_1.xml
kmart.com: # Misc
kmart.com: #Images
kmart.com: #Sitemap: https://www.kmart.com/Sitemap_Index_Image_1.xml
kmart.com: #Sitemap: https://www.kmart.com/Sitemap_Index_Image_MP_1.xml
touchofmodern.com: # www.robotstxt.org/
touchofmodern.com: # http://code.google.com/web/controlcrawlindex/
netgalley.com: # www.robotstxt.org/
netgalley.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
anibis.ch: #prevent crawling of all locations (catchall)
anibis.ch: #include localized listing overview pages
anibis.ch: #allow crawling of specific locations (exceptions)
thedrum.com: # www.robotstxt.org/
thedrum.com: # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
thedrum.com: # Legacy disallow statements
thedrum.com: # Directories
thedrum.com: # Files
thedrum.com: # Paths (clean URLs)
thedrum.com: # Paths (no clean URLs)
consorsbank.de: # Consorsbank robots.txt
consorsbank.de: # Sitemap: https://www.cortalconsors.de/content/dam/cortalconsors_de_cc/system/sitemap/sitemap.xml
superkopilka.com: #Sitemap: https://www.superkopilka.com/sitemap.xml.gz
oilprice.com: # robots.txt
oilprice.com: #User-agent: Mediapartners-Google
oilprice.com: #Disallow:
oilprice.com: #
oilprice.com: # Original rules for old site
oilprice.com: #mobile usability problems
oilprice.com: #Urgent SEO issues - please confirm receipt 15/02/2017
qiniu.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
qiniu.com: #
qiniu.com: # To ban all spiders from the entire site uncomment the next two lines:
qiniu.com: # User-agent: *
qiniu.com: # Disallow: /
olacity.com: # vestacp autogenerated robots.txt
thriveagency.com: # Default Flywheel robots file
credit-suisse.com: # /robots.txt file for www.credit-suisse.com
credit-suisse.com: # Sitemap file
cntraveler.com: #disallow /user/ as there are incoming links going to pages within the /user/ directory that can't be accessed.
vistazo.com: #
vistazo.com: # robots.txt
vistazo.com: #
vistazo.com: # This file is to prevent the crawling and indexing of certain parts
vistazo.com: # of your site by web crawlers and spiders run by sites like Yahoo!
vistazo.com: # and Google. By telling these "robots" where not to go on your site,
vistazo.com: # you save bandwidth and server resources.
vistazo.com: #
vistazo.com: # This file will be ignored unless it is at the root of your host:
vistazo.com: # Used: http://example.com/robots.txt
vistazo.com: # Ignored: http://example.com/site/robots.txt
vistazo.com: #
vistazo.com: # For more information about the robots.txt standard, see:
vistazo.com: # http://www.robotstxt.org/robotstxt.html
vistazo.com: # Directories
vistazo.com: # Files
vistazo.com: # Paths (clean URLs)
vistazo.com: # Paths (no clean URLs)
twenty20.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
twenty20.com: #
twenty20.com: # To ban all spiders from the entire site uncomment the next two lines:
twenty20.com: # User-Agent: *
twenty20.com: # Disallow: /
twenty20.com: # 2012-12-05: Prevent crawlers from triggering checkout flow errors.
yeepay.com: #Sitemap files
bikroy.com: # Sitemap
bikroy.com: # Excludes
bikroy.com: # Blog
bikroy.com: # Promotions
bikroy.com: # msn
fashionnetwork.com: # To be purged into Google's cache
fashionnetwork.com: # Ajax URIs
fashionnetwork.com: # Fragments
fashionnetwork.com: # Yandex : bloquer toutes les URL de news
firstdata.com: # For domain: https://www.firstdata.com
firstdata.com: # Last updated: 06/11/2020
uvic.ca: # Production only
uvic.ca: # Do not put child site rules in here.
uvic.ca: # Since this file is publicly readable, do not put in URLs that contain sensitive information and are not locked down with proper access controls.
lipscosme.com: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
lipscosme.com: #
lipscosme.com: # To ban all spiders from the entire site uncomment the next two lines:
nissan.co.jp: #container {width: 1200px; margin: 0 auto;}
nissan.co.jp: #head {margin-bottom: 80px;}
nissan.co.jp: #rogo {margin-left: 6.5px;}
nissan.co.jp: #foot {width: 1200px; height: 30px; border-top: 1px solid #999999; margin:120px auto 0; text-align: right; font-size: 10px; padding: 5px;}
comcast.net: # Comcast
comcast.net: # robots.txt for http://www.comcast.net
comcast.net: # Modified on 1/25/2017
comcast.net: # Hosted on the Edge
thenewsminute.com: #
thenewsminute.com: # robots.txt
thenewsminute.com: #
thenewsminute.com: # This file is to prevent the crawling and indexing of certain parts
thenewsminute.com: # of your site by web crawlers and spiders run by sites like Yahoo!
thenewsminute.com: # and Google. By telling these "robots" where not to go on your site,
thenewsminute.com: # you save bandwidth and server resources.
thenewsminute.com: #
thenewsminute.com: # This file will be ignored unless it is at the root of your host:
thenewsminute.com: # Used: http://example.com/robots.txt
thenewsminute.com: # Ignored: http://example.com/site/robots.txt
thenewsminute.com: #
thenewsminute.com: # For more information about the robots.txt standard, see:
thenewsminute.com: # http://www.robotstxt.org/robotstxt.html
thenewsminute.com: # CSS, JS, Images
thenewsminute.com: # Directories
thenewsminute.com: # Files
thenewsminute.com: # Paths (clean URLs)
thenewsminute.com: # Paths (no clean URLs)
hotwire.com: #Disallow register/logout pages
hotwire.com: #Disallow legacy account & email pages
hotwire.com: #Disallow checkout pages
hotwire.com: #Disallow dynamic legacy deals
hotwire.com: #Disallow legacy results
hotwire.com: #HCORE-2775 Disallow things to do
hotwire.com: #Flight LOB specific rules
hotwire.com: #Car LOB specific rules
hotwire.com: #Hotel LOB specific rules
hotwire.com: #Script Specific Rules
hotwire.com: #Disallow ?chkin Hotel Infosite page links
hotwire.com: #Disallow deals page segment specific paths
hotwire.com: #HCORE-2880 Disallow request-cancellation path
salaire-brut-en-net.fr: #Disallow: /wp-includes/
salaire-brut-en-net.fr: #Disallow: /wp-content/languages/
salaire-brut-en-net.fr: #Disallow: /wp-content/plugins/
salaire-brut-en-net.fr: #Disallow: /wp-content/themes/
salaire-brut-en-net.fr: #Disallow: /wp-content/upgrade/
salaire-brut-en-net.fr: #Disallow: /wp-
salaire-brut-en-net.fr: #Disallow: /wp-content/
salaire-brut-en-net.fr: #Allow: /wp-content/uploads/
carsguide.com.au: #
carsguide.com.au: # robots.txt
carsguide.com.au: #
carsguide.com.au: # This file is to prevent the crawling and indexing of certain parts
carsguide.com.au: # of your site by web crawlers and spiders run by sites like Yahoo!
carsguide.com.au: # and Google. By telling these "robots" where not to go on your site,
carsguide.com.au: # you save bandwidth and server resources.
carsguide.com.au: #
carsguide.com.au: # This file will be ignored unless it is at the root of your host:
carsguide.com.au: # Used: http://example.com/robots.txt
carsguide.com.au: # Ignored: http://example.com/site/robots.txt
carsguide.com.au: #
carsguide.com.au: # For more information about the robots.txt standard, see:
carsguide.com.au: # http://www.robotstxt.org/robotstxt.html
carsguide.com.au: # CSS, JS, Images
carsguide.com.au: # Directories
carsguide.com.au: # Files
carsguide.com.au: # Paths (clean URLs)
carsguide.com.au: # Paths (no clean URLs)
vans.com: # robots.txt for https://www.vans.com
vans.com: # __ __ _ _ _ ___ ___ ___ __ __
vans.com: # \ \ / / /_\ | \| | / __| / __| / _ \ | \/ |
vans.com: # \ V / / _ \ | .` | \__ \ _ | (__ | (_) | | |\/| |
vans.com: # \_/ /_/ \_\ |_|\_| |___/ (_) \___| \___/ |_| |_|
vans.com: #
vans.com: #
vans.com: # || ___ ___ ___
vans.com: # / _ \ | __| | __|
vans.com: # | (_) | | _| | _|
vans.com: # \___/ |_| |_|
vans.com: #
vans.com: # _____ _ _ ___
vans.com: # |_ _| | || | | __|
vans.com: # | | | __ | | _|
vans.com: # |_| |_||_| |___|
vans.com: #
vans.com: # ___ ___ _ __ __ _ ||
vans.com: # / __| | _ \ /_\ \ \ / / | |
vans.com: # | (__ | / / _ \ \ \/\/ / | |__
vans.com: # \___| |_|_\ /_/ \_\ \_/\_/ |____|
vans.com: #
vans.com: #
vans.com: # O-\-<]:
beckershospitalreview.com: # Slow down bing
cpnl.cat: # robots.txt for https://www.cpnl.cat
segurcaixaadeslas.es: #Especialidades
radio.garden: # https://www.robotstxt.org/robotstxt.html
worldofwarships.asia: # General
worldofwarships.asia: # News
worldofwarships.asia: # Media
papaki.com: #
papaki.com: # robots.txt
papaki.com: #
papaki.com: # This file is to prevent the crawling and indexing of certain parts
papaki.com: # of your site by web crawlers and spiders run by sites like Yahoo!
papaki.com: # and Google. By telling these "robots" where not to go on your site,
papaki.com: # you save bandwidth and server resources.
papaki.com: #
papaki.com: # This file will be ignored unless it is at the root of your host:
papaki.com: # Used: http://example.com/robots.txt
papaki.com: # Ignored: http://example.com/site/robots.txt
papaki.com: #
papaki.com: # For more information about the robots.txt standard, see:
papaki.com: # http://www.robotstxt.org/robotstxt.html
papaki.com: # CSS, JS, Images
papaki.com: # Directories
papaki.com: # Files
papaki.com: # Paths (clean URLs)
papaki.com: # Paths (no clean URLs)
papaki.com: # Dynamic Directories
papaki.com: # Disallow 20210216
papaki.com: # Dynamic Files
contentkingapp.com: # Don't watch our robots.txt, watch yours instead!
contentkingapp.com: # Monitor and keep track of changes to your robots.txt with ContentKing.
contentkingapp.com: # Start your free trial at https://www.contentkingapp.com/#onboarding-url
ctee.com.tw: ## Custom made disallows 21/07/2016 - 08:52
ctee.com.tw: # from https://benza.es/robots.txt
ctee.com.tw: # Open Link Profiler
ctee.com.tw: # Mozilla/5.0+(compatible;+spbot/4.4.2;++http://OpenLinkProfiler.org/bot+)
ctee.com.tw: # http://OpenLinkProfiler.org/bot
ctee.com.tw: # SemrushBot
ctee.com.tw: # Mozilla/5.0 (compatible; SemrushBot/1.1~bl; +http://www.semrush.com/bot.html)
ctee.com.tw: # http://www.semrush.com/bot.html
ctee.com.tw: # DotBot
ctee.com.tw: # Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)
ctee.com.tw: # http://www.opensiteexplorer.org/dotbot
ctee.com.tw: # AhrefsBot
ctee.com.tw: # Mozilla/5.0 (compatible; AhrefsBot/5.1; +http://ahrefs.com/robot/)
ctee.com.tw: # http://ahrefs.com/robot
ctee.com.tw: # MJ12bot
ctee.com.tw: # Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)
ctee.com.tw: # http://www.majestic12.co.uk/bot.php
ctee.com.tw: # MojeekBot
ctee.com.tw: # Mozilla/5.0 (compatible; MojeekBot/0.6; +https://www.mojeek.com/bot.html)
ctee.com.tw: # https://www.mojeek.com/bot.html
ctee.com.tw: # YandexImages
ctee.com.tw: # Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)
ctee.com.tw: # http://yandex.com/bots
ctee.com.tw: # Shareaholicbot
ctee.com.tw: # Mozilla/5.0 (compatible; Shareaholicbot/1.0; +http://www.shareaholic.com/bot)
ctee.com.tw: # http://www.shareaholic.com
ctee.com.tw: # Baiduspider
ctee.com.tw: # Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
ctee.com.tw: # http://www.baidu.com/search/spider.html
ctee.com.tw: #User-agent: baiduspider
ctee.com.tw: #Disallow: /
ctee.com.tw: # BLEXBot
ctee.com.tw: # Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
ctee.com.tw: # http://webmeup-crawler.com 136.243.36.100
ctee.com.tw: ## END Custom made
grab.com: #Sitemaps
grab.com: #Duplicated URLs
mindbodygreen.com: # allow
elgato.com: #
elgato.com: # robots.txt
elgato.com: #
elgato.com: # This file is to prevent the crawling and indexing of certain parts
elgato.com: # of your site by web crawlers and spiders run by sites like Yahoo!
elgato.com: # and Google. By telling these "robots" where not to go on your site,
elgato.com: # you save bandwidth and server resources.
elgato.com: #
elgato.com: # This file will be ignored unless it is at the root of your host:
elgato.com: # Used: http://example.com/robots.txt
elgato.com: # Ignored: http://example.com/site/robots.txt
elgato.com: #
elgato.com: # For more information about the robots.txt standard, see:
elgato.com: # http://www.robotstxt.org/robotstxt.html
elgato.com: # CSS, JS, Images
elgato.com: # Directories
elgato.com: # Files
elgato.com: # Paths (clean URLs)
elgato.com: # Paths (no clean URLs)
mdcomputers.in: # === Lightning code start
mdcomputers.in: # === Lightning code end
ivanontech.com: # This robots.txt allows indexing of all site paths.
ivanontech.com: # See http://www.robotstxt.org/robotstxt.html for more information.
ilan.gov.tr: # Crawlers
ilan.gov.tr: # Sitemap Files
ilan.gov.tr: # Sitemap: https://www.ilan.gov.tr/sitemap/daily.xml
ilan.gov.tr: # Sitemap: https://www.ilan.gov.tr/sitemap/ads.xml
socialbookmarkingmentor.com: # 1) this filename (robots.txt) must stay lowercase
socialbookmarkingmentor.com: # 2) this file must be in the servers root directory
socialbookmarkingmentor.com: # ex: http://www.mydomain.com/pliklisubfolder/ -- you must move the robots.txt from
socialbookmarkingmentor.com: # /pliklisubfolder/ to the root folder for http://www.mydomain.com/
socialbookmarkingmentor.com: # you must then add your subfolder to each 'Disallow' below
socialbookmarkingmentor.com: # ex: Disallow: /cache/ becomes Disallow: /pliklisubfolder/cache/
nav.com: # Custom disallow rules
fcc.gov: #
fcc.gov: # robots.txt
fcc.gov: #
fcc.gov: # This file is to prevent the crawling and indexing of certain parts
fcc.gov: # of your site by web crawlers and spiders run by sites like Yahoo!
fcc.gov: # and Google. By telling these "robots" where not to go on your site,
fcc.gov: # you save bandwidth and server resources.
fcc.gov: #
fcc.gov: # This file will be ignored unless it is at the root of your host:
fcc.gov: # Used: http://example.com/robots.txt
fcc.gov: # Ignored: http://example.com/site/robots.txt
fcc.gov: #
fcc.gov: # For more information about the robots.txt standard, see:
fcc.gov: # http://www.robotstxt.org/robotstxt.html
fcc.gov: # CSS, JS, Images
fcc.gov: # Directories
fcc.gov: # Files
fcc.gov: # Paths (clean URLs)
fcc.gov: # Paths (no clean URLs)
moe.gov.my: # If the Joomla site is installed within a folder
moe.gov.my: # eg www.example.com/joomla/ then the robots.txt file
moe.gov.my: # MUST be moved to the site root
moe.gov.my: # eg www.example.com/robots.txt
moe.gov.my: # AND the joomla folder name MUST be prefixed to all of the
moe.gov.my: # paths.
moe.gov.my: # eg the Disallow rule for the /administrator/ folder MUST
moe.gov.my: # be changed to read
moe.gov.my: # Disallow: /joomla/administrator/
moe.gov.my: #
moe.gov.my: # For more information about the robots.txt standard, see:
moe.gov.my: # http://www.robotstxt.org/orig.html
moe.gov.my: #
moe.gov.my: # For syntax checking, see:
moe.gov.my: # http://tool.motoricerca.info/robots-checker.phtml
tipranks.com: # robotstxt.org
careerride.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
careerride.com: #content{margin:0 0 0 2%;position:relative;}
bancaynegocios.com: ####### Disallow: /
nationwide.com: # For domain: https://www.nationwide.com/
nationwide.com: # Last updated: 8/26/2020
nxp.com.cn: #
nxp.com.cn: # robots.txt for http://www.w3.org/
nxp.com.cn: #
nxp.com.cn: # $Id: robots.txt,v 1.22 2002/04/18 20:23:04 ted Exp $
nxp.com.cn: #
downyi.com: #
downyi.com: # robots.txt for www.downyi.com
downyi.com: #
hxnews.com: #
hxnews.com: # robots.txt for hxnews
hxnews.com: #
linuxprobe.com: # Forum regulated by the Ministry of Public Security.
linuxprobe.com: # Do not attempt to approach the invasion . thanks .
wifi.id: # https://www.robotstxt.org/robotstxt.html
punjabkesari.in: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
punjabkesari.in: #content{margin:0 0 0 2%;position:relative;}
mybroadband.co.za: # Disallow: /*.html?print$
pagomiscuentas.com: #
pagomiscuentas.com: # robots.txt
pagomiscuentas.com: #
pagomiscuentas.com: # This file is to prevent the crawling and indexing of certain parts
pagomiscuentas.com: # of your site by web crawlers and spiders run by sites like Yahoo!
pagomiscuentas.com: # and Google. By telling these "robots" where not to go on your site,
pagomiscuentas.com: # you save bandwidth and server resources.
pagomiscuentas.com: #
pagomiscuentas.com: # This file will be ignored unless it is at the root of your host:
pagomiscuentas.com: # Used: http://example.com/robots.txt
pagomiscuentas.com: # Ignored: http://example.com/site/robots.txt
pagomiscuentas.com: #
pagomiscuentas.com: # For more information about the robots.txt standard, see:
pagomiscuentas.com: # http://www.robotstxt.org/robotstxt.html
pagomiscuentas.com: # CSS, JS, Images
pagomiscuentas.com: # Directories
pagomiscuentas.com: # Files
pagomiscuentas.com: # Paths (clean URLs)
pagomiscuentas.com: # Paths (no clean URLs)
kuantokusta.pt: # File robots.txt
kuantokusta.pt: # FULL access (Google Adsense)
kuantokusta.pt: # SITEMAP FILES
namepros.com: # vim: ft=robots
namepros.com: # Mediapartners-Google and AdsBot-Google ignore wildcard groups.
namepros.com: ##
namepros.com: # Inaccessible by crawlers because of authentication requirements
namepros.com: ##
namepros.com: # Require login
namepros.com: # Require POST
namepros.com: # Require login often enough to cause problems
namepros.com: # Transient pages
namepros.com: ##
namepros.com: # Only intended for real, authentic, genuine homo sapiens
namepros.com: ##
namepros.com: ##
namepros.com: # Internal stuff.
namepros.com: ##
namepros.com: # If you're a hacker, go straight to /internal/ because it's obviously the most vulnerable.
namepros.com: # That's where we hide our ion cannon. (We have cake!)
namepros.com: ##
namepros.com: # Bots that ignore *
namepros.com: ##
namepros.com: ##
namepros.com: # Less-than-intelligent bots that can't properly parse constructs such as rel="nofollow", X-Robots-Tag, or base tags.
namepros.com: # They also tend not to deduplicate URLs (so if a link to /misc/style appears on every page, they'll crawl it once for each page).
namepros.com: # If your crawler appears here, it's probably snowballing, resulting in a ridiculous number of requests.
namepros.com: ##
namepros.com: # THE CAKE IS A LIE
namepros.com: # THE CAKE IS A LIE
namepros.com: # THE CAKE IS^D
appointy.com: # Block Uptime robot
linebiz.com: #
linebiz.com: # robots.txt
linebiz.com: #
linebiz.com: # This file is to prevent the crawling and indexing of certain parts
linebiz.com: # of your site by web crawlers and spiders run by sites like Yahoo!
linebiz.com: # and Google. By telling these "robots" where not to go on your site,
linebiz.com: # you save bandwidth and server resources.
linebiz.com: #
linebiz.com: # This file will be ignored unless it is at the root of your host:
linebiz.com: # Used: http://example.com/robots.txt
linebiz.com: # Ignored: http://example.com/site/robots.txt
linebiz.com: #
linebiz.com: # For more information about the robots.txt standard, see:
linebiz.com: # http://www.robotstxt.org/robotstxt.html
linebiz.com: # CSS, JS, Images
linebiz.com: # Directories
linebiz.com: # Files
linebiz.com: # Paths (clean URLs)
linebiz.com: # Paths (no clean URLs)
vitalydesign.com: # we use Shopify as our ecommerce platform
vitalydesign.com: # Google adsbot ignores robots.txt unless specifically named!
fontspring.com: # Whitelisted user-agents are allowed
fontspring.com: #disallows language and tag pages with pagination
tori.fi: # It is expressly forbidden to use spiders or other
tori.fi: # automated methods to access tori.fi. Only if tori.fi
tori.fi: # has given special permit such access is allowed.
tori.fi: ## Archive.org
tori.fi: ## theTradeDesk
tori.fi: ## Common list for most search engines
kaidee.com: #Last update 10/02/2021
kaidee.com: #Disallow all of parameters
kaidee.com: # Disable 'en' pages while still POC
dujiza.com: #
dujiza.com: # robots.txt for EmpireCMS
dujiza.com: #
edudisk.cn: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
edudisk.cn: #content{margin:0 0 0 2%;position:relative;}
feizui.com: #
feizui.com: # robots.txt for EmpireCMS
feizui.com: #
game773.com: # Robots.txt file from http://www.game773.com
game773.com: # All robots will spider the domain
gujarat1.wordpress.com: # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
gujarat1.wordpress.com: # Please see https://developer.wordpress.com/docs/firehose/ for more details.
gujarat1.wordpress.com: # This file was generated on Mon, 13 Apr 2020 06:47:57 +0000
headphoneclub.com: #
headphoneclub.com: # robots.txt for Discuz! X3
headphoneclub.com: #
idtag.cn: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
idtag.cn: #content{margin:0 0 0 2%;position:relative;}
iemate.com: #
iemate.com: # robots.txt for EmpireCMS
iemate.com: #
itstrike.cn: #
itstrike.cn: # robots.txt for EmpireCMS
itstrike.cn: #
jiaoyizhe.com: #
jiaoyizhe.com: # robots.txt for Discuz! X3
jiaoyizhe.com: #
jobbaidu.com: #
jobbaidu.com: # robots.txt for jobbaidu.com
jobbaidu.com: #
josmith1845.wordpress.com: # This file was generated on Thu, 11 Feb 2021 13:28:17 +0000
kumi.cn: # robots.txt generated at http://tool.chinaz.com/robots/
laawoo.com: #mask_div {background:none repeat scroll 0 0 #000000; left:0; opacity:0.1;filter:Alpha(Opacity=10); -moz-opacity:0.1; position:absolute; top:0;}
laawoo.com: #research_protocols {border:0;position:absolute;background:transparent none repeat scroll 0 0;}
lwlm.com: #
lwlm.com: # robots.txt for iwms
lwlm.com: #
maxviewrealty.com: #
maxviewrealty.com: # robots.txt for EmpireCMS
maxviewrealty.com: # Version 6.0
maxviewrealty.com: #
mozest.com: #
mozest.com: # robots.txt for EmpireCMS
mozest.com: #
nanjixiong.com: #
nanjixiong.com: # robots.txt for Discuz! X3
nanjixiong.com: #
north-plus.net: #
north-plus.net: # robots.txt for PHPWIND BOARD
north-plus.net: # Version 7.x
north-plus.net: #
openedu.com.cn: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
openedu.com.cn: #content{margin:0 0 0 2%;position:relative;}
qumei.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
qumei.com: #content{margin:0 0 0 2%;position:relative;}
seed-china.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
seed-china.com: #content{margin:0 0 0 2%;position:relative;}
swiper.com.cn: #wrap{
swiper.com.cn: #wrap a{
swiper.com.cn: #wrap a:hover{background-color:#3f92f0;}
szcw.cn: # robots.txt generated at http://tool.chinaz.com/robots/
tongyi.com: #container {
toutiao.io: # See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
toutiao.io: #
toutiao.io: # To ban all spiders from the entire site uncomment the next two lines:
toutiao.io: # User-agent: *
toutiao.io: # Disallow: /
uzzf.com: #topNav,#footer,#page,#container{width:960px;display:block;margin:0 auto;clear:both;}
uzzf.com: #new_menu li { height:32px; font-size:16px; color:#438a32; font-family:'∫⁄ÃÂ'; line-height:32px; text-align:center; margin-bottom:3px; background: url(/skin/gr/images/class_menu.gif) no-repeat 0 -32px; letter-spacing:8px;cursor:pointer;}
uzzf.com: #new_menu li a { text-decoration:none; color:#438a32; padding-left:8px;}
uzzf.com: #new_menu .active, #new_menu .hover {background-position:0 0;}
uzzf.com: #new_menu .active a, #new_menu .hover a {color:#fff;}
wzaobao.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
wzaobao.com: #content{margin:0 0 0 2%;position:relative;}
xiangrikui.com: # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
xiangrikui.com: #
xiangrikui.com: # To ban all spiders from the entire site uncomment the next two lines:
xlobo.com: #header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
xlobo.com: #content{margin:0 0 0 2%;position:relative;}
xq0757.com: #
xq0757.com: # robots.txt for PHPWind
xq0757.com: # Version 8.7
xq0757.com: #
yousheng8.com: #page{width:910px; padding:20px 20px 40px 20px; margin-top:80px;}
sitel.com.mk: #
sitel.com.mk: # robots.txt
sitel.com.mk: #
sitel.com.mk: # This file is to prevent the crawling and indexing of certain parts
sitel.com.mk: # of your site by web crawlers and spiders run by sites like Yahoo!
sitel.com.mk: # and Google. By telling these "robots" where not to go on your site,
sitel.com.mk: # you save bandwidth and server resources.
sitel.com.mk: #
sitel.com.mk: # This file will be ignored unless it is at the root of your host:
sitel.com.mk: # Used: http://example.com/robots.txt
sitel.com.mk: # Ignored: http://example.com/site/robots.txt
sitel.com.mk: #
sitel.com.mk: # For more information about the robots.txt standard, see:
sitel.com.mk: # http://www.robotstxt.org/robotstxt.html
sitel.com.mk: # CSS, JS, Images
sitel.com.mk: # Directories
sitel.com.mk: # Files
sitel.com.mk: # Paths (clean URLs)
sitel.com.mk: # Paths (no clean URLs)
strato.de: # robots.txt file
strato.de: # fuer www.strato.de
splunk.com: # Splunk Documentation
splunk.com: # Splunk Documentation
iamtxt.com: #
iamtxt.com: # robots.txt for EmpireCMS
iamtxt.com: #
mediaocean.com: #
mediaocean.com: # robots.txt
mediaocean.com: #
mediaocean.com: # This file is to prevent the crawling and indexing of certain parts
mediaocean.com: # of your site by web crawlers and spiders run by sites like Yahoo!
mediaocean.com: # and Google. By telling these "robots" where not to go on your site,
mediaocean.com: # you save bandwidth and server resources.
mediaocean.com: #
mediaocean.com: # This file will be ignored unless it is at the root of your host:
mediaocean.com: # Used: http://example.com/robots.txt
mediaocean.com: # Ignored: http://example.com/site/robots.txt
mediaocean.com: #
mediaocean.com: # For more information about the robots.txt standard, see:
mediaocean.com: # http://www.robotstxt.org/robotstxt.html
mediaocean.com: # CSS, JS, Images
mediaocean.com: # Directories
mediaocean.com: # Files
mediaocean.com: # Paths (clean URLs)
mediaocean.com: # Paths (no clean URLs)
emvolio.gov.gr: #
emvolio.gov.gr: # robots.txt
emvolio.gov.gr: #
emvolio.gov.gr: # This file is to prevent the crawling and indexing of certain parts
emvolio.gov.gr: # of your site by web crawlers and spiders run by sites like Yahoo!
emvolio.gov.gr: # and Google. By telling these "robots" where not to go on your site,
emvolio.gov.gr: # you save bandwidth and server resources.
emvolio.gov.gr: #
emvolio.gov.gr: # This file will be ignored unless it is at the root of your host:
emvolio.gov.gr: # Used: http://example.com/robots.txt
emvolio.gov.gr: # Ignored: http://example.com/site/robots.txt
emvolio.gov.gr: #
emvolio.gov.gr: # For more information about the robots.txt standard, see:
emvolio.gov.gr: # http://www.robotstxt.org/robotstxt.html
emvolio.gov.gr: # CSS, JS, Images
emvolio.gov.gr: # Directories
emvolio.gov.gr: # Files
emvolio.gov.gr: # Paths (clean URLs)
emvolio.gov.gr: # Paths (no clean URLs)
dof.gob.mx: #
sas.com: #
sas.com: # robots.txt file for www.sas.com
sas.com: #
babycare.nl: #
babycare.nl: # ____ _ _
babycare.nl: # | _ \ | | | |
babycare.nl: # | |_) | __ _| |__ _ _ ___ __ _ _ __ ___ _ __ | |
babycare.nl: # | _ < / _` | '_ \| | | |/ __/ _` | '__/ _ \ | '_ \| |
babycare.nl: # | |_) | (_| | |_) | |_| | (_| (_| | | | __/_| | | | |
babycare.nl: # |____/ \__,_|_.__/ \__, |\___\__,_|_| \___(_)_| |_|_|
babycare.nl: # __/ |
babycare.nl: # |___/
forloveandlemons.com: # we use Shopify as our ecommerce platform
forloveandlemons.com: # Google adsbot ignores robots.txt unless specifically named!
belk.com: #Non-Canonical Parameters
belk.com: #URLs
belk.com: #Pagination
belk.com: #OSS
belk.com: #Shop by Brand
belk.com: #Clearance
pu.go.id: #SigapMembangunNegeri</a></h4>
revenuquebec.ca: # robots.txt pour Revenu Québec
revenuquebec.ca: # Only allow URLs generated with RealURL
revenuquebec.ca: # L=0 is the default language
revenuquebec.ca: # Should always be protected (.htaccess)
revenuquebec.ca: # sitemap
dainikamadershomoy.com: # www.robotstxt.org/
dainikamadershomoy.co
@softplus
Copy link
Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment