@MicBrain
Last active October 7, 2016 20:14
The general structure of a robots.txt file, used for controlling search engine spiders
# The file should be plain ASCII text. Put it, named robots.txt, in your site's root directory.
# If you want to exclude all web spiders from the entire server, you can use this:
User-agent: *
Disallow: /
# If you want to allow all web spiders to access everything, you can use:
User-agent: *
Disallow:
# If you don't want web spiders to access directory1 and directory2, you can use this:
User-agent: *
Disallow: /directory1/
Disallow: /directory2/
# If you don't want Google to collect images from your website, use this:
User-agent: Googlebot-Image
Disallow: /
# If you don't want web spiders to access a particular file, you can use this:
User-agent: *
Disallow: /images/particular_file.jpg
# If you don't want web spiders to access file1 and file2 from directory1, you can use this:
User-agent: *
Disallow: /~directory1/file1.html
Disallow: /~directory1/file2.html
# If you want to exclude a certain search engine, use this:
User-agent: BadSearchEngine
Disallow: /
# If you want to allow only one search engine to access your content, then use this:
User-agent: GoodSearchEngine
Disallow:
User-agent: *
Disallow: /
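A quick way to check how a crawler would interpret rules like these is Python's standard-library robots.txt parser. This sketch tests the last example above (allow only one search engine); the example.com URLs are hypothetical, and `GoodSearchEngine` is the placeholder name used above, not a real crawler.

```python
from urllib.robotparser import RobotFileParser

# The "allow only GoodSearchEngine" rules from the example above.
rules = """\
User-agent: GoodSearchEngine
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# GoodSearchEngine may fetch anything; every other agent is blocked.
print(rp.can_fetch("GoodSearchEngine", "http://example.com/page.html"))  # True
print(rp.can_fetch("OtherBot", "http://example.com/page.html"))          # False
```

Note that robots.txt is advisory: well-behaved crawlers honor it, but it is not an access control mechanism.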