@henshaw
Last active June 30, 2024 15:47
Disallow all major GenAI bots from crawling any page for LLM training
User-agent: Amazonbot
User-agent: Anthropic-ai
User-agent: Applebot-Extended
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: Cohere-ai
User-agent: DataForSeoBot
User-agent: FacebookBot
User-agent: Google-Extended
User-agent: GPTBot
User-agent: ImagesiftBot
User-agent: Magpie-crawler
User-agent: Omgili
User-agent: Omgilibot
User-agent: Peer39_crawler
User-agent: Peer39_crawler/1.0
User-agent: PerplexityBot
User-agent: YouBot
Disallow: /

henshaw commented Sep 29, 2023

henshaw commented Jun 30, 2024

Grouped the list of user-agents to shorten the code. H/T to @chapter42
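
For anyone who wants to sanity-check the grouped form, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes an abbreviated copy of the rules (three of the listed agents) and a hypothetical URL on `example.com`; the helper name `is_allowed` is made up for illustration. It only shows how one parser reads a run of `User-agent` lines that share a single `Disallow` rule, not how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the grouped rules (three of the listed agents only).
ROBOTS_TXT = """\
User-agent: CCBot
User-agent: ClaudeBot
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url.

    Hypothetical helper: parses the text in memory, no network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/any/page"  # hypothetical URL
    for agent in ("CCBot", "ClaudeBot", "GPTBot", "Googlebot"):
        print(agent, "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked")
```

With this parser, every agent named in the group is reported as blocked, while an agent that is not listed (Googlebot here) stays allowed because the file has no `User-agent: *` group.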
