@edsu
Last active September 7, 2023 19:14
Uses the Wayback Machine to show (approximately) when the New York Times started telling OpenAI to stop scraping them.
#!/bin/bash
#
# Use the Internet Archive Wayback Machine to demonstrate roughly when the
# NYTimes started blocking GPTBot.
#
# See: https://www.theverge.com/2023/8/21/23840705/new-york-times-openai-web-crawler-ai-gpt
#
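# Stop at the first failed command (for example, a failed download).
set -euo pipefail

# Fetch two consecutive daily snapshots of robots.txt. The id_ suffix on the
# timestamp asks the Wayback Machine for the original, unrewritten response.
wget -q -O robots-20230817.txt https://web.archive.org/web/20230817012138id_/https://www.nytimes.com/robots.txt
wget -q -O robots-20230818.txt https://web.archive.org/web/20230818012335id_/https://www.nytimes.com/robots.txt

# Compare the two snapshots side by side (interactive; requires vim).
vimdiff robots-20230817.txt robots-20230818.txt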
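
For a non-interactive check, something like the following sketch could be run after the downloads. It is not part of the original script: the `mentions_agent` helper is a hypothetical name introduced here, and the pattern assumes the rule is written as a `User-agent: GPTBot` line.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): succeed if the given
# robots.txt file contains a "User-agent:" line naming the agent.
# The agent defaults to GPTBot; matching is case-insensitive.
mentions_agent() {
  local file="$1" agent="${2:-GPTBot}"
  grep -qi "^User-agent:[[:space:]]*${agent}" "$file"
}

# Only report when both snapshots have already been downloaded.
if [ -f robots-20230817.txt ] && [ -f robots-20230818.txt ]; then
  for f in robots-20230817.txt robots-20230818.txt; do
    if mentions_agent "$f"; then
      echo "$f: GPTBot rule present"
    else
      echo "$f: no GPTBot rule"
    fi
  done
  # diff exits with status 1 when the files differ, so do not treat that
  # as a failure.
  diff -u robots-20230817.txt robots-20230818.txt || true
fi
```

Unlike `vimdiff`, this prints a unified diff to standard output, so it should also work in a cron job or over a plain SSH session.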