@edsu
Last active April 22, 2022 17:13
#!/usr/bin/env python3
# This is an example of seeing what unique HTML webpages there are in the
# Wayback Machine for the http://myshtetl.org/ website after 2022-03-01.
from wayback import WaybackClient
wb = WaybackClient()
pages = set()
for rec in wb.search('http://myshtetl.org/', matchType='prefix', from_date='2022-03-01'):
    if 'html' in rec.mime_type and rec.status_code == 200 and rec.url not in pages:
        pages.add(rec.url)
        print(rec.url)
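
The filtering step in the loop above can be pulled out into a small function and tried without touching the network. This is only a sketch: `unique_html_urls` and the stand-in records are hypothetical and not part of the `wayback` package. It assumes each record exposes the `mime_type`, `status_code`, and `url` attributes used in the script.

```python
from types import SimpleNamespace


def unique_html_urls(records):
    """Return URLs of successful HTML captures, each once, in first-seen order.

    Hypothetical helper: `records` is any iterable of objects with
    `mime_type`, `status_code`, and `url` attributes.
    """
    seen = set()
    urls = []
    for rec in records:
        # Treat a missing MIME type as "not HTML" (a defensive assumption).
        is_html = 'html' in (rec.mime_type or '')
        if is_html and rec.status_code == 200 and rec.url not in seen:
            seen.add(rec.url)
            urls.append(rec.url)
    return urls


# Stand-in records for illustration only; these are not real search results.
sample = [
    SimpleNamespace(url='http://example.org/a.html', mime_type='text/html', status_code=200),
    SimpleNamespace(url='http://example.org/a.html', mime_type='text/html', status_code=200),
    SimpleNamespace(url='http://example.org/b.png', mime_type='image/png', status_code=200),
    SimpleNamespace(url='http://example.org/c.html', mime_type='text/html', status_code=404),
]

for url in unique_html_urls(sample):
    print(url)
```

With a live client, the iterator returned by `wb.search(...)` could be passed in place of `sample`.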