
@gordonje
Last active February 19, 2020 20:48
A scraping script that runs as a single, synchronous process.
import requests
from time import sleep

# Reuse one session so TCP connections (and any cookies) persist across requests.
session = requests.Session()


def cache_page(identifier):
    # Pause between requests to avoid hammering the server.
    sleep(3)
    url = f'https://mycourts.in.gov/PORP/Search/Detail?ID={identifier}'
    r = session.get(url)
    # Write the raw response bytes to the local cache directory.
    with open(f".cache/SearchDetail/{identifier}.html", 'wb') as file:
        file.write(r.content)
    print(f'Cached content from {url}')


if __name__ == "__main__":
    # Fetch and cache detail pages for IDs 1 through 59,999, one at a time.
    for identifier in range(1, 60000):
        cache_page(identifier)
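Because the loop above re-fetches every page each time the script runs, a resume guard can make restarts cheap. This is a sketch of one approach, not part of the original gist; it assumes the same `.cache/SearchDetail/{identifier}.html` naming convention:

```python
import os


def is_cached(identifier, cache_dir=".cache/SearchDetail"):
    # Skip identifiers whose HTML has already been saved to disk,
    # so an interrupted run can pick up where it left off.
    return os.path.exists(os.path.join(cache_dir, f"{identifier}.html"))
```

In the main loop, you would call `cache_page(identifier)` only when `is_cached(identifier)` is false.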