@dusekdan
Created April 21, 2019 04:43
Scrapes the Mueller report from Wired's embedded article reader.
"""
Wired article with embedded web reader: https://www.wired.com/story/mueller-report-russia-redacted-trump-barr-read/
How to use this script:
- Install Python 3 and the requests package (run: 'pip install requests')
- Create a "report" folder in the same directory as this script
- Run: 'python MuellerReportScraper.py'
"""
import os
import requests

# Each page of the report is served as a separate GIF; {page_no} is the page number.
url = "https://assets.documentcloud.org/documents/5955214/pages/Mueller-report-p{page_no}-large.gif"
folder = "./report"
max_page = 448

for i in range(1, max_page + 1):
    try:
        r = requests.get(
            url.replace('{page_no}', str(i))
        )
        # Treat HTTP errors (e.g. 404) as failures instead of saving the error body as a GIF.
        r.raise_for_status()
        with open(
            os.path.join(folder, "%s.gif" % i), 'wb'
        ) as f:
            f.write(r.content)
        print("[SUCCESS] Scraped page %s" % i)
    except requests.exceptions.RequestException as e:
        print("[ERROR] Downloading page %s (%s)" % (i, e))
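A possible extension, not part of the original script: because the loop always starts at page 1, an interrupted run re-downloads everything. The sketch below shows one way to work out which pages are still missing so that a rerun could fetch only those. The helper names `page_url` and `pending_pages` are hypothetical, and treating a zero-byte file as "not downloaded" is an assumption.

```python
import os

# URL template copied from the script above.
URL_TEMPLATE = "https://assets.documentcloud.org/documents/5955214/pages/Mueller-report-p{page_no}-large.gif"


def page_url(page_no):
    """Return the image URL for the given page number (hypothetical helper)."""
    return URL_TEMPLATE.format(page_no=page_no)


def pending_pages(folder, max_page):
    """Return the page numbers whose GIF is missing or empty in `folder`.

    Assumes files are named "<page>.gif", as in the script above.
    """
    pending = []
    for page_no in range(1, max_page + 1):
        path = os.path.join(folder, "%s.gif" % page_no)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            pending.append(page_no)
    return pending
```

With these in place, the main loop could iterate over `pending_pages(folder, max_page)` instead of `range(1, max_page + 1)` and request `page_url(i)` for each page.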