@longlostnick
Last active August 10, 2017 18:18
Scrape a list of URLs from a file
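import io
import urllib.request

# Read one URL per line, skipping blank lines and stripping trailing newlines.
with open("/Users/<user>/Downloads/random_slugs.txt", "r") as url_file:
    pages_to_scrape = [line.strip() for line in url_file if line.strip()]

for url in pages_to_scrape:
    print(url)
    # Use the last path segment of the URL as the output file name.
    slug = url.split('/')[-1]
    # urlopen replaces FancyURLopener, which has been deprecated since Python 3.3.
    with urllib.request.urlopen(url) as response:
        html = response.read().decode('utf8')
    with io.open("/Users/<user>/Downloads/scraped/{0}.html".format(slug), 'w', encoding='utf8') as out_file:
        out_file.write(html)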
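
The `url.split('/')[-1]` expression in the script yields an empty slug when a URL ends with a slash, and keeps any query string in the file name. A minimal sketch of a more forgiving alternative follows; the helper name `slug_from_url` and the `"index"` fallback are assumptions for illustration, not part of the original gist.

```python
from urllib.parse import urlparse


def slug_from_url(url, default="index"):
    """Return the last non-empty path segment of url, or default if there is none."""
    # urlparse separates the path from the query string and fragment.
    path = urlparse(url.strip()).path
    # Drop empty segments so a trailing slash does not produce an empty slug.
    segments = [segment for segment in path.split("/") if segment]
    return segments[-1] if segments else default
```

It could stand in for the slug line in the loop, for example `slug = slug_from_url(url)`.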