Scrape a list of URLs from a file
import urllib.request

# Read the list of URLs to scrape, one per line.
with open("/Users/<user>/Downloads/random_slugs.txt", "r") as url_file:
    pages_to_scrape = url_file.readlines()

for url in pages_to_scrape:
    url = url.strip()
    print(url)
    # Use the last path segment of the URL as the output filename.
    slug = url.split('/')[-1]
    # FancyURLopener is deprecated; urlopen is the modern equivalent.
    with urllib.request.urlopen(url) as response:
        html = response.read().decode('utf8')
    with open("/Users/<user>/Downloads/scraped/{0}.html".format(slug), 'w', encoding='utf8') as out_file:
        out_file.write(html)
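The filename for each saved page is derived from the URL's last path segment. A minimal sketch of that slug logic in isolation (the example URL is hypothetical, not from the original file):

```python
def slug_from_url(url: str) -> str:
    # Take the last '/'-separated segment and drop the trailing newline
    # that readlines() leaves on each line.
    return url.split('/')[-1].strip()

print(slug_from_url("https://example.com/posts/my-article\n"))  # prints "my-article"
```

Note that a URL ending in a trailing slash yields an empty slug, so lines in the input file should end with the slug itself.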