@carlos-aguayo
Created February 19, 2018 21:19
Given an HTML file, find all http(s) links in it and take a screenshot of each linked page.
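from selenium import webdriver
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2); on Python 3 use: from bs4 import BeautifulSoup
import requests

# Run Chrome headless with a fixed window size so screenshots are consistent.
options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(chrome_options=options)  # newer Selenium releases take options= instead
driver.set_window_size(1280, 1600)

filename = 'NOTICE.html'
soup = BeautifulSoup(open(filename, 'rb'))

for link in soup.findAll('a'):
    href = link.get('href')
    # Skip anchors without an href, and non-http(s) links such as mailto: or #fragment.
    if href and href.startswith("http"):
        print("fetching: {}".format(href))
        # The HEAD request is only used to record the status code in the filename.
        r = requests.head(href)
        driver.get(href)
        # Build a filename from the URL: drop the scheme and replace slashes.
        # Note: '\\' is a path separator on Windows; pick another character there.
        href = href.replace('http://', '')
        href = href.replace('https://', '')
        href = href.replace('/', '\\')
        driver.save_screenshot('{}-{}.png'.format(r.status_code, href))

driver.quit()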
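
The script builds the screenshot filename by stripping the scheme and swapping `/` for `\`, which leaves characters such as `?` and `:` in the name and produces a path separator on Windows. Below is a minimal sketch of one alternative, assuming a hypothetical helper named `screenshot_name` (it is not part of the original script) that keeps only a conservative set of characters:

```python
import re

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def screenshot_name(status_code, url):
    """Hypothetical helper: build a filesystem-safe PNG name from a URL."""
    parsed = urlparse(url)
    # Drop the scheme; keep host, path and (if present) the query string.
    rest = parsed.netloc + parsed.path
    if parsed.query:
        rest += '?' + parsed.query
    # Collapse every run of characters outside [A-Za-z0-9._-] into one '_'.
    safe = re.sub(r'[^A-Za-z0-9._-]+', '_', rest).strip('_')
    return '{}-{}.png'.format(status_code, safe)
```

In the loop this could stand in for the three `replace` calls, for example `driver.save_screenshot(screenshot_name(r.status_code, href))`. Two different URLs can map to the same name under this scheme, so it trades uniqueness for portability.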