@macloo
Created March 31, 2019 21:31
For Madison March 2019
import requests
from bs4 import BeautifulSoup
url = "https://www.myfloridahouse.gov/Sections/Representatives/representatives.aspx"
# browser-like request headers
hdr = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'none',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive',
}
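# fetch the page with the headers above
session = requests.Session()
req = session.get(url, headers=hdr)
req.raise_for_status()  # stop here if the server returned an error status
bs = BeautifulSoup(req.text, "html5lib")
# collect every div with class "team-box" (one per representative)
reps = bs.find_all('div', {'class' : 'team-box'})
# print(len(reps))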
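# create new empty file (overwrites any existing one); write one href per line
with open('list_of_urls.txt', 'w') as newfile:
    for rep in reps:
        link = rep.find('a')
        # skip boxes that have no <a> tag, or an <a> tag without an href
        if link is not None and 'href' in link.attrs:
            newfile.write(link.attrs['href'] + '\n')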
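
The hrefs written to list_of_urls.txt may be site-relative paths rather than full URLs, depending on the page's markup. If so, a minimal sketch like the one below could turn them into absolute URLs before a follow-up scrape. The helper name `to_absolute` and the sample hrefs are assumptions made up for illustration; they do not come from the original script or from the real page.

```python
from urllib.parse import urljoin

# Same page the script scrapes; used here only as the base for resolving paths.
BASE_URL = "https://www.myfloridahouse.gov/Sections/Representatives/representatives.aspx"


def to_absolute(hrefs, base_url=BASE_URL):
    """Resolve each href against base_url, skipping blanks and duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        href = href.strip()
        if not href:
            continue  # ignore empty lines
        full = urljoin(base_url, href)  # already-absolute URLs pass through unchanged
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result


# Hypothetical hrefs, invented for this example (not taken from the real page).
sample = [
    "/Sections/Representatives/details.aspx?MemberId=1",
    "details.aspx?MemberId=2",
    "",
]
absolute_sample = to_absolute(sample)
```

To use it with the file the script produces, read list_of_urls.txt line by line and pass the lines to `to_absolute`.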