Created
March 31, 2019 21:31
For Madison March 2019
import requests
from bs4 import BeautifulSoup

url = "https://www.myfloridahouse.gov/Sections/Representatives/representatives.aspx"

# browser-like headers sent with the request
hdr = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}

session = requests.Session()
req = session.get(url, headers=hdr)
# the html5lib parser must be installed separately (pip install html5lib)
bs = BeautifulSoup(req.text, "html5lib")

# find every div with the class 'team-box'
reps = bs.find_all('div', {'class' : 'team-box'})
# print(len(reps))

# create new empty file and write one href per line
with open('list_of_urls.txt', 'w') as newfile:
    for rep in reps:
        link = rep.find('a')
        # skip a div with no link in it, or a link with no href
        if link is not None and 'href' in link.attrs:
            newfile.write(link.attrs['href'] + '\n')
Based on these examples:
Using headers w/o Selenium - https://github.com/macloo/python-beginners/tree/master/web_scraping/more-from-mitchell#sending-http-headers-in-your-script
Using the Requests library (pip-install it) - https://github.com/REMitchell/python-scraping/blob/master/v1/chapter12/1-headers.py
Write into a new text file - https://github.com/macloo/python-beginners/blob/master/week03/copy_into_new_file.py
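A possible follow-up step, as a sketch only: the script writes each href exactly as it appears in the page, so the saved lines may be relative paths rather than full URLs (I have not checked this against the live site). If they are, `urljoin` from the standard library can resolve them against the page URL. The `absolutize` function and the sample hrefs below are hypothetical, made up for illustration.

```python
from urllib.parse import urljoin

# Page the links were scraped from (same URL as in the script)
BASE_URL = "https://www.myfloridahouse.gov/Sections/Representatives/representatives.aspx"


def absolutize(hrefs, base=BASE_URL):
    """Resolve each href against base; already-absolute URLs pass through unchanged.

    Blank lines are skipped, and surrounding whitespace/newlines are stripped.
    """
    return [urljoin(base, href.strip()) for href in hrefs if href.strip()]


# Hypothetical sample values, NOT real output from the site
sample = [
    "/Sections/Representatives/details.aspx?MemberId=1\n",
    "https://example.com/page\n",
    "\n",
]

for full_url in absolutize(sample):
    print(full_url)
```

To use it on the real output, pass the open file instead of `sample`, for example `absolutize(open('list_of_urls.txt'))`, since iterating a file yields one line at a time.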