@macloo
Created April 1, 2019 13:33
For Sarah April 2019 - part 2
from bs4 import BeautifulSoup
from selenium import webdriver
import time
driver = webdriver.Chrome('/Users/mcadams/Documents/python/scraping2019/chromedriver')
# testing the 'C' page only
driver.get('https://www.usa.gov/federal-agencies/c')
# pause because page is slow to load
time.sleep(5)
html = driver.page_source
bs = BeautifulSoup(html, "html5lib")
# close the automated Chrome window
driver.quit()
# get all a elements and test by printing
letter_list = bs.find( 'ul', {'class':'one_column_bullet'} )
letter_urls = letter_list.find_all('a')
print(len(letter_urls))
print(letter_urls[0])
print(letter_urls[12])
print( letter_urls[-1] )  # negative index gets the last item
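
One refinement worth noting: the fixed time.sleep(5) pause wastes time when the page loads quickly and can still fail when it loads slowly. A minimal sketch using Selenium's explicit waits, assuming the ul.one_column_bullet list is what signals that the page is ready, could replace the sleep between driver.get() and driver.page_source:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# wait up to 10 seconds for the agency list to appear, then continue
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'ul.one_column_bullet'))
)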
macloo commented Apr 1, 2019

Again, this prints whole <a> tags rather than extracting the hrefs, but note that the code is almost identical to the previous script. A sketch of pulling the hrefs out follows.
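
To get the hrefs themselves, one approach (a minimal sketch, assuming the anchors carry relative paths such as /federal-agencies/...) is to call .get('href') on each tag and use urljoin to build absolute URLs:

from urllib.parse import urljoin

base = 'https://www.usa.gov'
for a in letter_urls:
    href = a.get('href')  # returns None if the tag has no href attribute
    if href:
        print(urljoin(base, href))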
