@AlJohri
Created January 16, 2019 05:41
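import requests
from bs4 import BeautifulSoup

# UCSB American Presidency Project index of presidential candidate debates, 1960-2016
url = 'https://www.presidency.ucsb.edu/documents/presidential-documents-archive-guidebook/presidential-candidates-debates-1960-2016'
response = requests.get(url)
response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
soup = BeautifulSoup(response.text, 'html.parser')

# The debate rows sit three tables deep inside the page body.
for tr in soup.select('.field-body > table > tbody > tr > td > table > tbody > tr > td > table > tbody > tr'):
    columns = tr.select('td')
    if len(columns) == 1:
        continue  # single-cell rows are not debate entries
    elif any(x.get('colspan') or x.get('rowspan') or x.select('img') for x in columns):
        continue  # merged cells or images mark layout rows, not debates
    elif len(columns) == 3:
        year, date, link = columns  # three-cell rows carry the year in the first cell
    elif len(columns) == 2:
        date, link = columns
    else:
        continue  # unexpected shape; skip rather than reuse the previous row's cells
    anchor = link.find('a')
    if anchor is None:
        continue  # no link in this row
    print(date.text, anchor.get('href'))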
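The row-filtering rules are easier to check offline, without fetching the page. The sketch below restates them as a pure function over a hypothetical `Cell` record (text, first link, and layout flags), so the logic can be exercised on hand-built rows. Two things here are assumptions, not part of the original script: the `Cell`/`extract_debates` names, and carrying the year from the most recent three-cell row forward to the two-cell rows that follow it (the script itself discards `year`). The sample rows and hrefs are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cell:
    """Hypothetical stand-in for a parsed <td>: its text, first link, and layout flags."""
    text: str
    href: Optional[str] = None
    colspan: bool = False
    rowspan: bool = False
    has_img: bool = False


def extract_debates(rows: List[List[Cell]]) -> List[Tuple[str, str, str]]:
    """Apply the scraper's row filters and return (year, date, href) tuples.

    Assumption: a two-cell row belongs to the year of the most recent
    three-cell row seen before it.
    """
    results = []
    current_year = ""
    for cells in rows:
        if len(cells) == 1:
            continue  # single-cell rows are not debate entries
        if any(c.colspan or c.rowspan or c.has_img for c in cells):
            continue  # merged-cell or image rows are layout, not debates
        if len(cells) == 3:
            year_cell, date_cell, link_cell = cells
            current_year = year_cell.text.strip()
        elif len(cells) == 2:
            date_cell, link_cell = cells
        else:
            continue  # unexpected shape
        if link_cell.href is None:
            continue  # no link in this row
        results.append((current_year, date_cell.text.strip(), link_cell.href))
    return results


# Hypothetical rows mimicking the page layout.
rows = [
    [Cell("2016")],
    [Cell("2016"), Cell("Sept. 26"), Cell("Debate", href="/hypothetical/debate-1")],
    [Cell("Oct. 9"), Cell("Debate", href="/hypothetical/debate-2")],
    [Cell("", has_img=True), Cell("spacer")],
    [Cell("Oct. 19"), Cell("no link")],
]
print(extract_debates(rows))
# [('2016', 'Sept. 26', '/hypothetical/debate-1'), ('2016', 'Oct. 9', '/hypothetical/debate-2')]
```

To feed it real data, each `td` from the scraper could be mapped to a `Cell` with `td.get_text()`, `td.find('a')`, `td.get('colspan')`, `td.get('rowspan')` and `td.select('img')`, which keeps the network and parsing code separate from the filtering rules.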