@nickrobson
Created June 24, 2016 02:41
EU Referendum Scraper:
#!/usr/bin/env python
# Python 2 script (uses urllib2 and the print statement).
import re
import urllib2

import bs4

BASE_URL = 'http://www.bbc.co.uk/news/politics/eu_referendum/results/local/'
# Matches a vote count with optional thousands separators, e.g. "12,345".
VOTES = re.compile(r'[0-9]+(?:,[0-9]+)*')


def get_urls():
    # One results page per letter: BASE_URL + 'a' through BASE_URL + 'z'.
    return [BASE_URL + chr(x) for x in range(97, 123)]


for url in get_urls():
    try:
        content = urllib2.urlopen(url).read()
        soup = bs4.BeautifulSoup(content, 'html.parser')
        # Each voting area is rendered as one result bar.
        results = soup.find_all('div', class_='eu-ref-result-bar')
        for result in results:
            name = result.find('h3').get_text()
            leave = result.find('div', class_='eu-ref-result-bar__party--leave')
            remain = result.find('div', class_='eu-ref-result-bar__party--remain')
            lvotes = leave.find('div', class_='eu-ref-result-bar__votes').get_text().strip()
            rvotes = remain.find('div', class_='eu-ref-result-bar__votes').get_text().strip()
            # Keep only the numeric part of each count; fall back to the raw text.
            lv = VOTES.search(lvotes)
            if lv:
                lvotes = lv.group(0)
            rv = VOTES.search(rvotes)
            if rv:
                rvotes = rv.group(0)
            # Print one row per area: name | leave votes | remain votes.
            print name.strip(), '|', lvotes, '|', rvotes
    except urllib2.HTTPError:
        # Skip any page that returns an HTTP error.
        pass
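The script prints each count as a comma-grouped string, so the rows need one more step before they can be summed or compared. Below is a minimal sketch of that step, assuming Python 3. The helpers `parse_votes` and `parse_row` are hypothetical names and not part of the original gist; the regex is copied from the script. The sample row in the usage note is made up for illustration, not real referendum data.

```python
import re

# Same pattern as the VOTES regex in the script.
VOTES = re.compile(r'[0-9]+(?:,[0-9]+)*')


def parse_votes(text):
    """Hypothetical helper: first vote count found in text as an int, else None."""
    match = VOTES.search(text)
    if match is None:
        return None
    # Drop the thousands separators before converting.
    return int(match.group(0).replace(',', ''))


def parse_row(line):
    """Hypothetical helper: split one 'name | leave | remain' row of output."""
    # rsplit keeps any '|' inside the area name intact.
    name, leave, remain = [part.strip() for part in line.rsplit('|', 2)]
    return name, parse_votes(leave), parse_votes(remain)
```

For example, `parse_row('Example Area | 12,345 | 6,789')` would give a tuple of the name and two integers that can be totalled across rows.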