Last active September 29, 2015 03:57
Scrape Wisconsin state representative bios


A Python scraper that pulls Wisconsin state senator and state representative district contact information and biographies into a text file or CSV.

import requests
import lxml.html

# Highest district number to fetch
endpoint = 99

# Open the output text file; the with block closes it when the loop ends
with open('output.txt', 'w', encoding='utf-8') as output:
    for district in range(1, endpoint + 1):
        # Request the bio page for this district and assign the response to r.
        # The base URL is blank in the source and must be supplied before running.
        r = requests.get('' + str(district) + '&display=bio')
        # Create an element tree from the response content
        tree = lxml.html.fromstring(r.content)
        # Search the tree for the bio spans (cssselect() needs the cssselect package)
        elements = tree.cssselect("div.indent span")
        # For each matching element
        for el in elements:
            # Set data to the element's text, with surrounding whitespace trimmed
            data = el.text_content().strip()
            # Display the data
            print(data)
            # Write the data to the file, one element per line
            output.write(data + '\n')
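
The description mentions CSV output, but the script only writes plain text. The following is a minimal sketch of how the CSV variant might look, assuming each district yields a single bio string. The function name `write_bios_csv`, the column names `district` and `bio`, and the sample rows are all hypothetical and are not part of the original gist.

```python
import csv
import io


def write_bios_csv(rows, fileobj):
    """Write (district, bio) pairs to fileobj as CSV with a header row."""
    writer = csv.writer(fileobj)
    writer.writerow(["district", "bio"])  # hypothetical column names
    for district, bio in rows:
        # Collapse runs of whitespace so each bio stays on one CSV line.
        writer.writerow([district, " ".join(bio.split())])


# Usage with placeholder data standing in for scraped pages.
buffer = io.StringIO()
write_bios_csv([(1, "Example bio,\n with a comma"), (2, "Another bio")], buffer)
csv_text = buffer.getvalue()
```

When writing to a real file instead of an in-memory buffer, open it with `open('output.csv', 'w', newline='', encoding='utf-8')`, since the `csv` module handles line endings itself. Fields containing commas are quoted automatically by `csv.writer`.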