@palewire
Created December 16, 2014 19:57
Mechanize scrape with Python
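# Python 2 script: BeautifulSoup 3 provides the `BeautifulSoup` module imported here.
from mechanize import Browser
from BeautifulSoup import BeautifulSoup


def extract(html):
    """Return the data rows of the page's border=1 table as lists of cell text."""
    soup = BeautifulSoup(html)
    table = soup.find("table", border=1)
    list_of_rows = []
    # Skip the first row (the slice assumes it is a header row)
    for row in table.findAll('tr')[1:]:
        list_of_cells = []
        for cell in row.findAll('td'):
            # Drop non-breaking-space entities from the cell text
            text = cell.text.replace("&nbsp;", "")
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)
    return list_of_rows


url = 'http://palewi.re/scrape/albums/2006.html'

# Open the first page and scrape its table
mech = Browser()
page = mech.open(url)
html = page.read()
rows = extract(html)

# Follow the "Previous" link and scrape that page as well
page2 = mech.follow_link(text_regex="Previous")
html2 = page2.read()
rows2 = extract(html2)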
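
The script above depends on mechanize and BeautifulSoup 3 and needs network access. To see the shape of what `extract` returns without either, here is a minimal offline sketch that uses only the standard library's `html.parser`. It is a swapped-in technique, not the gist's method: `TableRows`, `extract_rows`, and `SAMPLE_HTML` are hypothetical names, the sample markup is made up, and the sketch assumes the page holds a single table rather than selecting one by its `border` attribute.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        # convert_charrefs=True turns &nbsp; into the character "\xa0"
        super().__init__(convert_charrefs=True)
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a <td> (such as whitespace between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Strip non-breaking spaces and surrounding whitespace
            text = "".join(self._cell).replace("\xa0", "").strip()
            self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return every table row except the first as a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    # Mirror the [1:] slice in extract(): skip the assumed header row
    return parser.rows[1:]


# Made-up sample markup, used only for illustration
SAMPLE_HTML = """
<table border="1">
  <tr><td>Rank</td><td>Artist</td><td>Album</td></tr>
  <tr><td>1</td><td>&nbsp;Artist A</td><td>Album One</td></tr>
  <tr><td>2</td><td>Artist B</td><td>Album Two&nbsp;</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
print(rows)
```

Each row comes back as a list of strings, one per cell, so the result can be passed straight to something like `csv.writer(...).writerows(rows)`.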