@aphoenix
Created March 5, 2013 18:49
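# Python 2 script: print the headings and e-mail links from each
# Folketing member's profile page.
import urllib2, html5lib, re, sys
from bs4 import BeautifulSoup

# Fetch the member index (all letters, 100 results per page).
page = urllib2.urlopen('http://www.ft.dk/Folketinget/searchResults.aspx?letter=ALLE&pageSize=100').read()
soup = BeautifulSoup(page)
thenames = []
# Follow every link that points at a member's profile page.
for anchor in soup.findAll('a', href=re.compile('^/Folketinget/findMedlem/')):
    newpage = urllib2.urlopen('http://www.ft.dk' + anchor["href"])
    stew = BeautifulSoup(newpage)
    # Print the <h1> tags followed by any mailto: anchors on the page.
    print (stew.findAll('h1') + stew.findAll('a', href=re.compile('^mailto:')))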
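
The script prints raw tag lists for each profile page. A possible way to get plain names and addresses instead is sketched below. This is a Python 3, standard-library-only sketch and not part of the original gist: `ProfileParser`, `parse_profile`, and the `SAMPLE` markup are made-up names and data for illustration, and the real ft.dk markup may differ.

```python
from html.parser import HTMLParser


class ProfileParser(HTMLParser):
    """Collect <h1> text and mailto: addresses from one HTML document."""

    def __init__(self):
        super().__init__()
        self.headings = []   # text content of each <h1>
        self.emails = []     # addresses taken from mailto: hrefs
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.headings.append("")
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("mailto:"):
                self.emails.append(href[len("mailto:"):])

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Accumulate text only while inside an <h1>.
        if self._in_h1:
            self.headings[-1] += data


def parse_profile(html):
    """Return (list of heading strings, list of e-mail addresses)."""
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headings], parser.emails


# Hypothetical sample markup, not copied from ft.dk.
SAMPLE = (
    '<h1> Jane Doe </h1>'
    '<p><a href="mailto:jane.doe@example.org">mail</a> '
    '<a href="/x">other</a></p>'
)

names, emails = parse_profile(SAMPLE)
print(names, emails)
```

In the loop above, the same idea would amount to passing each fetched profile page (decoded to text) to `parse_profile` and appending the result to `thenames`, which the original script creates but never fills.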