@edsu
Created June 17, 2013 12:49
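#!/usr/bin/env python3
# Fetch a BBC News page and parse it into a DOM tree with html5lib.
from urllib.request import Request, urlopen
from html5lib import HTMLParser, treebuilders

# Download the raw HTML bytes.
response = urlopen(Request(url="http://www.bbc.co.uk/news/world-us-canada-22857062"))
html = response.read()

# Parse with the "dom" tree builder, which produces an xml.dom.minidom document.
parser = HTMLParser(tree=treebuilders.getTreeBuilder("dom"))
dom = parser.parse(html)
print(dom)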