@scraperdragon
Created September 2, 2014 10:18
HTML5 parser which may be compatible with lxml
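import html5lib


def fromstring(s):
    """Parse an HTML string with html5lib, building an lxml tree."""
    # html5lib ignores `implementation` for the "lxml" tree type,
    # so no ElementTree module needs to be passed in.
    tb = html5lib.getTreeBuilder("lxml")
    # Keep tag names un-namespaced ("body", not "{http://www.w3.org/1999/xhtml}body"),
    # matching what lxml.html produces.
    p = html5lib.HTMLParser(tb, namespaceHTMLElements=False)
    return p.parse(s)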
@scraperdragon (author)

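Compatible, but the result doesn't support make_links_absolute :( The tree is built from plain lxml.etree elements, and make_links_absolute is only available on lxml.html elements.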
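One possible workaround for the missing make_links_absolute is to resolve the links by hand. This is a minimal sketch, not lxml or html5lib API: the helper name absolutize_links and the list of attributes it rewrites are assumptions. It relies only on urllib.parse.urljoin and the iter()/get()/set() methods shared by xml.etree and lxml.etree trees, and it handles fewer cases than lxml.html's own method (for example, it ignores URLs inside CSS).

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Hypothetical helper, not part of lxml or html5lib. The attribute list is an
# assumption; lxml.html's make_links_absolute recognises more link locations.
LINK_ATTRIBUTES = ("href", "src", "action")


def absolutize_links(tree, base_url):
    """Resolve relative URLs in LINK_ATTRIBUTES against base_url, in place.

    Only uses iter(), get() and set(), so it accepts either an element or an
    element tree from xml.etree or lxml.etree.
    """
    for el in tree.iter():
        # lxml's iter() also yields comments and processing instructions,
        # whose .tag is not a string; skip them.
        if not isinstance(el.tag, str):
            continue
        for attr in LINK_ATTRIBUTES:
            value = el.get(attr)
            if value is not None:
                # urljoin leaves absolute URLs such as "http://host/..."
                # or "mailto:..." unchanged.
                el.set(attr, urljoin(base_url, value.strip()))
    return tree


# Usage with a stdlib tree; the same call should also work on the tree
# returned by fromstring() above.
root = ET.fromstring(
    '<html><body><a href="page2.html">next</a><img src="/logo.png"/></body></html>'
)
absolutize_links(root, "http://example.com/docs/index.html")
print(root.find(".//a").get("href"))   # http://example.com/docs/page2.html
print(root.find(".//img").get("src"))  # http://example.com/logo.png
```

Because the helper only touches attributes that are present, elements without links pass through untouched, and already-absolute links survive the urljoin call as-is.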
