@FND
Created April 24, 2009 20:04
# -*- coding: UTF-8 -*-
"""
test case for the following error:
UnicodeEncodeError:
'ascii' codec can't encode character u'\xf6' in position 1:
ordinal not in range(128)
"""
import html5lib
document = u"""
<html>
<body>
<p>lörem ipsüm dÅlor sät æmet</p>
</body>
</html>
"""
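# build the tree with html5lib's "dom" tree builder (xml.dom.minidom by default)
tree = html5lib.treebuilders.getTreeBuilder("dom")
parser = html5lib.HTMLParser(tree=tree)
doc = parser.parse(document)
# grab the text node inside the first paragraph
el = doc.getElementsByTagName("p")[0]
text = el.firstChild
# Python 2 print statement; this is where the UnicodeEncodeError quoted above is expected
print text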
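
The error quoted in the docstring is what an implicit ASCII encode of the paragraph text would produce, since `ö` sits at index 1 of `lörem`. The following is only a sketch of that likely cause and of one possible workaround (encoding explicitly before output), not a confirmed diagnosis of html5lib's behaviour. It assumes the text node exposes its content through `.data`, and it uses the standard library's `xml.dom.minidom` as a stand-in for the html5lib "dom" tree so that it runs without html5lib installed; the names `data`, `error` and `encoded` are illustrative.

```python
# -*- coding: UTF-8 -*-
from xml.dom import minidom

# Same paragraph as in the test case, written with escapes so the
# source file encoding does not matter.
markup = u"<html><body><p>l\xf6rem ips\xfcm d\xc5lor s\xe4t \xe6met</p></body></html>"

# Stand-in for the html5lib "dom" tree: parse the UTF-8 bytes with minidom.
doc = minidom.parseString(markup.encode("utf-8"))
el = doc.getElementsByTagName("p")[0]
data = el.firstChild.data  # Unicode content of the text node

# Encoding to ASCII fails on the first non-ASCII character.
try:
    data.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Possible workaround: pick an encoding explicitly instead of relying on
# an implicit ASCII conversion.
encoded = data.encode("utf-8")

if error is not None:
    print("ascii encode fails at position %d" % error.start)
```

Encoding at the output boundary keeps the tree itself in Unicode; whether UTF-8 is the right target depends on where the text is being written.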