@cdent
Forked from FND/UnicodeEncodeError.py
Created April 26, 2009 11:52
"""
test case for the following error:
UnicodeEncodeError:
'ascii' codec can't encode character u'\xf6' in position 1:
ordinal not in range(128)
"""
import html5lib
document = u"""
<html>
<body>
<p>l\u00F6rem</p>
</body>
</html>
"""
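# Print the input, explicitly encoded as UTF-8.
print document.encode('utf-8')
# Parse into a BeautifulSoup tree via html5lib's "beautifulsoup" tree builder.
tree = html5lib.treebuilders.getTreeBuilder("beautifulsoup")
parser = html5lib.HTMLParser(tree=tree)
doc = parser.parse(document)
# Extract the contents of the first <p> element and print them as UTF-8.
el = doc.find("p")
text = el.decodeContents()
print text.encode('utf-8')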
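
The error class quoted in the docstring can be reproduced without html5lib by encoding a non-ASCII unicode string with the 'ascii' codec. The sketch below is only an illustration of that mechanism: it does not show where the html5lib/BeautifulSoup code path performs the conversion, and `try_ascii` is a hypothetical helper that is not part of the gist.

```python
# Minimal sketch, independent of html5lib: encoding a unicode string that
# contains a non-ASCII character with the 'ascii' codec raises
# UnicodeEncodeError, while encoding it as UTF-8 succeeds.
text = u"l\u00f6rem"


def try_ascii(value):
    """Hypothetical helper: return the error message from encoding
    `value` as ASCII, or None if the encoding succeeds."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


message = try_ascii(text)        # error message naming the character at position 1
encoded = text.encode("utf-8")   # explicit UTF-8 encoding does not raise
```

Encoding explicitly at the output boundary, as the script does with `.encode('utf-8')`, avoids relying on the default ASCII codec.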