@gcarothers
Created December 7, 2011 23:02
Example HTML5 UTF-8 string with errors
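# Bytes, in hex, mixing valid UTF-8 with stray and truncated sequences.
html_example = "41 98 BA 42 E2 98 43 E2 98 BA E2 98"
bytes_in_hex = html_example.split(" ")
# Build a bytes object (Python 3); chr() joined into a str cannot be decoded.
html_example_bytes = bytes(int(x, 16) for x in bytes_in_hex)
# html_example_bytes.decode('utf-8')  # strict mode raises UnicodeDecodeError
# 'replace' substitutes U+FFFD for each malformed sequence instead of raising.
replaced_unicode = html_example_bytes.decode('utf-8', 'replace')
assert 'A\ufffd\ufffdB\ufffdC\u263a\ufffd' == replaced_unicode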
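To see which bytes each U+FFFD in the result stands for, one option is a custom error handler that records every malformed span while decoding. This is a minimal sketch assuming Python 3. The handler name `record_and_replace` and the `malformed` list are hypothetical names introduced here; they are not part of the original gist.

```python
import codecs

# Same byte sequence as the example above.
data = bytes.fromhex("41 98 BA 42 E2 98 43 E2 98 BA E2 98")

# Collected as (start, end, hex of the offending bytes).
malformed = []


def record_and_replace(err):
    """Behave like 'replace' for decoding, but remember each bad span."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    malformed.append((err.start, err.end, err.object[err.start:err.end].hex()))
    # Emit one U+FFFD and resume decoding right after the bad span.
    return ("\ufffd", err.end)


# Hypothetical handler name, registered for this sketch only.
codecs.register_error("record_and_replace", record_and_replace)

decoded = data.decode("utf-8", "record_and_replace")

# The text matches what the built-in 'replace' handler produces.
assert decoded == data.decode("utf-8", "replace")

for start, end, raw in malformed:
    print(f"bytes {start}..{end - 1} ({raw}) -> U+FFFD")
```

Each recorded span becomes exactly one replacement character, which is why the two stray continuation bytes yield two U+FFFD while each truncated `E2 98` prefix yields only one.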