Skip to content

Instantly share code, notes, and snippets.

@lycantrope
Last active December 9, 2023 04:47
Show Gist options
  • Save lycantrope/1ca45a59519ab17993862165e4542985 to your computer and use it in GitHub Desktop.
Save lycantrope/1ca45a59519ab17993862165e4542985 to your computer and use it in GitHub Desktop.
An example function to unescape the xml unicode (unsafe)
import xml.parsers.expat
from io import StringIO
def unescape(s:str) -> str:
# the rest of this assumes that `s` is UTF-8
output = StringIO()
# create and initialize a parser object
p = xml.parsers.expat.ParserCreate("utf-8")
p.buffer_text = True
p.CharacterDataHandler = output.write
# parse the data wrapped in a dummy element
# (needed so the "document" is well-formed)
p.Parse("<e>", 0)
p.Parse(s, 0)
p.Parse("</e>", 1)
return output.getvalue()
assert unescape("&#x5468;&#x6770;&#x502b;") == "周杰倫", "Fail to parse xml unicode"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment