@da2x
Created July 13, 2017 12:28
Extracting the text node of an XML element containing a double-escaped entity.
>>> import xml.etree.ElementTree as ETree
>>> ETree.fromstring("<test>&amp;rdquo;</test>").text
'&rdquo;'
da2x commented Jul 13, 2017

>>> import html
>>> html.unescape("&rdquo;")
'”'

Most feed reader implementations won’t know to perform that extra unescaping step after parsing. Named HTML entities are quite rare in XML, especially double-escaped ones.