@bofm
Last active July 30, 2019 22:22
Escape invalid XML characters in Python 3
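#!/usr/bin/env python3
"""Escape characters that are invalid in XML as numeric character references."""
import re
import sys

# https://trac-hacks.org/ticket/11050#comment:13
# Inclusive code-point ranges to escape: characters XML forbids (most C0
# controls, surrogates, U+FFFE/U+FFFF) plus ones it discourages (C1 controls,
# Unicode noncharacters). Tab (0x09), LF (0x0A) and CR (0x0D) are legal XML
# characters, so they are left out of the C0 ranges.
_illegal_unichrs = ((0x00, 0x08), (0x0B, 0x0C), (0x0E, 0x1F),
                    (0x7F, 0x84), (0x86, 0x9F),
                    (0xD800, 0xDFFF), (0xFDD0, 0xFDDF), (0xFFFE, 0xFFFF),
                    (0x1FFFE, 0x1FFFF), (0x2FFFE, 0x2FFFF),
                    (0x3FFFE, 0x3FFFF), (0x4FFFE, 0x4FFFF),
                    (0x5FFFE, 0x5FFFF), (0x6FFFE, 0x6FFFF),
                    (0x7FFFE, 0x7FFFF), (0x8FFFE, 0x8FFFF),
                    (0x9FFFE, 0x9FFFF), (0xAFFFE, 0xAFFFF),
                    (0xBFFFE, 0xBFFFF), (0xCFFFE, 0xCFFFF),
                    (0xDFFFE, 0xDFFFF), (0xEFFFE, 0xEFFFF),
                    (0xFFFFE, 0xFFFFF), (0x10FFFE, 0x10FFFF))

# Build "low-high" ranges for a regex character class. On Python 3.3+
# sys.maxunicode is always 0x10FFFF, so this filter (a leftover from
# narrow builds) keeps every range.
_illegal_ranges = tuple('%s-%s' % (chr(low), chr(high))
                        for (low, high) in _illegal_unichrs
                        if low < sys.maxunicode)
_illegal_xml_chars_re = re.compile('[%s]' % ''.join(_illegal_ranges))


def _escape_match(match):
    # Replace the matched character with a decimal reference, e.g. '\x01' -> '&#1;'.
    return '&#%i;' % ord(match.group(0))


def escape_xml_invalid_chars(xml_text):
    """Return xml_text with each character in _illegal_unichrs written as &#N;.

    Note: references to C0 controls such as &#1; are well-formed only in
    XML 1.1; XML 1.0 parsers reject them.
    """
    return _illegal_xml_chars_re.sub(_escape_match, xml_text)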
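The escaping above turns each matched character into a decimal reference such as `&#7;`. Such references are well-formed in XML 1.1 for most control characters, but XML 1.0 requires a referenced character to itself be a legal `Char`, so an XML 1.0 parser like expat (used by Python's `xml.etree.ElementTree`) still rejects them, and no XML version accepts `&#0;`. When the consumer speaks XML 1.0, one common alternative (a different technique from the gist's escaping) is to drop or replace every character outside the XML 1.0 `Char` production. The sketch below assumes that silently removing those characters, or substituting a caller-chosen string, is acceptable; `strip_xml10_invalid_chars` is a hypothetical helper, not part of the original gist.

```python
import re
import xml.etree.ElementTree as ET

# XML 1.0 "Char" production (XML 1.0 spec, section 2.2):
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class matches any character outside that set, including
# lone surrogates and U+FFFE/U+FFFF.
_xml10_invalid_re = re.compile(
    r'[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]')


def strip_xml10_invalid_chars(text, replacement=''):
    """Replace each character XML 1.0 forbids with `replacement` (default: drop it).

    Hypothetical helper for illustration; not part of the original gist.
    """
    # A function replacement avoids re.sub interpreting backslashes in `replacement`.
    return _xml10_invalid_re.sub(lambda _m: replacement, text)


raw = 'bell\x07 and nul\x00 here'
doc = '<msg>%s</msg>' % strip_xml10_invalid_chars(raw)
print(ET.fromstring(doc).text)  # bell and nul here

# By contrast, a reference like &#7; (what escape_xml_invalid_chars emits for
# '\x07') is still rejected by expat, which parses XML 1.0.
try:
    ET.fromstring('<msg>&#7;</msg>')
except ET.ParseError as exc:
    print('rejected:', exc)
```

Removing characters loses information, so escaping remains the better fit when the consumer is known to accept XML 1.1 references; passing `'\ufffd'` as the replacement at least leaves a visible marker where something was dropped.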