@rcalsaverini
Last active December 20, 2015 23:29
Filter invalid XML characters in Python
# Based on an answer by John Machin on Stack Overflow (http://stackoverflow.com/users/84270/john-machin)
# http://stackoverflow.com/questions/8733233/filtering-out-certain-bytes-in-python
def isValidXMLChar(char):
    # True if char falls inside one of the ranges allowed by the XML 1.0 Char production.
    codepoint = ord(char)
    return 0x20 <= codepoint <= 0xD7FF or \
        codepoint in (0x9, 0xA, 0xD) or \
        0xE000 <= codepoint <= 0xFFFD or \
        0x10000 <= codepoint <= 0x10FFFF
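def filterInvalidXMLChars(input):
    # On Python 3, filter() returns an iterator rather than a string,
    # so join the surviving characters back into one string.
    return "".join(filter(isValidXMLChar, input))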
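The same filtering can also be done with a single precompiled regular expression, which may be convenient for long strings. The following is only a sketch, assuming Python 3.3 or later (where `re` understands `\u` and `\U` escapes in string patterns). The names `_INVALID_XML_CHARS` and `strip_invalid_xml_chars` are hypothetical and not part of the original snippet.

```python
import re

# Hypothetical name: matches any character OUTSIDE the ranges accepted by
# isValidXMLChar above (tab, LF, CR, 0x20-0xD7FF, 0xE000-0xFFFD, 0x10000-0x10FFFF).
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_invalid_xml_chars(text):
    # Hypothetical helper: delete every character the pattern matches.
    return _INVALID_XML_CHARS.sub("", text)

# NUL, vertical tab and U+FFFE are dropped; newline and tab are kept.
sample = "ok\x00\x0bline\n\ttab\ufffe"
print(repr(strip_invalid_xml_chars(sample)))  # 'okline\n\ttab'
```

Both approaches remove the offending characters silently. If the caller needs to know that something was dropped, comparing the lengths of the input and output is one simple check.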