
Filter invalid XML characters in Python

InvalidXMLFilter.py
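# Based on an answer by John Machin on Stack Overflow (http://stackoverflow.com/users/84270/john-machin)
# http://stackoverflow.com/questions/8733233/filtering-out-certain-bytes-in-python

def isValidXMLChar(char):
    """Return True if char is in the XML 1.0 Char range."""
    codepoint = ord(char)
    return (0x20 <= codepoint <= 0xD7FF or
            codepoint in (0x9, 0xA, 0xD) or  # tab, line feed, carriage return
            0xE000 <= codepoint <= 0xFFFD or
            0x10000 <= codepoint <= 0x10FFFF)

def filterInvalidXMLChars(text):
    """Return text with every character that is invalid in XML 1.0 removed."""
    # filter() returns an iterator on Python 3, so join the result back into a string.
    return ''.join(filter(isValidXMLChar, text))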
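
A minimal usage sketch, assuming Python 3 and the standard-library `xml.etree.ElementTree` module. It repeats the two functions so that it runs on its own, with the filtered characters joined back into a string. The sample string `raw` and the `note` element name are made up for illustration.

```python
import xml.etree.ElementTree as ET

def isValidXMLChar(char):
    codepoint = ord(char)
    return (0x20 <= codepoint <= 0xD7FF or
            codepoint in (0x9, 0xA, 0xD) or
            0xE000 <= codepoint <= 0xFFFD or
            0x10000 <= codepoint <= 0x10FFFF)

def filterInvalidXMLChars(text):
    # Join explicitly: on Python 3, filter() yields an iterator, not a str.
    return ''.join(filter(isValidXMLChar, text))

# Hypothetical input containing NUL, vertical tab and U+FFFE,
# none of which are allowed in XML 1.0; the tab is allowed and kept.
raw = "ok\x00 line\x0b one\ttab \ufffe end"
clean = filterInvalidXMLChars(raw)

# The cleaned text can be serialized and parsed back unchanged.
elem = ET.Element("note")
elem.text = clean
serialized = ET.tostring(elem, encoding="unicode")
roundtrip = ET.fromstring(serialized).text
```

The filter drops invalid characters rather than replacing them, so neighbouring text closes up. If positions matter, substituting a placeholder such as U+FFFD may be a better fit.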
