@ladder1984
Created January 22, 2016 07:09
Filter non-BMP characters
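import re


def filter_invalid_str(text):
    """
    Filter non-BMP characters: replace each one with an underscore.
    """
    try:
        # UCS-4 (wide) build: non-BMP code points can be matched directly
        highpoints = re.compile(u'[\U00010000-\U0010ffff]')
    except re.error:
        # UCS-2 (narrow) build: non-BMP characters are stored as surrogate pairs
        highpoints = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')
    return highpoints.sub(u'_', text)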
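A minimal usage sketch, assuming Python 3, where strings are sequences of code points and the UCS-4 pattern compiles without needing the surrogate-pair fallback. The function is restated in simplified form so the block runs on its own; the module-level name `_NON_BMP` and the sample strings are illustrative and not part of the original snippet.

```python
import re

# Matches any code point outside the Basic Multilingual Plane (U+10000 and above).
# `_NON_BMP` is a hypothetical name chosen for this example.
_NON_BMP = re.compile('[\U00010000-\U0010ffff]')


def filter_invalid_str(text):
    """Replace every non-BMP character in text with an underscore."""
    return _NON_BMP.sub('_', text)


sample = 'ok \U0001F600 done'            # U+1F600 lies outside the BMP
print(filter_invalid_str(sample))        # ok _ done
print(filter_invalid_str('中文 abc'))     # BMP-only text is returned unchanged
```

Replacing each non-BMP character with a placeholder, rather than dropping it, keeps the string length in code points stable, which can help when the text is later stored in a backend that only accepts BMP characters.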