@eestrada
Last active September 13, 2019 20:07
Small gist showing how to compute Unicode whitespace (since it doesn't exist as a constant the way `string.whitespace` does)
# This is free and unencumbered software released into the public domain
# using the Unlicense. Please refer to <http://unlicense.org/>
def _unicode_whitespace():
    # Yes, it is bad practice to import modules in a function,
    # but we are trying to avoid leaving garbage around once we are done generating our values.
    import re
    import sys
    ws_re = re.compile(r'\s')
    uchars = map(chr, range(sys.maxunicode + 1))
    uws_matches = map(ws_re.fullmatch, uchars)
    uws_matches_filt = filter(None, uws_matches)
    unicode_whitespace = ''.join(m.string for m in uws_matches_filt)
    return unicode_whitespace
unicode_whitespace = _unicode_whitespace()
# NOTE: clean up the namespace since we don't need this anymore
del _unicode_whitespace
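A possible alternative, sketched here under the assumption that `str.isspace()` is an acceptable definition of "Unicode whitespace" for your purposes: build the same kind of string without the `re` module. The helper name `whitespace_via_isspace` and the variable `uws` are hypothetical and not part of the original gist. The sketch also shows one way such a string might be used, as the `chars` argument to `str.strip`.

```python
import string
import sys


def whitespace_via_isspace():
    # Hypothetical helper (not from the original gist): collect every code
    # point that str.isspace() reports as whitespace into a single string.
    all_chars = map(chr, range(sys.maxunicode + 1))
    return ''.join(c for c in all_chars if c.isspace())


uws = whitespace_via_isspace()

# Sanity check: every ASCII whitespace character is part of the result.
assert set(string.whitespace) <= set(uws)

# Example use: strip NO-BREAK SPACE (U+00A0), IDEOGRAPHIC SPACE (U+3000)
# and EM SPACE (U+2003) from both ends of a string.
padded = '\u00a0\u3000 hello \u2003'
stripped = padded.strip(uws)
print(stripped)  # hello
```

Note that `str.strip()` with no arguments already removes Unicode whitespace; an explicit string like this is mainly useful when the set needs to be inspected, extended, or passed to an API that expects a character collection.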