@gwerbin
Last active April 13, 2022 00:05
Python whitespace definition
import unicodedata


def is_whitespace(c: str) -> bool:
    """Detect whether a character is "whitespace".

    As of 3.10, this is how CPython defines "whitespace" for string operations like `str.split`.

    Sources:

    * https://github.com/python/cpython/blob/v3.10.4/Objects/unicodeobject.c#L311-L340
    * https://github.com/python/cpython/blob/v3.10.4/Tools/unicode/makeunicodedata.py#L420-L422
    * https://github.com/python/cpython/blob/v3.10.4/Objects/unicodetype_db.h#L6205-L6243
    """
    if len(c) != 1:
        raise ValueError("Must be a length-one string.")
    return (
        # ASCII whitespace, listed explicitly; these also satisfy the bidirectional check below.
        c in {"\t", "\n", "\x0b", "\x0c", "\r", "\x1c", "\x1d", "\x1e", "\x1f", " "}
        # Bidirectional class WS (whitespace), B (paragraph separator), or S (segment separator).
        or unicodedata.bidirectional(c) in {"WS", "B", "S"}
        # General category Zs (space separator).
        or unicodedata.category(c) == "Zs"
    )
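
A short usage sketch, assuming the definition above is meant to agree with the built-in `str.isspace`. The block repeats `is_whitespace` with `unicodedata.category(c)` written as a call so that it runs on its own. The helper `find_mismatches` is a hypothetical name introduced here for illustration and is not part of the original gist.

```python
import sys
import unicodedata


def is_whitespace(c: str) -> bool:
    """Copy of the gist's function, repeated so this block is self-contained."""
    if len(c) != 1:
        raise ValueError("Must be a length-one string.")
    return (
        c in {"\t", "\n", "\x0b", "\x0c", "\r", "\x1c", "\x1d", "\x1e", "\x1f", " "}
        or unicodedata.bidirectional(c) in {"WS", "B", "S"}
        or unicodedata.category(c) == "Zs"
    )


def find_mismatches() -> list:
    """Hypothetical helper: code points where is_whitespace and str.isspace disagree."""
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if is_whitespace(chr(cp)) != chr(cp).isspace()
    ]


if __name__ == "__main__":
    # A few spot checks: space, NO-BREAK SPACE, LINE SEPARATOR,
    # ZERO WIDTH SPACE (not whitespace under this definition), and a letter.
    for ch in [" ", "\u00a0", "\u2028", "\u200b", "a"]:
        print(f"U+{ord(ch):04X}", is_whitespace(ch), ch.isspace())

    # Scanning every code point is a simple way to compare against the interpreter in use.
    print("mismatches:", [f"U+{cp:04X}" for cp in find_mismatches()])
```

Note that U+200B ZERO WIDTH SPACE has general category `Cf` and bidirectional class `BN`, so despite its name it does not count as whitespace under this definition.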