msmhrt / twitter-text.py
Created November 10, 2012 01:17
sample code using the regex module
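import regex  # third-party `regex` module from PyPI, not the standard-library `re`

# Start of a character class listing Unicode White_Space code points.
# The doubled backslashes keep each \uXXXX escape literal in the Python string,
# so it is the regex engine that interprets the escape.
UNICODE_SPACES = ("[" +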
    "\\u0009-\\u000d" +  # White_Space # Cc [5] <control-0009>..<control-000D>
    "\\u0020" +  # White_Space # Zs SPACE
    "\\u0085" +  # White_Space # Cc <control-0085>
    "\\u00a0" +  # White_Space # Zs NO-BREAK SPACE
    "\\u1680" +  # White_Space # Zs OGHAM SPACE MARK
    "\\u180E" +  # White_Space # Zs MONGOLIAN VOWEL SEPARATOR
    "\\u2000-\\u200a" +  # White_Space # Zs [11] EN QUAD..HAIR SPACE