@dpk
Created June 2, 2013 17:46
Python: iterate over the graphemes in a string.
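import unicodedata as u

def itergraphemes(string):
    # Treat any character in a Unicode Mark category (Mn, Mc, Me) as a modifier.
    def modifierp(char): return u.category(char)[0] == 'M'
    start = 0
    for end, char in enumerate(string):
        # A non-modifier starts a new grapheme; emit the one collected so far.
        if not modifierp(char) and start != end:
            yield string[start:end]
            start = end
    # Emit the final grapheme, skipping the empty string for empty input.
    if string[start:]:
        yield string[start:]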
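
A short usage sketch may help show what the function above is meant to do. It assumes the simple case the gist was written for, where a grapheme is a base character followed by zero or more combining marks. The function is repeated so the snippet runs on its own, and the sample string is made up for illustration.

```python
import unicodedata as u

def itergraphemes(string):
    # Same approach as the gist: a character in a Mark category (M*)
    # attaches to the base character that precedes it.
    def modifierp(char): return u.category(char)[0] == 'M'
    start = 0
    for end, char in enumerate(string):
        if not modifierp(char) and start != end:
            yield string[start:end]
            start = end
    if string[start:]:
        yield string[start:]

# 'e' + COMBINING ACUTE ACCENT, then 'a', then 'n' + COMBINING TILDE
sample = "e\u0301an\u0303"
clusters = list(itergraphemes(sample))
print(clusters)
```

Each combining mark stays attached to its base letter, so the five code points in the sample come back as three pieces.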

dpk commented Feb 1, 2015

(This is broken: the definition of a 'grapheme' in Unicode is more complex than I thought. See http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries for the actual definition. Python's unicodedata does not expose enough character data, such as the Grapheme_Cluster_Break property, to make this work.)
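
To make the limitation above concrete, here is a minimal sketch that prints the general category of each code point in a few sequences that UAX #29 treats as a single extended grapheme cluster. The `cases` dictionary and the `categories` helper are hypothetical names chosen for this illustration. None of the continuation characters falls in a Mark category, so a check based only on `unicodedata.category` would split every one of them.

```python
import unicodedata

# Sequences that form one extended grapheme cluster under UAX #29,
# even though no code point in them has a Mark (M*) general category.
cases = {
    "Hangul conjoining jamo (L+V+T)": "\u1112\u1161\u11ab",
    "Regional indicator pair (flag)": "\U0001F1EB\U0001F1F7",
    "Emoji ZWJ sequence": "\U0001F469\u200D\U0001F4BB",
    "CR LF": "\r\n",
}

def categories(text):
    # General category is the only per-character class used by the gist.
    return [unicodedata.category(ch) for ch in text]

for label, text in cases.items():
    print(label, categories(text))
```

Deciding where these clusters end needs the Grapheme_Cluster_Break property values (for example L, V, T, ZWJ, Regional_Indicator), which the general category alone cannot stand in for.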

@alanhamlett


johncf commented Jun 5, 2020
