

dpk/gist:5694265

Created Jun 2, 2013
Python: iterate over the graphemes in a string.
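```python
import unicodedata as u

def itergraphemes(s):
    """Yield each character of s together with any combining marks that follow it."""
    def modifierp(char):
        # Mark categories (Mn, Mc, Me) all begin with 'M'.
        return u.category(char)[0] == 'M'
    start = 0
    for end, char in enumerate(s):
        # A non-mark character starts a new cluster; emit the one before it.
        if not modifierp(char) and start != end:
            yield s[start:end]
            start = end
    # Emit the final cluster (nothing for an empty string).
    if start < len(s):
        yield s[start:]
```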


dpk commented Feb 1, 2015

(This is broken: the definition of a 'grapheme' in Unicode is more complex than I thought. See for the actual definition; Python's unicodedata does not expose enough character data to make this work.)
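To make the problem concrete, here is a small check. It re-implements the gist's heuristic (parameter renamed to `s`, empty string handled) and runs it on a few strings that, as far as I understand UAX #29 (Unicode Text Segmentation), each form a single extended grapheme cluster: CR LF, a flag made of two regional indicator symbols, a Hangul conjoining-jamo sequence, and an emoji ZWJ sequence. None of the extra code points has a general category starting with 'M', so the heuristic splits them. This only demonstrates the limitation; it is not a fix.

```python
import unicodedata

def itergraphemes(s):
    # Same heuristic as the gist: a non-mark character followed by any
    # characters whose general category starts with 'M'.
    start = 0
    for end, char in enumerate(s):
        if unicodedata.category(char)[0] != 'M' and start != end:
            yield s[start:end]
            start = end
    if start < len(s):
        yield s[start:]

# Works: a base letter plus a combining acute accent stays together.
decomposed = unicodedata.normalize('NFD', 'café')
print(list(itergraphemes(decomposed)) == ['c', 'a', 'f', 'e\u0301'])  # True

# Fails: each string is one extended grapheme cluster under UAX #29,
# but the heuristic splits it because the extra code points are not marks.
counterexamples = {
    'CR LF': '\r\n',
    'flag (two regional indicators)': '\U0001F1EB\U0001F1F7',
    'Hangul conjoining jamo (L + V)': '\u1100\u1161',
    'emoji ZWJ sequence': '\U0001F469\u200D\U0001F4BB',
}
for name, text in counterexamples.items():
    # Prints 2, 2, 2 and 3 pieces respectively, where UAX #29 gives 1.
    print(f'{name}: {len(list(itergraphemes(text)))} pieces')
```

Handling these cases properly needs properties such as Grapheme_Cluster_Break and Extended_Pictographic, which `unicodedata` does not provide. One commonly used alternative, if a third-party dependency is acceptable, is the `regex` package's `\X` pattern, which matches extended grapheme clusters.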


