@dpk dpk/gist:5694265

Created Jun 2, 2013
Python: iterate over the graphemes in a string.
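import unicodedata as u

def itergraphemes(string):
    # A character counts as a modifier if its general category is a Mark (Mn, Mc, Me).
    def modifierp(char): return u.category(char)[0] == 'M'
    start = 0
    for end, char in enumerate(string):
        # Each non-modifier character starts a new chunk; emit the previous one.
        if not modifierp(char) and not start == end:
            yield string[start:end]
            start = end
    # Emit the final chunk, unless the input was empty.
    if start < len(string):
        yield string[start:]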
dpk commented Feb 1, 2015

(This is broken: the definition of a 'grapheme' in Unicode is more complex than I thought. See http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries for the actual definition — Python's unicodedata does not expose enough character data to make this work.)
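
To make the limitation concrete, here is a minimal sketch that reimplements the same mark-based splitting under a hypothetical name, `split_on_marks`, and runs it on inputs where UAX #29 expects a single grapheme cluster. It uses only the standard `unicodedata.category` call. The rule numbers in the comments (GB3, GB12/GB13) are taken from the linked report, and the examples are illustrative, not a complete list of failure cases.

```python
import unicodedata

def split_on_marks(text):
    """Hypothetical helper: start a new chunk at every non-mark character."""
    chunks = []
    for char in text:
        if chunks and unicodedata.category(char)[0] == 'M':
            # Combining marks (Mn, Mc, Me) attach to the previous chunk.
            chunks[-1] += char
        else:
            chunks.append(char)
    return chunks

# Works when combining marks are the only extenders:
# 'e' followed by U+0301 COMBINING ACUTE ACCENT stays together.
accented = split_on_marks('e\u0301a')

# UAX #29 rule GB3 keeps CR LF together, but both are category Cc, not marks.
crlf = split_on_marks('\r\n')

# A pair of regional indicators forms one flag (GB12/GB13),
# yet each indicator is category So, so the pair is split in two.
flag = split_on_marks('\U0001F1EB\U0001F1F7')

print(accented, crlf, flag)
```

The last two inputs come back as two chunks each, because the general category alone cannot express the Grapheme_Cluster_Break property that the segmentation rules are defined on.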