@dpk dpk/gist:5694265
Created Jun 2, 2013

Python: iterate over the graphemes in a string.
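import unicodedata as u

def itergraphemes(str):
    # A character counts as a modifier if its category is a Mark (Mn, Mc, Me).
    def modifierp(char): return u.category(char)[0] == 'M'
    start = 0
    for end, char in enumerate(str):
        # A non-modifier starts a new chunk, so emit the one collected so far.
        if not modifierp(char) and not start == end:
            yield str[start:end]
            start = end
    # Emit whatever remains after the last boundary.
    yield str[start:]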
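
A quick usage sketch, assuming Python 3. The sample string is illustrative and not from the original gist, and the function is repeated only so the snippet runs on its own.

```python
import unicodedata as u

def itergraphemes(str):
    def modifierp(char): return u.category(char)[0] == 'M'
    start = 0
    for end, char in enumerate(str):
        if not modifierp(char) and not start == end:
            yield str[start:end]
            start = end
    yield str[start:]

# 'e' followed by U+0301 COMBINING ACUTE ACCENT is kept together as one chunk.
decomposed = "cafe\u0301s"
chunks = list(itergraphemes(decomposed))
print(len(decomposed), len(chunks))  # 6 5
```

The six code points come back as five chunks because the combining accent is attached to the base letter before it.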
@dpk commented Feb 1, 2015

(This is broken: the definition of a 'grapheme' in Unicode is more complex than I thought. See http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries for the actual definition — Python's unicodedata does not expose enough character data to make this work.)
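
To make the problem concrete, here is a small sketch of inputs where splitting on Mark categories disagrees with the UAX #29 rules linked above. The sample strings are illustrative assumptions, not the author's test cases, and the function is repeated so the snippet is self-contained.

```python
import unicodedata as u

def itergraphemes(str):
    def modifierp(char): return u.category(char)[0] == 'M'
    start = 0
    for end, char in enumerate(str):
        if not modifierp(char) and not start == end:
            yield str[start:end]
            start = end
    yield str[start:]

# Each string below is a single grapheme cluster under UAX #29, yet its
# second code point is not in a Mark (M*) category, so the function splits it.
cases = {
    "CRLF": "\r\n",                                        # rule GB3: CR x LF
    "Hangul jamo": "\u1100\u1161",                         # rule GB6: L x V
    "flag (regional indicators)": "\U0001F1EB\U0001F1F7",  # rules GB12/GB13
}
for name, s in cases.items():
    parts = list(itergraphemes(s))
    print(name, len(parts))  # each case yields 2 chunks; UAX #29 expects 1
```

Deciding these boundaries needs the Grapheme_Cluster_Break property of each code point, which the general category alone does not provide.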
