
@FlameWolf
Last active November 6, 2023 14:05
Regular expression to split a Unicode string into approximate constituent graphemes
/\p{L}\p{M}?|\S|\s/gu

Usage:

"മാസങ്ങളിൽ മേടം പ്രധാനം".match(/\p{L}\p{M}?|\S|\s/gu);
// ["മാ", "സ", "ങ്", "ങ", "ളി", "ൽ", " ", "മേ", "ടം", " ", "പ്", "ര", "ധാ", "നം"]
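
The result is approximate because the pattern attaches at most one combining mark (`\p{M}?`) to each letter. Any further mark falls through to `\S` and becomes its own element. The sketch below illustrates this. `splitGraphemes` and the `greedyMarks` variant are hypothetical names added for illustration and are not part of the gist.

```javascript
// The gist's pattern: a letter plus at most one combining mark,
// otherwise any single non-space or space code point.
const approxGraphemes = /\p{L}\p{M}?|\S|\s/gu;

// Hypothetical helper (not in the gist): returns [] instead of null
// when the input is empty and nothing matches.
function splitGraphemes(text, pattern = approxGraphemes) {
  return text.match(pattern) ?? [];
}

// Assumed variant: `*` instead of `?` keeps every trailing mark
// with its base letter. Still an approximation, not full UAX #29.
const greedyMarks = /\p{L}\p{M}*|\S|\s/gu;

// "e" + combining acute accent + combining dot below
const stacked = "e\u0301\u0323";

console.log(splitGraphemes(stacked).length);              // 2: second mark is split off
console.log(splitGraphemes(stacked, greedyMarks).length); // 1: both marks stay attached
```

Neither pattern joins conjuncts (the example above still splits "ങ്ങ" into "ങ്" and "ങ"), so where exact grapheme clusters matter, the built-in `Intl.Segmenter` with `granularity: "grapheme"` may be a better fit.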
