Skip to content

Instantly share code, notes, and snippets.

Last active December 22, 2022 00:13
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
Star You must be signed in to star a gist
Save andjc/d9a90d2f89588a76bc1d5ed1f028858c to your computer and use it in GitHub Desktop.
titlecasing: using pyicu, or to_title(), a wrapper to python's inbuilt method str.title().
from icu import Locale, UnicodeString
# loc = Locale.createCanonical("haw_US")
loc = Locale("haw_US")
s1 = "ʻōlelo hawaiʻi"
s2 = "oude ijssel "
import regex as re
def to_title(s, hyphens=False):
def slice_group(grp):
if[0] == "ʻ":
return[0] +[1].upper() +[2:].lower()
return[0].upper() +[1:].lower()
pattern = r"[ʻ]?[\p{Alphabetic}]+([\-'·:\uA789\u2019]?[\p{Alphabetic}\p{Mn}\p{Mc}])+" if hyphens else r"[ʻ]?[\p{Alphabetic}]+(['·:\uA789\u2019]?[\p{Alphabetic}\p{Mn}\p{Mc}])+"
regexPattern = re.compile(pattern, re.I)
return regexPattern.sub(lambda grp: slice_group(grp), s)
s1 = "ʻōlelo hawaiʻi"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment