@andjc
Last active March 22, 2024 01:05
Rule sets for custom ICU break iterators
{
"din": "!!quoted_literals_only; $CR = [\\p{Grapheme_Cluster_Break = CR}]; $LF = [\\p{Grapheme_Cluster_Break = LF}]; $Control = [[\\p{Grapheme_Cluster_Break = Control}]]; $Extend = [[\\p{Grapheme_Cluster_Break = Extend}]]; $ZWJ = [\\p{Grapheme_Cluster_Break = ZWJ}]; $Regional_Indicator = [\\p{Grapheme_Cluster_Break = Regional_Indicator}]; $Prepend = [\\p{Grapheme_Cluster_Break = Prepend}]; $SpacingMark = [\\p{Grapheme_Cluster_Break = SpacingMark}]; $Virama = [\\p{Gujr}\\p{sc=Telu}\\p{sc=Mlym}\\p{sc=Orya}\\p{sc=Beng}\\p{sc=Deva}&\\p{Indic_Syllabic_Category=Virama}]; $LinkingConsonant = [\\p{Gujr}\\p{sc=Telu}\\p{sc=Mlym}\\p{sc=Orya}\\p{sc=Beng}\\p{sc=Deva}&\\p{Indic_Syllabic_Category=Consonant}]; $ExtCccZwj = [[\\p{gcb=Extend}-\\p{ccc=0}] \\p{gcb=ZWJ}]; $L = [\\p{Grapheme_Cluster_Break = L}]; $V = [\\p{Grapheme_Cluster_Break = V}]; $T = [\\p{Grapheme_Cluster_Break = T}]; $LV = [\\p{Grapheme_Cluster_Break = LV}]; $LVT = [\\p{Grapheme_Cluster_Break = LVT}]; $Extended_Pict = [:ExtPict:]; !!chain; 'AA'|'Aa'|'aa'; '\u00c4\u00c4'|'\u00c4\u00e4'|'\u00e4\u00e4'; 'EE'|'Ee'|'ee'; '\u00cb\u00cb'|'\u00cb\u00eb'|'\u00eb\u00eb'; '\u0190\u0190'|'\u0190\u025b'|'\u025b\u025b'; '\u0190\u0308\u0190\u0308'|'\u0190\u0308\u025b\u0308'|'\u025b\u0308\u025b\u0308'; 'II'|'Ii'|'ii'; '\u00cf\u00cf'|'\u00cf\u00ef'|'\u00ef\u00ef'; 'OO'|'Oo'|'oo'; '\u00d6\u00d6'|'\u00d6\u00f6'|'\u00f6\u00f6'; '\u0186\u0186'|'\u0186\u0254'|'\u0254\u0254'; '\u0186\u0308\u0186\u0308'|'\u0186\u0308\u0254\u0308'|'\u0254\u0308\u0254\u0308'; 'UU'|'Uu'|'uu'; 'DH'|'Dh'|'dh'; 'NH'|'Nh'|'nh'; 'NY'|'Ny'|'ny'; 'TH'|'Th'|'th'; !!lookAheadHardBreak; $CR $LF; $L ($L | $V | $LV | $LVT); ($LV | $V) ($V | $T); ($LVT | $T) $T; [^$Control $CR $LF] ($Extend | $ZWJ); [^$Control $CR $LF] $SpacingMark; $Prepend [^$Control $CR $LF]; $LinkingConsonant $ExtCccZwj* $Virama $ExtCccZwj* $LinkingConsonant; $Extended_Pict $Extend* $ZWJ $Extended_Pict; ^$Prepend* $Regional_Indicator $Regional_Indicator / $Regional_Indicator; ^$Prepend* $Regional_Indicator $Regional_Indicator; .;"
}
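The "din" entry (the ISO 639 code for Dinka) is a grapheme-cluster rule set with a `!!chain` block of quoted digraph literals inserted before `!!lookAheadHardBreak`, so that sequences like `aa`, `dh` or `ny` stay together as one unit. In ICU the string itself would be handed to a rule-based break iterator (for example ICU4J's `RuleBasedBreakIterator(String rules)` constructor). The Python below is only a minimal sketch of what the digraph block is meant to do; it is not ICU and not the author's code. `load_rules`, `digraphs_from_rules` and `segment` are hypothetical helpers. The sketch assumes the literals sit between `!!chain;` and `!!lookAheadHardBreak;` as they do here, uses greedy longest match instead of ICU's rule engine, and ignores the rest of the grapheme rules (CR LF, Hangul, emoji, Indic conjuncts) as well as the way `!!chain` can join overlapping matches, so it may differ from ICU on runs such as `aaa`.

```python
import json
import re
import unicodedata


def load_rules(path: str, key: str = "din") -> str:
    """Read one rule set from a JSON file shaped like this gist (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)[key]


def digraphs_from_rules(rules: str) -> list[str]:
    """Collect the quoted literals between '!!chain;' and '!!lookAheadHardBreak;'.

    Assumes the layout of the "din" entry. The result is ordered longest first
    so greedy matching tries the 4-code-point diaeresis pairs before 2-code-point ones.
    """
    section = rules.split("!!chain;", 1)[1].split("!!lookAheadHardBreak;", 1)[0]
    literals = set(re.findall(r"'([^']+)'", section))
    return sorted(literals, key=lambda s: (-len(s), s))


def segment(text: str, digraphs: list[str]) -> list[str]:
    """Greedy longest-match segmentation: a rough approximation, not ICU."""
    units = []
    i = 0
    while i < len(text):
        # Prefer a listed digraph at this position, else take one code point.
        unit = next((d for d in digraphs if text.startswith(d, i)), text[i])
        i += len(unit)
        # Attach following combining marks (ccc != 0), a crude stand-in for
        # the Extend rule; Extend characters with ccc 0 are not handled.
        while i < len(text) and unicodedata.combining(text[i]):
            unit += text[i]
            i += 1
        units.append(unit)
    return units
```

With the digraphs taken from the full "din" string, `segment("Thu\u0254\u014bj\u00e4\u014b", digraphs)` returns `['Th', 'u', 'ɔ', 'ŋ', 'j', 'ä', 'ŋ']`. The literals use precomposed Ä/ä, Ë/ë, Ï/ï and Ö/ö, but base letter plus U+0308 for the ɛ̈ and ɔ̈ forms (which have no precomposed code points), so running input through `unicodedata.normalize("NFC", text)` first should line it up with them.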