devanagari breakdown
common stuff
-----------
Basic consonants(32):
कखगघङचछजझटठडढणतथदधनपफबभमयरलवशषसह
Weirdo that is only used in ligatures, but necessary (1)
basic standalone vowels (11):
अआइईउऊऋएऐओऔ
basic vowels (12)
ि
virama(1)
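nukta(1) NFC text does contain this: the precomposed nukta letters (U+0958..U+095F) are composition exclusions, so NFC rewrites them as consonant + nukta. You can only get rid of it if you drop those letters or re-compose them yourself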
[58]
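
The tally above can be sanity-checked with a few lines of Python. This is only a sketch: the two strings are copied from the lists above, the variable names are made up for illustration, and the counts are code points (what `len` returns on a Python string), not rendered glyphs.

```python
# Strings copied from the lists above. len() counts code points.
consonants = "कखगघङचछजझटठडढणतथदधनपफबभमयरलवशषसह"
standalone_vowels = "अआइईउऊऋएऐओऔ"

# Counts as stated in the notes, in order: basic consonants, the
# ligature-only consonant, standalone vowels, basic vowels, virama, nukta.
stated_counts = [32, 1, 11, 12, 1, 1]

print("consonants:", len(consonants))
print("standalone vowels:", len(standalone_vowels))
print("total:", sum(stated_counts))
```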
Language-specific
---------------
Hindi:
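nukta'd consonants, Hindi/Urdu only (7)
each has a precomposed code point (U+0958..U+095E) and is canonically equivalent to consonant + nukta; NFC gives the consonant + nukta form (the precomposed ones are composition exclusions), so NFC text needs the nukta combiner
क़ख़ग़ज़ड़ढ़फ़
If you want to get rid of the nukta combiner you can't rely on the text being NFC; you have to map the consonant + nukta combinations to the precomposed code points yourself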
If you need to get rid of more, removing ण, ञ, or ढ़ should be fine for most words.
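
How normalization treats the seven nukta'd consonants can be checked with Python's standard `unicodedata` module. The sketch below prints what NFC does to each precomposed code point (it comes back as consonant + nukta) and shows one way to fold the pair into a single code point by hand. The table `PRECOMPOSED` and the helper `compose_nukta` are hypothetical names invented for this example, not part of any library.

```python
import unicodedata

NUKTA = "\u093c"  # DEVANAGARI SIGN NUKTA

# Precomposed code point -> base consonant, for the seven letters above.
PRECOMPOSED = {
    "\u0958": "\u0915",  # qa    = ka + nukta
    "\u0959": "\u0916",  # khha  = kha + nukta
    "\u095a": "\u0917",  # ghha  = ga + nukta
    "\u095b": "\u091c",  # za    = ja + nukta
    "\u095c": "\u0921",  # dddha = dda + nukta
    "\u095d": "\u0922",  # rha   = ddha + nukta
    "\u095e": "\u092b",  # fa    = pha + nukta
}


def compose_nukta(text: str) -> str:
    """Hypothetical helper: replace consonant + nukta with one code point."""
    # NFC first, so combining marks are in canonical order.
    text = unicodedata.normalize("NFC", text)
    for composed, base in PRECOMPOSED.items():
        text = text.replace(base + NUKTA, composed)
    return text


for composed in PRECOMPOSED:
    nfc = unicodedata.normalize("NFC", composed)
    points = " ".join(f"U+{ord(c):04X}" for c in nfc)
    print(f"U+{ord(composed):04X} -> NFC: {points}")
```

Note that the output of `compose_nukta` is no longer NFC, so anything downstream that re-normalizes will split the letters again.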
Marathi (1):
Kashmiri (11):
ऎऒऄॳॴॶॷ
Sanskrit (7):
ऌॠॡ
Sindhi (4):
ॻॼॾॿ
Marwari (1):
Transcribing other languages
------
Marathi only, for transcribing other languages (4):
ॲऑऍ
Consonants only used for transcribing other languages (5):
ऩऱऴय़ॹ
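
The transcription consonants above do not all behave the same under NFC, which matters if you are deciding whether the nukta combiner has to stay in your character set. A small sketch using the standard `unicodedata` module; the dictionary name `pairs` is made up for illustration. It checks, for each nukta-based letter, whether NFC folds consonant + nukta into the single code point.

```python
import unicodedata

NUKTA = "\u093c"  # DEVANAGARI SIGN NUKTA

# Single code point -> base consonant it is canonically equivalent to (+ nukta).
pairs = {
    "\u0929": "\u0928",  # nnna = na + nukta
    "\u0931": "\u0930",  # rra  = ra + nukta
    "\u0934": "\u0933",  # llla = lla + nukta
    "\u095f": "\u092f",  # yya  = ya + nukta
}

for composed, base in pairs.items():
    nfc = unicodedata.normalize("NFC", base + NUKTA)
    status = "composes" if nfc == composed else "stays decomposed"
    print(f"U+{ord(composed):04X} {unicodedata.name(composed)}: {status} under NFC")

# The fifth letter, U+0979, has no decomposition at all, so NFC leaves it alone.
print("U+0979 decomposition:", repr(unicodedata.decomposition("\u0979")))
```

The first three compose to a single code point under NFC, while the fourth stays as consonant + nukta, the same way the Hindi/Urdu letters do.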