@Manishearth
Last active April 8, 2019 21:00
devanagari breakdown
common stuff
-----------
Basic consonants(32):
कखगघङचछजझटठडढणतथदधनपफबभमयरलवशषसह
Weirdo that is only used in ligatures, but necessary (1):
basic standalone vowels (11):
अआइईउऊऋएऐओऔ
basic vowels (12)
ि
virama(1)
nukta (1): NFC keeps this as a separate combiner for the Hindi nukta'd consonants, so only get rid of it if you don't need those
[58]
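
The [58] total is just the sum of the group sizes listed above. A minimal sketch of that tally follows; the dictionary keys and helper name are made up for illustration, and only the two groups whose characters appear in full in these notes are checked by length.

```python
# Group sizes exactly as listed in the notes above.
COMMON_GROUPS = {
    "basic consonants": 32,
    "ligature-only consonant": 1,
    "basic standalone vowels": 11,
    "basic vowels": 12,
    "virama": 1,
    "nukta": 1,
}

# The two groups whose characters are spelled out in full above.
BASIC_CONSONANTS = "कखगघङचछजझटठडढणतथदधनपफबभमयरलवशषसह"
STANDALONE_VOWELS = "अआइईउऊऋएऐओऔ"


def common_total(groups=COMMON_GROUPS):
    """Sum the group sizes (hypothetical helper, not from the notes)."""
    return sum(groups.values())


if __name__ == "__main__":
    # len() on a Python str counts code points, which is what we want here.
    print("consonants listed:", len(BASIC_CONSONANTS))
    print("standalone vowels listed:", len(STANDALONE_VOWELS))
    print("common total:", common_total())
```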
Language-specific
---------------
Hindi:
nukta'd consonants for hindi/urdu only (7)
these have precomposed code points, but those are composition exclusions: NFC turns them into consonant + nukta, so in NFC text you see the two-code-point form
क़ख़ग़ज़ड़ढ़फ़
If the text is NFC (or you normalize it yourself), you can get rid of the precomposed forms and keep the nukta combiner.
If you need to get rid of more, removing ण, ञ, or ढ़ should be fine for most words.
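
Composition behaviour differs from letter to letter, so it may be worth checking what NFC actually does with each form before dropping either the precomposed letters or the combiner. A minimal sketch using Python's standard `unicodedata` module; the helper names are hypothetical.

```python
import unicodedata


def code_points(s):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def nfc_form(s):
    """Return the NFC normalization of s."""
    return unicodedata.normalize("NFC", s)


def survives_nfc(s):
    """True if s is already in NFC, i.e. normalizing leaves it unchanged."""
    return nfc_form(s) == s


if __name__ == "__main__":
    # U+0958..U+095E are the seven precomposed nukta letters listed above.
    for cp in range(0x0958, 0x095F):
        ch = chr(cp)
        print(unicodedata.name(ch), code_points(ch), "->", code_points(nfc_form(ch)))
```

Running the same loop over any other candidate characters shows which form a renderer or font would actually need to support for NFC input.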
Marathi (1):
Kashmiri (11):
ऎऒऄॳॴॶॷ
sanskrit (7):
ऌॠॡ
Sindhi (4):
ॻॼॾॿ
marwari(1):
Transcribing other languages
------
marathi only, for transcribing other languages (4):
ॲऑऍ
Consonants only used for transcribing other languages (5):
ऩऱऴय़ॹ
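
To decide which of these groups a given corpus really needs, one option is to scan the text for Devanagari code points outside whatever subset you plan to keep. This is only a sketch: the function names are hypothetical, the text is normalized to NFC first as an assumption, and the block range U+0900..U+097F is the main Devanagari block only.

```python
import unicodedata


def is_devanagari(ch):
    """True for code points in the main Devanagari block (U+0900..U+097F)."""
    return 0x0900 <= ord(ch) <= 0x097F


def outside_subset(text, allowed):
    """Return the sorted Devanagari code points of NFC(text) not in `allowed`."""
    normalized = unicodedata.normalize("NFC", text)
    return sorted(
        {ch for ch in normalized if is_devanagari(ch) and ch not in allowed}
    )


if __name__ == "__main__":
    # Example subset: just the basic consonants from the common section.
    allowed = set("कखगघङचछजझटठडढणतथदधनपफबभमयरलवशषसह")
    for ch in outside_subset("नमस्ते", allowed):
        # Fall back to "?" for code points without a Unicode name.
        print(f"U+{ord(ch):04X}", unicodedata.name(ch, "?"))
```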