Cree Grapheme Stats
Count unigrams and bigrams in the Wolfart-Ahenakew nêhiyawêwin corpus!
When building a keyboard for typing Cree, it is useful to know which graphemes are typed often and which pairs of graphemes are typed one after the other. Using unigram statistics, we can place the most frequent graphemes in the most ergonomic "neutral" positions. To speed up typing, we use bigram statistics to place frequently typed pairs on opposite sides of the keyboard, so that two-handed typing alternates hands (or thumbs) as often as possible.
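As a rough illustration of what these statistics are, here is a minimal Python sketch of unigram and bigram counting. It is not the code of the `count-unigrams` or `count-bigrams` scripts in this repository; the function name `count_graphemes` and the sample words are hypothetical. It assumes that each grapheme is a single code point after NFC normalization, and that bigrams are counted within words only, not across word boundaries.

```python
from collections import Counter
import unicodedata


def count_graphemes(words):
    """Return (unigram_counts, bigram_counts) for an iterable of words.

    Assumption: after NFC normalization every grapheme is one code point,
    so a vowel followed by a combining circumflex collapses into e.g. "ê".
    """
    unigrams = Counter()
    bigrams = Counter()
    for word in words:
        graphemes = list(unicodedata.normalize("NFC", word.lower()))
        unigrams.update(graphemes)
        # Pair each grapheme with its successor inside the same word.
        bigrams.update(zip(graphemes, graphemes[1:]))
    return unigrams, bigrams


# Hypothetical sample input; the real corpus is not bundled with this sketch.
sample = "nêhiyawêwin nêhiyaw".split()
unigrams, bigrams = count_graphemes(sample)
print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

Frequent entries in `unigrams` are candidates for the neutral key positions, and frequent entries in `bigrams` are candidates for placement on opposite sides of the keyboard.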
- Python 3.6+
- sponge(1) from moreutils (brew install moreutils)
- nfc(1) from unormalize (brew install eddieantonio/eddieantonio/unormalize)
    .
    ├── Makefile
    ├── bigrams.pdf [output]
    ├── bigrams.tsv [output]
    ├── cleancorp.txt [input]
    ├── count-bigrams
    ├── count-unigrams
    ├── create-fdp
    ├── defuse
    ├── filter-out-non-sro
    ├── tokenize
    └── unigrams.tsv [output]
Note: You must download