Last active
May 6, 2024 16:54
Save increpare/9aaf57056b857cb44a38d0ff0de9534b to your computer and use it in GitHub Desktop.
toki pona letter/syllable/word frequency statistics based on #toki-pona-taso on the ma pona pi toki pona Discord server. The second file has some stats from "toki pona taso sin" on Telegram.
none of the following data is used in the other file - it's just a different data source. They track each other pretty well though! | |
(64218 words in total) | |
li 4647 | |
mi 4143 | |
e 3597 | |
toki 2905 | |
ni 2811 | |
pona 2692 | |
a 2126 | |
ala 1996 | |
jan 1853 | |
sina 1765 | |
la 1729 | |
lon 1594 | |
sona 1483 | |
mute 1268 | |
tawa 1242 | |
pi 1169 | |
ike 1019 | |
tenpo 1006 | |
seme 973 | |
wile 914 | |
ona 905 | |
o 856 | |
kama 764 | |
taso 757 | |
ken 738 | |
pali 663 | |
nimi 663 | |
tan 660 | |
ma 636 | |
pilin 592 | |
lili 584 | |
moku 565 | |
lukin 445 | |
tomo 444 | |
ilo 433 | |
kepeken 432 | |
sitelen 411 | |
musi 408 | |
anu 348 | |
jo 325 | |
ali 321 | |
sama 318 | |
luka 318 | |
kin 311 | |
en 310 | |
ante 282 | |
pana 261 | |
ijo 258 | |
lape 256 | |
telo 253 | |
suno 252 | |
wan 229 | |
suli 228 | |
pini 228 | |
losi 224 | |
nasa 220 | |
nasin 220 | |
lipu 218 | |
nanpa 217 | |
lawa 198 | |
tu 196 | |
mani 192 | |
kalama 185 | |
kulupu 176 | |
wawa 172 | |
sin 170 | |
weka 161 | |
ale 151 | |
moli 148 | |
sike 143 | |
pakala 137 | |
soweli 130 | |
sewi 126 | |
awen 113 | |
utala 107 | |
inli 103 | |
pan 97 | |
kon 95 | |
poka 94 | |
sonja 89 | |
ko 89 | |
leko 86 | |
sijelo 86 | |
linja 85 | |
pimeja 84 | |
pu 82 | |
seli 80 | |
kute 80 | |
kasi 78 | |
jaki 75 | |
insa 73 | |
suwi 71 | |
lete 67 | |
pije 58 | |
kili 56 | |
sonko 54 | |
uta 54 | |
kiwen 50 | |
mama 50 | |
p 49 | |
open 48 | |
oko 46 | |
esun 45 | |
meli 44 | |
lupa 43 | |
poki 42 | |
wowa 39 | |
mije 39 | |
unpa 38 | |
i 37 | |
mun 36 | |
onkon 35 | |
monsuta 35 | |
olin 34 | |
len 32 | |
nijon 31 | |
namako 30 | |
palisa 30 | |
l 29 | |
pipi 29 | |
loje 29 | |
anpa 28 | |
kule 28 | |
m 28 | |
walo 27 | |
noka 27 | |
nena 27 | |
selo 26 | |
jelo 24 | |
supa 21 | |
epanja 21 | |
pata 21 | |
n 20 | |
t 19 | |
kala 19 | |
powe 19 | |
laso 19 | |
epelanto 16 | |
sinpin 15 | |
mu 14 | |
tosi 14 | |
kanse 14 | |
u 14 | |
tajo 13 | |
akesi 13 | |
aaa 12 | |
w 12 | |
k 12 | |
po 11 | |
katala 11 | |
na 11 | |
kan 11 | |
apeja 10 | |
mateli 10 |
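The note at the top of this file says the two sources track each other pretty well. A rough way to check that claim is a Spearman rank correlation over the words both lists share. The sketch below is only an illustration: it uses the top six words from each list in this gist, and the helper names (`ranks`, `spearman`) are made up for this example. It assumes there are no tied counts in the sample.

```python
# Counts copied from the two word lists in this gist (top six words only).
source_a = {"li": 4647, "mi": 4143, "e": 3597, "toki": 2905, "ni": 2811, "pona": 2692}
source_b = {"mi": 12551, "li": 11430, "e": 8785, "toki": 6617, "pona": 6479, "ni": 5753}


def ranks(counts):
    """Map each word to its 1-based rank by descending count."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {word: i + 1 for i, word in enumerate(ordered)}


def spearman(a, b):
    """Spearman rank correlation over the words present in both lists.

    Uses the simple formula that assumes no tied counts.
    """
    common = sorted(set(a) & set(b))
    ra = ranks({w: a[w] for w in common})
    rb = ranks({w: b[w] for w in common})
    n = len(common)
    d2 = sum((ra[w] - rb[w]) ** 2 for w in common)
    return 1 - 6 * d2 / (n * (n * n - 1))


rho = spearman(source_a, source_b)
print(f"rank correlation on the sample: {rho:.3f}")
```

On this tiny sample only two adjacent pairs swap places (li/mi and ni/pona), so the correlation comes out high. Running it over the full lists would give a fairer picture.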
syllable frequency based on toki-pona-taso (heavily filtered: tried to remove things like usernames and words containing non-toki-pona characters, and possibly also ended up removing any styled text, such as emphasized text) | |
li 19545 | |
na 17887 | |
mi 14485 | |
la 12106 | |
a 11507 | |
po 9428 | |
e 9250 | |
ni 8528 | |
to 8182 | |
si 7587 | |
ki 7095 | |
ta 6354 | |
pi 6301 | |
te 5773 | |
so 5671 | |
ma 5620 | |
o 5439 | |
wa 5142 | |
ka 4776 | |
lon 3881 | |
le 3767 | |
jan 3738 | |
mu 3520 | |
ke 3323 | |
i 3301 | |
wi 3268 | |
pa 3232 | |
ken 3118 | |
ten 2845 | |
mo 2693 | |
lin 2623 | |
lo 2330 | |
pe 2135 | |
ku 2078 | |
sa 1938 | |
su 1903 | |
lu 1836 | |
se 1824 | |
jo 1685 | |
kin 1552 | |
tan 1512 | |
len 1480 | |
pu 1413 | |
me 1395 | |
sin 1068 | |
ja 869 | |
an 841 | |
no 783 | |
we 778 | |
nu 698 | |
je 611 | |
tu 610 | |
in 570 | |
u 557 | |
wen 515 | |
ko 504 | |
en 483 | |
kon 389 | |
pan 353 | |
wan 335 | |
nan 289 | |
pen 227 | |
sun 192 | |
lan 147 | |
kan 147 | |
ne 139 | |
mon 139 | |
pin 86 | |
ju 77 | |
son 76 | |
mun 75 | |
un 69 | |
ti 49 | |
jon 37 | |
win 30 | |
ton 30 | |
san 29 | |
on 29 | |
wo 27 | |
pon 26 | |
ji 23 | |
man 21 | |
jen 18 | |
men 15 | |
sen 14 | |
tun 11 | |
wu 10 | |
non 9 | |
tin 7 | |
nin 7 | |
nun 3 | |
min 2 | |
kun 2 | |
jin 2 | |
pun 1 | |
nen 1 | |
won 1 | |
lun 1 | |
jun 1 | |
letter frequency: | |
a 78489 | |
i 76698 | |
n 55747 | |
l 48078 | |
o 41835 | |
e 38048 | |
m 28147 | |
t 25683 | |
p 23403 | |
k 23231 | |
s 20699 | |
u 13249 | |
w 10225 | |
j 7173 | |
words by frequency (words with frequency>=10) | |
mi 12551 | |
li 11430 | |
e 8785 | |
toki 6617 | |
pona 6479 | |
ni 5753 | |
a 5231 | |
la 4715 | |
ala 4430 | |
sina 4012 | |
lon 3907 | |
jan 3736 | |
tawa 3480 | |
pi 2976 | |
sona 2949 | |
tenpo 2757 | |
ona 2741 | |
wile 2434 | |
mute 2242 | |
taso 2140 | |
o 2063 | |
kama 2041 | |
ken 2001 | |
pilin 1971 | |
nimi 1790 | |
ike 1703 | |
lili 1594 | |
tan 1476 | |
tomo 1472 | |
pali 1389 | |
ma 1361 | |
sitelen 1306 | |
kepeken 1104 | |
musi 975 | |
jo 930 | |
moku 912 | |
lukin 835 | |
sama 828 | |
telo 826 | |
lape 820 | |
seme 805 | |
kin 747 | |
ilo 734 | |
ale 733 | |
pini 729 | |
ante 722 | |
suli 703 | |
ijo 684 | |
anu 665 | |
nasa 660 | |
kulupu 646 | |
suno 635 | |
pana 566 | |
kalama 549 | |
lipu 528 | |
tu 514 | |
nasin 501 | |
sin 492 | |
pakala 482 | |
en 477 | |
wawa 448 | |
olin 419 | |
lawa 416 | |
awen 366 | |
sewi 356 | |
seli 355 | |
kon 352 | |
soweli 352 | |
weka 341 | |
mu 329 | |
wan 328 | |
inli 323 | |
ali 319 | |
lete 306 | |
sike 296 | |
nanpa 286 | |
kasi 283 | |
moli 281 | |
kute 270 | |
suwi 268 | |
utala 260 | |
pimeja 255 | |
mama 252 | |
sijelo 249 | |
pan 223 | |
luka 215 | |
uta 214 | |
open 211 | |
ko 209 | |
jaki 192 | |
kala 188 | |
pu 185 | |
insa 185 | |
esun 183 | |
kili 178 | |
poka 172 | |
mani 168 | |
len 158 | |
linja 145 | |
meli 142 | |
kiwen 129 | |
poki 119 | |
supa 110 | |
i 110 | |
kule 109 | |
kanse 103 | |
mije 101 | |
waso 100 | |
walo 96 | |
pipi 94 | |
palisa 94 | |
to 92 | |
anpa 88 | |
noka 84 | |
akesi 78 | |
loje 77 | |
mun 75 | |
nena 71 | |
ten 66 | |
unpa 66 | |
sinpin 65 | |
mewika 64 | |
selo 64 | |
aa 61 | |
monsi 58 | |
epanja 58 | |
epelanto 58 | |
jelo 57 | |
monsuta 57 | |
laso 54 | |
oko 54 | |
alasa 53 | |
kawa 49 | |
u 49 | |
lo 46 | |
s 44 | |
in 43 | |
p 42 | |
elopa 40 | |
aaa 40 | |
sonala 39 | |
me 36 | |
t 36 | |
is 36 | |
sonko 35 | |
aaaa 34 | |
losi 33 | |
noun 33 | |
lupa 32 | |
l 31 | |
tok 29 | |
sonja 27 | |
n 26 | |
pillin 26 | |
it 26 | |
k 25 | |
leni 25 | |
lanpan 25 | |
pije 25 | |
ee 24 | |
toks 24 | |
kanata 24 | |
amelika 23 | |
tosi 23 | |
majuna 22 | |
ne 22 | |
like 22 | |
aaaaa 21 | |
kipisi 21 | |
m 21 | |
ka 20 | |
nijon 20 | |
jans 20 | |
po 20 | |
w 20 | |
tempo 19 | |
naluto 19 | |
j 19 | |
lile 19 | |
iwisi 18 | |
aaaaaa 18 | |
ana 17 | |
masatuse 16 | |
nu 16 | |
wije 16 | |
elena 16 | |
an 16 | |
onkon 16 | |
waleja 15 | |
losupan 15 | |
maliku 15 | |
lasina 15 | |
leko 15 | |
anku 14 | |
nikole 14 | |
makuwe 14 | |
ejewa 14 | |
wajen 14 | |
linluwi 14 | |
oselija 13 | |
nawi 13 | |
kisi 13 | |
sumi 12 | |
pa 12 | |
teka 12 | |
namako 12 | |
te 12 | |
il 12 | |
inkepa 11 | |
kan 11 | |
apeja 11 | |
tomen 11 | |
lu 11 | |
ti 11 | |
man 11 | |
on 11 | |
ese 11 | |
pesije 11 | |
powe 11 | |
pikan 11 | |
akon 11 | |
kapesi 11 | |
oo 10 | |
lena 10 | |
naj 10 | |
juwese 10 | |
juke 10 | |
new 10 | |
misisipi 10 | |
no 10 | |
kipo 10 | |
posuka 10 | |
kepe 10 | |
jasi 10 | |
na 10 |
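The gist does not say how the syllable and letter counts were computed. As a rough sketch, one could derive them from a word-count table by splitting each word into (C)V(N) syllables, where a trailing `n` belongs to the syllable only if no vowel follows it. The regex, the function names and the four-word sample here are assumptions for illustration, not the author's method.

```python
import re
from collections import Counter

# Optional consonant, one vowel, then an optional coda "n" that is
# NOT followed by a vowel (otherwise the "n" starts the next syllable).
SYLLABLE = re.compile(r"[ptksmnljw]?[aeiou](?:n(?![aeiou]))?")


def syllables(word):
    """Split a toki pona word into syllables, e.g. 'tenpo' -> ['ten', 'po']."""
    return SYLLABLE.findall(word)


def tally(word_counts):
    """Weight each word's syllables and letters by the word's count."""
    syllable_counts = Counter()
    letter_counts = Counter()
    for word, n in word_counts.items():
        for syl in syllables(word):
            syllable_counts[syl] += n
        for letter in word:
            letter_counts[letter] += n
    return syllable_counts, letter_counts


# Small sample taken from the word list above.
sample = {"mi": 12551, "li": 11430, "tenpo": 2757, "kepeken": 1104}
syllable_counts, letter_counts = tally(sample)
print(syllable_counts.most_common(3))
print(letter_counts.most_common(3))
```

Tokens that are not well-formed toki pona words (for example the stray single consonants in the list) would produce no syllables under this regex, so real input would still need filtering first.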
After trying to use this in my code and failing, I noticed that the issue is that "ilo" and "ale" (lines 164 and 165) are separated from their counts by a space instead of a tab
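One way around the space-versus-tab problem is to split each line on any run of whitespace rather than on a tab specifically. This is a minimal sketch: `parse_counts` is a hypothetical helper, and the sample text only mimics the mixed separators described in the comment above.

```python
def parse_counts(text):
    """Parse 'word<whitespace>count' lines, skipping headers and blank lines."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()  # splits on any run of spaces or tabs
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


# Mixed separators: tabs on most lines, a single space on "ilo" and "ale".
sample = "words by frequency\nkin\t747\nilo 734\nale 733\npini\t729\n"
counts = parse_counts(sample)
print(counts)
```

Note that the file holds several sections (syllables, letters, words) that share keys such as "li" and "a", so when parsing the whole file it would make sense to split on the section headers first and parse each section into its own dictionary.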
...tonsi is not one of them... ike a :/