Overview

Nayr's Japanese Core5000 Anki deck (discussion) contains pronunciations of all five thousand or so sentences in A Frequency Dictionary of Japanese by Yukio Tono, Makoto Yamazaki, and Kikuo Maekawa (2013), which lists the five thousand most frequent words in Japanese according to recent corpus research. I analyzed these sentences to build a histogram table of hiragana occurrences, including digraphs (yōon) such as きゃ and ちょ. The two attached tables show the results in modern hiragana order and sorted by count.
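The counting idea described above can be sketched as a small bash function. This is an illustration under stated assumptions, not the script used for the attached tables: the function name count_kana, the temporary scratch file, and the longest-first ordering are all hypothetical choices made for this example. It assumes the kana list holds one plain kana or digraph per line and that both files are UTF-8. Each counted entry is overwritten with a space in the scratch copy, so the き inside きゃ is not counted a second time as a standalone き.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count_kana KANA_LIST TEXT_FILE
# Prints one "kana : count" line per non-empty entry of KANA_LIST.
# Assumption: entries are plain kana (no "/" or regex metacharacters).
count_kana() {
  local list=$1 text=$2 scratch kana n
  scratch=$(mktemp) || return 1
  cp "$text" "$scratch"            # never modify the input file itself
  # Longest entries first, so digraphs are handled before single kana.
  # LC_ALL=C makes the tools compare raw bytes; UTF-8 byte matches
  # always line up with whole characters, so this is safe here.
  sed '/^$/d' "$list" |
    LC_ALL=C awk '{ print length($0) "\t" $0 }' |
    LC_ALL=C sort -rn |
    cut -f2- |
    while IFS= read -r kana; do
      # grep -o prints every match on its own line; wc -l counts them.
      n=$(LC_ALL=C grep -oF -- "$kana" "$scratch" | wc -l | tr -d ' ')
      printf '%s : %s\n' "$kana" "$n"
      # Blank out what was just counted (a space avoids joining neighbours).
      LC_ALL=C sed "s/$kana/ /g" "$scratch" > "$scratch.tmp" &&
        mv "$scratch.tmp" "$scratch"
    done
  rm -f "$scratch"
}
```

Calling count_kana kana.txt core5k-sentences.md would print the counts with the longest entries first rather than in the order of kana.txt; the output can be reordered afterwards if a particular table order is needed.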
Technical notes

I parsed a file containing those sentences, annotated with hiragana readings (core5k-sentences.md), using the following script and a helper file (kana.txt):
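# Work on a scratch copy (sacrifice.md) rather than the original file.
cp core5k-sentences.md sacrifice.md
# Drop blank lines from kana.txt, then process one kana per iteration.
sed '/^$/d' kana.txt | while IFS= read -r i; do
printf '%s : ' "$i"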
sed -n "s/$i/$i\n/gp"