@alephyud
Last active October 6, 2018 21:14
Yiddish letter frequency calculation
א 13.6%
י 8.6%
ע 8.3%
ר 6.6%
ט 6.2%
ן 5.4%
ו 4.0%
ז 3.6%
ל 3.6%
ד 3.3%
נ 3.3%
מ 3.2%
ג 3.1%
ײ 2.8%
ס 2.6%
פ 2.5%
ב 2.4%
װ 2.3%
ש 2.3%
ך 2.2%
ק 2.1%
ה 2.1%
ױ 1.2%
צ 1.1%
כ 1.0%
ם 1.0%
ת 0.5%
ח 0.4%
ף 0.4%
ץ 0.2%
// Source text: Tevye the Milkman, https://www.cs.uky.edu/~raphael/yiddish/tevye.html
// Take the text, normalize it and remove characters other than Yiddish letters and digraphs:
const yiText = document.body.innerText.normalize('NFD').replace(/[^ײױא-תװ]/g, '')
// Frequency list function (source link: https://stackoverflow.com/a/5668246)
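function freqs(arr) {
  // a collects the distinct values, b the matching counts
  var a = [], b = [], prev;
  // Sort in place so that equal values end up next to each other
  arr.sort();
  for (var i = 0; i < arr.length; i++) {
    if (arr[i] !== prev) {
      // New value: start a new count
      a.push(arr[i]);
      b.push(1);
    } else {
      // Same value as the previous element: bump the latest count
      b[b.length - 1]++;
    }
    prev = arr[i];
  }
  return [a, b];
}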
// Now, calculate the frequencies:
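// The counts array needs its own name: redeclaring `freqs` would clash with the function above.
const [letters, counts] = freqs(Array.from(yiText));
// Pair each letter with its count and sort by descending count:
const freqsList = counts.map((x, i) => [letters[i], x]).sort((x, y) => y[1] - x[1]);
// One CSV row per letter: the letter and its share of all counted letters (a fraction, not a percentage):
const csv = freqsList.map(([letter, freq]) => `"${letter}",${freq / yiText.length}`).join('\n');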
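
The same calculation can also be sketched without the sort-and-scan helper, by tallying letters into a Map. This is an alternative illustration, not the gist's own code. It takes a plain string instead of reading document.body, so it can run outside the browser. The names letterFrequencies and formatPercent are hypothetical, and the one-decimal percent format is an assumption chosen to resemble the table above.

```javascript
// Hypothetical helper (not part of the original gist): returns [letter, share] pairs,
// sorted by descending share, where share is a fraction between 0 and 1.
function letterFrequencies(text) {
  // NFD splits precomposed forms (for example U+FB2E, alef with patah) into a base
  // letter plus a combining mark; the regex then keeps only Hebrew letters
  // (U+05D0-U+05EA) and the Yiddish digraphs (U+05F0-U+05F2).
  const letters = text.normalize('NFD').replace(/[^\u05D0-\u05EA\u05F0-\u05F2]/g, '');
  const counts = new Map();
  for (const ch of letters) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  const total = letters.length;
  return [...counts.entries()]
    .sort((x, y) => y[1] - x[1])
    .map(([letter, count]) => [letter, count / total]);
}

// Hypothetical formatter: fraction -> percentage string with one decimal place.
function formatPercent(fraction) {
  return (fraction * 100).toFixed(1) + '%';
}

// Usage on a short sample string (an assumption; the gist used the full Tevye text):
const sample = 'אַ ייִדישע מאַמע';
for (const [letter, share] of letterFrequencies(sample)) {
  console.log(letter, formatPercent(share));
}
```

Counting into a Map avoids sorting the whole character array first, and the order of the result depends only on the final sort by count.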