Last active
August 12, 2022 14:43
Czech noun inflection Scala-like pseudocode
/*
In Czech, there are various "templates" for noun inflection. (I am not completely sure if "template" is the correct term.) We divide the nouns by grammatical gender into masculines, feminines and neuters.
Masculines are divided into (grammatically) animate and (grammatically) inanimate. (Note that grammatical animacy may differ from actual animacy. For example, "sněhulák" (snowman) and "umrlec" (dead man) are grammatically animate.)
Templates according to https://cs.wikipedia.org/wiki/%C4%8Cesk%C3%A1_podstatn%C3%A1_jm%C3%A9na#Rod_mu.C5.BEsk.C3.BD_.C5.BEivotn.C3.BD :
masculine, animate:
  pán
  muž
  předseda
  hajný # inflected like the adjective "mladý"
  Jiří # inflected like the adjective "jarní"
  soudce # Seriously, because of accusative of plural!
masculine, inanimate:
  hrad
  les # very similar inflection to "hrad"; the two are usually considered a single template
  stroj
feminine:
  žena
  škola # very similar inflection to "žena"; the two are usually considered a single template
  růže
  píseň
  kost
neuter:
  město
  moře
  kuře
  stavení
Given a grammatical gender, you can determine the template by comparing the endings of some singular cases. For feminines and neuters, it is enough to compare the nominative and genitive. For masculines, you also need to compare the accusative.
If you don't know the grammatical gender but you know some case forms, comparing the endings of the singular nominative, singular genitive and possibly singular accusative identifies the template for most nouns, but not for all of them.
*/
/**
 * Given a Czech noun, returns its grammatical template as a String.
 * Some edge cases are not covered. These edge cases include (but are not limited to):
 *   * Indeclinable words like "kupé", "filé" and "ragù". (Probably all of them are loanwords.)
 *   * Some loanwords with the ending "us" in the nominative.
 *   * Mixed inflection.
 *   * Some special templates. (I've covered some of them, but not all.)
 *
 * Note that the code is meant as pseudocode. I hope one can make it work just by defining the Noun class, which should be rather simple.
 */
def determineTemplate(noun: Noun): String = noun.singular.nominative.ending match {
  case "" => // MA: pán, muž; MI: hrad, les, stroj; F: píseň, kost
    noun.singular.genitive.ending match {
      case "a" => // MA-G: pána; MI-G: lesa
        noun.singular.accusative.ending match {
          case "a" => "pán" // MA-A: pána
          case "" => "les" // MI-A: les
        }
      case "e" => // MA-G: muže; MI-G: stroje
        noun.singular.accusative.ending match {
          case "e" => "muž" // MA-A: muže
          case "" => "stroj" // MI-A: stroj
        }
      case "u" => "hrad" // MI-G: hradu
      case "ě" => "píseň" // F-G: písně
      case "i" => "kost" // F-G: kosti
    }
  case "a" => // MA: předseda; F: žena, škola
    // These templates are very very similar. Not sure why "chairman" is similar (in inflection) to "woman" :D
    noun.singular.locative.ending match {
      // Note that "žena" and "škola" are usually considered one template, since they share a grammatical gender and differ only slightly in inflection
      case "ě" => "žena" // F-L: ženě
      case "e" => "škola" // F-L: škole
      case "ovi" => "předseda" // MA-L: předsedovi
    }
  case "e" => // MA: soudce; F: růže; N: moře, kuře
    noun.singular.genitive.ending match {
      case "e" => // MA: soudce; F: růže; N: moře
        // The most tricky one if you don't know the grammatical gender. We can't distinguish these three just by comparing one case.
        noun.singular.accusative.ending match {
          case "e" => // MA-A: soudce; N-A: moře
            // Uarrgh, we can't clearly distinguish "soudce" (judge) and "moře" (sea) by singular cases! We have to use plural!
            noun.plural.nominative.ending match {
              case "i" | "ové" => "soudce" // MA-PLURAL-N: soudci, soudcové (both are correct)
              case "e" => "moře" // N-PLURAL-N: moře
            }
          case "i" => "růže" // F-A: růži
        }
      case "ete" => "kuře" // N-G: kuřete
    }
  case "í" => // MA: Jiří; N: stavení
    noun.singular.genitive.ending match {
      case "ího" => "Jiří" // MA-G: Jiřího
      case "í" => "stavení" // N-G: stavení
    }
  case "ý" => "hajný"
  case "o" => "město"
}
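The doc comment says the pseudocode should work once a `Noun` class is defined. The following is a minimal sketch of one possible model, not the author's definition. Only the accessors `singular`, `plural`, `nominative`, `genitive`, `accusative`, `locative` and `ending` come from the pseudocode; `Form`, `Cases`, `Nouns.cases` and the sample value `hrad` are hypothetical names introduced here. The helper assumes that all four forms share one stem, which does not hold for nouns with stem alternation such as "píseň" / "písně"; those would need their `Form` values built individually.

```scala
// Hypothetical model: only the accessor names used by determineTemplate
// come from the original pseudocode.

// One inflected form, split into a stem and an ending.
final case class Form(stem: String, ending: String) {
  def text: String = stem + ending
}

// The four cases the pseudocode inspects (Czech has seven in total).
final case class Cases(
  nominative: Form,
  genitive: Form,
  accusative: Form,
  locative: Form
)

final case class Noun(singular: Cases, plural: Cases)

object Nouns {
  // Builds a Cases value from one shared stem and four endings.
  // Assumption: the stem does not change between cases.
  def cases(stem: String, nom: String, gen: String, acc: String, loc: String): Cases =
    Cases(Form(stem, nom), Form(stem, gen), Form(stem, acc), Form(stem, loc))
}

// Sample noun "hrad" (castle), masculine inanimate.
val hrad = Noun(
  singular = Nouns.cases("hrad", "", "u", "", "u"),    // hrad, hradu, hrad, hradu
  plural   = Nouns.cases("hrad", "y", "ů", "y", "ech") // hrady, hradů, hrady, hradech
)
```

With this model, `hrad.singular.nominative.ending` is the empty string and `hrad.singular.genitive.ending` is `"u"`, which is the combination the pseudocode maps to the "hrad" template.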
Note that the hidden complexity is in grammatical gender. Native speakers are typically aware of the gender of virtually any Czech word. If you know the gender, it is usually enough to match just the endings of the nominative and genitive. In some cases, you also need to match the accusative in order to distinguish between masculine animate and masculine inanimate words.
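As a rough illustration of the gender-first approach, the lookup can be sketched as a single match on gender plus the singular nominative and genitive endings. Everything named here (`Gender` and its four values, `templateFor`) is hypothetical and not part of the gist. The sketch also assumes the caller already knows whether a masculine noun is animate, so the accusative check is not needed, and it folds "škola" into "žena". Combinations it does not cover return `None`.

```scala
// Hypothetical types: the gist does not define a gender model.
sealed trait Gender
case object MasculineAnimate extends Gender
case object MasculineInanimate extends Gender
case object Feminine extends Gender
case object Neuter extends Gender

// Returns the template name for a known gender and the singular
// nominative and genitive endings, or None if the sketch has no rule.
def templateFor(gender: Gender, nominative: String, genitive: String): Option[String] =
  (gender, nominative, genitive) match {
    case (MasculineAnimate, "", "a")     => Some("pán")      // pán, pána
    case (MasculineAnimate, "", "e")     => Some("muž")      // muž, muže
    case (MasculineAnimate, "a", _)      => Some("předseda")
    case (MasculineAnimate, "e", "e")    => Some("soudce")   // soudce, soudce
    case (MasculineAnimate, "ý", _)      => Some("hajný")
    case (MasculineAnimate, "í", "ího")  => Some("Jiří")     // Jiří, Jiřího
    case (MasculineInanimate, "", "u")   => Some("hrad")     // hrad, hradu
    case (MasculineInanimate, "", "a")   => Some("les")      // les, lesa
    case (MasculineInanimate, "", "e")   => Some("stroj")    // stroj, stroje
    case (Feminine, "a", _)              => Some("žena")     // also covers "škola"
    case (Feminine, "e", "e")            => Some("růže")     // růže, růže
    case (Feminine, "", "ě")             => Some("píseň")    // píseň, písně
    case (Feminine, "", "i")             => Some("kost")     // kost, kosti
    case (Neuter, "o", _)                => Some("město")
    case (Neuter, "e", "e")              => Some("moře")     // moře, moře
    case (Neuter, "e", "ete")            => Some("kuře")     // kuře, kuřete
    case (Neuter, "í", "í")              => Some("stavení")  // stavení, stavení
    case _                               => None
  }
```

Because the gender is given, "soudce", "růže" and "moře" no longer collide: the same endings `("e", "e")` map to a different template under each gender, so no plural form is needed.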