@altsoph
Created December 12, 2019 09:00
en English
nltk
http://snowball.tartarus.org/algorithms/porter/stemmer.html
http://snowball.tartarus.org/algorithms/english/stemmer.html
http://proiot.ru/jssnowball/ (online), source https://github.com/mazko/jssnowball
porter stemmer (many implementations on GitHub)
ru Russian
nltk
http://snowball.tartarus.org/algorithms/russian/stemmer.html
https://github.com/neonxp/Stemmer
https://metacpan.org/pod/Lingua::Stem::Ru
http://proiot.ru/jssnowball/ (online), source https://github.com/mazko/jssnowball
de German
nltk
http://snowball.tartarus.org/algorithms/german/stemmer.html
http://snowball.tartarus.org/algorithms/german2/stemmer.html
cs Czech
http://snowball.tartarus.org/otherapps/oregan/czech-do.sbl
https://github.com/dundalek/czech-stemmer
http://research.variancia.com/czech_stemmer/
http://proiot.ru/jssnowball/ (online), source https://github.com/mazko/jssnowball
be Belarusian
https://tt-api.tech/en/stemmer (paid)
uk Ukrainian
https://github.com/Amice13/ukr_stemmer
https://github.com/vgrichina/ukrainian-stemmer
https://tt-api.tech/en/stemmer (paid)
pl Polish
https://pypi.org/project/pystempel/
https://github.com/eugeniashurko/polish-stem
http://www.cs.put.poznan.pl/dweiss/xml/projects/lametyzator/index.xml?lang=en
bg Bulgarian
https://github.com/gkostadinov/py-bulgarian-stemmer
https://github.com/Glamdring/bg-stemmer
hr Croatian
http://nlp.ffzg.hr/resources/tools/stemmer-for-croatian/ (note: this site is a hub for NLP developers working on Croatian)
https://eliteinformatiker.de/2015/05/15/rewriting-university-of-zagrebs-croatian-stemmer-to-a-nltk-compliant-class
https://pypi.org/project/text-hr/
https://vukbatanovic.github.io/SCStemmers/
id Indonesian
https://snowballstem.org/algorithms/indonesian/stemmer.html
https://pypi.org/project/Sastrawi/
https://github.com/har07/PySastrawi
mk Macedonian
https://dighumlab.org/cst-lemmatiser/ + https://nlpweb01.nors.ku.dk/download/cstlemma/macedonian/
sk Slovak
https://github.com/mrshu/stemm-sk
https://github.com/essential-data/stemmer-sk
sl Slovenian
https://repo.ijs.si/DIS-AGENTS/snowball-stemmer
https://stackoverflow.com/questions/8714040/slovenian-stemmer-for-sphinx (not sure this is still alive)
http://proiot.ru/jssnowball/ (online), source https://github.com/mazko/jssnowball
sr Serbian
https://snowballstem.org/algorithms/serbian/stemmer.html
https://vukbatanovic.github.io/SCStemmers/
https://nikolamilosevic86.github.io/SerbianStemmer/
https://github.com/vdragan1993/serbian-stemmer
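
The bare "nltk" entries under en, ru and de most likely refer to NLTK's Snowball wrapper. A minimal sketch of how that could be used, assuming the `nltk` package is installed; the `NLTK_LANGUAGE_NAMES` mapping and the `stem_words` helper are hypothetical names introduced here for illustration, not part of NLTK:

```python
from nltk.stem.snowball import SnowballStemmer

# Hypothetical mapping from the two-letter codes used as headings in this
# list to the language names NLTK's Snowball wrapper expects.
# Only the three languages marked "nltk" above are covered.
NLTK_LANGUAGE_NAMES = {"en": "english", "ru": "russian", "de": "german"}


def stem_words(lang_code, words):
    """Stem each word with NLTK's SnowballStemmer for the given code."""
    stemmer = SnowballStemmer(NLTK_LANGUAGE_NAMES[lang_code])
    return [stemmer.stem(word) for word in words]


if __name__ == "__main__":
    print(stem_words("en", ["running", "stemmers"]))
    print(stem_words("ru", ["книги", "читают"]))
    print(stem_words("de", ["häuser", "laufen"]))
```

`SnowballStemmer.languages` lists the language names the installed NLTK version supports, so it is worth checking before relying on any of the other languages in this list.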