@peio
Created March 28, 2012 17:04
Regular expressions to detect the Russian language
import re
is_cyrilic = re.compile(u'а|е|и|о|у|ъ|я|ю', re.U)  # a Cyrillic vowel
# Letters specific to Russian, plus palatalization (ь followed by anything other than о): http://en.wikipedia.org/wiki/Russian_phonology#Palatalization
ru_extra_letters = re.compile(u'Ё|ё|Ы|ы|Э|э|ь[^о]+', re.U)
ru_j = re.compile(u' ж[ .!?,;]', re.U)  # in Russian, ж occurs as a standalone word in a sentence: "Впрочем, что ж я"
ru_k = re.compile(u' к[ .!?,;]', re.U|re.I)  # in Russian, к occurs as a preposition
bg_definite_article = re.compile(u'\\wът[ .!?,;]', re.U)  # Bulgarian definite article suffix -ът at the end of a word
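The gist defines the patterns but does not show how they are combined. The sketch below is one possible way to use them, not the author's method: the helper name `guess_language`, the return values, and the order of the checks (Bulgarian definite article first, then the Russian markers) are all assumptions made for illustration.

```python
import re

# Patterns from the gist, written as Python 3 string literals.
is_cyrilic = re.compile('а|е|и|о|у|ъ|я|ю', re.U)
ru_extra_letters = re.compile('Ё|ё|Ы|ы|Э|э|ь[^о]+', re.U)
ru_j = re.compile(' ж[ .!?,;]', re.U)
ru_k = re.compile(' к[ .!?,;]', re.U | re.I)
bg_definite_article = re.compile(r'\wът[ .!?,;]', re.U)


def guess_language(text):
    """Hypothetical helper: return 'ru', 'bg', or None if undecided.

    Assumption: a Bulgarian definite article outweighs any Russian marker.
    """
    if not is_cyrilic.search(text):
        return None  # no Cyrillic vowel found, so not a candidate at all
    if bg_definite_article.search(text):
        return 'bg'
    if (ru_extra_letters.search(text)
            or ru_j.search(text)
            or ru_k.search(text)):
        return 'ru'
    return None  # Cyrillic text, but none of the markers fired


print(guess_language('Впрочем, что ж я'))  # ru
print(guess_language('Градът е голям.'))   # bg
print(guess_language('Hello world'))       # None
```

These are heuristics: short texts, or texts in other Cyrillic-script languages, can easily return `None` or a wrong guess, so the result is better treated as a hint than as a classification.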