@korenmiklos
Created December 6, 2016 07:59
# coding: utf-8
# transliteration from http://www.boutler.de/translit/trans.htm
RUS = (u'\u0430', u'\u0431', u'\u0432', u'\u0433', u'\u0434', u'\u0435', u'\u0451', u'\u0436', u'\u0437', u'\u0438', u'\u0439',
       u'\u043A', u'\u043B', u'\u043C', u'\u043D', u'\u043E', u'\u043F', u'\u0440', u'\u0441', u'\u0442', u'\u0443', u'\u0444',
       u'\u0445', u'\u0446', u'\u0447', u'\u0448', u'\u0449', u'\u044A', u'\u044B', u'\u044C', u'\u044D', u'\u044E', u'\u044F')
HUN = ('a', 'b', 'v', 'g', 'd', 'je', 'jo', 'zs', 'z', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 'sz', 't', 'u', 'f',
       'h', 'c', 'cs', 's', 'scs', '', 'i', '', 'e', 'ju', 'ja')
RUS_HUN = dict(zip(RUS, HUN))

def transliterate_to_hungarian(russian):
    output = u''
    for character in russian:
        if character.lower() in RUS_HUN:
            output += RUS_HUN[character.lower()]
        else:
            output += character.lower()
    return output

print(transliterate_to_hungarian(u'Транслитерация русского алфавита'))

@e3krisztian

There is a unicode.translate method that does almost exactly what the transliterate function above does:

RUS_HUN = dict(zip(map(ord, RUS), HUN))

def transliterate_to_hungarian(russian):
    return russian.lower().translate(RUS_HUN)

The differences are in the translation table, whose keys must be Unicode ordinals (integers) rather than one-character strings, and probably in speed.
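
To make the ordinal-keyed table concrete, here is a minimal self-contained sketch. It assumes Python 3, where str plays the role of unicode and str.translate looks characters up by code point. The Cyrillic literal for RUS and the parameter name cyrillic are choices made for this sketch, not part of the gist.

```python
# Sketch assuming Python 3: str is Unicode, and str.translate expects
# a table keyed by code points (integers), not one-character strings.
RUS = 'абвгдеёжзийклмнопрстуфхцчшщъыьэюя'
HUN = ('a', 'b', 'v', 'g', 'd', 'je', 'jo', 'zs', 'z', 'i', 'j',
       'k', 'l', 'm', 'n', 'o', 'p', 'r', 'sz', 't', 'u', 'f',
       'h', 'c', 'cs', 's', 'scs', '', 'i', '', 'e', 'ju', 'ja')
assert len(RUS) == len(HUN)

# Keys are ordinals; values may be strings of any length, including ''
# (which drops the character, as for the hard and soft signs).
RUS_HUN = {ord(r): h for r, h in zip(RUS, HUN)}


def transliterate_to_hungarian(cyrillic):
    # Characters missing from the table pass through unchanged.
    return cyrillic.lower().translate(RUS_HUN)


result = transliterate_to_hungarian('Транслитерация русского алфавита')
print(result)  # transzlitjeracija ruszszkogo alfavita
```

Because the whole string is lowercased before translate is called, untranslated characters are lowercased too, which matches what the loop-based version does.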

BTW, shouldn't the russian parameter be named cyrillic instead?

There is also a unidecode module that gives a slightly different result (I prefer yours):

Transliteratsiia russkogo alfavita

vs

transzlitjeracija ruszszkogo alfavita
