@jvanasco
Created February 17, 2012 20:35
Convert accented Unicode characters to their ASCII equivalents
import unicodedata
# Sample accented characters, grouped by language.
accented = {
'czech' : u'ťúůýžšřóňíěéďčá',
'french' : u"ùûüÿàâæçéèêëïîôœ",
'finnish' : u'äåö',
'danish' : u'åæéø',
'german' : u'äöüß',
'hungarian': u'áéíöóőüúű',
'icelandic': u'áæðéíóöþúý',
'italian': u'àèéìòóù',
'norwegian': u'åæâéèêøóòô',
'polish': u'ąćęłńóśźż',
'portuguese': u'úüãáâàçéêíõóô',
'romanian': u'ăâîşșţț',
'spanish': u'áéíñóúü',
'swedish': u'äåéö',
'welsh': u'ûüúùŵẃẅẁŷÿýỳäáàêëéèîïíôöóò'
}
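# NFKD splits each accented letter into a base letter plus combining marks;
# encoding to ASCII with 'ignore' then drops the marks. Letters with no
# decomposition (e.g. ß, æ, ø, ð, þ, ł, œ) are dropped entirely.
for lang, chars in accented.items():
    print("-----")
    print(lang)
    print(" %s" % chars)
    print(" %s" % unicodedata.normalize('NFKD', chars).encode('ascii', 'ignore').decode('ascii'))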
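
The snippet above silently loses letters that have no Unicode decomposition, such as ß or ł. One possible way to keep them is to substitute a replacement before normalizing. The sketch below assumes Python 3; the `FALLBACK` table and the `to_ascii` helper are hypothetical names that are not part of the original gist, and the table is illustrative rather than complete.

```python
import unicodedata

# Hypothetical fallback table (an assumption, not from the original gist) for
# letters that have no Unicode decomposition and would otherwise be dropped.
FALLBACK = {
    'ß': 'ss', 'æ': 'ae', 'Æ': 'AE', 'œ': 'oe', 'Œ': 'OE',
    'ø': 'o', 'Ø': 'O', 'ð': 'd', 'Ð': 'D', 'þ': 'th', 'Þ': 'Th',
    'ł': 'l', 'Ł': 'L',
}


def to_ascii(text):
    """Fold text to ASCII: apply the fallback table, then strip accents."""
    # Replace letters that NFKD cannot decompose.
    replaced = ''.join(FALLBACK.get(ch, ch) for ch in text)
    # Split remaining accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize('NFKD', replaced)
    # Drop everything that is not ASCII (the combining marks).
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii('äöüß'))
print(to_ascii('ąćęłńóśźż'))
```

The choice of replacements (for example `ø` to `o` rather than `oe`) varies by language and convention, so the table should be adjusted to the target use.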