@roopalgarg
Created April 21, 2017 01:09
deaccent in python
import unicodedata


def deaccent(text):
    """
    Remove accents from the given string.

    The input is either a str or a UTF-8 encoded bytes object.
    Return the input with accents removed, as a str.

    >>> deaccent("Šéf chomutovských komunistů dostal poštou bílý prášek")
    'Sef chomutovskych komunistu dostal postou bily prasek'
    """
    if not isinstance(text, str):
        # Assume UTF-8 for byte strings; use default (strict) error handling.
        text = text.decode('utf8')
    # Decompose each accented character into a base character plus combining marks.
    norm = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category 'Mn', nonspacing mark).
    result = ''.join(ch for ch in norm if unicodedata.category(ch) != 'Mn')
    return unicodedata.normalize("NFC", result)
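
To see why filtering on category 'Mn' removes accents, it can help to look at what NFD normalization produces for a single character. The sketch below assumes Python 3 and restates the function so the block runs on its own; the sample strings ("café", "łøß") are illustrative choices, not from the original gist.

```python
import unicodedata


def deaccent(text):
    # Same approach as the gist's function, restated for Python 3
    # so that this block is self-contained.
    if not isinstance(text, str):
        text = text.decode('utf8')
    norm = unicodedata.normalize("NFD", text)
    result = ''.join(ch for ch in norm if unicodedata.category(ch) != 'Mn')
    return unicodedata.normalize("NFC", result)


# NFD splits 'é' (U+00E9) into 'e' plus U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", "é")
codepoints = [hex(ord(ch)) for ch in decomposed]              # ['0x65', '0x301']
categories = [unicodedata.category(ch) for ch in decomposed]  # ['Ll', 'Mn']

plain = deaccent("café")                      # str input
from_bytes = deaccent("café".encode("utf8"))  # UTF-8 bytes input

# Letters with no canonical decomposition carry no combining mark,
# so they pass through unchanged.
unchanged = deaccent("łøß")
```

Note the limitation shown by the last line: letters such as 'ł', 'ø' and 'ß' are single code points without a decomposition, so this approach leaves them as they are. Mapping them to ASCII would need an explicit translation table on top of the normalization step.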