Skip to content

Instantly share code, notes, and snippets.

@enamoria
Last active December 16, 2018 10:39
Show Gist options
  • Save enamoria/0d133275a8aa88992980ed736f5484ad to your computer and use it in GitHub Desktop.
Save enamoria/0d133275a8aa88992980ed736f5484ad to your computer and use it in GitHub Desktop.
Convert unicode character to the nearest (removed tone, accent, ... that feature a specific language) ASCII character. For ex: "Тгцмр" will be converted to "Tgtsmr", not Trump
# Source: https://stackoverflow.com/a/518232/4102479
# This is better approach, compared to unidecode, though unidecode provide acceptable result in almost every cases
import unicodedata
import string
all_letters = string.ascii_letters + " .,;'"
nb_letter = len(all_letters)
print(all_letters, len(all_letters))
def unicodeToAscii(s):
return ''.join(
c for c in unicodedata.normalize('NFD', s)
if unicodedata.category(c) != 'Mn'
and c in all_letters
)
xx = u"рнаи нцу куин"
print(unicodeToAscii(xx))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment