@karolzlot
Created December 1, 2022 01:10
Python normalization comparison
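import unicodedata

from unidecode import unidecode  # third-party package: pip install Unidecode


def normalize(text: str) -> str:
    # Decompose accented characters (NFD), then drop every non-ASCII code point.
    return (
        unicodedata.normalize('NFD', text)
        .encode('ascii', 'ignore')
        .decode('ascii')
    )
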
text = 'zażółć gęślą jaźń, kožušček 北亰 François aaßaa aßb'
print(normalize(text))
# zazoc gesla jazn, kozuscek  Francois aaaa ab
# (ł, ß and 北亰 have no NFD decomposition, so they are dropped entirely)
print(unidecode(text))
# zazolc gesla jazn, kozuscek Bei Jing Francois aassaa assb
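
The comparison above shows that the NFD approach silently drops characters such as `ł` and `ß`, which have no canonical decomposition. If adding the `unidecode` dependency is not an option, one possible workaround is to map those characters by hand before stripping. The sketch below is an illustration only: the `FALLBACKS` table and the `normalize_with_fallbacks` name are hypothetical, the table is far from complete, and CJK text is still dropped.

```python
import unicodedata

# Hypothetical, deliberately small table of characters that NFD cannot
# decompose. Extend it for the alphabets you actually need.
FALLBACKS = str.maketrans({
    'ł': 'l', 'Ł': 'L',
    'ß': 'ss',
    'ø': 'o', 'Ø': 'O',
})


def normalize_with_fallbacks(text: str) -> str:
    # 1. Replace characters without a decomposition using the manual table.
    text = text.translate(FALLBACKS)
    # 2. Split the remaining accented characters into base letter + combining mark.
    text = unicodedata.normalize('NFD', text)
    # 3. Drop everything that is still non-ASCII (combining marks, CJK, ...).
    return text.encode('ascii', 'ignore').decode('ascii')


print(normalize_with_fallbacks('zażółć gęślą jaźń'))
# zazolc gesla jazn
print(normalize_with_fallbacks('aaßaa aßb'))
# aassaa assb
```

This keeps the standard-library-only property of `normalize` while matching `unidecode` on the Latin-script parts of the sample; for anything broader (for example the `北亰` case), `unidecode` remains the simpler choice.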