@tushortz
Created April 6, 2016 22:14
Function to replace some annoying characters
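def unicodetoascii(text):
    # Replaces literal UTF-8 byte escapes (backslash-x sequences, such as those
    # left in the output of str() on a bytes object) with close ASCII equivalents.
    TEXT = (text.
        replace('\\xe2\\x80\\x99', "'").    # U+2019 right single quotation mark
        replace('\\xc3\\xa9', 'e').         # U+00E9 e with acute accent
        replace('\\xe2\\x80\\x90', '-').    # U+2010 hyphen
        replace('\\xe2\\x80\\x91', '-').    # U+2011 non-breaking hyphen
        replace('\\xe2\\x80\\x92', '-').    # U+2012 figure dash
        replace('\\xe2\\x80\\x93', '-').    # U+2013 en dash
        replace('\\xe2\\x80\\x94', '-').    # U+2014 em dash
        replace('\\xe2\\x80\\x98', "'").    # U+2018 left single quotation mark
        replace('\\xe2\\x80\\x9b', "'").    # U+201B single high-reversed-9 quotation mark
        replace('\\xe2\\x80\\x9c', '"').    # U+201C left double quotation mark
        replace('\\xe2\\x80\\x9d', '"').    # U+201D right double quotation mark
        replace('\\xe2\\x80\\x9e', '"').    # U+201E double low-9 quotation mark
        replace('\\xe2\\x80\\x9f', '"').    # U+201F double high-reversed-9 quotation mark
        replace('\\xe2\\x80\\xa6', '...').  # U+2026 horizontal ellipsis
        replace('\\xe2\\x80\\xb2', "'").    # U+2032 prime
        replace('\\xe2\\x80\\xb3', "'").    # U+2033 double prime
        replace('\\xe2\\x80\\xb4', "'").    # U+2034 triple prime
        replace('\\xe2\\x80\\xb5', "'").    # U+2035 reversed prime
        replace('\\xe2\\x80\\xb6', "'").    # U+2036 reversed double prime
        replace('\\xe2\\x80\\xb7', "'").    # U+2037 reversed triple prime
        replace('\\xe2\\x81\\xba', "+").    # U+207A superscript plus sign
        replace('\\xe2\\x81\\xbb', "-").    # U+207B superscript minus
        replace('\\xe2\\x81\\xbc', "=").    # U+207C superscript equals sign
        replace('\\xe2\\x81\\xbd', "(").    # U+207D superscript left parenthesis
        replace('\\xe2\\x81\\xbe', ")")     # U+207E superscript right parenthesis
    )
    return TEXT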
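
A usage sketch follows. It assumes the input is the escaped text you get from calling str() on UTF-8 bytes in Python 3, which is one common source of literal \xe2\x80\x99 sequences. The sample string and the name unicodetoascii_short are hypothetical; the short function is only an abbreviated stand-in for the gist's function so that the example runs on its own.

```python
# Hypothetical sample: curly quotes, an en dash and an ellipsis,
# written with \u escapes so the source file itself stays ASCII.
original = "It\u2019s \u201cfine\u201d \u2013 really\u2026"

# str() on a bytes object returns its repr, e.g. "b'It\\xe2\\x80\\x99s ...'".
# Slicing off the leading b' and the trailing ' leaves the escaped text.
escaped = str(original.encode("utf-8"))[2:-1]


def unicodetoascii_short(text):
    # Abbreviated stand-in for the gist's function: only five of its replacements.
    return (text
            .replace('\\xe2\\x80\\x99', "'")
            .replace('\\xe2\\x80\\x9c', '"')
            .replace('\\xe2\\x80\\x9d', '"')
            .replace('\\xe2\\x80\\x93', '-')
            .replace('\\xe2\\x80\\xa6', '...'))


cleaned = unicodetoascii_short(escaped)
print(escaped)  # the text with literal backslash-x escapes
print(cleaned)  # the same text with ASCII punctuation
```

Note that these replacements match literal backslash sequences only. Text that already contains the real Unicode characters passes through unchanged.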
@vishnudas-raveendran

You may use this list of UTF-8 special characters to clean text that contains certain codes.
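
One way to work from such a list is to keep the characters in a lookup table and derive the escaped byte sequences from it, rather than typing each escape by hand. This is a minimal sketch under that assumption. The names REPLACEMENTS, escaped_form and clean are hypothetical, and the table holds only a few illustrative entries, not the full list.

```python
# Hypothetical lookup table: Unicode character -> ASCII stand-in.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00e9": "e",    # e with acute accent
}


def escaped_form(ch):
    # Build the literal backslash-x text for the character's UTF-8 bytes,
    # e.g. "\u2019" gives the 12-character string \xe2\x80\x99.
    return "".join("\\x{:02x}".format(b) for b in ch.encode("utf-8"))


def clean(text, table=REPLACEMENTS):
    # Replace both the escaped byte form and the real character.
    for ch, ascii_equiv in table.items():
        text = text.replace(escaped_form(ch), ascii_equiv)
        text = text.replace(ch, ascii_equiv)
    return text


print(clean("It\\xe2\\x80\\x99s a caf\u00e9 \\xe2\\x80\\x93 open late\\xe2\\x80\\xa6"))
```

Adding a character then means adding one table entry, and the same table covers both escaped and already-decoded text.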
