@mkocikowski
Created March 1, 2014 01:00
Take a str or unicode value (Python 2) and return a UTF-8 encoded byte string, with invalid ('bad') characters stripped
def utf8(data):
    """Takes in str or unicode, returns utf-8 string, stripping invalid chars."""
    # Python 2 only: the 'unicode' type does not exist in Python 3.
    if not isinstance(data, (str, unicode)):
        raise TypeError("'data' must be str or unicode")
    if isinstance(data, str):
        # Byte string: decode first so invalid UTF-8 byte sequences are dropped.
        data = data.decode('utf-8', 'ignore')
    # Re-encode; 'ignore' drops anything that cannot be encoded.
    # (The original try/except only re-raised UnicodeError, so it is omitted.)
    return data.encode('utf-8', 'ignore')
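
The function above targets Python 2, where str is a byte string and unicode is text. A rough Python 3 equivalent might look like the sketch below. The name utf8_py3 and the decision to accept bytearray are assumptions made for illustration; they are not part of the original gist.

```python
def utf8_py3(data):
    """Return UTF-8 bytes for str or bytes input, dropping invalid sequences.

    Hypothetical Python 3 counterpart of utf8(); not from the original gist.
    """
    if isinstance(data, str):
        # Text: 'ignore' drops unencodable code points such as lone surrogates.
        return data.encode("utf-8", "ignore")
    if isinstance(data, (bytes, bytearray)):
        # Bytes: decode with 'ignore' to drop invalid byte sequences, then re-encode.
        return bytes(data).decode("utf-8", "ignore").encode("utf-8")
    raise TypeError("'data' must be str or bytes")


if __name__ == "__main__":
    print(utf8_py3(b"abc\xffdef"))
    print(utf8_py3("h\u00e9llo"))
```

As in the original, invalid input is silently discarded rather than replaced, so the result can be shorter than the input. Passing "replace" instead of "ignore" would keep a visible marker where data was dropped.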