@kowalcj0
Last active November 27, 2021 12:31
Convert any text file into a UTF-8 file-like object
import chardet  # https://pypi.python.org/pypi/chardet


def file_to_utf8(f_name):
    """Convert any file into UTF-8 and return the encoded bytes.

    This conversion method is not perfect, because chardet does not detect the
    character set with 100% accuracy. detect() returns a dictionary containing
    the auto-detected character encoding and a confidence level ranging from
    0 to 1; ideally, base the decision to convert on that confidence level.
    """
    with open(f_name, 'rb') as f:
        data = f.read()
    # 'encoding' is None when chardet cannot guess (e.g. for an empty file).
    encoding = chardet.detect(data)['encoding']
    # Compare case-insensitively: chardet may report 'utf-8' in lower case.
    if encoding and encoding.upper() != 'UTF-8':
        # bytes.decode() replaces the Python 2-only unicode() built-in.
        data = data.decode(encoding).encode('UTF-8')
    return data
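
The docstring above suggests basing the conversion on the confidence level that detect() reports. A minimal sketch of that idea follows, assuming Python 3. The names `convert_if_confident` and `file_to_utf8_stream` and the 0.8 threshold are hypothetical choices for illustration, not part of the original gist or of chardet. The second function also wraps the result in `io.BytesIO`, so the caller gets an actual file-like object.

```python
import io


def convert_if_confident(data, detection, min_confidence=0.8):
    """Re-encode `data` as UTF-8 only if `detection` looks trustworthy.

    `detection` is a dict shaped like the result of chardet.detect(), with
    'encoding' and 'confidence' keys. Returns None when the guess is missing
    or below `min_confidence` (an assumed threshold; tune it for your data).
    """
    encoding = detection.get('encoding')
    confidence = detection.get('confidence') or 0.0
    if not encoding or confidence < min_confidence:
        return None
    return data.decode(encoding).encode('utf-8')


def file_to_utf8_stream(f_name, min_confidence=0.8):
    """Return an in-memory file-like object with UTF-8 content, or None."""
    # Third-party dependency, imported here so the helper above needs none.
    import chardet

    with open(f_name, 'rb') as f:
        data = f.read()
    converted = convert_if_confident(data, chardet.detect(data), min_confidence)
    return io.BytesIO(converted) if converted is not None else None
```

Returning None leaves it to the caller to decide what to do with a low-confidence guess, for example to fall back to a known default encoding or to skip the file.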