@huihut
Created August 28, 2018 15:40
Guess the encoding of the file
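def guess_encoding(csv_file):
    """Guess the encoding of the given file."""
    import io
    import locale

    # Read the first few bytes in binary mode to look for a byte order mark.
    with io.open(csv_file, "rb") as f:
        data = f.read(5)
    if data.startswith(b"\xEF\xBB\xBF"):  # UTF-8 with a BOM
        return "utf-8-sig"
    elif data.startswith(b"\xFF\xFE") or data.startswith(b"\xFE\xFF"):
        return "utf-16"  # the utf-16 codec picks the byte order from the BOM
    else:
        # No BOM: on Windows the default encoding is usually not UTF-8,
        # so try to decode a preview as UTF-8 and see whether it fails.
        try:
            with io.open(csv_file, encoding="utf-8") as f:
                f.read(222222)  # only this many leading characters are checked
            return "utf-8"
        except UnicodeDecodeError:
            # Fall back to the locale's preferred encoding
            # (locale.getdefaultlocale() is deprecated since Python 3.11).
            return locale.getpreferredencoding(False)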
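The BOM check in guess_encoding covers UTF-8 and UTF-16 only. A UTF-32 little-endian file starts with the bytes FF FE 00 00, so it would be reported as "utf-16". The following is a minimal sketch of a stricter BOM check, not part of the original gist: the helper name `detect_bom` and the table `_BOMS` are hypothetical, and the BOM constants come from the standard `codecs` module.

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes,
# so it has to be tested before UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_bom(path):
    """Return a codec name if the file starts with a known BOM, else None."""
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM (UTF-32) is 4 bytes
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

A caller could try `detect_bom(path)` first and fall back to the UTF-8 trial decode only when it returns None. One caveat of this ordering: a UTF-16 LE file whose first character is NUL also starts with FF FE 00 00 and would be reported as "utf-32", which should be rare in text files.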