Script for detecting whether a file is UTF-8 or ANSI encoded and decoding its contents accordingly
import sys

# The file to inspect is passed as the first command-line argument.
try:
    FILE_PATH = sys.argv[1]
except IndexError:
    print("A file must be passed as an input parameter:\npython my_script.py my_csv_file.csv")
    sys.exit(1)


def predict_encoding(file_path, n_lines=20):
    '''Predict a file's encoding using chardet'''
    import chardet

    # Open the file as binary data
    with open(file_path, 'rb') as f:
        # Join binary lines for specified number of lines
        rawdata = b''.join([f.readline() for _ in range(n_lines)])

    return chardet.detect(rawdata)['encoding']
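

ANSI_CODE = "ISO-8859-1"
FILE_ENCODE = predict_encoding(FILE_PATH)


def ansiToUtf8(string, FILE_ENCODE):
    '''Decode a byte string to text based on the detected file encoding'''
    # `string` must be raw bytes, e.g. a line read from a file opened in 'rb' mode.
    # When the detected encoding is ISO-8859-1, treat the file as ANSI and decode it as cp1252.
    if ANSI_CODE == FILE_ENCODE:
        return string.decode("cp1252")
    else:
        return string.decode("utf-8")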
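
The script defines the detection and decoding helpers but never applies them to the file's contents. A minimal sketch of how they might be applied is shown below, assuming the file is read in binary mode and each line is decoded separately. The names `decode_bytes` and `decode_file_lines` are hypothetical and not part of the original script. The detected encoding is passed in as a plain string, so the sketch runs without chardet installed.

```python
ANSI_CODE = "ISO-8859-1"  # same label the script compares against


def decode_bytes(raw, detected_encoding):
    """Hypothetical helper mirroring ansiToUtf8: bytes in, text out."""
    # Assumption: anything not labelled ISO-8859-1 is treated as UTF-8,
    # exactly as the original script does.
    codec = "cp1252" if detected_encoding == ANSI_CODE else "utf-8"
    return raw.decode(codec)


def decode_file_lines(file_path, detected_encoding):
    """Read a file in binary mode and return its lines as decoded text."""
    with open(file_path, "rb") as f:
        return [decode_bytes(line, detected_encoding) for line in f]


if __name__ == "__main__":
    sample = "caf\u00e9;a\u00e7\u00e3o\n"
    # The same text round-trips through either encoding.
    print(decode_bytes(sample.encode("cp1252"), "ISO-8859-1") == sample)
    print(decode_bytes(sample.encode("utf-8"), "utf-8") == sample)
```

In the actual script, the second argument would be the value returned by `predict_encoding(FILE_PATH)`. Note that chardet may return other labels (or `None` for undetectable input), in which case this sketch falls back to UTF-8 and can raise `UnicodeDecodeError`.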