@siddontang
Created October 8, 2013 15:30
String encoding detection
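import chardet

# UTF-8 byte order mark, as a bytes literal so the check works on Python 2 and 3.
BOM_UTF8 = b'\xef\xbb\xbf'


def detect(raw):
    """Return the guessed encoding name for a bytes sample, or None."""
    if raw.startswith(BOM_UTF8):
        return 'utf-8-sig'
    # chardet.detect returns a dict; 'encoding' is None when it cannot guess.
    result = chardet.detect(raw)
    return result['encoding']


def detectFile(fileName):
    """Guess a file's encoding from its first 32 bytes."""
    with open(fileName, "rb") as f:
        # A short sample is fast but can make chardet's guess less reliable.
        raw = f.read(32)
    return detect(raw)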
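
The BOM branch returns 'utf-8-sig' rather than 'utf-8', presumably so that callers who decode with the returned name get the BOM stripped. The following is a minimal stdlib-only sketch of that difference. `decode_sample` and its `fallback` parameter are hypothetical names introduced for illustration and are not part of the gist, which hands non-BOM input to chardet instead.

```python
import codecs


def decode_sample(raw, fallback="utf-8"):
    """Decode bytes, using 'utf-8-sig' when a UTF-8 BOM is present.

    Hypothetical helper for this sketch: the gist would call chardet
    for input without a BOM, while this simply uses `fallback`.
    """
    if raw.startswith(codecs.BOM_UTF8):
        # 'utf-8-sig' strips the BOM while decoding.
        return raw.decode("utf-8-sig")
    return raw.decode(fallback)


with_bom = codecs.BOM_UTF8 + "héllo".encode("utf-8")

# Plain 'utf-8' keeps the BOM as a leading U+FEFF character.
kept = with_bom.decode("utf-8")

# The helper detects the BOM and strips it.
stripped = decode_sample(with_bom)
```

Here `kept` begins with the character U+FEFF, while `stripped` is the text alone, which is usually what downstream code expects.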