Skip to content

Instantly share code, notes, and snippets.

@mbrenig
Created May 6, 2013 17:41
Show Gist options
  • Save mbrenig/5526706 to your computer and use it in GitHub Desktop.
Save mbrenig/5526706 to your computer and use it in GitHub Desktop.
Unicode character detection across a directory.
import chardet # https://pypi.python.org/pypi/chardet
import os
import sys
import fnmatch
def detect(arg, dir, names):
for name in names:
if fnmatch.fnmatch(name, arg):
fpath = os.path.join(dir, name)
r = chardet.detect(fpath)
print "%s is %s with confidence %s" % (fpath, r['encoding'], r['confidence'])
os.path.walk(".", detect, sys.argv[1])
# Example usage:
# cd to top of directory to test.
# utftester.py *.txt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment