@rmetzler
Created June 18, 2012 10:43
Find all non-UTF-8 encoded files
find . -type f | xargs -I {} bash -c "iconv -f utf-8 -t utf-16 {} &>/dev/null || echo {}" > utf8_fail
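The one-liner above splices each file name into a bash -c string, so names containing spaces or quotes can break it. A minimal sketch of a more defensive variant follows. It assumes a POSIX sh, a find that supports -exec ... {} +, and an iconv that knows utf-16; the function name list_non_utf8 is hypothetical and not part of the original gist.

```shell
# Hypothetical helper (not from the original gist): print every regular file
# under a directory that iconv cannot decode as UTF-8.
list_non_utf8() {
  # "$1" is the directory to scan; defaults to the current directory.
  # File names are passed to the inner sh as arguments ("$@"), never pasted
  # into the script text, so spaces and quotes in names are handled.
  find "${1:-.}" -type f -exec sh -c '
    for f in "$@"; do
      iconv -f utf-8 -t utf-16 "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
    done
  ' sh {} +
}
```

To reproduce the original behaviour, one would call it as list_non_utf8 . > utf8_fail.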
@fedir
fedir commented Nov 6, 2013

find . -type f -exec file --mime {} \; | grep -v '/\.git/' | grep -v "charset=utf-8"

@aaaronic
aaaronic commented Jun 4, 2015

I only wanted to check files that were actually tracked in my git repo, so I used the following modified form of the original (only change is the very first command):

git ls-files . | xargs -I {} bash -c "iconv -f utf-8 -t utf-16 {} &>/dev/null || echo {}" > utf8_fail

@chtenb
chtenb commented Nov 1, 2017

To exclude multiple encodings:

find . -type f -exec file --mime {} \;  | grep -v 'utf-8\|binary\|ascii'

@DennisDyallo

This is great. Thanks!

@Mielai1l
Mielai1l commented May 3, 2019

You can also instruct find to skip the contents of .git directories:
find . -type d -iname .git -prune -o -type f -exec file --mime {} \; | grep "text/" | grep -v "utf-8\|us-ascii"
./xyz/q: text/html; charset=iso-8859-1
./cdr/index.html.iso8859: text/html; charset=iso-8859-1

The grep "text/" step drops everything that is not reported as text, including empty files, which are displayed as
inode/x-empty; charset=binary

@anasram
anasram commented Apr 12, 2020

Using the file command, I think there is a simpler way to do this, though I don't know exactly how.

Try this to understand what I mean:

file --mime-encoding *
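One possible way to build on this is sketched below: run file --mime-encoding over every file and filter out the encodings that are considered fine. It assumes a file command that supports --mime-encoding and prints lines of the form "name: encoding"; the function name list_non_utf8_by_file is hypothetical. Note that file guesses the encoding heuristically, so the result may differ from what a strict iconv check reports.

```shell
# Hypothetical helper: list files whose guessed encoding is neither UTF-8,
# plain ASCII, nor binary. Relies on the heuristics of file(1).
list_non_utf8_by_file() {
  # "$1" is the directory to scan; defaults to the current directory.
  # Each output line looks like "path: encoding" (file may pad after the colon).
  find "${1:-.}" -type f -exec file --mime-encoding {} + |
    grep -Ev ': *(utf-8|us-ascii|binary)$'
}
```

Called as list_non_utf8_by_file . it prints one "path: encoding" line per suspicious file, so the guessed encoding is visible next to each name.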

@vtuz
vtuz commented Jan 28, 2021

Thanks a lot!
