
Gist akost/2304819, created April 4, 2012 19:06
Bash script for recursive file conversion windows-1251 --> utf-8
#!/bin/bash
# Recursive file conversion windows-1251 --> utf-8
# Place this file in the root of your site, add execute permission and run it.
# Converts *.php, *.html, *.css, *.js files.
# To add a file type by extension, e.g. *.cgi, add '-o -name "*.cgi"' inside the parentheses of the find command.
# The parentheses make -type f apply to every -name test, not just the last one.
find ./ -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" \) |
while IFS= read -r file
do
  echo " $file"
  mv "$file" "$file.icv"
  iconv -f WINDOWS-1251 -t UTF-8 "$file.icv" > "$file"
  rm -f "$file.icv"
done
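The loop above removes the .icv backup even when iconv fails, which can leave a truncated file behind. Below is a minimal sketch of a safer variant, assuming bash and an iconv that accepts the WINDOWS-1251 name (glibc and GNU libiconv do): each file is converted into a temporary file first, and the original is overwritten only if iconv succeeds. The helper names convert_cp1251_file and convert_tree are hypothetical, not part of the gist.

```shell
#!/bin/bash
# Hypothetical helpers: a safer take on the gist's mv/iconv/rm loop.

# Convert one file from WINDOWS-1251 to UTF-8 in place. The original is
# overwritten only if iconv exits successfully; otherwise it is left as is.
convert_cp1251_file() {
  local src=$1 tmp
  # Temporary file next to the source; its name does not match the find patterns
  tmp=$(mktemp "${src}.icv.XXXXXX") || return 1
  if iconv -f WINDOWS-1251 -t UTF-8 "$src" > "$tmp"; then
    # Rewrite the contents in place so the file keeps its permissions
    cat "$tmp" > "$src"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Convert every *.php, *.html, *.css and *.js file under a directory (default ./).
convert_tree() {
  local root=${1:-./}
  find "$root" -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" \) |
  while IFS= read -r file; do
    echo " $file"
    convert_cp1251_file "$file" || echo "  failed, left unchanged: $file" >&2
  done
}
```

Running convert_tree ./ from the site root should behave like the original script, except that a file iconv cannot convert keeps its original bytes and is reported on stderr.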

@1nt3g3r, your script won't work: you left out the * in the filename patterns. To make it work, the first line should look like this:

find ./ -type f \( -name "*.txt" -o -name "*.html" -o -name "*.css" -o -name "*.js" \) |

However, your variant works much better than the original gist's: it works even with unprintable characters in filenames. Thanks!
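The variant praised here is not reproduced on this page, so the following is only a hedged sketch of one common way to get that robustness in bash: find separates names with NUL bytes (-print0) and read -r -d '' consumes them, so spaces, backslashes and even newlines in names survive. convert_tree_nul is a hypothetical name; it also overwrites each file in place via a temporary file instead of renaming, so no new entry with the original name appears while find is still walking the directory.

```shell
#!/bin/bash
# Hypothetical helper: convert WINDOWS-1251 files to UTF-8, reading
# NUL-separated names so unusual characters in filenames are kept intact.
convert_tree_nul() {
  local root=${1:-./} tmp
  find "$root" -type f \( -name "*.txt" -o -name "*.html" -o -name "*.css" -o -name "*.js" \) -print0 |
  while IFS= read -r -d '' file; do
    printf ' %s\n' "$file"
    tmp=$(mktemp "$file.icv.XXXXXX") || continue
    # Overwrite the original only if iconv succeeds
    iconv -f WINDOWS-1251 -t UTF-8 "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f -- "$tmp"
  done
}
```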


That script is bad, since iconv doesn't detect whether a file is already UTF-8.

Yes, I often see something like this too:
Какое унижение для противника! ("What a humiliation for the opponent!")
That is UTF-8 text converted to UTF-8 a second time as if it were CP1251.
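This double conversion is easy to reproduce. A small sketch, assuming a UTF-8 terminal and script file and an iconv that knows WINDOWS-1251: the UTF-8 bytes of a Russian word go through iconv -f WINDOWS-1251 -t UTF-8 once more, and the damage is then undone by converting in the opposite direction. The sample word is arbitrary; some letters (for example И, whose UTF-8 form contains byte 0x98, which WINDOWS-1251 leaves undefined) may make iconv stop with an error instead.

```shell
#!/bin/bash
# Reproduce the double conversion, then reverse it.
original='Привет'   # sample text, already UTF-8 (this file is assumed to be saved as UTF-8)

# The mistake: treat UTF-8 bytes as WINDOWS-1251 and convert them to UTF-8 again
mangled=$(printf '%s' "$original" | iconv -f WINDOWS-1251 -t UTF-8)
echo "$mangled"    # РџСЂРёРІРµС‚

# The repair: convert back to WINDOWS-1251, which restores the original UTF-8 bytes
repaired=$(printf '%s' "$mangled" | iconv -f UTF-8 -t WINDOWS-1251)
echo "$repaired"   # Привет
```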

find ./ -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" -o -name "*.txt" \) |
while IFS= read -r file
do
  # Skip files that file(1) already reports as UTF-8
  if ! file -bi "$file" | grep -q 'utf-8'
  then
    echo " $file"
    mv "$file" "$file".icv
    iconv -f WINDOWS-1251 -t UTF-8 "$file".icv > "$file"
    rm -f "$file".icv
  fi
done
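The file -bi check depends on the heuristics of file(1) and on the exact wording of its charset label. A hedged alternative sketch: treat a file as already converted when iconv can decode it as UTF-8 at all, since CP1251 Cyrillic text is rarely valid UTF-8. The helpers is_valid_utf8 and convert_tree_checked are hypothetical names; running the conversion twice over the same tree should leave already converted files alone.

```shell
#!/bin/bash
# Hypothetical helpers: skip files that already decode as UTF-8.

# Succeeds if the whole file is valid UTF-8 (pure ASCII also counts).
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

convert_tree_checked() {
  local root=${1:-./} tmp
  find "$root" -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" -o -name "*.txt" \) |
  while IFS= read -r file; do
    # Already UTF-8 (or ASCII): leave the file alone
    is_valid_utf8 "$file" && continue
    echo " $file"
    tmp=$(mktemp "$file.icv.XXXXXX") || continue
    # Overwrite the original only if the conversion succeeds
    iconv -f WINDOWS-1251 -t UTF-8 "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
  done
}
```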


catmater commented Oct 9, 2020

For lots of Russian filenames with spaces etc., with automatic codepage detection, this works best for me (macOS):
find ./ -name "*.sql" -type f | while IFS= read -r file; do enca -L russian -x UTF-8 "$file"; done


Just a quick note: that requires enca to be installed (brew install enca), and it might fail if, say, a CP-1251 file was incorrectly saved as UTF-8.
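Text that was decoded as CP1251 and then saved as UTF-8 (the mojibake mentioned earlier in this thread) is perfectly valid UTF-8, so an encoding detector has no reason to touch it. One way to flag such files is a round-trip heuristic; this is an assumption-laden sketch, not a feature of enca or iconv: if converting the file from UTF-8 back to WINDOWS-1251 succeeds, changes the bytes, and yields valid UTF-8, the file probably holds doubly encoded text. looks_double_encoded is a hypothetical name, and a hit is worth checking by eye before repairing it with iconv -f UTF-8 -t WINDOWS-1251.

```shell
#!/bin/bash
# Heuristic sketch (hypothetical helper, not an enca or iconv feature):
# succeed if the file looks like UTF-8 text that was converted from
# WINDOWS-1251 to UTF-8 one time too many.
looks_double_encoded() {
  local f=$1 tmp status=1
  tmp=$(mktemp) || return 2
  # Undo one conversion; this fails if the text uses characters WINDOWS-1251 lacks
  if iconv -f UTF-8 -t WINDOWS-1251 "$f" > "$tmp" 2> /dev/null; then
    # Pure ASCII round-trips unchanged, so require that some bytes changed,
    # and that the recovered bytes are themselves valid UTF-8
    if ! cmp -s "$f" "$tmp" && iconv -f UTF-8 -t UTF-8 "$tmp" > /dev/null 2>&1; then
      status=0
    fi
  fi
  rm -f "$tmp"
  return "$status"
}
```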
