
@akost
Created April 4, 2012 19:06
Bash script for recursive file conversion windows-1251 --> utf-8
#!/bin/bash
# Recursive file conversion windows-1251 --> utf-8
# Place this file in the root of your site, add execute permission and run it.
# Converts *.php, *.html, *.css, *.js files.
# To add a file type by extension, e.g. *.cgi, add '-o -name "*.cgi"' inside the parentheses of the find command.
# The parentheses make -type f apply to every pattern, not just the last one.
find ./ -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" \) |
while read file
do
echo " $file"
mv $file $file.icv
iconv -f WINDOWS-1251 -t UTF-8 $file.icv > $file
rm -f $file.icv
done
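For reference, here is a more defensive variant of the script above. It is a sketch that assumes bash and a find that supports -print0 (GNU and BSD find both do). It reads NUL-delimited names, so spaces and even newlines in filenames survive. It also replaces a file only after iconv has succeeded. The function name convert_tree is hypothetical. Like the original, it does not detect files that are already UTF-8, so it should still be run only once.

```shell
#!/bin/bash
# Sketch: convert matching files under a directory from WINDOWS-1251 to UTF-8.
# convert_tree is a hypothetical helper name, not part of the original gist.
convert_tree() {
  local root="${1:-.}"
  # -print0 with read -d '' keeps names containing spaces or newlines intact;
  # the parentheses make -type f apply to every -name pattern.
  find "$root" -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" \) -print0 |
  while IFS= read -r -d '' file
  do
    echo " $file"
    # Convert into a temporary file and replace the original only if iconv succeeds.
    if iconv -f WINDOWS-1251 -t UTF-8 "$file" > "$file.icv"
    then
      mv -- "$file.icv" "$file"
    else
      rm -f -- "$file.icv"
      echo "skipped (iconv failed): $file" >&2
    fi
  done
}
```

Usage: convert_tree /path/to/site (defaults to the current directory).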
@Batname

Batname commented Apr 15, 2014

Thanks

@FernandoBasso

Great script. Works perfectly, even on Cygwin (which I have to use at work). Thanks a lot.

@vkdimitrov

Saved my day!

@ranold

ranold commented Apr 7, 2016

Thank you!

@nymo

nymo commented Jan 20, 2017

Thanks! Really good script.

@obojdi

obojdi commented Feb 17, 2017

Confirmed working on Cygwin, many thanks @akost!

@anonymous2ch

That script is bad, since iconv doesn't detect whether a file is already UTF-8. So it will ruin your files if run on a directory with files in mixed encodings. Running iconv more than once is guaranteed to mangle your files too.

What you should actually use for this operation is enca, since it detects the input encoding and acts accordingly.

After installing enca, just run this one-liner and your files will be UTF-8 in no time:
find ./ -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" \) | while IFS= read -r file; do enca -x UTF-8 "$file"; done
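If enca is not available, a rough guard against double conversion can be built from iconv alone. Cyrillic text in CP1251 is very unlikely to be valid UTF-8, so a file that already decodes cleanly as UTF-8 can be skipped. This is a heuristic sketch, not enca's detection. It assumes iconv exits non-zero on invalid input (GNU iconv does), and the helper names is_valid_utf8 and convert_if_needed are hypothetical.

```shell
#!/bin/bash
# Heuristic idempotency guard: skip files that already decode as valid UTF-8.
# is_valid_utf8 and convert_if_needed are hypothetical helper names.
is_valid_utf8() {
  # A UTF-8 -> UTF-8 "conversion" fails on the first invalid byte sequence.
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

convert_if_needed() {
  local file="$1"
  if is_valid_utf8 "$file"
  then
    # Pure ASCII also lands here, which is fine: it is identical in both encodings.
    echo "already UTF-8, skipping: $file"
    return 0
  fi
  if iconv -f WINDOWS-1251 -t UTF-8 "$file" > "$file.icv"
  then
    mv -- "$file.icv" "$file"
  else
    rm -f -- "$file.icv"
    return 1
  fi
}
```

Because the second run sees valid UTF-8 and skips the file, running convert_if_needed twice on the same file is harmless.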

@loadinger

Thanks, @anonymous2ch

@finalchild

Worked like a charm.
Thank you so much!

@aysenz

aysenz commented Aug 29, 2017

Thanks!

@shuravban

> That script is bad, since iconv doesn't detect whether a file is already UTF-8.

Yes. I often see something like this too:
Какое унижение для противника!
That is UTF-8 text converted to UTF-8 again, as if it were CP1251.
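To see what a second WINDOWS-1251 to UTF-8 pass does, the sketch below feeds an already-UTF-8 Cyrillic word through iconv again. Each two-byte UTF-8 character is misread as two CP1251 characters, which produces the typical "Р..." mojibake. double_convert is a hypothetical helper name used only for this demonstration.

```shell
#!/bin/bash
# Demonstration of double conversion. double_convert is a hypothetical helper.
double_convert() {
  # The UTF-8 bytes of the input are misread as CP1251 characters and re-encoded as UTF-8.
  printf '%s' "$1" | iconv -f WINDOWS-1251 -t UTF-8
}

echo "$(double_convert 'Какое')"                                  # prints РљР°РєРѕРµ
# In this example, converting back undoes the damage:
echo "$(double_convert 'Какое' | iconv -f UTF-8 -t WINDOWS-1251)" # prints Какое
```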

@mitya12342

@anonymous2ch That helped :)

@1nt3g3r

1nt3g3r commented Jan 20, 2018

There is one catch: when file names contain spaces, the script doesn't work. Here is a fixed version of the script:

find ./ -name ".txt" -o -name ".html" -o -name ".css" -o -name ".js" -type f |
while read file
do
echo " $file"
mv "$file" "$file".icv
iconv -f WINDOWS-1251 -t UTF-8 "$file".icv > "$file"
rm -f "$file".icv
done
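The failure with spaces comes from word splitting. An unquoted $file is split on whitespace before mv and iconv see it, so for a name like ./my page.html, mv $file $file.icv receives four arguments instead of two. A minimal demonstration, using count_args as a hypothetical helper that only reports how many arguments it received:

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() {
  echo "$#"
}

file='./my page.html'
count_args $file     # prints 2: word splitting turned one name into "./my" and "page.html"
count_args "$file"   # prints 1: quoting passes the name through unchanged
```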

@pasha-pivo

@1nt3g3r, your script won't work. You left out the * in the filename patterns. To make it work, the first line should look like this:

find ./ -type f \( -name "*.txt" -o -name "*.html" -o -name "*.css" -o -name "*.js" \) |

However, your variant works much better than the topic starter's. It works even with unprintable characters in filenames. Thanks!
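A find command written as -name A -o -name B -type f, as in several comments here, has a precedence pitfall. find's implicit -a binds tighter than -o, so -type f applies only to the last -name, and a directory whose name matches an earlier pattern is also passed to the loop. A small demonstration on a throwaway directory follows. The layout and the helper names list_ungrouped and list_grouped are made up for illustration.

```shell
#!/bin/bash
# list_ungrouped / list_grouped are hypothetical helpers for this demonstration.
list_ungrouped() {
  # Parsed as: -name "*.php" -o ( -name "*.js" -a -type f )
  find "$1" -name "*.php" -o -name "*.js" -type f | sort
}

list_grouped() {
  # Parentheses make -type f apply to both patterns.
  find "$1" -type f \( -name "*.php" -o -name "*.js" \) | sort
}

demo="$(mktemp -d)"
mkdir "$demo/assets.php"               # a directory whose name happens to match *.php
touch "$demo/index.php" "$demo/app.js"
list_ungrouped "$demo"                 # includes .../assets.php, which a loop would then try to mv and iconv
list_grouped "$demo"                   # only .../app.js and .../index.php
```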

@gevmarlen


find ./ -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" -o -name "*.txt" \) |
while IFS= read -r file
do
  # Skip files that file(1) already reports as UTF-8 (GNU file -bi prints e.g. "text/plain; charset=utf-8")
  if ! file -bi "$file" | grep -q 'utf-8'
  then
    echo " $file"
    mv "$file" "$file".icv
    iconv -f WINDOWS-1251 -t UTF-8 "$file".icv > "$file"
    rm -f "$file".icv
  fi
done

@catmater

catmater commented Oct 9, 2020

For lots of files with Russian names, spaces and so on, plus automatic code page detection, this worked best for me (macOS):
find ./ -name "*.sql" -type f | while IFS= read -r file; do enca -L russian -x UTF-8 "$file"; done

@definiteIymaybe


Just a quick note: that requires enca to be installed (brew install enca), and it might fail if, say, a CP-1251 file was incorrectly saved as UTF-8.
