Bash script for recursive file conversion windows-1251 --> utf-8
#!/bin/bash
# Recursive file conversion windows-1251 --> utf-8
# Place this file in the root of your site, add execute permission and run
# Converts *.php, *.html, *.css, *.js files.
# To add a file type by extension, e.g. *.cgi, add '-o -name "*.cgi"' inside the parentheses of the find command
find ./ -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" \) |
while read file
do
echo " $file"
mv $file $file.icv
iconv -f WINDOWS-1251 -t UTF-8 $file.icv > $file
rm -f $file.icv
done
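
One thing to watch in the move/convert/remove sequence: the backup is removed even when iconv fails, so a failed conversion can leave a truncated file with no original to fall back on. A minimal sketch of a safer per-file step, assuming the same `.icv` suffix; the function name `convert_file` is hypothetical and not part of the script above:

```shell
# convert_file FILE: convert FILE from WINDOWS-1251 to UTF-8 in place.
# Hypothetical helper: the original is replaced only if iconv succeeds.
convert_file() {
    if iconv -f WINDOWS-1251 -t UTF-8 "$1" > "$1.icv"; then
        # Conversion succeeded, so the converted copy replaces the original.
        mv "$1.icv" "$1"
    else
        # Conversion failed: drop the partial output and keep the original untouched.
        rm -f "$1.icv"
        echo "skipped (conversion failed): $1" >&2
        return 1
    fi
}
```

Inside the loop this would be called as `convert_file "$file"`. Note that the converted file is a new file, so it gets default permissions, just as in the original sequence.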

Batname commented Apr 15, 2014

Thanks


FernandoBasso commented May 7, 2015

Great script. Works perfectly, even on cygwin (which I have to use at work). Thanks a lot.


vkdimitrov commented Aug 4, 2015

Saved my day.


ranold commented Apr 7, 2016

thank you!


nymo commented Jan 20, 2017

Thanks! Really good script.


obojdi commented Feb 17, 2017

Confirmed working on cygwin, many thanks @akost!


anonymous2ch commented Feb 27, 2017

That script is bad, since iconv doesn't detect whether a file is already UTF-8. It will ruin your files if you run it on a directory with files in mixed encodings. Running iconv more than once on the same file will garble it too.

What you actually should use for this operation is enca, since it will correctly detect input encoding and act accordingly.

After installing enca, just run this one-liner & your files will be UTF-8 in no time:
find ./ -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" -type f | while read file; do enca -x UTF-8 $file; done;
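
If enca is not available, the double-conversion problem can also be reduced with iconv alone: `iconv -f UTF-8 -t UTF-8` exits with an error on input that is not valid UTF-8, so it can act as a rough check. This is a sketch under that assumption, not a replacement for real encoding detection; the names `is_utf8` and `convert_unless_utf8` are hypothetical. Pure ASCII files pass the check too, which is harmless here because ASCII is unchanged by the conversion.

```shell
# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8 (pure ASCII also passes).
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# convert_unless_utf8 FILE: convert from WINDOWS-1251 only when FILE is not
# already valid UTF-8, so a second run leaves converted files alone.
convert_unless_utf8() {
    if is_utf8 "$1"; then
        echo "skip (already UTF-8): $1"
    else
        iconv -f WINDOWS-1251 -t UTF-8 "$1" > "$1.icv" && mv "$1.icv" "$1"
    fi
}
```

Calling `convert_unless_utf8 "$file"` in the loop instead of the bare iconv line makes repeated runs safe for files that were already converted.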


loadinger commented Jun 15, 2017

thanks @anonymous2ch


finalchild commented Jul 25, 2017

worked like a charm
Thank you so much!!!!


aysenz commented Aug 29, 2017

Thanks!


shuravban commented Dec 21, 2017

That script is bad, since iconv doesn't detect whether a file is already UTF-8.

Yes. I too often see something like
Какое унижение для противника!
It's UTF-8 text that was converted to UTF-8 a second time on the assumption that it was CP1251.
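
For anyone who has already been bitten: this kind of garbling can be reproduced, and often undone, by running the conversion in the opposite direction. A small sketch, with hypothetical function names. It only works if every byte survived the first pass; bytes with no WINDOWS-1251 mapping (0x98, for example) can make the wrong conversion fail or lose data, in which case the reversal is not guaranteed.

```shell
# double_encode: read UTF-8 on stdin and wrongly convert it as if it were
# WINDOWS-1251, which is what happens when the script runs twice on a file.
double_encode() {
    iconv -f WINDOWS-1251 -t UTF-8
}

# undo_double_encode: reverse the mistake by mapping each garbled character
# back to the single byte it came from, restoring the original UTF-8 bytes.
undo_double_encode() {
    iconv -f UTF-8 -t WINDOWS-1251
}
```

For example, `printf 'Привет\n' | double_encode | undo_double_encode` should give back the original text, and `undo_double_encode < broken.txt > fixed.txt` is the file form (both file names are placeholders).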


mitya12342 commented Jan 2, 2018

@anonymous2ch That helped :)


1nt3g3r commented Jan 20, 2018

One caveat: the script doesn't work when file names contain spaces. Here is a corrected version of the script:

find ./ -name ".txt" -o -name ".html" -o -name ".css" -o -name ".js" -type f |
while read file
do
echo " $file"
mv "$file" "$file".icv
iconv -f WINDOWS-1251 -t UTF-8 "$file".icv > "$file"
rm -f "$file".icv
done


pasha-pivo commented Apr 26, 2018

@1nt3g3r, your script won't work: the * is missing from the file name patterns. To make it work, the first line should look like this:

find ./ -name "*.txt" -o -name "*.html" -o -name "*.css" -o -name "*.js" -type f |

However, your variant works much better than the original poster's. It works even with unprintable characters in the file names. Thanks!
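
A possible further step, in case it helps: quoting `"$file"` takes care of spaces, but a plain `while read file` loop still interprets backslashes and splits names on newlines. One way around that is to let find pass the names as arguments to a small inline shell with `-exec ... {} +`, so names are never parsed from text. This is a sketch rather than a drop-in replacement; `convert_tree` is a hypothetical name, and the conversion steps are the same as in the thread.

```shell
# convert_tree DIR: convert matching files under DIR from WINDOWS-1251 to UTF-8.
# Hypothetical helper: file names arrive as arguments of the inline shell,
# so spaces, backslashes and newlines in names are handled as-is.
convert_tree() {
    find "$1" -type f \( -name '*.php' -o -name '*.html' -o -name '*.css' -o -name '*.js' \) \
        -exec sh -c '
            for f do
                printf " %s\n" "$f"
                mv "$f" "$f.icv"
                iconv -f WINDOWS-1251 -t UTF-8 "$f.icv" > "$f"
                rm -f "$f.icv"
            done
        ' sh {} +
}
```

Run it as `convert_tree ./` from the site root. In bash, `find ... -print0 | while IFS= read -r -d '' file` is an alternative that keeps the loop form.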


gevmarlen commented Dec 28, 2018

That script is bad, since iconv doesn't detect whether a file is already UTF-8.

Yes. I too often see something like
Какое унижение для противника!
It's UTF-8 text that was converted to UTF-8 a second time on the assumption that it was CP1251.

find ./ -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" -o -name "*.txt" \) |
while read file
do
  # Skip files that file(1) already reports as UTF-8
  if ! file -bi "$file" | grep -q 'utf-8'
  then
    echo " $file"
    mv "$file" "$file".icv
    iconv -f WINDOWS-1251 -t UTF-8 "$file".icv > "$file"
    rm -f "$file".icv
  fi
done
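
A note for anyone adapting the find line to other extensions: in find, the implicit AND binds more tightly than `-o`, so a trailing `-type f` without parentheses restricts only the last `-name` test, and a directory named, say, `dir.php` would also be listed. Grouping the `-name` tests applies `-type f` to all of them. A minimal sketch; `list_targets` is a hypothetical helper that only prints the candidates, which also makes a handy dry run before converting anything.

```shell
# list_targets DIR: print regular files under DIR with the listed extensions.
# Hypothetical helper: the \( ... \) group makes -type f apply to every
# -name test instead of only the last one.
list_targets() {
    find "$1" -type f \( -name '*.php' -o -name '*.html' -o -name '*.css' -o -name '*.js' -o -name '*.txt' \)
}
```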


catmater commented Oct 9, 2020

For many Russian file names with spaces and the like, and with codepage autodetection, this worked best for me (macOS):
find ./ -name "*.sql" -type f | while read file; do enca -L russian -x UTF-8 "$file"; done;


definiteIymaybe commented Apr 11, 2021

For many Russian file names with spaces and the like, and with codepage autodetection, this worked best for me (macOS):
find ./ -name "*.sql" -type f | while read file; do enca -L russian -x UTF-8 "$file"; done;

Just a quick note: this requires enca to be installed (brew install enca), and it might fail if, say, a CP-1251 file was incorrectly saved as UTF-8.
