@akost
Created April 4, 2012 19:06
Bash script for recursive file conversion windows-1251 --> utf-8
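#!/bin/bash
# Recursive file conversion windows-1251 --> utf-8
# Place this file in the root of your site, add execute permission and run
# Converts *.php, *.html, *.css, *.js files.
# To add file type by extension, e.g. *.cgi, add '-o -name "*.cgi"' inside the \( ... \) group of the find command
# The \( ... \) group makes -type f apply to every pattern, not only the last one.
find ./ -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" \) |
while IFS= read -r file   # IFS= and -r keep leading spaces and backslashes in names
do
    echo " $file"
    # Quote every expansion so paths containing spaces survive
    mv "$file" "$file.icv"
    iconv -f WINDOWS-1251 -t UTF-8 "$file.icv" > "$file"
    rm -f "$file.icv"
done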
@gevmarlen

That script is bad, since iconv doesn't detect whether a file is already UTF-8.

Yes. I too often see something like
Какое унижение для противника!
It's UTF-8 text that was converted to UTF-8 again on the assumption that it was cp1251.

find ./ -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" -o -name "*.txt" \) |
while IFS= read -r file
do
  # Skip files that file(1) already reports as UTF-8
  if ! file -bi "$file" | grep -q 'utf-8'
  then
    echo " $file"
    mv "$file" "$file".icv
    iconv -f WINDOWS-1251 -t UTF-8 "$file".icv > "$file"
    rm -f "$file".icv
  fi
done
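
A stricter variant of the same idea, as a rough sketch only: instead of trusting the guess made by file, ask iconv whether the file decodes cleanly as UTF-8, and convert only when it does not. The function names is_valid_utf8 and convert_if_needed are made up for this illustration and are not part of the gist. It assumes that your iconv exits non-zero on invalid input, and that real cp1251 text almost never happens to be valid UTF-8, which can still go wrong for very short files.

```bash
#!/bin/bash
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
# Assumes iconv exits non-zero when it meets an invalid byte sequence.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Hypothetical helper: convert one file from WINDOWS-1251 to UTF-8,
# unless it already is valid UTF-8 (pure ASCII files count as valid).
convert_if_needed() {
  local file=$1
  if is_valid_utf8 "$file"; then
    echo "skip  $file"
    return 0
  fi
  # Write to a temporary file first so a failed conversion
  # never leaves the original half-written.
  if iconv -f WINDOWS-1251 -t UTF-8 "$file" > "$file.icv"; then
    mv "$file.icv" "$file"
    echo "conv  $file"
  else
    rm -f "$file.icv"
    echo "FAIL  $file" >&2
    return 1
  fi
}

# Possible usage (null-delimited, so any file name is safe):
# find . -type f \( -name "*.php" -o -name "*.html" \) -print0 |
#   while IFS= read -r -d '' file; do convert_if_needed "$file"; done
```

Running it a second time over the same tree should then only print skip lines, so already converted files are not double-encoded.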

@catmater

catmater commented Oct 9, 2020

For lots of Russian filenames with spaces and so on, and with autodetection of the codepage, this works best for me (macOS):
find ./ -name "*.sql" -type f | while IFS= read -r file; do enca -L russian -x UTF-8 "$file"; done

@definiteIymaybe

Just a quick note: that requires enca to be installed (brew install enca), and it might fail if, say, a CP-1251 file was incorrectly saved as UTF-8.
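
One way to guard against both problems, as a sketch only: check that enca is on PATH before starting, and log each file it gives up on instead of failing silently. The names have_tool and convert_with_enca are made up for this example. The enca flags are the ones from the comment above, and the sketch assumes enca exits non-zero when it cannot recognise or convert a file.

```bash
#!/bin/bash
# Hypothetical helper: report whether a command is available on PATH.
have_tool() {
  command -v "$1" > /dev/null 2>&1
}

# Hypothetical helper: convert one file in place with enca and
# log the file name when enca is missing or cannot handle it.
convert_with_enca() {
  if ! have_tool enca; then
    echo "enca not found (on macOS: brew install enca)" >&2
    return 127
  fi
  # Assumption: a non-zero exit status means enca could not convert the file.
  if ! enca -L russian -x UTF-8 "$1"; then
    echo "enca could not convert: $1" >&2
    return 1
  fi
}

# Possible usage:
# find . -type f -name "*.sql" -print0 |
#   while IFS= read -r -d '' file; do convert_with_enca "$file"; done
```

Files reported on stderr can then be checked by hand, which is where a CP-1251 file that was wrongly re-saved as UTF-8 would most likely show up.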
