@rponte
Last active May 10, 2019 11:30
Converting encoding with iconv command
# check the file's encoding (-I is the macOS flag; GNU/Linux file uses -i)
$ file -I script_perebento.sql
$ file -bI script_perebento.sql
# convert the file from ISO-8859-1 to UTF-8
$ iconv -f ISO-8859-1 -t UTF-8 script_perebento.sql > script_bonitao.sql
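Running the conversion above twice on the same file would double-encode it, so it can help to check first whether a file already decodes as UTF-8. The following is a minimal sketch, not part of the original gist: the helper names `is_valid_utf8` and `to_utf8` are hypothetical, and it assumes that `iconv` exits non-zero when it meets an invalid input sequence.

```shell
#!/bin/bash
# is_valid_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
# (Hypothetical helper; assumes iconv exits non-zero on invalid input.)
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# to_utf8 FROM_CHARSET FILE: convert FILE in place,
# skipping files that are already valid UTF-8.
to_utf8() {
  local from="$1" file="$2" tmp
  if is_valid_utf8 "$file"; then
    echo "skipping $file (already valid UTF-8)"
    return 0
  fi
  tmp="$file.utf8.tmp"
  # Replace the original only if the conversion succeeded.
  if iconv -f "$from" -t UTF-8 "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For example, `to_utf8 ISO-8859-1 script_perebento.sql` would convert the file once and skip it on later runs. Pure-ASCII files also count as valid UTF-8, so they are skipped, which is harmless.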
#!/bin/bash
# Convert every matching file under NON_UTF_FILE_DIR from ISO-8859-1 to UTF-8, in place.
NON_UTF_FILE_DIR="WebRoot/WEB-INF/jsp"
PATTERN_FILE_NAME="*.jsp"
# Quote the pattern so find, not the shell, expands it; -print0 keeps paths with spaces intact.
find "$NON_UTF_FILE_DIR" -type f -name "$PATTERN_FILE_NAME" -print0 |
while IFS= read -r -d '' file; do
newname="$file.utf8"
echo "converting file to utf-8 $file => $newname"
# Replace the original only if the conversion succeeded.
iconv -f ISO-8859-1 -t UTF-8 "$file" > "$newname" && mv "$newname" "$file"
done
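#!/bin/bash
# Rewrite the charset name declared inside html/jsp/jspx files.
# Requires GNU find (-regextype).
find_this="windows-1252"
replace_with="utf-8"
find . -regextype posix-egrep -regex ".*\.(html|jsp|jspx)" -type f -print0 |
while IFS= read -r -d '' file; do
sed "s/$find_this/$replace_with/g" "$file" > "$file.utf-8" && mv "$file.utf-8" "$file"
done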
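Converting the bytes with iconv does not touch the charset name declared inside a page, which is what the sed script above takes care of. As a rough illustration of its effect on a single line, here is a small sketch; the function name `update_charset_decl` and the sample JSP directive are assumptions made for the example.

```shell
#!/bin/bash
find_this="windows-1252"
replace_with="utf-8"

# update_charset_decl: read lines on stdin and write them with the charset name swapped.
update_charset_decl() {
  sed "s/$find_this/$replace_with/g"
}

# Hypothetical sample line. Real files may spell the charset differently
# (e.g. "Windows-1252"), which this case-sensitive pattern would not match.
sample='<%@ page contentType="text/html; charset=windows-1252" %>'
printf '%s\n' "$sample" | update_charset_decl
```

Because the match is a plain substring, any other occurrence of the string in a file is rewritten too, so reviewing the resulting diff before committing is a reasonable precaution.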
#!/bin/bash
# GNU/Linux variant: detect each file's charset with "file -bi" and convert it to UTF-8 in place.
set -e # Exit script immediately on first error.
#set -x # Print commands and their arguments as they are executed.
NON_UTF_FILE_DIR="."
PATTERN_FILE_NAME="*.sql"
find "$NON_UTF_FILE_DIR" -type f -name "$PATTERN_FILE_NAME" -print0 |
while IFS= read -r -d '' file; do
# "file -bi" prints e.g. "text/plain; charset=iso-8859-1"; keep the part after "=".
CURRENT_CHARSET="$(file -bi "$file" | awk -F "=" '{print $2}')"
if [ "$CURRENT_CHARSET" = utf-8 ]; then
continue
fi
newname="$file.utf8"
echo "converting file ($CURRENT_CHARSET) to utf-8 $file => $newname"
#iconv -f ISO-8859-1 -t UTF-8 "$file" > "$newname"
iconv -f "$CURRENT_CHARSET" -t UTF-8 "$file" > "$newname"
mv "$newname" "$file"
done
#!/bin/bash
# macOS variant: detect each file's charset with "file -bI" and convert it to UTF-8 in place.
set -e # Exit script immediately on first error.
#set -x # Print commands and their arguments as they are executed.
NON_UTF_FILE_DIR="WebRoot"
PATTERN_FILE_NAME="*.jsp"
# Quote the pattern so find, not the shell, expands it.
find "$NON_UTF_FILE_DIR" -type f -name "$PATTERN_FILE_NAME" -print0 |
while IFS= read -r -d '' file; do
# "file -bI" prints e.g. "text/plain; charset=iso-8859-1"; keep the part after "=".
CURRENT_CHARSET="$(file -bI "$file" | awk -F "=" '{print $2}')"
if [ "$CURRENT_CHARSET" = utf-8 ]; then
continue
fi
newname="$file.utf8"
echo "converting file ($CURRENT_CHARSET) to utf-8 $file => $newname"
#iconv -f ISO-8859-1 -t UTF-8 "$file" > "$newname"
iconv -f "$CURRENT_CHARSET" -t UTF-8 "$file" > "$newname"
mv "$newname" "$file"
done
rponte commented Jul 26, 2012


caiofrota commented Nov 16, 2017

Dude, I made a small change to your code (from_iso88591_to_utf8.sh) so it covers more file types:

#!/bin/bash

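# Requires GNU find (-regextype); -print0 keeps paths with spaces intact.
find . -regextype posix-egrep -regex ".*\.(html|jsp|jspx|java|properties|xml|sql|pck|fnc|trg)" -type f -print0 |
while IFS= read -r -d '' file; do
  iconv -f ISO-8859-1 -t UTF-8 "$file" > "$file.utf-8" && mv "$file.utf-8" "$file"
done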

Code in My Gists.
