Skip to content

Instantly share code, notes, and snippets.

@morrizon
Last active January 12, 2017 16:11
Show Gist options
  • Save morrizon/fc20d367605b0e70359e8c8f9b6c32fc to your computer and use it in GitHub Desktop.
Save morrizon/fc20d367605b0e70359e8c8f9b6c32fc to your computer and use it in GitHub Desktop.
Double encoding issue

Some times I found in some legacy code a doubled encoding issue (latin1 tables with UTF-8).

I workaround to fix the issue I find the problematic content:

db=$1
user=$2
pass=$3

mysqldump -u$user -p$pass $db > bkpEncodingIssue$db.sql
mysqldump --default-character-set=latin1 -u$user -p$pass $db | sed 's/DEFAULT CHARSET=LATIN1/DEFAULT CHARSET=utf8/i' | grep -v "^/\*" > bkpEncodingIssue${db}NewEncoding.sql
mysql -u$user -p$pass $db < bkpEncodingIssue${db}NewEncoding.sql

If you have a mixing (utf8 tables and latin1 ones) you can detect what are the problematic tables:

db=$1
user=$2
pass=$3
timestamp=$(date +"%s")

#backup db
mysqldump -u$user -p$pass $db > bkpEncodingIssue${db}_${timestamp}.sql

#convert content by table
mkdir encodingIssue
tableList=$(echo show tables|mysql -u$user -p$pass $db -N)
for table in $tableList;do
    mysqldump --default-character-set=latin1 -u$user -p$pass $db $table | sed 's/DEFAULT CHARSET=LATIN1/DEFAULT CHARSET=utf8/i' | grep -v "^/\*" > encodingIssue/$table.sql
done

#check the encoding of the tables and load the UTF-8 ones
fixedTables=$(file encodingIssue/*|grep UTF-8|awk '{print $1}'|xargs|sed 's/://g')
for table in $fixedTables;do
    echo "Fixing encoding of table <$table>"
    mysql -u$user -p$pass $db < $table
done
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment