Moving a WordPress database to "all UTF8" can be problematic. Without taking note of the specific scenario, you might find yourself struggling with messed up encoding (on not-ascii chars). If you find yourself manually replacing messed up chars in a database, stop! You should never need to do this. Only in the worst cases of "split brain encoding" might you need to do something like that.
When moving a WordPress database to a new charset, it's not uncommon for there to be a mixture of latin
and utf8
in different settings. This is because WordPress' default encoding used to be latin1
a long time ago. To confuse things furthere, there can also be a mismatch between WordPress' understanding of the encoding for the database, and MySQL's.
You must find the value of DB_CHARSET in WordPress. This is typically in the wp-config.php
, however it might not be. WordPress has a default value, which can vary depending if you are using a DB drop-in plugin too. The best way to check this is with WP CLI via wp shell
and echo DB_CHARSET;
.
The MySQL database has it's own encoding value (as well as all tables, but for this guide we'll assume all tables match the database level encoding value). You can run a query like SELECT default_character_set_name FROM information_schema.SCHEMATA WHERE schema_name = "wordpress"
to determine what the charset in the database is.
From that, you should map to one of the following scenarios.
Of most interesting, and common point of confusion is scenario 3. In this case, you have WordPress sending UTF8 data into a database that MySQL understands to be latin1. Any database export from MySQL will then treat that data as latin1
on export, which can mangle data. The most direct way to fix this is to export the database from the MySQL server with no default char encoding set in the mysqldump file, and then re-import it into a new utf8 database. That looks something like this:
mysqldump -h $hostname -u wordpress --skip-set-charset --default-character-set=latin1 wordpress > backup.sql
Next, we re-import the database backup (that has skipped setting the latin1 charset in the database file). This assumes that the default charset on your MySQL server is UTF8.
mysql -h $hostname -u wordpress wordpress < backup.sql
If at any point the value of DB_CHARSET
on the WordPress side changes from latin1
to utf8
without first correcting the database, you'll be in a world of hurt that's virtually impossible to fix. The issue here will be that some data (say posts after you messed up those charset values) will be encoded as utf8 and some will be latin1, and there's no true way to distinguish them. Converting from latin1 to utf8 will now break everything that was already utf8, and vice versa.
The only tool you then have at your disposal is messy migration scripts running things like mb_convert_encoding
trying to guess which data needs to be converted and which doesn't.