Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 13 You must be signed in to star a gist
  • Fork 10 You must be signed in to fork a gist
  • Save mdemblani/34d30015f24e1927736f189b88be143a to your computer and use it in GitHub Desktop.
Save mdemblani/34d30015f24e1927736f189b88be143a to your computer and use it in GitHub Desktop.
Mysql Character Set conversion - Latin1 to UTF-8(utf8mb4)
  1. Make sure mysql-client is installed. If not, then :
    sudo apt install mysql-client
	or
    sudo apt-get install mysql-client
  1. Open php.ini

    ; PHP's default character set is set to UTF-8.
    ; http://php.net/default-charset
    default_charset = "UTF-8"
    

    Source: https://www.toptal.com/php/a-utf-8-primer-for-php-and-mysql

  2. Disconnect all active applications connected to mysql and take a backup of the database.

  3. Speak UTF-8 everywhere. Update mysqld, mysql and client settings as follows(/etc/mysql/*.cnf):

    • On your local machine
    [client]
    default-character-set = utf8mb4
    
    [mysql]
    default-character-set = utf8mb4
    
    [mysqld]
    character_set_server=utf8mb4
    collation_server=utf8mb4_unicode_ci
    #The following should be set if you are using mysql version 5.6 or lower
    innodb_file_format=barracuda
    innodb_file_per_table=1
    innodb_large_prefix=1
    

    Source: https://mathiasbynens.be/notes/mysql-utf8mb4
    Source: http://mechanics.flite.com/blog/2014/07/29/using-innodb-large-prefix-to-avoid-error-1071/

    • On AWS RDS set the following:
    [mysqld]
    innodb_file_format=barracuda
    innodb_file_per_table=1
    innodb_large_prefix=1
    
    character_set_client: utf8mb4
    character_set_database: utf8mb4
    character_set_results: utf8mb4
    character_set_connection: utf8mb4
    collation_connection: utf8mb4_unicode_ci
    collation_server: utf8mb4_unicode_ci
    character_set_server: utf8mb4
    

    Source: http://aprogrammers.blogspot.in/2014/12/utf8mb4-character-set-in-amazon-rds.html
    Source: http://mechanics.flite.com/blog/2014/07/29/using-innodb-large-prefix-to-avoid-error-1071/

  4. Restart mysql service

  5. Connect to mysql using command line:

    mysql -u username -p
    
  6. Run the following query:

    SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

    Output:

    +--------------------------+--------------------+
    | Variable_name            | Value              |
    +--------------------------+--------------------+
    | character_set_client     | utf8mb4            |
    | character_set_connection | utf8mb4            |
    | character_set_database   | utf8mb4            |
    | character_set_filesystem | binary             |
    | character_set_results    | utf8mb4            |
    | character_set_server     | utf8mb4            |
    | character_set_system     | utf8               |
    | collation_connection     | utf8mb4_general_ci |
    | collation_database       | utf8mb4_unicode_ci |
    | collation_server         | utf8mb4_unicode_ci |
    +--------------------------+--------------------+
    10 rows in set (0.05 sec)
    

    Source: https://mathiasbynens.be/notes/mysql-utf8mb4

  7. Convert your Latin-1 collated tables to UTF-8 a. Individual queries on each table : https://codex.wordpress.org/Converting_Database_Character_Sets b. A mysql dump and restoration of the dump : https://www.bluebox.net/insight/blog-article/getting-out-of-mysql-character-set-hell

  8. Take a database dump as follows:

    mysqldump --verbose -u root -p --add-drop-database --opt --skip-set-charset --default-character-set=latin1 --skip-extended-insert --databases databasename > dump_file.sql
    

    Note: On the mysqldump command, the --skip-set-charset and --default-char-set=latin1 options should prevent MySQL from taking the already-Latin-1-collated table and “helpfully” converting it to any other character set for you. This should ensure that your mysqldump is really in the Latin-1 character encoding scheme.The --skip-extended-insert option forces mysqldump to put each INSERT command in the dump on its own line. This will make the dump take much longer to re-import, however, in my experimentation, adding this option was enough to prevent the dump from having syntax errors in in anywhere. And in any case, should the re-import fail for any reason, having each row's data on its own line really helps to be able to zero-in on which rows are causing you problems (and gives you easier options to work-around the problem rows).*

  9. Replace all latin1 with utf8

 perl -i -pe 's/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci/' dump_file.sql`
 ```
 The first command replaces all instances of DEFAULT CHARACTER SET latin1 with DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci. This is used to fix up the database's default charset and collation.
 
  ```perl
 perl -i -pe 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8mb4 COLLATE utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC/' dump_file.sql
 ```
 The second command replaces all instances of DEFAULT CHARSET=latin1 with DEFAULT CHARSET=utf8mb4. This converts all tables from using latin1 to using utf8mb4.

 *Source : https://docs.moodle.org/24/en/Converting_your_MySQL_database_to_UTF8#Linux_.26_Mac*

11. Restore the Dump
 ```
nohup mysql -v -u username -ppassword < dump_file.sql  & (to run i background)
    or 
mysql -v -u username -p < dump_file.sql  (to run in foreground)

 ```
 *Source: https://www.maketecheasier.com/run-bash-commands-background-linux/*
12. Repair the tables for any problems in-case
 ```
 mysqlcheck -u root -p --auto-repair --optimize --all-databases
 ```

13. In your application, execute the following query on your application database and verify the result:
 ```mysql
 SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
 ```
 ```
 +--------------------------+--------------------+
 | Variable_name            | Value              |
 +--------------------------+--------------------+
 | character_set_client     | utf8mb4            |
 | character_set_connection | utf8mb4            |
 | character_set_database   | utf8mb4            |
 | character_set_filesystem | binary             |
 | character_set_results    | utf8mb4            |
 | character_set_server     | utf8mb4            |
 | character_set_system     | utf8               |
 | collation_connection     | utf8mb4_general_ci |
 | collation_database       | utf8mb4_unicode_ci |
 | collation_server         | utf8mb4_unicode_ci |
 +--------------------------+--------------------+
 ```
If the result is not as above, pefrom the following steps.

14. [CakePHP] Open database.php and set encoding to utf8mb4 as follows.
 ```php
 class DATABASE_CONFIG {
     public $default = array(
 	'datasource'  => 'Database/Mysql',
 	'persistent'  => false,
 	'host'        => 'localhost',
 	'login'       => 'username',
 	'password'    => 'password!',
 	'database'    => 'MySQL Database Name',
 	'prefix'      => '',
 	'encoding'    => 'utf8mb4'
     );
 }
 ```

15. [SailsJS] Open connections.js in your SailsJS application and set as follows:
 ```js
  mysql: {
     adapter   : 'sails-mysql',
     host      : 'localhost',
     port      : 3306,
     user      : 'username',
     password  : 'password',
     database  : 'MySQL Database Name'

     // OR (explicit sets take precedence)
     adapter   : 'sails-mysql',
     url       : 'mysql2://USER:PASSWORD@HOST:PORT/DATABASENAME'

     // Add the following
     charset   : 'utf8mb4',
     collation : 'utf8mb4_unicode_ci'
  }
  ```
 *Source: https://github.com/balderdashy/sails-mysql#sails-configuration*
 
 
Sources
---

- MOST RELIABLE : https://www.bluebox.net/insight/blog-article/getting-out-of-mysql-character-set-hell
- If your database isn't big, also proposes the fastest solution : https://codex.wordpress.org/Converting_Database_Character_Sets
- If you are using AWS-RDS, steps required: http://aprogrammers.blogspot.in/2014/12/utf8mb4-character-set-in-amazon-rds.html
- Why to use MySQL utf8mb4 instead of utf8 - https://mathiasbynens.be/notes/mysql-utf8mb4
- User freindly way to figure out if your database has encodig issues: https://www.whitesmith.co/blog/latin1-to-utf8/
- For PHP Applications : https://www.toptal.com/php/a-utf-8-primer-for-php-and-mysql
- If you are using MySQL version 5.6 or lower : http://mechanics.flite.com/blog/2014/07/29/using-innodb-large-prefix-to-avoid-error-1071/
- https://docs.moodle.org/24/en/Converting_your_MySQL_database_to_UTF8#Linux_.26_Mac
- http://blog.socialcast.com/rails-2-mysql-and-character-sets/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment