Skip to content

Instantly share code, notes, and snippets.

@tjh
Created January 31, 2012 16:07
Show Gist options
  • Star 60 You must be signed in to star a gist
  • Fork 18 You must be signed in to fork a gist
  • Save tjh/1711329 to your computer and use it in GitHub Desktop.
Save tjh/1711329 to your computer and use it in GitHub Desktop.
Convert all Rails table column collation and character set
#!/usr/bin/env ruby
# Put this file in the root of your Rails project,
# then run it to output the SQL needed to change all
# your tables and columns to the same character set
# and collation.
#
# > ruby character_set_and_collation.rb
DATABASE = ''
CHARACTER_SET = 'utf8'
COLLATION = 'utf8_general_ci'
schema = File.open('db/schema.rb', 'r').read
rows = schema.split("\n")
table_name = nil
rows.each do |row|
if row =~ /create_table/
table_name = row.match(/create_table "(.+)"/)[1]
puts "ALTER TABLE `#{DATABASE}`.`#{table_name}` DEFAULT CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION};"
elsif row =~ /t\.string/
field_name = row.match(/"(.+)"/)[1]
puts "ALTER TABLE `#{DATABASE}`.`#{table_name}` CHANGE COLUMN `#{field_name}` `#{field_name}` CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION};"
elsif row =~ /t\.text/
field_name = row.match(/"(.+)"/)[1]
puts "ALTER TABLE `#{DATABASE}`.`#{table_name}` CHANGE COLUMN `#{field_name}` `#{field_name}` CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION};"
end
end
@pkarjala
Copy link

pkarjala commented Aug 5, 2021

Just adding on, the default COLLATE for MySQL 8.0 is now utf8mb4_0900_ai_c. Scripts may need to be updated to support this.

See http://mysqlserverteam.com/new-collations-in-mysql-8-0-0/ for more information.

@Whitespace
Copy link

Unsure why CONVERT TO CHARACTER SET isn't used? It converts all the TEXT and VARCHAR to the proper data type, including converting them to a different type if the bytes change (for example VARCHAR => MEDIUMTEXT if the byte limit for VARCHAR would be hit). Seems reasonable vs reducing 255 => 191 characters.

Additionally there are blocking writes for every ALTER TABLE in the db.columns(table) block. You can use multiple MODIFY statements in a single ALTER TABLE.

class ChangeDefaultCharsetAndCollation < ActiveRecord::Migration[5.2]
  def up
    db = ActiveRecord::Base.connection

    # set defaults for new tables
    execute "ALTER DATABASE `#{db.current_database}` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"

    # migrate all tables to Barracuda InnoDB and utf8mb4
    db.tables.each do |table|
      execute "ALTER TABLE `#{table}` ROW_FORMAT=COMPRESSED, ALGORITHM=INPLACE, LOCK=NONE;"
      execute "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
    end
  end

  def down
    db = ActiveRecord::Base.connection

    # restore utf8mb3 defaults for new tables
    execute "ALTER DATABASE `#{db.current_database}` CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_ci;"

    # restore all tables to Antelope InnoDB and utf8mb3
    db.tables.each do |table|
      execute "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_ci;"
      execute "ALTER TABLE `#{table}` ROW_FORMAT=COMPACT, ALGORITHM=INPLACE, LOCK=NONE;"
    end
  end
end

@jerry2013
Copy link

CONVERT TO CHARACTER SET is great but MySQL warns against using it if the existing columns are latin1.

From https://dev.mysql.com/doc/refman/8.0/en/alter-table.html

The CONVERT TO operation converts column values between the original and named character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8mb4).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment