#!/usr/bin/env ruby
# Put this file in the root of your Rails project,
# then run it to output the SQL needed to change all
# your tables and columns to the same character set
# and collation.
#
# > ruby character_set_and_collation.rb

DATABASE = ''
CHARACTER_SET = 'utf8'
COLLATION = 'utf8_general_ci'

schema = File.open('db/schema.rb', 'r').read
rows = schema.split("\n")
table_name = nil

rows.each do |row|
  if row =~ /create_table/
    table_name = row.match(/create_table "(.+)"/)[1]
    puts "ALTER TABLE `#{DATABASE}`.`#{table_name}` DEFAULT CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION};"
  elsif row =~ /t\.string/
    field_name = row.match(/"(.+)"/)[1]
    puts "ALTER TABLE `#{DATABASE}`.`#{table_name}` CHANGE COLUMN `#{field_name}` `#{field_name}` CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION};"
  elsif row =~ /t\.text/
    field_name = row.match(/"(.+)"/)[1]
    puts "ALTER TABLE `#{DATABASE}`.`#{table_name}` CHANGE COLUMN `#{field_name}` `#{field_name}` CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION};"
  end
end
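Note that MySQL's CHANGE COLUMN syntax requires the full column definition, so the statements above need the column type spelled out before MySQL will accept them. A minimal sketch of a variant that emits MODIFY statements with explicit types, assuming the stock Rails defaults of VARCHAR(255) for t.string and TEXT for t.text (real schemas may set :limit):

```ruby
# Sketch: build MODIFY statements with an explicit type, since MySQL
# rejects a CHANGE/MODIFY clause that omits the column definition.
# VARCHAR(255) and TEXT are the Rails defaults; real schemas may differ.
def modify_statements(database, table, columns, charset: 'utf8', collation: 'utf8_general_ci')
  columns.map do |name, kind|
    type = kind == :string ? 'VARCHAR(255)' : 'TEXT'
    "ALTER TABLE `#{database}`.`#{table}` MODIFY `#{name}` #{type} " \
    "CHARACTER SET #{charset} COLLATE #{collation};"
  end
end
```

The `columns` argument here is a hypothetical list of `[name, kind]` pairs, standing in for whatever the schema.rb parsing produces.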
Thanks a lot for this, saved me some coding :)
If you are on MySQL 5.7, you probably don't need to shorten columns to 191 chars. The maximum index size is not a limit of InnoDB itself, but of the row_format used on a specific table.

SELECT @@innodb_default_row_format;

This should return dynamic. The dynamic and compressed row formats support indexes up to 3072 bytes, provided innodb_large_prefix is on (1), which is the default in 5.7.

SELECT @@innodb_large_prefix;

You can see the row_format for every table with:

SHOW TABLE STATUS FROM <your-database-name>;

If you still have some tables with the compact row format, you can change it with:

ALTER TABLE <table-name> ROW_FORMAT=DYNAMIC;
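The check-then-convert steps above can be wrapped in a small helper that emits only the ALTERs a schema still needs. A sketch, where the table => Row_format hash is a stand-in for what SHOW TABLE STATUS reports:

```ruby
# Sketch: given a table-name => Row_format mapping (as reported by
# SHOW TABLE STATUS), build ALTER statements for any table that is
# still on the COMPACT row format.
def dynamic_row_format_alters(table_formats)
  table_formats
    .select { |_table, fmt| fmt.to_s.casecmp('compact').zero? }
    .map    { |table, _fmt| "ALTER TABLE `#{table}` ROW_FORMAT=DYNAMIC;" }
end
```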
@Simplify is right about column limits. I was worried about data truncation until I read https://dev.mysql.com/doc/refman/5.6/en/innodb-row-format-dynamic.html, which explains that columns can store more than their limit when the row format is dynamic.
My final migration file was:
def change
  db = ActiveRecord::Base.connection
  execute "ALTER DATABASE `#{db.current_database}` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
  db.tables.each do |table|
    execute "ALTER TABLE `#{table}` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
    db.columns(table).each do |column|
      case column.sql_type
      when /([a-z]*)text/i
        default = column.default.nil? ? '' : "DEFAULT '#{column.default}'"
        null = column.null ? '' : 'NOT NULL'
        execute "ALTER TABLE `#{table}` MODIFY `#{column.name}` #{column.sql_type.upcase} CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci #{default} #{null};"
      when /varchar\(([0-9]+)\)/i
        sql_type = column.sql_type.upcase
        default = column.default.nil? ? '' : "DEFAULT '#{column.default}'"
        null = column.null ? '' : 'NOT NULL'
        execute "ALTER TABLE `#{table}` MODIFY `#{column.name}` #{sql_type} CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci #{default} #{null};"
      end
    end
  end
end
That was run on a database with dynamic row formats, on MySQL 5.7.
Oh! and a trap for new players. Some of my tables were still MyISAM for some reason! This blew up the migrations around key length of course. Converting them to InnoDB did the trick.
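The MyISAM trap above can be caught up front by scanning table engines before the charset migration. A sketch, where the status rows are a stand-in for the output of SHOW TABLE STATUS:

```ruby
# Sketch: given SHOW TABLE STATUS rows (here as Name/Engine hashes),
# list the ALTERs needed to move any leftover MyISAM tables to InnoDB
# before attempting the utf8mb4 conversion.
def innodb_alters(status_rows)
  status_rows
    .select { |row| row['Engine'].to_s.casecmp('MyISAM').zero? }
    .map    { |row| "ALTER TABLE `#{row['Name']}` ENGINE=InnoDB;" }
end
```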
Just adding on, the default COLLATE for MySQL 8.0 is now utf8mb4_0900_ai_ci. Scripts may need to be updated to support this.
See http://mysqlserverteam.com/new-collations-in-mysql-8-0-0/ for more information.
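One way to keep such scripts version-agnostic is to pin the collation explicitly based on the server version. A hypothetical helper (the version-string parsing and the chosen collations are assumptions, not part of any script above):

```ruby
# Sketch: pick an explicit collation so a migration behaves the same on
# 5.7 and 8.0, where the server default changed to utf8mb4_0900_ai_ci.
def preferred_collation(server_version)
  major = server_version.split('.').first.to_i
  major >= 8 ? 'utf8mb4_0900_ai_ci' : 'utf8mb4_unicode_ci'
end
```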
Unsure why CONVERT TO CHARACTER SET isn't used? It converts all the TEXT and VARCHAR columns to the proper data type, including converting them to a different type if the byte length changes (for example VARCHAR => MEDIUMTEXT if the byte limit for VARCHAR would be hit). Seems reasonable vs. reducing 255 => 191 characters.
Additionally, writes are blocked for every ALTER TABLE in the db.columns(table) block. You can use multiple MODIFY statements in a single ALTER TABLE.
class ChangeDefaultCharsetAndCollation < ActiveRecord::Migration[5.2]
  def up
    db = ActiveRecord::Base.connection
    # set defaults for new tables
    execute "ALTER DATABASE `#{db.current_database}` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
    # migrate all tables to Barracuda InnoDB and utf8mb4
    db.tables.each do |table|
      execute "ALTER TABLE `#{table}` ROW_FORMAT=COMPRESSED, ALGORITHM=INPLACE, LOCK=NONE;"
      execute "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
    end
  end

  def down
    db = ActiveRecord::Base.connection
    # restore utf8mb3 defaults for new tables
    execute "ALTER DATABASE `#{db.current_database}` CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_ci;"
    # restore all tables to Antelope InnoDB and utf8mb3
    db.tables.each do |table|
      execute "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_ci;"
      execute "ALTER TABLE `#{table}` ROW_FORMAT=COMPACT, ALGORITHM=INPLACE, LOCK=NONE;"
    end
  end
end
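The "multiple MODIFY statements in a single ALTER TABLE" suggestion can be sketched as a pure string-building step: collect one MODIFY clause per text/varchar column and join them, so each table is rewritten (and locked) only once. Column info is passed in as [name, sql_type, default, null] tuples for illustration:

```ruby
# Sketch: batch all per-column charset changes for one table into a
# single ALTER TABLE statement instead of one statement per column.
def batched_charset_alter(table, columns, charset: 'utf8mb4', collation: 'utf8mb4_unicode_ci')
  clauses = columns.map do |name, sql_type, default, null|
    parts = ["MODIFY `#{name}` #{sql_type.upcase} CHARACTER SET #{charset} COLLATE #{collation}"]
    parts << "DEFAULT '#{default}'" unless default.nil?
    parts << 'NOT NULL' unless null
    parts.join(' ')
  end
  "ALTER TABLE `#{table}` #{clauses.join(', ')};"
end
```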
CONVERT TO CHARACTER SET is great, but MySQL warns against using it if the existing columns are latin1.
From https://dev.mysql.com/doc/refman/8.0/en/alter-table.html
The CONVERT TO operation converts column values between the original and named character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8mb4).
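For that mislabeled-column case, the MySQL manual's suggested workaround is to pass the column through a binary type (so no byte-level conversion happens) and then relabel it with the intended character set. A sketch that builds the statement pair; the text <-> blob / varchar <-> varbinary pairs here are illustrative, not exhaustive:

```ruby
# Sketch: relabel a column whose stored bytes don't match its declared
# charset, by converting to a binary type first (no conversion occurs),
# then back to a character type with the target charset.
def relabel_via_binary(table, column, type)
  binary = { 'TEXT' => 'BLOB', 'VARCHAR(255)' => 'VARBINARY(255)' }.fetch(type)
  ["ALTER TABLE `#{table}` MODIFY `#{column}` #{binary};",
   "ALTER TABLE `#{table}` MODIFY `#{column}` #{type} CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"]
end
```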
Unfortunately, the loop over db.tables includes a MySQL view of ours, which doesn't allow changing its character set. Apparently the .tables method will change in Rails 5.1 to not include views. But until then, I've had to add another "get me out of here" escape hatch like @jules2689 (I also ran into index size issues with the internal Rails tables):

next if db.views.include?(table)

Other problems:
- It doesn't handle char columns, so I've changed it to support those in addition to varchar.
- The default check used .blank?, so I've changed it to check .nil? instead.
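The char-column tweak boils down to widening the type match. A sketch of a predicate covering char as well as varchar and the text family (the helper name is made up for illustration):

```ruby
# Sketch: decide whether a column's sql_type should get a charset
# conversion clause. Matches char(N) and varchar(N) plus the text
# family (text, tinytext, mediumtext, longtext); everything else is
# skipped.
def charset_convertible?(sql_type)
  !!(sql_type =~ /\A(?:var)?char\(\d+\)\z/i || sql_type =~ /\A\w*text\z/i)
end
```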