Skip to content

Instantly share code, notes, and snippets.

@rishabhp
Last active Apr 3, 2018
Embed
What would you like to do?
MySQL online schema migration benchmarks
mysql> select count(*) from users;
+----------+
| count(*) |
+----------+
| 25305798 |
+----------+
1 row in set (10.24 sec)

# Inplace, rebuilds table, allows DML
mysql -u root experiments --execute="alter table users add column ptfun varchar(50) default 'pt fun'"
106s (1min 46s)

# Starts with 1k chunk for first select query followed by 30k and then straight away to 51-56k
pt-online-schema-change --alter "add column ptfun varchar(50) default 'pt fun'" D=experiments,t=users,u=root --execute --alter-foreign-keys-method rebuild_constraints
297s (4min 57s)

# made changes to the online schema code to work on host, instead of socket
# With 9k chunks:
osc_cli copy --ddl-file=alter.sql --database=experiments --socket=/tmp/mysql.socket --mysql-user=root --mysql-password= --outfile-dir=/Users/rishabhpugalia/www/dev/percona/outfile
689s (10min 29s)  

# With 60k chunks:
osc_cli copy --ddl-file=alter.sql --database=experiments --socket=/tmp/mysql.socket --mysql-user=root --mysql-password= --outfile-dir=/Users/rishabhpugalia/www/dev/percona/outfile --chunk-size=15000000
559s (9m 19s)

2m vs 5m vs 10.5m || 9.2m

For FB, had to make these changes:

$ cat /etc/my.cnf
[mysqld]

binlog-format = 'STATEMENT'
secure-file-priv = ''

In db.py, made this change in connection_config variable:

'host':'localhost',
# 'unix_socket': socket,

Differences Noticed

DB selects data into OUTFILE and then LOADs data INFILE in the defined chunks. Finally it does checksumming based on queries like these:

2018-04-03T04:09:23.520806Z       77 Query    /* /usr/local/bin/osc_cli:db.pyc */ SELECT  count(*) as cnt, bit_xor(crc32(@range_end_0:=`id`)), bit_xor(crc32(`name`)), bit_xor(crc32(`email`)), bit_xor(crc32(`active`)), bit_xor(crc32(`active_woi`)), bit_xor(crc32(`bambam5`)), bit_xor(crc32(`bambam6`)), bit_xor(crc32(`booboo5`)) FROM (  SELECT * FROM `__osc_new_users` FORCE INDEX (`PRIMARY`)  WHERE ( `id` > @range_start_0 )  ORDER BY `id` LIMIT 64935 ) as tmp

PT does simple insert into select and doesn't execute any specific query for checksumming. Might be happening inside their code but nothing mentioned in the doc, command's output or general log.

18-04-03T04:19:18.231881Z       81 Query    EXPLAIN SELECT `id`, `name`, `email`, `active`, `active_woi`, `bambam5`, `bambam6`, `booboo5` FROM `experiments`.`users` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '29798504')) AND ((`id` <= '29854092')) LOCK IN SHARE MODE /*explain pt-online-schema-change 30864 copy nibble*/
2018-04-03T04:19:18.232317Z       81 Query    INSERT LOW_PRIORITY IGNORE INTO `experiments`.`_users_new` (`id`, `name`, `email`, `active`, `active_woi`, `bambam5`, `bambam6`, `booboo5`) SELECT `id`, `name`, `email`, `active`, `active_woi`, `bambam5`, `bambam6`, `booboo5` FROM `experiments`.`users` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '29798504')) AND ((`id` <= '29854092')) LOCK IN SHARE MODE /*pt-online-schema-change 30864 copy nibble*/
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment