Skip to content

Instantly share code, notes, and snippets.

@kaeawc
Created April 26, 2013 16:48
Show Gist options
  • Save kaeawc/5468681 to your computer and use it in GitHub Desktop.
Save kaeawc/5468681 to your computer and use it in GitHub Desktop.

#Performing Overlapping Data Migration in MySQL

This example will assume a really simple schema -- users and messages.

CREATE DATABASE data_1;
CREATE DATABASE data_2;

USE data_1;

DROP TABLE IF EXISTS `users`;
CREATE TABLE users (
    id INT PRIMARY KEY AUTO_INCREMENT NOT NULL,
    username VARCHAR(255) NOT NULL
);

DROP TABLE IF EXISTS `messages`;
CREATE TABLE messages (
    id INT PRIMARY KEY AUTO_INCREMENT NOT NULL,
    from_user_id INT NOT NULL,
    to_user_id INT NOT NULL,
    message TEXT NOT NULL
);

USE data_2;

DROP TABLE IF EXISTS `users`;
CREATE TABLE users (
    id INT PRIMARY KEY AUTO_INCREMENT NOT NULL,
    username VARCHAR(255) NOT NULL
);

DROP TABLE IF EXISTS `messages`;
CREATE TABLE messages (
    id INT PRIMARY KEY AUTO_INCREMENT NOT NULL,
    from_user_id INT NOT NULL,
    to_user_id INT NOT NULL,
    message TEXT NOT NULL
);

Instead of directly inserting the old data into the new database, we should setup an identical slave replica.

Now lets assume that the original dataset has users A,B, and C.

INSERT INTO data_1.users (
    username
) VALUES
    ('test_user_A'),
    ('test_user_B'),
    ('test_user_C');

INSERT INTO data_1.messages (
    from_user_id,
    to_user_id,
    message
) VALUES
    (1,2,'From A to B: Lorem ipsum Sed eiusmod sint labore ut dolore sunt dolore voluptate ea mollit adipisicing enim nostrud adipisicing fugiat laborum.'),
    (2,1,'From B to A: Lorem ipsum Sed eiusmod sint labore ut dolore sunt dolore voluptate ea mollit adipisicing enim nostrud adipisicing fugiat laborum.'),
    (1,3,'From A to C: Lorem ipsum Sed eiusmod sint labore ut dolore sunt dolore voluptate ea mollit adipisicing enim nostrud adipisicing fugiat laborum.');

Unfortunately in the database that we're migrating to, it was brought online without the users of the previous database, therefore the primary keys won't be matching up. So we only have a subset of the old user list.

INSERT INTO data_2.users (
    username
) VALUES
    ('test_user_B'),
    ('test_user_D'),
    ('test_user_C');

INSERT INTO data_2.messages (
    from_user_id,
    to_user_id,
    message
) VALUES
    (1,2,'From B to D: Lorem ipsum Sed eiusmod sint labore ut dolore sunt dolore voluptate ea mollit adipisicing enim nostrud adipisicing fugiat laborum.'),
    (3,2,'From C to D: Lorem ipsum Sed eiusmod sint labore ut dolore sunt dolore voluptate ea mollit adipisicing enim nostrud adipisicing fugiat laborum.'),
    (1,2,'From B to D: Lorem ipsum Sed eiusmod sint labore ut dolore sunt dolore voluptate ea mollit adipisicing enim nostrud adipisicing fugiat laborum.'),
    (2,3,'From D to C: Lorem ipsum Sed eiusmod sint labore ut dolore sunt dolore voluptate ea mollit adipisicing enim nostrud adipisicing fugiat laborum.'),
    (1,2,'From B to D: Lorem ipsum Sed eiusmod sint labore ut dolore sunt dolore voluptate ea mollit adipisicing enim nostrud adipisicing fugiat laborum.'),
    (3,2,'From C to D: Lorem ipsum Sed eiusmod sint labore ut dolore sunt dolore voluptate ea mollit adipisicing enim nostrud adipisicing fugiat laborum.'),
    (3,2,'From C to D: Lorem ipsum Sed eiusmod sint labore ut dolore sunt dolore voluptate ea mollit adipisicing enim nostrud adipisicing fugiat laborum.'),
    (1,2,'From B to D: Lorem ipsum Sed eiusmod sint labore ut dolore sunt dolore voluptate ea mollit adipisicing enim nostrud adipisicing fugiat laborum.'),
    (2,3,'From D to C: Lorem ipsum Sed eiusmod sint labore ut dolore sunt dolore voluptate ea mollit adipisicing enim nostrud adipisicing fugiat laborum.');

Now we need to insert the users from data_1 into data_2. Notice we are including the old user PK as a SK (surrogate key)

SET SQL_SAFE_UPDATES = 0;

ALTER TABLE data_2.users
    ADD sk_id INT NULL;

UPDATE data_2.users d2u
LEFT JOIN data_1.users d1u
    ON d1u.username = d2u.username
SET d2u.sk_id = d1u.id;

INSERT INTO data_2.users (
    sk_id,
    username
) SELECT
    d1u.id,
    d1u.username
FROM data_1.users d1u
WHERE NOT EXISTS (
    SELECT 1
    FROM data_2.users dmu
    WHERE d1u.username = dmu.username
);

Now we've got a mapping of the old users to new users. Which means we can insert the old messages (which we know don't overlap) in a fairly straight-foward manner.

INSERT INTO data_2.messages (
    from_user_id,
    to_user_id,
    message
) SELECT
    from_user_id,
    to_user_id,
    message
FROM data_1.messages m
INNER JOIN data_2.users fu
    ON m.from_user_id = fu.sk_id
INNER JOIN data_2.users tu
    ON m.from_user_id = tu.sk_id;

We could now spin up a production web app to verify everything looks good. This web app should not be given write access to the database until you're ready to switch over, therefore only grant SELECT to the database user used by this app.

If everything looks good at that point, we could then do the final site migration. If anything goes wrong you'll want to be able to roll back within a few seconds.

  • Put both web apps in maintenance mode (divert all users to landing page)
  • Change the load balancer to point to the new web app
  • Grant necessary write permissions to the database user given to the new web app
  • Users at this point should be hitting the new web app
  • Verify all network connections have ceased on the old web app
  • Take the new web app out of maintenance mode

Hopefully everything goes smoothly from there!

Once the new database is rocking away you can feel free to drop the SK columns. Do not drop the original database at any point in this process, since you've set up replication the drop command will also be replicated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment