@ccschmitz
Created August 19, 2014 13:10
Some tips for handling data migrations in Rails

Data Migrations in Rails Apps

If you need to manipulate existing data when your code is deployed, there are two main ways to do it:

  1. Create a rake task to migrate the data after the code is deployed. This is ideal for more complex data migrations.
  2. Use ActiveRecord models in a migration. This is acceptable for smaller data manipulations.

Regardless of the method you use, make sure to test your migrations before submitting them.

Data Migrations in Models

The problem with using models in migrations is that the migration can fail if the model logic changes later, which is a big pain when deploying to production. However, a rake task can be overkill for a simple manipulation. Here are some ways to minimize the risk of updating data in migrations.

Avoid ActiveRecord

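SQL doesn't care about validations and all the other logic that comes with ActiveRecord models, so executing a raw query can be less error-prone. However, raw SQL can also be dangerous: because it skips validations and callbacks, a mistake in the query can write bad data with nothing to catch it.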
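As a rough sketch of this approach, the status backfill from the modems example further down could be a single UPDATE statement run through the migration's execute method. The BackfillModemStatus module and its method name are hypothetical, introduced only so the SQL can be exercised without a database. The WHERE clause is an added assumption that keeps the statement safe to re-run.

```ruby
# Hypothetical mixin for a migration. It only assumes that the including
# class responds to `execute(sql)`, which Rails migrations do.
module BackfillModemStatus
  # Plain SQL: no model classes, validations, or callbacks are involved.
  BACKFILL_SQL = "UPDATE modems SET status = 'active' WHERE status IS NULL".freeze

  def backfill_modem_status
    execute(BACKFILL_SQL)
  end
end

# Inside the migration (sketch):
#   include BackfillModemStatus
#
#   def up
#     add_column :modems, :status, :string
#     backfill_modem_status
#   end
```

In a real migration the module is optional; calling execute with the SQL string directly in up does the same thing.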

Stub Out Models

Stubbing out a model in your migrations has two main advantages:

  1. Guards against the case where a model is removed from the codebase but is still being called in a migration.
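  2. Prevents validations from being run and eliminates overhead from associations.

class AddStatusToModem < ActiveRecord::Migration
  # Stub model: maps to the modems table but carries none of the app's
  # validations, callbacks, or associations.
  class Modem < ActiveRecord::Base
  end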

  def up
    add_column :modems, :status, :string

    Modem.reset_column_information
    Modem.find_each do |modem|
      modem.status = 'active'
      modem.save!
    end
  end

  def down
    remove_column :modems, :status
  end
end

The call to reset_column_information ensures that the Modem model is updated and has access to the new status column.

If you are going to use models in your migrations, this is how it should be done.

Data Migrations in Rake Tasks

For complex data migrations, a rake task is the better choice: it runs after the code is deployed, separately from the schema migrations.

To create a custom rake task:

rails g task data_migration set_modem_status

Then populate it with your data migration:

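namespace :data_migration do
  desc "Sets the default modem status"
  task set_modem_status: :environment do
    # Don't bump updated_at on every record we touch.
    ActiveRecord::Base.record_timestamps = false

    begin
      Modem.find_each do |modem|
        begin
          modem.status = 'active'
          modem.save!
        rescue => e
          # Report the failure and keep going so bad records can be fixed later.
          puts "Error updating #{modem.id}: #{e.message}"
        end
      end
    ensure
      # Restore timestamps even if the task aborts partway through.
      ActiveRecord::Base.record_timestamps = true
    end
  end
end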

There are a few notable things about this task:

  1. Setting ActiveRecord::Base.record_timestamps = false prevents ActiveRecord from updating the timestamps on all the records we are touching.
  2. Wrapping the updates in a begin rescue end block gives us the opportunity to catch errors and report them so we can handle problematic records later.
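If several tasks need the same two behaviours, they could be pulled into small helpers. The following is a minimal sketch, not part of Rails: without_timestamps and migrate_each are hypothetical names, and the helpers assume only an object with a record_timestamps accessor (as ActiveRecord::Base has) and an enumerable of records.

```ruby
# Hypothetical helper: turns off timestamp tracking on `klass` for the
# duration of the block and restores the previous setting afterwards,
# even if the block raises.
def without_timestamps(klass)
  previous = klass.record_timestamps
  klass.record_timestamps = false
  begin
    yield
  ensure
    klass.record_timestamps = previous
  end
end

# Hypothetical helper: yields each record to the block and collects the
# ones that raise, so a single bad record does not stop the run.
# Returns an array of [record, error] pairs.
def migrate_each(records)
  failures = []
  records.each do |record|
    begin
      yield record
    rescue StandardError => e
      failures << [record, e]
    end
  end
  failures
end

# Inside the rake task (sketch; assumes a Rails version where find_each
# without a block returns an Enumerator):
#   without_timestamps(ActiveRecord::Base) do
#     failures = migrate_each(Modem.find_each) { |modem| modem.update!(status: 'active') }
#     failures.each { |modem, error| puts "Error updating #{modem.id}: #{error.message}" }
#   end
```

Returning the failures instead of only printing them makes it possible to retry or log the problematic records after the run.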

Stubbing out models can also help minimize the chance of failure in rake tasks.

Testing Data Migrations

First, pull down a dump of the production database with rake repl, then run your migrations and verify that everything looks right.
