Skip to content

Instantly share code, notes, and snippets.

@chasestubblefield
Created April 8, 2012 20:21
Show Gist options
  • Save chasestubblefield/2339681 to your computer and use it in GitHub Desktop.
Save chasestubblefield/2339681 to your computer and use it in GitHub Desktop.
db-charmer

db-charmer

The Problem

If you're unfamiliar with the issue, there is an intermittent failure when running Capybara tests on any machine (spec/requests/ or features/):

undefined method `db_charmer_connection_name' for #<ActiveRecord::ConnectionAdapters::Mysql2Adapter:0x00000009f16ec8>
/var/lib/jenkins/.rvm/gems/ruby-1.9.3-p0/gems/db-charmer-1.7.0/lib/db_charmer/active_record/class_attributes.rb:85:in `db_charmer_remapped_connection'
/var/lib/jenkins/.rvm/gems/ruby-1.9.3-p0/gems/db-charmer-1.7.0/lib/db_charmer/active_record/connection_switching.rb:34:in `connection_with_magic'

The failure is particularly annoying because if it occurs, it will cause all tests in the same process to fail with the same error.

Again, if you're unfamiliar, db-charmer is a gem that allows us to use different databases (ie. slaves) for certain models or certain queries. For example, when we say User.on_db(:slave0).count, the query will be executed on the first slave database, instead of the normal connection (User.connection).

Charles had espoused recently that it might be a multithreading issue with autoload. While his instincts about thread safety were spot on, (since Capybara runs in a separate thread from rspec/cucumber), the loading of the gem was not at fault. Simply put, a race condition was causing the db-charmer class variables to be in an unexpected state.

The Explanation

db-charmer works by hijacking the ActiveRecord::Base.connection method with good ol' alias_method_chain:

class << self
  def connection_with_magic
    connection_proxy || connection_without_magic
  end
  alias_method_chain :connection, :magic
end

Initially, all classes have a connection_proxy of nil. Connection switching is achieved in the User.on_db(new_conn, &block) method, which swaps in a connection_proxy before running the block, and replaces the old one before returning. A simplification of this is as follows:

def self.on_db(new_conn, &block)

  self.connection_level += 1
  old_proxy = self.connection_proxy
  self.connection_proxy = new_conn

  yield(&block)

  self.connection_proxy = old_proxy
  self.connection_level -= 1

end

If you're not already smelling a concurrency problem, you should be. If one thread is interrupted in the middle of an on_db call, all other threads get new_conn when asking for connection. We will come back to this issue. The real problem occurs when two threads interleave the above code. (Which explains why this is so intermittent; the timing has to be very exact.) Here's an example:

# Both threads are running User.on_db(:slave0).first, which gets translated to
# User.on_db(slave_connection) { User.first }

# Thread 1                                        # Thread 2

self.connection_level += 1                # 1
old_proxy = self.connection_proxy         # nil
self.connection_proxy = slave_connection

yield(&block)                                     self.connection_level += 1                # 2
                                                  old_proxy = self.connection_proxy         # slave_connection
                                                  self.connection_proxy = slave_connection

self.connection_proxy = old_proxy        # nil    yield(&block)
self.connection_level -= 1               # 1

                                                  self.connection_proxy = old_proxy         # slave_connection
                                                  self.connection_level -= 1                # 0

# BAD!!
# Now, in all threads, connection_level == 0 but connection_proxy == slave_connection

The state that I just described, where connection_level is zero but connection_proxy is a non-nil connection object, is simply unexpected by db-charmer. In this state, any calls to User.connection will throw NameError: undefined method 'db_charmer_connection_name'. (This isn't apparent by my above simplification of self.connection_with_magic, you can check the code for yourself to see how this state throws an error.) That's why if the error pops up during tests, it will cause all remaining tests to fail, since User.connection is simply broken.

The Solution

The fix I propose is to synchronize thread access to db_charmer_connection_proxy. We can't just slap a mutex around on_db since it could be recursive. (on_db calls can be nested.) My solution is to have threads acquire a lock on the outermost invocation of on_db, and not release that lock until they are out of that outermost invocation. That way, no two threads can be inside of on_db at the same time. Here's the updated method with synchronization code:

def self.on_db(new_conn, &block)

  # Acquire the lock if starting a new chain of on_db calls
  if Thread.current[:in_on_db_call]
    in_outer_call = false
  else
    connection_proxy_mutex.lock
    Thread.current[:in_on_db_call] = true
    in_outer_call = true
  end

  self.connection_level += 1
  old_proxy = self.connection_proxy
  self.connection_proxy = new_conn

  yield(&block)

  self.connection_proxy = old_proxy
  self.connection_level -= 1

  # Release the lock if leaving the outer call
  if in_outer_call
    connection_proxy_mutex.unlock
    Thread.current[:in_on_db_call] = false
  end

end

In addition to preventing two threads from being in on_db calls at the same time, we also don't want a thread to execute connection_with_magic if another thread is in on_db. So we acquire the lock for all threads not inside on_db:

def connection_with_magic
  block = lambda { connection_proxy || connection_without_magic }
  if Thread.current[:in_on_db_call]
    block.call
  else
    connection_proxy_mutex.synchronize &block
  end
end

The Conclusion

The fact is that db-charmer should never have been used in a multithreaded environment. This band-aid will hopefully help. I'm not guaranteeing that this will fix all thread issues, since db-charmer was simply not written with threads in mind.

This fix does indeed seem to prevent the pesky error. The build is currently green on CI, and I am doing repeated builds to be confident. In order to debug this issue, I had a local request spec with which I was able to pretty consistently recreate the error. On local runs of the spec using my fork, I haven't gotten any reds. It is also green against the db-charmer test suite.

The commit for my fix can be found here: https://github.com/change/db-charmer/commit/8f6b7cd38a70c47882a6c7417b35b3ec5b3afbeb

Thanks for joining me on this quest, onto more intermittent failure squashing!

@MrJoy
Copy link

MrJoy commented Oct 19, 2012

Why not just move connection_level and connection_proxy into thread-local storage, possibly keyed to the object ID in case of multiple connections being used in one thread?

You avoid bottlenecking your code to only allowing one thread to talk to a database at a time that way, no?

@chasestubblefield
Copy link
Author

Your concern is valid, I don't expect anyone to use this aggressive locking in a threaded production environment. My fix was only used in our testing environment, and we use forking instead of threading in production (which is why we never saw the error there). Our testing environment was the only one that used threads because thats how Capybara runs alongside your server. We're looking into using forking or completely separate processes with Capybara to forget these thread issues entirely.

@kovyrin
Copy link

kovyrin commented Mar 3, 2013

Hey guys,

So, after looking through the code again and again I think I've found most of the critical stuff that should have been changed to make db-charmer thread-safe. If you could test this branch https://github.com/kovyrin/db-charmer/tree/multithreading on your test suites, etc and let me know if it solves the problems for you, then I could release a 1.9.0 pre-release for other people to try. Thanks for your patience and all your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment