Skip to content

Instantly share code, notes, and snippets.

@mattdenner
Created March 22, 2011 13:49
Show Gist options
  • Save mattdenner/881227 to your computer and use it in GitHub Desktop.
Save mattdenner/881227 to your computer and use it in GitHub Desktop.
Be careful when putting :group on an association used as in a has_many :through, but be aware that it can sometimes give a considerable performance improvement!

I've been trying to track down a performance problem (one of many) within our application that sucks 200 seconds (or more) when doing something really simple. The design of the model is this:

class Study
  has_many :requests
  has_many :projects, :through => :requests, :uniq => true
end

Essentially we have a study that can request some work to be done in our laboratories and that work is done as part of a project, which is responsible for paying for it later. So a study can be related to many projects through the work it has requested.

This all works fine until you want to do eager loading of the projects association, specifically within a query for multiple studies. For example, the killer query comes from:

Study.paginate(:page => 5, :per_page => 100, :include => :projects)

It happens that on "page 5" of our studies list there are several studies with 10-of-thousand of request work, even though those requests are only done under 1-5 studies.

Using rbtrace I've tracked it down to somewhere within ActiveRecord::AssociationPreload code. I'll come back to that in a minute.

Because of the way that the relationships work I decided that it would be a good idea to reduce the set of requests first, then find the projects. In other words, I added a specific relationship through which I could find the projects:

class Study
  has_many :requests   # Because we use this relationship elsewhere
  has_many :project_requests, :class_name => 'Request', :group => 'project_id', :conditions => 'project_id IS NOT NULL'
  has_many :projects, :through => :project_requests, :uniq => true
end

Doing this dropped the association loading time down to 0.08s! The only problem: the results aren't the same. There are projects that appear in the original relationship that suddenly disappear in the new one and, worse than that, it only happens when you're loading many studies:

  • load an individual study on it's own and everything is correct
  • load many studies without eager loading and everything is still OK
  • load many studies with eager loading and everything turns to crap

It took me a while of messing around, shouting at code (in my head, work in an open plan office!), starring blindly at the code in ActiveRecord, and looking through the logs to realise my mistake. It's subtle, but it's definitely there!

The issue is with the grouping and here's why:

When doing the eager loading for one study the logic makes sense: we group the requests by their project_id and that gives us only the unique projects.

But the eager loading code has no real way to determine that this isn't what should be happening during its work. Instead, it blindly passes the grouping through so you get all requests, across all studies you are loading, grouped by project_id. In other words, you get all of the unique projects across all of the studies you are loading!

The fix looks odd, feels well weird, but makes perfect sense in both cases:

class Study
  has_many :requests   # Because we use this relationship elsewhere
  has_many :project_requests, :class_name => 'Request', :group => 'study_id, project_id', :conditions => 'project_id IS NOT NULL'
  has_many :projects, :through => :project_requests, :uniq => true
end

Here we now group by both study_id and project_id, giving us the correct level of uniqueness across all studies. It's loading in 0.10s but that's a considerable drop from the original 200s!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment