One of the problems of big ActiveRecord models is their low cohesion. In Rails we group behaviour around the entities of the domain we're modelling. If we use only ActiveRecord models for that, we'll probably end up with classes full of actions, in most cases, completely unrelated to each other (except for the fact that they act on the same domain entity).
To illustrate the issue let's imagine a big Post
AR model of a blog application. In this app, users can add tags to their posts. Since this is the only class with this ability, I decide to put the related logic in the Post
model itself. For this, presumably at the beginning of the file, I write a couple of has_many
entries for tags
and taggings
. Then, fifty lines below, along with the rest of scopes, I add one more to find posts by tag name. A couple of hundred lines later, I put a virtual attribute tag_list
which will be used to update the associations from a string of tag names separated by commas. Finally, at the bottom of my gigantic model, a private method will contain the nuts and bolts of the transformation of the tag_list
.
I don't know about you, but I find this often in my code. All the logic related to the same concern is scattered across a 600-line file. Every time the tagging functionality changes, I have to jump back and forth until I find the code that I need to update. This is specially inconvenient having associations, validations and callbacks which purpose, out of context, isn't always easy to understand.
The fundamental rule of thumb is to put things together that change together.
This is how Martin Fowler encourage us to write highly cohesive code. That's one of the qualities I like my code to have in order to keep it enjoyable to maintain.
In our example, this is something we can achieve by extracting the tagging logic to a concern:
module Taggable
extend ActiveSupport::Concern
included do
has_many :taggings
has_many :tags, through: :taggings
scope :tagged_with, ->(tag_name) do
joins(:tags).where(tags: { name: tag_name })
end
end
def tag_list
tags.pluck(:name).join(', ')
end
def tag_list=(tag_list)
assign_tag_list tag_list
end
private
def assign_tag_list(tag_list)
tag_names = tag_list.split(',').map(&:strip).uniq
self.tags = tag_names.map do |tag_name|
Tag.where(name: tag_name).first_or_initialize
end
end
end
Note that I chose a concern here because it seems a solution particularly appropriate, but there are other solutions that we could have used. The point is that the logic that we've extracted is all that is likely to change together and nothing else.
It's important to be aware that I'm adding a layer of indirection that I didn't have before. It's the price I decide to pay for the benefits I get in terms of cohesion.
Nevertheless, big AR models aren't the only objects with problems of cohesion, as we're about to see.
Let's say that I've recently read the excellent “7 Patterns to Refactor Fat ActiveRecord Models” in the Code Climate blog and I decide to use a Policy object to encapsulate the rule that stipulates what's considered an active user in my application:
class ActiveUserPolicy
def initialize(user)
@user = user
end
def active?
@user.email_confirmed? &&
@user.last_login_at < 14.days.ago
end
end
(This snippet is extracted from the original post. Following snippets are mine)
With this object, everywhere I need to know whether a user is active or not, all I have to do is:
ActiveUserPolicy.new(user).active?
In my applications, I often find that wherever I have a policy like this one I also need to be able to fetch the list of objects that comply with it. For that, another of the patterns mentioned in the Code Climate article will come in handy, a Query object:
class ActiveUserQuery
def initialize(relation = User.scoped)
@relation = relation
end
def find_each(&block)
@relation.
where(email_confirmed: true).
where('last_login < ?', 14.days.ago).
find_each(&block)
end
end
The merit of using this solution is that I have two objects with just one responsibility. However, as you might have noticed, not everything is good. There's a problem of low cohesion here as well.
The main reason to change of these two objects is the same: the business rule that defines what is an active user. If tomorrow, for example, I decide that users can be disabled and disabled users aren't considered active, I'll have to change both the policy and the query object. That means that I have logic changing always at the same time that lives in two different places.
My personal preference to improve this situation would probably be, again, to use a concern:
module User::Active
extend ActiveSupport::Concern
included do
scope :active, -> do
where(email_confirmed: true).
where('last_login < ?', 14.days.ago)
end
end
def active?
user.email_confirmed? &&
user.last_login_at < 14.days.ago
end
end
This way there's only one place to go to change this single aspect of my domain. Alternatively, @brynary himself suggests doing this, which also improves cohesion very nicely and is consistent with the services approach it seems I was following in this example.
It's easy to identify big ActiveRecord models in our applications. What it might be more difficult is to split them and reorganise our domain logic to make our codebase more pleasurable to maintain. There are several patterns to choose from and different considerations to bear in mind. One of the things that usually helps is trying to keep your objects and modules highly cohesive by putting together whatever is likely to change together.
Great article!
I am really, really happy with the concerns approach. Everything is pretty encapsulated, you always know exactly which file your method is going to be, etc.
This is the beginning of one of our big classes :)
As we talked at the Alfredo's rb, what I have not been able to do (and I'm not sure if it's worth it) would be testing each behaviour as a separate entity. That would assure me that there are no hidden dependencies between behaviors.
Something like this:
Instead of testing the fat-tish model.
My feelings tell me that's the "purest" thing to do, but it takes too much effort, IMO...