Skip to content

Instantly share code, notes, and snippets.

@jcoglan
Last active August 29, 2015 14:01
Show Gist options
  • Star 7 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jcoglan/4ccf434cd2e1305d1c5c to your computer and use it in GitHub Desktop.
Save jcoglan/4ccf434cd2e1305d1c5c to your computer and use it in GitHub Desktop.

An ActiveRecord conundrum

I'm going to describe a weird problem I faced when using ActiveRecord today. To protect the innocent, I'm not going to talk about the app I'm actually working on but will instead discuss a hypothetical but isomorphic database design for a clone of the popular blogging platform Tumblr.

Tumblr lets you publish various different sorts of content. We might be tempted to shove all these types in a big STI table, but the types are all quite different from one another and so we give them their own tables.

class CreateContentTables < ActiveRecord::Migration
  def change
    create_table :texts do |t|
      t.string :title
      t.text :body
    end

    create_table :quotes do |t|
      t.text :body
      t.string :source
      t.string :url
    end

    create_table :links do |t|
      t.string :title
      t.string :url
    end
  end
end

These pieces of content each belong to a blog, which belongs to a user. However you'll notice the above tables have no foreign key for the blog. Instead, we'll use a join table called posts to link content items to blogs. There are various reasons for this in my actual app; in Tumblr it gives us cheap reblogging since we don't need to copy a content record, just create a new posts record. posts has a compound foreign key referring to content items.

class CreateBlogsAndUsers < ActiveRecord::Migration
  def change
    create_table :users do |t|
      t.string :username
    end

    create_table :blogs do |t|
      t.belongs_to :user
      t.string :title
    end

    create_table :posts do |t|
      t.timestamps
      t.belongs_to :blog
      t.belongs_to :content, polymorphic: true
    end
  end
end

Finally, each content item can have comments, which belong to a user and just like posts they have a compound foreign key for the content they relate to.

class CreateComments < ActiveRecord::Migration
  def change
    create_table :comments do |t|
      t.timestamps
      t.belongs_to :user
      t.belongs_to :content, polymorphic: true
      t.text :body
      t.boolean :accepted
    end
  end
end

We migrate the database, and Rails spits out the following SQL schema:

mysql> describe users;
+----------+--------------+------+-----+---------+----------------+
| Field    | Type         | Null | Key | Default | Extra          |
+----------+--------------+------+-----+---------+----------------+
| id       | int(11)      | NO   | PRI | NULL    | auto_increment |
| username | varchar(255) | YES  |     | NULL    |                |
+----------+--------------+------+-----+---------+----------------+

mysql> describe blogs;
+---------+--------------+------+-----+---------+----------------+
| Field   | Type         | Null | Key | Default | Extra          |
+---------+--------------+------+-----+---------+----------------+
| id      | int(11)      | NO   | PRI | NULL    | auto_increment |
| user_id | int(11)      | YES  |     | NULL    |                |
| title   | varchar(255) | YES  |     | NULL    |                |
+---------+--------------+------+-----+---------+----------------+

mysql> describe posts;
+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | int(11)      | NO   | PRI | NULL    | auto_increment |
| created_at   | datetime     | YES  |     | NULL    |                |
| updated_at   | datetime     | YES  |     | NULL    |                |
| blog_id      | int(11)      | YES  |     | NULL    |                |
| content_id   | int(11)      | YES  |     | NULL    |                |
| content_type | varchar(255) | YES  |     | NULL    |                |
+--------------+--------------+------+-----+---------+----------------+

mysql> describe texts;
+-------+--------------+------+-----+---------+----------------+
| Field | Type         | Null | Key | Default | Extra          |
+-------+--------------+------+-----+---------+----------------+
| id    | int(11)      | NO   | PRI | NULL    | auto_increment |
| title | varchar(255) | YES  |     | NULL    |                |
| body  | text         | YES  |     | NULL    |                |
+-------+--------------+------+-----+---------+----------------+

mysql> describe quotes;
+--------+--------------+------+-----+---------+----------------+
| Field  | Type         | Null | Key | Default | Extra          |
+--------+--------------+------+-----+---------+----------------+
| id     | int(11)      | NO   | PRI | NULL    | auto_increment |
| body   | text         | YES  |     | NULL    |                |
| source | varchar(255) | YES  |     | NULL    |                |
| url    | varchar(255) | YES  |     | NULL    |                |
+--------+--------------+------+-----+---------+----------------+

mysql> describe links;
+-------+--------------+------+-----+---------+----------------+
| Field | Type         | Null | Key | Default | Extra          |
+-------+--------------+------+-----+---------+----------------+
| id    | int(11)      | NO   | PRI | NULL    | auto_increment |
| title | varchar(255) | YES  |     | NULL    |                |
| url   | varchar(255) | YES  |     | NULL    |                |
+-------+--------------+------+-----+---------+----------------+

mysql> describe comments;
+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | int(11)      | NO   | PRI | NULL    | auto_increment |
| created_at   | datetime     | YES  |     | NULL    |                |
| updated_at   | datetime     | YES  |     | NULL    |                |
| user_id      | int(11)      | YES  |     | NULL    |                |
| content_id   | int(11)      | YES  |     | NULL    |                |
| content_type | varchar(255) | YES  |     | NULL    |                |
| body         | text         | YES  |     | NULL    |                |
| accepted     | tinyint(1)   | YES  |     | NULL    |                |
+--------------+--------------+------+-----+---------+----------------+

Like a good little Rails developer we go and create model classes for all these tables:

class User < ActiveRecord::Base
  has_many :blogs
  has_many :comments
end

class Blog < ActiveRecord::Base
  belongs_to :user
  has_many :posts
end

class Post < ActiveRecord::Base
  belongs_to :blog
  belongs_to :content, polymorphic: true
end

class Text < ActiveRecord::Base
  has_many :posts, as: :content
  has_many :comments, as: :content
end

class Quote < ActiveRecord::Base
  has_many :posts, as: :content
  has_many :comments, as: :content
end

class Link < ActiveRecord::Base
  has_many :posts, as: :content
  has_many :comments, as: :content
end

class Comment < ActiveRecord::Base
  belongs_to :user
  belongs_to :content, polymorphic: true
end

Now, let's create a couple of users:

alice = User.create(username: 'alice')
bob   = User.create(username: 'bob')

Alice has a blog where she writes about MySQL optimisation. Here's a few posts she's added recently:

alice = User.where(username: 'alice').first
blog  = alice.blogs.create(title: 'Not even once')

text_entry = Text.create(
  title: 'How to bulk update a MySQL table',
  body:  'First, delete all your indexes.'
)

quote_entry = Quote.create(
  body:   'There are two hard things in computer science: MySQL consistency.',
  source: 'CSS Perverts',
  url:    'https://medium.com/could-you-not'
)

link_entry = Link.create(
  title: 'Web development that does not always hurt',
  url:   'http://rubyonrails.org/'
)

[text_entry, quote_entry, link_entry].each do |entry|
  Post.create(blog: blog, content: entry)
end

One day, along comes Bob and leaves some helpful comments on Alice's blog:

bob = User.where(username: 'bob').first

text_entry  = Text.first
quote_entry = Quote.first

Comment.create(
  user:     bob,
  content:  text_entry,
  body:     'This sounds like a lot of work.'
)

Comment.create(
  user:     bob,
  content:  text_entry,
  body:     '4 hours later and I am still at my computer. Help!'
)

Comment.create(
  user:     bob,
  content:  quote_entry,
  body:     'Never were truer words spoken. Such thought leadership.'
)

So, here's the problem. When Alice logs into Tumblr, she wants to see how many new comments there are, because she needs to manually moderate and accept each one. So, we want the count of all comments that relate to content that's posted on one of Alice's blogs. We begin by adding a scope to Comment so we can count unmoderated comments:

class Comment < ActiveRecord::Base
  belongs_to :user
  belongs_to :content, polymorphic: true

  scope :for_moderation, -> { where(accepted: nil) }
end

A naive way to count unmoderated comments would be to iterate over Alice's blogs, and for each post in each blog, count the number of unmoderated comments each post's content has. This loads a lot of unnecessary objects and runs a lot of queries.

>> alice = User.where(username: 'alice').first
=> #<User id: 1, username: "alice">

>> alice.blogs.inject(0) { |s,b| s + b.posts.inject(0) { |s,p| s + p.content.comments.for_moderation.count } }
  Blog Load (0.4ms)  SELECT `blogs`.* FROM `blogs` WHERE `blogs`.`user_id` = 1
  Post Load (0.3ms)  SELECT `posts`.* FROM `posts` WHERE `posts`.`blog_id` = 1
  Text Load (0.4ms)  SELECT `texts`.* FROM `texts` WHERE `texts`.`id` = 1 ORDER BY `texts`.`id` ASC LIMIT 1
   (0.3ms)  SELECT COUNT(*) FROM `comments` WHERE `comments`.`content_id` = 1 AND `comments`.`content_type` = 'Text' AND `comments`.`accepted` IS NULL
  Quote Load (0.1ms)  SELECT `quotes`.* FROM `quotes` WHERE `quotes`.`id` = 1 ORDER BY `quotes`.`id` ASC LIMIT 1
   (0.6ms)  SELECT COUNT(*) FROM `comments` WHERE `comments`.`content_id` = 1 AND `comments`.`content_type` = 'Quote' AND `comments`.`accepted` IS NULL
  Link Load (0.2ms)  SELECT `links`.* FROM `links` WHERE `links`.`id` = 1 ORDER BY `links`.`id` ASC LIMIT 1
   (0.6ms)  SELECT COUNT(*) FROM `comments` WHERE `comments`.`content_id` = 1 AND `comments`.`content_type` = 'Link' AND `comments`.`accepted` IS NULL
=> 3

We'd like to do this in one query. Initially this seems hard because we can't join across the polymorphic set of content tables to walk from posts to comments. But, we don't actually need to do that. We can find all the comments for a Post by finding the comments with the same content_{id,type} as the Post. In SQL, we can write:

mysql> SELECT COUNT(*)
    -> FROM comments AS c
    -> INNER JOIN posts AS p ON p.content_id = c.content_id AND p.content_type = c.content_type
    -> INNER JOIN blogs AS b on b.id = p.blog_id
    -> WHERE b.user_id = 1 AND
    ->       c.accepted IS NULL;

+----------+
| COUNT(*) |
+----------+
|        3 |
+----------+

Same answer, one query, and we didn't touch the content tables at all. So, if we can tell Rails how to relate comments directly to posts, then we can use relations and scopes to efficiently compose the above query. (We will actually end up with a slightly different, but equivalent query.)

We ultimately want to be able to chain some scopes on Comment and count the result:

Comment.for_blog_author(alice).for_moderation.count

Let's start by creating the for_blog_author scope. We ultimately want to join to the blogs table and filter by blogs.author_id. In Rails we can express that like this:

class Comment < ActiveRecord::Base
  belongs_to :user
  belongs_to :content, polymorphic: true

  scope :for_moderation, -> { where(accepted: nil) }

  scope :for_blog_author, -> (author) {
    joins(:blog).where(blogs: {user_id: author.id})
  }
end

This new scope demands that comments have an associated blog. We know logically that a Comment belongs to a Post, which in turn belongs to a Blog. We can use has_one :blog, through: :post for the latter, but the former is more complicated. Recall the structure of the comments and posts tables:

mysql> describe posts;
+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | int(11)      | NO   | PRI | NULL    | auto_increment |
| created_at   | datetime     | YES  |     | NULL    |                |
| updated_at   | datetime     | YES  |     | NULL    |                |
| blog_id      | int(11)      | YES  |     | NULL    |                |
| content_id   | int(11)      | YES  |     | NULL    |                |
| content_type | varchar(255) | YES  |     | NULL    |                |
+--------------+--------------+------+-----+---------+----------------+

mysql> describe comments;
+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | int(11)      | NO   | PRI | NULL    | auto_increment |
| created_at   | datetime     | YES  |     | NULL    |                |
| updated_at   | datetime     | YES  |     | NULL    |                |
| user_id      | int(11)      | YES  |     | NULL    |                |
| content_id   | int(11)      | YES  |     | NULL    |                |
| content_type | varchar(255) | YES  |     | NULL    |                |
| body         | text         | YES  |     | NULL    |                |
| accepted     | tinyint(1)   | YES  |     | NULL    |                |
+--------------+--------------+------+-----+---------+----------------+

So rather than a condition like posts.id = comments.post_id to link posts and comments, we have posts.content_id = comments.content_id AND posts.content_type = comments.content_type. My first (wrong) guess at how to express that relation looked like this:

class Comment < ActiveRecord::Base
  belongs_to :user
  belongs_to :content, polymorphic: true

  belongs_to :post, -> (comment) {
    where(content_id: comment.content_id, content_type: comment.content_type)
  }
  has_one :blog, through: :post

  scope :for_moderation, -> { where(accepted: nil) }

  scope :for_blog_author, -> (author) {
    joins(:blog).where(blogs: {user_id: author.id})
  }
end

This implementation leaves Rails still looking at posts.id, since we've not told it what the 'primary key' for this relation is.

>> comment = Comment.first
  Comment Load (0.4ms)  SELECT `comments`.* FROM `comments` ORDER BY `comments`.`id` ASC LIMIT 1
=> #<Comment id: 1, created_at: "2014-05-15 21:43:08", updated_at: "2014-05-15 21:43:08", user_id: 2, content_id: 1, content_type: "Text", body: "This sounds like a lot of work.", accepted: nil>

>> comment.blog
  Blog Load (0.4ms)  SELECT `blogs`.* FROM `blogs`
                     INNER JOIN `posts` ON `blogs`.`id` = `posts`.`blog_id`
                     WHERE `posts`.`id` = NULL AND
                           `posts`.`content_id` = 1 AND
                           `posts`.`content_type` = 'Text'
                     ORDER BY `blogs`.`id` ASC
                     LIMIT 1
=> nil

We need to make Rails think the 'primary key' to use on the posts table is content_id and content_type. Note: it's not the primary key of the posts table in general, it's only the 'primary key' in the terminology Rails has to express this particular join operation. This implementation gets us the query we want:

class Comment < ActiveRecord::Base
  belongs_to :user
  belongs_to :content, polymorphic: true

  belongs_to :post, -> (comment) { where(content_type: comment.content_type) },
              foreign_key: :content_id,
              primary_key: :content_id

  has_one :blog, through: :post

  scope :for_moderation, -> { where(accepted: nil) }

  scope :for_blog_author, -> (author) {
    joins(:blog).where(blogs: {user_id: author.id})
  }
end

Here it is in action:

>> comment = Comment.first
  Comment Load (0.4ms)  SELECT `comments`.* FROM `comments` ORDER BY `comments`.`id` ASC LIMIT 1
=> #<Comment id: 1, created_at: "2014-05-15 21:43:08", updated_at: "2014-05-15 21:43:08", user_id: 2, content_id: 1, content_type: "Text", body: "This sounds like a lot of work.", accepted: nil>

>> comment.blog
  Blog Load (0.9ms)  SELECT `blogs`.* FROM `blogs`
                     INNER JOIN `posts` ON `blogs`.`id` = `posts`.`blog_id`
                     WHERE `posts`.`content_id` = 1 AND
                           `posts`.`content_type` = 'Text'
                     ORDER BY `blogs`.`id` ASC
                     LIMIT 1
=> #<Blog id: 1, user_id: 1, title: "Not even once">

However, this snippet still doesn't work:

>> Comment.for_blog_author(alice).for_moderation.count
app/models/comment.rb:5:in `block in <class:Comment>': undefined method `content_type' for #<ActiveRecord::Associations::JoinDependency::JoinAssociation:0x007f0ab5085f40> (NoMethodError)

This is because, when you access the post association within scope/join logic, the argument to the belongs_to :post lambda is no longer a Comment. To make the lambda work in both cases, I did this:

class Comment < ActiveRecord::Base
  belongs_to :user
  belongs_to :content, polymorphic: true

  belongs_to :post, -> (object) {
    if object.is_a? Comment
      where(content_type: object.content_type)
    else
      where('posts.content_type = comments.content_type')
    end
  }, foreign_key: :content_id, primary_key: :content_id

  has_one :blog, through: :post

  scope :for_moderation, -> { where(accepted: nil) }

  scope :for_blog_author, -> (author) {
    joins(:blog).where(blogs: {user_id: author.id})
  }
end

This ends up running the query I want to count the unmoderated comments on Alice's blogs:

>> Comment.for_blog_author(alice).for_moderation.count

   (0.2ms)  SELECT COUNT(*)
            FROM `comments`
            INNER JOIN `posts` ON `posts`.`content_id` = `comments`.`content_id` AND
                                  (posts.content_type = comments.content_type)
            INNER JOIN `blogs` ON `blogs`.`id` = `posts`.`blog_id`
            WHERE `blogs`.`user_id` = 1 AND
                  `comments`.`accepted` IS NULL
=> 3

Now, although it gets the right answer, and does so efficiently, that belongs_to code is incredibly ugly. Surely there is a better way to express this?

@urbanautomaton
Copy link

I think there's a bug here, too (I might be over-interpreting the toy example, though; ignore me if so). Edit: Murray pipped me to it. :-)

Because Content has_many :posts, but on Comment you're saying it belongs_to :post, the #post and #blog associations are likely to return weird (and inconsistent) results as soon as reblogging is allowed. The #post association will most likely return the first Post record relating to the content, i.e. the original author's post. Whereas #blog looks like this:

>> comment.blog
  Blog Load (0.9ms)  SELECT `blogs`.* FROM `blogs`
                     INNER JOIN `posts` ON `blogs`.`id` = `posts`.`blog_id`
                     WHERE `posts`.`content_id` = 1 AND
                           `posts`.`content_type` = 'Text'
                     ORDER BY `blogs`.`id` ASC
                     LIMIT 1

Instead of the blog of the original poster, this would return the blog with the lowest id of anyone who has blogged or reblogged the content. And then in turn this:

>> Comment.for_blog_author(alice).for_moderation.count
   (0.2ms)  SELECT COUNT(*)
            FROM `comments`
            INNER JOIN `posts` ON `posts`.`content_id` = `comments`.`content_id` AND
                                  (posts.content_type = comments.content_type)
            INNER JOIN `blogs` ON `blogs`.`id` = `posts`.`blog_id`
            WHERE `blogs`.`user_id` = 1 AND
                  `comments`.`accepted` IS NULL

Will mean that comments on a given piece of content will turn up in the moderation queue for everyone who has blogged or reblogged that piece of content.

@jcoglan
Copy link
Author

jcoglan commented May 16, 2014

Eh, I lied a little bit. In the actual system I'm working on, there's a one-to-one relationship between posts and content items, so e.g. Text.has_one(:post). But if we decide to break that constraint there are other ways of checking which post a comment relates to. I can't think of an analogy in Tumblr but in my app the models analogous to Post and Comment both have relations back to a shared parent that you can use to constrain the query.

@pixeltrix
Copy link

I'm guessing this probably isn't going to be the case for your real world needs but in this example I feel that the Comment model should belong to the Post rather than the content models otherwise all of the authors that reblog the content can approve any content. This may be the comment model you want so feel free to ignore the example I've written up here: https://gist.github.com/pixeltrix/e485b81fb8a7398ac3f4

I can see were the problem is in Active Record and the problem is that it sees the polymorphic has many and raises because obviously that's not possible but as soon as you make it a though association then it becomes possible again but it would need to be different to existing through associations since we don't remove the intermediate table.

I feel as though we may be missing a modelling option but it's hard to tell without a real-world example and there's always the possibility that the use case is pretty narrow and it would add complexity to the Active Record code (like it isn't complicated enough!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment