Skip to content

Instantly share code, notes, and snippets.

@mctaylorpants
Created January 19, 2016 19:47
Show Gist options
  • Save mctaylorpants/b5b839debbd2df9a6839 to your computer and use it in GitHub Desktop.
Save mctaylorpants/b5b839debbd2df9a6839 to your computer and use it in GitHub Desktop.
Advanced Active Record querying

We have Users who have many Posts. Posts have many Tags through an associated table we've called Taggings.

We want to get a unique list of all the tags that a given user's Posts have been associated with. So we have to find a user, get all of their post IDs, use those IDs to select the relevant taggings, and finally return a distinct list of related tags.

Handwritten SQL

SELECT DISTINCT "tags".name
FROM  taggings
JOIN tags ON tags.id = taggings.tag_id
WHERE taggings.post_id IN (
SELECT "posts".id FROM posts WHERE user_id = 6
);

I selected from taggings, but either way works.

Building up the Active Record query, from the outside in:

  • We want to ultimately select Tags, so we'll start with that.
  • We know we want to join tags by relevant taggings, so we'll add a nested where clause that filters the taggings by a given user's post ids.
  • We add distinct at the end to filter the final result.
Tag.joins(:taggings).where(taggings: { post_id: user.post_ids }).distinct

# ... generates the following query:
# SELECT DISTINCT "tags".*
# FROM "tags"
# INNER JOIN "taggings" ON "taggings"."tag_id" = "tags"."id"
# WHERE "taggings"."post_id" IN (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32)

This is OK, but there are two problems. First, distinct isn't working properly; it's returning duplicate records. We'll get to that in a minute.

Second, the array given to the WHERE clause ends up being generated by a separate SQL query that finds the post IDs based on the user. That won't do. We can modify the where clause to use a nested query:

Tag.joins(:taggings).where(taggings: { post: Post.where(user: user) }).distinct

# ... generates the following query:
# SELECT DISTINCT "tags".*
# FROM "tags"
# INNER JOIN "taggings" ON "taggings"."tag_id" = "tags"."id"
# WHERE "taggings"."post_id" IN (
#   SELECT "posts"."id" FROM "posts" WHERE "posts"."user_id" = 6
# )

Now we're generating a new query, Post.where(user: user), within the where clause. This causes a subquery to be generated. Perfect!

Except our distinct still isn't working. It's because we're selecting all fields from the tags, but we only want the names, and that's also our criteria for duplicates.

We can solve this by adding an explicit select to our query:

Tag.select(:name).joins(:taggings).where(taggings: { post: Post.where(user: user) }).distinct

# ... generates the following query:
# SELECT DISTINCT "tags"."name"
# FROM "tags"
# INNER JOIN "taggings" ON "taggings"."tag_id" = "tags"."id"
# WHERE "taggings"."post_id" IN (
#   SELECT "posts"."id" FROM "posts" WHERE "posts"."user_id" = 6
# )
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment