We have Users who have many Posts. Posts have many Tags through an associated table we've called Taggings.
We want to get a unique list of all the tags that a given user's Posts have been associated with. So we have to find a user, get all of their post IDs, use those IDs to select the relevant taggings, and finally return a distinct list of related tags.
SELECT DISTINCT "tags".name
FROM taggings
JOIN tags ON tags.id = taggings.tag_id
WHERE taggings.post_id IN (
SELECT "posts".id FROM posts WHERE user_id = 6
);
I selected from taggings, but either way works.
Building up the Active Record query, from the outside in:
- We want to ultimately select Tags, so we'll start with that.
- We know we want to join tags by relevant taggings, so we'll add a nested
where
clause that filters the taggings by a given user's post ids. - We add
distinct
at the end to filter the final result.
Tag.joins(:taggings).where(taggings: { post_id: user.post_ids }).distinct
# ... generates the following query:
# SELECT DISTINCT "tags".*
# FROM "tags"
# INNER JOIN "taggings" ON "taggings"."tag_id" = "tags"."id"
# WHERE "taggings"."post_id" IN (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32)
This is OK, but there are two problems. First, distinct
isn't working properly; it's returning duplicate records. We'll get to that in a minute.
Second, the array given to the WHERE
clause ends up being generated by a separate SQL query that finds the post IDs based on the user. That won't do. We can modify the where
clause to use a nested query:
Tag.joins(:taggings).where(taggings: { post: Post.where(user: user) }).distinct
# ... generates the following query:
# SELECT DISTINCT "tags".*
# FROM "tags"
# INNER JOIN "taggings" ON "taggings"."tag_id" = "tags"."id"
# WHERE "taggings"."post_id" IN (
# SELECT "posts"."id" FROM "posts" WHERE "posts"."user_id" = 6
# )
Now we're generating a new query, Post.where(user: user)
, within the where
clause. This causes a subquery to be generated. Perfect!
Except our distinct still isn't working. It's because we're selecting all fields from the tags, but we only want the names, and that's also our criteria for duplicates.
We can solve this by adding an explicit select
to our query:
Tag.select(:name).joins(:taggings).where(taggings: { post: Post.where(user: user) }).distinct
# ... generates the following query:
# SELECT DISTINCT "tags"."name"
# FROM "tags"
# INNER JOIN "taggings" ON "taggings"."tag_id" = "tags"."id"
# WHERE "taggings"."post_id" IN (
# SELECT "posts"."id" FROM "posts" WHERE "posts"."user_id" = 6
# )