Skip to content

Instantly share code, notes, and snippets.

@akaariai
Last active January 22, 2016 07:02
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save akaariai/716dcdd895ba159123c4 to your computer and use it in GitHub Desktop.
Save akaariai/716dcdd895ba159123c4 to your computer and use it in GitHub Desktop.
This is a random collection of problems with the ORM API regarding multivalued relations.
A multivalued relation is any relation for reverse foreign key queries or relations generated by ManyToManyField.
For multivalued relations an essential concept is that we can query the same relation multiple times inside a single query. For example, one can ask a question "which bands have members who are both older than 50 and taller than 180 cm" (query A), or a question "which bands have a member who is older than 50 and a member who is taller than 180" (query B).
It is notable that negated queries against multivalued relations use subqueries. The reason is that for example members__age__gte=50 matches any band that has at least single member older than 50 while ~Q(members__age__gte=50) removes all bands having at least one member aged older than 50. If we used a join for the exclude query and a band had two members, one aged 30 and one 60, then that band would be returned by the exclude query as the 30 years old member would match the negated condition.
The underlying reason why we want to make exclude() queries work the way they do is that .exclude() should return the complement of .filter() always.
1. Confusing API: Conditions inside one .filter()/.exclude() call target the same set of related object, conditions in different .filter()/.exclude() calls target different set of related objects
So, Band.objects.filter(members__age__gte=50, members__height__gte=180) is query A above, Band.objects.filter(members__age__gte=50).filter(members__height__gte=180) is query B.
The main problem here is that a non-experienced developer has zero chance of guessing that the two queries above are different. And, according to my experience, knowledge of this API isn't common.
2. Impossible queries: Targeting two different sets of related objects combined with OR or NOT is impossible. The only way to get OR or NOT queries is to use Q() objects, but you can't spread a single Q() object to two different filter() or exclude() calls.
Fortunately such queries are very rare, and I actually can't come with a good verbal explanation of such queries.
3. Implementation problems: Multiple negated filters do not target the same set of joins.
Band.objects.exclude(members__age__gte=50, members__height__gte=180) targets currently two sets of related objects.
4. Duplicate results
If a band has two members older than 50, then .filter(members__age__gte=50) gives that band two times in the results. We should be using subqueries for such filters, but we can't do that because annotations target the results of filter() calls.
5. Problems when combining with annotations
The basic API is that you use .filter() to restrict the related object set to wanted objects, then annotate over those. For example .filter(members__height__gte=180).annotate(avg_height=Avg('members__height')) gives the average height of those members who are taller than 180. This API has multiple problems:
- .exclude() queries use subqueries, so targeting multivalued relation excluded earlier doesn't work at all currently. Even if they did, Band.objects.exclude(members__height__gte=180).annotate(avg_height=Avg('members__height')) doesn't give the average of height of members less that 180 cm. The members__height__get exclude() removes any band that has a single member taller than 180, so for such bands you don't get any result.
- .filter(members__age__gte=50).filter(members__height__gte=180).annotate(avg_height=Avg('members__height')) gives the height of members aged over 50. Band.objects.exclude(members__height__gte=180, members__age__gte=50).annotate(avg_height=Avg('members__height')) gives the average height of members taller than 180. This is an confusing API.
- The above query doesn't actually give correct results at all currently. If you have multiple multivalued relations in a single query, then annotations produce silently wrong results.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment