Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
GROUP BY and Select MAX from each group in Django, 2 queries
'''
given a Model with:
category = models.CharField(max_length=32, choices=CATEGORY_CHOICES)
pubdate = models.DateTimeField(default=datetime.now)
<other fields>
Fetch the item from each category with the latest pubdate.
'''
model_max_set = Model.objects.values('category').annotate(max_pubdate=Max('pubdate')).order_by()
q_statement = Q()
for pair in model_max_set:
q_statement |= (Q(category__exact=pair['category']) & Q(pubdate=pair['max_pubdate']))
model_set = Model.objects.filter(q_statement)
@formido

This comment has been minimized.

Copy link

formido commented Sep 6, 2013

I used this idea, thanks. :)

@Phlip

This comment has been minimized.

Copy link

Phlip commented Mar 21, 2015

the number of pub_dates is going to go up forever, so this query will get slow, and its generated SQL statement longer, each time a new pub_date appears

@Phlip

This comment has been minimized.

Copy link

Phlip commented Mar 21, 2015

I went with:

qs = MyModel.objects.all()
latest_dates = qs.values('frib', 'frob').annotate(latest_created_at=Max('created_at'))
qs = qs.filter(created_at__in=latest_dates.values('latest_created_at')).order_by('-created_at')

the last order_by() is optional, but it stands to reason it's needed.

@shenqihui

This comment has been minimized.

Copy link

shenqihui commented Apr 8, 2015

@Phlip 👍

@keithcallenberg

This comment has been minimized.

Copy link

keithcallenberg commented Dec 17, 2015

@Phlip In your filter operation your only check is that the model instance has the same created_at date as one of the values in the list. But wouldn't this break down if any instance from one category had the same date/datetime as the max for a different category? It would be included even if it wasn't the max for its category.

E.g.

id category pubdate
1 1 20150101
2 1 20150525
3 2 20140901
4 2 20150101

So even though the pubdate for id=1 is less than id=2, it will be included because it has the same pubdate as id=4. IDs 1, 2 and 4 will all be included.

I guess if you have a datetime (as opposed to date, or some other less granular value) you may feel more confident that they wouldn't share the exact same datetime, but I think it's worth pointing out this assumption/limitation.

@ns5d

This comment has been minimized.

Copy link

ns5d commented Jun 25, 2017

MyModel.objects.all().order_by('id', '-date').distinct('id')

@seanucd2000

This comment has been minimized.

Copy link

seanucd2000 commented Jul 15, 2017

Thanks @ns5d for posting this line here. Learning django and this made my CBV work perfectly in my use. Was fiddling with a less than ideal solution before, but this was worked beautifully. (did need to finally move my dev db from sqlite to postgres for distinct functionality)

@lucascnunes

This comment has been minimized.

Copy link

lucascnunes commented Sep 24, 2017

My savior. Thanks, I've spent 3 days looking for something like it

@TomStobbe

This comment has been minimized.

Copy link

TomStobbe commented Mar 23, 2018

Thanks @ns5d, exactly what I needed. Worked perfectly

@devashish-patel

This comment has been minimized.

Copy link

devashish-patel commented Aug 20, 2018

God bless you..:)

@michel4j

This comment has been minimized.

Copy link

michel4j commented Sep 16, 2018

You can use sub-queries to achieve the same thing in a single query as follows:

    sq = Item.objects.filter(category=OuterRef('category')).order_by('-pubdate')  # deferred execution
    Item.objects.filter(pk=Subquery(sq.values('pk')[:1]))

Alternatively, If you make Category a model like:

   category    = models.ForeignKey(Category, on_delete=models.SET_NULL, related_name='items')
   pubdate     = models.DateTimeField(default=datetime.now)

You can then do it in one query irrespective of database backend as follows:

    Item.objects.annotate(_sel=Max('category__items__pubdate')).filter(pubdate=F('_sel'))

I haven't tested which of them is faster, but both execute a single query on the database backend and work with every backend supported by Django, including SQLite.

@JakLiao

This comment has been minimized.

Copy link

JakLiao commented Sep 19, 2018

Thanks, that's awesome.

@Zakui

This comment has been minimized.

Copy link

Zakui commented Nov 7, 2018

Thanks, exactly what I needed :)

@MewX

This comment has been minimized.

Copy link

MewX commented Dec 17, 2018

Awesome! Thanks a lot!

@evdoks

This comment has been minimized.

Copy link

evdoks commented Mar 27, 2020

@michel4j Took me some time to adapt to my needs, but in the end it worked! Amazing solution!

@rluts

This comment has been minimized.

Copy link

rluts commented Apr 9, 2020

It is very slow for large tables. Use the version provided by @ns5d

@pythonist77

This comment has been minimized.

Copy link

pythonist77 commented Apr 18, 2020

Thanks @ryanpitts for this trick.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.