Last active
August 26, 2015 12:11
-
-
Save emilbjorklund/9c31ec0400bcba615062 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
popular_authors = list(Person.objects.published().annotate( | |
num_items=Count('newsitem__author')).order_by('-num_items')[:10]) | |
shuffle(popular_authors) | |
popular_authors = popular_authors[:2] | |
# The order_by clause in conjuncion with annotate seemed to | |
# get messed up here. So I switched to shuffle()-ing and slicing the list | |
# after evaluation instead. | |
other_authors = list(Person.objects.published().annotate( | |
num_items=Count('newsitem__author')).filter( | |
num_items__gte=1 | |
).exclude(pk__in=[author.pk for author in popular_authors])) | |
shuffle(other_authors) | |
authors = popular_authors + other_authors[:2] | |
shuffle(authors) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
Here's the problem | |
- I'm trying to create a list of article authors that is "diverse" in the | |
sense that it showcases both "popular" authors (technically, people that write a lot) | |
and other people who write less frequently. | |
- The list should not pick the same people each time, so it is shuffled. | |
- The slice of authors that are in the "popular" segment should obviously | |
never appear in the "other authors" category. | |
The issue: I still get duplicates in the list, possibly after a 1.6 to 1.8 upgrade. | |
""" | |
# First, get a list of 10 people who write a lot: | |
popular_authors = list(Person.objects.published().annotate( | |
num_items=Count('newsitem__author')).order_by('-num_items')[:10]) | |
# shuffle that list and cut the two first: | |
shuffle(popular_authors) | |
popular_authors = popular_authors[:2] | |
# Get a list of all the authors who have written anything at all | |
# but exclude the <pk> of people already in the "popular" list. | |
other_authors = list(Person.objects.published().annotate( | |
num_items=Count('newsitem__author') | |
).filter( | |
num_items__gte=1 | |
).order_by('?').exclude( | |
pk__in=[author.pk for author in popular_authors]))[:2] | |
# Merge the two lists and shuffle them | |
authors = popular_authors + other_authors | |
shuffle(authors) |
Ha ha, that sounds like ORMs to me — nice for very simple things, but as soon as you want to do anything even vaguely interesting (like this) they’re more hassle than using raw SQL :) Happy that you figured out how to fix it!
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I tried lots of variations of
distinct()
etc, but nothing helped. I think in the end it was theorder_by
that somehow got confused in conjunction with the annotation inother_authors
- generated a massive SQL that somehow got grouped incorrectly.Bloood sugar too low to understand exactly what went wrong, but by extracting the random ordering and limit/slice to outside the query, it was cut down significantly in complexity + takes a third of the time to evalutate (not to mention it works correctly) so I'm happy. :-)