Skip to content

Instantly share code, notes, and snippets.

@emilbjorklund
Last active August 26, 2015 12:11
Show Gist options
  • Save emilbjorklund/9c31ec0400bcba615062 to your computer and use it in GitHub Desktop.
Save emilbjorklund/9c31ec0400bcba615062 to your computer and use it in GitHub Desktop.
popular_authors = list(Person.objects.published().annotate(
num_items=Count('newsitem__author')).order_by('-num_items')[:10])
shuffle(popular_authors)
popular_authors = popular_authors[:2]
# The order_by clause in conjuncion with annotate seemed to
# get messed up here. So I switched to shuffle()-ing and slicing the list
# after evaluation instead.
other_authors = list(Person.objects.published().annotate(
num_items=Count('newsitem__author')).filter(
num_items__gte=1
).exclude(pk__in=[author.pk for author in popular_authors]))
shuffle(other_authors)
authors = popular_authors + other_authors[:2]
shuffle(authors)
"""
Here's the problem
- I'm trying to create a list of article authors that is "diverse" in the
sense that it showcases both "popular" authors (technically, people that write a lot)
and other people who write less frequently.
- The list should not pick the same people each time, so it is shuffled.
- The slice of authors that are in the "popular" segment should obviously
never appear in the "other authors" category.
The issue: I still get duplicates in the list, possibly after a 1.6 to 1.8 upgrade.
"""
# First, get a list of 10 people who write a lot:
popular_authors = list(Person.objects.published().annotate(
num_items=Count('newsitem__author')).order_by('-num_items')[:10])
# shuffle that list and cut the two first:
shuffle(popular_authors)
popular_authors = popular_authors[:2]
# Get a list of all the authors who have written anything at all
# but exclude the <pk> of people already in the "popular" list.
other_authors = list(Person.objects.published().annotate(
num_items=Count('newsitem__author')
).filter(
num_items__gte=1
).order_by('?').exclude(
pk__in=[author.pk for author in popular_authors]))[:2]
# Merge the two lists and shuffle them
authors = popular_authors + other_authors
shuffle(authors)
@emilbjorklund
Copy link
Author

I tried lots of variations of distinct() etc, but nothing helped. I think in the end it was the order_by that somehow got confused in conjunction with the annotation in other_authors - generated a massive SQL that somehow got grouped incorrectly.

Bloood sugar too low to understand exactly what went wrong, but by extracting the random ordering and limit/slice to outside the query, it was cut down significantly in complexity + takes a third of the time to evalutate (not to mention it works correctly) so I'm happy. :-)

@barnabywalters
Copy link

Ha ha, that sounds like ORMs to me — nice for very simple things, but as soon as you want to do anything even vaguely interesting (like this) they’re more hassle than using raw SQL :) Happy that you figured out how to fix it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment