Skip to content

Instantly share code, notes, and snippets.

@emilbjorklund
Last active August 26, 2015 12:11
Show Gist options
  • Save emilbjorklund/9c31ec0400bcba615062 to your computer and use it in GitHub Desktop.
Save emilbjorklund/9c31ec0400bcba615062 to your computer and use it in GitHub Desktop.
popular_authors = list(Person.objects.published().annotate(
num_items=Count('newsitem__author')).order_by('-num_items')[:10])
shuffle(popular_authors)
popular_authors = popular_authors[:2]
# The order_by clause in conjuncion with annotate seemed to
# get messed up here. So I switched to shuffle()-ing and slicing the list
# after evaluation instead.
other_authors = list(Person.objects.published().annotate(
num_items=Count('newsitem__author')).filter(
num_items__gte=1
).exclude(pk__in=[author.pk for author in popular_authors]))
shuffle(other_authors)
authors = popular_authors + other_authors[:2]
shuffle(authors)
"""
Here's the problem
- I'm trying to create a list of article authors that is "diverse" in the
sense that it showcases both "popular" authors (technically, people that write a lot)
and other people who write less frequently.
- The list should not pick the same people each time, so it is shuffled.
- The slice of authors that are in the "popular" segment should obviously
never appear in the "other authors" category.
The issue: I still get duplicates in the list, possibly after a 1.6 to 1.8 upgrade.
"""
# First, get a list of 10 people who write a lot:
popular_authors = list(Person.objects.published().annotate(
num_items=Count('newsitem__author')).order_by('-num_items')[:10])
# shuffle that list and cut the two first:
shuffle(popular_authors)
popular_authors = popular_authors[:2]
# Get a list of all the authors who have written anything at all
# but exclude the <pk> of people already in the "popular" list.
other_authors = list(Person.objects.published().annotate(
num_items=Count('newsitem__author')
).filter(
num_items__gte=1
).order_by('?').exclude(
pk__in=[author.pk for author in popular_authors]))[:2]
# Merge the two lists and shuffle them
authors = popular_authors + other_authors
shuffle(authors)
@emilbjorklund
Copy link
Author

The part of the view responsible for picking the list of authors to highlight. Creates duplicates, possibly after an upgrade from Django 1.6 to 1.8.

@barnabywalters
Copy link

Without setting up a project with a database with test data in I’m unable to offer any actual fix. If the quantities of rows you’re working with here are representative of production, then I would suggest simply doing the existing author exclusion in code rather than in the database, unless someone comes up with an easy fix.

For example, remove the exclude() call and take [:4], then take the first two authors in other_authors who aren’t in popular_authors.

@briansuda
Copy link

Where are you getting the duplicates? In popular_authors or when you join the two lists in authors?

Have you tried adding .distinct() to the queries or looked into using Q()
https://docs.djangoproject.com/en/1.8/topics/db/queries/#complex-lookups-with-q-objects

@emilbjorklund
Copy link
Author

I tried lots of variations of distinct() etc, but nothing helped. I think in the end it was the order_by that somehow got confused in conjunction with the annotation in other_authors - generated a massive SQL that somehow got grouped incorrectly.

Bloood sugar too low to understand exactly what went wrong, but by extracting the random ordering and limit/slice to outside the query, it was cut down significantly in complexity + takes a third of the time to evalutate (not to mention it works correctly) so I'm happy. :-)

@barnabywalters
Copy link

Ha ha, that sounds like ORMs to me — nice for very simple things, but as soon as you want to do anything even vaguely interesting (like this) they’re more hassle than using raw SQL :) Happy that you figured out how to fix it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment