esoergel/django-queryset-abstractions.md

## django-queryset-abstractions.md

      
    Raw
  

              django-queryset-abstractions.md
            
          
    Beyond 'wrappers' - Using querysets to compartmentalize business logic into the Django ORM

The Django ORM is an invaluable tool for querying your data.  However, as a generic tool, it's often not a clean fit.  Querying logic ends up being duplicated in multiple places, often with subtle variations.  Maybe you have helper functions for commonly used queries, but these can't be combined.
This article describes a technique for extending Django querysets with methods specific to your models. This lets you write business logic in a way that fits in to the Django ORM, like MyModel.objects.filter(...).my_custom_method().first()
Some benefits of encapsulating database queries are:

Reusability: This method can be reused elsewhere, preventing the need for duplication.
Unification: This query presumably represents something meaningful.  By creating and using an abstraction, you avoid multiple implementations of the same concept, which may have subtle differences.
Readability: The code that calls the query is much simpler to read.  A jumble of terse, generic query language can be replaced by a meaningful name.
Orthogonality: Should you need to optimize the db call, switch backends, or add a new business logic stipulation, you need only change the encapsulated method.  As long as the encapsulation preserves the original interface, the calling code shouldn't need to be updated.

A word of warning, though - as an abstraction, the implementation is obscured.  This makes it easy to do something complex.  The query may end up being used in situations for which it was never intended, and may have substantial performance costs.
Implementation

Let's start with an example.  Consider following "Book" and "Author" classes:
from django.db import models


class Author(models.Model):
    name = models.CharField()
    bio = models.TextField()


class Book(models.Model):
    title = models.CharField()
    publication_date = models.DateField()
    author = models.ForeignKey(Author)
    genre = models.CharField()
    copies_sold = models.IntegerField()
Maybe you have a concept of "popular books", referring to books which have sold at least 10,000 copies.  Here's how you'd get all popular books:
popular_books = Book.objects.filter(copies_sold__gte=10000)
As described above, you may create an abstraction in code which mirrors this concept.  The simplest way to do so is by putting it in a utility function:
def get_popular_books():
    return Book.objects.filter(copies_sold__gte=10000)
Unfortunately it's never that simple.  In one instance you may want a paginated list of books' titles, or a total count, or you want it restricted to books by a certain author, genre, or both.  One approach is to add a parameter to this function every time you come up with a new change, but that can quickly get unwieldy.
Luckily this is what query builders like Django's are for.  You could just have your function return a queryset, and chain things to the result.  A nicer approach is for your custom query logic to live on the model manager itself, so it integrates with Django's queryset library.
class BookManager(models.Manager):
    def get_popular_books(self):
        return self.filter(copies_sold__gte=10000)

class Book(models.Model):
    objects = BookManager()

    title = models.CharField()
    ...
Now your calling code will look like this:
Book.objects.get_popular_books().values_list('title', flat=True)
Nice, huh?  That should suffice in most cases, but sometimes filters need to be applied before this encapsulated logic is performed, or you need to use two or more of these custom components together.
What happens when a user types in text to search for a book?  This might search title, genre, and author's name.  This is something that could happen in a number of different places, but the concept should be the same, and so should the implementation!
class BookManager(models.Manager):
  ...

  def filter_by_user_input(self, user_input):
      return self.filter(models.Q(title__icontains=user_input) |
                         models.Q(genre__icontains=user_input) |
                         models.Q(author__name__icontains=user_input))
Now what if you need to show a list of popular books filtered by user input?  Model.objects returns a model manager, but Model.objects.filter() returns a queryset.  Let's skip ahead a bit and make these methods available on both the model manager and the queryset.
class BookQuerySet(models.query.QuerySet):
    def get_popular_books(self):
        ...

    def filter_by_user_input(self, user_input):
        ...


class BookManager(models.Manager):
    def similar_books(self, books):
        # this method is only available on the manager
        # more on that below
        ...


class Book(models.Model):
    objects = BookManager.from_queryset(BookQuerySet)()
The from_queryset method (provided by Django) makes queryset methods available on the model manager.  Now similar_books, get_popular_books and filter_by_user_input are all available on the model manager, and the latter two are also available on the queryset.  This makes it possible to chain these custom methods together:
Book.objects.get_popular_books().filter_by_user_input(cleaned_user_input)
Ordering

Consider the order in which custom methods are applied.  The examples above are simple filters, but what if your custom queryset method expanded the results returned?  For instance perhaps it returns a new queryset which includes all books by the authors of the books in the original queryset.  If one of your methods was a means of access control, and then a later method added books, the resulting queryset could include things it shouldn't.
class BookQuerySet(models.query.QuerySet):
    def include_similar_books(self):
        # Calling Book.objects like this should probably be avoided
        return Book.objects.filter(author__in=self.values('author'))

# This may violate expectations
queryset.publicly_visible().include_similar_books()

# This is safer
queryset.include_similar_books().publicly_visible()
The former query may include books which are not publicly visible, since that filter was applied before the new queryset was created.  This behavior is surprising.
It's safer to make methods which create a new queryset based on some input only be available on the model manager.  That way there's no confusion about which filters are actually applied to the result.  You could express the above as:
class BookManager(models.Manager):
    def similar_books(self, books):
        return self.filter(author__in=books.values('author'))

Book.objects.similar_books(queryset).publicly_visible()
The behavior is much more intuitive in this example.  The functionality of finding "similar books" doesn't refine a queryset, it creates a new one, with the original as a parameter.
How Dimagi uses this technique

One of the models Dimagi uses is "Location", which represents user-defined geographic locations.  Instances of this model are interrelated, forming a tree structure - ie, the city of Boston is a sibling of Brookline, and a child of Suffolk County, which is a child of Massachusetts.
Some repeated concepts around querying these models are:

Get the set of ancestors or descendants of an individual location or set of locations. This is handled by a tree library.
Filter the set of locations by user input.  What fields are searched?  Can your search string reference multiple nodes (eg "Mass/Boston")?  This should behave the same across the site.
Location accessibilty.  Some users have their access restricted to a subset or subsets of locations, say Suffolk County and Cambridge.

You could write simple function for these concepts individually, but they couldn't be combined with each other or with other queryset methods like .values() or .count() unless they accepted and/or returned querysets themselves, so we use queryset abstractions.  This lets us express the concept of .accessible_to_user() on its own, and each caller can access the count, list of names, or ids as necessary.  It can also be combined with .filter_by_user_input() where appropriate.
These techniques allow us to define these business concepts on the model related code itself.  Other parts of the code which need to interact with this model, such as reporting, can then be written in the language of the problem domain.  You do not need to know how to determine whether a location is accessible to a user, nor do you need to reimplement it.  This keeps the code cleaner, more consistent, and comprehensible.
Warnings

Resist the temptation to put everything in here.  Is this actually a reusable abstract concept, or is this a one-off thing?  Use this technique sparingly.
Similarly, avoid incorporating logic from other domains.  The Book queryset methods should pertain primarily to book objects.  Queries integrating multiple modules might be better off in utility functions.
You're extending the django queryset interface, and these methods should behave like normal django queryset methods.  That is, they should pertain directly to database queries.  Return values should be new modified querysets (like .filter(), exclude(), or order_by()) or perhaps evaluated and wrapped db calls (think .get(), .values_list(), and .count())
Don't do other queries in your method.  There is an expectation that Django querysets don't do anything until they're evaluated.  Violating this principle can make your code perform queries in places you might not expect.
As with any library interface, try to keep the parameters as simple as possible.  Don't pass in objects when all you need is one property.
Sometimes it's better to repeat yourself a little than to add a level of complication:
Book.objects.title(book_title)
Book.objects.filter(title=book_title)
The first example is a few lines shorter, but it exchanges a familiar Django paradigm (.filter()) with an inscrutable custom method.
Django querysets are a complicated, powerful interface which couple your code tightly to your DB backend.  This doesn't change that fact.  If you are ever going to swap out ORMs or storage backends, pervasive queryset-related code will complicate that task.  For some instances, you may want to hide queryset code inside functions with simple return values.  That way you can easily swap those out to use SQLAlchemy, a handwritten SQL query, a precomputed table, or something like elasticsearch.
Further reading

The Django documentation page on Custom managers goes over this and some related options.