@pwtail
Last active Jan 9, 2022

Async django orm

DEP xxx, draft

Hi everyone! This text is an alternative proposal for async support in the Django ORM.

It aims for a single version of Django that can be used in both sync and async contexts and that doesn't break compatibility with previous versions.

Strictly speaking, this is not an alternative to DEP-09 (Async django), since a natively async ORM is out of scope for that DEP. But since DEP-09 states that such a (compatible) version is unfeasible, in a sense it is.

Idea: "yield from" to the rescue!

We are going to use generators for our purpose, namely the yield from construct. Generators once helped implement async-await in Python; hopefully they will also help to abstract it away.

The concept is simple:

  • We replace a synchronous function with a generator that yields from other functions containing I/O. As a result, the generator yields at every I/O operation.

  • The database driver (an async one, for example) receives a task from the generator (a database query) and executes it whenever it is able to. On success, it takes the results it got from the database and sends them back to the generator, receiving the next task to execute.

  • After all I/O operations are completed, the generator returns the value that the former synchronous function used to return.

Note: I said that we replace a synchronous function with a generator, but in fact we can't do that, because it would break compatibility. Instead, we wrap the generator in a regular function and provide a way to yield from the unwrapped version of it.
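The three steps and the note above can be sketched with a purely synchronous driver loop. This is only an illustration: `FAKE_DB`, `fake_query`, `fetch`, `count_rows` and `run_sync` are hypothetical names, and a dict stands in for the database.

```python
# Hypothetical stand-in for a database: table name -> rows.
FAKE_DB = {"books": ["Dune", "Emma", "Ulysses"]}


def fake_query(table):
    """Pretend I/O: return all rows of a table."""
    return list(FAKE_DB[table])


def fetch(table):
    # The only place that touches I/O: it yields a task
    # (callable, args, kwargs) and receives the result back.
    rows = yield (fake_query, (table,), {})
    return rows


def count_rows(table):
    # The logic never calls the database directly;
    # it delegates with `yield from`.
    rows = yield from fetch(table)
    return len(rows)


def run_sync(gen):
    """Drive a generator to completion, executing each yielded task."""
    result = None
    while True:
        try:
            func, args, kwargs = gen.send(result)
        except StopIteration as stop:
            return stop.value  # what the former sync function returned
        result = func(*args, **kwargs)


def count_rows_sync(table):
    # Regular function wrapping the generator, so callers see no change.
    return run_sync(count_rows(table))
```

Callers keep using `count_rows_sync("books")` as an ordinary function, while other generators can still `yield from count_rows(...)`.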

Explained with code

Let's say we have a synchronous method QuerySet.get() for which we want to add an async version.

It in turn calls QuerySet.__len__, which makes a query to the database under the hood. Then it takes the object to return from QuerySet._result_cache. Let's proceed as described above:

@provide_driver
def get(self: QuerySet):
    ...
    num = yield from self.G.__len__()  # or len(self.G)
    ...

As I said, we want the code to be compatible, so @provide_driver makes .get() a regular function again. The unwrapped version is made accessible through the .G namespace. So, for example, QuerySet.G.get is a generator function.

Here is what provide_driver could look like:
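One way such a .G namespace could work is a descriptor that looks up the undecorated generator function behind each wrapped method. The sketch below is an assumption, not the proof-of-concept's implementation: `GNamespace`, `_GProxy` and `Box` are hypothetical, and `provide_driver` here is a simplified sync-only stand-in whose tasks are zero-argument callables.

```python
import functools


def provide_driver(func):
    """Simplified stand-in: runs the generator with a sync loop only."""
    @functools.wraps(func)  # also sets wrapper.__wrapped__ = func
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        result = None
        while True:
            try:
                task = gen.send(result)
            except StopIteration as stop:
                return stop.value
            result = task()  # a task here is a zero-argument callable
    return wrapper


class _GProxy:
    def __init__(self, instance, owner):
        self._instance = instance
        self._owner = owner

    def __getattr__(self, name):
        # Fetch the decorated method, then unwrap it.
        gen_func = getattr(self._owner, name).__wrapped__
        if self._instance is None:
            return gen_func  # accessed on the class: plain generator function
        return functools.partial(gen_func, self._instance)


class GNamespace:
    """Descriptor exposing undecorated generator methods as `.G.<name>`."""
    def __get__(self, instance, owner):
        return _GProxy(instance, owner)


class Box:
    G = GNamespace()

    def __init__(self, items):
        self.items = items

    @provide_driver
    def size(self):
        n = yield lambda: len(self.items)  # pretend this is a query
        return n

    @provide_driver
    def is_empty(self):
        n = yield from self.G.size()  # delegate to the generator version
        return n == 0
```

With this, `Box([1, 2]).size()` behaves like a regular method, while `Box.G.size` is the generator function that other generators can delegate to.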

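import functools

def provide_driver(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        deferred_calls = func(*args, **kwargs)  # a generator
        # IS_ASYNC, sync_driver and async_driver are assumed to be defined elsewhere
        if not IS_ASYNC:
            driver = sync_driver
        else:
            driver = async_driver
        # In the async case this returns a coroutine for the caller to await
        return driver.execute(deferred_calls)
    return wrapper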

class AsyncDriver:
    async def execute(self, deferred_calls):
        result = None
        while True:
            try:
                func, args, kwargs = deferred_calls.send(result)
            except StopIteration as ex:
                return ex.value
            result = await func(*args, **kwargs)  # the sync driver calls func without await
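To make the sync/async symmetry concrete, here is a small self-contained sketch that runs the same generator under both kinds of driver. `SyncDriver`, `SyncDB`, `AsyncDB`, `ROWS` and `has_rows` are hypothetical names introduced for illustration; passing the database object into the generator is just one assumed way to decide which callable gets yielded.

```python
import asyncio

ROWS = {"books": ["Dune", "Emma"], "authors": []}  # fake data


class SyncDB:
    def count(self, table):
        return len(ROWS[table])


class AsyncDB:
    async def count(self, table):
        await asyncio.sleep(0)  # stand-in for real async I/O
        return len(ROWS[table])


def has_rows(db, table):
    # Shared logic: the same generator serves both drivers.
    n = yield (db.count, (table,), {})
    return n > 0


class SyncDriver:
    def execute(self, deferred_calls):
        result = None
        while True:
            try:
                func, args, kwargs = deferred_calls.send(result)
            except StopIteration as ex:
                return ex.value
            result = func(*args, **kwargs)  # plain call


class AsyncDriver:
    async def execute(self, deferred_calls):
        result = None
        while True:
            try:
                func, args, kwargs = deferred_calls.send(result)
            except StopIteration as ex:
                return ex.value
            result = await func(*args, **kwargs)  # awaited call


sync_result = SyncDriver().execute(has_rows(SyncDB(), "books"))
async_result = asyncio.run(AsyncDriver().execute(has_rows(AsyncDB(), "books")))
```

The two drivers differ only in the line that runs the task, which is the whole point of the approach.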

What we get

Roughly speaking, this simple approach lets us get both the sync and the async version from a single codebase. Not without additional changes, of course, but I couldn't see any big issues along the way. Depending on the IS_ASYNC constant, the ORM will be sync-only or async-only.

Because we get a fully compatible version, we can reuse all of Django's tests, and those will largely test the async version too.

Changes in the API

Not all of the Django API can be made async, but it can be adapted. The only feature that cannot be ported as is are the lazy attributes. Currently I am thinking of changing them in the following way:

  • Related attributes can be accessed as usual only if they were prefetched beforehand

  • If not, a different syntax (API) is required. For example, await obj.R('related_obj')

Whatever the solution may be, that's not a major issue, in my opinion.
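As an illustration of the two rules above, here is a rough sketch of how an `R` accessor might behave. Everything in it is hypothetical (`Book`, `related_names`, `RelatedNotLoaded`, `load_related`); it does not use Django and only mimics the proposed behaviour.

```python
import asyncio


class RelatedNotLoaded(AttributeError):
    pass


async def load_related(obj, name):
    """Hypothetical async loader; a real one would query the database."""
    await asyncio.sleep(0)
    return f"<{name} of {obj.pk}>"


class Book:
    related_names = {"author"}  # hypothetical registry of relations

    def __init__(self, pk):
        self.pk = pk
        self._related_cache = {}

    def __getattr__(self, name):
        # Called only when normal lookup fails. Plain attribute access
        # works for already-loaded relations and raises otherwise.
        if name in type(self).related_names:
            cache = self.__dict__.get("_related_cache", {})
            if name in cache:
                return cache[name]
            raise RelatedNotLoaded(
                f"{name!r} is not loaded; use `await obj.R({name!r})`"
            )
        raise AttributeError(name)

    async def R(self, name):
        # Explicit async access: load once, then serve from the cache.
        if name not in self._related_cache:
            self._related_cache[name] = await load_related(self, name)
        return self._related_cache[name]


async def main():
    book = Book(pk=1)
    try:
        book.author
        plain_access_worked = True
    except RelatedNotLoaded:
        plain_access_worked = False
    author = await book.R("author")
    return plain_access_worked, author, book.author


result = asyncio.run(main())
```

After the explicit `await book.R("author")`, plain `book.author` works because the value is cached, which mirrors the "prefetched before" rule.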

When this approach doesn't fit

The issues begin when we need actual generators in our code, especially when that is a conceptual requirement, for example when the data doesn't fit into memory. In short, if we have unavoidable asynchronous operations between iterations, then the approach described above can't be applied easily. In other words, we don't want to implement async for.

As for the Django ORM, there is no conceptual need for generators, nor for async for. There is one exception, though: a queryset is iterable, and was made so specifically for working with large datasets that don't fit into memory. But fear not: it is one special case that can be handled easily.

I/O is not the main concern of an ORM anyway, so we'll hardly ever hit the limits of this approach.

Downsides

  • The obvious one: the code is uglier, since one has to use yield from. Users don't have to deal with it, though; they can use async-await and regular functions.

  • The call stack (frames) also gets a bit uglier. It is split in two, roughly at the point where the driver hits the database.

  • Sometimes the database is hit pretty deep in the call stack, and because of that we have more yield from constructs than we otherwise would. That is not optimal for performance either. But it's not critical, and the refactoring is pretty straightforward.

Proof of concept

Available here: https://github.com/pwtail/django/pull/4/files

Limitations: only a small portion of the ORM is ported (QuerySet.get and QuerySet.filter). In fact, I have tried other functions as well, model.save() for example, with success.

GNamespace is implemented very sloppily (and is not needed in general!). Still, it is a proof of concept.

P. S.

I can implement all this, but I would like some informed opinions from the Python community first.
