Async Django ORM
DEP xxx, draft
Hi everyone! This text contains an alternative proposal for async support in the Django ORM.
It aims for a single version of Django that can be used in both sync and async contexts and that doesn't break compatibility with previous versions.
Strictly speaking, this is not an alternative to DEP-09 (Async Django), since a natively async ORM is out of scope of that DEP. But since DEP-09 states that such a (compatible) version is unfeasible, in a sense it is.
Idea: "yield from" to the rescue!
We are going to use generators for our purpose, namely the yield from construct. Generators once helped implement async-await in Python; hopefully they will also help to abstract it away.
The concept is simple:
We replace a synchronous function with a generator that yields from other functions containing I/O. As a result, that generator yields at every I/O operation.
The database driver (an async one, for example) receives a task from the generator (a database query) and executes it whenever it is able to. On success, it takes the results it got from the database and sends them back to the generator, receiving the next task to execute.
After all I/O operations are completed, the generator returns the value that the former synchronous function used to return.
Note: I said that we replace a synchronous function with a generator, but in fact we can't do that, because it would break compatibility. Instead, we wrap the generator in a regular function and provide a way to yield from the unwrapped version of it.
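The steps above can be sketched in a few lines. This is only a toy illustration, not Django code: fake_query, count_rows, count_two_tables and run_sync are hypothetical names, and the "database" is a plain function.

```python
def fake_query(sql):
    # Hypothetical stand-in for a real database call.
    return [("alice",), ("bob",)] if sql.startswith("SELECT") else []


def count_rows(table):
    # A generator replacing a synchronous function: instead of doing I/O
    # itself, it yields a task (callable, args, kwargs) and gets the result back.
    rows = yield (fake_query, (f"SELECT * FROM {table}",), {})
    return len(rows)


def count_two_tables():
    # Composition uses `yield from`, so nested I/O bubbles up to the driver.
    users = yield from count_rows("users")
    admins = yield from count_rows("admins")
    return users + admins


def run_sync(gen):
    # The driver: execute each task and send its result back into the generator.
    result = None
    while True:
        try:
            func, args, kwargs = gen.send(result)
        except StopIteration as stop:
            # The generator finished; its return value is the final result.
            return stop.value
        result = func(*args, **kwargs)


total = run_sync(count_two_tables())
```

The generator never touches the database directly, which is what lets a different driver decide how each task is executed.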
Explained with code
Let's say we have a synchronous method QuerySet.get() that we want to add an async version for. It in turn calls QuerySet.__len__, which makes a query to the database under the hood. Then it takes the object to return from QuerySet._result_cache. Let's proceed as described above:
    @provide_driver
    def get(self: QuerySet):
        ...
        num = yield from self.G.__len__()  # or len(self.G)
        ...
As I said, we want the code to be compatible, so .get() is a regular function again. The unwrapped version is made accessible through the .G namespace. So, for example, QuerySet.G.get is a generator function.
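One way such a namespace could be wired up is sketched below. It is an assumption made for illustration, not the implementation from the proof of concept: the .gen attribute, _GProxy, ToyQuerySet and run_sync are all made-up names, and only the sync driver is shown.

```python
import functools


def run_sync(gen):
    # Minimal sync driver: run each yielded task, send the result back.
    result = None
    while True:
        try:
            func, args, kwargs = gen.send(result)
        except StopIteration as stop:
            return stop.value
        result = func(*args, **kwargs)


def provide_driver(gen_func):
    # Wrap a generator function into a regular function, keeping the
    # generator function reachable as `.gen` (illustrative attribute name).
    @functools.wraps(gen_func)
    def wrapper(*args, **kwargs):
        return run_sync(gen_func(*args, **kwargs))

    wrapper.gen = gen_func
    return wrapper


class _GProxy:
    # `obj.G.name(...)` returns the unwrapped generator bound to `obj`.
    def __init__(self, obj):
        self._obj = obj

    def __getattr__(self, name):
        wrapped = getattr(type(self._obj), name)
        return functools.partial(wrapped.gen, self._obj)


class ToyQuerySet:
    def __init__(self, rows):
        self._rows = rows

    @property
    def G(self):
        return _GProxy(self)

    @provide_driver
    def count(self):
        rows = yield (list, (self._rows,), {})  # pretend this is a query
        return len(rows)

    @provide_driver
    def get(self):
        num = yield from self.G.count()  # yield from the unwrapped version
        if num != 1:
            raise ValueError(f"expected exactly 1 row, got {num}")
        return self._rows[0]


value = ToyQuerySet(["x"]).get()   # regular call, runs the driver
gen = ToyQuerySet(["x"]).G.get()   # generator object, nothing executed yet
```

Existing callers keep using get() as before, while ORM internals compose through G with yield from.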
Here is what provide_driver could look like:
    def provide_driver(func):
        def wrapper(*args, **kwargs):
            deferred_calls = func(*args, **kwargs)  # a generator
            if not IS_ASYNC:
                driver = sync_driver
            else:
                driver = async_driver
            return driver.execute(deferred_calls)
        return wrapper


    class AsyncDriver:
        async def execute(self, deferred_calls):
            result = None
            while True:
                try:
                    func, args, kwargs = deferred_calls.send(result)
                except StopIteration as ex:
                    return ex.value
                # without await in the sync case
                result = await func(*args, **kwargs)
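To see one generator run under both kinds of driver, here is a self-contained sketch. SyncDB, AsyncDB, drive_sync and drive_async are hypothetical names; I am assuming the generator gets its I/O callable from a backend object, so the sync backend hands out a plain function and the async one hands out a coroutine function.

```python
import asyncio


class SyncDB:
    # Hypothetical blocking backend.
    def execute(self, sql):
        return [1, 2, 3]


class AsyncDB:
    # Hypothetical async backend with the same interface.
    async def execute(self, sql):
        await asyncio.sleep(0)  # simulate waiting on the network
        return [1, 2, 3]


def count(db):
    # Shared ORM-style code: yields the I/O task instead of performing it.
    rows = yield (db.execute, ("SELECT id FROM t",), {})
    return len(rows)


def drive_sync(gen):
    result = None
    while True:
        try:
            func, args, kwargs = gen.send(result)
        except StopIteration as stop:
            return stop.value
        result = func(*args, **kwargs)


async def drive_async(gen):
    # Identical loop; the only difference is the `await`.
    result = None
    while True:
        try:
            func, args, kwargs = gen.send(result)
        except StopIteration as stop:
            return stop.value
        result = await func(*args, **kwargs)


sync_result = drive_sync(count(SyncDB()))
async_result = asyncio.run(drive_async(count(AsyncDB())))
```

The count generator is written once; only the driver and the backend differ between the two runs.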
What we get
Roughly speaking, this simple approach lets us get both the sync and the async version from a single codebase. Not without additional changes, of course, but I couldn't see any big issues along the way. Depending on the IS_ASYNC constant, the ORM will be sync-only or async-only.
The fact that we get a fully compatible version means that we can reuse all the tests from Django, and those will largely test the async version too.
Changes in the API
Not all of the Django API can be made async, but it can be adapted. The single unportable feature is lazy attributes. Currently I am thinking of changing them in the following way:
Related attributes can be accessed as usual only if they were prefetched beforehand.
If not, a different syntax (API) is required. For example,
Whatever the solution may be, that's not a major issue, in my opinion.
When this approach doesn't fit
The issues begin when we need actual generators in our code, especially when they are a conceptual requirement, such as when the data doesn't fit into memory. In short, if there are asynchronous operations between iterations that can't be avoided, then the approach described above can't be applied easily. In other words, we don't want to implement
Speaking of the Django ORM, there is no conceptual need for generators, nor for async for. With one exception though: a queryset is iterable, and was made so specifically for working with large datasets that don't fit into memory. But fear not: it is a single special case that can be handled easily.
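One possible way to handle that special case is sketched below. This is purely my assumption about what "handled easily" could look like, not something taken from the proof of concept: each chunk of rows is fetched as a single I/O task through the usual generator protocol, and a thin async generator on top exposes async for to the user. AsyncCursor, fetch_chunk, drive_async and iterate are hypothetical names.

```python
import asyncio


class AsyncCursor:
    # Hypothetical async cursor that returns rows chunk by chunk.
    def __init__(self, rows, chunk_size):
        self._rows = rows
        self._size = chunk_size
        self._pos = 0

    async def fetchmany(self):
        await asyncio.sleep(0)  # simulate waiting on the network
        chunk = self._rows[self._pos:self._pos + self._size]
        self._pos += self._size
        return chunk


def fetch_chunk(cursor):
    # Generator-style ORM code: one I/O task per chunk.
    rows = yield (cursor.fetchmany, (), {})
    return [row.upper() for row in rows]  # stand-in for building model objects


async def drive_async(gen):
    result = None
    while True:
        try:
            func, args, kwargs = gen.send(result)
        except StopIteration as stop:
            return stop.value
        result = await func(*args, **kwargs)


async def iterate(cursor):
    # The only place that needs a real async generator.
    while True:
        chunk = await drive_async(fetch_chunk(cursor))
        if not chunk:
            return
        for obj in chunk:
            yield obj


async def main():
    cursor = AsyncCursor(["a", "b", "c"], chunk_size=2)
    return [obj async for obj in iterate(cursor)]


result = asyncio.run(main())
```

Only one chunk is held in memory at a time, and the per-chunk logic stays in the shared generator code.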
I/O is not the main concern of an ORM anyway, so we'll hardly ever hit the limits of this approach.
The obvious one: the code is uglier, since one has to use yield from. Users don't have to deal with it though: they can use async-await and regular functions.
The stack of function calls (frames) also gets a bit uglier. It is split in two, roughly at the point where the driver hits the database.
Sometimes the database is hit pretty deep in the call stack, and because of that we have more yield from constructs than we otherwise would. That is not optimal for performance either. But it's not critical, and the refactoring is pretty straightforward.
Available here: https://github.com/pwtail/django/pull/4/files
Limitations: only a small portion of the ORM is ported (QuerySet.filter). In fact, I tried other functions as well, model.save() for example, with success.
GNamespace is very poorly implemented (and not needed in general!). Still, it's a proof of concept.
I can implement all this, but I need some good opinions from the Python community.