zzzeek/msg373145.rst

## msg373145.rst

      
This is a cross post of something I just posted on the Python bug
tracker at https://bugs.python.org/msg373145.

I seem to have two cents to offer, so here it is. An obscure issue in the
Python bug tracker is probably not the right place for this, so consider this
an early draft of something that maybe I'll talk about more elsewhere.

> This basically divides code into two islands - async and non-async

Yes, this is the problem, and at the bottom of this apparently somewhat ranty
comment is a solution. The good news is that it does not require that Python or
asyncio be modified. My concern is around how everyone has been OK with the
current state of affairs for so long, why "asyncio is fundamentally
incompatible with library X" is considered acceptable, and also how easy it
was to find a workaround. This is not something I would have expected to come
up with, kind of like you don't expect to invent Velcro or windshield wipers.

asyncio's approach is what those of us in the library/framework community call
"explicit async": you have to mark functions that will be doing IO, and the
points at which IO occurs must also be marked. Long ago it was via callback
functions, then asyncio turned it into decorators and yields, and finally
PEP 492 turned it into async/await, and it is very nicely done. It is of course
a feature of asyncio that writing out async/await means your code can in theory
be clearer as to where IO occurs and all that. While I don't totally buy
that myself, I'm of course in favor of that style of coding being available; it
definitely has its own kind of self-satisfaction built in when you do it.
That's all great.

But as those of us in the library/framework community also know, asyncio's
approach essentially means that libraries like Flask, Django, my own SQLAlchemy,
etc. are all automatically "non-workable" with asyncio. While
these libraries can certainly have asyncio endpoints added to them, the task as
designed is not that simple: to go from an asyncio endpoint all the way
through library code that doesn't care about async and then down into a
networking library that again has asyncio endpoints, the "async" declaration
and the "await" or yield approach must be wired all the way through every
function and method. This is all despite the fact that when you're not at the
endpoints, the points at which IO occurs are fully predictable, such that
libraries like gevent don't need you to write them out. So we are told that
libraries have to have full end-to-end rewrites of all their code to work this
way, or otherwise maintain two codebases, or something like that.
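
To make the "wired all the way through" point concrete, here is a minimal sketch using only the standard library. All three layers and their names (driver_fetch, build_record, endpoint) are made up for illustration and do not come from any real library. The middle function does no IO of its own, yet it has to be declared "async" and has to "await" purely because the layer beneath it is async.

```python
import asyncio


async def driver_fetch(key):
    """Hypothetical networking layer: the only place IO really happens."""
    await asyncio.sleep(0)  # stand-in for a real network round trip
    return {"key": key, "value": key.upper()}


async def build_record(key):
    """Hypothetical middle tier: no IO of its own, but it must be
    'async def' and must 'await' because the layer below is async."""
    row = await driver_fetch(key)
    return f"{row['key']}={row['value']}"


async def endpoint(key):
    """Hypothetical asyncio endpoint at the top of the stack."""
    return await build_record(key)


def build_record_blocking_style(key):
    """The same middle tier as a plain function: the call below hands
    back a coroutine object instead of a row."""
    return driver_fetch(key)


print(asyncio.run(endpoint("abc")))  # abc=ABC

pending = build_record_blocking_style("abc")
print(asyncio.iscoroutine(pending))  # True
pending.close()  # avoid a "coroutine was never awaited" warning
```

Multiply build_record() by every function between the endpoint and the driver and you get the rewrite that existing libraries are being asked to do.
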
The side effect of this is that a whole bunch of library and framework authors
now get to create all new libraries and frameworks, which do exactly the same
thing as all the existing libraries and frameworks, except they sprinkle the
"async/await" keywords throughout middle tiers as required. Vague claims of
"framework X is faster because it's async" appear, impossible to confirm since
it is unknown how much of the performance gain comes from the "async" aspect and
how much comes from the framework having been rewritten from scratch in
a completely different way (hint: it's the latter).

Or in other cases, as if to make it obvious how much the "async/await" keywords
come down to being more or less boilerplate for the "middle" parts of
libraries, the urllib3 project wrote the "unasync" project [1] so that they can
simply maintain two separate codebases, one that has "async/await" and the
other produced by just search-and-replacing those keywords out.
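
As a toy illustration of how mechanical that transformation is, the sketch below strips the two keywords with a regular expression. This is not how the unasync project itself is implemented; strip_async() and the sample source are hypothetical, and the regex is deliberately naive (it would also rewrite matches inside strings and comments).

```python
import re

# Naive on purpose: this would also rewrite matches inside strings and comments.
_ASYNC_KEYWORDS = re.compile(r"\b(?:async|await)\s+")


def strip_async(source):
    """Return `source` with the 'async' and 'await' keywords removed."""
    return _ASYNC_KEYWORDS.sub("", source)


# Hypothetical "middle tier" function written in explicit-async style.
ASYNC_SOURCE = '''\
async def load_name(conn, user_id):
    row = await conn.fetch_row(user_id)
    return row["name"]
'''

SYNC_SOURCE = strip_async(ASYNC_SOURCE)
print(SYNC_SOURCE)
```

The generated text is an ordinary blocking function that works against any object with a blocking fetch_row() method, which is the sense in which the keywords are boilerplate for code in the middle.
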
SQLAlchemy has not been "replaced" by this trend, as asyncio database libraries
have not really taken off in Python and there are very few actual async
drivers. Some folks have written SQLAlchemy-async libraries that use
SQLAlchemy's expression system while doing the tedious, redundant and
impossible-to-maintain work of replicating enough of SQLAlchemy's execution
internals that a modest "sqlalchemy-like" experience with asyncio can be
reproduced. But these libraries are closed off from all of the fixes and
improvements that occur in SQLAlchemy itself, and they
likely target a smaller subset of SQLAlchemy's behaviors and features in any
case. They certainly can't get the ORM working, as the ORM runs lots of SQL
executions internally, all of which would have to propagate their "asyncness"
outwards throughout hundreds of functions.

The asyncpg project, one of the few asyncio database drivers that exist, notes
in its FAQ "asyncpg uses asynchronous execution model and API, which is
fundamentally incompatible with SQLAlchemy" [2], yet we know this is not true
because SQLAlchemy works just fine with gevent and eventlet, with no
architectural changes at all. Using libraries like SQLAlchemy or Django with a
non-blocking IO, event-based model is commonplace. It's the "explicit" part
of it that is hard, because asyncio is designed without any mediation for
code in the middle that doesn't publish "async / await" keywords.

So I finally just sat down to figure out how to use the underlying greenlet
library (which we all know as the portable version of "Stackless Python") to
bridge the gap between asyncio and blocking-style code. It's about 30 lines, and
I have SQLAlchemy working with an async front-end to asyncpg DBAPI, as can be
seen at [3], based on the proof of concept at [4]. I'm actually running the
full py.test suite inside the asyncio event loop and running asyncpg
through SQLAlchemy's whole battery of thousands of tests, all of them written
in purely blocking style. There's no need to add "async / await /
yield / etc" anywhere except at the very endpoints, that is, where the top
function is called, and then down where we call into asyncpg directly, using a
function called await_() that works just like the "await" keyword, just with no
"async" function declaration.

A day later, someone took the same idea and got Flask to work in an asyncio
event loop at [5] [5a]. The general idea of using greenlet in this way is also
present at [6], so I won't be patenting this idea today, as oremanj can claim
prior art.

Using greenlet, there is no need to break out of the asyncio event loop at all,
nor does it change the control flow of parallel coroutines within the loop. It
uses greenlet's "switch", quite minimally, to bridge the gap between code that
does not push out an "async/await" yield and code that does. There are no
threadpools, no alternate event loops, no monkeypatching, just a few
greenlet.switch() calls in the right spots. There is a slight performance
decrease of about 15%, but in theory one would only be using asyncio if their
application is expected to be IO bound in any case (which folks that know me
know is another assertion I frequently doubt).
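
A minimal sketch of the kind of bridge described above, assuming the third-party greenlet package is installed. It is not the code from [3] or [4]: greenlet_spawn(), fetch_row() and load_object() are names made up for illustration, and only the await_() name is taken from the text. The idea is that the blocking-style function runs in a child greenlet, and each await_() call switches back to the coroutine that spawned it so the real "await" can happen there.

```python
import asyncio

import greenlet  # third-party: pip install greenlet


def await_(awaitable):
    """Blocking-style stand-in for 'await': hand the awaitable to the
    parent greenlet and pause here until a result is switched back in."""
    parent = greenlet.getcurrent().parent
    if parent is None:
        raise RuntimeError("await_() called outside greenlet_spawn()")
    return parent.switch(awaitable)


async def greenlet_spawn(fn, *args, **kwargs):
    """Run plain function `fn` in a child greenlet, awaiting on its behalf."""
    child = greenlet.greenlet(fn)  # parent defaults to the current greenlet
    result = child.switch(*args, **kwargs)  # run fn until its first await_()
    while not child.dead:
        try:
            value = await result  # the real await, in the asyncio task
        except BaseException as exc:
            result = child.throw(exc)  # re-raise inside the blocking code
        else:
            result = child.switch(value)  # resume the blocking code
    return result  # fn's return value


async def fetch_row(key):
    """Hypothetical async driver call."""
    await asyncio.sleep(0)
    return {"id": key}


def load_object(key):
    """Hypothetical library code: no 'async', no 'await'."""
    row = await_(fetch_row(key))
    return row["id"] * 2


async def main():
    return await greenlet_spawn(load_object, 21)


print(asyncio.run(main()))  # 42
```

The event loop never blocks here: while load_object() is paused inside await_(), the coroutine that spawned it is sitting in an ordinary "await", so other tasks in the loop keep running as usual.
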
So to sum up, last week, libraries like Flask and SQLAlchemy were
"fundamentally incompatible" with asyncio, and this week they are not.

What's confusing me is that I'm not that smart, and this is something all of the
affected libraries should have been doing years ago. Really, while I know
this is not going to happen, this should be part of asyncio itself, or at
least a very standard approach, so that nobody has to assume asyncio means
"rewrite all your library code".

As an extra bonus, you can use this greenlet approach to have
blocking-style functions right in the middle of your otherwise asyncio
application, which means this is also a potential solution to the
"lazy-loading" problem. You have an asyncio app that does lots of asyncio to
talk to microservices, but some functions are doing database work and they
really would like to just work in a transaction, load some objects and access
their attributes without worrying that a SQL statement can't be emitted. This
approach makes that possible as well. ORM lazy loading with the asyncpg
driver: [7]. Indeed, if you have a PostgreSQL SQLAlchemy application
already written in blocking style, you can use this new extension and drop the
entire application into the event loop and use the asyncpg driver, not too
unlike using gevent except nothing is monkeypatched.
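
The lazy-loading case can be sketched the same way. This is a toy model, not SQLAlchemy's ORM and not the example at [7]: the User class, fetch_addresses() and handler() are hypothetical, the two helpers are a stripped-down bridge of the kind described earlier in this post, and the third-party greenlet package is assumed to be installed. The point is that a plain attribute access can trigger async IO without the accessing code knowing about it.

```python
import asyncio

import greenlet  # third-party: pip install greenlet


def await_(awaitable):
    """Pass the awaitable up to the spawning coroutine and wait for a result."""
    parent = greenlet.getcurrent().parent
    if parent is None:
        raise RuntimeError("await_() called outside greenlet_spawn()")
    return parent.switch(awaitable)


async def greenlet_spawn(fn, *args, **kwargs):
    """Run plain function `fn` in a child greenlet, awaiting on its behalf."""
    child = greenlet.greenlet(fn)
    result = child.switch(*args, **kwargs)
    while not child.dead:
        try:
            value = await result
        except BaseException as exc:
            result = child.throw(exc)
        else:
            result = child.switch(value)
    return result


async def fetch_addresses(user_id):
    """Hypothetical async driver call that would emit a SQL statement."""
    await asyncio.sleep(0)
    return [f"user{user_id}@example.com"]


class User:
    """Hypothetical mapped object with one lazily loaded attribute."""

    def __init__(self, user_id):
        self.id = user_id
        self._addresses = None

    @property
    def addresses(self):
        if self._addresses is None:  # first access triggers the load
            self._addresses = await_(fetch_addresses(self.id))
        return self._addresses


def business_logic(user_id):
    """Blocking-style code: just touches attributes."""
    user = User(user_id)
    return ", ".join(user.addresses)


async def handler(user_id):
    """Async endpoint that drops into blocking-style code for database work."""
    return await greenlet_spawn(business_logic, user_id)


print(asyncio.run(handler(7)))  # user7@example.com
```

Accessing user.addresses outside of greenlet_spawn() raises RuntimeError in this sketch, which mirrors the usual constraint that a lazy load needs somewhere to run its IO.
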
The recipe is simple and so far appears to be very effective. Using greenlet
to manipulate the stack is of course "spooky", and I would assume Python devs
may argue that this would lead to hard-to-debug conditions. I've used
gevent and eventlet for many years, and while they do produce some new issues,
most of them relate to the fact that they monkeypatch existing
modules, particularly around low-level network drivers like pymysql. The
actual stack moving around within business logic doesn't seem to produce any
difficult new issues. Using plain asyncio has a lot of novel and confusing
failure modes too. Using the little bit of "spookiness" of greenlet is IMO a
lot less work than rewriting SQLAlchemy, Django ORM, Flask, urllib3, etc. from
scratch and maintaining two codebases, though.


[1] https://pypi.org/project/unasync/


[2] https://magicstack.github.io/asyncpg/current/faq.html#can-i-use-asyncpg-with-sqlalchemy-orm


[3] https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/2071


[4] https://gist.github.com/zzzeek/4e89ce6226826e7a8df13e1b573ad354


[5] https://twitter.com/miguelgrinberg/status/1279894131976921088


[5a] "Add Async Support" pallets/flask#3412 (comment)


[6] https://github.com/oremanj/greenback


[7] https://gerrit.sqlalchemy.org/plugins/gitiles/sqlalchemy/sqlalchemy/+/refs/changes/71/2071/10/examples/asyncio/greenlet_orm.py