This is a document full of notes on all the things that have hung me up getting into Python's new asyncio library. I mostly use it for tools that schlep data around between file systems, databases, and search engines for work. So lots of I/O. Usually I use gevent and Python 2.7, but recently I've been trying to get more into asyncio, and here I'm going to document all the things that I felt were important to know or that tripped me up.
The first thing you need to do is write some coroutines without asyncio. This means understanding the yield keyword and how it can be used to create generators, and then the protocol for sending values to and receiving values from generator objects.
Recall that generators are a special kind of iterator. When you define a generator like the one below, you can use it in a for loop:
def generator():
    for i in range(0, 5):
        rcvd = yield i

for i in generator():
    print(i)  # prints 0, then 1, then 2 ... then 4
What this for loop does is really syntactic sugar for the protocol used to interact with generator objects. We can tell a generator object to move forward with next. We can also provide input to the generator object with send, as follows:
g = generator()
next(g)  # advance g to the first yield
for i in range(0, 5):
    try:
        nextI = g.send(i)
        print(nextI)
    except StopIteration:
        break  # the final send exhausts g
Here we're directly providing a value to the yield expression contained within g with send. The send method also advances the generator to the next yield statement and returns the value of that yield expression. In a way, send is the most general way to interact with the generator: it both puts data in and takes data out.
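To make that two-way traffic concrete, here's a small runnable sketch (the echoing generator and its names are mine, not from any library): each send pushes a value into the paused yield expression and pulls out whatever the generator yields next.

```python
def echoing():
    total = 0
    while total < 10:
        # hand the running total out; take the next addend in
        received = yield total
        total += received

g = echoing()
outs = [next(g)]                 # advance to the first yield; receives 0
try:
    for x in [3, 4, 5]:
        outs.append(g.send(x))   # push x in, pull the next total out
except StopIteration:
    pass                         # adding 5 pushed total past 10

print(outs)  # [0, 3, 7]
```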
The semantics of yield from can be confusing because they aren't quite like yield. You really need to get this down. What does yield from do? Instead of delegating to an underlying generator manually, as follows:
def outerGen():
    g = generator()
    nextToSend = None
    while True:
        rcvd = g.send(nextToSend)  # raises StopIteration when g is exhausted
        nextToSend = yield rcvd
you can replace it with yield from, which delegates as follows:
def outerGen():
    g = generator()
    yield from g
Here outerGen is a generator that delegates to the underlying generator. Sending to outerGen wires it up to send to g for as long as g is not exhausted. The generator outerGen will be stopped on g until g is exhausted.
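A quick demonstration of that wiring (inner and received are my names): the value sent to outerGen lands in the rcvd variable inside the inner generator, untouched by outerGen itself.

```python
received = []

def inner():
    for i in range(2):
        rcvd = yield i
        received.append(rcvd)

def outerGen():
    yield from inner()

o = outerGen()
next(o)            # advance to inner's first yield
o.send("hello")    # forwarded straight through outerGen into inner

print(received)  # ['hello']
```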
Here's the confusing part. If you're familiar with non-asyncio coroutines, this form may throw you, as it threw me:
def foo():
    g = generator()
    result = yield from g
In traditional Python coroutines, the variable on the left of the =, as in result = yield f, picks up what the code on the outside has sent in with send. This is NOT the case with yield from: result is NOT the value of a send sent to foo. It's the return value of g. Anyone that calls send on a generator created from foo will have that value forwarded to g.
For example, if we define g as
def g():
    for i in range(0, 5):
        rcvd = yield i
    return "DONE"
then result in our foo above would get the value "DONE" after the inner generator is exhausted. In short, yield from locks your generator to the behavior of the underlying generator until the underlying generator is exhausted.
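Here's that end to end, with a hand-rolled driver loop standing in for an event loop (the driver is mine): every send is forwarded into g, and "DONE" only shows up as the value of the yield from expression once g is exhausted.

```python
def g():
    for i in range(0, 5):
        rcvd = yield i
    return "DONE"

def foo():
    result = yield from g()
    return result

coro = foo()
values = []
final = None
try:
    v = next(coro)
    while True:
        values.append(v)
        v = coro.send("hello")   # forwarded into g's rcvd, not into result
except StopIteration as stop:
    final = stop.value           # foo's return value rides on StopIteration

print(values)  # [0, 1, 2, 3, 4]
print(final)   # DONE
```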
Asyncio uses the form:
result = yield from (future or coroutine)
What confused me was that I wanted the coroutine to be the long-running traditional form, where somehow the inner generator would yield to the outer generator. I expected inner generators/coroutines to be piped, in a sense, to the outer generator. What tends to happen (and this is important) is that futures, and many times coroutines, are one-shot. If we recall how yield from works, result is only assigned when the future/coroutine is exhausted. This means that either
* the future is done -- this future might represent exactly one read from a file descriptor, or
* the coroutine is done. Like it returned.
So a lot of coroutines take the form
def mycoro():
    result = yield from foo()
    return 5 + result
And foo() is going to yield a future. Once the future is done, foo will resume (because send is called on the inner, locked-in, delegated coroutine). foo then returns with a result; it doesn't keep living.
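To sketch that one-shot shape without an event loop, here's a toy stand-in for a future (fake_future and the hand-driving at the bottom are my inventions, not asyncio API): it suspends once, then "completes" with a value, at which point foo and mycoro resume and return.

```python
def fake_future():
    # stand-in for a real future: one suspension point, then a result
    yield "pending"
    return 21

def foo():
    result = yield from fake_future()
    return result          # foo ends here; it doesn't keep living

def mycoro():
    result = yield from foo()
    return 5 + result

c = mycoro()
next(c)                    # run until the "future" suspends
final = None
try:
    c.send(None)           # "the future is done": resume the chain
except StopIteration as stop:
    final = stop.value

print(final)  # 26
```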
If you want to have coroutines that coexist, it seems what you actually want to do is start them on the outside and use synchronization primitives (Futures, Queues, etc.) instead of necessarily trying to wire them together. If you do wire them together directly, the practice seems to be to have the inner coroutine run to completion and return a result. For example, this form:
@asyncio.coroutine
def make_line():
    # ... you set up a stream reader to pull a line from stdin ...
    line = yield from stream_reader.readline()
    return line

@asyncio.coroutine
def process_lines():
    line = yield from make_line()
    print(line)
Of course it's a bit silly for make_line to set up a stream_reader and read one line. So it seems (and I'm going to validate this) that while make_line ought to be a single-shot coroutine, it can be a method on a class, and we can keep the associated stream_reader around for more line reading:
class LineReader:
    ...
    def __init__(self):
        self.stream_reader = ...  # set up stream reader for reading from stdin
    ...
    @asyncio.coroutine
    def make_line(self):
        line = yield from self.stream_reader.readline()
        return line

    def exhausted(self):
        ...
Then the client simply loops, yielding from make_line, and checks some logic in exhausted to determine if the stream_reader is exhausted.
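A minimal runnable sketch of that client loop. To keep it self-contained I've swapped the stdin-backed stream_reader for an in-memory list (ToyLineReader, and driving the coroutine by hand instead of through an event loop, are both my stand-ins):

```python
class ToyLineReader:
    # toy stand-in for LineReader: "reads" from a list instead of stdin
    def __init__(self, lines):
        self.lines = list(lines)

    def make_line(self):
        # single-shot coroutine: suspends once, then returns one line
        yield
        return self.lines.pop(0)

    def exhausted(self):
        return not self.lines

def process_lines(reader):
    out = []
    while not reader.exhausted():
        line = yield from reader.make_line()
        out.append(line)
    return out

# drive it by hand, the way an event loop would
c = process_lines(ToyLineReader(["a", "b", "c"]))
collected = None
try:
    next(c)
    while True:
        c.send(None)
except StopIteration as stop:
    collected = stop.value

print(collected)  # ['a', 'b', 'c']
```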