AvnerBen/blog.md (created August 28, 2021)

Parallel Programming in Python Lesson 5. Cooperative programming - synchronous

This is the fifth in a series of lessons covering the various facilities that the Python programming language offers for parallel programming, and the motivation for using each of them. In the previous lessons, we explored the applicative requirements for event-driven design and learned to distinguish the ones that really require parallel code, concentrating on the classical use cases of multi-threading and multiprocessing. In this lesson, we continue to explore the border-zone application domain of iteration over functions that are as separate as can be (e.g. Producer logic and Consumer logic), but still do not (and need not) execute concurrently, and are instead somehow squeezed into the current thread of control.

Contents:

  1. Introduction
  2. Pull iterator
  3. Iterator essentials
  4. Single Consumer adapted to pull from multiple Producers
  5. Push iterator
  6. Coroutine essentials
  7. Automatic Priming
  8. Single Producer, adapted to push to multiple Consumers (Multi-casting)
  9. Propagating the Data-sink
  10. Comparison of state-machine vs. coroutine-based solutions
  11. Exercise: Cooperative Tail Server

1. Introduction

In the previous two lessons (#3 and 4), we studied a simple "Producer/Consumer" (one-to-one) use case:

  • "The Producer sends enumerated text messages, which are received and promptly displayed by the Consumer, at two-second intervals. Both Producer and Consumer are created (and paired) by the main program, which shuts down the Producer after 20 seconds, causing the Consumer to shut down as well, resulting in exactly 10 messages displayed."

We have experimented with various "conventional" implementations of the Producer/Consumer use case: first with multi-threading and then with Python's own multiprocessing twist. All these "classical" solutions had two principles in common:

  1. The Producer and Consumer are loosely coupled - they never meet in person, communicating through a third party. We considered the following means of communication: global variable, shared variable, point-to-point message-queue, pipe, publish/subscribe and socket.
  2. The protocol is dictated by the Producer (that manufactures the goods). The Consumer complies.

The "classical" solution of deploying the Producer and Consumer on two separate threads (or processes) seems to follow from the above. If the Producer is not allowed to call the Consumer (to consume), and the Consumer is not allowed to call the Producer (to produce), and if they both loop (there are many messages, in sequence) - then they must loop in parallel! On the one hand, there is the Producer, producing as many messages as it sees fit and, on the other hand, there is the Consumer, consuming messages as long as they are available. It is left to solve where the common message (passed along the line) sits. (Hint: not inside either of the two!)

And now, to an alternative solution where the Producer and Consumer (1) loop in the same thread of control, and (2) are temporally coupled (where either the Consumer pulls directly from the production loop, or the Producer pushes directly into the consumption loop). Of course, these solutions are restricted to the subset of use cases where Producer and Consumer are never required to execute concurrently (at exactly the same time - which is undeniably thread territory). Then again, this is not much to ask, considering that almost all Producer/Consumer use cases in our universe do not really require concurrency! (By definition, sending the message must occur before receiving it. There is only scope for sending and receiving at the very same time in bizarre cases where, for example, the message is sent in packets, or multiple messages must be issued before being received, so peripheral activity on both sides may happen to occur concurrently.) Before questioning if and how this magic can be performed, there is good reason to try: (1) multi-threading/processing is resource-expensive, and (2) it introduces complexity to the design that makes the program (unnecessarily) hard to manage and extend.

The solution involves two programmatic paradigms:

  1. Cooperative processing (use of coroutines). We shall consider two design patterns: (1) Pull iterator. In each iteration, the consumption loop pulls the next message from the production loop (which pauses), uses it and pauses, etc. (2) Push iterator. In each iteration, the production loop pushes the next message to the consumption loop (and pauses). The consumption loop takes over from there (and pauses), etc. While such a solution can be built with plain procedural programming, the result is likely to be too complicated to justify the effort (which explains why such designs are not frequent in practice). Here, the programmatic feature of coroutines (functions that pass control to each other, retaining execution state) comes in handy! The Python support for coroutines makes these solutions quite simple and readable (for those who are familiar with the paradigm, of course).
  2. Substitutability (or polymorphism). These solutions seem to compromise the essential loose coupling (between Producer and Consumer) - there is no denying that this Producer is explicitly calling a method of the Consumer (or vice versa). But here comes the twist: this Producer is unaware of exactly (or even vaguely) whom it is talking to. The Producer (or vice versa) can be given any object, as long as that object implements the Consumer interface (the part that is of interest to the Producer, and as the Producer expects to find it). As we shall see, the substitutability of the iteration argument opens some interesting extensions. For example, why stop at one-to-one? The object behind the interface may be an adapter that allows us to easily implement such use cases as single Producer / multiple Consumers, multiple Producers / single Consumer and others. Or the object on the other side of the interface may be a proxy for the real object that sits in another thread (if concurrency is really needed!)

2. Pull iterator

The following example couples the Consumer with the Producer explicitly, but with a twist, using Python's support for structured iteration. If we constrain our Producer/Consumer use case to exactly one Consumer (which is quite realistic), then we are free to rephrase it in this simple form:

  • "The consumer iterates over messages from the Producer". ("Iterates" in the same way that it would pull records from a file, characters from a string or objects from a list).

This (admittedly unusual) solution, besides coupling Consumer directly with Producer (actually, its interface), also inverts the paradigm: The Consumer pulls the messages from the Producer (compared with all previous implementations in these lessons, where it has always been the Producer that pushed messages to the Consumer).

Here is an example (notes follow):

https://gist.github.com/a8bdab844463ac23909f467b6121ab74

Notes:

  1. In this design, the Producer is not a thread.
  2. A Producer method that implements structured iteration. `__iter__` is a magic method that is silently invoked by Python to implement for-loops over Producer objects. (More on this later).
  3. The yield statement suspends the loop, feeding the calling for-loop with the current element.
  4. The Consumer (a thread) is initialized with an object complying to the Iterable interface, (such as our Producer).
  5. The Consumer iterates on the messages "stored" (for all it knows) in the Producer.
  6. The main program creates the Producer and initializes the Consumer with it.

Output:

Round 1
Round 2
Round 3
Round 4
Round 5
Round 6
Round 7
Round 8
Round 9
Round 10
[End of input]
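The pull design behind this output can be sketched as follows. (A minimal sketch: the class and method names are illustrative, and the two-second delay and shutdown logic of the gist are omitted.)

```python
import threading

class Producer:
    """Iterable source of enumerated messages (illustrative names)."""
    def __init__(self, count):
        self.count = count

    def __iter__(self):
        # A generator method: each yield suspends here and feeds the for-loop.
        for n in range(1, self.count + 1):
            yield f"Round {n}"

class Consumer(threading.Thread):
    """Pulls messages from whatever iterable it is given."""
    def __init__(self, source):
        super().__init__()
        self.source = source

    def run(self):
        for message in self.source:   # silently drives Producer.__iter__
            print(message)
        print("[End of input]")

consumer = Consumer(Producer(10))     # prints Round 1 .. Round 10, then [End of input]
consumer.start()
consumer.join()
```

Note that the Consumer never names the Producer class: any iterable would do, which is exactly the substitutability exploited in section 4.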

3. Iterator/generator essentials

This sequence diagram demonstrates the synchronization by data coupling. In each iteration in the consumption loop, "to pull next message from the Producer" (implicit in the for-loop header) is design-wise coupled by "message" (dotted arc) with (and programmatically blocked by) "to yield next message" in the production side, which loops in parallel.

Unlike other languages that support structured iteration only under some strict use cases (such as extracting records from files or objects from standard containers), Python's iteration model is - in the Python spirit - as general as can be. Any object may be iterated over, provided its class complies with the "iterable" protocol, which consists of a single capability: the magic method `__iter__`. The built-in function iter, applied to an object, calls the object's magic method `__iter__` (if it has one), which returns an iterator. The loop then proceeds by calls to the iterator's `__next__`. The loop terminates when the iterator raises the StopIteration exception.

The magic method `__iter__` may be either a generator yielding the next entry or a normal function that returns an iterator. To be precise, the built-in iter always returns an iterator: when `__iter__` is a generator, the generator object it produces serves as the iterator.

An interesting feature, demonstrating the generality of the Python solution, is that both the object that produces the iterator and the iterator itself are iterable. So, when the built-in iter is presented with an object that is already an iterator, it returns it silently. This is an elegant solution, compared with other languages that will only let you iterate over a container or file, etc. (but not over just anything that happens to be iterable).
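This identity behaviour is easy to verify directly (a few illustrative lines, not from the gist):

```python
# An iterator is itself iterable: iter() hands it back unchanged.
items = iter([1, 2, 3])
assert iter(items) is items        # already an iterator: returned as-is

# A list, by contrast, produces a fresh iterator on every request.
data = [1, 2, 3]
assert iter(data) is not iter(data)

assert next(items) == 1            # the loop protocol: repeated next() calls
```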

https://gist.github.com/ba95666dce5fe3433c85b64cc8a39c7d

Notes:

  1. First, an iterator is obtained from the object of iteration (using the built-in iter, which uses the Producer's magic method `__iter__`). This gives a (default) iterator that "points" before the first message. (I.e. no message has yet been produced.)
  2. In each iteration, the built-in next attempts to obtain the next message from the Producer (using the iterator's magic method `__next__`, which unleashes the Producer's magic method `__iter__` until the next yield statement, where it pauses, returning the next message).
  3. When the Producer decides to stop producing messages (i.e. its magic method `__iter__` returns "normally" - as distinct from yielding), the iterator raises StopIteration, which terminates the consumption loop.
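The three steps above amount to the desugared for-loop, which can be spelled out explicitly. (A sketch; `drain` is a hypothetical helper, not taken from the gist.)

```python
def drain(iterable):
    """Collect all items, doing explicitly what a for-loop does implicitly."""
    results = []
    iterator = iter(iterable)        # 1. obtain an iterator from the iterable
    while True:
        try:
            item = next(iterator)    # 2. pull the next element
        except StopIteration:        # 3. raised when the producer returns
            break
        results.append(item)
    return results

assert drain("abc") == ["a", "b", "c"]
```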

While the typical usage of Python iterators is implicit within for-loops (as demonstrated above), there are occasions where we would rather take matters into our own hands and advance the iterator explicitly. Take, for example, the following function that gets the first entry (or none) from an arbitrary iterable.

https://gist.github.com/6d15e22d207cde66c17db8cee3a0b543

Notes:

  1. The function obtains an iterator from its argument, whatever it may be, and returns the next item in it (actually resulting in the first item, because Python iterators are born uninitialized).
  2. In case the iterator is empty (next fails in the first iteration), the None object is returned.

Output:

'a'
None
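Such a function reduces, in essence, to the two-argument form of the built-in next, which supplies a default instead of raising StopIteration. (A sketch; the gist's actual implementation and signature may differ.)

```python
def first(iterable, default=None):
    """Return the first item of an arbitrary iterable, or a default if empty."""
    # next() with a default swallows StopIteration and returns the default.
    return next(iter(iterable), default)

assert first("abc") == "a"   # the 'a' of the output above
assert first([]) is None     # ... and the None
```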

4. Single Consumer adapted to pull from multiple Producers

The design decision to couple the Consumer to the Producer by interface opens the design, offering some interesting extended (and useful) use-cases. But first, a demonstration that the Consumer is indeed shielded from the true identity of the Producer:

https://gist.github.com/8a7d0d3526d437ec264cc209341a8624

Notes:

  1. This Consumer is initialized with a string (which is iterated over character by character).

Output:

a
b
c
d
e
f
g
[End of input]

A more useful extension is to implement the case of single Consumer / multiple Producers. And this is done without any major change to the current implementation. Since the Consumer does not know what it really iterates on, we are going to provide it with an adapter that, on each iteration request from the Consumer, pulls the next message from an array of Producers that it hides inside!

Incidentally, this Multi Producer Adapter gives the occasion to demonstrate "low-level" usage of iteration, as discussed above.

https://gist.github.com/5190c1c7c8aa0b9d2f2be754a657e4da

Notes:

  1. The producers are provided with a unique prefix, to distinguish their output. Otherwise, there is no change to producer logic.
  2. Producer output is prefixed.
  3. The Multi Producer Adapter receives a list of producers and prepares to iterate on each of them.
  4. The Multi Producer Adapter iterates over each producer, one (next) element at a time.
  5. The delay is now the responsibility of the Multi Producer Adapter. (We do not want it to depend on the - arbitrary - number of producers.)
  6. Here, the first producer to end terminates the entire iteration. (An alternative solution is to remove the expired producer from the list and defer the termination of the production loop to when the list becomes empty).
  7. The delay is no longer the responsibility of the Consumer. Otherwise, there is no change to Consumer logic.
  8. The main program creates three producers, prefixed "a" to "c".
  9. This time the Consumer is given an adapter over three prefixed Producers.
  10. There are now multiple Producers to stop.

Output:

Round a1
Round b1
Round c1
Round a2
Round b2
Round c2
Round a3
Round b3
Round c3
Round a4
Round b4
Round c4
Round a5
Round b5
Round c5
[End of input]
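Such an adapter can be sketched as follows (names illustrative, delays omitted), demonstrating both the low-level explicit iteration and the early-termination policy of note 6:

```python
class MultiProducer:
    """Round-robin pull adapter over several iterables."""
    def __init__(self, producers):
        self.producers = producers

    def __iter__(self):
        # Low-level iteration: keep one explicit iterator per producer.
        iterators = [iter(producer) for producer in self.producers]
        while True:
            for iterator in iterators:
                try:
                    yield next(iterator)
                except StopIteration:
                    return        # first producer to expire ends the loop

def produce(prefix, count):
    """Illustrative prefixed producer (a generator function)."""
    for n in range(1, count + 1):
        yield f"Round {prefix}{n}"

producers = [produce(prefix, 5) for prefix in "abc"]
for message in MultiProducer(producers):
    print(message)                # Round a1, Round b1, Round c1, Round a2, ...
```

The Consumer of the previous sections could be handed a MultiProducer instance unchanged, since all it requires is an iterable.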

5. Push Iterator

And now, to the opposite use case, where the Producer feeds the Consumer coroutine with messages, one at a time. As in the previous solution (Consumer iterates on the Producer), control is passed between them in the obvious (procedural) way, needing no synchronization facilities. Interestingly, Python allows us to use an iterator (over a generator function) here as well, but this particular iterator pushes to the data target (instead of pulling from the data source), using dedicated syntax.

https://gist.github.com/023cc7bce5e4e51cbb5b818fbb187ec8

Notes:

  1. The Producer primes the iterator (i.e. calls next). Since a Python iterator is born uninitialized, the first next (priming) is required to position it at the start. Otherwise, the following send will result in an error!
  2. The Producer sends the message to the consumption loop, blocking.
  3. Since the send-driven loop, unlike the yield-driven one, is not terminated by the end of the loop on the other side, the output stream must be closed explicitly.
  4. The consumption loop blocks, waiting for someone (the Producer, in our case) to send the next message (implemented by yield, which delivers the message). When the message is finally received, the Producer (that sent it) remains blocked until the next yield or until consume exits. Enclosing the yield in parentheses is not essential, but is common Python practice.
  5. The delay is managed here by the Consumer.
  6. The decision of the Producer to stop sending (which, here, is not implicit) raises the GeneratorExit exception. The Consumer interprets this "exception" as the legitimate end of the loop.
  7. The main program calls consume, obtaining a Python coroutine wrapper over consume, and hands it over to the Producer. The rest of the main program is not affected by the change of protocol.
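The push design can be sketched as follows (illustrative: the delays and the enclosing Producer class of the gist are omitted, and the production loop is inlined):

```python
def consume():
    """Consumer coroutine: blocks at yield, waiting to be sent a message."""
    try:
        while True:
            message = (yield)      # note 4: blocks until the producer sends
            print(message)
    except GeneratorExit:          # note 6: raised by close() from outside
        print("[End of input]")

consumer = consume()
next(consumer)                     # note 1: priming, advance to the first yield
for n in range(1, 11):
    consumer.send(f"Round {n}")    # note 2: push a message, blocking
consumer.close()                   # note 3: explicit end of the output stream
```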

6. Coroutine essentials

The sequence diagram demonstrates the synchronization by data coupling. In each iteration of the consumption loop, "to receive next message in the Consumer" (yield) is design-wise coupled by "message" (dotted arc) with (and programmatically blocked by) "to push message to the Consumer" (send) on the production side, which is looping in parallel.

Note the difference!

  • The pull-iterator's yield produces the argument, blocks the production loop and releases the consumption loop.
  • The push-iterator's yield blocks the consumption loop, releases the production loop and, when resumed, consumes the argument.

"Calling" a dependent coroutine (like our consumer) does not yet execute the function. Instead, it returns an iterator, ready to run, somewhat resembling the thread "constructor" (for example using initialization parameters, where needed). When the coroutine is entered for the first time (when requested to push the next - that is first - output), it executes up to the first yield and pauses, a procedure known as priming. Then, each item sent from the other side releases the coroutine to "push" the pending yield value.

Of course, this mechanism will work as long as the push-iterator coroutine is blocked at a yield point, which is guaranteed during the output loop, but not the first time, when the coroutine is not yet positioned anywhere useful. This is why priming is needed. Admittedly, the functionality and motivation of generator priming are somewhat obscure. In order to fully understand it, one must delve into the history of the language. The push iterator (generator coroutine) is a late addition to Python, and was implemented over the existing (pull) iterator. To demonstrate this, it is possible to write such code as:

inp = yield outp

Although the utility of such code is dubious, and it is strongly advised against by all textbooks, it will work. (The yield will happen first and the assignment - driven by the send - last.) It all begins with the Python iteration paradigm, which does not distinguish between pointing at the current element and returning (yielding) it, as does, for example, the C++ STL. (Python is not unique in this. So does C#, among others, and it makes better sense, design-wise.) A Python iterator is born "uninitialized" and will only "point" to the first element (and return it) when required to get the next one. The first next request will fetch the first element (if any). Incidentally, it will also run through the iteration initialization procedure, if present. This works well for the original pull iterator, because it controls the iteration (the other side is blocked until it yields). However, the push iterator cannot be born "uninitialized", because it is controlled from outside! It is blocked until someone does it a favor and sends into it. When that happens, it must be positioned at some yield point, waiting for the send to come. To make that happen the first time, we must use it in its (unwanted) capacity as a pull iterator and instruct it to retrieve the next element, i.e. get to the first yield point (where it retrieves nothing, because it is given nothing), and stand ready to receive. This is an admittedly lame excuse, but that is the way it works...
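The need for priming can be demonstrated in a few lines (a deliberately minimal sketch):

```python
def sink():
    while True:
        message = (yield)          # receive-only: the push-iterator position

s = sink()
try:
    s.send("hello")                # sending before priming...
except TypeError as error:
    print(error)                   # ...fails: can't send a non-None value
                                   # to a just-started generator

next(s)                            # priming: run up to the first yield
s.send("hello")                    # now the send is accepted
```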

At the other end of the coroutine lifespan, while it is terminated by returning "normally" (as distinct from yielding), it is not a good idea for it to do so of its own accord. Typically, it is the other side that should decide when to stop pushing and terminate the loop. (In contrast with the pull iterator, which keeps sending back the goods, and whose natural duty is to stop the loop by raising StopIteration.) When the Producer is through sending messages, it signals the Consumer to stop receiving by raising GeneratorExit. This gives the Consumer the opportunity to do cleanup prior to exiting.

7. Automatic Priming

The burden of priming the generator may be generalized and delegated to a decorator, as in the example below. (Oddly enough, this is not part of the standard Python library.)

https://gist.github.com/9222efd848666653568cbdcc85c6a394

Notes:

  1. The coroutine function takes a(nother) function.
  2. Inside the coroutine decorator hides another function, called start, that is meant to wrap the received function. Since we do not know in advance what parameters the wrapped function is going to take, start settles for the widest case possible: any number of positional parameters and any number of keyword parameters.
  3. Start creates a coroutine (by calling the wrapped function). It does not need the wrapped function as a formal parameter, because it is already in its closure.
  4. The Consumer is decorated as a coroutine. The function name consume is now pointing at the function start of coroutine. (Consume is not lost. It is in the closure of start.) Calls to consume will call start (of the coroutine function) instead (as a result of the decorator above), which will happen to call consume, prime it and return the (primed) resulting coroutine object.
  5. The main program calls what it holds to be consume (actually, start of coroutine), which calls the real consume, obtaining a Python coroutine wrapper over it, primes it and returns it to the main program, which hands it over to the Producer. The rest of the main program is not affected by the change of protocol.
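The decorator itself is a well-known recipe and can be sketched as follows (illustrative names; the gist's version may differ in detail):

```python
import functools

def coroutine(function):
    """Decorator that returns an automatically-primed coroutine factory."""
    @functools.wraps(function)
    def start(*args, **kwargs):                # note 2: widest signature
        generator = function(*args, **kwargs)  # note 3: create the coroutine
        next(generator)                        # ... and prime it
        return generator
    return start

@coroutine
def consume():                     # note 4: consume now points at start
    while True:
        message = (yield)
        print(message)

consumer = consume()               # note 5: comes back already primed
consumer.send("Round 1")           # no explicit next() needed first
```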

8. Single Producer, adapted to push to multiple Consumers (Multi-cast)

This version "multi-casts" the messages of the Producer to multiple Consumers. A multicast coroutine is inserted between the Producer and its Consumers. The delay is moved to the multi-caster.

https://gist.github.com/fa0b415e71687efb9ed6c4a42f3ae0ae

Notes:

  1. The different consumers are identified by prefix.
  2. The consumer's message is prefixed accordingly.
  3. The multi-cast adapter takes a list of generators.
  4. The message is dispatched to each consumer, in turn.
  5. The delay is delegated to the multi-caster. (We would not like it to depend on the number of consumers, which is arbitrary.)
  6. Finally, the multi-cast adapter closes each consumer, in turn.
  7. The main program loads the multi-cast adapter with consumer generators, prefixed 'a' to 'c'.

Output:

a: Round 1
b: Round 1
c: Round 1
a: Round 2
b: Round 2
c: Round 2
a: Round 3
b: Round 3
c: Round 3
a: Round 4
b: Round 4
c: Round 4
a: Round 5
b: Round 5
c: Round 5
a: [End of input]
b: [End of input]
c: [End of input]
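The multi-cast adapter can be sketched as follows (illustrative names, delays omitted; the priming here is explicit, though the decorator of section 7 could do it):

```python
def multicast(consumers):
    """Coroutine that forwards each message to every consumer, in turn."""
    try:
        while True:
            message = (yield)
            for consumer in consumers:
                consumer.send(message)     # note 4: dispatch in turn
    except GeneratorExit:
        for consumer in consumers:
            consumer.close()               # note 6: propagate end-of-input

def display(prefix):
    """Prefixed consumer (note 2), standing in for the gist's Consumer."""
    try:
        while True:
            print(f"{prefix}: {(yield)}")
    except GeneratorExit:
        print(f"{prefix}: [End of input]")

consumers = [display(prefix) for prefix in "abc"]   # note 7
for consumer in consumers:
    next(consumer)                 # prime each consumer
caster = multicast(consumers)
next(caster)                       # prime the multi-caster itself
for n in range(1, 6):
    caster.send(f"Round {n}")
caster.close()
```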

9. Propagating the Data-sink

This version generalizes the display of the message by introducing a substitutable sink medium - yet another coroutine in the chain - that is propagated down the hierarchy.

https://gist.github.com/93397474e3c903d595e19601e44acc97

Notes:

  1. The producer takes two generators: the consumer and the output medium.
  2. Output is fed to the consumer, using the output medium.
  3. The consumer is fed by two arguments: the message and the target medium. (Note that, in this design, the target medium is sent again, in each message, stressing the independence of the consumer from the output medium. An alternative design would be to initialize the consumer using the output medium, assuming it does not change and does not depend upon the message).
  4. The consumer forwards the display to the given output medium.
  5. The output medium is implemented as coroutine to conform with the design. (It could as well be a function or a substitutable object)
  6. The main program initializes the producer with the output medium.
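The message/medium pairing of note 3 can be sketched as follows. (The convention of sending a (message, medium) tuple is an assumption of this sketch; the gist may encode the pair differently.)

```python
def display():
    """The output medium, implemented as a coroutine (note 5)."""
    while True:
        print((yield))

def consume():
    """Receives message/medium pairs and forwards the display (note 4)."""
    while True:
        message, medium = (yield)  # note 3: the medium travels with each message
        medium.send(message)

medium = display()
next(medium)                       # prime the output medium
consumer = consume()
next(consumer)                     # prime the consumer
consumer.send(("Round 1", medium)) # prints "Round 1"
```

Because the medium arrives with every message, the very same consumer could write each message to a different sink, which is the point of the design.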

10. Comparison of state-machine vs. coroutine-based solutions

For the next discussion, we are going to switch to the use case of building a course structure, implemented as a dictionary of lists and loaded from XML. We shall use the procedure of loading the data from XML to compare a "conventional" state-machine solution with an alternative coroutine-based solution, which may seem unusual at first, but has its charm and certainly demonstrates the paradigm.

This is the data structure instance of our example:

https://gist.github.com/adf27264a77dc4a50765c8825d060c23

XML input:

https://gist.github.com/5d3b5cf0c3ce2ad28861260aead360cf

A "conventional" version: to populate course according to tag, observing loading state

https://gist.github.com/22f4e5a029cd1b13cc822e6480e868e5

Notes:

  1. The "Course" tag resets the parsing.
  2. The "Paradigm" tag opens a new paradigm in the course, with (for now) empty course list and sets both as current.
  3. The "language" tag adds a language to the current paradigm.
  4. The end element is registered but not implemented. (Reserved for future use).

Output:

Procedural
    C
    COBOL
OO
    Python
    C++
    C#
    Java
Functional
    Clojure

A coroutine-based solution: to populate course, iterating in data hierarchy

And here is a coroutine-based XML builder, using the same parser. The builder registers begin-tag and end-tag handlers at the parser, which feed the coroutine. The coroutine is built as loop within loop within loop, true to the one-to-many-to-many course data structure.

https://gist.github.com/405f8407cd92760dd7121a147d0cc411

Notes:

  1. Course loop (normally executed once).
  2. The begin-course tag clears the course and resets the current paradigm.
  3. Paradigm loop.
  4. The begin-paradigm tag opens a new paradigm as current and clears its language list.
  5. Language loop.
  6. The language tag adds a language to the current paradigm.
  7. The end-language tag (introduced by the parser) is ignored.
  8. Next language.
  9. Next paradigm.
  10. Next course (expected end-tag).

This little gem is fascinating methodologically (although its actual utility is admittedly questionable). What we have here is what appears to the eye as traditional procedural code, but is emulated by functional programming! It is event-driven, but under the surface. The coroutine usage effectively hides the state machine from the programmer. As in the common procedural program, the control structure follows the data structure faithfully. Of course, the fidelity to the data structure accounts for the strong points of this design pattern, as well as its weak points. On the one hand, the code is crystal clear. Any beginner Python programmer can understand what this code does, even if not familiar with exactly what yield does. (In contrast with the event-driven solution, which is as detached from the data structure as can be, and is hardly readable.) On the other hand, it is not open-closed/extensible. Any change to the data structure requires physically changing the code (while the event-driven solution can do with just registering extra handlers). Still, I would recommend considering this design pattern, due to its superior readability, where the data structure is stable, or where extending the code when it does change is not an issue.

11. Exercise: Cooperative tail server

Refactor your solution of the multi-threaded tail server exercise presented in lesson 4, to use coroutines.

Here is a "schoolbook solution":

https://gist.github.com/cb17e4d0f4db080d13f5ed760b5e23ee

Output:

File 1 to watch: file1.txt
File 2 to watch: file2.txt
File 3 to watch: file3.txt
File 4 to watch: 
[Tailing "file1.txt"]
[Tailing "file2.txt"]
[Tailing "file3.txt"]
file3.txt: 1. Mon Aug 23 23:02:22 2021
file1.txt: 1. Mon Aug 23 23:02:22 2021
file2.txt: 1. Mon Aug 23 23:02:22 2021
file3.txt: 2. Mon Aug 23 23:02:27 2021
file1.txt: 2. Mon Aug 23 23:02:27 2021
file2.txt: 2. Mon Aug 23 23:02:27 2021
file3.txt: 3. Mon Aug 23 23:02:32 2021
file1.txt: 3. Mon Aug 23 23:02:32 2021
file2.txt: 3. Mon Aug 23 23:02:32 2021
file3.txt: 4. Mon Aug 23 23:02:37 2021
file1.txt: 4. Mon Aug 23 23:02:37 2021
file2.txt: 4. Mon Aug 23 23:02:37 2021
file1.txt: 5. Mon Aug 23 23:02:42 2021
file3.txt: 5. Mon Aug 23 23:02:42 2021
file2.txt: 5. Mon Aug 23 23:02:42 2021
[stopped tailing "file1.txt"]
[stopped tailing "file2.txt"]
[stopped tailing "file3.txt"]

In this solution, each Tail Watcher, as well as the Tail Server itself, lives on a thread of its own. Refactor the solution to a Tail Server that still runs on a thread of its own, but iterates over its Tail Watchers synchronously, using the pull iterator pattern. All in all, there are going to be two threads in the solution: the Tail Server and the file-touch function.

What next?

One more lesson to go... In the next lesson, we are going to consider the dispatch-based cooperative processing alternative, called async execution.

Contents:

  1. Introduction
  2. The Thread
  3. Synchronization Primitives (Multi-threading)
  4. Synchronization Primitives (Multi-processing)
  5. Cooperative Processing - synchronous - (you are here!)
  6. Cooperative Processing - asynchronous