Parallel Programming in Python
Lesson 1. Introduction
by Avner Ben, July 2021
This is the first in a series of lessons, covering the various facilities that the Python programming language offers for parallel programming, and the motivation for using each of them. The series concentrates on the use of parallelism as a software architecture idiom (as in real-time and system applications), intentionally avoiding technical usage of parallelism, such as performance optimization (which is widely covered elsewhere).
- Motivation for Event-driven, Parallel and Concurrent Design
- The "Terminator UI" example
- Timer Thread
1. The Motivation for Event-driven, Parallel and Concurrent Design
Software spends most of its time consuming input in order to produce output. Algorithms live somewhere in the middle (and give insight into the transformation; we have little use for algorithms that do nothing, leaving everything behind as they found it).
Not surprisingly, the prevailing programmatic model is of a processor that admits a sequence of discrete inputs, one at a time, leaving behind a trail of outputs. Each time the processor consumes input, it consults its current state to decide what to do with it, possibly producing output, and possibly switching to a different internal state (which encodes what to do next, with this and other input). This programmatic model is rigorous, simple, easy to implement and reliable: it will always work exactly as expected. But it has one problem: it is based upon the assumption that input is blocking. (Precisely: processing it blocks everything else.) It is synchronous. Pending input is suspended until the processor is through with the current input. The processor does not handle two inputs at the same time (among other things, because it is governed by one and only one internal state).
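The synchronous processor model described above can be sketched in a few lines. This is an illustrative example, not code from the lesson; all names are invented for the sketch:

```python
# A minimal sketch of the synchronous processor model: one internal
# state, one input at a time, each input possibly producing output and
# possibly switching the state. Input is blocking: nothing else happens
# while an input item is being processed.

def run_processor(inputs):
    state = "idle"
    outputs = []
    for item in inputs:          # consume inputs one at a time
        if state == "idle" and item == "start":
            state = "working"    # switch state, no output
        elif state == "working" and item == "stop":
            state = "idle"
            outputs.append("done")
        elif state == "working":
            outputs.append(f"processed {item}")
        # anything else is silently ignored in this trivial processor
    return outputs

print(run_processor(["start", "a", "b", "stop"]))
# -> ['processed a', 'processed b', 'done']
```

Note that the processor cannot attend to a second input while handling the current one: the single `state` variable is exactly the "one and only one internal state" limitation discussed above.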
Oops! This bears little resemblance to the real, chaotic world in which we happen to live, where some inputs, at least, may coexist and do not necessarily block! Designing programmatic solutions to real-world problems often involves an additional challenge (beyond just understanding the problem): finding smart ways to come to terms with our legacy synchronous infrastructure. Precisely: how to implement the processing of non-blocking input, given a synchronous processing infrastructure.
The obvious solution, reached independently by many programmers and taking various specific forms, is the discontinued process: to somehow divide the processing among so many synchronous partial processes. (And how to control each synchronous partial process, in turn, is something that we already know.) Identifying the process parts is often surprisingly easy. The genuine problem is how to coordinate them. Or, precisely, how to synchronize them, since this is all about timing. It is important to note that, contrary to popular notion, synchronization is rarely about partial processes unfolding at the same time. It is mainly about controlling each partial process to proceed (and suspend) at exactly the right timing (with respect to the other parts of the big process we have in mind).
Once the relationship between inputs and outputs and the transformations involving them has been analyzed, design may be reduced to the challenge of orchestrating so many partial processes (typically, programmatic functions), each responsible for producing some output and/or consuming some input - with parallelism being the exception, rather than the rule! (Because it introduces complexity, which is to be avoided unless strictly necessary!) In the trivial case, the design consists of so many challenges of the form: how to couple the input of function B with the output of function A? With luck, this challenge is solved by the introduction of function C that invokes A and then B, with the I/O somehow channeled between them. Where this straightforward, synchronous solution works, by all means take it - no parallelism needed! Parallel programming devices are required when, for various reasons, this will not do. To find that out, we must analyze the complexity of the relationship between producers (of output) and consumers (of input). But there is no need to rush. Often, what may appear to the naked eye to invite parallelism may turn out to be solved trivially in a synchronous process. Consider, for example, the "game loop" challenge.
A "first person shooter" video game presents a three-dimensional maze, in which so many monsters are progressing, each in its own pace, and they are shot at by the user (whose back we see), also progressing in own pace, with projectiles going all over (also in their own pace). In addition, these moving objects can collide, with much ceremony. One would expect the design to consist of dozens of threads (one for each monster, projectile and the user), but this is rarely necessary. It would be, had the main design challenge been how to coordinate the movement of so many moving objects. But this is a relatively technical issue, involving conventional geometry and physics, that we know how to solve procedurally. This would, still, be the case, had the problem domain been linear, but it is not! With the current computer video technology, this problem domain is digitized in quanta of frames (say, 20 per second or more). In this problem domain, the moving elements are not required to output motion, but rather the exact opposite: still frames (precisely, their part of the general frame, and in the correct covering order). So, the problem is reduced to calculating the position of each moving element by the time of the next frame ("game tick"), detecting collisions, solving them and accumulating the next frame in the correct covering order (and on time). (In a more sophisticated solution, the moving objects accumulate their data into a three-dimensional database, e.g. the "opengl" library, and the frame is rendered from there). So the design challenge is reduced to so many producers (emanating partial frames) that are blocked by a single consumer (accumulating the next frame). And this bottle-neck topology dictates the architecture. Luckily, in this problem domain, the inputs are flexible (their preparation is completely determined by the game tick) so - in the absence of good reason otherwise - parallelism is not needed. 
All frame-contributing functions may be invoked sequentially from a game-loop function, of course, accounting for the added complexity of collisions and their consequence.
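The sequential game loop described above can be sketched as follows. This is an illustrative sketch, not code from the lesson; all names are invented, collision handling is omitted, and the "frame" is just a list of strings standing in for partial frames:

```python
# A synchronous "game loop": frame-contributor callbacks are registered
# once, then invoked sequentially on each game tick, each depositing its
# part of the next frame in the correct covering order. No threads.

contributors = []                   # registered frame-contributor callbacks

def register(contributor):
    contributors.append(contributor)
    return contributor              # usable as a decorator, too

@register
def background(frame, tick):
    frame.append("maze")            # drawn first: lowest covering order

@register
def monster(frame, tick):
    frame.append(f"monster at x={tick * 2}")   # moves 2 units per tick

@register
def player(frame, tick):
    frame.append(f"player at x={tick}")        # moves 1 unit per tick

def game_loop(ticks):
    for tick in range(ticks):
        frame = []                        # accumulate the next frame
        for contribute in contributors:   # sequential, in covering order
            contribute(frame, tick)
        yield frame                       # here: hand the frame to a renderer

for frame in game_loop(2):
    print(frame)
```

This "open" registration design is the same one mentioned further below: moving-object types may be added or removed without touching the loop itself.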
Many discontinued processes do require an event-driven design, but not parallelism. The archetypal example is GUI (Graphical User Interface - i.e. windowing) systems, where the user is allowed to manipulate the screen in any order, but the software that responds to these events (obviously, consisting of so many independent event-handler functions) lives in a single thread. In this problem domain, the software is not required to respond to any two events at once! (While such an alternative architecture could - arguably - improve user experience, it is not worth the expense! Window-system users are used to having some of their input ignored. They will try again...)
Event-driven design is a prerequisite for parallel design, for at least the following reasons:
- It shares a design pattern with parallelism: the callback. It involves the paradigm shift of treating a "function" (a sequence of capabilities to be fulfilled) as an object, which may be referenced by a variable, a parameter (i.e. sent to another function for deferred invocation), or a function return value (the result of another function invocation). One registers event-handler functions at a central dispatcher, which runs the event loop where, in each cycle, the next event is discovered (if any - e.g. mouse press in the GUI loop, game tick in the game loop) and the appropriate handler function for the event (if indeed registered) is invoked, usually blocking.
- Often, careful analysis of a problem that seems to suggest parallelism turns out to call for event-driven design (in a single thread). The obvious examples are the GUI message pump and the "game loop" discussed above. (In the latter, the game loop may be implemented by registration of frame-contributor functions. This "open" design allows adding and removing moving-object types in each release, as well as adding and removing moving objects at run time, without having to rewrite switch/case constructs or maintain the burden of complex logic in the wrong place!)
- Dispatch mechanisms may be used to implement a "soft" version of (as-if) parallelism, where the latter is not really needed but its interface is. See Python's async facility, to be discussed in another lesson.
First, a short reminder of event-driven design, using registration of callbacks.
Here is a simple "scrolling menu" example (in Python 3.9, comments below): https://gist.github.com/35fd03b026e8006515874fa171a80919
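In case the gist is not at hand, here is a minimal sketch reconstructed from the commentary below. The function names (doMenu, printDoc, etc.) and the menu titles are chosen to match the comments and the sample session; the exact code in the gist may differ:

```python
from typing import Callable, List

def doMenu(options: List[Callable[[], None]]) -> None:
    "Scrolling menu: displays option titles, dispatches the selected callback."
    while True:                              # the event loop
        for number, option in enumerate(options, 1):
            print(f"{number}. {option.__doc__}")   # title = the doc string
        print("99. exit")
        try:
            selection = int(input("select: "))
        except ValueError:
            continue                         # non-numeric input: discarded
        if selection == 99:
            break                            # exit: procedural treatment
        if not 1 <= selection <= len(options):
            continue                         # out of range: discarded
        callback = options[selection - 1]    # retrieve the callback...
        callback()                           # ...and call it

def printDoc():
    "Print"
    print("Print logic to come here...")

def editDoc():
    "Edit"
    print("Edit logic to come here...")

def eraseDoc():
    "Erase"
    print("Erase logic to come here...")

# To run interactively: doMenu([printDoc, editDoc, eraseDoc])
```

The numbered comments below walk through this design point by point.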
- The doMenu function takes a list of functions ("callable" objects), each taking no arguments (indicated by the empty square brackets) and returning nothing.
- An event loop repeats virtually forever (unless stopped from inside, by the appropriate closing event or an exception).
- The menu takes the option titles from the provided functions - their doc strings. A function, like any object, features member functions and attributes. The doc string (the string at the top of the function body) is kept in the __doc__ attribute of the function object.
- The exit option gets procedural treatment. It is not implemented by a callback (though such alternative design is possible and legitimate), because its number (99) is intentionally out of bounds. (So, extra logic is required anyway).
- The dispatcher attempts to detect (and, in this case, actively obtain) the next event: precisely: a number, typed at the keyboard.
- Invalid input (non-numeric) is identified and safely discarded.
- The exit event (99) gets procedural treatment. The loop terminates.
- More invalid input (numeric, but out of range) is identified and safely discarded.
- The callback indexed by the numeric selection is retrieved to the variable "callback".
- The callback is called. (We do not have to know that the "called" object is actually a function. Any object that responds to the function-call (round brackets) operator will do!)
- The variable is just for convenience. The same code may be expressed directly (giving some headache to the uninitiated):
options[selection - 1]()
- A menu function. It is a normal Python function. Nothing indicates that it is going to be used as someone's callback.
- Functions passed as arguments (collected in a list) to the menu function. In Python there is no need to take the address of the function or "wrap" it in any way. The function, which was declared in module scope, is registered in the module's dictionary ("globals") as an object by that name. (Only the name, no argument types or immutability. Python does not support function overloading).
1. Print
2. Edit
3. Erase
99. exit
select: 1
Print logic to come here...
...
select: 99
2. The "Terminator UI" example
And here is a problem that genuinely requires parallel design: the simple Stop-compile UI. In this case, interrupting another process (which may be doing something at the same time) is the name of the game!
The problem domain: The user shall be capable of stopping the compile process, within acceptable delay, via the UI. The UI (User Interface) is displayed by the compiler, allowing the user to abort the compilation. Note that the compiler's logic is given. Admittedly, we must make some adjustment to the compiler code, but we would not like to rewrite the compiler to suit the UI! The two functional requirements (compilation and key-press leading to abort) seem pretty concurrent. Are they indeed?
A solution domain: The user is presented with the option to "press any key to stop...", following this prompt. The compiler is provided with an "abort" flag. Pressing any key signals (turns on) the abort flag. (While the proper way to do this is through a GUI, displaying a button for the user to push, the present solution is purposely downgraded, to keep the following code example simple.)
Obviously, the "stop compile UI" problem calls for a real parallel solution, because it involves a producer (terminator) and consumer (compiler) progressing in different resolutions! The terminator responds to its input (key press) immediately. The compiler responds to its relevant input (abort signaled) at a very course resolution that is completely out of the control of the terminator. Obviously, a procedural solution will not do: the terminator cannot control the compiler, the compiler cannot control the terminator and both cannot be controlled from above! In other words, this problem involves input that must not block. It is asynchronous.
Here is the big process we have in mind:
- The compiler starts
- The compiler initializes
- The terminator initializes [1]
- The Terminator listens to the keyboard (TBD) [2]
- Assume a key has (at some arbitrary point in time) been pressed:
- The terminator signals the "abort" flag in the compiler (if still relevant) [3]
- The compiler executes its functions where in each cycle:
- The compiler executes the next compilation function
- The compiler polls the "abort" flag
- Assume the "abort" flag is signaled:4
- Execution loop stops
- The compiler finalizes (either normally or aborted by user):
- The compiler finalizes the terminator
- [1] The terminator is not part of the compiler and is not initialized from it.
- [2] Exactly how to do that is to be defined. (And what is the problem?)
- [3] Non-blocking output
- [4] Non-blocking input
Of course, we cannot implement this design literally using procedural code! The first step (compiler initialization) always occurs at the start of the process. The fourth step (compiler finalization) always comes at the end, and the third step (the compiler-function execution loop) in the middle. The second step (terminator initialization) may come either before or after compiler initialization (but before the compiler-function loop). But we have a problem positioning the main function of the second part: signaling abort may occur at any time within the third step. (Actually, it may also occur before the end of compiler initialization and during finalization. Doing that would be senseless of the user, but the program must be ready to cope with it.) This analysis leaves no choice but to deploy the two functions to separate threads of control!
Here is a parallel design. It features the same required functions, but injects synchronization. Note that it makes the somewhat counter-intuitive decision to invoke the terminator in the main thread and launch the compiler in a side thread. This design decision acknowledges a limitation imposed by Python's implementation of standard I/O that favors the main thread.
- In the compiler thread of control:
- Compiler starts:
- Compiler initializes
- The compiler executes its functions where in each cycle...
- Compiler executes the next compilation function
- Compiler polls the "abort" flag
- [when "abort" signaled]: Escape compilation loop
- Compiler finalizes:
- Compiler signals the abort flag (to allow Terminator to exit properly)
- In the main thread of control:
- Compiler launched (in a separate thread of control)
- Terminator polls the keyboard periodically, where in each cycle... [1]
- Terminator attempts to detect key press (non-blocking) [2]
- [opt key pressed]: Terminator signals Compiler to "abort"
- [when abort signaled (by Terminator or Compiler itself)]: Terminator exits
- Program ends
- [1] The terminator blocks the main thread (from ending prematurely).
- [2] This replaces the naive "obvious" solution: to simply sit down and wait for the key to be pressed, signal abort and go home. But that would block the terminator, freezing the program when the compiler ends normally (when no key is pressed).
Here is a Python implementation (discussion follows):
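The implementation itself is not reproduced in this text, so here is a runnable sketch reconstructed from the commentary below. The function names (terminate, the do-nothing stand-in) follow the comments; three liberties are taken for the sake of a portable, quick demonstration: on non-Windows systems, where msvcrt is unavailable, kbhit is stubbed to never fire (so the compiler simply runs to completion); the one-second resolution described below is shortened to 0.1 s; and the final "Press [enter]" prompt is omitted:

```python
import sys
import time
import threading

try:
    from msvcrt import kbhit, getch      # available on MS-Windows
except ImportError:                      # elsewhere: stub that never fires
    def kbhit(): return False
    def getch(): return b""

abort = False                            # the compiler's abort flag (global)
POLL = 0.1                               # poll resolution (1 s in the text)

def terminate():
    "Terminator: polls the keyboard until abort is signaled, by anyone."
    global abort
    while not abort:                     # compiler may have finished already
        if kbhit():
            getch()                      # "eat" the pressed key
            abort = True                 # signal the compiler
            return
        time.sleep(POLL)

def do_nothing(title):
    "Stands in for a compiler activity: progress dots, guarded by abort."
    print(title, end="")
    for _ in range(5):
        if abort:
            return                       # stop and return prematurely
        sys.stdout.write(".")            # flushed dot by dot, not buffered
        sys.stdout.flush()
        time.sleep(POLL)
    print()

def compile_all():
    "Dummy compiler: a sequence of activities, polling the abort flag."
    global abort
    print("Dummy Compiler v1.0")
    for activity in ("initializing", "parsing", "building memory database",
                     "checking dependencies", "generating code", "finalizing"):
        do_nothing(activity)
        if abort:                        # poll the flag after each activity
            print("\nAborted by user!")
            break
    else:
        print("Compilation successful!")
    abort = True                         # release the terminator, either way

print("Press any key to abort compilation!")
compiler = threading.Thread(target=compile_all)
compiler.start()                         # compiler in a side thread
terminate()                              # terminator blocks the main thread
compiler.join()
```

The numbered comments below discuss the design decisions behind each part.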
- IDEs (such as IDLE, PyCharm, Wing etc.) replace the standard input and output with their own console, which is unlikely to behave as expected here. Even if you use an IDE to save this code, you must run it from the operating-system console (or double-click on its entry in the file explorer)!
- The program imports the threading library
- Python does not support non-blocking keyboard input, so we resort to the msvcrt module (Microsoft C runtime), available in the Python MS-Windows implementation.
- In Unix systems the implementation is somewhat more complicated (involving select), so we will skip it here for convenience.
- The compiler abort flag is implemented, in this simple example, as a global variable. (In a more robust design, one would expect it to be encapsulated in the Compiler object and accessed through a getter).
- The terminator lives in the terminate function. It iterates on polling the keyboard for input, at one-second resolution, querying kbhit (is the keyboard hit?).
- The terminator prepares for the possibility that the compiler is no longer active (having already signaled the abort flag on its own, before leaving).
- Its job done, the terminator signals abort and returns, releasing the main program to end.
- But before leaving, the terminator "eats" the character pressed by the user (to prevent it from being echoed unexpectedly when the user finally presses enter).
- The do-nothing function represents the various compiler functions, for the sake of this simple example. It outputs a progress indication every second, five times in a row, unless it finds out that abort has been signaled, which makes it stop and return prematurely.
- The dot output is flushed to the screen. (Normally, standard output is buffered and would issue the entire dot sequence at once.) This is why the standard output is used directly, rather than through the built-in print.
- Our dummy compiler goes through a sequence of six activities, fulfilled by the do-nothing function.
- The compiler iterates through its sequence of activities, guarded by the abort flag.
- Compiler functions done, and in the absence of user intervention, the compiler itself signals the abort flag, to release the terminator from blocking the program. (Otherwise, the terminator would never return, "freezing" the program.)
- Context-aware prompts are issued from the main program. The compiler knows about the abort flag, which is enough. What makes it tick (key press, in the present case) is beyond the compiler's scope (and susceptible to change!). It is in the scope of the main program, which ties all ends together.
- A Python thread takes a function and, optionally, arguments to pass to it. The thread is as yet inactive (the function is yet to be invoked). The thread function is invoked when the thread object is explicitly requested to start.
- Once the compiler is allowed to proceed on its own, the terminator is invoked, blocking the main thread. It will return when the compiler is done, either normally or by user abort (which does not concern the main program).
- Note that starting the compiler thread does not guarantee that the thread function has already been invoked when the terminator starts. But this does not matter (in this case). The user may press the keyboard even before the compiler exists. Still, the terminator will do its job (giving the compiler a short life indeed, when it finally starts).
- The terminator lives in the main thread. There is no need to launch the main thread. It is launched automatically and invokes the main program.
- Finally, proper keyboard input (enter-terminated) is requested, for a number of reasons:
- to keep the terminal window from closing prematurely, in some runtime and debug configurations that involve this.
- to give the compiler thread time to exit gracefully.
Press any key to abort compilation!
Dummy Compiler v1.0
initializing.....
parsing.....
building memory database.....
checking dependencies.....
generating code.....
finalizing.....
Compilation successful!
Press [enter] to close
Press any key to abort compilation!
Dummy Compiler v1.0
Initializing.....
Parsing....
Aborted by user!
Press [enter] to close
3. Timer Thread
Deploying the compiler to a minor thread and the terminator to the main thread has functional justification (the terminator buffers the application from the compiler, by blocking it until the latter terminates, one way or another). Still, it is rather counter-intuitive! One would expect the compiler, playing the main role, to occupy the main thread, and the terminator, playing a side and disposable role, to be exiled to a side thread. Alas, the implementation platform is standing in our way! But nothing stands in the way of the simple timer, which we are going to use as a more intuitive minor-thread example.
- The problem: A "Timer" implements the requirement to do something after a given time interval (or on a given absolute time), and regardless of what the requester is doing in the mean time.
- A solution: The obvious programmatic implementation is a thread taking a "callback" and interval. The thread's job is rather straightforward: it sleeps the given interval, invokes the callback and expires.
Here is a Python implementation (discussion follows):
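Again, the implementation is not reproduced in the text, so here is a runnable sketch reconstructed from the commentary below. All names are invented for the sketch, and the one-second snore interval is shortened to 0.1 s so the example finishes quickly:

```python
import time
import threading

awake = False                         # the flag to signal; again, global

def timer(interval, callback):
    "Launch a thread that sleeps the interval, invokes the callback, expires."
    def body():
        time.sleep(interval)
        callback()
    thread = threading.Thread(target=body)
    thread.start()
    return thread

def alarm():
    "The callback: all it does is signal the exit condition."
    global awake
    awake = True

SNORE = 0.1                           # 1 second in the text; shortened here

alarm_clock = timer(10.5 * SNORE, alarm)  # set the alarm before going to sleep
while not awake:                      # iterate on snoring until signaled
    print("Snoring...", end=" ")
    time.sleep(SNORE)
print("Oops, woke up!")
alarm_clock.join()
```

Incidentally, the standard library offers threading.Timer(interval, callback), which implements the same idea and also provides a cancel() method, addressing the premature-cancellation limitation discussed below.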
Snoring... Snoring... Snoring... Snoring... Snoring... Snoring... Snoring... Snoring... Snoring... Snoring... Oops, woke up!
- In this simple example, the flag to signal is, again, global.
- The timer thread waits for the specified time and invokes the callback.
- Note that this optimistic design does not allow cancelling the timer prematurely. A more realistic implementation would iterate on sleeping for a quantum of time, polling the exit condition.
- All the callback does is to signal the exit condition.
- The Sleeper turns on the alarm clock before going to sleep (naturally).
- The Sleeper iterates on snoring until signaled to wake up.
Admittedly, from a programmatic point of view, this algorithm could as well be written without a timer, by simply iterating on snoring and computing how much time is left. However, this makes less sense in the problem domain. You cannot require the Sleeper to consult his/her watch and make computations every second while sleeping. The impulse to wake up must come from the outside! (But it is correct to expect the Sleeper to snore while sleeping...) The current design is more realistic, and therefore extensible, scalable, etc.
In the next lesson, we are going to study the Python thread in detail. Then, we meet the classical Producer/Consumer example and iterate on implementing it using various Python facilities, from "primitive" to advanced.
- Introduction (you are here!)
- The Thread
- Synchronization Primitives (Multi-threading)
- Synchronization Primitives (Multi-processing)
- Cooperative Processing - synchronous
- Cooperative Processing - asynchronous