John Belmonte, 2022-Sep
I've started writing a toy structured concurrency implementation for the Lua programming language. Some motivations:
- use it as a simple introduction to structured concurrency from the perspective of Lua (this article)
- learn the fundamental properties of structured concurrency and how to implement them
- share code that could become the starting point for a real Lua library and framework
So what is structured concurrency? For now, I'll just say that it's a programming paradigm that makes managing concurrency (arguably the hardest problem of computer science) an order of magnitude easier in many contexts. It achieves this in ways that seem subtle to us—clearly so, since its utility didn't reach critical mass until around 20181 (just as control structures like functions, if
, and while
weren't introduced to languages until long after the first computer program). Rather than start with a long explanation, however, I'll jump into a very simple example of structured concurrency in Lua. From there, we'll build up the rationale, examples, and implementation in stages—hopefully as a series of installments.
trio = require('trio')
function await_main()
-- two parallel tasks that sleep for 2 and 3 seconds, respectively
do
local nursery <close> = trio.open_nursery()
nursery.start_soon(function()
print('child 1 start')
trio.await_sleep(2)
print('child 1 end')
end)
nursery.start_soon(function()
print('child 2 start')
trio.await_sleep(3)
print('child 2 end')
end)
print('waiting for child tasks')
end
print('done')
end
trio.run(await_main) -- launch the async event loop
A "task" is an independent thread of execution, and this example runs three of them concurrently: one main and two spawned tasks. For Lua, tasks are implemented with cooperative multitasking via coroutines. However, rather than use the coroutine API directly, we're using a higher level API that applies a hierarchy to the set of tasks. The lifetime of any spawned task is constrained by the containing code block, called a nursery. The concluding "done" won't be printed until the nursery block, and hence both of its child tasks, have completed successfully. Here is the program output:
$ time lua example_1.lua
waiting for child tasks
child 2 start
child 1 start
child 1 end
child 2 end
done
real 0m3.126s
user 0m0.043s
sys 0m0.079s
Since the child tasks run in parallel, and we're waiting for the slowest one to complete, the expected wall time is about 3 seconds, as confirmed here. "User" time is nearly zero, since the CPU doesn't do any actual work (sleep is implemented by the appropriate OS call rather than a busy-wait). Also interesting is that the "waiting for child tasks" message is printed before the children begin. This reflects the semantics of the nursery's start_soon()
method which, as its name suggests, does not immediately execute the given function. Rather, the task is queued—to be picked up once the parent task yields execution.
It may appear, at first, that structured concurrency is merely fork
/join
, which has existed for a long time. However, structured concurrency is ensuring that 1) all tasks have well-defined parents, siblings, and children, and 2) the task hierarchy mirrors the structure of the code. This has profound implications for managing concurrency. For example, when a task is cancelled or has an error, the hierarchy suggests a natural way to clean up (cancel all descendants). Moreover, since the task and code structure mirror each other, the code that spawned a task will never be out of scope when the child exits. Exceptions and return values can thus be propagated to the caller, lending to natural use of the host language's built-in control structures (especially exception handling). In the future, we'll delve into these and more aspects of structured concurrency.
At the end of the example code is an administrative detail: the run()
function, which is handed another function to execute as the root task. This is the bridge into the structured concurrency world, launching an event loop.
A last thing to mention for this example is the use of Lua's relatively new feature of "to-be-closed" variables. They allow the scope of each nursery to be carefully delineated—important since that, in turn, defines our task hierarchy. Additionally, the structured concurrency implementation will rely on these timely finalizers to achieve expected behavior regarding exceptions, cancellations, etc. (Naming convention: any function prefixed by open_
expects its return value to be held in a <close>
variable.)
In a real program, we wouldn't write so much boilerplate to create some background tasks. There might be a utility like await_all()
, which is useful when you have a heterogeneous set of tasks to run, just for their side effects. The implementation:
function await_all(...)
local nursery <close> = trio.open_nursery()
for _, await_f in ipairs({...}) do
nursery.start_soon(await_f)
end
end
Then await_main()
of the original example reduces to:
function await_main()
await_all(
function() trio.await_sleep(2) end,
function() trio.await_sleep(3) end))
print('done')
end
Note that the caller of await_all()
does not need to deal with nurseries or to-be-closed variables. Moreover, with structured concurrency, he can be confident that tasks (again, Lua coroutines) will never leak out of such a call. The concurrency details are completely encapsulated by the function.
In the future, we'll cover how timeouts and cancellation can easily be applied to such calls (or any block of async code), despite the encapsulation, and without ad-hoc function parameters or forethought from library authors.
Though it's not a strict requirement for structured concurrency, we'd like the points of context switching to always be explicit and readily visible in our programs. This keeps local code easy to reason about, and can reduce the amount of explicit locking needed.
While some languages have function coloring, such as async
/await
qualifiers, which identifies declarations and calls that can yield execution, Lua does not. Rather, we rely on naming convention: any function prefixed by await_
is awaitable, or expected to yield execution to other tasks. (These are also referred to as asynchronous, or async, functions.)
Putting this together: code between await_
calls can be treated as atomic, relative to other running tasks. For example, let's say there is websocket handler that deals with a discrete set of messages, and we'd like to maintain a count of how many times each message is seen. The information is global state, shared by an arbitrary number of concurrent connections running the same message loop. Yet, by reading the code, it's clear that the bookkeeping is safely atomic:
count_by_message = {}
function await_websocket_echo_handler(websocket)
while true do
local msg = websocket.await_receive_message()
if count_by_message[msg] == nil then
count_by_message[msg] = 0
end
count_by_message[msg] = count_by_message[msg] + 1
websocket.await_send_message(msg)
end
end
A corollary: the fewer the number awaitable calls, the easier a program is to reason about. As a consequence, API's try to minimize the number of awaitable functions (relative to synchronous functions), as it helps reduce the cognitive load of using the API. An example is the nursery API itself. Since start_soon()
is synchronous, it's possible to orchestrate the launch of a set of interrelated tasks, atomically. Otherwise, there would be a context switch each time a task was queued, and it would be necessary to consider what program state might change before queueing the next task.
When writing your own functions, there are two important rules to honor:
- any function that calls an
await_
function is itself awaitable, and should be named appropriately - if a function is named as awaitable, it should be yielding execution regardless of the code path taken. For example, if there is a short-circuit case at the start of the function, at least call
trio.await_sleep(0)
before returning. Otherwise, it's easy for users of the function to mistakenly starve the scheduler with a busy loop.
Besides awaitable calls, remember that having a nursery go out of scope will also yield execution, assuming there are pending child tasks. While this is a more subtle context switch, it's still evident from the code's structure.
So far we've seen one "blocking" API call, or explicit way for a task to yield execution: await_sleep()
. While programs with concurrency often sleep, they also need to do other things asynchronously, such as file or network I/O, as well as calling into any API itself using file or network I/O.
Typically, a structured concurrency library (or any library implementing an asynchronous event loop) will offer basic async file and network I/O that is compatible with the framework. That is the easy part. The daunting challenge becomes (re)implementing popular prototcols and client/server libraries on top of that API.
While async I/O isn't implemented yet, another common awaitable is: events. These are single-use signals with a synchronous "set" method. It's the primary means of communicating between tasks, and a building bock for higher-level synchronization primitives. For (a very contrived) example:
function await_event_example()
do
-- two child tasks wait for an event
local nursery <close> = trio.open_nursery()
local event = trio.Event()
nursery.start_soon(function()
print('child 1 waiting')
event.await()
print('child 1 got event')
end)
nursery.start_soon(function()
print('child 2 waiting')
event.await()
print('child 2 got event')
end)
trio.await_sleep(1)
event.set()
end
print('done')
end
$ time lua example_2.lua
child 2 waiting
child 1 waiting
child 1 got event
child 2 got event
done
real 0m1.118s
user 0m0.042s
sys 0m0.066s
Along with this article, I'm sharing a toy implementation of structured concurrency on Lua that just barely implements the examples here. Refer to the source code for the significant caveats (in the form of TODO's).
The module name, API, and implementation are inspired by Python Trio, one of the first structured concurrency libraries.
The module requires Lua 5.4.3 or newer, and fractional time()
and thread sleep()
functions which are not in the standard Lua build. See the source for placeholder Unix implementations using popen()
, which may need to be revised for your OS.
As already mentioned, tasks are implemented with Lua coroutines. The event loop uses the return value of coroutine.resume()
to control task scheduling. For example, to implement await_sleep()
, the task yields the desired duration. To pause until manually rescheduled later (for example, within the implementation of Event
), the task yields a sentinel value. As the library is built up, this coroutine protocol will be expanded to support propagating exceptions, cancellation, blocking on I/O events, etc.
To demonstrate a limitation of the current implementation, let's try raising an exception from a child task:
trio = require('trio')
function await_error_example()
-- child task raises an error
do
local nursery <close> = trio.open_nursery()
nursery.start_soon(function()
trio.await_sleep(1)
error('oops')
end)
end
print('done')
end
trio.run(await_error_example)
$ lua example_3.lua
lua: ./trio.lua:140: example_3.lua:9: oops
stack traceback:
[C]: in function 'assert'
./trio.lua:140: in function 'trio.run'
example_3.lua:15: in main chunk
[C]: in ?
This is not yet meeting the potential of structured concurrency, as the exception does not propagate into await_error_example()
(which spawned the error-raising task), and rather originates from the implementation's event loop. We'll tackle this in the next installment!
continue reading: Structured concurrency and Lua (part 2)
article © 2022 John Belmonte, all rights reserved
Footnotes
-
While it's argued that the core concept of structured concurrency has existed in some libraries and languages prior to the recent boom, these lacked significant parts of the story for making concurrency manageable. One is powerful, ergonomic control of timeouts and cancellation—itself dependent on structured concurrency. Another is striving for a program that follows structured concurrency as a whole, including direct and transient dependencies. Nathaniel J. Smith was first to piece this all together, with a set of articles, and corresponding Python Trio concurrency library. ↩
About
pcall()
: it's unfortunate to use wrapped functions, which are inferior to the convenience and readability of code blocks. Code blocks honorreturn
,goto
, etc. in the context of the larger function, so the imperative style is preserved. The stack traces will also be more manageable.From what I understand, you were able to implement
move_on_after()
, etc. by requiring a nursery for these constructs. Nurseries are relatively heavy-weight and imply context switching. A strong point of Trio is how light cancel scopes are, which is important because real programs will use them heavily. Inlusc
, since you're usingpcall()
anyway, I think it's possible to implement first-class cancel scopes. (As I mentioned, I couldn't with to-be-closed.) @svermeulen