This paper seeks to add support for lazy task creation and deferred execution to P0443, while also simplifying the fundamental concepts involved in asynchronous execution. It seeks to do this as a minimal set of diffs to P0443. It achieves this by replacing P0443's six Executor::*execute()
member functions with two lazy task constructors that each return a lazy Future-like type known as a "Sender". Work is actually enqueued by calling submit()
on a Sender.
P0443 presents a unified abstraction for agents of asynchronous execution. It is the result of a long collaboration between experts from many subdomains of concurrent and parallel execution, and achieved consensus within SG1 and, to some degree, LEWG. Although there were known gaps in the abstraction, there were papers in flight to address them, and for all intents and purposes P0443 seemed on-target for a TS or, possibly even C++20.
At the Spring 2018 meeting in Rapperswil, a significant design problem was identified: poor support for lazy execution, whereby executors participate in the efficient construction and optimal composition of tasks prior to those tasks being enqueued for execution. A solution was put forward by P1055: "senders" (called "deferred" in that paper) and "receivers". Senders and receivers could be freely composed into richer senders and receivers, and only when a receiver was "submitted" to a sender would that task be scheduled for execution.
P1055 represented a radical departure from -- and a conceptual simplification of -- the P0443 design. LEWG in particular found the promise of a simplified conceptualization in this space too good to pass up, and deferred the decision to merge P0443 until the alternative had been explored. There are, however, good reasons to be circumspect: a significant redesign now would almost certainly push Executors, and possibly the Networking TS which depends on it, out past C++20. Also, the authors of P1055 had not yet proved to the satisfaction of the experts in SG1 that senders and receivers could efficiently address all the use cases that shaped the design of P0443.
This paper seeks a middle way: a minimal set of changes to P0443 that add proper support for lazy execution and brings a reduction in overall complexity from the point of view of both implementors of executors and their generic consumers. The changes are limited to such a degree that it should be obvious from inspection that no functionality has been lost.
The root of the problem with P0443 and lazy asynchronous execution is that the Executor concepts' six execute member functions (oneway, twoway, and then_execute, and bulk variants) combined two things: task construction and work submission. Twoway and then_execute return futures, which are necessarily handles to already-running tasks.
- Replace the six Executor
execute
member functions with two (potentially lazy) task creation functions and onesubmit
function. - The task creation functions accept "senders" and callables, and return new senders.
- The
submit
function takes a receiver.
Additionally, define a new
Although senders and receivers seem like a new and unproven abstraction, they are really just a minor reformulation of concepts that already appear in P0443 and related papers.
Four of the the six execute
functions from P0443 return a type that satisfies the as-yet-unspecified Future
concept. A Future
, presumably, is a handle to an already-queued work item to which additional work can be chained by passing it back to an executor's (bulk_)?then_execue
function, along wih a continuation that will execute when the queued work completes.
A sender is a generalization of a future. It may be a handle to already queued work, or it may represent work that will be queued when a continuation has been attached, which is accomplished by passing a "continuation" to the sender's submit()
member function.
In P0443, a continuation was a simple callable, but that didn't give a convenient place for errors to go if the preceding computation failed. This shortcoming had already been recognized, and was the subject of P1054, which recommended promises -- which have both a value and an error channel -- as a useful abstraction for continuations. A receiver is little more than a promise, with the addition of a cancellation channel.
In short, if you squint at P0443 and P1054, the sender and receiver concepts are already there. They just weren't fully spelled out.
The real change in this proposal is to break the execute
functions up into two steps: task creation (s = ex.make_(bulk_)?value_task(...)
) and work submission (s.submit(...)
). It is not hard to see how the reformulation is almost completely isometric with the original:
auto fut2 = ex.then_execute(fut1, f);
maps cleanly to
auto sender2 = ex.make_value_task(sender1, f);
with the option to submit the work later with some continuation (possibly a promise p
):
sender2.submit(p);
There is one way in which the decomposition of execute
into make_value_task
/submit
does not quite reach parity. That way, and its mitigation in the suggested compromise design, is discussed below.
-
P0443
execute
functions return a future. The type of the future is under the executor's control. By splittingexecute
into lazy task costruction and a (void
-returning) work submission API, we take away from the executor a way to customize its preferred "eager" task handle, and thus an executor-specific way to block until the task completes.Note that P0443 didn't really have that either since the Future concept was not yet specified, but there were papers in flight to address that shortcoming. Presumably the Future concept had a way to block, or else the executor would be given a
wait
API that accepted a Future and blocked for comkpletion.Such an API could easily be added to the compromise proposal as well.
This document arose out of offline discussion largely between Lee, Bryce and David, as promised during the 2018-08-10 executors call.
The discussed compromise is that we should replace the enqueue functions (ex.*execute()
) with a limited set of lazy task factories. Any syntactic cost of providing a trivial output receiver to these operations can be hidden in wrapper functions. We do not expect a runtime cost assuming we also provide a trivial parameter-dropping receiver in the standard library against which the implementation can optimize.
Aside: The Story |
---|
The story here is that the execute functions from P0443 were conflating two things: task creation and work queuing. The conflation interfered with lazy task submission, which was the thrust of P1055. By teasing these two responsibilities apart and allowing an executor to customize them separately, we lose nothing and gain first-class support for lazy execution models. |
By passing a full set of parameters to task construction functions, any task we place in a task graph may be type erased with no loss of efficiency. There may still be some loss of efficiency if the executor is type erased before task construction because the compiler may no longer be able to see from the actual executor into the passed functions. The earlier we perform this operation, however, the more chance there is of making this work effectively.
An executor should either be a sender of a sub executor, or a one way fire and forget entity. These can be implemented in terms of each other, and so can be adapted if necessary, potentially with some loss of information.
We may require
or prefer
to switch between these.
A SenderExecutor is a Sender and meets the requirements of Sender. The interface of the passed receiver should be noexcept.
Function | Semantics |
---|---|
void submit(Receiver< SenderExecutor::SubExecutorType>) noexcept |
At some future point calls on_value(subExecutor) on the receiver on success, where subExecutor is *this or some subset of *this as useful. At some future point calls on_error(ErrorType) on failure. |
ExecutorType executor() noexcept |
Returns an executor. By default, *this . |
submit
and executor
are required on an executor that has task constructors. submit
is a fundamental sender operation that may be called by a task at any point.
To avoid deep recursion, a task may post itself directly onto the underlying executor, giving the executor a chance to pass a new sub executor. For example, if a prior task completes on one thread of a thread pool, the next task may reenqueue rather than running inline, and the thread pool may decide to post that task to a new thread. Hence, at any point in the chain the sub-executor passed out of the executor may be utilized.
The passed function may or may not be noexcept
. The behavior on an exception escaping the passed task is executor-defined.
Function | Semantics |
---|---|
void execute(void(void)) |
At some future point calls the passed callable. |
An executor may provide one of the following task construction functions at a minimum. This minimal set is intended to cover the functionality in P0443.
We may require
or prefer
an executor with support for any of these task construction functions.
In all cases, an exception that escapes the passed function will be passed to the on_error
function of the outgoing receiver.
Function | Semantics |
---|---|
SenderOf<T'> make_value_task( SenderOf<T...>, T'(T...)) |
Accepts a sender that triggers the work, returns a Sender that confirms completion of the work. T... may be empty and/or T' may be void . If the Sender calls on_error(E) or done() then those operations will propagate from Sender' |
SenderOf<R> make_bulk_value_task(SenderOf<T...>, void F(size_t, T..., R?, S), size_t, S SF(), R RF()) |
Accepts a sender that triggers the work, returns a Sender that confirms completion of the work. T... may be empty and/or R may be void . If the Sender calls on_error(E) or done() then those operations will propagate from Sender' . Other parameters are as in P0443. |
In the above SenderOf<T...>
refers to a SenderOf<T...>
or a SenderOf<SenderOf<T...>>
. The option to pass a sender through the value propagation path above allows the system to encapsulate a subexecutor in the propagation, or to provide greedy enqueue. The actual value is boxed into a sender explicitly in the type system and under control of the executor.
The type propagated through the graph here will be either T...
, as defined by the function, or a sender of T...
. The reason for this is that any executor may choose to wrap its value in some information - that information may be propagation of a sub executor, it may be asynchronous information so that the tasks can be greedily enqueued and run in some other context, out-of-band calls to on_value
. At any point the passed SenderOf<T...>
can take a receiver of T...
to collapse the Sender
back into a value, providing compatibility with an arbitrary receiver and a terminal point for asynchronous work chaining.
Required additional concepts from P1055:
Receiver<To>
: A type that declares itself to be a receiver by responding to thereceiver_t
property query.ReceiverOf<To, E, T...>
: A receiver that accepts an error of typeE
and the (possibly empty) tuple of valuesT...
. (This concept is useful for constraining aSender
'ssubmit()
member function.)Sender<From>
: A type that declares itself to be a sender by responding to thesender_t
property query.TypesSender<From>
: A type that declares itself to be a sender by responding to thesender_t
property query, returning asender_desc<E, T...>
sender descriptor that declares what types are to be passed through the receiver's error and value channels.SenderTo<From, To>
: ASender
andReceiver
that are compatible.
These concepts are defined in terms of the following global customization point objects (shown as free functions for the purpose of exposition):
These customization-points are defined in terms of an exposition-only _ReceiverLike
concept that checks only that a type responds to the receiver_t
property query.
Signature | Semantics |
---|---|
template < _ReceiverLike To > void set_done(To& to); |
Cancellation channel: Dispatches to to.set_done() if that expression is well-formed; otherwise, dispatches to (unqualified) set_done(to) in a context that doesn't contain the std::set_done customization point object. |
template < _ReceiverLike To, class E > void set_error(To& to, E&& e); |
Error channel: Dispatches to to.set_error((E&&) e) if that expression is well-formed; otherwise, dispatches to (unqualified) set_error(to, (E&&) e) in a context that doesn't contain the std::set_error customization point object. |
template < _ReceiverLike To, class... Vs > void set_value(To& to, Vs&&... vs); |
Value channel: Dispatches to to.set_value((Vs&&) vs...) if that expression is well-formed; otherwise, dispatches to (unqualified) set_value(to, (Vs&&) vs...) in a context that doesn't contain the std::set_value customization point object. |
These customization-points are defined in terms of an exposition-only _SenderLike
concept that checks only that a type responds to the sender_t
property query; and _Executor
represents a refinement of TypedSender
that is a light-weight handle to an execution context, and that sends a single value through the value channel representing another executor (or itself).
Signature | Semantics |
---|---|
template < _SenderLike From > _Executor&& get_executor(From& from); |
Executor access: sks a sender for its associated executor. Dispatches to from.get_executor() if that expression is well-formed and returns a _SenderLike ; otherwise, dispatches to (unqualified) get_executor(from) in a context that doesn't include the std::get_executor customization point object and that does include the following function:template<_Executor Exec> Exec get_executor(Exec exec) { return (Exec&&) exec; } |
template < _Executor Exec, Sender Fut, class Fun > Sender make_value_task( Exec& exec, Fut fut, Fun fun); |
Task construction (w/optional eager submission): Dispatches to exec.make_value_task((Fut&&)fut, (Fun&&)fun) if that expression is well-formed and returns a Sender ; otherwise, dispatches to (unqualified) make_value_task(exec, (Fut&&)fut, (Fun&&)fun) in a context that doesn't include the std::make_value_task customization point object.Logically, make_value_task constructs a new sender S that, when submitted with a particular receiver R , effects a transition to the execution context represented by Exec . In particular:* submit(S,R) constructs a new receiver R' that wraps R and Exec , and calls submit(fut, R') .* If fut completes with a cancellation signal by calling set_done(R') , then R' 's set_done method effects a transition to exec 's execution context and propagates the signal by calling set_done(R) .* Otherwise, if fut completes with an error signal by calling set_error(R', E) , then R' 's set_error method attempts a transition to exec 's execution context and propagates the signal by calling set_error(R, E) . (The attempt to transition execution contexts in the error channel may or may not succeed. A particular executor may make stronger guarantees about the execution context used for the error signal.)* Otherwise, if fut completes with a value signal by calling set_value(R', Vs...) , then R' 's set_value method effects a transition to exec 's execution context and propagates the signal by calling set_value(R, fun(Vs...)) -- or, if the type of fun(Vs...) is void , by calling fun(Vs...) followed by set_value(R) .Eager submission: make_value_task may return a lazy sender, or it may eagerly queue work for submission. In the latter case, the task is executed by passing to submit an eager receiver such as a promise of a continuable future so that the returned sender may still have work chained to it.Guarantees: The actual queuing of work happens-after entry to make_value_task and happens-before submit , when called on the resulting sender (see below), returns. |
template < _SenderLike From, Receiver To > void submit(From& from, To to); |
Work submission. |
There are two strong guarantees made here, to allow eager execution:
- Memory operations made before the call to a task constructor happen-before the execution of the task.
- Completion of the task happens-before a call to
set_value
on the next receiver in the chain, including the implicit receiver implied by a task constructor.
A task may therefore run at any point between constructing it, and being able to detect that it completed, for example, by seeing side effects in a subsequent task. This allows tasks to run on construction, in an eager fashion, and does not allow a user to rely on laziness.
The definition of the API does, however, allow laziness and as such no sender is guaranteed to run its contained work until a receiver is passed to its submit
method (or it is passed to a task constructor that calls submit
implicitly).
As an optional extension, we may define properties that define whether tasks are guaranteed, or allowed, to run eagerly on construction, assuming that their inputs are ready.
In the table below, where a future is returned this is represented by the promise end of a future/promise pair in the compromise version. Any necessary synchronization or storage is under the control of that promise/future pair, rather than the executor, which will often allow an entirely synchronization-free structure.
Before | After |
---|---|
executor.execute(f) |
executor.execute(f) |
executor.execute(f) |
executor.make_value_task(sender{}, f).submit(receiver{}) |
fut = executor.twoway_execute(f) |
executor.make_value_task(sender{}, f).submit(futPromise) |
fut' = executor.then_execute(f, fut) |
executor.make_value_task(fut, f).submit(fut'Promise) |
executor.bulk_execute(f, n, sf) |
executor.make_bulk_value_task(sender{}, f, n, sf, []{}).submit(receiver{}) |
fut = executor.bulk_twoway_execute(f, n, sf, rf) |
executor.make_bulk_value_task(sender{}, f, n, sf, rf).submit(futPromise{}) |
fut' = executor.bulk_then_execute(f, n, sf, rf, fut') |
executor.make_bulk_value_task(fut, f, n, sf, rf).submit(fut'Promise{}) |
If a constructed task is type erased, then it may benefit from custom overloads for known trivial receiver types to optimize. If a constructed task is not type erased then the outgoing receiver will trivially inline.
We do not expect any performance difference between the two forms of the one way execute definition. The task factory-based design gives the opportunity for the application to provide a well-defined output channel for exceptions. For example, the provided output receiver could be an interface to a lock-free exception list if per-request exceptions are not required.