@LeeHowes
Last active September 7, 2018 19:52
P1055 and P0443 - a compromise


Summary

The goal here is to provide a minimal set of changes to P0443 that allows us to aim for a clean, lazy-by-default API that does not drastically change the expectations of the P0443 authors.

I am making a few assumptions in writing this:

  • I am not worrying about bikeshedding names. Where appropriate I will use names from P0443 even if they don't really make sense any more as names. We can change them.
  • I believe we may wish to tweak the particular parameters to the bulk execute operation. We will likely also want other types of task such as a value + error task as I described in P1054. I do not worry about that here.
  • I do not expect changes to the properties mechanism except insofar as the types they return may differ slightly.

In reading this, don't try to project complexity too early. I'm trying to describe it progressively. For example, I will get to greedy enqueue, but not in the first section.

For the fundamental concepts: sender, receiver, adapter, executor and so on, please reference Kirk Shoop's document: https://gist.github.com/kirkshoop/f743929f16dd2ba2f89f8d3fe3e10d42. These are basic concepts intended to give clean fundamental interoperation between constructs. In this document I propose how we minimally modify P0443 to align with those concepts, leaving the full complexity to be built on top.

Summary of the changes

  • No change to properties other than return types.
  • Modify execute and twoway_execute and bulk variants to return pushmi-style lazy senders.
  • Modify then_execute and bulk_then_execute to take a pushmi-style sender and to return a pushmi-style lazy sender.
  • Define execute functions in terms of free adapter construction functions and customisation points that call into the executor.
  • Addition of fork and join task constructors to enter and leave a greedy work stream.
  • Addition of the pushmi sender, receiver and adapter concepts.

Executors: execution or task construction

One of the problems we have had to date is that executors encapsulate both task construction and execution. This means that the fundamental concepts are complex and are tied together with greedy enqueue, availability of handles to work and so on. It has also made a transition to lazy constructs complex.

So, let's start with the basic changes to executors. We split work dispatch and task construction.

Executor as a sender

The executor concept makes an executor a single sender that sends itself, or a sub-executor, to the set_value method of some receiver. It may also call set_error with some error that we make available to the task.

Work is dispatched by a sender when the work chain is complete. So calling submit on a terminal chain of work will dispatch the work.

As a general sender, an executor also has an executor() method available as part of the concept, that makes it possible to query the executor and propagate it through a chain of senders.
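The executor-as-sender shape described above can be sketched as follows. This is a minimal illustration under my own naming (`inline_executor`, `run_probe` are hypothetical, not from P0443 or pushmi): the executor's submit sends the executor itself into the receiver's value channel, and the executor() method exposes it for propagation.

```cpp
#include <cassert>
#include <exception>

// Hypothetical minimal executor that models the sender concept by
// sending itself (or a sub-executor) to the receiver's value channel.
struct inline_executor {
  template <class Receiver>
  void submit(Receiver r) {
    try {
      r.set_value(*this);  // "here is a context you may run work on"
    } catch (...) {
      r.set_error(std::current_exception());
    }
  }
  // Part of the sender concept: expose the executor so a chain of
  // senders can query and propagate it.
  inline_executor executor() const { return *this; }
};

// Demonstration: a receiver that records that it was given a context.
inline bool run_probe() {
  bool started = false;
  struct probe {
    bool* started;
    void set_value(inline_executor) { *started = true; }
    void set_error(std::exception_ptr) {}
  };
  inline_executor{}.submit(probe{&started});
  return started;
}
```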

Executor as a task constructor

We have had a lot of complications trying to redescribe the execution functions. I propose that instead we consider them task constructors. Each returns a pushmi-style sender. These functions should not be on the concept, but may be looked for and potentially obtained through calls to require/prefer or other mechanisms if already present.

A sender has a submit method on it that accepts a receiver with a compatible interface and causes the work to be ready to run.

Rather than taking and returning futures, execute and twoway_execute (and bulk variants) return a sender that can be chained to a receiver. then_execute also takes a sender that will provide the input.

| Before | After |
| --- | --- |
| `executor.execute([](){doSomething();})` | `executor.execute([](){doSomething();}).submit(Receiver{})` |
| `auto f = executor.twoway_execute([](){return doSomething();})` | `executor.twoway_execute([](){return doSomething();}).submit(Receiver<int>{})` |
| `auto f = executor.then_execute([](int a){return doSomething(a);}, f')` | `executor.then_execute([](int a){return doSomething(a);}, f').submit(Receiver<int>{})` |

For a void task this seems more complicated. However:

  • It is consistent with twoway_execute and then_execute structurally
  • If we are passing a trivial receiver to it, as is the common case, then this is trivially the same as the LHS except executed lazily.
  • We have the opportunity to build a receiver that accepts the done() or on_error() signal if we so wish, adding synchronization into the mix and giving us notification of task completion. This should help satisfy people who are concerned about having no ability to get a completion signal or error from a oneway task.

With a layer of code on top this becomes very simple to use.

In the twoway case the receiver is some receiver that can accept an int. That could be storage, and thus a reference to an actual int on the stack or inside a future. Remember, these are the fundamentals; the future code on top of this would wrap these primitives to return clear objects.

The final case assumes that f' is a sender already on an executor. If we need to transition between executors then a higher level library would build the transition by adding a receiver that calls submit on the executor when f' completes.
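As a concrete illustration of the twoway case, here is a minimal sketch in which the receiver is a reference to an int on the stack. The names (`int_slot_receiver`, `twoway_sender`) are my own for illustration; the sender stands in for what `executor.twoway_execute(f)` would return.

```cpp
#include <cassert>
#include <exception>

// A receiver that stores the produced int into caller-provided storage,
// as described for the twoway case above. Illustrative names only.
struct int_slot_receiver {
  int* slot;
  void set_value(int v) { *slot = v; }
  void set_error(std::exception_ptr) { /* record or rethrow as desired */ }
};

// A trivial stand-in for the sender returned by twoway_execute(f):
// nothing runs until submit is called, which is the lazy-by-default shape.
template <class F>
struct twoway_sender {
  F f;
  template <class Receiver>
  void submit(Receiver r) { r.set_value(f()); }
};

inline int demo() {
  int result = 0;
  twoway_sender<int (*)()>{[] { return 42; }}.submit(int_slot_receiver{&result});
  return result;
}
```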

On naming

Note that while I've kept the same function names on the left and right for the type constructors, this is to make sure we have well-known typing on the passed function. It's open to discussion what the precise set of task constructors we want in the standard should be, and whether they infer types or are more explicit. I'd like to focus that discussion later, though. The important thing is that these task constructors are all structurally identical - they all return a sender that operates lazily on a chain of work.

Task construction functions and a library definition

Rather than defining a chain of concepts and using require to move between them, we should define the task construction functionality in terms of task factory functions in the form of those in P1054. This style will be easier to describe expansions to. If we make an appropriate generalisation of the functions on the executors, we might build multiple task factories on top that use the same fundamentals.

At this time in P0443 we have a limited set of task functions. One of the goals of P1053 and P1054, among others, was to expand that set. For example, P0443's executor.then_execute(...) creates a value-only task, much like P1055's op::transform(...). It is also very likely that we will immediately want an error-handling task like P1054's then_error, or a task that takes a pair of callables handling value and error separately.

I propose that we define the higher level factory functions first, as these are what people will use. We should define these in terms of a customisation onto the executor. Any factory function in P0443's derivatives should be defined in terms of an executor function.

These factories are defined to return adapters that then plug together underneath. This allows the type to be aware of both the input and the output.

An adapter is a construction that allows chaining of senders.

adapter.adapt(sender)

chains a sender on the front of the adapter to build a chain.

Similarly

adapter.adapt(sender).submit(receiver)

chains a sender on the front and a receiver on the back. We can build high-level libraries on top of this to simplify use for the user.
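The adapt/submit plumbing can be sketched like so. All names here (`just_int_sender`, `transform_adapter` and friends) are hypothetical; the point is the shape: `adapt(sender)` returns a new sender that, on submit, wraps the receiver and forwards through the chain.

```cpp
#include <cassert>
#include <utility>

// A sender that produces a fixed int when submitted.
struct just_int_sender {
  int value;
  template <class Receiver>
  void submit(Receiver r) { r.set_value(value); }
};

// Receiver wrapper that applies a function before forwarding the value.
template <class Receiver, class F>
struct mapping_receiver {
  Receiver next;
  F fn;
  void set_value(int v) { next.set_value(fn(v)); }
};

// The sender produced by adapt(): chains the previous sender on the front.
template <class Sender, class F>
struct adapted_sender {
  Sender prev;
  F fn;
  template <class Receiver>
  void submit(Receiver r) {
    prev.submit(mapping_receiver<Receiver, F>{std::move(r), fn});
  }
};

// The adapter itself: adapt() plugs a sender on the front.
template <class F>
struct transform_adapter {
  F fn;
  template <class Sender>
  adapted_sender<Sender, F> adapt(Sender s) { return {std::move(s), fn}; }
};

struct store_receiver {
  int* slot;
  void set_value(int v) { *slot = v; }
};

inline int demo_chain() {
  int out = 0;
  transform_adapter<int (*)(int)> doubler{[](int v) { return v * 2; }};
  // adapter.adapt(sender).submit(receiver), as in the text above.
  doubler.adapt(just_int_sender{21}).submit(store_receiver{&out});
  return out;
}
```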

For the current P0443 functions, that means we have:

| Free function | Customisation point | Default implementation |
| --- | --- | --- |
| `Adapter execute(void(void))` | `NoneSender execute_customisation(Executor, void(void))` | `NoneSender Executor::execute(void(void))` |
| `Adapter twoway_execute(T(void))` | `SingleSender<T> twoway_execute_customisation(Executor, T(void))` | `SingleSender<T> Executor::twoway_execute(T(void))` |
| `Adapter then_execute(T(T'))` | `SingleSender<T> then_execute_customisation(Executor, SingleSender<T'>, T(T'))` | `SingleSender<T> Executor::then_execute(SingleSender<T'>, T(T'))` |

In all cases because we use the sender interface, the hook for how we get a value in or out is well-defined as a function call. For greedy enqueue see next section.

The high level function returns an adapter and is very simple. In use we might do:

execute([](){}).adapt(executor).submit(Receiver{})

Because all senders implement the Executor Sender::executor() method to pass an executor along, we can implement the adaptation as a call to the customisation point and get back a sender from the executor.
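A minimal sketch of this dispatch, with hypothetical names throughout (`one_shot_sender`, `some_executor`, the free `execute` and the default `execute_customisation` are illustrative, not P0443's wording): the free function returns an adapter, and adapt(executor) obtains the sender via the customisation point.

```cpp
#include <cassert>

// The sender the default customisation hands back: runs the callable on
// submit, then signals the value channel.
template <class F>
struct one_shot_sender {
  F fn;
  template <class Receiver>
  void submit(Receiver r) {
    fn();
    r.set_value();
  }
};

// Default customisation point: wrap the callable in a trivial sender.
// A real executor would override this (e.g. via ADL) to enqueue work.
template <class Executor, class F>
one_shot_sender<F> execute_customisation(Executor, F f) { return {f}; }

// The free task-constructor function returns an adapter...
template <class F>
struct execute_adapter {
  F fn;
  template <class Executor>
  auto adapt(Executor ex) { return execute_customisation(ex, fn); }
};

template <class F>
execute_adapter<F> execute(F f) { return {f}; }

struct done_receiver {
  bool* done;
  void set_value() { *done = true; }
};

struct some_executor {};  // placeholder executor type

inline bool demo_execute() {
  bool done = false;
  // execute(f).adapt(executor).submit(receiver), as in the text above.
  execute([] {}).adapt(some_executor{}).submit(done_receiver{&done});
  return done;
}
```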

Similarly, for the then_execute case:

then_execute([](int a){return a;}).adapt(f').submit(Receiver<int>{});

where f' is a Sender<int> that also has the executor() method, so the adapter can obtain the executor, provide it to the then_execute_customisation customisation point, and forward the call to the executor.

Note: We can have multiple different adaptor factories use the same customisation points with task wrapping if we like. For example, if we added an executor method that handled value and error as two functions, and wanted to build a variant-based one on top, we could combine and split the variant in wrapper code without having to add more customisation points to the executor.

Greediness

Greedy enqueue of work is fundamental to the GPU queuing model, among others. It is important that a work chain be able to be constructed where the knowledge of the existence of work and completion state is propagated greedily, while actual execution of work happens later. This is very different from only enqueuing work when the value is available from the previous task in the chain.

To achieve this we add a further primitive that roughly aligns with make_promise_contract in P1054 and its logical inverse.

executor.fork

This takes a sender pushing a value type V and converts it to a value type of Token<V>, where the token is an executor-defined SingleSender<V> type. The token represents the knowledge that at some point there will be a V, with enough information to know how to synchronise and obtain the value, as well as its lifetime. We could, for example, encode a queue in the token for a CUDA executor. Passing a Receiver to the token's submit method will propagate the value or error from the token at the appropriate point.

We define this as a customisation point too. The trivial default for any new executor would be to pass the value through.

| Free function | Customisation point | Default implementation |
| --- | --- | --- |
| `Adapter fork()` | `SingleSender fork_customisation(Executor, Sender)` | `SingleSender Executor::fork(Sender)` |

Errors can be passed through directly or hoisted into the asynchronous stream. An error raised at fork time and passed in the asynchronous stream can of course be caught at the point where the token is received, rather than where tasks are ready; we can then simply not enqueue, forwarding the error all the way to the end. In this way we have a very simple mechanism for skipping enqueue completely.
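The trivial fork default described above can be sketched as follows. Everything here is hypothetical naming: `immediate_token` plays the role of Token<V> for an executor with no real queue (it is just a SingleSender<V> that already holds the value), and `fork_sender` lifts a Sender<int> into a Sender<Token<int>>.

```cpp
#include <cassert>
#include <utility>

// Trivial Token<V>: a SingleSender<V> that already holds its value. A real
// executor (e.g. CUDA) would instead encode a queue and synchronisation.
template <class V>
struct immediate_token {
  V value;
  template <class Receiver>
  void submit(Receiver r) { r.set_value(value); }
};

// Stand-in upstream Sender<int>.
struct just_int {
  int value;
  template <class Receiver>
  void submit(Receiver r) { r.set_value(value); }
};

// Receiver wrapper that lifts V into Token<V> before forwarding.
template <class Receiver>
struct tokenise {
  Receiver next;
  void set_value(int v) { next.set_value(immediate_token<int>{v}); }
};

// fork: Sender<int> -> Sender<Token<int>>.
template <class Sender>
struct fork_sender {
  Sender prev;
  template <class Receiver>
  void submit(Receiver r) {
    prev.submit(tokenise<Receiver>{std::move(r)});
  }
};

// Demonstration: receive the token, then drain it to get the value back.
struct unwrap_receiver {
  int* out;
  void set_value(immediate_token<int> t) {
    struct sink {
      int* out;
      void set_value(int v) { *out = v; }
    };
    t.submit(sink{out});
  }
};

inline int demo_fork() {
  int out = 0;
  fork_sender<just_int>{just_int{7}}.submit(unwrap_receiver{&out});
  return out;
}
```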

The join adapter

This is the inverse of fork. For example, the implementation might enqueue, into the queue associated with the incoming token, a callback that propagates the value when that work completes.

There is no explicit customisation point necessary. Join returns an adapter that accepts a SingleSender<Token<V>> and returns a SingleSender<V>. The implementation of on_value will forward the passed SingleReceiver<V> directly to the token's submit method.

| Free function | Implementation |
| --- | --- |
| `Adapter join()` | An adapter that submits its outgoing receiver to the incoming token when it arrives. |
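The core of join is one line: when the token arrives on the value channel, the outgoing receiver is forwarded straight to the token's submit. A self-contained sketch, again with hypothetical names and a trivial token type:

```cpp
#include <cassert>
#include <utility>

// Trivial Token<V> for illustration: a SingleSender<V> holding its value.
template <class V>
struct immediate_token {
  V value;
  template <class Receiver>
  void submit(Receiver r) { r.set_value(value); }
};

// join's receiver: set_value(Token<V>) forwards the outgoing receiver
// directly to the token's submit method, as described in the text.
template <class Receiver>
struct joining_receiver {
  Receiver next;
  template <class Token>
  void set_value(Token t) { t.submit(std::move(next)); }
};

struct store_int {
  int* out;
  void set_value(int v) { *out = v; }
};

inline int demo_join() {
  int out = 0;
  // A Token<int> arrives from some SingleSender<Token<int>>; join unwraps it.
  joining_receiver<store_int> j{store_int{&out}};
  j.set_value(immediate_token<int>{9});
  return out;
}
```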

Intermediate senders/receivers

Each intermediate sender/receiver link constructed by a task constructor on the executor should, if the executor is capable of handling an async chain (that is, if Token<V> != V), have for any set_value(V&&) on its interface a matching set_value(Token<V>&&).

That is, each constructed task should be able to act in both a synchronous and an asynchronous mode. Note that the words 'synchronous' and 'asynchronous' here refer to when the task is processed relative to the input value, not relative to the thread that constructs the chain. An asynchronous task may be a sequence of enqueue operations where the actual task runs much later.

The reason for handling both is that it means we can fork or not fork and use the same tasks returned from an executor. A GPU task that is not forked would be effectively synchronous, in that it might cause a queue callback to the CPU after each task, while a GPU task that is forked could enqueue unlimited work before a callback. Both are valid use cases, however, and the concept of switching to asynchrony using fork/join can be orthogonal in use to bulk/non-bulk, for example.

Of course it is also valid for a given task constructor on an executor to not return a valid task in a synchronous mode, or an asynchronous mode, which would be caught at compile time.
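The dual-mode interface can be sketched as a receiver with two set_value overloads, one for the value and one for the token. The names are illustrative; the point is that the same task step runs whether the input arrives synchronously or via a token.

```cpp
#include <cassert>

// Trivial Token<V> for illustration: a SingleSender<V> holding its value.
template <class V>
struct token {
  V value;
  template <class Receiver>
  void submit(Receiver r) { r.set_value(value); }
};

// An intermediate receiver offering both modes, as described above:
// set_value(V&&) for the synchronous chain, set_value(Token<V>&&) for the
// asynchronous one. Here the "task" is just an increment.
struct dual_mode_receiver {
  int* out;
  // Synchronous mode: the value itself arrives and the task runs now.
  void set_value(int v) { *out = v + 1; }
  // Asynchronous mode: a token arrives; run the same step when it fires.
  void set_value(token<int> t) { t.submit(dual_mode_receiver{out}); }
};

inline int demo_sync() {
  int o = 0;
  dual_mode_receiver{&o}.set_value(3);
  return o;
}

inline int demo_async() {
  int o = 0;
  dual_mode_receiver{&o}.set_value(token<int>{3});
  return o;
}
```

Both paths produce the same result, which is exactly why fork/join can be orthogonal to the rest of the chain.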

Conclusion

These changes are intended to be minimal, and just raise the executor document to a level where we can agree on fundamental laziness and integration into a higher level future model we build on top.

As first steps we can consider modifying the document as described in the introduction.

As next steps we should have a focused discussion on the precise set of task construction functions and their parameterisation. We should add task constructors that can process errors in stream and discuss whether the design of bulk is optimal.
