Herein I'll use some hand-wavy implementation of the widely-suggested policy.on(executor)
where, for instance, par.on(ex)
returns something that matches a concept called ParallelPolicyOn<Executor>
. I'll also be hand-wavy about a bunch of other things (e.g., ignoring the existence of ranges), but if they become a distraction (or if someone thinks something I've ignored is relevant) I'll be happy to change them. I'm also using compact concept notation for brevity.
Oversimplified, strawman implementation of std::sort
using P0443+P1054. The second version, below this, is more analogous to what I've tried (but failed) to write using P1055-like semantics.
P0443+P1054 Authors: Is there another way to express some of the hand-wavy "missing" stuff here using what we have?
// assumptions: swap and/or comp may be expensive compared to task creation,
// in which case we do not want to be lazy (or don't want to be lazy until
// a certain depth is reached) in order to take advantage of available parallelism
auto _par_qs_impl(Executor ex, RandomIt data, size_t lo, size_t hi, Compare comp) {
if(lo - hi <= 1) {
// P0443+P1054 needs something like this:
return execution::query(ex, ready_future_factory<void>).make_ready_future();
}
auto p = ex.twoway_execute([data,lo,hi,comp]{
// copied from Wikipedia, because I didn't feel like thinking about it
auto& pivot = data[hi];
auto i = lo - 1;
for(size_t j = lo; j < hi; ++j) {
if(comp(data[j], pivot)) std::swap(data[++i], data[j]);
}
std::swap(data[i+1], data[hi]);
return i + 1;
}).share(); // share is hand-wavy; not all futures will necessarily support it,
// so something that returns a shared future should probably be a
// primitive of the model somewhere...
auto lo_done = p.then([ex,data,lo,comp](size_t p) {
// this may need to include some sort of "unwrap" notation, depending on what
// p1054 decides
return par_qs_impl(ex, data, lo, p-1, comp);
});
auto hi_done = p.then([ex,data,hi,comp](size_t p) {
// this may also need some sort of unwrap notation
return par_qs_impl(ex, data, p+1, hi, comp);
});
// when_all is still part of unresolved additions to P1054,
// so we'd have do do something like:
return lo_done.then([hi_done=std::move(hi_done)(){
// still need some sort of unwrap, possibly
return hi_done;
}]);
}
void sort(ParallelPolicyOn<Executor>&& policy, RandomIt first, RandomIt last, Compare comp) {
if(first == last) return;
std::this_thread::future_get(_par_qs_impl(policy.executor(), first, 0, last-first-1, comp));
}
Or another version that minimizes extra task creation at the expense of exposing less of the tasking structure to the executor:
// most of the same comments from the above version still apply
auto _par_qs_impl(Executor ex, RandomIt data, size_t lo, size_t hi, Compare comp) {
if(lo - hi <= 1) {
return execution::query(ex, ready_future_factory<void>).make_ready_future();
}
auto& pivot = data[hi];
auto p = lo - 1;
for(size_t j = lo; j < hi; ++j) {
if(comp(data[j], pivot)) std::swap(data[++p], data[j]);
}
std::swap(data[++p], data[hi]);
auto lo_done = ex.twoway_execute([ex,data,lo,comp,p] {
// still needs some unwrapping, except this time twoway needs to be involved
return par_qs_impl(ex, data, lo, p-1, comp);
});
auto hi_done = par_qs_impl(ex, data, p+1, hi, comp);
return lo_done.then([hi_done=std::move(hi_done)(){
return hi_done;
}]);
}
void sort(ParallelPolicyOn<Executor>&& policy, RandomIt first, RandomIt last, Compare comp) {
if(first == last) return;
std::this_thread::future_get(_par_qs_impl(policy.executor(), first, 0, last-first-1, comp));
}
Updated with input from Kirk:
auto _par_qs_impl(RandomIt data, size_t lo, size_t hi, Compare comp) {
return [data,lo,hi,comp](auto ex){
return mi::none_deferred{[ex,data,lo,hi,comp](auto out){
if(lo - hi <= 1) {
op::empty() | op::submit(out);
} else {
ex |
op::transform([](auto ex){
auto& pivot = data[hi];
auto p = lo - 1;
for(size_t j = lo; j < hi; ++j) {
if(comp(data[j], pivot)) std::swap(data[++p], data[j]);
}
std::swap(data[++p], data[hi]);
return std::tuple{
ex | _par_qs_impl(data, lo, p-1, comp),
ex | _par_qs_impl(data, p+1, hi, comp)};
}) |
op::zip(ex) |
op::submit(out);
}
};
}
void sort(ParallelPolicyOn<Executor>&& policy, RandomIt first, RandomIt last, Compare comp) {
if(first == last) return;
policy.executor() |
_par_qs_impl(first, 0, last-first-1, comp) |
op::blocking_submit();
}
I don't know for sure, but I think this has to be wrong. The recursive calls to
_par_qs_impl()
need to depend on the result of the pivot computation (that's where the argumentsp-1
andp+1
come from.I have more questions too, but let me think about them in the context of whatever changes this comment induces.