This is a summary of current discussions, and a follow-up to our recent meetings, prompted by the issue here. Anyone who's been following along with the last few weeks of discussions and presentations might wish to skip directly to the follow-up part.
Reducible control flow is the control flow directly representable through semi-structured control flow constructs (loops, conditionals, and break
/continue
). It can be characterised in terms of a property of the control flow graph that all loops are single-entry. For more details, see the diagram and "Reducibility" subsection here.
Irreducible control flow is roughly "everything else". For example, goto
into the middle of a loop would result in irreducible control flow.
WebAssembly only has semi-structured control flow constructs, and can therefore only directly encode reducible CFGs. When compiling a program with irreducible control flow to WebAssembly, every multi-entry loop must be transformed into a single-entry one, by inserting a dynamic indirection (see this blog post for a more in-depth explanation). This incurs some overhead, and the transformation process is not trivial. In LLVM this transformation is a pass called FixIrreducibleControlFlow
.
Note that many programs using goto
(for example, to jump to cleanup code) still have reducible control flow, and so incur no inherent overhead when compiled to WebAssembly. Only programs using goto
in a way which creates multi-entry loops can't be expressed without additional indirections.
Benefits:
- producer/toolchain ergonomics - people may be more likely to target Wasm if they don't need to implement their own
FixIrreducibleControlFlow
. - performance - see "Performance" sub-heading below.
Complications:
- need to compose with Wasm's existing control flow and maintain single pass validation/compilation
- need a path of adoption/implementation that doesn't cause issues for engines
This is an idiomatic way in which WebAssembly could represent irreducible control flow. It's based on the funclets proposal. As its name suggests, it is a loop (in the sense of the existing Wasm loop
construct) with multiple bodies, and within any body, one can jump to the start of any body using the regular suite of br
instructions. It was originally sketched by @rossberg.
multiloop tf^n (e* end)^n
To facilitate one-pass validation and compilation, in general we must pre-declare the types of all (n
) bodies before the code of any body.
more details
(C,labels (t1*)^n |- e* end)^n
-----------------------------------------
C |- multiloop (t1* -> t2*)^n (e* end)^n : to_bikeshed
If we go with fallthrough semantics, the overall type of the multiloop
will be the input type of the first block, and the output type of the last block, with an additional condition on intermediate blocks to make sure input and output types match up.
The formal semantics would need a little extension to describe this properly, so I'll just be informal for now. With fallthrough semantics, execution starts at the first multiloop
body, and upon reaching the end of a body, the next body is entered until every body has been executed and the multiloop
terminates.
An important point is how br
works. The De Bruijn indexing counts outwards as normal, and when a multiloop is reached, indexing counts through each body in order before continuing outwards.
For each <k = n>
, where does the inner (br k)
target?
loop ([]->[]) <k = 3>:
multiloop ([]->[]), ([]->[])
<k = 1>: <multiloop first body>
end
<k = 2>: (block ([]->[]) (br k) ... end <k = 0>:)
end
end
- the name (which I'm rather attached to, because of the analogy to Wasm's existing
loop
) - should bodies have fallthrough semantics (I'd argue yes)
- unifying
block
,loop
, andmultiloop
in formal semantics - type signature representation/binary encoding
- hook into the existing function/block type id mechanism?
- if bodies have fallthrough semantics, only need to encode the input type of each block and a single output type
- bodies which are not targetted by a (syntactically) forward jump could potentially have their type annotations elided, although this would not be idiomatic considering existing Wasm control flow constructs
We've now had two sessions of presentation and discussion on this, and we've identified some areas for follow-on discussion.
Presentation slides:
- first (meeting notes)
- second (mainly just a short discussion starter) (meeting notes TODO)
Can we find realistic examples of programs which would run faster as compiled Wasm if we added multiloop
, to motivate this feature to implementers? The current state-of-the-art is that code compiled through WebAssembly ends up with a single control flow indirection per multi-entry loop. Even programs using goto
often do so in a way that doesn't create multi-entry loops, so in principle experience no inefficiencies. However, programs compiled through LLVM could potentially have irreducible control flow introduced as a result of optimisations, even if not present in the source language.
Put another way, this is "how common is irreducible control flow in real compilation flows?". Note that it's somewhat easy to craft a program with control flow that would be inefficient in Wasm; the challenge would be to find compelling "wild" examples either in source or LLVM IR form. This motivation for multiloop
is also separate from the producer ergonomics argument, which we seemed to agree was relatively obvious.
As suggested by @kripken, if there would be issues implementing multiloop
in some Web engines, could it be more immediately implemented at (e.g.) the level of LLVM/Binaryen's Wasm object files, giving toolchains a target with less friction?
@RossTate raised the point that multiloop
could cause scoping issues for variables initialised in a flow-sensitive way (related: see the long discussion here). The default would be to simply re-let
such variables (which should be rare) in each body and rely on the stack to merge them. We can revisit if this proves to be inefficient.
@RossTate also raised the question of how multiloop
would compose with Wasm's proposed try/catch
EH constructs, with specific reference to LLVM complications which I describe below.
The obvious semantics at the Wasm level is that an uncaught exception in one body of the multiloop should break out of the multiloop as a whole. The question is whether this is semantics-preserving when compiling a source-level try/catch
. It turns out that C++ and C# disallow goto
-style jumps into try
and catch
bodies from outside. This restriction in C++ is discussed here (section 18, pdf page 394). This means that a C++ try
block that is part of a larger program is always single-entry, while a catch
block is only entered through exceptions thrown in its associated try
. Therefore, C++ try/catch
can in principle be compiled faithfully to Wasm with multiloop
, without the need for FixIrreducibleControlFlow
. The body of the try
(analogously, the catch
) becomes a multiloop
, while the Wasm try/catch
is embedded inside another outer multiloop
to handle any jumps out of the body (which are still allowed).
There is a complication with LLVM however. I'm very grateful to @aheejin for walking me through the details of this. The LLVM pipeline currently uses a Windows-style representation of exceptions which defines well-bracketed/nested and single-entry catch
blocks, but no explicit try
. Instead, possibly-throwing instructions are annotated with an "exception" label, which explicitly lists the catch
block that is jumped to if an exception occurs. Due to optimisations, when compiling C++ to LLVM IR the well-bracketed nature of the source try
blocks are not preserved, and therefore the positions of try
headers must be reconstructed when compiling LLVM IR to Wasm.
A particularly problematic shape in LLVM is the following:
start:
goto a
block a:
...
invoke f (normal: b) (throw: pad)
block b:
...
invoke f (normal: cont) (throw: pad)
pad:
...
catchret from pad to b
cont:
...
This defines a single catch
"pad", and the two possible-throwing function calls will transfer control to this pad in the case of an exception. However, the pad itself jumps back to block b
, forming a multi-entry loop with block b
as the head, where one entrance is through an exception-taking edge from block a
to pad
, and one entrance is an ordinary control flow edge from block a
to block b
.
This CFG could not directly arise from any C++ program, but can arise out of optimisation of the following weird program, eliminating the if (x)
test and redirecting the goto
of the catch through dataflow analysis:
int x = 1;
lab:
try {
if (x) {
f_maythrow();
}
f_maythrow();
}
catch {
x = 0;
goto lab;
}
This CFG couldn't be directly translated into Wasm with multiloop
and try/catch
- translating each non-pad
basic block as a multiloop
body, we'd fail to find a correct place to put the try/catch
that could both catch block a
's exception, and allow a jump to block b
. There are two solutions that come to mind.
First, accept that LLVM may still need a limited FixIrreducibleControlFlow
pass to deal with their catch blocks, even with multiloop
. The current implementation of FixIrreducibleControlFlow
doesn't fully handle the example above, but it will be extended to do this (perhaps through catch
-block duplication) irrespective of whether we pursue multiloop
. Other toolchains which preserve the well-bracketed/nested and single-entry nature of source language try/catch
can still use multiloop
without needing a FixIrreducibleControlFlow
pass, as detailed above.
The second possibility is to define a generalised try-multiloop/catch
, which is like a regular multiloop
but additionally carries a list of catch
blocks which catch an exception leaving any body, and may either rethrow to a subsequent catch block in the list (or an outer try/catch
, as normal), or alternatively jump to any regular body of the multiloop
. This generalised structure, which should capture the CFGs expressible by LLVM (modulo unwind mismatches), could be pursued instead of multiloop
, or as a post-MVP (leaving a flag in the multiloop
binary representation which is initially forced to be 0
).
It's currently idiomatic in Wasm for scoped things to be represented through explicit block constructs, which means that when interacting with multiloop
they must either wrap all multiloop
bodies, or be pushed inside one body. As we can see above, this isn't necessarily an insurmountable problem (and often reflects the invariants of source languages), but there may be other proposed block-like features which interact with multiloop
that I've not thought about.