conrad-watt/multiloop-followup.md

## multiloop-followup.md

      
    Raw
  

              multiloop-followup.md
            
          
    Irreducible control flow in Wasm

This is a summary of current discussions, and a follow-up to our recent meetings, prompted by the issue here. Anyone who's been following along with the last few weeks of discussions and presentations might wish to skip directly to the follow-up part.
What is irreducible control flow?

Reducible control flow is the control flow directly representable through semi-structured control flow constructs (loops, conditionals, and break/continue). It can be characterised in terms of a property of the control flow graph that all loops are single-entry. For more details, see the diagram and "Reducibility" subsection here.
Irreducible control flow is roughly "everything else". For example, goto into the middle of a loop would result in irreducible control flow.
WebAssembly only has semi-structured control flow constructs, and can therefore only directly encode reducible CFGs. When compiling a program with irreducible control flow to WebAssembly, every multi-entry loop must be transformed into a single-entry one, by inserting a dynamic indirection (see this blog post for a more in-depth explanation). This incurs some overhead, and the transformation process is not trivial. In LLVM this transformation is a pass called FixIrreducibleControlFlow.
Note that many programs using goto (for example, to jump to cleanup code) still have reducible control flow, and so incur no inherent overhead when compiled to WebAssembly. Only programs using goto in a way which creates multi-entry loops can't be expressed without additional indirections.
Should we generalise WebAssembly's control flow?

Benefits:

producer/toolchain ergonomics - people may be more likely to target Wasm if they don't need to implement their own  FixIrreducibleControlFlow.
performance - see "Performance" sub-heading below.

Complications:

need to compose with Wasm's existing control flow and maintain single pass validation/compilation
need a path of adoption/implementation that doesn't cause issues for engines

What is multiloop?

This is an idiomatic way in which WebAssembly could represent irreducible control flow. It's based on the funclets proposal. As its name suggests, it is a loop (in the sense of the existing Wasm loop construct) with multiple bodies, and within any body, one can jump to the start of any body using the regular suite of br instructions. It was originally sketched by @rossberg.
Abstract syntax

multiloop tf^n (e* end)^n
To facilitate one-pass validation and compilation, in general we must pre-declare the types of all (n) bodies before the code of any body.

  more details
Validation

(C,labels (t1*)^n |- e* end)^n
-----------------------------------------
C |- multiloop (t1* -> t2*)^n (e* end)^n : to_bikeshed

If we go with fallthrough semantics, the overall type of the multiloop will be the input type of the first block, and the output type of the last block, with an additional condition on intermediate blocks to make sure input and output types match up.
Execution

The formal semantics would need a little extension to describe this properly, so I'll just be informal for now. With fallthrough semantics, execution starts at the first multiloop body, and upon reaching the end of a body, the next body is entered until every body has been executed and the multiloop terminates.
An important point is how br works. The De Bruijn indexing counts outwards as normal, and when a multiloop is reached, indexing counts through each body in order before continuing outwards.
example

For each <k = n>, where does the inner (br k) target?
loop ([]->[]) <k = 3>:
  multiloop ([]->[]), ([]->[])
    <k = 1>: <multiloop first body>
  end
    <k = 2>: (block ([]->[]) (br k) ... end <k = 0>:)
  end
end


Bikeshedding


the name (which I'm rather attached to, because of the analogy to Wasm's existing loop)
should bodies have fallthrough semantics (I'd argue yes)
unifying block, loop, and multiloop in formal semantics
type signature representation/binary encoding

hook into the existing function/block type id mechanism?
if bodies have fallthrough semantics, only need to encode the input type of each block and a single output type
bodies which are not targetted by a (syntactically) forward jump could potentially have their type annotations elided, although this would not be idiomatic considering existing Wasm control flow constructs


Follow-up on irreducible control flow discussion

We've now had two sessions of presentation and discussion on this, and we've identified some areas for follow-on discussion.
Presentation slides:

first (meeting notes)
second (mainly just a short discussion starter) (meeting notes TODO)

Performance

Can we find realistic examples of programs which would run faster as compiled Wasm if we added multiloop, to motivate this feature to implementers? The current state-of-the-art is that code compiled through WebAssembly ends up with a single control flow indirection per multi-entry loop. Even programs using goto often do so in a way that doesn't create multi-entry loops, so in principle experience no inefficiencies. However, programs compiled through LLVM could potentially have irreducible control flow introduced as a result of optimisations, even if not present in the source language.
Put another way, this is "how common is irreducible control flow in real compilation flows?". Note that it's somewhat easy to craft a program with control flow that would be inefficient in Wasm; the challenge would be to find compelling "wild" examples either in source or LLVM IR form. This motivation for multiloop is also separate from the producer ergonomics argument, which we seemed to agree was relatively obvious.
Producer-side

As suggested by @kripken, if there would be issues implementing multiloop in some Web engines, could it be more immediately implemented at (e.g.) the level of LLVM/Binaryen's Wasm object files, giving toolchains a target with less friction?
Let variables

@RossTate raised the point that multiloop could cause scoping issues for variables initialised in a flow-sensitive way (related: see the long discussion here). The default would be to simply re-let such variables (which should be rare) in each body and rely on the stack to merge them. We can revisit if this proves to be inefficient.
Exception handling

@RossTate also raised the question of how multiloop would compose with Wasm's proposed try/catch EH constructs, with specific reference to LLVM complications which I describe below.
The obvious semantics at the Wasm level is that an uncaught exception in one body of the multiloop should break out of the multiloop as a whole. The question is whether this is semantics-preserving when compiling a source-level try/catch. It turns out that C++ and C# disallow goto-style jumps into try and catch bodies from outside. This restriction in C++ is discussed here (section 18, pdf page 394). This means that a C++ try block that is part of a larger program is always single-entry, while a catch block is only entered through exceptions thrown in its associated try. Therefore, C++ try/catch can in principle be compiled faithfully to Wasm with multiloop, without the need for FixIrreducibleControlFlow. The body of the try (analogously, the catch) becomes a multiloop, while the Wasm try/catch is embedded inside another outer multiloop to handle any jumps out of the body (which are still allowed).
There is a complication with LLVM however. I'm very grateful to @aheejin for walking me through the details of this. The LLVM pipeline currently uses a Windows-style representation of exceptions which defines well-bracketed/nested and single-entry catch blocks, but no explicit try. Instead, possibly-throwing instructions are annotated with an "exception" label, which explicitly lists the catch block that is jumped to if an exception occurs. Due to optimisations, when compiling C++ to LLVM IR the well-bracketed nature of the source try blocks are not preserved, and therefore the positions of try headers must be reconstructed when compiling LLVM IR to Wasm.
A particularly problematic shape in LLVM is the following:
start:
   goto a

block a:
   ...
   invoke f (normal: b) (throw: pad)

block b:
   ...
   invoke f (normal: cont) (throw: pad)

pad:
   ...
   catchret from pad to b

cont:
   ...


This defines a single catch "pad", and the two possible-throwing function calls will transfer control to this pad in the case of an exception. However, the pad itself jumps back to block b, forming a multi-entry loop with block b as the head, where one entrance is through an exception-taking edge from block a to pad, and one entrance is an ordinary control flow edge from block a to block b.
This CFG could not directly arise from any C++ program, but can arise out of optimisation of the following weird program, eliminating the if (x) test and redirecting the goto of the catch through dataflow analysis:
int x = 1;
lab:
try {
  if (x) {
    f_maythrow();
  }
  f_maythrow();
}
catch {
  x = 0;
  goto lab;
}

This CFG couldn't be directly translated into Wasm with multiloop and try/catch - translating each non-pad basic block as a multiloop body, we'd fail to find a correct place to put the try/catch that could both catch block a's exception, and allow a jump to block b. There are two solutions that come to mind.
First, accept that LLVM may still need a limited FixIrreducibleControlFlow pass to deal with their catch blocks, even with multiloop. The current implementation of FixIrreducibleControlFlow doesn't fully handle the example above, but it will be extended to do this (perhaps through catch-block duplication) irrespective of whether we pursue multiloop. Other toolchains which preserve the well-bracketed/nested and single-entry nature of source language try/catch can still use multiloop without needing a FixIrreducibleControlFlow pass, as detailed above.
The second possibility is to define a generalised try-multiloop/catch, which is like a regular multiloop but additionally carries a list of catch blocks which catch an exception leaving any body, and may either rethrow to a subsequent catch block in the list (or an outer try/catch, as normal), or alternatively jump to any regular body of the multiloop. This generalised structure, which should capture the CFGs expressible by LLVM (modulo unwind mismatches), could be pursued instead of multiloop, or as a post-MVP (leaving a flag in the multiloop binary representation which is initially forced to be 0).
Other block constructs

It's currently idiomatic in Wasm for scoped things to be represented through explicit block constructs, which means that when interacting with multiloop they must either wrap all multiloop bodies, or be pushed inside one body. As we can see above, this isn't necessarily an insurmountable problem (and often reflects the invariants of source languages), but there may be other proposed block-like features which interact with multiloop that I've not thought about.