RossTate/resumeref.md

## resumeref.md

      
    Raw
  

              resumeref.md
            
          
    A Composable, Efficient, and Expressive 2-Instruction Design for Stack-Switching

The following is a design for stack-switching that has a number of useful properties that seem to make it a good fit for the needs of this extension to WebAssembly.
I have made an effort to address all the concerns that have been expressed throughout the lifetime of the Stacks subgroup to the best of my understanding of them.
For example, I have checked that all prior proposals have a straightforward and efficient translation into this design, and I have identified a composability property that this design (but no prior proposal) satisfies.
Please let me know your thoughts, and I am happy to provide a translation if anyone requests one.
The Design

When one performs a call $func, the currently executing function is conceptually left in a suspended state: it stores its current state (e.g. the values of its locals) as a frame on the stack and provides a code address to use to resume the function.
The $func is then performed on the same stack, with that "resumption" address as its return address.
The type of that return address is (result to*), where $func has type func (param ti*) (result to*).
This design treats suspended stacks as essentially having a dangling resumption address.
The type of a suspended stack indicates the type of its dangling resumption address.
The design then provides instructions generalizing call $func by letting $func be performed on a different stack with its dangling resumption address as the return address.
Types

First, consider function types in WebAssembly: func (param ti*) (result to*). We often think of this as saying the function takes ti* arguments and returns to* results. But at the assembly level it really says the function takes a pointer to a stack with a return address expecting to* results along with ti* arguments.
That is, (result to*) really describes how the caller of the function can be resumed, so we can think of (result to*) as the required resumption type (rt) of the caller of the function.
The design of WebAssembly ensures this resumption can always occur without allocating a new stack, but this perspective offers a nice generalization for typing stack switching.
Right now the only resumption types are of the form result to*, but this perspective leaves a nice path forward for easily integrating stack-switching with potential extensions to WebAssembly that might extend what a callee can do with its caller, such as inspecting the stack or return through alternate paths.
With that in mind, this design introduces just one new type: resumeref rt.
a resumeref rt is either null or a possibly-expired reference to a stack that can be resumed according to rt (e.g. by providing values for a  result).
In other words, it is a stack with a dangling return address whose resumption type is rt.
A resumeref expires if and only if it has been resumed, ensuring that a resumption reference can be used at most once, that a stack has an unexpired resumption reference to it if and only if it is suspended, and that a suspended stack has at most one unexpired resumption reference to it.
Although resumeref is a kind of reference, it need not be a garbage-collected object; it can be implemented using a reference to a stack and a counter.
In fact, the only instructions in this design requiring memory allocation are those containing new that involve allocating a new stack.
Core Instructions

This proposal has just 2 core instructions, from which the remaining instructions can be derived .
Both instructions are constant-time, always.
resume.new

resume.new rt: [] -> [(resumeref rt)]
This instruction allocates a new stack whose root frame has resumption type rt but which will trap if it is ever resumed, meaning the new stack is conceptually empty.
resume.switch_call

The instruction resume.switch_call rt1 $func: [ti* (resumeref rt2)] -> rt1 references a function $func : [ti* (resumeref rt1)] -> rt2.
This instruction transfers control to the referenced suspended stack (trapping if the resumeref is null or expired), expires the resumeref, and calls $func on the target stack with its dangling return address as the function's return address, and with the arguments taken from the value stack and the resumeref referencing the following instruction on the now-suspended stack.
Since resumption types are a new concept, it might be helpful to illustrate what this instruction looks like using results:

resume.switch_call (result t1*) $func: [ti* (resumeref (result t2*))] -> [t1*] references a function $func : [ti* (resumeref (result t1*))] -> [t2*]

It is essentially performing call $func (which would normally return to the instruction following the call) but switching the stack-pointer register with the register containing resumeref's stack-pointer just before jumping to the head of the function.
Derived Instructions

The following instructions can all be translated into the above core instructions, but special-casing them can enable engines to perform useful optimizations.
resume.switch

resume.switch rt: [ti* (resumeref (result ti* rt))] -> rt
This instruction transfers control to the referenced suspended stack (trapping if the resumeref is null or expired), expires the resumeref, and resumes the stack with the arguments taken from the value stack and the resumeref referencing the following instruction on the now-suspended stack.
This can be translated to resume.switch_call rt with a function that simply returns its arguments.
resume.switch_drop_call

The instruction resume.switch_drop_call $func: [ti* (resumeref rt2)] -> rt1 references a function $func : [ti*] -> rt2.
This instruction transfers control to the referenced suspended stack (trapping if the resumeref is null or expired), expires the resumeref, and calls $func on the target stack with its dangling return address as the function's return address, and with the arguments taken from the value stack but not the resumeref referencing the following instruction on the now-suspended stack.
This can be translated to resume.switch_call rt1 with a function that simply drops its last argument and (tail) calls $func.
resume.switch_drop

resume.switch_drop: [ti* (resumeref (result ti*))] -> rt
This instruction transfers control to the referenced suspended stack (trapping if the resumeref is null or expired), expires the resumeref, and resumes the stack with the arguments taken from the value stack but not the resumeref referencing the following instruction on the now-suspended stack.
This can be translated to resume.switch_call rt with a function that simply drops its last argument and returns the rest.
resume.new_closure

The instruction resume.new_closure t* $func: [ti*] -> (resumeref (result t*)) references a function $func : [ti* t*] -> rt.
This instruction allocates a new stack whose root frame has resumption type rt, but which will trap if the root frame is ever resumed, and which has another frame awaiting t* values that will (tail) call $func once resumed.
This can be translated to resume.new rt followed by resume.switch_call (result (resumeref (result t*))) with a function that takes the ti* arguments and a resumeref (result (resumeref (result t*))), immediately switches back to the resumeref, and then (tail) calls $func once it is resumed.
High-Level Example: Synchronous Channels for Green Threads

To get a sense of how this can be used, consider implementing a green thread sending an i32 on a typed synchronous channel.
For context, suppose the program is supposed to trap if ever there is more than one waiting receiver or waiting sender.
Also, suppose the program has some global queue of paused threads that are simply waiting to be scheduled for a turn; this queue can be added to using $add_to_paused_queue: [(resumeref (result))] -> [] and can be removed from using $pop_from_paused_queue: [] -> [(resumeref (result))].
Channels (some channelref) would have two fields (though only one is non-null at a time): a resumeref (result i32) for a waiting receiver, and a resumeref (result (resumeref (result i32))) for a waiting sender.
For simplicity, suppose the program is supposed to trap if ever there is more than one waiting receiver or waiting sender.
The send function first needs to get a receiver to send to.
If there is a waiting receiver already on the channel, then set the local $receiver: (resumeref (result i32)) to it and null the field.
If there is no waiting receiver, then the send first needs to check if there is already a waiting sender.
If there is, then the program traps.
Otherwise, the send needs to add the current thread to the channel and yield control to some paused thread.
We can do this with resume.switch_call (result (resumeref (result i32))) $set_waiting_sender (local.get $channel) (call $pop_from_paused_queue), where $set_waiting_sender: [channelref (resumeref (result (resumeref (result i32))))] -> [] sets the waiting-sender field of its first argument to the value of its second argument and returns.
This leaves the stack in a suspended state, but---when it returns---a (resumeref (result i32)) will be on the value stack.
So the following instruction can be simply local.set $receiver.
Now that $receiver has been initialized, one way or another, the send needs to resume the receiver with the message.
This leaves the sender in a suspended state; the suspender does not need any data, it just needs to be given a turn to execute.
As such, it also needs to add itself to the queue of paused threads.
We can do all this with the instruction resume.switch_call $add_to_paused_queue_then_return (result) (local.get $message) (local.get $waiting_receiver), where $add_to_paused_queue_then_return : [i32 (resumeref (result))] -> [i32] calls $add_to_paused_queue with its second argument and then returns its first argument.
This leaves the sending thread in a suspended state.
When it gets resumed, no values will be on the value stack.
At this point, the send function simply returns.
Notice that this example does not use any derived instructions.
In particular, resume.switch is not expressive enough to support such a simple implementation of send; only the resume.switch_called functions have the critical ability to do something with the suspending stack before transferring control to the target resumption point, such as putting it in a channel or adding it to a queue.
Composability Guarantee

Original WebAssembly satisfies a useful "control abstraction" property.
In short, a WebAssembly instance that just imports and exports (numeric) functions is a blackbox except for how control transfers across those imported and exported functions.
That is, you can see that after you call its export "foo" with argument "5" it calls import "bar" with argument "6", and after that import returns with result "7" its export returns with result "8"; however, you cannot tell anything else about its internal state or how it is implemented.
In particular, how its functions call each other internally is completely invisible to the outside world except for those bits of explicit interaction with the outside world.
This means you can formally specify a WebAssembly instance's behavior entirely through these explicit I/O interactions.
It also means you can completely predict how two WebAssembly instances will compose knowing only their I/O interactions.
That is a nice composability guarantee to have!
This is the first presented stack-switching design to preserve WebAssembly's existing control-abstraction guarantee.
This means that this is the first design where WebAssembly programs are free to implement their intended explicit I/O interactions without worrying about how they implement those interaction affecting how they compose with other WebAssembly programs.
The fibers design violates control abstraction for two reasons.
One is persistent references: a fiberref stays valid throughout the lifetime of the fiber, even if it is suspended and resumed multiple times.
As such, if you call an import "bar" on some fiber F you have a reference to and you somehow get control back (e.g. "bar" calls some export of yours), then whether F or some other fiber gets released when you use fiber.release depends on whether or not "bar" switched to some other fiber.
Another is hierarchy: there is an explicit hierarchy between fibers that determines whom their root functions return to, and the switching functions trap if they modify this hierarchy in some potentially unsound way.
In the above example, "bar" might change the hierarchy (e.g. by suspending up to some other fiber it has a reference to), which in turn affects whether a subsequent switching instruction will trap or not.
The typed-continuations design violates control abstraction due to its use of dynamic scope (and the corresponding hierarchy).
One instance might expect that a callback it provides to some export will be called within the dynamic scope that export was called within.
But if that exported function is also implemented by an instance using typed continuations, the callback might be called in some handler further up the stack.
It is known that even type systems that ensure all effects are handled fail to ensure composability guarantees due to such issues with dynamic scope.
But what about undelimited continuations?

Undelimited continuations have poor composability behavior.
However, this design does not use undelimited continuations, and as such these issues with undelimited continuations are irrelevant.
In particular, undelimited continuations assume all continuations can return implicitly to the root of the program.
For example, with undelimited continuations any part of the program can make the entire program return "5".
With the design presented here, a part of the program can only transfer control to points in the program that it has been given explicit access to, and there is no resumeref for the "root" of the program.
A WebAssembly instance can provide a global that refers to its local root, i.e. it creates a resumeref when its main export is entered, but there is no way it can transfer control elsewhere without another instance providing it a resumeref.
Detailed Example: Wasmtime Fibers

Wasmtime fibers are a form of statically typed, lexically scoped, and delimited continuations.
The API is designed around the expectation that FiberStacks are a memory resource that is explicitly allocated prior to creating a Fiber on a FiberStack.
The benefits of FiberStacks, such as explicit stack-memory sizes, are not well-suited to WebAssembly, where this would be a major leak of implementation-dependent behaviors.
As such, the API is simplified to the following (where type parameters are monomorphized during compilation):
Rust types:
Fiber<Resume,Yield,Return>
Suspend<Resume,Yield,Return>

Operations:
<Arguments,Resume,Yield,Return,root: (Arguments,Resume,Suspend<Resume,Yield>)->Return> newFiber(arguments: Arguments): Fiber<Resume,Yield,Return>
<Resume,Yield,Return> resume(fiber: &mut Fiber<Resume,Yield,Return>, Resume): Result<Return,Yield>
<Resume,Yield,Return> suspend(suspender: &Suspend<Resume,Yield,Return>, yield: Yield): Resume

One key detail is the return type of resume: Result<Return,Yield>.
This type is lowered into a flattened tagged union of Return and Yield.
This is why Suspend and suspend are parameterized by Return even though it seems unnecessary according to the interface: the implementation needs to know what Return is in order to coerce a Yield value into a Result<Return,Yield> value.
For Rust types Return and Yield, we will use pseudo-instructions to_OK_Return_Yield : [Return] -> [Result<Return,Yield>] and to_Err_Return_Yield : [Yield] -> [Result<Return,Yield>].
Also, it is in error to resume a fiber after a previous resume returned an Ok (i.e. it returned from its root function); for simplicity, we will just trap in this case.
A Rust type can lower down to a sequence of multiple wasm types.
For simplicity, I will treat some Rust types as if they were a wasm type, but only in contexts where there is an obvious way to make the usage work for a sequence of wasm types.
Rust generally needs to be implemented using linear memory and tables, but for simplicity we will use GC-like types.
Note: as a convenience, the following uses pseudo-instructions locals.get $local* and locals.set $local*.
Fiber<Resume,Yield,Return> and Suspend<Resume,Yield,Return>

The Rust type Fiber<Resume,Yield,Return> is implemented as a pair of fields: $parent : (mutref (resumeref (result Result<Return,Yield> (resumeref (result Resume))))) and $suspension: (resumeref (result Resume)).
Both of these are non-null, but dereferencing $parent will always provide a null value.
The Rust type Suspend<Resume,Yield,Return> is implemented as a non-null (readref (resumeref (result Result<Return,Yield> (resumeref (result Resume))))), i.e. a read-only reference to a cell storing whom to yield the result to.
Dereferencing this will also always result in a non-null value.
newFiber(arguments: Arguments): Fiber<Resume,Yield,Return>

The implementation of newFiber is specialized for the three type parameters Resume/Yield/Return, the argument type for the closure Argument, and the function $root : [Arguments Resume Suspend<Resume,Yield,Return>] -> Return that will be first run on the fiber:
(func $newFiber_Resume_Yield_Return_Arguments_root
    (param $arguments Arguments)
    (result (mutref (resumeref (result Result<Return,Yield> (resumeref (result Resume))))) (resumeref (result Resume)))
    (local $new_parent (mutref (resumeref (result Result<Return,Yield> (resumeref (result Resume))))))
    (local $new_stack (resumeref (result Resume)))
  (local.set $new_parent (mutref.new (resumeref (result Result<Return,Yield> (resumeref (result Resume)))) (ref.null)))
  (local.set $stack
    (resume.new_closure
      Suspend
      $newFiber_Resume_Yield_Return_Arguments_root_helper
      (locals.get $new_parent $arguments)
    )
  )
  (return (local.get $parent) (local.get $stack))
)

(func $newFiber_Resume_Yield_Return_Arguments_root_helper
    (param $arguments Arguments)
    (param $message Resume)
    (param $suspend Suspend<Resume,Yield,Return>)
    (local $result Return)
    (local $current_parent (resumeref (result Result<Return,Yield> (resumeref (result Resume)))))
  (local.set $result (call $root (locals.get $arguments $message $suspend))
  (local.set $current_parent (ref.get $suspend))
  (resume.switch_drop (to_Ok_Return_Yield (local.get $result)) (ref.null))
)

resume(fiber: &mut Fiber<Resume,Yield,Return>, Resume): Result<Return,Yield>

The implementation of resume is specialized for the three type parameters Resume/Yield/Return:
(func $resume_Resume_Yield_Return
    (param $fiber (mutref Fiber<Resume,Yield,Return>))
    (param $message Resume)
    (result Result<Return,Yield>)
    (local $target (resumeref (result Resume)))
    (local $fiber_parent (mutref (resumeref (result Result<Return,Yield> (resumeref (result Resume))))))
    (local $result Result<Return,Yield>)
  (local.set $target (struct.get $suspension (local.get $fiber)))
  (struct.set $suspension (local.get $fiber) (ref.null))
  (local.set $fiber_parent (struct.get $parent (local.get $fiber)))
  (locals.set $result $target
    (resume.switch_call
      (result Result<Return,Yield> (resumeref (result Resume)))
      $resume_Resume_Yield_Return_helper
      (locals.get $fiber_parent $message $target)
    )
  )
  (struct.set $suspension (locals.get $fiber $target))
  (ref.set (local.get $fiber_parent) (ref.null))
  (return (local.get $result))
)

(func $resume_Resume_Yield_Return_helper
    (param $fiber_parent (mutref (resumeref (result Result<Return,Yield> (resumeref (result Resume))))))
    (param $message Resume)
    (param $suspending (resumeref (result Result<Return,Yield> (resumeref (result Resume)))))
    (result Resume)
  (ref.set (local.get $fiber_parent) (local.get $suspending))
  (return (local.get $message))
)

suspend(suspender: &Suspend<Resume,Yield,Return>, yield: Yield): Resume

The function for suspend is specialized for the three type parameters Resume/Yield/Return:
(func $suspend_Resume_Yield_Return
    (param $suspend (readref (resumeref (result Result<Return,Yield> (resumeref (result Resume))))))
    (param $message Yield)
    (result Resume)
    (local $current_parent (resumeref (result Result<Return,Yield> (resumeref (result Resume)))))
  (local.set $current_parent (ref.get (local.get $suspend)))
  (return (resume.switch (result Resume) (to_Err_Return_Yield (local.get $message)) (local.get $current_parent)))
)

Integration with the Exception-Handling Proposal

The $func referenced by resume.switch_call rt $func is not only performed on the target stack, it is also performed within the target's dynamic scope.
As such, any exceptions thrown by $func are propagated from the dangling return address on the target stack.
This means we do not need any additional instructions to throw an exception on another stack.
As such, any language with a dynamic-scope hierarchy across its stacks can explicitly propagate exceptions from one stack to another according to whatever dynamic-scope semantics the language has.
Furthermore, this functionality can be used to circumvent a current limitation of WebAssembly.
In many applications of stack-switching there are multiple ways to resume a stack: e.g. a generator resuming its consumer with the value it is yielding vs. a generator resuming its consumer to indicate it has no more values to yield.
At present, WebAssembly has no way for function calls to have multiple explicit return paths.
But exceptions can be used to encode such alternate return paths.
Exceptions can also be used in situations where WebAssembly's type system is unable to recognize the safety of a certain switch or gets in the way of managing resumerefs (e.g. because an application wants to store all resumerefs in a single table).
For these situations, an application can simply throw and catch exceptions as a means of encoding dynamically typed stack switching.
Dynamic Example: Multicore OCaml's Algebraic Effects

Multicore OCaml's algebraic effects offer a form of dynamically scoped and, if a match exists, statically typed suspension along with a form of statically typed resumption.
This means that a program can be run with some "effect" handlers installed.
If the program "performs" an effectful operation, its dynamic scope is searched for the closest matching handler.
If it finds one, then it transfers control to that handler, passing it the inputs to the effectful operation as well as call stack from the point where that handler was installed to the point where the effectfual operation was performed.
This call stack is waiting for an answer to the effectful operation's outputs.
Eventually, someone can provide an answer and resume the call stack within that answering site's dynamic scope.
Whether the handler itself is captured as part of the call stack, so that it can handle potential future effectful operations performed by this captured call stack, depends on whether one uses the available "deep" library or the available "shallow" library.
OCaml is also a polymorphic language with frequent currying.
Both of this introduce complications, particularly for WebAssembly, but those complications are fairly orthogonal to the challenges of stack-switching.
As a consequence, we put them aside.
OCaml also has first-class effects: each effect is a case of an open variant type, which is the input to the perform operation, and a handler can choose to handle all values of this type.
While these first-class uses exist and are important, they are also uncommon and essentially represent the slow path.
In fact, supporting only this slow path can quadratically increase the number of stack switches performed during a program's execution.
So, for simplicity, we will focus on how this design can efficiently support the common fast path where the effect being performed or handled (or at least its payload) is known at compile time.
The more important complication, though, is that effect constructors are generative and can be generated at run time (which is used by, for example, the standard library's implementation of generators).
To handle this, at run time we generate a globally unique integer every time an effect is dynamically generated.
Multicore OCaml implements dynamic scope using mutably-linked lists of effect handlers, whose GC type defined by OCaml will be shorthanded with handlerref.
Multicore OCaml uses thread-local storage to maintain the dynamic scope of the current execution, which in WebAssembly is currently best encoded as a global of type handlerref.
Just as Multicore OCaml updates this thread-local value every time it performs an effect or resumes a continuation, the WebAssembly implementation would update the global to reflect the corresponding change to the dynamic scope of the current execution.
Each handlerref has at least three fields.
The first is a mutable handlerref field indicating the current parent of the handler, i.e. the handlerref that is currently this handler's dynamic scope.
The second is an array of i32 and funcref pairs indicating the unique integers of the effects this handler handles and the function that implements the handling for that effect.
If the i32 represent an effect whose signature is [ti*] -> [to*], then that function will have type [handlerref ti*] -> [to*].
The handlerref for this function is intended to be the handler at hand, i.e. we are essentially encoding method dispatch on an object.
The third field is a bit indicating whether the handler is "deep" or "shallow".
When an effect is performed, the global (or thread-local storage) is used to get the closest handler for the current dynamic scope, and then the linked list is searched for a node that handles the effect at hand, comparing the unique integer of the effect being performed with the integers in the handlers' arrays.
If no match is found, then some default action is performed, such as an exception being thrown.
If a match is found, then first a local $former_scope is set to the current value of the global handlerref, and the global is set to the parent of the matching handler.
Then another local $root_scope is initialized depending on whether the matching handler is "deep" or "shallow".
If deep, then it is set to the matching handler.
If shallow, then it is set to the handler who the matching handler is the parent of, or null if the global was directly the matching handler.
Then $root_scope's parent field is cleared.
After making these adjustements to the dynamic scope, we downcast the matching funcref and call it with the matching handler along with the inputs the effectful operation—what this function does depends on the handler, as discussed below.
Once that function returns, we set the parent field of $root_scope to the value of the global, and we set the global to the value of $former_scope.
Finally, we use the values returned by the function as the outputs of the effectful operation.
What the funcref does depends on the details of the matching handler's handling of the specific effect at hand.
In particular, if the handling is tail resumptive, meaning every local control path necessarily ends with it providing an answer for the outputs of the effectful operation, then no stack switching is necessary.
Instead, the funcref can simply execute this handling on the current stack and return at each of these answer points.
Otherwise, when the handler was first installed it ran the code it was handling on a new stack.
This code is expected to return some t*, and the new stack was given some newly allocated mutref (resumeref (result t*)) to return these values to should that code complete.
This same mutref is stored as a field in the handler.
The resumeref (result t*) is fetched from the mutref (whose content is then cleared).
We then resume.switch_call (result to*) to this resumeref, using a function that executes the handler's handling of the effect at hand.
The handling OCaml code is given a continuation to resume with its answer to effectful operation.
This continuation is implemented as a ref (struct (mutref (resumeref (result t*))) (resumeref (result to*))), containing the mutref stored in the handler and the resumeref resulting from the above resume.switch_call (result to*).
In order to resume such a continuation, one switches to the resumeref (result to*) using resume.switch_call (result t*) with a function that first takes the resulting resumeref (result t*) and stores it in the mutref before returning the provided to* values.
The above is nearly a complete description of how to implement OCaml's algebraic effects with this design.
The key missing component is exceptions, which we can efficiently support the aforementioned integration with the exception-handling proposal.
Each place a handler is installed is set up to catch all exceptions, pop the handler off the current dynamic scope (by updated the global), and rethrow the exception.
Every root frame of the newly created stacks is set up to catch all exceptions and then use a resume.switch_drop_call to rethrow the exception on the resumeref currently in the mutref the stack was given when it was created.
And, as a final detail, the functions implementing tail-resumptive handlings that might throw an exception are also set up to catch all exceptions and rethrow them on the resumreref in the mutref stored in the handler; they cannot simply propagate the exception to the caller because the caller is not semantically included as part of their dynamic scope.
JavaScript Promise Integration (JSPI)

JSPI was developed to make a common application of stack-switching for WebAssembly on the web available earlier while the core stack-switching proposal was developed.
With this core stack-switching design, JSPI can be implemented simply as a standard library.
Having this implementation both eases the burden of maintaining JSPI and provides a semantics for how JSPI should interact with core stack-switching.
A Suspender simply has a private field of type mutref (resumeref (result))—we will use suspenderref as a shorthand for this type.
A suspender is active if the mutref's content is not null.
Whether a suspender is moribund or suspended is not visible from its contents (and does not need to be).
A WebAssembly.Function constructed with usage promising: last is created by instantiating the following module with the function being wrapped and the obvious JS imports:
(module
  (import "wrapped" (func $wrapped (param ti* externref) (result to*)))
  (import "to_suspender" (func $to_suspender (param suspenderref) (result externref)))
  (import "Promise.withResolvers" (func $new_promise (result externref externref externref)))
  (import "many_to_externref" (func $many_to_externref (param to*) (result externref)))
  (import "exnref_to_externref" (func $exnref_to_externref (param exnref) (result externref)))
  (import "call_with" (func $call_with (param externref externref)))

  (func (export "export")
      (param $arguments ti*)
      (result externref)
      (local $promise externref)
      (local $resolve externref)
      (local $reject externref)
    (locals.set $promise $resolve $reject (call $new_promise))
    (resume.switch_call (result) $root (locals.get $arguments $resolve) (resume.new (result)))
    (return (local.get $promise))
  )
  
  (func $root
      (param $arguments ti*)
      (param $resolve externref)
      (param $reject exnref)
      (param $parent (resumeref (result)))
      (local $suspender suspenderref)
    (local.set $suspender (mutref.new (local.get $parent)))
    (block $finished ;; []
      (block $exn_handler ;; [exnref]
        (catch_all $exn_handler
          (call $call_with (local.get $resolve) (call $many_to_externref (call $wrapped (locals.get $arguments) (call $to_suspender (local.get $suspender)))))
          (br $finished)
        )
      )
      (call $exnref_to_externref) ;; changes value stack from [exnref] to [externref]
      (call $call_with (local.get $reject)) ;; uses externref on value stack as second argument
    )
    (local.set $parent (mutref.get (local.get $suspender)))
    (mutref.set (local.get $suspender) (ref.null))
    (resume.switch_drop (local.get $parent))
  )
)

A WebAssembly.Function constructed with usage suspending: last is created by instantiating the following module with the function being wrapped, the obvious JS imports, and the await_promise function defined afterwards:
(module
  (import "wrapped" (func $wrapped (param ti*) (result externref)))
  (import "from_suspender" (func $from_suspender (param externref) (result suspenderref)))
  (import "is_promise" (func $is_promise (param externref) (result i32)))
  (import "externref_to_many" (func $externref_to_many (param externref) (result to*)))
  (import "add_listener_to_promise" (func $await_promise (param externref suspenderref)))

  (func (export "export")
      (param $arguments ti*)
      (param $extern_suspender externref)
      (result to*)
      (local $suspender suspenderref)
      (local $result externref)
      (local $parent (resumeref (result)))
    (local.set $suspender (call $from_suspender (local.get $extern_suspender)))
    (local.set $result (call $wrapped (locals.get $arguments)))
    (if (call $is_promise (local.get $result))
      (then
        (local.set $parent (mutref.get (local.get $suspender)))
        (mutref.set (local.get $suspender) (ref.null))
        (local.set $result (resume.switch_call (result externref) $add_listener_to_promise (locals.get $result $suspender $parent)))
      )
    )
    (return (call $externref_to_many (local.get $result)))
  )
)

The add_listener_to_promise(promise, wasm_suspender, suspension) JS function is defined as promise.then((value) => resolve(wasm_suspender, value, suspension), (exception) => reject(wasm_suspender, exception, suspension)) (ignoring the returned promise), where resolve and reject are the exports of the following module:
(module
  (import "throw" (func (param externref))))

  (func (export "resolve")
      (param $suspender suspenderref)
      (param $value externref)
      (param $suspension (resumeref (result externref)))
    (resume.switch_call (result) $resume_with_value (locals.get $suspender $value $suspension))
  )
  
  (func $resume_with_value
      (param $suspender suspenderref)
      (param $value externref)
      (param $parent (resumeref (result)))
      (result externref)
    (mutref.set (locals.get $suspender $parent))
    (return (local.get $value))
  )

  (func (export "reject")
      (param $suspender suspenderref)
      (param $exception externref)
      (param $suspension (resumeref (result externref)))
    (resume.switch_call (result) $resume_with_exception (locals.get $suspender $exception $suspension))
  )
  
  (func $resume_with_exception
      (param $suspender suspenderref)
      (param $exception externref)
      (param $parent (resumeref (result)))
      (result externref)
    (mutref.set (locals.get $suspender $parent))
    (call $throw (local.get $exception))
    unreachable
  )
)

Note that this implementation makes no use of globals.
In combination with the composability guarantee of this core stack-switching design, this demonstrates that using lexical scoping for suspenders indeed made the JSPI design composable.