Bike/clasp_nonlocal.md

## clasp_nonlocal.md

      
    Raw
  

              clasp_nonlocal.md
            
          
    Clasp is an implementation of the Common Lisp programming language which uses LLVM as its compiler backend. This has, mostly, worked pretty well; despite LLVM being originally designed for C++, problems with converting Lisp semantics to a form LLVM can understand have mostly been minor to moderate.
A prominent and unfortunate exception to this is in LLVM's treatment of what i'll call "nonlocal control flow". Lisp and C++ have very different semantics in this area, and LLVM is almost entirely oriented around C++'s way of doing things. While Clasp does fully implement Lisp's nonlocal exit semantics, it does so in a very inefficient and convoluted way. This has caused major performance issues in real programs (the compiler itself).
Nonlocal semantics

By "nonlocal control" I mean a transfer of control between functions by any means other than a call or a normal return. A nonlocal "exit" is one that transfers to some function that is already on the call stack, i.e. a caller of the current function, or its caller, or etc. Languages like Scheme support nonlocal "entrances" as well; this is an interesting topic, but Clasp and Lisp don't support them so I won't talk about it.
C++

C++ has, as far as I know, two systems for nonlocal control flow: the throw operator, and longjmp. There may be some others e.g. POSIX siglongjmp and formerly setcontext, but I don't think they're different enough to describe specially here.
throw

throw takes (possibly copies) some object, interrogates the call stack to look for a handler keyed to that type, and transfers control to this handler. This is a very involved operation that requires library support in any C++ implementation I am aware of, especially once you include complications like rethrowing semantics.
Because throw is conventionally used for exception handling, i.e. reporting exceptional situations to callers, C++ implementations generally use a "zero-cost" implementation that maintains performance when throw is not used, and lets it crater when it is used. The Itanium ABI is what I've dealt with most here, and is common. To interrogate the stack for handlers, the functions underlying throw check the control stack for pointers, then look at the executable object for tables describing what handlers are in place for what calls. These reading and parsing operations are extremely slow compared to most normal program operation. [Numbers here?]
LLVM's compiler support for throw and catch is in the form of the invoke/landingpad instructions and systems. (There's also catchpad - I haven't used it, but it doesn't seem much less complicated.) These are closely identifiable with the Itanium exception tables - the frontend specifies C++ types almost how they would be with the C++ catch operator, and marks calls present in try blocks, so that the backend can lay out the tables correctly.
longjmp

longjmp is a lower level means of executing a nonlocal exit, inherited from C. longjmp immediately exits to the setjmp which provided its first argument. longjmp takes a parameter, which is "passed" to setjmp, but this cannot provide an actual value and can only be used to direct control flow. (The underlying runtime functions can be used to provide a value, practically speaking, but this is not part of C++ semantics.)
Use of longjmp is usually discouraged nowadays, as the C++ exception system is a cleaner way of dealing with exceptions.
Implementation of longjmp and setjmp is done mainly in the runtime. LLVM support includes the returns_twice function attribute for setjmp, as some optimizations don't work correctly in functions that can be nonlocally exited to repeatedly. There are also "SJLJ" intrinsics, but they are used internally by LLVM for implementing C++ exception semantics with a non-"zero cost" model.
Treatment of runtime problems

When the exception system runs into a problem, the C++ standard solution is generally to call std::terminate, resulting in the entire process halting. This is done if there is a throw with no corresponding catch, if a destructor invoked during unwinding attempts to throw again, etc.
The behavior of longjmp to a setjmp whose function has exited is left undefined. As far as I know C++ implementations don't generally make an effort to detect and report this circumstance.
Lisp

Lisp has several sets of built in operators related to nonlocal exits. tagbody/go, block/return-from, and unwind-protect are most relevant here. The other operators can be implemented on top of these ones.
go

tagbody is a Lisp construct to allow the use of a goto-like operator, go. tagbody is not usually used directly by programmers, but is instead used for the implementation of higher-level control constructs (loops, etc.).
A basic example use of these operators is (tagbody loop (unless (f) (go loop))); this is a loop that repeatedly calls the function f until it returns true. For this sort of usage tagbody and go have more or less identical semantics to C++ goto statements.
Unlike C++ goto, however, go is not constrained to be in the same function as the corresponding tagbody. For example, (tagbody repeat (f (lambda () (go repeat)))) will call f with one argument, an anonymous function. If f calls this function (with no arguments), control will immediately exit back to repeat, and f will be called again.
tagbody/go can be implemented in terms of setjmp/longjmp, and Clasp does so in limited circumstances, as this is a major performance improvement over the use of C++ exceptions - more on how Clasp implements these semantics in LLVM below. However, tagbody/go is more limited, and these limitations could be of use for optimization: The control point a go returns to is always apparent statically, which is not the case for longjmp. Since longjmp is a normal function, it must restore all possible context that could have been altered between setjmp and itself - and setjmp must have saved all this information - whereas a smartly compiled go could only restore what needs to be restored and a tagbody could save only what needs to be saved, for example not bothering with registers that are not actually used in the tagbody's function.
Sometimes the compiler may be able to prove that a go's exit point will always be on the stack, and so can elide any check code. In a subset of those cases, the dynamic contour may be determinate to some extent: it may be possible to determine that there are no intervening cleanups (unwind-protects, below), or what any cleanups are, or most specifically the exact frame offset that will need to be undone; in these cases the compiler can generate more efficient code.
return-from

block and return-from allow Lisp programs to return values nonlocally. return-from can be used similarly to a C/C++ return statement, but with an indicated point to return to that may be in another function. For example, (block a (f (lambda (x) (return-from a x)))) calls the function f with an anonymous function as an argument. If f doesn't call this function, f may return normally, and this value will be the value of the (block a ...) form. If f does call this function, the (block a ...) form immediately returns the value passed to the anonymous function. If f was defined as (defun f (g) (funcall g 13) (print "hello world")), the (block a ...) form would return 13 and not print anything.
As with tagbody/go, block/return-from can be implemented with setjmp/longjmp, and Clasp does so in limited circumstances. Similarly to that case, block/return-from is more limited in that the return point for a given return-from is always apparent statically. Additionally, however, return-from needs to return a value. This cannot be accomplished with longjmp itself, which can only return an int and with restrictions.
Because of the statically apparent link, with compiler support it would be possible to arrange for values to be returned nonlocally in a similar fashion to how they are locally, e.g. through registers which are coordinated between the "caller" and "callee". LLVM does not currently provide any such support and Clasp must use other storage to return values nonlocally.
unwind-protect

The unwind-protect operator allows Lisp programs to interpose "cleanup" code during any exit, local or not, from some code. This is commonly used in a similar fashion to destructors executed upon scope exit in C++, e.g. to clean up filesystem resources.
Use of nonlocal exits

Unlike in C++, Lisp programs commonly expect nonlocal exits to be quick and efficient, and use them for much more than just exception handling. Essentially they can be used for many (but not all) situations a Scheme fan will cheer the use of continuations for; for example, terminating a recursive search as soon as a result is found. For example, a depth-first search to find the first value satisfying some predicate might be (mostly) idiomatically written as
(defun dfs (predicate tree)
  (block dfs
    (labels ((recur (node)
                (cond ((leafp node)
                       (let ((data (leaf-data node)))
                         (when (funcall predicate data)
                           (return-from dfs data))))
                      (t
                       (mapc #'recur (children node))))))
      (recur tree))))
Here, the labels operator defines a recursive local function recur. If a call to recur finds a leaf that fits the criterion, it exits immediately, skipping search of any further nodes.
Treatment of runtime problems

Similar to the C++ treatment of longjmp, the Lisp standard leaves the consequences of expired nonlocal exits undefined. In practice, Lisp implementations generally attempt to detect this circumstance and report it using the Lisp error/condition system (not described here, but implemented in terms of these operators). This can be done fairly cheaply by saving the block/tagbody frame pointer and possibly a reasonably unique token, and then checking for the existence of these on the control stack when the return-from/go is executed.
Lisp treats some forms of exit more forgivingly. It is permitted to begin a new nonlocal exit in the middle of an unwind-protect cleanup, a circumstances that results in a std::terminate call in C++, as long as the new nonlocal exit point is farther out from the original, i.e. exits to a more distant caller. (In practice, implementations often allow exiting to any non-expired point, even if it does not meet this condition.) This can be used to smoothly indicate problems that take place during the stack unwinding process.
Clasp's implementation

Clasp aims for C++ interoperation, to the level of having its Lisp functions call C++ functions and vice versa. This means that, in general, Clasp must use the C++ exception system in order to ensure C++ frames are cleaned up correctly and so on. In limited circumstances, i.e. when it is known that no C++ frames will be between a nonlocal exit operation and its destination, Clasp tries to take advantage of a more Lisp-friendly system, to the extent supported by LLVM.
Exception-based

With the general exception mechanism, Clasp needs to indicate the particular frame an exit point (block or tagbody) refers to, rather than use a dynamic type-based mechanism like C++ does. As such, the generated code for these operator is somewhat analogous to
void* fa = __builtin_frame_address(0);
try {
  ...block body...
  // example return-from/go. in reality, this would be in a separate function
  throw Unwind(fa);
  ...block body...
} catch (Unwind& cu) {
  if (cu.frame == fa) { goto block; }
  else throw;
}
block:
On the other end, the return-from/go code is passed down the frame address, and uses it to construct a Unwind exception object which is then thrown.
unwind-protect uses essentially a catch (...) { ... } block. It cannot be implemented with destructors, even though their behavior is more closely analogous and simpler to generate, because an unwind-protect cleanup may initiate a nonlocal exit, which is forbidden for C++ destructors executed during an exit.
In order to implement the check of whether an exit point is expired, return-from and go use libunwind to manually inspect the control stack to see if the frame address they have been provided is still valid. This is done before the C++ throw, which of course inspects the stack again to find the nearest handler. Previously, I attempted to use the Itanium ABI to customize this behavior (specifically, overriding how the phase 1 search works), but this turns out to be a no-go as Itanium forbids the foreign exception objects I defined from being rethrown; and in any case the performance loss here is trivial compared to that of exception handling to begin with.
To return values, return-from places them in thread-local allocated storage, and the corresponding block code retrieves them. This imposes a size limit, which is not an issue in practice. It also necessitates that any unwind-protect cleanup code temporarily save this data elsewhere while they operate, as any internal nonlocal exits will need the same storage.
longjmp-based

When Clasp can statically determine the frames intervening between a return-from/go and its destination, and that none of these frames involve C++ or unwind-protect, it uses setjmp/longjmp instead.
Common elements

The procedures for both C++ and Lisp nonlocal exits can be broken down into more primitive components.

The stack is searched for the correct exit point. If no exit is found, some error action is taken (std::terminate in C++, an error signal in Lisp).
Stack frames between the exiting frame and the destination frame that are cleanups are found by the unwind routine; this step may or may not take place interleaved with step 1.
Cleanups are executed. This may end the unwind process (by termination in C++, or by beginning a new unwind in Lisp)
Control is transferred to the exit point.