@laughinghan
Last active January 13, 2023 20:21

Borrow Checking in Mechanical

Borrow checking serves two purposes in Mechanical. For compiling to JS, the only purpose is to detect when data structures like records or arrays can be mutated in-place rather than copied. In the future when we compile to WebAssembly or native, the second purpose will be for "compile-time reference counting", like Lobster's memory management model.

Sketch:

  • part of the inferred signature of a function, in addition to the expected type of each argument, is the "ownership level" the function needs over each argument
  • there are 5 levels of ownership:
    • unused means the argument is never used by the function, or it's only used in pure expressions whose values are never used
    • borrow means the function just needs temporary read access. It's deduced when all operations on the argument are borrow operations, such as reading a scalar field on a record argument, passing it as a borrow argument to another function, etc. After the function returns, the argument can be deallocated or mutated no problem
    • mutate means the argument is mutated in-place by the function. It's deduced when any operation on the argument is a mutate operation and none are alias operations, regardless of borrow operations. Examples of mutate operations include updating a field of a record argument, appending something to an array argument, passing it as a mutate argument to another function, etc. After the function returns, the caller must not read, access, reference, or otherwise use the argument. If the caller has a heap value they want to pass to the function but also use for other stuff later, the value must be copied before being passed to the function
    • alias means the function might return the argument unchanged, might return something inside (transitively referenced by) the argument, or might allocate and return new heap values that (possibly transitively) reference the argument or something inside. It's deduced when there's at least one possible chain of alias operations from the argument to the return value. Examples of alias operations include the identity function, appending the argument to an array, passing it as an alias argument to another function, closing over it in a function or command, etc. After the function returns, the caller should either return it or deallocate it.
      • if a function wraps and returns an argument, but also separately mutates it, the function must copy the argument value before passing it to the mutating operation
      • if a function mutates an argument value and then aliases and returns the mutated value, that argument is marked mutate not alias. alias only applies to arguments that are aliased unchanged, because it means the caller can pass it to lots of other borrow and alias operations without copying
  • (unused probably won't be very useful at first since functions that unconditionally ignore their arguments will be uncommon, but might become reasonably useful when ownership level can depend on the type of the argument or the type of other arguments)
  • type inference tells us for every variable & expression, whether the value is heap-allocated
  • what about conditional_mutate?
    • if caller doesn't reuse argument, def don't want to allocate and copy in the function
    • if caller can stack-allocate because it eg throws away result after reading it, probably don't want to heap-allocate and copy in the function
    • but if caller does reuse argument and does return result, then there's no reason for the caller to possibly wastefully pre-emptively heap-allocate and copy; the function should use borrow semantics and allocate and copy itself
      • if the caller conditionally reuses the argument or conditionally throws away the result, we could try to check if that's a superset of the cases where the function mutates; if so the function should use the given memory (and in the other cases the function is known to use borrow/alias semantics)
  • during codegen:
    • heap-allocated values must generally be copied before being mutated, except if all of the following apply:
      • the value isn't returned
      • the value isn't aliased, or if it is, the (transitively) aliased value isn't returned or mutated
      • this is the last mutate operation on the value; all other mutate operations should be passed a copy
      • (despite all the restrictions this should be a common case, think x.mutate('this').mutate('that').mutate('the other'))
    • functions are pure, so if a value is created but then neither returned nor mutated, it needs to be deallocated, no refcount check necessary
    • well, if it's borrowed to access a subvalue, and that subvalue later is returned (possibly after being aliased or consumed), that subvalue shouldn't be deallocated. Gotta watch out for that
    • refcount checks are definitely necessary when deallocating a value that's being overwritten in a mutation operation (e.g. when updating the field of a record, if the previous value of the field was heap-allocated, it may need to be deallocated)
    • wrap operations should generally insert a refcount increment
    • if we have a heap-allocated value, we do a new_ref operation on it, and return the result of a borrow operation on the wrapped value, then we need to deallocate and don't need to check its refcount. In theory the wrap operation doesn't even need to increment its refcount, but I kinda doubt that would be a worthwhile optimization
    • TODO: commands?
  • note that the borrow/mutate/alias analysis necessary for mutation is actually a superset of the analysis necessary for the "compile-time refcounting" memory management; without mutation, refcounting analysis would just treat mutates as borrows. Kinda cool that refcounting analysis will be able to just piggyback onto the existing mutation analysis
    • wait a minute. Can't we view it as the other way around? Can't we view mutation as a free-immediately-followed-by-allocate, so we should do compile-time refcounting and then allow mutation without copy in cases where the refcount is known to be exactly 1?
      • so functions annotate their parameters as just either borrow or mutate (or read/write?); but mutate arguments aren't actually mutated unless its refcount is exactly 1, otherwise they're copied
      • and then the caller, just before passing in a mutate argument, increments its refcount iff the argument is reused later in the caller, decrementing right after the call (these may later be collapsed away)
        • if an argument is reused but the result of the call is eventually discarded, caller can stack-allocate a buffer, set the refcount to 1, and pass that in as the mutate argument
          • note that stack-allocated buffers still need a refcount, not to track when to deallocate but to track mutability, the function may pass the buffer to two different functions as mutate arguments in which case it needs to have incremented refcount before passing to one of them
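To make the five ownership levels concrete, here are hypothetical examples of how each level might be deduced, written in TypeScript for illustration (Mechanical's own syntax isn't shown here, and all the function names are made up):

```typescript
// Hypothetical functions and the ownership level each would be deduced
// to need over its record argument. Names are illustrative only.
type Rec = { name: string; tags: string[] };

// unused: the argument is never used
function ignore(_r: Rec): number { return 0; }

// borrow: only reads a scalar field; caller may freely reuse r after
function nameLen(r: Rec): number { return r.name.length; }

// mutate: updates the argument in place; caller must not use r after
function addTag(r: Rec, t: string): void { r.tags.push(t); }

// alias: returns the argument unchanged
function identity(r: Rec): Rec { return r; }

// alias: the result (transitively) references the argument
function wrap(r: Rec): Rec[] { return [r]; }
```

Per the rule above, a function that both mutated `r` and returned it would be marked mutate, not alias.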
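The mutate-only-when-refcount-is-1 idea above can be sketched in TypeScript (the runtime representation here is invented for illustration; actual codegen may differ):

```typescript
// Hypothetical heap representation carrying a compile-time-managed refcount.
interface HeapArray { refcount: number; items: number[]; }

// A mutate argument: mutated in place only when uniquely owned
// (refcount === 1), otherwise copied first.
function append(arr: HeapArray, x: number): HeapArray {
  if (arr.refcount === 1) {
    arr.items.push(x);                 // safe: no other references exist
    return arr;
  }
  return { refcount: 1, items: [...arr.items, x] };  // fresh copy
}

// Caller that reuses the argument afterwards: bump the refcount just
// before the call (forcing append to copy) and drop it right after.
const a: HeapArray = { refcount: 1, items: [1, 2] };
a.refcount++;                          // a is reused later in the caller
const b = append(a, 3);                // refcount was 2, so this copies
a.refcount--;
// a.items is untouched; b is a fresh value with refcount 1

// Caller that doesn't reuse the argument: no increment, so append
// mutates b in place (its refcount is exactly 1).
const c = append(b, 4);                // c === b, mutated in place
```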

As explained by (TODO: find this paper), the overhead of constantly incrementing and decrementing refcounts can be nearly eliminated by realizing that if you only need the refcount to be correct at points in time A and B, then no matter what happens in between, you just have to add 1 to the refcount for every new reference to the thing at time B that wasn't present at time A, and subtract 1 for every reference at time A that is no longer present at time B. In this case for example, the refcount isn't read at all on borrow arguments (it may be incremented if the argument is aliased in the return value, but it won't be read) so it needn't be incremented; but the refcount is read on mutate arguments (to decide whether to reuse the memory or copy) so it needs to be incremented appropriately.
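A sketch of that elision, again in illustrative TypeScript: if the refcount is never read between points A and B, only the net change needs to be applied, and a pure borrow with a net change of zero needs no refcount traffic at all:

```typescript
interface Boxed { refcount: number; value: string; }

// Naive codegen: every temporary reference bumps and drops the count.
function lengthNaive(b: Boxed): number {
  b.refcount++;            // new temporary reference...
  const n = b.value.length;
  b.refcount--;            // ...gone again before the count is ever read
  return n;
}

// Optimized codegen: the refcount isn't read during a pure borrow and
// the net change across the call is zero, so both updates are elided.
function lengthBorrow(b: Boxed): number {
  return b.value.length;   // refcount untouched
}
```

A mutate argument, by contrast, does read the refcount (to decide between reuse and copy), so its net increment can't be elided.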

Future ideas:

  • fine-grained lifetime-tracking: instead of just tracking whether an argument is aliased anywhere at all in the returned value, track the specific paths inside the returned value at which the argument is referenced
  • can we try to allocate on the stack as much as possible by generating "pre-allocators" that calculate how much memory a function will need, and part of the caller contract is to call the pre-allocator first, stack-allocate however much it says to, and pass the allocated block into the function? This seems like it should be a straightforward program transformation that turns every allocation into an integer computation, and ignores everything that doesn't involve an allocation that's returned
  • region aka arena-based memory management, where allocations are grouped into regions/arenas, the entirety of which can be deallocated in one step. Could this be useful for functions for which we're unable to generate pre-allocators?
    • possibly this isn't that good of an idea—stack allocation works for non-recursive functions, and in a long-running tail-recursive loop you might really care about deallocating memory on a fine-grained basis
    • but maybe for a case like a loop joining an array of strings, it's actually slower to loop through the array twice, once to calculate the total memory and then again to write the string. In that specific case bump-allocating is actually better than pre-allocating on the stack because there's nothing to deallocate at a fine granularity anyway
    • so maybe a 4-pronged strategy:
      • non-recursive functions with inexpensive pre-allocation: caller stack-allocates, passes in fixed-offsets allocator
      • simple tail-recursive loops: caller initializes and passes in an arena allocator for return values; for values that aren't returned, the functions themselves should also stack- or arena-allocate
      • non-recursive functions that may have expensive pre-allocation: similar to simple loops, caller initializes and passes in an arena allocator for return values, otherwise calculations that aren't returned should hopefully be stack-allocatable
      • complex recursive functions: caller passes in regular heap allocator
    • seems over-engineered
    • are there non-recursive functions that shouldn't be arena-allocated?
    • maybe a simpler, no pre-allocation, arena-only strategy:
      • function calls take in a return-value allocator and a temp allocator, initialized to the same arena allocator
      • non-recursive functions: pass on both allocators to transitive calls
      • tail-recursive functions (every recursive call is a tail-call):
        • ignore given temp allocator; each iteration, re-initialize a temp arena allocator and deallocate before tail-recursing
        • for non-recursive calls, use that temp allocator for both return-value and temp
        • for arguments to tail-calls that will eventually be returned, use return-value allocator
        • for other arguments to tail-calls, use heap allocator
      • general recursive functions:
        • ignore given temp allocator
        • for values that will be returned (possibly after tail-calls), use return-value allocator
        • for function calls (recursive or not, tail-calls or not), use heap allocator for arguments and return value, and use an arena allocator re-initialized per call for temp
      • (note: recursive calls may be direct (function calling itself) or indirect (function calls another function that transitively calls the original))
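The two-allocator convention above might look something like this sketch, with allocators modeled as arenas that free everything in one reset (all names and the calling convention here are invented for illustration):

```typescript
// Arena that tracks live allocations and frees them all in one step.
class Arena {
  live = 0;
  alloc(): { id: number } { this.live++; return { id: this.live }; }
  reset(): void { this.live = 0; }   // deallocate the whole arena at once
}

// A tail-recursive loop (written as a loop here): it ignores the temp
// allocator it was given, re-initializes its own temp arena, and resets
// it before each "tail call". The result is a scalar, so the
// return-value arena goes unused in this example.
function sum(xs: number[], _retArena: Arena, _tempArena: Arena): number {
  const temp = new Arena();          // private per-loop temp arena
  let acc = 0;
  for (const x of xs) {
    temp.alloc();                    // stand-in for per-iteration scratch
    acc += x;
    temp.reset();                    // freed before the next iteration
  }
  return acc;
}

// Caller initializes both allocators to the same arena, per the sketch.
const arena = new Arena();
const total = sum([1, 2, 3], arena, arena);
```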