paniq/borrow_inference.md

## borrow_inference.md

      
    Raw
  

              borrow_inference.md
            
          
    Borrow Inference

by Leonard Ritter, Duangle GbR
This document has only historical significance and does not describe the borrow checker as it is now implemented. Please see this document for a more recent description.
This is a description of borrow inference, an alternative to borrow checking
that requires no declarative annotations to support proper management of
unique values and borrowed references at compile time.
We are trying to infer whether a value is borrowed or has to be unique, applying
this constraint going backwards through the function, as we also need to prevent
lifetimes from lingering, and generate destructors as early as possible.
fn example (c x y)
    # and ends here
    let z =
        if c x
        else y
    # inference starts here
    return z

Moving

We move implicitly only from the interior of expressions, and we only move
within the expression's scope and above; thus it is impossible to propagate
a move backwards into a function so that the function itself enforces a move.
This is so we can infer a single definite profile for a function.
As a result, inconsistencies are reported as errors rather than being resolved,
and the user must make alterations to the code to resolve those issues.
Within an expression, we tag values as soon as we can establish a definite role.
A value is tracked from end of life to its point of declaration. First seen,
we analyze in which fashion the value appears and then ensure that all
previous uses before that point are consistent with its role.
If we see a value first as being returned from an expression (lifted to a higher scope),
we check whether the value was generated within this expression or outside
of it (indicating that its lifetime is guaranteed to exceed ours). If the value was
generated within the expression, a move is necessary, and will be auto-generated;
The value is tagged as to-be-moved. If the value is from outside of the expression,
we tag it as to-be-borrowed.
If we see a value being passed as an argument to an expression (e.g. a call),
we need to check if the expression consumes that value.
If the expression moves the value (and thus takes ownership), but the value
is already tagged as to-be-moved or to-be-borrowed, an error is generated.
Otherwise the value is tagged as to-be-borrowed.
Lastly, if this is the first appearance of the value, and it has not been
moved by the expression, then the value must be implicitly moved and a
destructor generated for it after the expression. (unless the value is borrowed,
see further down)
If we see a value's original declaration, we tag it as to-be-created.
Special care has to be taken when merging branches or exiting multiple scopes
at once, before they have naturally concluded.
If a scope is exited, all values of the scope(s) that we jump out of, whose
state is to-be-moved, but not to-be-created, must be destroyed. We can either
automatically destroy the values here, or generate an error message, forcing the
user to explicitly transfer ownership to the exiting scope, which will naturally
tag those values as to-be-moved. Forcing the user to tag manually helps to
read the code better, so is preferred.
If a value is moved into just one branch of an if-expression, we can either
automatically destroy the value in the other branch, or generate an error message,
forcing the user to explicitly move the value to the other branch as well, so
the pattern is consistent. forcing the user to tag manually helps to
read the code better, so is preferred.
When branches are merged, all altered value states need to be merged into the
parent, and they need to be consistent.
Borrowing

A borrowed value appears as its own value, but keeps the original value alive
as long as the value itself is in use. A borrowed value going out of scope
has therefore no effect, unless the unique value it is referencing has not
been tagged yet as to-be-moved or to-be-borrowed.
But we do not know yet, when we first see a value, whether it is borrowed or
unique. To this effect, we keep a data dependency tree, tracking for each
value which values exterior or inside the local scope they depend on.
# data dependency is tracked forward
fn example (c x y)
    let z = # borrows x or y
        # merge the dependencies of these branches
        if c x               # returns borrowed x (still exists in parent scope)
        else y               # returns borrowed y
    let z = # is unique
        if c
            let k = (move x) # depends on x
            k                # returns local unique, can not borrow
        else y
            let k = (move y) # depends on y
            k                # returns local unique, can not borrow
    return x # returns borrowed x (still exists in caller's scope)

When a value depends on no other value, it is original/unique. When it depends
on one or multiple other values, It keeps those values alive, and thus is a
borrowed value that can not be moved, or returned out of the scope of the
values it is borrowing.
Thus the very first appearance of a borrowed value (that is, a value with
dependencies), which is, as we're walking backwards, its very last appearance,
is a potential move point for the unique values it depends on.
We therefore modify our rules to accommodate for borrowed values:
If we see a borrowed value first as being returned from an expression (lifted
to a higher scope), we check whether the values it depends on were generated
within this expression or outside of it. If the unique value was generated
within the expression, the return is illegal and must be reported as an error.
If the unique value is not yet tagged as to-be-moved or to-be-borrowed, then we
would have to generate a destructor here, but the borrowed value would be
immediately invalid; therefore, we must also report an error.
If we see a borrowed value being passed as an argument to an expression
(e.g. a call), and the expression wants to move that value, an error is
generated. Otherwise the associated unique value is tagged as to-be-borrowed.
Lastly, if this is the first appearance of the associated unique value, that is
the value has not been tagged as to-be-moved or to-be-borrowed, the unique value
must be moved and a destructor generated for it after the expression. We do
this implicitly.
Conflicts can arise when a unique value is last referenced by a borrowed value
in one branch, but not another; by the previously stated rule, this branch
inconsistency would be reported as an error. But the user can quickly fix this
problem by either moving the value in the alternative branch, or moving it in
an expression after the branch.
Function Boundaries

Another problem that arises is at function boundaries: here, we do not know
whether the parameters are borrowed or not. Therefore we assume that we are
looking at unique values whose ownership has not been transferred to our
function initially.
When function parameters are referenced, no destructor is ever generated; yet
the parameters can be moved, and once tagged as such, the resulting function
signature is altered.
Therefore it is necessary for function signatures to carry the data dependency
information, if any return value is borrowing from input arguments, or an
input argument is moved into the function.
To indicate that a parameter is going to be moved by the function, the signature
prefixes a † in front of the argument type T.
A borrowed return type is prefixed by %p0|p1|...|pN:, where p0..pN are
indices of input parameters. the typechecker can use this information to
correctly reconstruct the data dependency tree.
Non-returning functions are special, in that they are not permitted to borrow
unique values, since the caller is unable to destroy them. So non-returning
functions must always move all unique arguments. Note that this limitation
applies only to functions which do neither return regularly nor raise an error;
Functions which always raise errors return on a second path, and so the caller
is still able to clean up after the call. More about that in the next section.
Mutation

So far we have only considered immutable values. There are several ways to
create mutable values in Scopes:

Directly, through generation of globals or allocating mutable pointers on
stack or heap. The fundamental and only operation to mutate pointers contents
is store. The pointers themselves remain immutable on the top level.
Indirectly, through resource handle API functions. Popular examples are system
file handles, OpenGL resource handles or just Scopes' own IL API. Handles
are almost always opaque, and immutable.

Until now, immutable handles to mutable data have not been considered.
We observe that if a handle is permitted to have borrowed references, which may
either be references to the full handle or just a part of it, those references
may be invalidated by a mutation.
Therefore we want to enforce the constraint that at the time that a mutation
is taking place, all borrowed values for a particular unique value become
invalid, except for the borrowed value used to undertake the mutation.
We implement this by counting the number of live borrows to the unique value
depended on by the mutating instruction. If that number is greater than one,
indicating that more than one borrowed value will be accessed after the
mutation, an error is generated.
Furthermore, we propagate mutation event(s) to the function signature, so that
the mutation barrier can be enforced for calls as well. Parameters that borrow
a value to be mutated in the function will have their type postfixed by !.
Exceptions

A special situation occurs when a function call is implicitly able to exit
the current function, through raising an error. If a unique value is moved
after a trycall, the value may not be cleaned up. Therefore local unique values
can not be moved after an unhandled trycall, which means they must not be
referenced at all, ensuring that all destructors are generated before the
trycall. One fix would be to automatically collect values with to-be-moved tags
(but no to-be-created tags) and generate destructors for those -- trycalls would
then be extended with a cleanup block that contains the destructors.
But even without automatic handling, the user is still able to solve this
problem manually, by wrapping any series of trycalls in a try-except construct,
ensuring control flow does not exit the current scope. Moving values after this
section is once again safe.
Loops

Moving inside a loop is only valid as long as the value being moved is inside
the loop scope; A value from a parent scope could only be moved once, but
the loop body is going to run repeatedly. The user has to pass the parent
value to be moved as a loop argument.
Loop parameters can be inferred to be movable/borrowed based on their init
arguments; if the init argument is borrowed, each repeat instruction will
also only borrow its argument. If the init argument is movable, however,
we can't know in advance if the argument will really be moved. This only
becomes clear after evaluating the loop body.
But we can infer move status from a repeat instruction alone. When a unique value
is passed to a repeat instruction, it is always being moved, because the loop
scope does not nest. We can backpropagate borrow/move status of repeat arguments
to the loop arguments, and continue working ourselves backwards through the
function.
We treat the repeat like a return out of the loop scope, same as the break
instruction. That means repeating or breaking borrowed unique values from the
loop scope is illegal (though it is fine to borrow unique values living above
the loop scope).
TODO


Aggregate types that can hold unique or borrowed values have not been
considered yet. Here, explicit type annotations similar to the ones used for
functions will likely have to be made, and inference will have to be extended
to track individual aggregate elements.