Skip to content

Instantly share code, notes, and snippets.

@paniq
Last active January 22, 2024 06:59
Show Gist options
  • Star 11 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save paniq/71251083aa52c1577f2d1b22be0ac6e1 to your computer and use it in GitHub Desktop.
Save paniq/71251083aa52c1577f2d1b22be0ac6e1 to your computer and use it in GitHub Desktop.
Borrow Inference

Borrow Inference

by Leonard Ritter, Duangle GbR

This document has only historical significance and does not describe the borrow checker as it is now implemented. Please see this document for a more recent description.

This is a description of borrow inference, an alternative to borrow checking that requires no declarative annotations to support proper management of unique values and borrowed references at compile time.

We are trying to infer whether a value is borrowed or has to be unique, applying this constraint going backwards through the function, as we also need to prevent lifetimes from lingering, and generate destructors as early as possible.

fn example (c x y)
    # and ends here
    let z =
        if c x
        else y
    # inference starts here
    return z

Moving

We move implicitly only from the interior of expressions, and we only move within the expression's scope and above; thus it is impossible to propagate a move backwards into a function so that the function itself enforces a move. This is so we can infer a single definite profile for a function.

As a result, inconsistencies are reported as errors rather than being resolved, and the user must make alterations to the code to resolve those issues.

Within an expression, we tag values as soon as we can establish a definite role.

A value is tracked from end of life to its point of declaration. First seen, we analyze in which fashion the value appears and then ensure that all previous uses before that point are consistent with its role.

If we see a value first as being returned from an expression (lifted to a higher scope), we check whether the value was generated within this expression or outside of it (indicating that its lifetime is guaranteed to exceed ours). If the value was generated within the expression, a move is necessary, and will be auto-generated; The value is tagged as to-be-moved. If the value is from outside of the expression, we tag it as to-be-borrowed.

If we see a value being passed as an argument to an expression (e.g. a call), we need to check if the expression consumes that value. If the expression moves the value (and thus takes ownership), but the value is already tagged as to-be-moved or to-be-borrowed, an error is generated. Otherwise the value is tagged as to-be-borrowed.

Lastly, if this is the first appearance of the value, and it has not been moved by the expression, then the value must be implicitly moved and a destructor generated for it after the expression. (unless the value is borrowed, see further down)

If we see a value's original declaration, we tag it as to-be-created.

Special care has to be taken when merging branches or exiting multiple scopes at once, before they have naturally concluded.

If a scope is exited, all values of the scope(s) that we jump out of, whose state is to-be-moved, but not to-be-created, must be destroyed. We can either automatically destroy the values here, or generate an error message, forcing the user to explicitly transfer ownership to the exiting scope, which will naturally tag those values as to-be-moved. Forcing the user to tag manually helps to read the code better, so is preferred.

If a value is moved into just one branch of an if-expression, we can either automatically destroy the value in the other branch, or generate an error message, forcing the user to explicitly move the value to the other branch as well, so the pattern is consistent. forcing the user to tag manually helps to read the code better, so is preferred.

When branches are merged, all altered value states need to be merged into the parent, and they need to be consistent.

Borrowing

A borrowed value appears as its own value, but keeps the original value alive as long as the value itself is in use. A borrowed value going out of scope has therefore no effect, unless the unique value it is referencing has not been tagged yet as to-be-moved or to-be-borrowed.

But we do not know yet, when we first see a value, whether it is borrowed or unique. To this effect, we keep a data dependency tree, tracking for each value which values exterior or inside the local scope they depend on.

# data dependency is tracked forward
fn example (c x y)
    let z = # borrows x or y
        # merge the dependencies of these branches
        if c x               # returns borrowed x (still exists in parent scope)
        else y               # returns borrowed y
    let z = # is unique
        if c
            let k = (move x) # depends on x
            k                # returns local unique, can not borrow
        else y
            let k = (move y) # depends on y
            k                # returns local unique, can not borrow
    return x # returns borrowed x (still exists in caller's scope)

When a value depends on no other value, it is original/unique. When it depends on one or multiple other values, It keeps those values alive, and thus is a borrowed value that can not be moved, or returned out of the scope of the values it is borrowing.

Thus the very first appearance of a borrowed value (that is, a value with dependencies), which is, as we're walking backwards, its very last appearance, is a potential move point for the unique values it depends on.

We therefore modify our rules to accommodate for borrowed values:

If we see a borrowed value first as being returned from an expression (lifted to a higher scope), we check whether the values it depends on were generated within this expression or outside of it. If the unique value was generated within the expression, the return is illegal and must be reported as an error. If the unique value is not yet tagged as to-be-moved or to-be-borrowed, then we would have to generate a destructor here, but the borrowed value would be immediately invalid; therefore, we must also report an error.

If we see a borrowed value being passed as an argument to an expression (e.g. a call), and the expression wants to move that value, an error is generated. Otherwise the associated unique value is tagged as to-be-borrowed.

Lastly, if this is the first appearance of the associated unique value, that is the value has not been tagged as to-be-moved or to-be-borrowed, the unique value must be moved and a destructor generated for it after the expression. We do this implicitly.

Conflicts can arise when a unique value is last referenced by a borrowed value in one branch, but not another; by the previously stated rule, this branch inconsistency would be reported as an error. But the user can quickly fix this problem by either moving the value in the alternative branch, or moving it in an expression after the branch.

Function Boundaries

Another problem that arises is at function boundaries: here, we do not know whether the parameters are borrowed or not. Therefore we assume that we are looking at unique values whose ownership has not been transferred to our function initially.

When function parameters are referenced, no destructor is ever generated; yet the parameters can be moved, and once tagged as such, the resulting function signature is altered.

Therefore it is necessary for function signatures to carry the data dependency information, if any return value is borrowing from input arguments, or an input argument is moved into the function.

To indicate that a parameter is going to be moved by the function, the signature prefixes a in front of the argument type T.

A borrowed return type is prefixed by %p0|p1|...|pN:, where p0..pN are indices of input parameters. the typechecker can use this information to correctly reconstruct the data dependency tree.

Non-returning functions are special, in that they are not permitted to borrow unique values, since the caller is unable to destroy them. So non-returning functions must always move all unique arguments. Note that this limitation applies only to functions which do neither return regularly nor raise an error; Functions which always raise errors return on a second path, and so the caller is still able to clean up after the call. More about that in the next section.

Mutation

So far we have only considered immutable values. There are several ways to create mutable values in Scopes:

  • Directly, through generation of globals or allocating mutable pointers on stack or heap. The fundamental and only operation to mutate pointers contents is store. The pointers themselves remain immutable on the top level.
  • Indirectly, through resource handle API functions. Popular examples are system file handles, OpenGL resource handles or just Scopes' own IL API. Handles are almost always opaque, and immutable.

Until now, immutable handles to mutable data have not been considered.

We observe that if a handle is permitted to have borrowed references, which may either be references to the full handle or just a part of it, those references may be invalidated by a mutation.

Therefore we want to enforce the constraint that at the time that a mutation is taking place, all borrowed values for a particular unique value become invalid, except for the borrowed value used to undertake the mutation.

We implement this by counting the number of live borrows to the unique value depended on by the mutating instruction. If that number is greater than one, indicating that more than one borrowed value will be accessed after the mutation, an error is generated.

Furthermore, we propagate mutation event(s) to the function signature, so that the mutation barrier can be enforced for calls as well. Parameters that borrow a value to be mutated in the function will have their type postfixed by !.

Exceptions

A special situation occurs when a function call is implicitly able to exit the current function, through raising an error. If a unique value is moved after a trycall, the value may not be cleaned up. Therefore local unique values can not be moved after an unhandled trycall, which means they must not be referenced at all, ensuring that all destructors are generated before the trycall. One fix would be to automatically collect values with to-be-moved tags (but no to-be-created tags) and generate destructors for those -- trycalls would then be extended with a cleanup block that contains the destructors.

But even without automatic handling, the user is still able to solve this problem manually, by wrapping any series of trycalls in a try-except construct, ensuring control flow does not exit the current scope. Moving values after this section is once again safe.

Loops

Moving inside a loop is only valid as long as the value being moved is inside the loop scope; A value from a parent scope could only be moved once, but the loop body is going to run repeatedly. The user has to pass the parent value to be moved as a loop argument.

Loop parameters can be inferred to be movable/borrowed based on their init arguments; if the init argument is borrowed, each repeat instruction will also only borrow its argument. If the init argument is movable, however, we can't know in advance if the argument will really be moved. This only becomes clear after evaluating the loop body.

But we can infer move status from a repeat instruction alone. When a unique value is passed to a repeat instruction, it is always being moved, because the loop scope does not nest. We can backpropagate borrow/move status of repeat arguments to the loop arguments, and continue working ourselves backwards through the function.

We treat the repeat like a return out of the loop scope, same as the break instruction. That means repeating or breaking borrowed unique values from the loop scope is illegal (though it is fine to borrow unique values living above the loop scope).

TODO

  • Aggregate types that can hold unique or borrowed values have not been considered yet. Here, explicit type annotations similar to the ones used for functions will likely have to be made, and inference will have to be extended to track individual aggregate elements.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment