Memory graph isolation is a form of immutable referencing within a program's graph of memory, used to create subgraphs that do not reference each other.
This proposal focuses on analysis of the graph when creating and maintaining isolated subgraphs. It takes advantage of DIP1000 for escape analysis of references into and out of functions that manipulate subgraphs.
https://dl.acm.org/doi/10.1145/3167132.3167245
https://dl.acm.org/doi/10.1145/2384616.2384619
Type state analysis is used to distinguish initialized from uninitialized/invalidated variables.
As of February 2024, the United States of America has published a report on memory safety and its importance in programming language choice moving forward. Of particular interest are temporal vulnerabilities, which cannot be solved by a garbage collector, nor by a borrow checker that operates only on knowledge from within a single function, without knowledge of what other functions may have done.
https://www.whitehouse.gov/wp-content/uploads/2024/02/Final-ONCD-Technical-Report.pdf
In a program's data there is a single graph with multiple roots that are added and removed during the execution of a program. Each root points to a subgraph that may reference other subgraphs. However all non-immutable memory that the subgraph references is owned by that subgraph.
Understanding of the subgraphs and their interrelation enables the compiler to prove certain guarantees that can be used to optimize the code as well as ensuring multithreading safety without the use of runtime mechanisms.
The subgraphs referenced by variables are only tracked for isolate-annotated variable declarations, parameters, and return values.
A key element of subgraphs is the ability to transfer ownership into them. This requires that a reference be known at compile time to refer to a specific subgraph, which is where the term "immutable reference" comes from: the reference into the subgraph is immutable. It cannot be changed to point to something else, although it can be made unusable.
When a subgraph is made unusable it is considered to be invalidated. This can occur for a number of reasons. Implicit invalidation occurs at the end of scope. Explicit invalidation occurs due to some action in the code; returning or assigning are good examples of this.
After invalidation occurs, the subgraph is inaccessible; it is an error to attempt to use it.
To guarantee that complex data flow analysis is not required, all branches of a conditional must have the same result for a subgraph: if one branch invalidates it, so must the others. This can occur with either implicit or explicit invalidation; the exact form is unimportant.
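The branch-agreement rule can be sketched as follows; this is illustrative only, and `consume` is a hypothetical function that takes ownership of its argument:

```d
// Sketch: all branches must agree on invalidation (proposed semantics).
void example(bool condition, isolate(int[]) data) {
    if (condition) {
        consume(data); // explicitly invalidates the subgraph of `data`
    } else {
        // ERROR: `data` is still valid at the end of this branch,
        // but was invalidated in the other; both branches must agree
    }
}
```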
So far, variable declarations and their interrelationships have been described. However, this does not cover the individual values stored in those variables. Understanding the values themselves allows some fine-grained guarantees around aliasing of memory, as well as optimization of deallocation of allocated memory.
Each value that has been allocated using a known method (GC or reference counting), will have behavior specific to that value to elide unnecessary work and guarantee cleanup where possible.
To do this, at the point of acquisition of memory with a known strategy, the value will be tracked throughout its lifecycle within a function. The compiler stores whether the value's lifecycle is predictable within the function body; this requires that the subgraph it is on be invalidated implicitly.
When a reference-counted value is predictable, its reference add and subtract methods will be elided, except for the last subtraction. For GC-owned memory, the GC will similarly be instructed to deallocate.
In situations where the compiler cannot understand the flow of memory at runtime purely by compile-time measures, predictability will be set to false for a given value. This is where the name originates: the compiler must be able to predict the runtime flow of a value within a function body to safely offer these benefits.
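A predictable reference-counted value might look like the following sketch; `RC`, `acquire`, and `use` are hypothetical names standing in for a reference-counted type, an allocation with a known strategy, and a non-escaping use:

```d
// Sketch: predictability and refcount elision (proposed semantics).
void predictable() {
    isolate(RC!int) value = acquire(); // known strategy: reference counted
    use(value);                        // value does not escape the function body
}   // implicit invalidation at end of scope: intermediate add/sub calls
    // are elided, leaving only the final subtraction (and cleanup)
```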
Understanding isolation of variables from each other is not always desirable behavior, especially in @system code.
For this we introduce two type qualifiers that also function as storage classes: isolate and mayisolate.
Together, we refer to variables whose type is annotated with either as being isolated.
Each type qualifier, when used on a variable declaration, introduces a subgraph. However, this subgraph may not be unique; it gets its instance after the initialization expression has been evaluated. For function parameters we ignore this rule and instead introduce a unique subgraph for each.
As a type qualifier, it only has meaning on encapsulation units or pointers.
When applied to basic types, isolated type qualifiers are ignored; this does not apply to function parameters that are ref.
Explicit casting of isolated type qualifiers may only occur in @system and @trusted functions.
This is because it must be guaranteed that no other references will be alive after the cast has been performed, similar in reasoning to immutable.
It is allowed to explicitly cast off isolate in @system and @trusted functions.
However, as this is unsafe, it is not recommended unless you are going to deallocate the memory.
If you do cast it off, the language will help out by explicitly invalidating the subgraph it is on, which may have unintended consequences.
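As a sketch of the above, casting off isolate in @system code might look like this; `deallocate` is a hypothetical cleanup routine:

```d
// Sketch: casting off isolate (proposed semantics, @system only).
void cleanup(isolate(int[]) data) @system {
    int[] raw = cast(int[])data; // casts off isolate; the subgraph of `data`
                                 // is explicitly invalidated by the language
    deallocate(raw);             // hypothetical deallocation routine
    // any further use of `data` here would be an error
}
```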
On the other hand, mayisolate may be implicitly cast off as long as a context pointer is not referenced by the subgraph.
The subgraph will not be invalidated; instead, it will be treated as if mayisolate was never applied to it.
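A minimal sketch of this implicit cast, under the proposed semantics:

```d
// Sketch: implicitly casting off mayisolate (proposed semantics).
void caller(mayisolate(int[]) data) {
    int[] plain = data; // ok if the subgraph references no context pointer;
                        // the subgraph is not invalidated, the mayisolate
                        // qualifier is simply dropped
}
```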
A new attribute is added: @readonlycontext.
This indicates that a delegate may not access any variable that is not immutable or isolate.
An isolate variable may not be mutated; only members that are const or immutable are allowed to be referenced.
It may appear that const variables that are not isolate should be accessible in such a delegate.
However, because there is no information on how many other references exist over the entire program, it is unfortunately required to assume that there could be one that is mutable or aliasable.
The application of this attribute may be used in cases of parallelism with a stack-owned context, similar to std.parallelism : parallel, to ensure one thread will not trample on another; or it may be used with coroutines to enable the use of a delegate instead of a function pointer.
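The access rules for such a delegate can be sketched as follows, using an isolate array and a hypothetical mutable global:

```d
// Sketch: @readonlycontext delegate (proposed semantics).
int globalCounter; // hypothetical mutable global

void run(isolate(int[]) data) {
    auto dg = delegate() @readonlycontext {
        auto x = data[0];    // ok: `data` is isolate and the access is read-only
        // globalCounter++;  // ERROR: neither immutable nor isolate
        // data[0] = 1;      // ERROR: isolate variables may not be mutated here
    };
    dg();
}
```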
Functions have added restrictions that guarantee that subgraphs will not conflict in the callee.
Each function has returnable variables: the function return type, as well as any parameter that is ref or out.
Each function has contextual parameters: the function's this pointer, as well as any function parameter.
If multiple contextual variables are isolated and marked return, this is an error.
Isolated parameters that are ref may not be invalidated explicitly; implicit invalidation will be ignored.
When the parameter is isolated, the argument may be:
- scope and isolated.
- Not scope and isolated; the argument's subgraph is invalidated explicitly after the call.
If the argument is immutable and the parameter is isolated, the parameter must also be const or immutable.
If the argument is isolated, the parameter may be const or immutable if it is also scope and not return; the isolated type qualifier will be implicitly cast off.
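The ownership-transfer rules above can be sketched as follows; `borrow` and `take` are hypothetical functions illustrating the scope and non-scope cases:

```d
// Sketch: argument invalidation rules for isolated parameters (proposed).
void borrow(scope isolate(int[]) data); // scope: caller keeps ownership
void take(isolate(int[]) data);         // not scope: takes ownership

void caller(isolate(int[]) a, isolate(int[]) b) {
    borrow(a); // the subgraph of `a` remains valid after the call
    take(b);   // the subgraph of `b` is invalidated explicitly after the call
    // any use of `b` past this point would be an error
}
```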
If a parameter is isolated and is marked ref, the argument must be from the same subgraph as the contextual parameter, if there is one.
Only one isolated parameter's argument may map to a given value; if multiple arguments map to a single value, this is an error. If an argument does not map to a value, mapping to a subgraph is considered as a fallback. This guarantees no aliasing and prevents double frees.
When mapping an argument to its subgraph instead of a value, only one argument may map to a given subgraph. This requires extra processing over the other arguments to ensure only one argument maps to a given subgraph; arguments that map to a different subgraph are not considered.
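The one-argument-per-value rule can be sketched as follows; `pair` is a hypothetical function with two isolated parameters:

```d
// Sketch: no two isolated parameters may map to the same value (proposed).
void pair(isolate(int*) a, isolate(int*) b);

void caller(isolate(int*) p) {
    pair(p, p); // ERROR: both isolated parameters map to the same value,
                // which would alias and risk a double free
}
```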
If the called function's contextual parameter is mayisolate and its returnable variable is also mayisolate, then the returned variable will inherit its isolate or mayisolate from the contextual variable's argument, as seen by the caller.
This is needed so that a method can apply to both isolate and mayisolate callers without duplication.
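This inheritance might be sketched as follows; `Container` and `first` are hypothetical names for illustration:

```d
// Sketch: qualifier inheritance on return (proposed semantics).
struct Container {
    mayisolate(int*) first() scope mayisolate;
}

void caller(isolate(Container) c, mayisolate(Container) m) {
    isolate(int*) a = c.first();    // receiver is isolate: return is isolate
    mayisolate(int*) b = m.first(); // receiver is mayisolate: return is mayisolate
}
```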
During overload resolution or an is expression, if an argument and parameter are both isolate, then it is an exact match; but if the argument is isolate and the parameter is mayisolate, then it is a coercion.
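A small sketch of how this plays out in overload resolution, under the proposed semantics:

```d
// Sketch: overload resolution with isolated qualifiers (proposed).
void f(isolate(int[]) x);     // exact match for an isolate argument
void f(mayisolate(int[]) x);  // coercion for an isolate argument

void caller(isolate(int[]) a) {
    f(a); // selects the isolate overload: exact match beats coercion
}
```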
An assignment to isolated variables may only occur on uninitialized variables.
This includes variable declaration initialization expressions, as well as a void-initialized variable declaration with a follow-up assignment.
If the lvalue is isolated, the rvalue may be:
- Isolated.
- immutable, but the lvalue must be either const or immutable.
Tracking of the value from the rvalue will be assigned to the lvalue, if it exists.
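The assignment rules above can be sketched as follows; `allocate` is a hypothetical function returning isolated memory:

```d
// Sketch: assignment rules for isolated variables (proposed semantics).
void assignments() {
    isolate(int[]) a = allocate(); // ok: declaration initialization expression
    isolate(int[]) b = void;       // ok: void-initialized declaration
    b = allocate();                // ok: follow-up assignment while uninitialized
    // a = allocate();             // ERROR: `a` is already initialized
}
```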
An assumption of no aliasing may be made if all parameters (including the this pointer) are isolated, with at most one non-isolated parameter.
void setter(scope isolate(float[]) result, float* input) {
foreach(ref v; result) {
v = *input;
}
}
Both result and input will not alias; therefore optimizations become possible, as if @restrict were annotated on both parameters.
This has been shown to double performance. https://forum.dlang.org/thread/zbkjkvloafmmlzxzyslc@forum.dlang.org
When working with a reference-counted data structure, ownership of borrowed memory is handled via DIP1000. The purpose of isolate's behavior in connection with it is instead to ensure that only one reference is available at any given time, isolating the stored values.
Of note is how opIndex in this example does not connect the this pointer or parameter to the return value.
Using a @trusted constructor for View, it is able to bypass the checks of both isolate and scope to give a new subgraph.
void usage(isolate(DataStructure!T) data /* sub graph 1 */) {
isolate View!T value0 = data[0]; // sub graph 2
isolate View!T value1 = data[1]; // sub graph 3
data.tryRemove(value0); // invalidates sub graph 2
writeln(value0.get); // ERROR: cannot access sub graph [`value0`] due to invalidation in line $previous
}
struct DataStructure(T) {
this(RCAllocator) mayisolate;
void opRefSub() scope mayisolate;
bool tryRemove(isolate View!T toRemove) scope mayisolate;
mayisolate(View!T) opIndex(size_t offset) scope mayisolate const @trusted {
synchronized {
Node* node = &nodes[offset];
if (atomicLoad(node.refCount) > 0)
throw new Exception("Multiple instances to this offset");
return cast(mayisolate(View!T))View!T(this, node);
}
}
private:
Node[] nodes;
static struct Node {
shared(ptrdiff_t) refCount;
T value;
void opRefSub() scope isolate;
}
}
struct View(T) {
private {
DataStructure!T state;
Node* node;
this(scope isolate(DataStructure!T) state, scope isolate(Node*) node) @trusted {
// escapes, but @trusted ignores that!
this.state = state;
this.node = node;
atomicOp!"+="(node.refCount, 1);
}
}
void opRefSub() scope mayisolate {
node.opRefSub;
state.opRefSub;
}
ref T get() scope isolate {
return node.value;
}
alias get this;
}
Not all multithreading scenarios should treat global memory like isolated memory.
In the case below, a write is occurring to the same location from multiple iterations, which will not do what is expected when multithreading. It is clearly a mistake by the user, even if it is not memory unsafe.
int[] output;
foreach(i; iota(10).parallel) {
output[0] = 2;
}
Instead, we propose using isolated to map the array prior to giving it to parallel and operating on it.
isolate(int[]) input, output;
assert(input.length == output.length);
foreach(size_t i, isolate(int*) o; output.byPointer.parallel) {
*o = input[i];
}
However, as a consequence of isolate, accessing any isolated variable mutably inside the foreach body is disallowed by the use of the readonlycontext attribute.
But it is still possible to access the input array, as that access is not a mutation.