Memory graph isolation is a form of immutable referencing within a program's graph of memory, used to create subgraphs that do not reference each other.
This proposal focuses on analysis of the graph when creating and maintaining isolated subgraphs. It takes advantage of DIP1000 for escape analysis of references into and out of functions that manipulate subgraphs.
https://dl.acm.org/doi/10.1145/3167132.3167245
https://dl.acm.org/doi/10.1145/2384616.2384619
Type state analysis is used to distinguish initialized from uninitialized/invalidated variables.
As of February 2024, the United States of America has published a report on memory safety and its importance in programming language choice moving forward. Of particular interest are temporal vulnerabilities, which cannot be solved by a garbage collector, nor by a borrow checker that operates only on knowledge from within a single function, without knowledge of what other functions may have done.
https://www.whitehouse.gov/wp-content/uploads/2024/02/Final-ONCD-Technical-Report.pdf
In a program's data there is a single graph with multiple roots that are added and removed during the execution of a program. Each root points to a subgraph that may reference other subgraphs. However all non-immutable memory that the subgraph references is owned by that subgraph.
Understanding of the subgraphs and their interrelation enables the compiler to prove certain guarantees that can be used to optimize the code as well as ensuring multithreading safety without the use of runtime mechanisms.
The subgraphs referenced by variables are only tracked for isolate-annotated variable declarations, parameters, and return values.
A key element of subgraphs is the ability to transfer ownership into them. This requires that a reference be known at compile time to refer to a specific subgraph, which is where the term "immutable reference" comes from: the reference into the subgraph is immutable. It cannot be changed to point to something else, although it can be made unusable.
When a subgraph is made unusable it is considered to be invalidated. This can occur for a number of reasons. Implicit invalidation occurs at the end of scope. Explicit invalidation occurs due to some action in the code; returning or assigning are good examples of this.
After invalidation occurs, the subgraph is inaccessible; it is an error to attempt to use it.
To guarantee that complex data flow analysis is not required, all branches of a conditional must have the same result for a subgraph: if one branch invalidates it, so must the others. This can occur with either implicit or explicit invalidation; the exact form is unimportant.
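The branch-agreement rule can be sketched as follows; this is illustrative only, and `consume` is a hypothetical function that takes ownership of its argument:

```d
// Sketch: all branches must agree on invalidation (proposed semantics).
void example(bool condition, isolate(int[]) data) {
    if (condition) {
        consume(data); // explicitly invalidates the subgraph of `data`
    } else {
        // ERROR: `data` is still valid at the end of this branch,
        // but was invalidated in the other; both branches must agree
    }
}
```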
So far, variable declarations and their interrelationships have been described. However, this does not cover the individual values stored in those variables. Understanding the values themselves allows some fine-grained guarantees around aliasing of memory, as well as optimization of deallocation of allocated memory.
Each value that has been allocated using a known method (GC or reference counting), will have behavior specific to that value to elide unnecessary work and guarantee cleanup where possible.
To do this, at the point of acquisition of memory with a known strategy, the value will be tracked throughout its lifecycle within a function. The compiler stores whether the value's lifecycle is predictable within the function body; this requires that the subgraph it is on be invalidated implicitly.
When a reference-counted value is predictable, its reference add and subtract methods will be elided, except for the last subtraction. For GC-owned memory, the GC will similarly be instructed to deallocate.
In situations where the compiler cannot understand the flow of memory at runtime purely by compile-time measures, predictability will be set to false for a given value. This is where the name originates: the compiler must be able to predict the runtime flow of a value within a function body to safely offer these benefits.
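A predictable reference-counted value might look like the following sketch; `RC`, `acquire`, and `use` are hypothetical names standing in for a reference-counted type, an allocation with a known strategy, and a non-escaping use:

```d
// Sketch: predictability and refcount elision (proposed semantics).
void predictable() {
    isolate(RC!int) value = acquire(); // known strategy: reference counted
    use(value);                        // value does not escape the function body
}   // implicit invalidation at end of scope: intermediate add/sub calls
    // are elided, leaving only the final subtraction (and cleanup)
```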
Understanding isolation of variables from each other is not always desirable behavior, especially in @system code.
For this we introduce two type qualifiers that also function as storage classes: isolate and mayisolate.
Together, we refer to variables whose type is annotated with either as being isolated.
Each type qualifier, when used on a variable declaration, introduces a subgraph. However, this subgraph may not be unique; it gets its instance after the initialization expression has been evaluated. For function parameters we ignore this rule and instead introduce a unique subgraph for each.
As a type qualifier, it only has meaning on encapsulation units or pointers.
When applied to basic types, isolated type qualifiers are ignored; this does not apply to function parameters that are ref.
Explicit casting of isolated type qualifiers may only occur in @system and @trusted functions.
This is because it must be guaranteed that no other references will be alive after the cast has been performed, similar in reasoning to immutable.
It is allowed to explicitly cast off isolate in @system and @trusted functions.
However, as this is unsafe, it is not recommended unless you are going to deallocate the memory.
If you do cast it off, the language will help out by explicitly invalidating the subgraph it is on, which may have unintended consequences.
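As a sketch of the above, casting off isolate in @system code might look like this; `deallocate` is a hypothetical cleanup routine:

```d
// Sketch: casting off isolate (proposed semantics, @system only).
void cleanup(isolate(int[]) data) @system {
    int[] raw = cast(int[])data; // casts off isolate; the subgraph of `data`
                                 // is explicitly invalidated by the language
    deallocate(raw);             // hypothetical deallocation routine
    // any further use of `data` here would be an error
}
```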
On the other hand, mayisolate may be implicitly cast off as long as a context pointer is not referenced by the subgraph.
The subgraph will not be invalidated; instead, it will be treated as if mayisolate was never applied to it.
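A minimal sketch of this implicit cast, under the proposed semantics:

```d
// Sketch: implicitly casting off mayisolate (proposed semantics).
void caller(mayisolate(int[]) data) {
    int[] plain = data; // ok if the subgraph references no context pointer;
                        // the subgraph is not invalidated, the mayisolate
                        // qualifier is simply dropped
}
```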
A new attribute is added: @readonlycontext.
This indicates that a delegate may not access any variable that is not immutable or isolate.
An isolate variable may not be mutated; only members that are const or immutable are allowed to be referenced.
It may appear that const variables that are not isolate should be accessible in such a delegate.
However, because there is no information on how many other references exist over the entire program, it is unfortunately required to assume that there could be one that is mutable or aliasable.
The application of this attribute may be used in cases of parallelism with a stack-owned context, similar to std.parallelism : parallel, to ensure one thread will not trample on another; or it may be used with coroutines to enable the use of a delegate instead of a function pointer.
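The access rules for such a delegate can be sketched as follows, using an isolate array and a hypothetical mutable global:

```d
// Sketch: @readonlycontext delegate (proposed semantics).
int globalCounter; // hypothetical mutable global

void run(isolate(int[]) data) {
    auto dg = delegate() @readonlycontext {
        auto x = data[0];    // ok: `data` is isolate and the access is read-only
        // globalCounter++;  // ERROR: neither immutable nor isolate
        // data[0] = 1;      // ERROR: isolate variables may not be mutated here
    };
    dg();
}
```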
Functions have added restrictions that guarantee that subgraphs will not conflict in the callee.
Each function has returnable variables: the function return type, as well as any parameter that is ref or out.
Each function has contextual parameters: the function's this pointer, as well as any function parameter.
If multiple contextual variables are isolated and marked return, this is an error.
Isolated parameters that are ref may not be invalidated explicitly; implicit invalidation will be ignored.
When the parameter is isolated, the argument may be:
- scope and isolated.
- Not scope and isolated; the argument's subgraph is invalidated explicitly after the call.
If the argument is immutable and the parameter is isolated, the parameter must also be const or immutable.
If the argument is isolated, the parameter may be const or immutable if it is also scope and not return; the isolated type qualifier will be implicitly cast off.
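The ownership-transfer rules above can be sketched as follows; `borrow` and `take` are hypothetical functions illustrating the scope and non-scope cases:

```d
// Sketch: argument invalidation rules for isolated parameters (proposed).
void borrow(scope isolate(int[]) data); // scope: caller keeps ownership
void take(isolate(int[]) data);         // not scope: takes ownership

void caller(isolate(int[]) a, isolate(int[]) b) {
    borrow(a); // the subgraph of `a` remains valid after the call
    take(b);   // the subgraph of `b` is invalidated explicitly after the call
    // any use of `b` past this point would be an error
}
```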
If a parameter is isolated and is marked ref, the argument must be from the same subgraph as the contextual parameter, if there is one.
Only one isolated parameter's argument may map to a given value; if multiple arguments map to a single value, this is an error. If an argument does not map to a value, mapping to a subgraph is considered as a fallback. This guarantees no aliasing and prevents double frees.
When mapping an argument to its subgraph instead of a value, only one argument may map to a given subgraph. This requires extra processing over the other arguments to ensure only one argument maps to a given subgraph; arguments that map to a different subgraph are not considered.
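The one-argument-per-value rule can be sketched as follows; `pair` is a hypothetical function with two isolated parameters:

```d
// Sketch: no two isolated parameters may map to the same value (proposed).
void pair(isolate(int*) a, isolate(int*) b);

void caller(isolate(int*) p) {
    pair(p, p); // ERROR: both isolated parameters map to the same value,
                // which would alias and risk a double free
}
```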
If the called function's contextual parameter is mayisolate and its returnable variable is also mayisolate, then the returned variable will inherit its isolate or mayisolate from the contextual variable's argument, as seen by the caller.
This is needed so that a method can apply to both isolate and mayisolate callers without duplication.
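This inheritance might be sketched as follows; `Container` and `first` are hypothetical names for illustration:

```d
// Sketch: qualifier inheritance on return (proposed semantics).
struct Container {
    mayisolate(int*) first() scope mayisolate;
}

void caller(isolate(Container) c, mayisolate(Container) m) {
    isolate(int*) a = c.first();    // receiver is isolate: return is isolate
    mayisolate(int*) b = m.first(); // receiver is mayisolate: return is mayisolate
}
```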
During overload resolution or an is expression, if an argument and parameter are both isolate, then it is an exact match; but if the argument is isolate and the parameter is mayisolate, then it is a coercion.
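A small sketch of how this plays out in overload resolution, under the proposed semantics:

```d
// Sketch: overload resolution with isolated qualifiers (proposed).
void f(isolate(int[]) x);     // exact match for an isolate argument
void f(mayisolate(int[]) x);  // coercion for an isolate argument

void caller(isolate(int[]) a) {
    f(a); // selects the isolate overload: exact match beats coercion
}
```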
An assignment to isolated variables may only occur on uninitialized variables.
This includes variable declaration initialization expressions, as well as a void-initialized variable declaration with a follow-up assignment.
If the lvalue is isolated, the rvalue may be:
- Isolated.
- immutable, but the lvalue must be either const or immutable.
Tracking of the value from the rvalue will be assigned to the lvalue, if it exists.
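The assignment rules above can be sketched as follows; `allocate` is a hypothetical function returning isolated memory:

```d
// Sketch: assignment rules for isolated variables (proposed semantics).
void assignments() {
    isolate(int[]) a = allocate(); // ok: declaration initialization expression
    isolate(int[]) b = void;       // ok: void-initialized declaration
    b = allocate();                // ok: follow-up assignment while uninitialized
    // a = allocate();             // ERROR: `a` is already initialized
}
```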
An assumption of no aliasing may be made if all parameters (including the this pointer) are isolated, with at most one non-isolated parameter.
void setter(scope isolate(float[]) result, float* input) {
foreach(ref v; result) {
v = *input;
}
}
Both result and input will not alias; therefore optimizations become possible, as if @restrict were annotated on both parameters.
This has been shown to double performance. https://forum.dlang.org/thread/zbkjkvloafmmlzxzyslc@forum.dlang.org
When working with a reference-counted data structure, ownership of borrowed memory is handled via DIP1000. The purpose of isolate's behavior in connection with it is instead to ensure that only one reference is available at any given time, isolating the stored values.
Of note is how opIndex in this example does not connect the this pointer or parameter to the return value.
Using a @trusted constructor for View, it is able to bypass the checks of both isolate and scope to give a new subgraph.
void usage(isolate(DataStructure!T) data /* sub graph 1 */) {
isolate View!T value0 = data[0]; // sub graph 2
isolate View!T value1 = data[1]; // sub graph 3
data.tryRemove(value0); // invalidates sub graph 2
writeln(value0.get); // ERROR: cannot access sub graph [`value0`] due to invalidation in line $previous
}
struct DataStructure(T) {
this(RCAllocator) mayisolate;
void opRefSub() scope mayisolate;
bool tryRemove(isolate View!T toRemove) scope mayisolate;
mayisolate(View!T) opIndex(size_t offset) scope mayisolate const @trusted {
synchronized {
Node* node = &nodes[offset];
if (atomicLoad(node.refCount) > 0)
throw new Exception("Multiple instances to this offset");
return cast(mayisolate(View!T))View!T(this, node);
}
}
private:
Node[] nodes;
static struct Node {
shared(ptrdiff_t) refCount;
T value;
void opRefSub() scope isolate;
}
}
struct View(T) {
private {
DataStructure!T state;
Node* node;
this(scope isolate(DataStructure!T) state, scope isolate(Node*) node) @trusted {
// escapes, but @trusted ignores that!
this.state = state;
this.node = node;
atomicOp!"+="(node.refCount, 1);
}
}
void opRefSub() scope mayisolate {
node.opRefSub;
state.opRefSub;
}
ref T get() scope isolate {
return node.value;
}
alias get this;
}
Not all multithreading scenarios should treat global memory like isolated memory.
In the case below, a write is occurring to the same location from multiple iterations, which will not do what is expected when multithreading. It is clearly a mistake by the user, even if it is not memory unsafe.
int[] output;
foreach(i; iota(10).parallel) {
output[0] = 2;
}
Instead, we propose using isolated to map the array prior to giving it to parallel and operating on it.
isolate(int[]) input, output;
assert(input.length == output.length);
foreach(size_t i, isolate(int*) o; output.byPointer.parallel) {
*o = input[i];
}
However, as a consequence of isolate, accessing any isolated variable mutably inside the foreach body is disallowed by the use of the readonlycontext attribute.
But it is still possible to access the input array, as that access is not a mutation.