DIP - Unique pointers
The use case for unique semantics is pretty common:
struct SmartPtr {
int* ptr;
@safe:
~this() { reset(); }
ref int get() return { assert(ptr); return *ptr; }
void increment() { (*ptr)++; }
void decrement() { (*ptr)--; }
@trusted:
void init() { ptr = cast(int*)malloc(sizeof(int)); *ptr = 0; }
void reset() { free(ptr); ptr = null; }
}
void foobar() @safe {
SmartPtr s = SmartPtr();
s.init();
byte* ptr = s.get();
if (something())
s.reset();
*ptr = 1; // Potential memory corruption
}
Users often need to dynamically allocate and free resources, where the lifetime of the allocated data can't be determined at compile-time.
For these cases, most container implementations often look like SmartPtr in the example above: the methods that deallocate data are marked as @trusted
, which creates a hole in the memory safety of the program.
This is a common problem in language design, which has two known solutions: garbage collection, which has performance problems, and lifetime annotations. D currently implements garbage collection, but is moving towards lifetime annotations, similar to Rust.
Rust addresses the above problem by noticing that reset()
mutates SmartPtr
; it enforces a rule that data cannot be mutated if other references to it exist. Therefore, as long a ptr
exists, s.reset()
cannot be called, and the example above triggers a compile error.
This is useful both for memory safety and concurrency, since it guarantees that no thread can mutate data at the same time another thread is reading it.
This DIP comes from the observation that the Rust model is overly conservative. For instance, the following code:
byte* ptr1 = s.get();
byte* ptr2 = s.get();
byte* ptr3 = s.get();
s.increment();
s.decrement();
*ptr1 = 1;
*ptr2 = 2;
*ptr3 = 3;
... doesn't compile in the Rust model, even though there is no possible way it could lead to memory corruption.
In fact, in @safe
code there is no possible way to corrupt memory by calling any combination of get()
, increment()
and decrement()
alone. The only method that requires special treatment is reset()
, because it can only be guaranteed to be @safe
if no other references to this
exist. Therefore, this document proposes that a unique
qualifier be added, to guarantee just that.
Walter has recently expressed intentions of moving towards the Rust memory model.
While the exact rules such a scheme will follow are still a work in progress, the ideas Walter posted follow two points:
- Allow only one active mutable reference to a given object at a time.
- To allow for progressive adoption, only apply the new rules to functions annotated with
@live
This proposal is motivated by the idea that the above scheme would be impractical, and that having a unique
qualifier would be a better solution to the memory safety problem.
First, because the Rust model is, again, too restrictive. As shown above, there are many use cases where having multiple active mutable references to the same data is perfectly memory-safe. That strictness has benefits, eg, it allows the developer to know that a reference's contents can't "change out from under her", which can be helpful in a "functional programming brings clarity" kind of way, it also forces developers to structure their entire programs around these constraints in a way that isn't necessary for memory safety. Rust embraces these constraints, because it is a new language that pursues the "pit of success" design philosophy; but D is an existing language with a legacy codebase, and which relies as least partly on easy migration from C/C++; as such, it would benefit from a less strict approach.
Additionally, a user wishing to apply the Rust memory model could simply make sure that all her functions take only unique
or immutable
parameters.
This document proposes that a new type qualifier unique
be added to the language. This type qualifier can only be applied to pointers and reference types (references, classes, slices and delegates).
Note: The rest of this document will only mention C-style pointers for simplicity; the same semantics apply to other reference types.
unique int x; // ERROR
unique int* p; // ok
void foobar(unique ref int x); // ok
unique int function() x; // ERROR
unique int delegate() x; // ok
Additionally, a new @safe
rule is added that says that, for every object in memory, there can be either:
- One
unique
reference to that object. - Any number of non-unique references to that object.
So, give these rules, the example at the beginning
void foobar() @safe {
SmartPtr s = SmartPtr();
s.init();
byte* ptr = s.get();
if (something())
s.reset(); // ERROR
*ptr = 1;
}
... refuses to compile, because s
is not unique
as long as ptr
exists.
Note: In more technical terms, the type system enforces one overarching guarantee: given that we represent variables as nodes of a graph, and references as edges, when an unique reference is assigned, the compiler guarantees that this reference dominates all the nodes it can reach for the duration of its lifetime.
The above guarantee is enforced through 4 rules:
-
Rule 0: Memory allocation returns unique references.
-
Rule 1: The address of a local
T
can be assigned to aunique T*
iff -
Rule 2: A variable
var1
of unqualified typeT*
can be assigned tovar2
of typeunique T*
iff- Its qualified type is
unique T*
, - There is no live copy of that variable, given scope and return scope semantics,
- There is no possible codepath where that variable is copied or dereferenced during the lifetime of
var2
.
- Its qualified type is
-
Rule 3: Non-uniqueness is infectious. Given
T == unique int*
:unique T* == unique(unique(int*)*)
T* == int**
-
Rule 4:
free()
only accept unique pointers; but is still@system
, because not all unique references point to the result of amalloc()
call.
void read(T copy);
void consume(unique scope T*);
// RULE 1
{
T t = ...;
consume(&t); // ERROR: t later used in read(t)
read(t);
}
// RULE 2
{
unique T* ptr = new T(...);
consume(ptr); // ERROR: ptr later used in read(*ptr)
read(*ptr);
}
// Lifetimes
{
T* identity(return scope T*);
unique T* ptr;
{
T* copy = identity(ptr);
consume(ptr); // ERROR: copy is still alive (unless we do non-lexical lifetimes)
}
consume(ptr); // ok
consume(ptr); // ok
}
Rule 3 (non-uniqueness is infectious) is mostly relevant for structs. That is, x.y.z
is unique only if x
, y
, and z
are declared as unique.
struct Foobar {
int *x;
unique int* y;
unique(int*) f1() {
return x; // ERROR, `x` is not unique
}
unique(int*) f2() {
return y; // ERROR, `this` is not unique
}
unique(int*) f3() unique {
return y; // ok
}
}
Opposite of scope Except unique**
Containers still need @trusted methods Provided these methods respect invariants, @safe code can't corrupt or leak memory
Error messaging
Cast to immutable
inout includes unique?
delete operator
Container example:
struct Array(T)
{
private:
T* m_values;
size_t m_size;
public:
this();
void clear() unique;
void reserve(size_t elements) unique;
bool tryPush();
void push() unique;
T pop();
unique(T) pop() unique;
ref inout(T) opIndex(size_t i) return inout;
ref unique(inout(T)) opIndex(size_t i) return inout unique;
}
- (unique T)**
Note: When
unique
is used through a function boundary, it's the philosophical opposite ofscope
:scope
constrains the code of the callee and gives guarantees to the caller, whereasunique
gives guarantees to the callee and constrains the code of the caller.
Also, before I forget, I need some way to figure out the double indirection problem, eg: