Skip to content

Instantly share code, notes, and snippets.

@PoignardAzur
Last active September 29, 2019 08:22
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save PoignardAzur/9896ddb17b9f6d6f3d0fa5e6fe1a7088 to your computer and use it in GitHub Desktop.
Save PoignardAzur/9896ddb17b9f6d6f3d0fa5e6fe1a7088 to your computer and use it in GitHub Desktop.
Draft for a DIP to add unique pointers

DIP - Unique pointers

Rationale

The use case for unique semantics is pretty common:

struct SmartPtr {
    int* ptr;

@safe:
    ~this() { reset(); }
    ref int get() return { assert(ptr); return *ptr; }

    void increment() { (*ptr)++; }
    void decrement() { (*ptr)--; }

@trusted:
    void init() { ptr = cast(int*)malloc(sizeof(int)); *ptr = 0; }
    void reset() { free(ptr); ptr = null; }
}

void foobar() @safe {
    SmartPtr s = SmartPtr();
    s.init();

    byte* ptr = s.get();
    if (something())
      s.reset();
    *ptr = 1; // Potential memory corruption
}

Users often need to dynamically allocate and free resources, where the lifetime of the allocated data can't be determined at compile-time.

For these cases, most container implementations often look like SmartPtr in the example above: the methods that deallocate data are marked as @trusted, which creates a hole in the memory safety of the program.

This is a common problem in language design, which has two known solutions: garbage collection, which has performance problems, and lifetime annotations. D currently implements garbage collection, but is moving towards lifetime annotations, similar to Rust.

Rust addresses the above problem by noticing that reset() mutates SmartPtr; it enforces a rule that data cannot be mutated if other references to it exist. Therefore, as long a ptr exists, s.reset() cannot be called, and the example above triggers a compile error.

This is useful both for memory safety and concurrency, since it guarantees that no thread can mutate data at the same time another thread is reading it.

A less restrictive model

This DIP comes from the observation that the Rust model is overly conservative. For instance, the following code:

byte* ptr1 = s.get();
byte* ptr2 = s.get();
byte* ptr3 = s.get();

s.increment();
s.decrement();

*ptr1 = 1;
*ptr2 = 2;
*ptr3 = 3;

... doesn't compile in the Rust model, even though there is no possible way it could lead to memory corruption.

In fact, in @safe code there is no possible way to corrupt memory by calling any combination of get(), increment() and decrement() alone. The only method that requires special treatment is reset(), because it can only be guaranteed to be @safe if no other references to this exist. Therefore, this document proposes that a unique qualifier be added, to guarantee just that.

Advantages over @live

Walter has recently expressed intentions of moving towards the Rust memory model.

While the exact rules such a scheme will follow are still a work in progress, the ideas Walter posted follow two points:

  • Allow only one active mutable reference to a given object at a time.
  • To allow for progressive adoption, only apply the new rules to functions annotated with @live

This proposal is motivated by the idea that the above scheme would be impractical, and that having a unique qualifier would be a better solution to the memory safety problem.

First, because the Rust model is, again, too restrictive. As shown above, there are many use cases where having multiple active mutable references to the same data is perfectly memory-safe. That strictness has benefits, eg, it allows the developer to know that a reference's contents can't "change out from under her", which can be helpful in a "functional programming brings clarity" kind of way, it also forces developers to structure their entire programs around these constraints in a way that isn't necessary for memory safety. Rust embraces these constraints, because it is a new language that pursues the "pit of success" design philosophy; but D is an existing language with a legacy codebase, and which relies as least partly on easy migration from C/C++; as such, it would benefit from a less strict approach.

Additionally, a user wishing to apply the Rust memory model could simply make sure that all her functions take only unique or immutable parameters.

Description

This document proposes that a new type qualifier unique be added to the language. This type qualifier can only be applied to pointers and reference types (references, classes, slices and delegates).

Note: The rest of this document will only mention C-style pointers for simplicity; the same semantics apply to other reference types.

unique int x;                     // ERROR
unique int* p;                    // ok

void foobar(unique ref int x);    // ok

unique int function() x;          // ERROR
unique int delegate() x;          // ok

Additionally, a new @safe rule is added that says that, for every object in memory, there can be either:

  • One unique reference to that object.
  • Any number of non-unique references to that object.

So, give these rules, the example at the beginning

void foobar() @safe {
    SmartPtr s = SmartPtr();
    s.init();

    byte* ptr = s.get();
    if (something())
      s.reset();          // ERROR
    *ptr = 1;
}

... refuses to compile, because s is not unique as long as ptr exists.

Note: In more technical terms, the type system enforces one overarching guarantee: given that we represent variables as nodes of a graph, and references as edges, when an unique reference is assigned, the compiler guarantees that this reference dominates all the nodes it can reach for the duration of its lifetime.

The above guarantee is enforced through 4 rules:

  • Rule 0: Memory allocation returns unique references.

  • Rule 1: The address of a local T can be assigned to a unique T* iff

    • There is no live reference to that local, given scope and return scope semantics (similar to Rusts's E0382),
    • There is no possible codepath where that local is referenced again during the lifetime of the unique reference (similar to Rusts's E0505).
  • Rule 2: A variable var1 of unqualified type T* can be assigned to var2 of type unique T* iff

    • Its qualified type is unique T*,
    • There is no live copy of that variable, given scope and return scope semantics,
    • There is no possible codepath where that variable is copied or dereferenced during the lifetime of var2.
  • Rule 3: Non-uniqueness is infectious. Given T == unique int*:

    • unique T* == unique(unique(int*)*)
    • T* == int**
  • Rule 4: free() only accept unique pointers; but is still @system, because not all unique references point to the result of a malloc() call.

Examples

void read(T copy);
void consume(unique scope T*);

// RULE 1
{
  T t = ...;

  consume(&t);                // ERROR: t later used in read(t)
  read(t);
}

// RULE 2
{
  unique T* ptr = new T(...);

  consume(ptr);               // ERROR: ptr later used in read(*ptr)
  read(*ptr);
}

// Lifetimes
{
  T* identity(return scope T*);

  unique T* ptr;
  {
    T* copy = identity(ptr);
    consume(ptr);             // ERROR: copy is still alive (unless we do non-lexical lifetimes)
  }
  consume(ptr);               // ok
  consume(ptr);               // ok
}

Rule 3 (non-uniqueness is infectious) is mostly relevant for structs. That is, x.y.z is unique only if x, y, and z are declared as unique.

struct Foobar {
  int *x;
  unique int* y;

  unique(int*) f1() {
    return x;           // ERROR, `x` is not unique
  }

  unique(int*) f2() {
    return y;           // ERROR, `this` is not unique
  }

  unique(int*) f3() unique {
    return y;           // ok
  }

}

Relation to existing semantics

scope

Opposite of scope Except unique**

const/immutable/inout

Move constructor

Destructor

Memory safety guarantees

Containers still need @trusted methods Provided these methods respect invariants, @safe code can't corrupt or leak memory

Progressive adoption

Error messaging

Limitations and future improvements

Cast to immutable

inout includes unique?

delete operator


Container example:

struct Array(T)
{
private:
  T* m_values;
  size_t m_size;

public:
  this();

  void clear() unique;
  void reserve(size_t elements) unique;

  bool tryPush();
  void push() unique;

  T pop();
  unique(T) pop() unique;

  ref inout(T) opIndex(size_t i) return inout;
  ref unique(inout(T)) opIndex(size_t i) return inout unique;
}
  • (unique T)**

Note: When unique is used through a function boundary, it's the philosophical opposite of scope: scope constrains the code of the callee and gives guarantees to the caller, whereas unique gives guarantees to the callee and constrains the code of the caller.

@PoignardAzur
Copy link
Author

They're orthogonal.

But you're right, that example is a mistake from me.

Given Rule 1 as stated, the unique pointer created by consumed is expired by the time read is called, so there's not possible conflict. I need to rewrite this.

unique + scope doesn't make much sense to me.

Well, imagine that consume is a function that does:

void consume(unique scope Vector* v)
{
    v.reset();
}

In that case, v is "forgotten" at the end of the function. But it still needs to be passed a unique pointer, because v.reset will create problems if references to data internal to v are still alive.

(however, in example 1, calling consume(t) then read(t) would be safe, as you point out)

@PoignardAzur
Copy link
Author

Also, before I forget, I need some way to figure out the double indirection problem, eg:

Vector vec = ...;
int* vec0 = vec[0];

unique Vec* ptr;
Vec** ptrptr = &ptr;
*ptrptr = &vec;

ptr.reset();
*vec0 = 42;      // Memory corruption

@skoppe
Copy link

skoppe commented Sep 19, 2019

unique + scope doesn't make much sense to me.

Well, imagine that consume is a function that does:

void consume(unique scope Vector* v)
{
v.reset();
}
In that case, v is "forgotten" at the end of the function. But it still needs to be passed a unique pointer, because v.reset will create problems if references to data internal to v are still alive.

Oh yes, of course. You need unique because you want to freely mutate v's internals without having other references to it, and you need scope to signal to the compiler you are not going to escape v.

👍

@PoignardAzur
Copy link
Author

Precisely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment