Skip to content

Instantly share code, notes, and snippets.

@PoignardAzur
Last active September 29, 2019 08:22
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save PoignardAzur/9896ddb17b9f6d6f3d0fa5e6fe1a7088 to your computer and use it in GitHub Desktop.
Save PoignardAzur/9896ddb17b9f6d6f3d0fa5e6fe1a7088 to your computer and use it in GitHub Desktop.
Draft for a DIP to add unique pointers

DIP - Unique pointers

Rationale

The use case for unique semantics is pretty common:

struct SmartPtr {
    int* ptr;

@safe:
    ~this() { reset(); }
    ref int get() return { assert(ptr); return *ptr; }

    void increment() { (*ptr)++; }
    void decrement() { (*ptr)--; }

@trusted:
    void init() { ptr = cast(int*)malloc(sizeof(int)); *ptr = 0; }
    void reset() { free(ptr); ptr = null; }
}

void foobar() @safe {
    SmartPtr s = SmartPtr();
    s.init();

    byte* ptr = s.get();
    if (something())
      s.reset();
    *ptr = 1; // Potential memory corruption
}

Users often need to dynamically allocate and free resources, where the lifetime of the allocated data can't be determined at compile-time.

For these cases, most container implementations often look like SmartPtr in the example above: the methods that deallocate data are marked as @trusted, which creates a hole in the memory safety of the program.

This is a common problem in language design, which has two known solutions: garbage collection, which has performance problems, and lifetime annotations. D currently implements garbage collection, but is moving towards lifetime annotations, similar to Rust.

Rust addresses the above problem by noticing that reset() mutates SmartPtr; it enforces a rule that data cannot be mutated if other references to it exist. Therefore, as long a ptr exists, s.reset() cannot be called, and the example above triggers a compile error.

This is useful both for memory safety and concurrency, since it guarantees that no thread can mutate data at the same time another thread is reading it.

A less restrictive model

This DIP comes from the observation that the Rust model is overly conservative. For instance, the following code:

byte* ptr1 = s.get();
byte* ptr2 = s.get();
byte* ptr3 = s.get();

s.increment();
s.decrement();

*ptr1 = 1;
*ptr2 = 2;
*ptr3 = 3;

... doesn't compile in the Rust model, even though there is no possible way it could lead to memory corruption.

In fact, in @safe code there is no possible way to corrupt memory by calling any combination of get(), increment() and decrement() alone. The only method that requires special treatment is reset(), because it can only be guaranteed to be @safe if no other references to this exist. Therefore, this document proposes that a unique qualifier be added, to guarantee just that.

Advantages over @live

Walter has recently expressed intentions of moving towards the Rust memory model.

While the exact rules such a scheme will follow are still a work in progress, the ideas Walter posted follow two points:

  • Allow only one active mutable reference to a given object at a time.
  • To allow for progressive adoption, only apply the new rules to functions annotated with @live

This proposal is motivated by the idea that the above scheme would be impractical, and that having a unique qualifier would be a better solution to the memory safety problem.

First, because the Rust model is, again, too restrictive. As shown above, there are many use cases where having multiple active mutable references to the same data is perfectly memory-safe. That strictness has benefits, eg, it allows the developer to know that a reference's contents can't "change out from under her", which can be helpful in a "functional programming brings clarity" kind of way, it also forces developers to structure their entire programs around these constraints in a way that isn't necessary for memory safety. Rust embraces these constraints, because it is a new language that pursues the "pit of success" design philosophy; but D is an existing language with a legacy codebase, and which relies as least partly on easy migration from C/C++; as such, it would benefit from a less strict approach.

Additionally, a user wishing to apply the Rust memory model could simply make sure that all her functions take only unique or immutable parameters.

Description

This document proposes that a new type qualifier unique be added to the language. This type qualifier can only be applied to pointers and reference types (references, classes, slices and delegates).

Note: The rest of this document will only mention C-style pointers for simplicity; the same semantics apply to other reference types.

unique int x;                     // ERROR
unique int* p;                    // ok

void foobar(unique ref int x);    // ok

unique int function() x;          // ERROR
unique int delegate() x;          // ok

Additionally, a new @safe rule is added that says that, for every object in memory, there can be either:

  • One unique reference to that object.
  • Any number of non-unique references to that object.

So, give these rules, the example at the beginning

void foobar() @safe {
    SmartPtr s = SmartPtr();
    s.init();

    byte* ptr = s.get();
    if (something())
      s.reset();          // ERROR
    *ptr = 1;
}

... refuses to compile, because s is not unique as long as ptr exists.

Note: In more technical terms, the type system enforces one overarching guarantee: given that we represent variables as nodes of a graph, and references as edges, when an unique reference is assigned, the compiler guarantees that this reference dominates all the nodes it can reach for the duration of its lifetime.

The above guarantee is enforced through 4 rules:

  • Rule 0: Memory allocation returns unique references.

  • Rule 1: The address of a local T can be assigned to a unique T* iff

    • There is no live reference to that local, given scope and return scope semantics (similar to Rusts's E0382),
    • There is no possible codepath where that local is referenced again during the lifetime of the unique reference (similar to Rusts's E0505).
  • Rule 2: A variable var1 of unqualified type T* can be assigned to var2 of type unique T* iff

    • Its qualified type is unique T*,
    • There is no live copy of that variable, given scope and return scope semantics,
    • There is no possible codepath where that variable is copied or dereferenced during the lifetime of var2.
  • Rule 3: Non-uniqueness is infectious. Given T == unique int*:

    • unique T* == unique(unique(int*)*)
    • T* == int**
  • Rule 4: free() only accept unique pointers; but is still @system, because not all unique references point to the result of a malloc() call.

Examples

void read(T copy);
void consume(unique scope T*);

// RULE 1
{
  T t = ...;

  consume(&t);                // ERROR: t later used in read(t)
  read(t);
}

// RULE 2
{
  unique T* ptr = new T(...);

  consume(ptr);               // ERROR: ptr later used in read(*ptr)
  read(*ptr);
}

// Lifetimes
{
  T* identity(return scope T*);

  unique T* ptr;
  {
    T* copy = identity(ptr);
    consume(ptr);             // ERROR: copy is still alive (unless we do non-lexical lifetimes)
  }
  consume(ptr);               // ok
  consume(ptr);               // ok
}

Rule 3 (non-uniqueness is infectious) is mostly relevant for structs. That is, x.y.z is unique only if x, y, and z are declared as unique.

struct Foobar {
  int *x;
  unique int* y;

  unique(int*) f1() {
    return x;           // ERROR, `x` is not unique
  }

  unique(int*) f2() {
    return y;           // ERROR, `this` is not unique
  }

  unique(int*) f3() unique {
    return y;           // ok
  }

}

Relation to existing semantics

scope

Opposite of scope Except unique**

const/immutable/inout

Move constructor

Destructor

Memory safety guarantees

Containers still need @trusted methods Provided these methods respect invariants, @safe code can't corrupt or leak memory

Progressive adoption

Error messaging

Limitations and future improvements

Cast to immutable

inout includes unique?

delete operator


Container example:

struct Array(T)
{
private:
  T* m_values;
  size_t m_size;

public:
  this();

  void clear() unique;
  void reserve(size_t elements) unique;

  bool tryPush();
  void push() unique;

  T pop();
  unique(T) pop() unique;

  ref inout(T) opIndex(size_t i) return inout;
  ref unique(inout(T)) opIndex(size_t i) return inout unique;
}
  • (unique T)**

Note: When unique is used through a function boundary, it's the philosophical opposite of scope: scope constrains the code of the callee and gives guarantees to the caller, whereas unique gives guarantees to the callee and constrains the code of the caller.

@atilaneves
Copy link

I didn't read all of it, but I don't think D needs "unique" (which as mentioned by @jmh530 is the same as Pony's iso reference capability). The reason why I think we don't need it is that D has pure factory functions. An rvalue returned from a pure factory function can't have any aliases to it and is a mutable value that is safe to send to another thread.

@PoignardAzur
Copy link
Author

@jmh530

What you're talking about may correspond to Pony's iso reference capability.

I skimmed a description of Pony's capabilities, and yeah, unique is similar to Pony's iso. Though the semantics are a little different.

You might make clear how this fits in with D's mutable/const/immutable/inout.

They're orthogonal. You can have a const unique or an immutable unique, although the use cases are limited (like const rvalue references in C++). I'm planning to add a future extension for unique to be safely cast to immutable, but it's not essential.

The approach taken by Rust's borrow checker and Walter's new @LiVe proposal is that there is only one mutable reference or there are many const/immutable references. You might address how this is different and a better approach than the others.

unique as described in this proposal is mostly the same as mut in Rust: there can only be one unique/mut reference "in use" at a time, and as long as a unique/mut reference is "in use", you can't use any other reference to the same object.

The advantage of having a unique keyword instead of having every mutable reference be unique, is that it's permissive while still being memory-safe. Rust is infamous for having infuriating situations where you want to eg swap the values inside two unions; and the borrow checker won't allow it, even though your code is provably memory-safe. Having a unique keyword restricts these situations to when you're actually allocating/deallocating memory.

@atilaneves

The reason why I think we don't need it is that D has pure factory functions.

Factory functions don't help if you want to eg resize an array.

@jmh530
Copy link

jmh530 commented Jul 25, 2019

@PoignardAzur For this DIP to be successful, someone like Walter should be able to read this DIP and the future one covering @LiVe and be convinced that this is the better approach. Addressing that proposal directly in the text of your DIP may help with that.

@atilaneves
Copy link

@PoignardAzur

Factory functions don't help if you want to eg resize an array.

No, but Walter has a DIP for that.

@PoignardAzur
Copy link
Author

I'm writing his DIP specifically because I think his DIP is inadequate.

I have two reasons to believe that:

  • The arguments above, that Rust's memory model is more restrictive than it needs to be.

  • Adopting that memory model is a radical change, and essentially breaks compatibility with a large part of existing code. Although Walter has mentioned a progressive adoption plan (using a @live function attribute), I don't think that plan will work, and I think it will create a backwards compatibility rift. This is less of a problem with my proposal.

But yes, I need to explain that in more detail, with better data to support my arguments.

@PoignardAzur
Copy link
Author

Description updated. Hopefully it's clearer now.

@ntrel
Copy link

ntrel commented Aug 30, 2019

Interesting proposal, better than @live as it's less restrictive as you say.

Does this require flow analysis to check a pointer is not used after passing to a unique parameter?

byte* ptr = s.get();

BTW The first example has get returning a ref not a pointer.

I think it would be good to refine this proposal to a minimum that allows smart pointers, without worrying about calling code expressivity. That can be added later as enhancements. Then it would be easier to persuade Walter, especially if unique was not a full type qualifier.

To make this simpler, let's forget about returning a pointer and just support returning refs. That might not need flow analysis.

void bad(ref S s, ref int i)
{
  s.reset; //error: s is not unique
  i++;
}
bad(s, s.get);

@PoignardAzur
Copy link
Author

To make this simpler, let's forget about returning a pointer and just support returning refs. That might not need flow analysis.

That's not enough, if only because a function returning a ref can pass its result to a function taking a return ref which can return a pointer.

There is simply no way to uphold the central guarantee (every unique reference dominates all the nodes it can reach for the duration of its lifetime) without flow analysis.

@ntrel
Copy link

ntrel commented Aug 30, 2019

a function returning a ref can pass its result to a function taking a return ref which can return a pointer

I've filed an issue about this because it can violate memory safety:
https://issues.dlang.org/show_bug.cgi?id=20183

@skoppe
Copy link

skoppe commented Sep 19, 2019

I don't understand this example:

void read(T copy);
void consume(unique scope T*);

// RULE 1
{
  T t = ...;

  consume(&t);                // ERROR: t later used in read(t)
  read(t);
}

Since consume is scope, it isn't allowed to leak t. Therefor, shouldn't the call to read be allowed? In other words, unique + scope doesn't make much sense to me.

@PoignardAzur
Copy link
Author

They're orthogonal.

But you're right, that example is a mistake from me.

Given Rule 1 as stated, the unique pointer created by consumed is expired by the time read is called, so there's not possible conflict. I need to rewrite this.

unique + scope doesn't make much sense to me.

Well, imagine that consume is a function that does:

void consume(unique scope Vector* v)
{
    v.reset();
}

In that case, v is "forgotten" at the end of the function. But it still needs to be passed a unique pointer, because v.reset will create problems if references to data internal to v are still alive.

(however, in example 1, calling consume(t) then read(t) would be safe, as you point out)

@PoignardAzur
Copy link
Author

Also, before I forget, I need some way to figure out the double indirection problem, eg:

Vector vec = ...;
int* vec0 = vec[0];

unique Vec* ptr;
Vec** ptrptr = &ptr;
*ptrptr = &vec;

ptr.reset();
*vec0 = 42;      // Memory corruption

@skoppe
Copy link

skoppe commented Sep 19, 2019

unique + scope doesn't make much sense to me.

Well, imagine that consume is a function that does:

void consume(unique scope Vector* v)
{
v.reset();
}
In that case, v is "forgotten" at the end of the function. But it still needs to be passed a unique pointer, because v.reset will create problems if references to data internal to v are still alive.

Oh yes, of course. You need unique because you want to freely mutate v's internals without having other references to it, and you need scope to signal to the compiler you are not going to escape v.

👍

@PoignardAzur
Copy link
Author

Precisely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment