
@tgehr

tgehr/inout.md Secret

Last active February 12, 2018 01:59

Type Qualifiers in D

The D programming language has a number of built-in type constructors. A type constructor is a function that takes a type argument and returns another type.

For example, if we have a type T, we can create the new type T[] of dynamic arrays whose elements are of type T.

In this post, we are going to explore the type constructors immutable, shared, const and inout. These type constructors are special and are also called type qualifiers, as each of them creates a type that is identical to its argument, except that it allows a different set of operations.

I will make statements about type safety that are guaranteed by the language in the @safe subset of D.

immutable, const, shared

For a type T,

  • immutable(T) represents a type that is like T, except data of this type will never be modified.
  • const(T) represents a type that is like T, except lvalues of this type cannot be modified.
  • shared(T) represents a type that is like T, but can be shared among threads and does not support built-in non-atomic read-modify-write operations.

We say a type T is mutable if there does not exist a type S such that T == immutable(S) or T == const(S). For example int is mutable, but immutable(int) and const(int) are not mutable.

The difference between immutable and const is that immutable guarantees that no reference to the data will ever be used to modify it. In contrast, for an lvalue of type const(T), there could be other references to the data that allow it to be modified.

The motivation for const(T) is that it can be used to give away a reference to either mutable or immutable data to a subprogram while being certain that this subprogram will not modify the data.

We say a type T is unshared if there does not exist a type S such that T == shared(S). For example int is unshared, but shared(int) is not unshared. The main motivation for shared is that unshared data (the default!) will be accessible only from the current thread.
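A small sketch of the const/immutable distinction in @safe D. The function name sum is only an illustration: it accepts both mutable and immutable arrays through a const parameter, and a write through a mutable alias remains visible through a const view of the same data.

```d
// Reads the elements without modifying them; accepts mutable and immutable data.
int sum(const(int)[] a) @safe
{
    int s = 0;
    foreach (x; a)
        s += x;
    return s;
}

void main() @safe
{
    int[] m = [1, 2, 3];
    immutable(int)[] i = [4, 5, 6];

    const(int)[] view = m;  // const view of mutable data
    m[0] = 10;              // modified through the mutable reference...
    assert(view[0] == 10);  // ...and the change is visible through the const view

    // Both mutable and immutable arrays can be passed as const(int)[].
    assert(sum(m) == 15);
    assert(sum(i) == 15);

    // Neither the const view nor the immutable data can be written to.
    static assert(!__traits(compiles, view[0] = 1));
    static assert(!__traits(compiles, i[0] = 1));
}
```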

Typing Rules

Readers who are already familiar with const, immutable and shared and mainly interested in inout may want to skip this section.

There are a few simple additional typing rules from which we can deduce the interplay of immutable, const and shared.

  1. Type qualifiers commute and are idempotent:

    • For all type qualifiers a and b, we have: a(b(T)) == b(a(T)). For example, const(shared(T)) is the same type as shared(const(T)).

    • For all type qualifiers a we have: a(a(T)) == a(T). For example, const(const(int)) is the same type as const(int).

  2. Indirection-free values of differently qualified types are interconvertible:

    void main(){
        int x;
        immutable(int) y = x; // ok
        shared int z = y; // ok
        x = z; // ok
    }
  3. Pointers and dynamic arrays propagate qualifiers:

    For all type qualifiers a, the following pairs of types are interconvertible:

    • a(T*) and a(T)*
    • a(T[]) and a(T)[]

    In particular, for a(T*) x; we have typeof(*x)==a(T) and for a(T[]) x; we have typeof(x[i])==a(T).

    This means that we cannot declare a qualified reference to unqualified data. This is particularly important for shared.

  4. Both mutable and immutable references can be converted to const references (below, A : B means that A implicitly converts to B):

    • T* : const(T)*, T[] : const(T)[] and C : const(C) for a class C.
    • immutable(T)* : const(T)*, immutable(T)[] : const(T)[] and immutable(C) : const(C) for a class C.
  5. immutable data is implicitly const and shared:

    • immutable(T) == const(immutable(T))
    • immutable(T) == shared(immutable(T))

    (In particular, this means that immutable data is not unshared.)

  6. If S is a struct (or a static array) and a is a composition of type qualifiers, then S converts to a(S) if and only if the same is true for each of the types of its fields.

  7. Type qualifiers on member functions are applied to the implicit this reference:

    struct S{
        void foo(){ static assert(is(typeof(this)==S)); }
        void foo()const{ static assert(is(typeof(this)==const(S))); }
        void foo()immutable{ static assert(is(typeof(this)==immutable(S))); }
    }
  8. Type qualifiers are propagated on field access:

    struct S{
        int x;
    }
    void main(){
        immutable(S) s;
        static assert(is(typeof(s.x)==immutable(int)));
    }
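Several of these rules can be checked directly with static assert. The sketch below is a compile-time spot check, not an exhaustive test; the structs S and P are hypothetical helpers (P contains a mutable indirection, S does not).

```d
// Hypothetical helper types for the checks below.
struct S { int x; }   // no indirections
struct P { int* p; }  // contains a mutable indirection

// Rule 1: qualifiers commute and are idempotent.
static assert(is(const(shared(int)) == shared(const(int))));
static assert(is(const(const(int)) == const(int)));

// Rule 4: mutable and immutable references convert to const ones (not back).
static assert(is(int* : const(int)*));
static assert(is(immutable(int)* : const(int)*));
static assert(!is(const(int)* : int*));

// Rule 5: immutable already implies const and shared.
static assert(is(const(immutable(int)) == immutable(int)));
static assert(is(shared(immutable(int)) == immutable(int)));

// Rule 6: a struct converts to immutable only if all its fields do.
static assert(is(S : immutable(S)));
static assert(!is(P : immutable(P)));

void main()
{
    // Rule 2: indirection-free values convert freely between qualifiers.
    int m = 1;
    immutable(int) im = m;
    const(int) c = im;
    m = c;
    assert(m == 1);

    // Rule 3: qualifiers propagate through pointers and arrays.
    const(int*) cp = null;
    static assert(is(typeof(*cp) == const(int)));
    const(int[]) ca = [1, 2];
    static assert(is(typeof(ca[0]) == const(int)));

    // Rule 8: qualifiers propagate on field access.
    immutable S s;
    static assert(is(typeof(s.x) == immutable(int)));
}
```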

inout

Consider the following struct:

struct S(T){
    private T[] _payload;
    this(T[] payload){
        this._payload = payload;
    }
    @property T[] payload(){
        return _payload;
    }
    // ...
}

The struct has a field whose value can be accessed through a property accessor. (This is a contrived example, but this pattern is not useless: it prevents _payload's address from being taken and allows accesses to do some additional computation.)

import std.stdio : writeln;

void main(){
    int[] p = [1,2,3];
    auto s = S!int(p);
    writeln(s.payload); // [1, 2, 3]
}

However, there is a problem: S cannot be immutable:

void main(){
    immutable(int)[] p = [1,2,3];
    auto s = immutable(S!int)(p); // error
    writeln(s.payload); // [1, 2, 3]
}
Error: mutable method tt.S!int.S.this is not callable using a immutable object

The reason is simple: within the constructor, the type of this is S!int, but we need it to be immutable(S!int). Furthermore, the payload argument needs to be immutable. For simplicity, we fix the element type to int from now on and add another constructor overload:

struct S{
    private int[] _payload;
    this(int[] payload){
        this._payload = payload;
    }
    this(immutable(int)[] payload)immutable{
        static assert(is(typeof(this._payload)==immutable(int[])));
        this._payload = payload; // note: an initialization, not an assignment
    }
    @property int[] payload(){
        return _payload;
    }
    // ...
}

void main(){
    immutable(int)[] p = [1,2,3];
    auto s = immutable(S)(p);
    writeln(s.payload); // error
}

This is still not sufficient, however:

Error: mutable method tt.S.S.payload is not callable using a immutable object

It is easy to see what is going on: the payload property expects a this reference of type S, but we provide an immutable(S). As before, we can solve the problem by adding another overload:

struct S{
    private int[] _payload;
    this(int[] payload){
        this._payload = payload;
    }
    this(immutable(int)[] payload)immutable{
        this._payload = payload;
    }
    @property int[] payload(){
        return _payload;
    }
    @property immutable(int)[] payload()immutable{
        return _payload;
    }
    // ...
}

void main(){
    immutable(int)[] p = [1,2,3];
    auto s = immutable(S)(p);
    writeln(s.payload); // [1, 2, 3]
}

However, we will soon discover that this is still not sufficient. We might want to have a const(S) which can wrap any value of type const(int)[]. We end up with this:

struct S{
    private int[] _payload;
    this(int[] payload){
        this._payload = payload;
    }
    this(immutable(int)[] payload)immutable{
        this._payload = payload;
    }
    this(const(int)[] payload)const{
        this._payload = payload;
    }
    @property int[] payload(){
        return _payload;
    }
    @property immutable(int)[] payload()immutable{
        return _payload;
    }
    @property const(int)[] payload()const{
        return _payload;
    }
    // ...
}

Clearly, this approach does not scale; after all, this is only a toy example. Note that one way to remove the boilerplate would be to use code generation (for example, using static foreach and string mixins), but this is not really desirable when all three generated implementations will be identical. We are going to solve the problem in a different way. Enter inout:

struct S{
    private int[] _payload;
    this(inout(int)[] payload)inout{
        this._payload = payload;
    }
    @property inout(int)[] payload()inout{
        return _payload;
    }
    // ...
}

This code is equivalent to the previous code: a function whose signature involves inout-qualified types is treated (essentially) as if it had three signatures simultaneously, one for each of the type qualifiers const and immutable, as well as an unqualified version. The respective qualifiers replace inout within the original function signature. At the same time, such a function has only a single implementation at run time.
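The three signatures can be observed at compile time. The following sketch completes the inout version of S into a standalone program (the variable names are illustrative) and asserts the type of payload for a mutable, a const and an immutable S, all served by one constructor and one accessor.

```d
struct S
{
    private int[] _payload;

    this(inout(int)[] payload) inout
    {
        this._payload = payload;
    }

    @property inout(int)[] payload() inout
    {
        return _payload;
    }
}

void main()
{
    int[] m = [1, 2, 3];
    const(int)[] c = m;
    immutable(int)[] i = [4, 5, 6];

    // One constructor serves all three qualifiers.
    auto ms = S(m);
    auto cs = const(S)(c);
    auto ims = immutable(S)(i);

    // inout in the accessor's signature is replaced by the qualifier of `this`.
    static assert(is(typeof(ms.payload) == int[]));
    static assert(is(typeof(cs.payload) == const(int)[]));
    static assert(is(typeof(ims.payload) == immutable(int)[]));
    assert(ims.payload == [4, 5, 6]);
}
```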

Within the function body, inout is a "wildcard qualifier": we know that for any particular call, inout will be one of the qualifiers const, immutable or unqualified, but in the function body, we don't know which one it is, as the same code is shared for all calls. Therefore, within the function body, inout has many of the same properties as const.

struct S{
    int* x;
    inout(int)* foo()inout{
        // *x = 3; // error
        return x;
    }
}

Limitations of inout

inout is limited in its applicability.

  1. inout cannot be part of the type of a field.

struct S{
    inout(int)[] x; // error
}

In particular, this means that inout-qualified types cannot be used with templates that want to store parameters into a data structure temporarily, for example inside a std.typecons.Tuple. This is a severe limitation, as it prevents abstracting over inout data.

  2. There can be only one wildcard qualifier in scope at any given point (namely, inout).

This means that there is also no way to collapse the following code into a single overload using inout:

void assign(ref int* a,int* b,ref int* c,int* d){
    a=b;
    c=d;
}
void assign(ref int* a,int* b,ref const(int)* c,const(int)* d){
    a=b;
    c=d;
}
void assign(ref int* a,int* b,ref immutable(int)* c,immutable(int)* d){
    a=b;
    c=d;
}
void assign(ref const(int)* a,const(int)* b,ref int* c,int* d){
    a=b;
    c=d;
}
void assign(ref const(int)* a,const(int)* b,ref const(int)* c,const(int)* d){
    a=b;
    c=d;
}
void assign(ref const(int)* a,const(int)* b,ref immutable(int)* c,immutable(int)* d){
    a=b;
    c=d;
}
void assign(ref immutable(int)* a,immutable(int)* b,ref int* c,int* d){
    a=b;
    c=d;
}
void assign(ref immutable(int)* a,immutable(int)* b,ref const(int)* c,const(int)* d){
    a=b;
    c=d;
}
void assign(ref immutable(int)* a,immutable(int)* b,ref immutable(int)* c,immutable(int)* d){
    a=b;
    c=d;
}

We can use inout for one pair of parameters, so we can get either:

void assign(ref inout(int)* a,inout(int)* b,ref int* c,int* d){
    a=b;
    c=d;
}
void assign(ref inout(int)* a,inout(int)* b,ref const(int)* c,const(int)* d){
    a=b;
    c=d;
}
void assign(ref inout(int)* a,inout(int)* b,ref immutable(int)* c,immutable(int)* d){
    a=b;
    c=d;
}

or

void assign(ref int* a,int* b,ref inout(int)* c,inout(int)* d){
    a=b;
    c=d;
}
void assign(ref const(int)* a,const(int)* b,ref inout(int)* c,inout(int)* d){
    a=b;
    c=d;
}
void assign(ref immutable(int)* a,immutable(int)* b,ref inout(int)* c,inout(int)* d){
    a=b;
    c=d;
}

Producing a single overload is not possible, however:

void assign(ref inout(int)* a,inout(int)* b,ref inout(int)* c,inout(int)* d){
    a=b;
    c=d;
}

This now requires all four parameters to share the same mutability, while we only want to require that the first two parameters have the same mutability and the second two parameters have the same mutability.
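For contrast, an ordinary template can express independent qualifiers for the two pairs, because A and B are deduced separately. This is a sketch, not a replacement for inout: it is code generation again, producing one instantiation per combination that is actually used.

```d
// Hypothetical template version of assign: A and B are deduced separately,
// so each pair of parameters may carry its own qualifier.
void assign(A, B)(ref A* a, A* b, ref B* c, B* d)
{
    a = b;
    c = d;
}

void main()
{
    int x = 1;
    const(int) y = 2;
    immutable(int) z = 3;

    int* a;
    immutable(int)* c;
    assign(a, &x, c, &z); // instantiates assign!(int, immutable(int))
    assert(a is &x && c is &z);

    const(int)* k;
    assign(k, &y, c, &z); // instantiates assign!(const(int), immutable(int))
    assert(k is &y && *c == 3);
}
```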

Furthermore, nested functions are forced to share inout with the enclosing function.

int x;
inout(int)* foo(inout(int)* p){
    auto bar(inout(int)* q){
        // for the purpose of illustration, let's assume creating a tuple magically works,
        // as otherwise this particular instance of the limitation is harder to appreciate
        return tuple(p,q);
    }
    // note: in the following we actually ignore the argument that is passed to `bar`,
    // the return value is actually the `p` it received from outside.
    // return bar(&x)[0]; // error
    return bar(p)[0]; // ok.
}

The error is:

Error: modify inout to mutable is not allowed inside inout function

The reason for this is that the nested function bar has access to both its own inout-qualified parameters as well as inout-qualified variables in the outer context, therefore the qualifier has to mean the same thing for both. Nested function signatures are not expanded; they only have their original inout-qualified signature.
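The same restriction can be observed without the hypothetical tuple. In this sketch, the nested function bar accepts foo's own argument but rejects a plain int*; the comment quotes the error shown above, and the global x mirrors the one in the example.

```d
int x; // a mutable global

inout(int)* foo(inout(int)* p)
{
    // bar does not get its own inout: it shares foo's.
    inout(int)* bar(inout(int)* q) { return q; }

    // Rejected with "modify inout to mutable is not allowed inside inout
    // function", although a standalone inout function would accept &x.
    static assert(!__traits(compiles, bar(&x)));

    return bar(p); // fine: p has the same inout type as bar's parameter
}

void main()
{
    int m = 1;
    immutable(int) i = 2;
    static assert(is(typeof(foo(&m)) == int*));
    static assert(is(typeof(foo(&i)) == immutable(int)*));
    assert(foo(&m) is &m);
    assert(*foo(&i) == 2);
}
```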

Worse, nested functions even conflate their inout with inout within enclosing contexts that are not themselves inout, leading to a loss of type safety:

@safe:
int a;
immutable(int) b=2;

inout(int)* delegate(inout(int)*) dg;
inout(int)* prepare(inout(int)* x){
    dg = y=>x;
    return x;
}
void main(){
    prepare(&b);
    int* y=dg(&a);
    assert(&b is y); // passes. ouch.
    *y=3;
    assert(b is *&b); // fails!
}

This means: D allows immutable data to be mutated in @safe code!

  3. Delegate arguments and return values have their own inout.

This means that inout often cannot be used in code containing higher-order functions.

For example, we can write:

int delegate(int delegate(int*)) foo(int* x){
    return f => f(x);
}
int delegate(int delegate(const(int)*)) foo(const(int)* x){
    return f => f(x);
}
int delegate(int delegate(immutable(int)*)) foo(immutable(int)* x){
    return f => f(x);
}

But there is no way to make this code any shorter using inout.
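Templates can shorten the three overloads, again at the cost of one instantiation per qualifier. A sketch, assuming the pointer type P is deduced from the argument; within each instantiation the delegate parameter is tied to the qualifier of x. The helper names bar and baz are illustrative.

```d
// Hypothetical template version: P is deduced from the argument, and within
// each instantiation the delegate parameter uses the same pointer type.
auto foo(P)(P x)
{
    return (int delegate(P) f) => f(x);
}

void main()
{
    int x = 42;
    immutable(int) y = 7;

    int bar(int* p) { return *p; }
    int baz(immutable(int)* p) { return *p; }

    assert(foo(&x)(&bar) == 42);
    assert(foo(&y)(&baz) == 7);
}
```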

The following code does not do the job:

int delegate(int delegate(inout(int)*)) foo(inout(int)* x){
    return f => f(x);
}
void main(){
    int x;
    int bar(int* x){
        return *x;
    }
    foo(&x)(&bar); // error
}

The error message is:

Error: delegate foo(& x) (int delegate(inout(int)*)) is not callable using argument types (int delegate(int* x) pure nothrow @nogc @safe)

The inout within the delegate in the return type of foo is not related to inout in foo's argument list.

Note how the meaning of inout in the return type is inconsistent with the meaning of inout within the function body. inout in the signature of local functions within the function body is the same as the inout in the enclosing signature, but this is not true for the return type.

This inconsistency would ordinarily lead directly to a loss of type safety, but DMD patches over it using one of the strangest typing rules in D:

inout(int)* delegate(inout(int)*) foo(inout(int)* x){
    inout(int)* bar(inout(int)* y){ return x; }
    return &bar; // note: ok, even though the interpretation of the two `inout`s is different
}

void main(){
    int x=2;
    pragma(msg, typeof(&foo));
    pragma(msg, typeof(foo(&x)));
}

This prints:

inout(int)* delegate(inout(int)*) function(inout(int)* x)
const(int)* delegate(const(int)*)

That is, the compiler notices that type safety could be lost, and crudely prevents this by replacing inout with const.

Note that this means that it is impossible to write a generic identity function in D which preserves the type of its argument:

T id(T)(T arg){
    return arg;
}
// clearly, the return type should be the same as the argument type:
enum preservesType(T) = is(typeof(id(T.init))==T);

static assert(preservesType!(inout(inout(int)* delegate(inout(int)*)))); // fails!

A closely related case, however, is not caught:

@safe:
int a;
immutable(int) b=2;

inout(int)* delegate(inout(int)*)@safe delegate()@safe foo(inout(int)* y){
    inout(int)* bar(inout(int)* p){
        return y;
    }
    return ()=>&bar;
}
void main(){
    int* y=foo(&b)()(&a);
    *y=3;
    assert(&b is y); // passes. ouch.
    assert(b is *&b); // fails!
}

And again: D allows immutable data to be mutated in @safe code!

inout and Type Theory

If we use concepts from higher-order type theory that are not available in D, all of these limitations immediately disappear, and the resulting system is obviously type safe. Therefore, one way to design a correct inout is to mentally translate ordinary D code into an extended D language that can make use of those concepts. This translation then answers the question of which pieces of ordinary D code should compile. (Code compiles iff its translation compiles.) In this way, we can observe inout in its natural habitat and fully understand its behaviour (and misbehaviour), as well as why the limited vocabulary of D is not sufficient to describe all the necessary intricacies.

The extended D language would have the following constructs:

  1. First-class types, i.e. there should be a type Type whose values are the types:

    Type t = int;
  2. Make the type qualifiers first-class:

    static assert(is(typeof(const)==MutabilityQualifier));
    static assert(is(((Type x)=>const(x))(int)==const(int)));
  3. Allow generic parameters with homogeneous translation (i.e. parametric polymorphism). Those parameters are like template parameters, but only one version of the function is actually compiled.

The identity function

inout(int)* id(inout(int)* p){
    return p;
}

would be represented as:

mq(int)* id[MutabilityQualifier mq](mq(int)* p){
    return p;
}

Here, the id[MutabilityQualifier mq]-notation is similar to a template argument list, except that the template body is type checked and emitted into the object file only once. At the same time, the signature can be instantiated with different mutability qualifiers mq.

Within the function body, we don't know what mq is, but we know its type is MutabilityQualifier. This means we can check that mq can be applied to a type to yield a qualified version of that type, for example mq(int).

Note the similarity to the previous explanation of inout. The main difference is that here, we pass mutability qualifiers around explicitly, instead of implicitly using the same expression with multiple types. (This is strictly more expressive.)

For

inout(int)* id(inout(int)* p){
    return p;
}

We implicitly get multiple signatures:

int* id(int* p);
immutable(int)* id(immutable(int)* p);
const(int)* id(const(int)* p);

Whereas with the supposed extended D language, we can specify explicitly which signature we want:

static assert(is(typeof(&id[mutable])==int* function(int*)));
static assert(is(typeof(&id[immutable])==immutable(int)* function(immutable(int)*)));
static assert(is(typeof(&id[const])==const(int)* function(const(int)*)));
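Today's D can only approximate this notation with templates, which use heterogeneous translation (one compiled copy per qualifier) rather than the single body described here. With that caveat, the sketch below passes the qualifier explicitly as an alias template parameter; Mutable, Const and Immutable are hypothetical helper templates standing in for first-class qualifiers.

```d
import std.traits : Parameters, ReturnType;

// Hypothetical helper templates standing in for first-class qualifiers.
alias Mutable(T) = T;
alias Const(T) = const(T);
alias Immutable(T) = immutable(T);

// Q plays the role of mq; every distinct Q yields a separate instantiation.
Q!(int)* id(alias Q)(Q!(int)* p)
{
    return p;
}

// Counterparts of &id[mutable], &id[immutable] and &id[const]:
static assert(is(ReturnType!(id!Mutable) == int*));
static assert(is(Parameters!(id!Mutable)[0] == int*));
static assert(is(ReturnType!(id!Immutable) == immutable(int)*));
static assert(is(Parameters!(id!Immutable)[0] == immutable(int)*));
static assert(is(ReturnType!(id!Const) == const(int)*));
static assert(is(Parameters!(id!Const)[0] == const(int)*));

void main()
{
    int x = 1;
    immutable(int) y = 2;
    assert(id!Mutable(&x) is &x);
    assert(*id!Immutable(&y) == 2);
    assert(*id!Const(&y) == 2); // immutable(int)* converts to const(int)*
}
```

Note that the qualifier cannot be deduced from the argument here, so it has to be spelled out, just as in the id[...] notation.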

Let's now discuss how to get around the limitations.

  1. inout cannot be part of the type of a field.

The following is a simplified implementation of std.typecons.Tuple.

struct Tuple(T...){
    T expand;
    alias expand this; // as in std.typecons.Tuple; makes t[0] refer to t.expand[0]
}
auto tuple(T...)(T vals){
    return Tuple!T(vals);
}

Now, let's try to store values with parameterized mutability into the tuple:

Tuple!(mq(int)*) makePointerTuple[MutabilityQualifier mq](mq(int)* p){
    return tuple(p);
}

This just works. But how?

Getting the details of the interplay between template instantiation and parametric polymorphism completely right is not entirely straightforward, but what would happen here is that tuple and Tuple would be instantiated with a generic parameter. This means that the instantiated versions of Tuple and tuple are:

struct TupleMqIntPtr[MutabilityQualifier mq]{
    AliasSeq!(mq(int)*) expand;
}
auto tupleMqIntPtr[MutabilityQualifier mq](mq(int)* p){
    return TupleMqIntPtr[mq](p);
}

The expanded version of makePointerTuple then reads:

TupleMqIntPtr[mq]
makePointerTuple[MutabilityQualifier mq](mq(int)* p){
    return tupleMqIntPtr[mq](p);
}

Note that now, we can apply different mutability qualifiers. The functions

&makePointerTuple
(&makePointerTuple)[mutable]
(&makePointerTuple)[immutable]
(&makePointerTuple)[const]

have types

TupleMqIntPtr[mq] function[MutabilityQualifier mq](mq(int)*)
TupleMqIntPtr[mutable] function(int*)
TupleMqIntPtr[immutable] function(immutable(int)*)
TupleMqIntPtr[const] function(const(int)*)

This way, it is now actually possible to store values with unknown mutability within non-trivial data structures. This also illustrates that parametric polymorphism is compatible with virtual functions, unlike templates.

The function now behaves the way we want it to:

void main(){
    int x=1;
    immutable(int) y=2;
    const(int) z = 3;
    int* p = &x;
    immutable(int)* q = &y;
    const(int)* r = &z;
    static assert(is(typeof(makePointerTuple(p)[0])==int*));
    static assert(is(typeof(makePointerTuple(q)[0])==immutable(int)*));
    static assert(is(typeof(makePointerTuple(r)[0])==const(int)*));
}
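For comparison, a template version of makePointerTuple in current D passes the same assertions. The template parameter P is an assumption of this sketch, not part of the proposal, and the price is that Tuple and tuple are instantiated once per pointer type instead of once for all qualifiers.

```d
struct Tuple(T...)
{
    T expand;
    alias expand this; // lets t[0] mean t.expand[0], as in std.typecons.Tuple
}

auto tuple(T...)(T vals)
{
    return Tuple!T(vals);
}

// Template stand-in for makePointerTuple: P is deduced from the argument,
// so each pointer type gets its own instantiation of Tuple and tuple.
auto makePointerTuple(P)(P p)
{
    return tuple(p);
}

void main()
{
    int x = 1;
    immutable(int) y = 2;
    const(int) z = 3;
    int* p = &x;
    immutable(int)* q = &y;
    const(int)* r = &z;
    static assert(is(typeof(makePointerTuple(p)[0]) == int*));
    static assert(is(typeof(makePointerTuple(q)[0]) == immutable(int)*));
    static assert(is(typeof(makePointerTuple(r)[0]) == const(int)*));
    assert(makePointerTuple(p)[0] is p);
    assert(*makePointerTuple(q)[0] == 2);
}
```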

A similar technique can enable Voldemort structs whose definition depends on the generic parameter within parametrically polymorphic functions.

  2. There can be only one wildcard qualifier in scope at any given point (namely, inout).

In the extended D language, we can specify precisely the function signature that we want:

void assign[MutabilityQualifier mq1,MutabilityQualifier mq2]
           (ref mq1(int)* a,mq1(int)* b,ref mq2(int)* c,mq2(int)* d){
    a=b;
    c=d;
}

The example that demonstrated this inout limitation in relation to nested functions reads (note that now creating the tuple works without magic):

int x;
mq1(int)* foo[MutabilityQualifier mq1](mq1(int)* p){
    auto bar[MutabilityQualifier mq2](mq2(int)* q){
        return tuple(p,q); // ok
    }
    return bar(&x)[0]; // ok!
}

Now recall the type system unsoundness discussed above. In the extended D language, the example reads:

@safe:
mq1(int)* delegate[MutabilityQualifier mq1](mq1(int)*) dg;
mq2(int)* prepare[MutabilityQualifier mq2](mq2(int)* x){
    dg = y=>x; // error
    return x;
}

Now, the assignment does not compile. The type checker first figures out that the lambda should have an implicit generic parameter mq1 if it is to convert to the left-hand side, and that the type of the y parameter should be mq1(int)*. Furthermore, mq1(int)* is also the lambda's required return type. However, the type of x is mq2(int)*, which is not compatible with mq1(int)*. Hence the compiler would print the following error message:

Error: cannot implicitly convert expression `x` of type `mq2(int)*` to `mq1(int)*`

Note, however, that prepare is not prevented from assigning to dg, and it can even call it. The following code compiles:

@safe:
mq1(int)* delegate[MutabilityQualifier mq1](mq1(int)*) dg;
mq2(int)* prepare[MutabilityQualifier mq2](mq2(int)* x){
    dg = y=>y;
    return dg(x);
}

This behaviour is quite hard to get right with only the inout syntax.

  3. Delegate arguments and return values have their own inout.

This is easily fixed, because the new syntax allows both cases to be specified: The case when the delegate has the same mutability qualifier, as well as the case where it has its own generic parameter:

int delegate(int delegate(mq(int)*)) foo[MutabilityQualifier mq](mq(int)* x){
    return f => f(x);
}
void main(){
    int x;
    int bar(int* x){
        return *x;
    }
    foo(&x)(&bar); // ok
}

// the signature using `inout` is equivalent to the following wrong
// signature, that's why `inout` does not work here:
int delegate(int delegate[MutabilityQualifier mq2](mq2(int)*)) foo[MutabilityQualifier mq1](mq1(int)* x);

The case that DMD fixes with a strange typing rule correctly yields an error:

mq2(int)* delegate[MutabilityQualifier mq2](mq2(int)*) foo[MutabilityQualifier mq1](mq1(int)* x){
    mq2(int)* bar[MutabilityQualifier mq2](mq2(int)* y){ return x; } // error
    return &bar;
}

If bar instead returns y, the code compiles and the return value of foo can be used polymorphically. (I.e., it does not degrade to const.)

The second example of inout type unsafety can be written as:

@safe:
int a;
immutable(int) b=2;

mq2(int)* delegate[MutabilityQualifier mq2](mq2(int)*)@safe delegate()@safe foo[MutabilityQualifier mq1](mq1(int)* y){
    mq2(int)* bar[MutabilityQualifier mq2](mq2(int)* p){
        return y; // error: cannot convert `y` of type `mq1(int)*` to `mq2(int)*`
    }
    return ()=>&bar;
}

Note how this is also naturally caught.
