@pbackus
Last active September 4, 2020 04:02
Proposed Revisions for DIP 1035

@system Variables

Field Value
DIP: 1035
Review Count: 1
Author: Dennis Korpel dkorpel@gmail.com
Implementation:
Status: Post-Community 1

Abstract

The memory-safety of a program depends on the ability of the programmer and the language implementation to maintain the run-time invariants of the program's data.

For built-in types, like arrays and pointers, the D compiler is aware of their run-time invariants, and can use compile-time checks to ensure they are maintained. For user-defined types, however, these checks are not always sufficient. In order to reliably maintain invariants beyond those that the compiler has hard-coded knowledge of, D programmers must resort to manual verification of @safe code and defensive run-time checks.

This DIP proposes a new language feature, @system variables, to address this lack of expressiveness in D's memory-safety system. In @safe code, @system variables cannot be directly written to, and cannot have their values altered in uncontrolled ways via casting, overlapping, void-initialization, etc. As such, they can be relied upon to store data subject to arbitrary run-time invariants.

Background

D's memory safety system distinguishes between safe values, which can be used freely in @safe code without causing undefined behavior, and unsafe values, which cannot. A type that has only safe values is a safe type; one that has both safe and unsafe values is an unsafe type. (For more detailed definitions of these and other related terms, refer to the Function Safety section of the D language spec.)

The D compiler has built-in knowledge of which types are safe and which are not. In broad terms, pointers, arrays, and other reference types are unsafe; integers, characters, and floating-point numbers are safe; and the safety of aggregate types is determined by the safety of their members.

A run-time invariant (or just "invariant") of a type is a rule that distinguishes between that type's safe and unsafe values. The values that satisfy the invariant are safe; those that do not are unsafe. It follows that any type with a run-time invariant is unsafe, and that a safe type has no run-time invariants.

To ensure that their invariants are not violated, the use of unsafe types is restricted in @safe code. Specifically:

  • They cannot be void-initialized.
  • They cannot be overlapped in a union.
  • A T[] cannot be cast to a U[] when U is an unsafe type.
  • Certain operators (pointer arithmetic, unsafe casts) are disallowed.
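
As an illustration, the following sketch probes some of these restrictions with __traits(compiles). It only exercises behavior that exists without this DIP; the particular probe expressions are chosen for this example and are not taken from the language spec.

```d
// Safe type: every bit pattern of an int is a valid value,
// so void-initialization is accepted in @safe code.
static assert( __traits(compiles, () @safe { int i = void; }));

// Unsafe type: an arbitrary bit pattern is not a valid pointer.
static assert(!__traits(compiles, () @safe { int* p = void; }));

// Pointer arithmetic could move a valid pointer out of bounds.
static assert(!__traits(compiles, () @safe { int* p; p++; }));

// Reinterpreting integers as pointers would forge unsafe values.
static assert(!__traits(compiles, () @safe { size_t[] a; auto b = cast(int*[]) a; }));

// The same operations are accepted in code that is not @safe.
static assert( __traits(compiles, () @system { int* p = void; p++; }));
```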

In the proposed changes, both the list of unsafe types and the list of restrictions will be extended.

Rationale

While the system described above works well for built-in types and their invariants, it does not provide any way for the programmer to indicate that a user-defined type has additional invariants that the compiler may not be aware of. As a result, maintaining such invariants requires extra effort from the programmer. For unsafe types, the programmer may be required to manually verify that those invariants are maintained in @safe code. For safe types, the programmer may additionally be required to insert defensive run-time checks to ensure that those invariants are maintained.

Example: User-Defined Slice

module intslice;

struct IntSlice
{
    private int* ptr;
    private size_t length;

    @safe
    this(int[] src)
    {
        ptr = &src[0];
        length = src.length;
    }

    @trusted
    ref int opIndex(size_t i)
    {
        assert(i < length);
        return ptr[i];
    }
}

Invariant: The value of length must be equal to the length of the array pointed to by ptr.

First, observe that this code is memory-safe as-written (modulo bugs in the compiler). There are only two functions that directly access ptr and length, and both of them correctly maintain the invariant.

However, in order to prove that this code is memory-safe, it is not sufficient for the programmer to verify the correctness of its @trusted functions. Instead, every function that touches ptr and length, including the @safe constructor, must be manually checked.

If ptr and length were @system variables, then all code that directly accessed them would have to be @trusted, and the programmer would not need to manually verify any @safe code in order to prove that IntSlice's invariant is maintained.
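
A minimal sketch of how IntSlice might look with @system fields is given below. It assumes the rules proposed in this DIP. Compilers that predate the proposal accept the annotations but do not enforce them. Marking the constructor @trusted is an assumption made for this sketch, so that every function touching the fields is one that must be reviewed by hand.

```d
struct IntSlice
{
    // Under the proposal, @safe code may not write to these fields.
    @system private int* ptr;
    @system private size_t length;

    // Establishes the invariant: length matches the array behind ptr.
    @trusted
    this(int[] src)
    {
        ptr = &src[0];
        length = src.length;
    }

    // Relies on the invariant to index through the raw pointer.
    @trusted
    ref int opIndex(size_t i)
    {
        assert(i < length);
        return ptr[i];
    }
}
```

With this layout, the only code that needs manual verification is the pair of @trusted functions.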

The same general pattern occurs with other user-defined types whose invariants involve the relationship between two or more variables, such as tagged unions and reference-counted smart pointers.

Example: Short String

module shortstring;

struct ShortString
{
    private ubyte length;
    private char[15] data;

    @safe
    this(const(char)[] src)
    {
        assert(src.length <= data.length);

        length = cast(ubyte) src.length;
        data[0 .. src.length] = src[];
    }

    @trusted
    const(char)[] opIndex() const
    {
        // should be ok to skip the bounds check here
        return data.ptr[0 .. length];
    }
}

Invariant: length <= 15

Once again, there is a constructor that establishes an invariant, and a member function that relies on the invariant to do its work. Unlike in the previous example, however, this code is not memory-safe as-written, though it may appear to be at first glance.

To understand why, consider the following program, which uses ShortString to cause undefined behavior in @safe code:

@safe
void main()
{
    import shortstring;
    import std.stdio;

    ShortString oops = void;
    writeln(oops[]);
}

void-initializing a ShortString will very likely produce an instance that violates its invariant. Because opIndex relies on that invariant to skip the bounds check, this results in an out-of-bounds memory access rather than a safe, predictable crash.

Why does the compiler allow a ShortString to be void-initialized in @safe code? Because, according to the rules in the language spec, a struct containing only ubyte and char data is a safe type, and therefore must not have any invariants. It follows that @safe code is free to initialize a ShortString to any value, including an unspecified one, without risking memory corruption.

In order to make this code memory-safe, the programmer must include an additional bounds check in opIndex:

@safe
const(char)[] opIndex() const
{
    return data[0 .. length];
}

This solution is unsatisfying: the program must do redundant work at run-time to compensate for the language's lack of expressiveness, or give up on the guarantees of @safe. If ShortString.length could be marked as @system, this dilemma would not exist.
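
For comparison, a sketch of ShortString with a @system length field follows. It assumes the proposed rules, under which void-initializing a ShortString in @safe code would be rejected. The @trusted constructor is an assumption of this sketch rather than a requirement stated above.

```d
struct ShortString
{
    // Proposed: makes ShortString an unsafe type with invariant length <= 15.
    @system private ubyte length;
    private char[15] data;

    @trusted
    this(const(char)[] src)
    {
        assert(src.length <= data.length);

        length = cast(ubyte) src.length;
        data[0 .. src.length] = src[];
    }

    @trusted
    const(char)[] opIndex() const
    {
        // No bounds check: the invariant on length is relied upon here.
        return data.ptr[0 .. length];
    }
}
```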

The same general pattern occurs with other user-defined types that attempt to impose invariants on types the compiler considers "safe", such as enum types used in final switch statements and integer "handles" used as array indices by external libraries.
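
The enum case can be sketched as follows. The names Color, name, and fromRaw are hypothetical and serve only as illustration. The point is that @safe code can already produce an enum value outside the declared members, which a final switch is not written to handle.

```d
enum Color { red, green, blue }

// Relies on the invariant that c is one of the declared members.
@safe string name(Color c)
{
    final switch (c)
    {
        case Color.red:   return "red";
        case Color.green: return "green";
        case Color.blue:  return "blue";
    }
}

// Accepted in @safe code, even though raw may not name a member of Color.
@safe Color fromRaw(int raw)
{
    return cast(Color) raw;
}
```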

Prior work

The need for encapsulation of data / restricted access to data in order to achieve memory safety has been mentioned in several discussions:

Other languages

Many other languages either do not allow systems programming at all (e.g. Java, Python) or do not support language-enforced memory safety (e.g. C/C++).

A notable exception is Rust, where the equivalent of this DIP has been proposed multiple times before: Unsafe fields #381

Some excerpts from the discussion there are:

OTOH, privacy is primarily intended for abstraction (preventing users from depending on incidental details), not for protection (ensuring that invariants always hold). The fact that it can be used for protection is basically an happy accident. To clarify the difference, C strings have no abstraction whatever - they are a raw pointer to memory. However, they do have an invariant - they must point to a valid NUL-terminated string. Every place that constructs such a string must ensure it is valid, and every place that consumes it can rely on it. OTOH, a safe, say, buffered reader needs abstraction but doesn't need protection - it does not hold any critical invariant, but may want to change its internal representation.

source

This doesn't seem very useful to me. Within a module I would expect the authors to know what they're doing, and the unit-tests to save them when they do not. For other users, you could simply introduce getters and setters, and functions/methods can already be marked unsafe.

source

Ultimately the proposal has not been accepted yet. The idea of using private instead of @system variables for D will be discussed in the alternatives section. More information about Rust's stance on unsafe functions can be found here:

Description

Existing rules for @system

Before describing the proposed changes, here is an overview of the existing rules governing which declarations can carry the @system attribute.

@system int w = 2; // compiles, does nothing
@system enum int x = 3; // compiles, does nothing
enum E {
    @system x, // error: @system is not a valid attribute for enum members
    y,
}
@system alias x = E; // compiles, does nothing
@system template T() {} // compiles, does nothing

void func(@system int x) // error: @system attribute for function parameter is not supported
{
    @system int x; // compiles, does nothing
}
template Temp(@system int x) {} // error: basic type expected, not @

In short, anything that can be marked private can also be marked @system. Additionally, local variables can be marked @system (while they cannot be marked private).

Any function attribute can be attached to a variable declaration, but they cannot be retrieved:

@system @nogc pure nothrow int x;
pragma(msg, __traits(getFunctionAttributes, x)); // Error: first argument is not a function
pragma(msg, __traits(getAttributes, x)); // tuple()

Proposed changes

(0) Writing to variables or fields marked @system is not allowed in @safe code

Examples:

@system int x;

struct S {
    @system int y;
}

S s;

void main() @safe {
    x += 10; // error: cannot modify @system variable 'x'
    s.y += 10; // error: cannot modify @system field 'y'

    @system int z;
    z += 1; // error: cannot modify @system variable 'z'
}

// inferred as a @system function
auto foo() {
    x = 0;
}

Further operations disallowed in @safe code on @system variables or fields are:

  • creating a mutable pointer to it using &
  • passing it as an argument to a function parameter marked ref without const
  • returning it by ref without const

When using an alias to a @system variable, that alias has the same restrictions as the symbol it aliases to.

@system int x = 3;
alias xAlias = x;

void increment(ref int x) @safe {
    x++;
}

void checkX(const(int)* x) @safe {
    assert(*x < 10);
}

void main() @safe {
    xAlias += 1; // error, cannot modify `@system` variable `x`
    increment(xAlias); // error, cannot take mutable reference of `@system` variable `x`
    checkX(&x); // fine, because the parameter is const. Otherwise it would be an error.
}

Initialization of a @system variable or field is allowed in @safe code. This includes static initialization, the automatically generated constructor, user-defined constructors, and the .init value of a type.

@system int x;

shared static this() @safe {
    x = 3; // allowed, this is initialization
    x = 3; // second time disallowed, this is assignment to a `@system` variable
}

struct T {
    @system int y;
    @system int z = 3; // allowed
    this(int y, int z) @safe {
        this.y = y; // allowed, this is initialization
        this.y = y; // second time disallowed, this is assignment to a `@system` variable
        this.z = z; // disallowed, this is assignment
    }
}

struct S {
    @system int y = 2;
}

void main() @safe {
    S s0 = {y: 3}; // static initialization
    S s1 = S(3); // automatically generated constructor
    S s2 = S.init; // .init value
    S s3; // same as above
    s3 = s2; // disallowed
}

Note that while it may be desirable to require a @trusted annotation near initialization of @system variables, realizing this is problematic since there is no syntax for @trusted assignment. @trusted as a function annotation has its limitations:

  • it does not work for global or local variables, since a @trusted lambda there would move the declaration to that function's scope.
  • it trusts not only the initialization of the variable on the left-hand side of the =, but also the initialization expression on the right-hand side.
  • it disables the scope / return scope checks of -dip1000.

struct S {
    this(ref scope S s) @system {
        *(cast(int*) 0xDEADBEEF) = 0;
    }
}

struct Wrapper(T) {
    @system T t;
    this(T t) @trusted {
        this.t = t; // Oops! Calls a `@system` copy constructor
    }
}

void main() @safe {
    auto w = Wrapper!S(S.init); // program killed by signal 11

    () @trusted {@system int x = 3;}();
    // x is not in scope anymore
}

@system int x = (() @trusted => 3)(); // this still does not mark the assignment `@trusted`
//() @trusted {@system int x = 3;}(); // does not work

(1) An aggregate with at least one @system field is an unsafe type

It gets the same restrictions as existing unsafe types:

struct Handle {
    @system int handle;
}

void main() @safe {
    Handle h = void; // error
    union U {
        Handle h;
        int i;
    }
    U u;
    u.i = 3; // error

    ubyte[Handle.sizeof] storage;
    auto array = cast(Handle[]) storage[]; // error
}

Without this rule, @safe code could still write to @system fields implicitly, through void-initialization, union overlap, or array casts.

(2) Reading from variables or fields marked @system is not allowed in @safe code if their type is unsafe

While writing to a @system variable is always unsafe, reading from one is only dangerous when it could yield an unsafe value.

struct Handle {
    @system int handle;
}

// struct with @system field is an unsafe type
@safe   Handle safeHandle = Handle(1);
@system Handle systemHandle = Handle(-1);

// pointers are an unsafe type
@safe   immutable int* safePtr   = null;
@system immutable int* systemPtr = cast(int*) 0x8035FDF0;

// integers are a safe type
@safe   int safeInt   = 20;
@system int systemInt = 20;

void main() @safe {
    Handle h0 = safeHandle;        // allowed, @safe variable
    Handle h1 = systemHandle;      // error, reading @system var of unsafe type
    immutable int* p0 = safePtr;   // allowed, @safe variable
    immutable int* p1 = systemPtr; // error, reading @system var of unsafe type
    int i0 = safeInt;              // allowed
    int i1 = systemInt;            // allowed, not an unsafe type
}

(3) Variables and fields without annotation are @safe unless their initial value is not @safe

The rules regarding variables and fields are as follows:

  • An initialization expression x is @system when the function (() => x) is inferred as @system.
  • When marked @system, the variable or field is always @system regardless of its type.
  • When marked @trusted, the initialization expression x is treated as (() @trusted => x).
  • When marked @safe, the initialization expression must be @safe.
  • In the absence of an annotation, the variable or field is @system only if its type is unsafe and the initialization expression is @system.

int* getPtr() @system {return cast(int*) 0x8035FDF0;}
int  getVal() @system {return -1;}

extern int* x0;                   // @safe by default
int* x1 = x0;                     // @safe, (() => x0) is @safe
int* x2 = cast(int*) 0x8035FDF0;  // @system, (() => cast(int*) 0x8035FDF0) is @system
int* x3 = getPtr();               // @system, (() => getPtr()) is @system
int  x4 = getVal();               // @safe, int is not an unsafe type
@system int x5 = 1;               // @system as requested
@trusted int* x6 = getPtr();      // @safe, the getPtr call gets trusted
@safe int* x7 = getPtr();         // error: cannot initialize @safe variable with @system initializer

struct S {
    // same rules for fields:
    int* x9 = x3; // @system
    int  x8 = x5; // @safe
}

An exception to the last rule is made for unsafe types when the compiler knows that the resulting value is safe.

int* getNull() pure @system {return null;}
int* n = getNull(); // despite unsafe type with @system initialization expression, inferred as @safe

Annotations with a scope (@system {}) or colon (@system:) affect variables just like they do functions.

@system {
    int y0; // @system
}

@system:
int y1; // @system

(4) __traits(getFunctionAttributes) may be called on variables and fields

Currently it is possible to give function attributes to declarations that aren't functions. It is not possible however to inspect any of them.

@system @nogc pure nothrow int x;
pragma(msg, __traits(getFunctionAttributes, x)); // error: first argument is not a function
pragma(msg, __traits(getAttributes, x)); // tuple()

Since memory safety-related attributes now have an effect on variables and fields, it becomes useful to inspect them. Therefore the restriction on the getFunctionAttributes trait is lifted.
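
For reference, this is how the trait can already be queried on functions today. The helper hasAttribute is hypothetical and not part of this DIP. Under the proposal, the same kind of query would also be accepted when the first argument is a variable or field.

```d
import std.algorithm.searching : canFind;

void checked() @safe {}
void unchecked() @system {}

// Hypothetical helper: true if attr appears in the attribute list
// that the trait reports for fn.
enum hasAttribute(alias fn, string attr) =
    [__traits(getFunctionAttributes, fn)].canFind(attr);

static assert( hasAttribute!(checked, "@safe"));
static assert(!hasAttribute!(checked, "@system"));
static assert( hasAttribute!(unchecked, "@system"));
```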

The name "function attributes" is a bit unfortunate in this case, but this DIP does not aim to fix that.

Grammar changes

There are no proposed grammar changes, since @system annotations are already permitted in every place where this DIP needs them.

Alternatives

Using private

While the need for a way to ensure struct invariants in @safe code is in line with this DIP, this DIP argues against using private for that purpose.

First of all, disallowing bypassing private in @safe code is not sufficient for ensuring struct invariants. As mentioned in the quote, sometimes invariants need to hold on types that are not unsafe such as int. When there are no pointer members, then the private fields can still be indirectly written to using overlap in a union, void-initialization or array casting.
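
The following sketch shows these indirect writes under the rules that exist without this DIP. The names Small, Overlay, viaUnion, viaCast, and viaVoid are hypothetical. The public len accessor exists only so that the effect can be observed.

```d
struct Small
{
    private ubyte length;     // intended invariant: length <= 15
    private char[15] data;

    @safe ubyte len() const { return length; }
}

union Overlay
{
    Small s;
    ubyte[Small.sizeof] raw;
}

@safe ubyte viaUnion()
{
    Overlay o;
    o.raw[0] = 255;           // overwrites s.length without naming it
    return o.s.len;
}

@safe ubyte viaCast()
{
    auto raw = new ubyte[](Small.sizeof);
    raw[] = 200;
    auto view = cast(Small[]) raw;  // reinterprets the bytes as a Small
    return view[0].len;
}

@safe void viaVoid()
{
    Small s = void;           // accepted; the value of s.length is unspecified
}
```

None of these functions names the private field in a write, so visibility checks alone cannot reject them.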

Second, private only acts on the module level, so a @trusted member function cannot assume that a struct's invariants are upheld unless all other @safe code in the module has been manually certified not to violate them. This undermines the ability of the programmer to easily distinguish code requiring manual verification from code that can be checked automatically, especially since certain member functions like constructors, destructors, and operator overloads must be defined in the same module as the data they operate on.
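
A short sketch of the module-level problem, reusing the IntSlice example from the Rationale: the function grow is a hypothetical helper placed in the same module as IntSlice. It is accepted as @safe today, yet it breaks the invariant that the @trusted opIndex relies on.

```d
struct IntSlice
{
    private int* ptr;
    private size_t length;

    @safe
    this(int[] src)
    {
        ptr = &src[0];
        length = src.length;
    }

    @trusted
    ref int opIndex(size_t i)
    {
        assert(i < length);
        return ptr[i];
    }
}

// Same module, so private does not apply. The compiler checks this
// function as @safe and finds nothing wrong with it.
@safe size_t grow(ref IntSlice s)
{
    s.length += 1;   // length no longer matches the array behind ptr
    return s.length;
}
```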

Finally, it would mean that circumventing visibility constraints using __traits(getMember, ...) would have to become @system or be deprecated entirely, similar to .tupleof. This would break all (@safe) code that uses this feature, and re-introduce the problems of issue 15371. All things considered, relying on private to maintain invariants appears to be a bigger hassle than introducing checks for @system variables and fields.

Breaking Changes and Deprecations

It is already allowed to attach the @system attribute to variables, but doing so does not currently add any compiler checks. The additional checks for @system variables can cause existing @safe code to break (note that @system code is completely unaffected by everything in this DIP). However, since @system currently has no effect on variables, it is suspected that users have not added this attribute to any variables at all, let alone variables that are meant to be used in @safe code. The biggest risk is that variables accidentally fall inside a @system {} block or under a @system: section.

@system:

int x; // suddenly not writable in @safe code anymore
void unsafeFuncA() {}
void unsafeFuncB() {}

void main() @safe {
    x++; // not allowed anymore
}

Misconstructed pointers can also be inferred @system under the new rules.

struct S {
    int* a = cast(int*) 0x8035FDF0;
}

void main() @safe {
    S s;
    *s.a = 0; // this gives an error now
    int[1] intArr = [-1];
    auto boolArr = cast(bool[]) intArr; // this too
}

Whenever this happens, there is a risk of memory corruption, so a compile error is appropriate. In any case, a two-year deprecation period is proposed, during which a deprecation message is given instead of an error whenever the new memory safety rules are broken. A preview flag, -preview=systemVariables, can also be added that immediately raises errors for violations while leaving other deprecation messages as warnings. There will also be a flag to revert the change, -revert=systemVariables, so users can choose to keep the old behavior for a little longer.

Reference

Copyright & License

Copyright (c) 2019 by the D Language Foundation

Licensed under Creative Commons Zero 1.0

Reviews

Community Review Round 1

Reviewed Version

Discussion

Feedback

In the Feedback Thread, most of the feedback was related to details such as terminology, whether to use assert(x) in the examples, etc.

The one structural piece of criticism was that making initialization of @system variables safe is unsound, to wit, "Memory safety cannot depend on the correctness of a @safe constructor." The DIP author replied that this boils down to "@trusted assumptions about @safe code", on which there is no consensus, and he has yet to determine a satisfactory design.

Of note, a detailed list of feedback was misplaced in the Discussion Thread. In short, the reviewer asserted that this proposal is essentially a response to bugs in the implementation of @safe, and those bugs should be fixed rather than a new feature added to the language. Subsequent discussion appears to have led to consensus among the participants that the DIP is necessary.
