Field | Value |
---|---|
DIP: | 1035 |
Review Count: | 1 |
Author: | Dennis Korpel dkorpel@gmail.com |
Implementation: | |
Status: | Post-Community 1 |
The memory-safety of a program depends on the ability of the programmer and the language implementation to maintain the run-time invariants of the program's data.
For built-in types, like arrays and pointers, the D compiler is aware of their run-time invariants, and can use compile-time checks to ensure they are maintained.
For user-defined types, however, these checks are not always sufficient.
In order to reliably maintain invariants beyond those that the compiler has hard-coded knowledge of, D programmers must resort to manual verification of @safe
code and defensive run-time checks.
This DIP proposes a new language feature, @system
variables, to address this lack of expressiveness in D's memory-safety system.
In @safe
code, @system
variables cannot be directly written to, and cannot have their values altered in uncontrolled ways via casting, overlapping, void-initialization, etc.
As such, they can be relied upon to store data subject to arbitrary run-time invariants.
- Background
- Rationale
- Prior work
- Description
- Alternatives
- Breaking Changes and Deprecations
- Reference
- Copyright & License
- Reviews
D's memory safety system distinguishes between safe values, which can be used freely in @safe
code without causing undefined behavior, and unsafe values, which cannot.
A type that has only safe values is a safe type; one that has both safe and unsafe values is an unsafe type.
(For more detailed definitions of these and other related terms, refer to the Function Safety section of the D language spec.)
The D compiler has built-in knowledge of which types are safe and which are not. In broad terms, pointers, arrays, and other reference types are unsafe; integers, characters, and floating-point numbers are safe; and the safety of aggregate types is determined by the safety of their members.
A run-time invariant (or just "invariant") of a type is a rule that distinguishes between that type's safe and unsafe values. The values that satisfy the invariant are safe; those that do not are unsafe. It follows that any type with a run-time invariant is unsafe, and that a safe type has no run-time invariants.
To ensure that their invariants are not violated, the use of unsafe types is restricted in @safe
code.
Specifically:
- They cannot be void-initialized.
- They cannot be overlapped in a union.
- A T[] cannot be cast to a U[] when U is an unsafe type.
- Certain operators (pointer arithmetic, unsafe casts) are disallowed.
In the proposed changes, both the list of unsafe types and the list of restrictions will be extended.
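The existing restrictions listed above can be observed with compile-time checks. The following is a minimal sketch rather than an exhaustive test: each `static assert` asks the compiler whether a `@safe` lambda compiles, and the union name `Overlap` is a hypothetical name chosen for illustration.

```d
// Hypothetical union used only for this illustration:
// a pointer member overlapping an integer.
union Overlap
{
    int* ptr;
    size_t bits;
}

// Positive control: ordinary pointer initialization is fine in @safe code.
static assert(__traits(compiles, () @safe { int* p = null; }));

// void-initializing a pointer is rejected.
static assert(!__traits(compiles, () @safe { int* p = void; }));

// Reading a pointer that overlaps other data in a union is rejected.
static assert(!__traits(compiles, () @safe { Overlap u; int* p = u.ptr; }));

// Reinterpreting an integer array as an array of pointers is rejected.
static assert(!__traits(compiles, () @safe { size_t[] a; auto b = cast(int*[]) a; }));

// Pointer arithmetic is rejected.
static assert(!__traits(compiles, () @safe { int* p; auto q = p + 1; }));

void main() {}
```

All four rejected operations involve pointers, which the compiler already knows to be an unsafe type.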
While the system described above works well for built-in types and their invariants, it does not provide any way for the programmer to indicate that a user-defined type has additional invariants that the compiler may not be aware of.
As a result, maintaining such invariants requires extra effort from the programmer.
For unsafe types, the programmer may be required to manually verify that those invariants are maintained in @safe
code.
For safe types, the programmer may additionally be required to insert defensive run-time checks to ensure that those invariants are maintained.
module intslice;
struct IntSlice
{
private int* ptr;
private size_t length;
@safe
this(int[] src)
{
ptr = &src[0];
length = src.length;
}
@trusted
ref int opIndex(size_t i)
{
assert(i < length);
return ptr[i];
}
}
Invariant: The value of length must be equal to the length of the array pointed to by ptr.
First, observe that this code is memory-safe as-written (modulo bugs in the compiler).
There are only two functions that directly access ptr and length, and both of them correctly maintain the invariant.
However, in order to prove that this code is memory-safe, it is not sufficient for the programmer to verify the correctness of its @trusted functions.
Instead, every function that touches ptr and length, including the @safe constructor, must be manually checked.
If ptr and length were @system variables, then all code that directly accessed them would have to be @trusted, and the programmer would not need to manually verify any @safe code in order to prove that IntSlice's invariant is maintained.
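As a rough sketch of what that could look like, the listing below marks both fields `@system` and moves every direct access into `@trusted` functions. Some choices here are assumptions made for illustration: the constructor is marked `@trusted` and uses `src.ptr` so that empty arrays are accepted, and the bounds check uses `assert(0)` so that it does not depend on build flags. Compilers that do not implement the proposed checks accept the attribute on fields but ignore it.

```d
module intslice;

struct IntSlice
{
    // Under the proposal, @safe code could not write to these fields.
    @system private int* ptr;
    @system private size_t length;

    // Establishes the invariant; this is code that needs manual review.
    @trusted
    this(int[] src)
    {
        ptr = src.ptr;
        length = src.length;
    }

    // Relies on the invariant: length matches the array behind ptr.
    @trusted
    ref int opIndex(size_t i)
    {
        if (i >= length)
            assert(0, "index out of bounds"); // assert(0) is kept in release builds
        return ptr[i];
    }
}

void main() @safe
{
    int[] data = [1, 2, 3];
    auto s = IntSlice(data);
    s[1] = 20;              // writes through to the original array
    assert(data[1] == 20);
}
```

With this layout, the set of functions to audit is exactly the set marked `@trusted`.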
The same general pattern occurs with other user-defined types whose invariants involve the relationship between two or more variables, such as tagged unions and reference-counted smart pointers.
module shortstring;
struct ShortString
{
private ubyte length;
private char[15] data;
@safe
this(const(char)[] src)
{
assert(src.length <= data.length);
length = cast(ubyte) src.length;
data[0 .. src.length] = src[];
}
@trusted
const(char)[] opIndex() const
{
// should be ok to skip the bounds check here
return data.ptr[0 .. length];
}
}
Invariant: length <= 15
Once again, there is a constructor that establishes an invariant, and a member function that relies on the invariant to do its work. Unlike in the previous example, however, this code is not memory-safe as-written, though it may appear to be at first glance.
To understand why, consider the following program, which uses ShortString
to cause undefined behavior in @safe
code:
@safe
void main()
{
import shortstring;
import std.stdio;
ShortString oops = void;
writeln(oops[]);
}
void-initializing a ShortString will very likely produce an instance that violates its invariant.
Because opIndex relies on that invariant to skip the bounds check, this results in an out-of-bounds memory access rather than a safe, predictable crash.
Why does the compiler allow a ShortString to be void-initialized in @safe code?
Because, according to the rules in the language spec, a struct containing only ubyte and char data is a safe type, and therefore must not have any invariants.
It follows that @safe code is free to initialize a ShortString to any value, including an unspecified one, without risking memory corruption.
In order to make this code memory-safe, the programmer must include an additional bounds check in opIndex:
@safe
const(char)[] opIndex() const
{
return data[0 .. length];
}
This solution is unsatisfying: the program must do redundant work at run-time to compensate for the language's lack of expressiveness, or give up on the guarantees of @safe.
If ShortString.length could be marked as @system, this dilemma would not exist.
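A minimal sketch of that variant follows, assuming the checks proposed in this DIP. With `length` marked `@system`, `ShortString` would become an unsafe type, so void-initializing it in `@safe` code would be rejected and `opIndex` could keep skipping the bounds check. Marking the constructor `@trusted` and using `assert(0)` for the length check are illustrative choices, not requirements stated by the proposal. Compilers without the proposed checks accept the attribute on the field but ignore it.

```d
module shortstring;

struct ShortString
{
    // Under the proposal, this field makes ShortString an unsafe type.
    @system private ubyte length;
    private char[15] data;

    // Establishes the invariant length <= 15.
    @trusted
    this(const(char)[] src)
    {
        if (src.length > data.length)
            assert(0, "source too long"); // assert(0) is kept in release builds
        length = cast(ubyte) src.length;
        data[0 .. src.length] = src[];
    }

    @trusted
    const(char)[] opIndex() const
    {
        // No bounds check: relies on the invariant length <= 15.
        return data.ptr[0 .. length];
    }
}

void main() @safe
{
    auto s = ShortString("hello");
    assert(s[] == "hello");
}
```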
The same general pattern occurs with other user-defined types that attempt to impose invariants on types the compiler considers "safe", such as enum types used in final switch statements and integer "handles" used as array indices by external libraries.
The need for encapsulation of data / restricted access to data in order to achieve memory safety has been mentioned in several discussions:
- #8035: tupleof ignoring private shouldn't be accepted in @safe code (March 15, 2018)
- Re: shared - i need it to be useful (October 22, 2018)
- Re: Manu's shared vs the @trusted promise (October 23, 2018)
- Re: Both safe and wrong? (February 7, 2019)
- Should modifying private members be @system? (October 4, 2019)
- Borrowing and Ownership (October 27, 2019)
- #7347: Fix issue 20495 (choose copies unused union member, which is unsafe) (January 9, 2020)
- Re: @trusted attribute should be replaced with @trusted blocks (January 16, 2020)
Many other languages either do not allow systems programming at all (e.g. Java, Python) or do not support language-enforced memory safety (e.g. C/C++).
A notable exception is Rust, where the equivalent of this DIP has been proposed multiple times before: Unsafe fields #381
Some excerpts from the discussion there are:
OTOH, privacy is primarily intended for abstraction (preventing users from depending on incidental details), not for protection (ensuring that invariants always hold). The fact that it can be used for protection is basically an happy accident. To clarify the difference, C strings have no abstraction whatever - they are a raw pointer to memory. However, they do have an invariant - they must point to a valid NUL-terminated string. Every place that constructs such a string must ensure it is valid, and every place that consumes it can rely on it. OTOH, a safe, say, buffered reader needs abstraction but doesn't need protection - it does not hold any critical invariant, but may want to change its internal representation.
This doesn't seem very useful to me. Within a module I would expect the authors to know what they're doing, and the unit-tests to save them when they do not. For other users, you could simply introduce getters and setters, and functions/methods can already be marked unsafe.
Ultimately the proposal has not been accepted yet.
The idea of using private
instead of @system
variables for D will be discussed in the alternatives section.
More information about Rust's stance on unsafe functions can be found in "The scope of unsafe" and "safe unsafe meaning", both listed in the Reference section.
Before the proposed changes, here is an overview of the relevant existing rules of what declarations can have the @system
attribute.
@system int w = 2; // compiles, does nothing
@system enum int x = 3; // compiles, does nothing
enum E {
@system x, // error: @system is not a valid attribute for enum members
y,
}
@system alias x = E; // compiles, does nothing
@system template T() {} // compiles, does nothing
void func(@system int x) // error: @system attribute for function parameter is not supported
{
@system int x; // compiles, does nothing
}
template Temp(@system int x) {} // error: basic type expected, not @
In short, anything that can be marked private can also be marked @system.
Additionally, local variables can be marked @system (while they cannot be marked private).
Any function attribute can be attached to a variable declaration, but they cannot be retrieved:
@system @nogc pure nothrow int x;
pragma(msg, __traits(getFunctionAttributes, x)); // Error: first argument is not a function
pragma(msg, __traits(getAttributes, x)); // tuple()
(0) Writing to variables or fields marked @system
is not allowed in @safe
code
Examples:
@system int x;
struct S {
@system int y;
}
S s;
void main() @safe {
x += 10; // error: cannot modify @system variable 'x'
s.y += 10; // error: cannot modify @system field 'y'
@system int z;
z += 1; // error: cannot modify @system variable 'z'
}
// inferred as a @system function
auto foo() {
x = 0;
}
Further operations disallowed in @safe
code on @system
variables or fields are:
- creating a mutable pointer to it using &
- passing it as an argument to a function parameter marked ref without const
- returning it by ref without const
When using an alias
to a @system
variable, that alias has the same restrictions as the symbol it aliases to.
@system int x = 3;
alias xAlias = x;
void increment(ref int x) @safe {
x++;
}
void checkX(const(int)* x) @safe {
assert(*x < 10);
}
void main() @safe {
xAlias += 1; // error, cannot modify `@system` variable `x`
increment(xAlias); // error, cannot take mutable reference of `@system` variable `x`
checkX(&x); // fine, because the parameter is const. Otherwise it would be an error.
}
Initialization of a @system
variable or field is allowed in @safe
code.
This includes static initialization, the automatically generated constructor, user-defined constructors, and the .init
value of a type.
@system int x;
shared static this() @safe {
x = 3; // allowed, this is initialization
x = 3; // second time disallowed, this is assignment to a `@system` variable
}
struct T {
@system int y;
@system int z = 3; // allowed
this(int y, int z) @safe {
this.y = y; // allowed, this is initialization
this.y = y; // second time disallowed, this is assignment to a `@system` variable
this.z = z; // disallowed, this is assignment
}
}
struct S {
@system int y = 2;
}
void main() @safe {
S s0 = {y: 3}; // static initialization
S s1 = S(3); // automatically generated constructor
S s2 = S.init; // .init value
S s3; // same as above
s3 = s2; // disallowed
}
Note that while it may be desirable to require a @trusted annotation near initialization of @system variables, realizing this is problematic since there is no syntax for @trusted assignment.
@trusted as a function annotation has its limitations:
- it does not work for global or local variables, since a @trusted lambda there would move the declaration into the lambda's scope.
- it not only trusts initialization of the variable on the left-hand side of the =, but also the initialization expression on the right-hand side.
- it disables the scope / return scope checks of -dip1000
struct S {
this(ref scope S s) @system {
*(cast(int*) 0xDEADBEEF) = 0;
}
}
struct Wrapper(T) {
@system T t;
this(T t) @trusted {
this.t = t; // Oops! Calls a `@system` copy constructor
}
}
void main() @safe {
auto w = Wrapper!S(S.init); // program killed by signal 11
() @trusted {@system int x = 3;}();
// x is not in scope anymore
}
@system int x = (() @trusted => 3)(); // this still does not mark the assignment `@trusted`
//() @trusted {@system int x = 3;}(); // does not work
(1) An aggregate with at least one @system
field is an unsafe type
It gets the same restrictions as existing unsafe types:
struct Handle {
@system int handle;
}
void main() @safe {
Handle h = void; // error
union U {
Handle h;
int i;
}
U u;
u.i = 3; // error
ubyte[Handle.sizeof] storage;
auto array = cast(Handle[]) storage[]; // error
}
Without this, implicit writes to @system
variables are still possible.
(2) Reading from variables or fields marked @system
is not allowed in @safe
code if their type is unsafe
While writing to a @system
variable is always unsafe, reading from one is only dangerous when it could yield an unsafe value.
struct Handle {
@system int handle;
}
// struct with @system field is an unsafe type
@safe Handle safeHandle = Handle(1);
@system Handle systemHandle = Handle(-1);
// pointers are an unsafe type
@safe immutable int* safePtr = null;
@system immutable int* systemPtr = cast(int*) 0x8035FDF0;
// integers are a safe type
@safe int safeInt = 20;
@system int systemInt = 20;
void main() @safe {
Handle h0 = safeHandle; // allowed, @safe variable
Handle h1 = systemHandle; // error, reading @system var of unsafe type
immutable int* p0 = safePtr; // allowed, @safe variable
immutable int* p1 = systemPtr; // error, reading @system var of unsafe type
int i0 = safeInt; // allowed
int i1 = systemInt; // allowed, not an unsafe type
}
(3) Variables and fields without annotation are @safe
unless their initial value is not @safe
The rules regarding variables and fields are as follows:
- An initialization expression x is @system when the function (() => x) is inferred as @system.
- When marked @system, the result is always @system regardless of the type.
- When marked @trusted, the initialization expression x is treated as (() @trusted => x).
- When marked @safe, the initialization expression must be @safe.
- In the absence of an annotation, the result is @system only if the type is unsafe and the initialization expression is @system.
int* getPtr() @system {return cast(int*) 0x8035FDF0;}
int getVal() @system {return -1;}
extern int* x0; // @safe by default
int* x1 = x0; // @safe, (() => x0) is @safe
int* x2 = cast(int*) 0x8035FDF0; // @system, (() => cast(int*) 0x8035FDF0) is @system
int* x3 = getPtr(); // @system, (() => getPtr()) is @system
int x4 = getVal(); // @safe, int is not an unsafe type
@system int x5 = 1; // @system as requested
@trusted int* x6 = getPtr(); // @safe, the getPtr call gets trusted
@safe int* x7 = getPtr(); // error: cannot initialize @safe variable with @system initializer
struct S {
// same rules for fields:
int* x9 = x3; // @system
int x8 = x5; // @safe
}
An exception to the last rule is made for unsafe types when the compiler knows the resulting value is safe.
int* getNull() pure @system {return null;}
int* n = getNull(); // despite unsafe type with @system initialization expression, inferred as @safe
Annotations with a scope (@system {}) or colon (@system:) affect variables just like they do functions.
@system {
int y0; // @system
}
@system:
int y1; // @system
(4) __traits(getFunctionAttributes)
may be called on variables and fields
Currently it is possible to give function attributes to declarations that aren't functions. It is not possible however to inspect any of them.
@system @nogc pure nothrow int x;
pragma(msg, __traits(getFunctionAttributes, x)); // error: first argument is not a function
pragma(msg, __traits(getAttributes, x)); // tuple()
Since memory safety-related attributes now have an effect on variables and fields, it becomes useful to inspect them. Therefore the restriction on the getFunctionAttributes trait gets lifted.
The name "function attributes" is a bit unfortunate in this case, but this DIP does not aim to fix that.
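For comparison, this is how the trait behaves on functions today. The sketch below only checks for the presence of individual attribute strings rather than the exact tuple, since the order of the returned attributes is not assumed here. The function names `checked` and `unchecked` are hypothetical. Under this DIP, the same kind of query would also be accepted for a variable or field.

```d
import std.algorithm.searching : canFind;

// Hypothetical functions used only to show what the trait reports.
void checked() @safe pure nothrow @nogc {}
void unchecked() @system {}

// The trait yields a sequence of strings; collect it into an array.
enum checkedAttrs = [__traits(getFunctionAttributes, checked)];
enum uncheckedAttrs = [__traits(getFunctionAttributes, unchecked)];

static assert(checkedAttrs.canFind("@safe"));
static assert(checkedAttrs.canFind("pure"));
static assert(checkedAttrs.canFind("nothrow"));
static assert(checkedAttrs.canFind("@nogc"));
static assert(uncheckedAttrs.canFind("@system"));

void main() {}
```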
There are no proposed grammar changes, since placing @system
annotations is already allowed on the places where it's needed for this DIP.
This DIP shares the goal of making struct invariants enforceable in @safe code, but it argues against using private to achieve that.
First of all, disallowing bypassing private
in @safe
code is not sufficient for ensuring struct invariants.
As mentioned in the quote, sometimes invariants need to hold on types that are not unsafe such as int
.
When there are no pointer members, then the private fields can still be indirectly written to using overlap in a union, void-initialization or array casting.
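The union case can be sketched as follows. The names `Handle`, `Overlay`, and `forge` are hypothetical. Because the example lives in a single module, private would not restrict access here anyway; the point is that the write goes through the overlapping int member and never names the private field, so a visibility check on that field would not catch it.

```d
struct Handle
{
    private int id = -1; // intended invariant: -1 or a valid table index

    int get() const @safe { return id; }
}

union Overlay
{
    Handle handle;
    int raw; // overlaps Handle.id
}

int forge() @safe
{
    Overlay o;
    o.raw = 12345;         // accepted: int is a safe type
    return o.handle.get(); // the private field now holds 12345
}

void main() @safe
{
    assert(forge() == 12345);
}
```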
Second, private
only acts on the module level, so a @trusted
member function cannot assume that a struct's invariants are upheld unless all other @safe
code in the module has been manually certified not to violate them.
This undermines the ability of the programmer to easily distinguish code requiring manual verification from code that can be checked automatically, especially since certain member functions like constructors, destructors, and operator overloads must be defined in the same module as the data they operate on.
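A small sketch of the module-level problem, using a hypothetical `Stack` type: the free function `corrupt` is not a member, yet it compiles as `@safe` because it lives in the same module as the private field it overwrites.

```d
struct Stack
{
    private int[8] items;
    private size_t count; // intended invariant: count <= items.length

    void push(int value) @safe
    {
        items[count] = value; // bounds-checked write
        count++;
    }

    int top() @trusted
    {
        if (count == 0)
            assert(0, "empty stack");
        return items.ptr[count - 1]; // no bounds check, relies on the invariant
    }
}

// Same module, so private does not apply: this @safe function
// compiles and breaks the invariant that top() relies on.
void corrupt(ref Stack s) @safe
{
    s.count = size_t.max;
}

void main() @safe
{
    Stack s;
    s.push(1);
    s.push(2);
    assert(s.top() == 2);
    // Calling corrupt(s) and then s.top() would read outside items.
}
```

To trust `top`, a reviewer therefore has to read every function in the module, not only the `@trusted` ones.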
Finally, it would mean that circumventing visibility constraints using __traits(getMember, ...) must either become @system or be deprecated entirely, similar to .tupleof.
This would break all (@safe) code that uses this feature, and would re-introduce the problems of issue 15371.
All things considered, relying on private
to maintain invariants appears to be a bigger hassle than introducing checks for @system
variables and fields.
It is already allowed to attach the @system
attribute to variables, but this didn't add any compiler checks.
The additional checks for @system
variables can cause existing @safe
code to break (note that @system
code is completely unaffected by everything in this DIP).
However, since @system
does not do anything, it is suspected that users didn't add this attribute to any variables at all, let alone variables that are meant to be used in @safe
code.
The biggest risk here is that variables accidentally fall inside a @system {}
block or under a @system:
section.
@system:
int x; // suddenly not writable in @safe code anymore
void unsafeFuncA() {};
void unsafeFuncB() {};
void main() @safe {
x++; // not allowed anymore
}
Misconstructed pointers can also be inferred @system
under the new rules.
struct S {
int* a = cast(int*) 0x8035FDF0;
}
void main() @safe {
S s;
*s.a = 0; // this gives an error now
int[1] intArr = [-1];
auto boolArr = cast(bool[]) intArr; // this too
}
Whenever this happens, there is a risk of memory corruption, so a compile error is warranted.
In any case, a two-year deprecation period is proposed where instead of raising an error, a deprecation message is given whenever the new memory safety rules are broken.
A preview flag -preview=systemVariables can also be added that immediately raises errors for violations while leaving other deprecation messages as warnings.
There will also be a flag to revert it, -revert=systemVariables, so users can choose to keep the old behavior for a little longer.
- Safe Values
- What type soundness theorem do you really want to prove?
- The scope of unsafe
- safe unsafe meaning
Copyright (c) 2019 by the D Language Foundation
Licensed under Creative Commons Zero 1.0
In the Feedback Thread, most of the feedback was related to details such as terminology, whether to use assert(x)
in the examples, etc.
The one structural piece of criticism was that making initialization of @system
variables safe is unsound, to wit, "Memory safety cannot depend on the correctness of a @safe
constructor." The DIP author replied that this boils down to "@trusted assumptions about @safe code", on which there is no consensus, and he has yet to determine a satisfactory design.
Of note, a detailed list of feedback was misplaced in the Discussion Thread. In short, the reviewer asserted that this proposal is essentially a response to bugs in the implementation of @safe
, and those bugs should be fixed rather than a new feature added to the language. Subsequent discussion appears to have led to consensus among the participants that the DIP is necessary.