JavierZunzunegui/go2-readonly-and-immutability.md

## go2-readonly-and-immutability.md

      
    Raw
  

              go2-readonly-and-immutability.md
            
          
    proposal: Go2: read-only and immutability

original github issue and discussion
Abstract

This document presents a solution on how to introduce read only and immutability support in go2. It is split into two parts:

A read-only language change that provides the expected compile-time guarantees with limited additional complexity or inconvenience to developers.
How immutability may be implemented on top of these language change, transparently to developers.

This document and it's ideas build up on the Go 2: read-only types proposal by Jonathan Amsterdam, which is referred to as 'the original' throughout. It focuses on the two points above while deliberately omitting the more general questions 'why is read-only needed?', 'how is read-only different to immutability?', 'why a read-only and not an immutable keyword?', etc. These are discussed in the original proposal and follow up comments and don't need to be repeated here. I encourage anyone reading this proposal to also look at this previous one for additional context, although this one should be understandable on its own.
Other posts from which I may borrow or get inspiration are Evaluation of read-only slices, Go - Immutability or Why Const Sucks.
Language changes

The ro keyword

The fundamental change in this proposal is the ro (read-only) keyword, taken in almost the same form as in the original proposal. Some highlights that still hold:

Permissions are transitive: a component retrieved from a read-only value is treated as read-only.


There is an automatic conversion from T to ro T.


This proposal is concerned exclusively with avoiding modifications to values, not variables.


A value of read-only type may not be immutable, because it may be referenced through another type that is not read-only.

On the other hand, the ideas around 'Permission Genericity' in the original proposal are dropped (as well as the ro? syntax).
The meaning of ro applied to interfaces is also subtly changed. In the original it is described as:

If I is an interface type, then ro I is effectively the sub-interface that contains just the read-only methods of I.
If type T implements I, then type ro T implements ro I.

In this proposal the meaning is slightly different - for an interface I, ro I contains a ro T for T implementing I. If type T implements I, then type ro T implements ro I, and vice versa.
(edited)
Additionally, if T implements I and all methods in I have a ro receiver in T, then ro T also implements I.
Anonymous functions are not mentioned in the original proposal, but this one regards them as a write operation and therefore not callable in ro form.
relaxed typing rules for ro qualifiers

The introduction of the ro keyword is intended to bring all the compile-time read-only guarantees at the minimum cost to developers. To achieve it, this proposal departs from the original one and establishes that the typing rules for ro-qualified types are somewhat relaxed, as in many contexts they are interchangeable with their non-ro form. Conceptually, ro-qualifying introduces compile time restriction on what operations are allowed but are largely indistinguishable at runtime.
It is a given a T may be automatically converted to ro T, but not in reverse. Similarly special allowances are introduced for functions and methods, not otherwise allowed in go.
In more detail:
Any ro-qualified argument in a function may be given a non-ro argument:
func foo(ro X) {...}
x := X{}
foo(x) // legal
Any method requiring a ro-qualified receiver may be called on a non-ro receiver:
type foo struct {}
func (_ ro foo) Bar() {...}
f := foo{}
f.Bar() // legal
Any function may be converted to another similar function with some ro-qualifying argument(s) removed:
func foo(_ ro X) {...}
f := foo          // type func(ro X)
var _ func(X) = f // legal, type func(X)
Any method with a ro receiver or argument may implement a method without it:
type foo struct {}
func (_ ro foo) Bar1() {...}
func (foo) Bar2(_ ro X) {...}

type I interface {
    Bar1()
    Bar2(X)
}
var _ I = foo{} // legal
Similar rules apply to return values - a function or method with a non-ro return value can be converted or used to implement one where the return value is ro-qualified. The same rules apply to maps, arrays, slices, channels, i.e. []X may be used as an argument to []ro X, etc. Overall, the principle is that inputs may be relaxed to permit more general forms (non-ro), and outputs may be tightened to produce more specific forms (ro).
By having ro-qualifying not produce fully independent types the need for duplicate implementations (ro and non-ro forms) is averted and the read-only can be supported with very little additional complexity or work by the developer.
Note this is not overloading as only a single function or method may exist with a given name, and the same implementation will be used regardless of the ro-ness of the arguments.
To support this language changes the linker must be relaxed to allow linking when the function types differ only in ro-ness as described above.
Interface type assertion and ro

Fundamentally, the above ro syntax allowing seamless conversions around ro-ness relies on the runtime form of a type X being identical to that of ro X. This is trivially true for non-interface types, as in the go model and unlike many other languages the type is not embedded in their runtime value. This can even be trivially extended to interfaces with regards to methods being called - calling a method on a ro I can be implemented identically to calling the same method on I. The itab can be reused and the same function will be called regardless, while non-ro receiver methods can't be called in the first place so need no special handling. However, type assertion does not immediately follow. Consider the below:
i := interface{}(X{})  // represented in runtime as {type of X, value}
j := ro interface{}(i) // type ro interface, has same runtime representation as above {type of X, value}. The value within it though is not X but ro X
To satisfy this, asserting of ro-qualified interfaces to non-ro qualified struct is illegal. When asserting a ro I to a ro-qualified struct ro X the the runtime must confirm the runtime type within the interface is X, even though it is returning a ro X. For example:
i := interface{}(X{})  // X in an interface
j := ro interface{}(i) // ro X in a ro interface, represented in memory as {type of X, value}
_, _ = j.(X)           // illegal, ro interface can't be a non-ro struct
_. ok := j.(ro Y)      // legal but false, the runtime type in the interface is X not Y
_. ok := j.(ro X)      // legal and true, the runtime type in the interface is X which is implicitly ro X
The assertion of ro-interfaces to other ro-interfaces is fundamentally the same as that of non-ro interfaces, just in ro space:
x := X{}           // assume X implements interfaces I1 and I2, but not I3
i := ro I1(x)      // type ro I1, represented in memory as {type of X, value}
_, ok := i.(ro I3) // legal but false, X does not implement I3
_, ok := i.(ro I2) // legal and true, X does implement I2
(edited)
There is also the case ro T implementing I(which occurs when T has ro-receivers on all methods required by I), which means I is implemented by both the ro and non-ro form. The fundamental property they must guarantee is that a ro T assigned to one such I may not be asserted back to a T, nor be asserted to any interface a ro T would not satisfy. Since both T and ro T may be contained by such an I and yet the distinction must be made by the runtime, they must be represented differently within the interface's type value at runtime and interpreted correctly while asserting. For example:
// Foo has non-ro receivers
type Foo interface {
    Bar()
}

// foo1 implements Foo and ro Foo
type foo1 struct{...}
func (_ ro foo) Bar() {...}

// foo2 implements Foo but not ro Foo
type foo2 struct{...}
func (_ foo) Bar() {...}

var f Foo

f = foo2{}
_, ok := f.(ro foo2) // illegal, does not implement it
f = ro foo2{}        // illegal, does not implement it

f = foo1{}          // foo1 in a Foo, represented in memory as {type of foo, value}
_, ok := f.(ro foo1) // legal but false, runtime type is foo
_, ok := f.(foo1)    // legal and true, runtime type is foo

f = ro foo1{}       // ro foo in a Foo, interface representation in runtime is {type of ro foo, value}
_, ok := f.(ro foo1) // legal and true, runtime type is ro foo
_, ok := f.(foo1)    // legal but false, runtime type is ro foo
To support this language changes, the runtime must be modified to correctly implement this new type asserting rules.
Remarks

The empty interface

The empty interface interface{} still matches everything and can always be asserted back to the exact type that was assigned to it. No change in that respect, it can still be used as a reversible wildcard type match.
Migration

(edited)
These changes, in it's rawest form, are backwards compatible in that existing go code will work for it.
However many methods and types (and some interfaces, though those may be rarer) should be modified to reflect the new ro functionality. A key aspect though is that they could be migrated somewhat gradually with modest (if any) breaking changes at each step. In particular note that there is no cost in making the arguments of functions ro when not in breach of read-only rules in the first place (including receivers for methods). This way, the methods and functions lowest in the stack can be migrated first, and then those relying on those, etc. gradually progressing up the stack. Interfaces will not normally require an update (as ro types may implement non-ro interfaces), though in some cases they may want to be updated regardless (such as for immutability purposes), although that will likely result in backward incompatible changes.
Code relying on type assertion may also need re-visiting given interfaces formerly using T may be migrated to be ro T instead and assertion on T will no longer work.
The case for mutable keyword

Adding ro can be a very limiting restriction and an exception to the rule may be wanted, such as a mut (mutable) keyword. This is done in some other languages with similar features. I believe adding this is a bad idea but it is worth debating given its implication for establishing best practices around ro.
ro []ro T vs ro []T

For slices I have taken the ro syntax of the original proposal, namely []ro T to mean 'slice of read-only Ts' and ro []T to mean 'read-only slice of read-only Ts'. Alternatively, ro []ro T could take that meaning, and ro []T becomes 'read-only slice of Ts'. While more verbose, this syntax is more expressive and powerful. I believe either this or comparable solutions should be explored, and possibly also for maps and arrays.
Immutability

Transparent feature on top of ro

Consider the following:
IF the compiler can assert all references to a ro variable are also ro themselves, it can treat the variable as immutable
Note here the reference to compiler - it is not the developer who decides if a ro variable is, or even might be, immutable, but the compiler. The immutable property is completely transparent to developers and is irrelevant in all other regards other than performance - the logic of a method is identical regardless. To achieve this, the compiler must effectively do escape analysis on ro's - any non ro-escaping ro variable can be treated as immutable, but one that ro-escapes can't. This is very similar to go's heap escape analysis - whether a variable ends up in heap or stack - which is up to the compiler and irrelevant for all (non-performance) purposes to the developer.
In other languages immutability often means the memory is only written to once (during initialisation) and remains unchanged until reclaimed (const in C, final in java, etc). This is not the case in this proposal - non-immutable memory may become immutable after it has been initialised or even modified, and mutable ro variables may coexist and reference the same memory as immutable ro variables (though no non-ro variables). In this sense immutable ros are not like const (in the current go sense, not referring to C-style const).
Initialisation

Initialising immutable ros (via ro return statement)`:
// an immutable ro
func foo() /* im */ ro []int {
  out := []int{1}      // not immutable, not even ro
  out = append(out, 1) // not immutable or ro (and in fact is being modified!)
  return out           // ro AND immutable, despite having been non-immutable (and non-ro) before. Has NOT been deep-copied
}

// an immutable ro, which may be referenced by another ro (mutable or not, irrelevant here)
func bar() /* im */ ro []int {
  // note ro argument
  auxFunc := func(x ro []int) {...}

  out := []int{1} // not immutable, not even ro
  auxFunc(out)    // out escapes, but in ro form
  return out      // ro AND immutable, despite having been non-immutable (and non-ro) before and being referenced by another ro. Has NOT been deep-copied
}

// ro but non-immutable, may have non-ro references remaining
func nonImmutable() /* mut */ ro []int {
  auxFunc := func(x []int) {...} // note non-ro argument

  out := []int{1} // not immutable, not even ro
  auxFunc(out)    // out escapes in non-ro form
  return out      // ro but NON immutable as it may be referenced by a non-ro. Has NOT been deep-copied
}
Initialising immutable ros (via non-ro return statement and casting afterwards):
// non ro-escaping return value
func foo() []int {
  out := []int{1}
  return out // not ro but non ro-escaping
}

var _ /* im */ ro []int = foo() // immutable as return value not available after assignment

x := foo()
var _ /* mut */ ro []int = x // mutable as x is ro-escaping this assignment
// some non-ro use of x

// ro-escaping return value
func bar() []int {
  auxFunc := func(x []int) {...}

  out := []int{1}
  auxFunc(out) // causes ro-escaping
  return out   // not ro and ro-escaping
}

var _ /* mut */ ro []int = bar() // mutable as return value is already ro-escaping
Implementation

As stated before, ro is primarily a means of restricting what operations are allowed, but is virtually irrelevant at runtime. Immutability is the exact opposite, it is transparent to the developer but modifies how the code is executed. For this reason, while ro-qualified types are separate types, immutability is not. Instead it is an optimisation built into the binary, whereby function calls with inputs known by the compiler to be immutable may be linked to alternative, optimized implementations of the same function. This is similar to overloading in that a single function may have multiple implementations, but differs in that it is not the developer but the compiler that picks which one is used.
I am not certain my understanding about the go runtime, compiler and linker are entirely correct here, please share if you know better.
Some changes to both the compiler and the linker would be needed to support this 'immutability from read-only' feature'. In particular methods with ro inputs (methods that have some ro argument or receiver) would require the compiler creating multiple implementations, potentially one for each ro combination. It would then be up to the linker to decide which is called based on the immutability of the arguments and target (as known at compile/link time). For example:
func foo(ro []int) {...}
would be compiled in 2 separate ways, identical in all ways except the immutable one may include the additional optimisations:
/* compiler pseudo-code, not actual go syntax */
// compiler: foo#0 func(mut ro []int) {...}
// compiler: foo#1 func(im ro []int) {...}
It would be up to the linker to link to the right one:
foo([]int{1}) // uses immutable implementation foo#1

x := []int{1}
foo(x) // uses mutable implementation foo#0
// some other non-ro use of x
This property must extend to interfaces. There will be nothing in the runtime value of an interface that states it's immutability state, hence the compiler/linker must be the one that decides again. I think this can be done by modifying the use of itab (or a similar new type, say itab2) to support not just multiple types but also multiple mutability-status on these types. When a ro input method is called on an interface the type resolution getitab (or a similar new method, say getitab2) will be called with some additional argument that denotes it's immutability state, imState. That way it can resolve the correct method to call, including immutability status. For example:
type foo struct {}

/* compiler pseudo-code, not actual go syntax */
// compiler: foo:Bar#0 func(mut ro []int) {...}
// compiler: foo:Bar#1 func(im ro []int) {...}
func (foo) Bar(ro []int) {...}

type foo2 struct {}

/* compiler pseudo-code, not actual go syntax */
// compiler: foo2:Bar#0 func(mut ro []int) {...}
// compiler: foo2:Bar#1 func(im ro []int) {...}
func (foo2) Bar(ro []int) {...}

type Foo interface {
    // compiler: itab2 [imState][type]
    //           itab2 [0][0] = foo:Bar#0
    //           itab2 [0][1] = foo2:Bar#0
    //           itab2 [1][0] = foo:Bar#1
    //           itab2 [1][1] = foo2:Bar#1
    Bar(ro []int)
}

f := Foo(foo{})

// immutable
x := ro /* im */ []int{1}

// immutable input => imState=1.
// imState is a new argument to getitab2 (otherwise equal to getitab)
// the value of imState known at compile time and is different constant for each separate form of a ro input method (#0, #1 in my syntax).
// the below interface therefore calls
// getitab2(*interfacetype, *_type, bool, imState=1)
// which then resolves to the immutable form of the specific method, i.e. either foo:Bar#1 or foo2:Bar#1
// given the type of f (this is still, of course, evaluated at runtime) it calls foo:Bar#1, the immutable form of the method
f.Bar(x)
Note that the below only has one ro argument, but of course methods may have multiple ones, and the receiver itself may be ro, sogetitab2 must support different values for imState. How best to define those I leave as a future task and should be explored more carefully.
Remarks

How meaningful are the performance gains?

The purpose of immutability is to provide performance gains by avoiding unnecessary memory reads. How significant those gains are is not obvious, though. It should be investigated before commiting to this changes.
Binary bloating

If all the above changes are implemented, these may have the unintended consequence of bloating the binary as functions may have multiple implementations. Immutability is, however, at the compiler/linker's discretion and so it may decide to drop immutable implementations of methods that do not get called often or are otherwise large and provide little gain, and instead use the mutable forms.
Compile time impact

Build time of individual packages may be negatively affected as multiple implementations may need to be built for a single method, as well as an additional step of escape analysis of ro. Longer build time for binaries will inevitably follow.
Benchmarking and immutability

Given immutability is transparent to the developer but improves performance, it may be a double edge sword when running benchmarks. It is possible for a benchmark to be run on immutable inputs but then the binary does not (or vice-versa), present misleading results with little visibility to the developer.
Some risk mitigation best practices should be explored.
Immutable parameters

So far immutability has covered arguments to functions, but it should also cover parameters held within other types. For example, in the below, can parameter t be considered immutable?
type X struct {
    t ro T // immutable or mutable?
    ...
}
It is generally impossible for the compiler to identify where such X was initialised or modified, and therefore must assume the parameter is mutable. An exception can be made when all writes to such parameter are known to set immutable values, in which case the parameter can be considered immutable. This can only realistically be done for private members, and even then while the parameter may have an immutable value, it may still be changed for another immutable value. For example:
type X struct {
    t ro []int // immutable, because is private and all writes in the package set an immutable rp []int
    ...
}

x := X{
    t: []int{1,2} // t set to immutable slice
}
func(_ ro []int) {...} (x.t)  // called in it's immutable form, x.t is immutable
func(_ ro *[]int) {...} (x.t) // called in it's mutable form, &x.t is mutable
x.t = []int{1} // legal, t is re-set to another immutable slice
This makes the immutability of parameters a fairly limited property. One way to make it more powerful would be to allow ro to be applied to parameters in the following way:
type X struct {
    ro t []int // this would mean a read-only []int which can't be overwritten with different ro []int value. Also &x.t can now be considered immutable
    ...
}
For reference, this is similar to 'avoiding modifications to values, not variables' mentioned in the original proposal, but applied to parameters not variables.
Although this would also require restrictions on how dereference is done in go, i.e.
x := X{
    t: []int{1,2} // t set to immutable slice
}
xPtr := &x  // is still type *X, not ro *X
*xPtr = X{} // has to be made illegal, this overwrites x.t
This would be a significant change to the language with many breaking changes, so should not be seen as a requirement of this proposal.
Immutability Check

There may be situations when the runtime wants to validate the immutability state of a variable. A public method may be added to the runtime package for this purpose:
package runtime

/* compiler pseudo-code, not actual go syntax */
// compiler: IsImmutable#0 func(mut ro interface{}) bool {return false}
// compiler: IsImmutable#1 func(im ro interface{}) bool {return true}
func IsImmutable(ro interface{}) bool
The special thing about this method is that, while for all other methods immutability is considered merely a performance feature, in this the output actually changes. The mutable form of the function returns false, the immutable true.
This allows for the syntax below:
func  ToImmutable(x ro []int) /* im */ ro []int {
    if runtime.IsImmutable(x) {
        // input is immutable, no change needed
        return x
    }

    // input not immutable, deep-copy it
    copied := make([]int, len(x))
    copy(copied, x)
    return copied
}
Which in turns gives the option to actually enforce immutability to any given API (at the cost of deep coping in the inputs aren't immutable). Also note that the output of IsImmutable will be known at compile time anyways, so a smart compiler will inline it and simplify methods like ToImmutable.
For the same reason it could be added as a line directive which will fail compilation when it doesn't hold, say:
//go:immutable
var _ /* im */ ro []int = x
This could in turn provide the same (if more verbose) benefits of other proposals introducing immutable types. While I would make the case that this directive shouldn't be abused, it does ensure that there is support for it for the few times immutability is a serious concern for an API.
string vs  ro []byte

An immutable ro []byte is almost identical to a string, and opens  up for conversion between one and the other without any need for memory allocation and deep copying:
// string to ro []byte
s := "Hello World" // type string
_ = ro []byte(s)     // type ro []byte, is immutable. Does not require deep copying, the SliceHeader can use the same Data as that held by the string

// ro byte to string
imB := ro []byte{'a'} // immutable, ro []byte
_ = string(imB)       // string, does not require deep copying as the  input was immutable
mutB := /* assume some mutable ro []byte*/
_ = string(mutB)     // string, does require deep copying as any other []byte would in current go
This makes the conversion between string and ro []byte effectively free, and that from ro []byte to string potentially free, but depends on context.
Note this exact comparison is discussed extensively in Evaluation of read-only slices
Conclusion

Introducing read-only types to go2 has multiple advantages for go development in the form of clearer APIs and safer code. I believe the performance gains of immutability is also a feature of read-only types, even if it hadn't been recognised as such before. The ideas introduced or expanded in this proposal are not complete, detailed or final, but target much of the concerns from the community and go project maintainers around introducing read-only types. Overall I hope this will lead to new discussion and maybe tilt the balance towards having read-only support added to go2.