Skip to content

Instantly share code, notes, and snippets.

@rogpeppe
Last active September 13, 2018 22:29
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save rogpeppe/45f5a7578507989ec4ba5ac639ae2c69 to your computer and use it in GitHub Desktop.
Save rogpeppe/45f5a7578507989ec4ba5ac639ae2c69 to your computer and use it in GitHub Desktop.

Revised generics design

Roger Peppe, 2018-09-01

One common response to the new generics design has been that contracts seem a bit too much like interfaces but not the same.

They are indeed very similar if you squint a little. Consider these two definitions:

type IReader interface {
	Read([]byte)(int, error)
}

contract CReader(r R) {
	interface {
		Read([]byte)(int, error)
	}(r)
}

Both IReader and CReader define a kind of filter on types. They both allow exactly the same set of types, but IReader can only be used in a function parameter list, and CReader can only be used in a type parameter list.

I propose that we unify the concepts of contracts and interfaces while still keeping much of the originally designed syntax.

Proposed rule:

Any contract with exactly one type parameter and a body consisting entirely of type conversions to interface types is considered to be equivalent to an interface that embeds those interfaces.

This allows any interface type to be used in place of a contract, and a restricted subset of all contracts to be used as interfaces.

For example, this code becomes valid:

func ReadAll(type R io.Reader)(r R) ([]byte, error)

It feels like this invites us to adjust the syntax a little. Because a contract is now just a type (albeit a type that we can't always use in function parameters), let's use a type statement to define one.

type Convertible = contract(_ To, f From) {
	To(f)
}

So a contract type is syntactically like a function literal, except that it can only be used in type context, not in value context.

In the original proposal, any set of type parameters can have at most one contract, so if you need two type parameters that fulfil independent contracts, you need to define a single contract that contains both of them. Now that interfaces can act as contracts, that restriction seems even less palatable. If a function takes two unrelated type parameters, it doesn't seem quite right that we should need to define a new type to hold them both at once.

So I propose that for contracts involving two or more type parameters, the types must be parenthesized in the type parameter list.

For example:

type G = contract(n Node, e Edge) {
	var _ []Edge = n.Edges()
	var from, to Node = e.Nodes()
}

type Graph(type (Node, Edge) G) struct { ... }
func New(type (Node, Edge) G)(nodes []Node) *Graph(Node, Edge) { ... }

This means we can now use any number of contracts in a type parameter list. For example:

func DrawGraph(type (Node, Edge) G, I draw.Image)

With the new syntax comes the possibility that a contract itself might be able to refer to external type parameters. This is explicitly disallowed. A contract may not refer to any generic types not defined in the contract's parameter list.

type C(T) contract(x X) {
	T(x)		// INVALID
}

Contract bodies

From the proposal:

The simplest way to ensure that a function only performs the operations permitted by its contract is to simply copy the function body into the contract body. In other words, to make the function body be its own contract, much as C++ does. If people take this path, then this design in effect creates a lot of additional complexity for no benefit.

We think this is unlikely because we believe that most people will not write generic function, and we believe that most generic functions will have only non-existent or trivial requirements on their type parameters.

I propose that instead of allowing arbitrary code in contract bodies, we allow only an extremely restricted syntax. Every statement in a contract body must consist of exactly one expression that uses exactly one of the following operators:

all arithmetic and logical binary and unary operators
channel send, receive and close
array and map indexing
function call
type conversion
interface conversion

Every expression must reference at least one of the contract's parameters. Only externally defined type names may be referenced, not names of functions or variables. The restriction that only externally defined names may be used is dropped.

Note that this form of contract prohibits many of the constructions that were previously used. Notably:

  • no variable declarations
  • no field or method references
  • no range statements

Even with these restrictions, the syntax is powerful enough to express all the examples in the original proposal except for the counter example, which is explicitly about fields.

Some advantages that come from these restrictions:

  • one concept per line.

Each line specifies exactly one required behaviour. When there's a more complex relationship between types and methods, generic interfaces can pull their weight. This makes it easy to read contracts and see exactly what a contract requires or allows.

  • reorderable.

There is no ordering relationship between statements in the contract, so it's possible to reorder them arbitrarily without changing the meaning of the contract, making it possible to canonicalise and compare contracts easily.

  • no intermediate implied types.

With arbitrary contracts, it's possible to have many implied-but-unnamed types inside a contract. For example, consider this contract:

contract Indirect(a A, b B) {
	t1 := a.A()
	t2 := t1.M1()
	// any number...
	t3 := t2.M2()
	b = t2.M3()
}

In that code, all of t1, t2 and t3 have types implied by the values returned by a, t1, t2 etc. Any function using that contract can obtain instances of those types but not name them. This seems wrong to me, just as returning unexported types from exported methods is generally a bad idea. Also, the complexity of such a contract is hard to get one's head around!

  • Known cost of range

One thing that the original proposal mentioned was the possibility of using range on generic types. This means that you can have a range operator in a generic function but no idea whether it's ranging over a map, a channel or a slice, all of which have quite different properties. By eliminating range from the set of possible statements, this is no longer an issue.

  • No need for special rules about value methods

Since method calls are not allowed, there's no way that the Go implicit address-of rule comes into play, so there should be no need to have a special rule that says that value methods are allowed, so my suggestion in the Value Methods section above becomes redundant.

  • Straightforward runtime model

If we wish to generate the same code for a function instantiated with several different types, then there has to be some way to allow the function to manipulate these types. With the simpler form of contract, each statement of the contract can be implemented by a function that provides the behaviour for the statement's operation.

In a sense, each statement in the contract is similar to an entry point in an interface.

The lack of explicit field or method references mean that there's no need to deal with the ambiguity that field and method references in the original proposal lead to. In contracts as originally proposed, there's no way to tell if we're referring to a method or a field.

@extemporalgenome
Copy link

+, -, !, %, ^, ==, !=, >, >=, <, <=, <-, <-=, =, []
type conversion
interface conversion

Perhaps <-= is a typo? What about the other bit-wise operators (you already defined ^) that will allow defining contracts used in size-agnostic integer algorithms, such as << and &^? What about the other Boolean operators, such as ||? We can't expect that programmers should remember which operators are accepted in the contract mini-language and which are not, but we can reasonably expect them to remember the kinds of expressions accepted ("operators" and type and interface conversions).

Is assignment = provided to test assignability? If so, without var in a contract, it's unidirectional: you can have a contract line like x = MyByteSlice(nil) (requiring that x is either MyByteSlice, []byte, an interface, or a type alias to one of those), but you can no longer specify the equivalent to var _ []byte = x (requiring that x is anything with an underlying type of []byte, an interface, or a type alias to one of those). In other words, assignability rules are not commutative, and providing = without var means only partial coverage of assignability, unless you allow something like:

var throwaway []byte
contract AssignableToByteSlice(x X) {
  // throwaway is never assigned to by this contract;
  // throwaway is just used to work around contract syntax restrictions.
  throwaway = x
}

Therefore, I'd suggest getting rid of = or adding back var. The only other statement besides = would have been (Y)(nil) <- x, which is important for specifying in a contract that Y can be sent to, which one can't assume since the language has chan<- int. Assigning to an index can also be used to distinguish between a string and a slice (or a mutable slice and an immutable slice/array, if those ever come along).

In the absence of = we would have conversions, but those can be a bit blunt, and for better or worse, don't allow a contract to specify cheapness. For example, y = []X(nil) allows for algorithms that require only a type re-representation at compile time, but no copy at runtime due to identical memory layouts. Without assignment, the best that can be specified is Y(x), which would in a copy at runtime if exactly one of Y or X is a string, and the algorithm using that contract needs a value of type Y from an input of type X.

There is no ordering relationship between statements in the contract, so it's possible to reorder them arbitrarily without changing the meaning of the contract, making it possible to canonicalise and compare contracts easily.

Either way, it would be possible to define a canonical order of statements for a given contract using some of the same techniques that rsc.io/grind uses. That said, we don't canonicalise interface method order (should we?). The closest we've got to that is import re-ordering.

One thing that the original proposal mentioned was the possibility of using range on generic types. This means that you can have a range operator in a generic function but no idea whether it's ranging over a map, a channel or a slice, all of which have quite different properties. By eliminating range from the set of possible statements, this is no longer an issue.

Generic range algorithms will likely fall into two main categories: 1) utility algorithms, for which neither the element type nor the collection type is particularly important, and 2) those algorithms for which the collection type is assumed and only the element type can vary. I imagine most such uses of range in a contract will be in service of category #2. Even with range contracts, there are limitations in what can be expressed, so a "Values" function that takes a range-capable collection and returns the values as a slice could not possibly operate generically across channels and maps/slices. Conversely, an "Keys" function could operate on all three kinds, but it would actually be returning channel values. Both functions would be unusually expensive for slices, and both functions would use the wrong allocation hint if it used make([]Y, 0, len(x)) on a channel (and would of course either OOM or block forever on a channel that is never closed).

@rogpeppe
Copy link
Author

rogpeppe commented Sep 1, 2018

Thanks very much for your thoughtful feedback.

+, -, !, %, ^, ==, !=, >, >=, <, <=, <-, <-=, =, []
type conversion
interface conversion

Perhaps <-= is a typo?

Yes, it's a typo - my ancient history shows. <-= was the operator for sending on a channel in the Limbo language (a predecessor of Go).

What about the other bit-wise operators (you already defined ^) that will allow defining contracts used in size-agnostic integer algorithms, such as << and &^? What about the other Boolean operators, such as ||? We can't expect that programmers should remember which operators are accepted in the contract mini-language and which are not, but we can reasonably expect them to remember the kinds of expressions accepted ("operators" and type and interface conversions).

Yup, those were all of the top of my head. All bitwise and logical operators should be included. Function invocation should too.
And I've probably missed some other operators too.

Is assignment = provided to test assignability? If so, without var in a contract, it's unidirectional: you can have a contract line like x = MyByteSlice(nil) (requiring that x is either MyByteSlice, []byte, an interface, or a type alias to one of those), but you can no longer specify the equivalent to var _ []byte = x (requiring that x is anything with an underlying type of []byte, an interface, or a type alias to one of those). In other words, assignability rules are not commutative, and providing = without var means only partial coverage of assignability, unless you allow something like:

Yes, assignment is provided to test assignability, but you're absolutely right. I'd forgotten that
an explicit conversion is different from normal assignment. It is still possible within the
current rules in a few ways. For example:

func([]byte){}(x)
[][]byte{x}
map[bool][]byte{true:x}

None of those are particularly nice though. One possibility
might be to allow only variable declarations of the form var _ T = x.

There is no ordering relationship between statements in the contract, so it's possible to reorder them arbitrarily without changing the meaning of the contract, making it possible to canonicalise and compare contracts easily.

Either way, it would be possible to define a canonical order of statements for a given contract using some of the same techniques that rsc.io/grind uses. That said, we don't canonicalise interface method order (should we?). The closest we've got to that is import re-ordering.

The fact that in the original proposal statements can have arbitrary interdependency means that I think
it would be quite hard to provide a canonical representation. There are so many ways
that a given set of possible operations can be represented. My hope here is that by radically
restricting the form of contract statements, this becomes much less of a problem,
and contracts become much more understandable and amenable to tooling.

One thing that the original proposal mentioned was the possibility of using range on generic types. This means that you can have a range operator in a generic function but no idea whether it's ranging over a map, a channel or a slice, all of which have quite different properties. By eliminating range from the set of possible statements, this is no longer an issue.

Generic range algorithms will likely fall into two main categories: 1) utility algorithms, for which neither the element type nor the collection type is particularly important, and 2) those algorithms for which the collection type is assumed and only the element type can vary. I imagine most such uses of range in a contract will be in service of category #2. Even with range contracts, there are limitations in what can be expressed, so a "Values" function that takes a range-capable collection and returns the values as a slice could not possibly operate generically across channels and maps/slices. Conversely, an "Keys" function could operate on all three kinds, but it would actually be returning channel values. Both functions would be unusually expensive for slices, and both functions would use the wrong allocation hint if it used make([]Y, 0, len(x)) on a channel (and would of course either OOM or block forever on a channel that is never closed).

For me the disadvantages of not knowing the underlying semantics (do you get a consistent ordering? can it block indefinitely?) outweight the potential efficiency and expressivity advantages. YMMV :)

@extemporalgenome
Copy link

func([]byte){}(x)
[][]byte{x}
map[bool][]byte{true:x}

Ah, yes! Quite clever. You're right that it's not very readable, but then again, I imagine that testing for assignability is all that common of a need: numeric conversions are fairly cheap, while most non-numeric conversions only "convert at compile time", such as (*int)(PtrToMyInt), so assignability as a way to prevent especially expensive operations that may come from conversions would just apply operations that involve strings. Also, assignability wouldn't prevent the expense associated with assigning very large arrays or structs.

I imagine function literals wouldn't strictly be needed either (though of course anonymous function declarations would be needed).

For me the disadvantages of not knowing the underlying semantics (do you get a consistent ordering? can it block indefinitely?) outweight the potential efficiency and expressivity advantages. YMMV :)

Agreed. Note that even if contracts no longer allowed loops, len(t) does imply that t is suitable for passing to a range loop, since every type that can be passed to len can be the expression in a range loop.

@rogpeppe
Copy link
Author

rogpeppe commented Sep 2, 2018

Note that even if contracts no longer allowed loops, len(t) does imply that t is suitable for passing to a range loop, since every type that can be passed to len can be the expression in a range loop.

That's an interesting point, and leads to the more general question: which operators imply what possible operations? I think it's probably best to keep things as direct as possible. We really should spell out exactly all the rules. For example does x-x imply that you can do x+x because every type that you can do subtraction on, you can always do addition? I think not. If it starts to get tedious spelling out all the possible arithmetic operations, for example, we can use contract embedding to define a reusable contract that provides them:

package contracts
// Int provides all the usual arithmetic operations on integers.
type Int = contract(n N) {
    n = 0
    n + n
    n - n
    n * n
    n < n
    n & n
    n << uint(0)
    []int{}[n]
    // etc
 }

@extemporalgenome
Copy link

Thinking about it more, either equivalence rules or contract embedding for this purpose seem like too much work. Is x + x also allowed any time x ^ x is allowed? Currently, yes. But if Go is given [hopefully restricted] operator overloading at any point, then either the choice to rely on equivalence rules ("compatibility tables") becomes a mistake overnight (a type might implement xor, but not add), or alternatively types are forced to additionally implement all operators/operations implied by the set of operators they wanted to provide.

Or we'd have to have default operation implementations where it's possible to compose them of defined operations. For example, subtraction could be provided a default implementation based on addition and negation. This of course would assume that the operations are behaving properly according to Go rules. Such assumptions/restrictions are probably a good thing.

@rogpeppe
Copy link
Author

rogpeppe commented Sep 3, 2018

That's an interesting thought. The idea that operations specified in contracts might somehow map nicely to Go method calls has been bouncing around in my head over the last few days and it's just crystallised into something...

[Edit] I originally wrote a load of text about the idea here, but it seemed out of place, so I moved
it to a separate gist: Operator overloading in Go

@stevenblenkinsop
Copy link

I don't think it makes sense to use type declaration syntax for contracts, since you can't actually use a contract as a type directly. Yes, an interface is a type, and it makes sense to also be able to use an interface as a contract, but I don't think that makes contracts a kind of type. Interfaces work as types because they inherently talk about the operations available on an individual value, with each value offering those operations being assignable to the interface type. With contracts, you're potentially talking about the interactions between multiple values of the same or different types, so there's no notion of what values the contract itself would represent.

I find it interesting that you used the implied-but-unnamed types introduced by := syntax to justify restricting short declarations in contracts. I wrote a (probably too long-winded) response where I went the other way by seeking to provide a way to give these types a name. I do agree with restricting contracts to a subset of the language so that you don't have arbitrarily complex contracts and interactions between syntax and constraints. In particular, constraining methods using method calls introduces a lot of fuzziness around parameter type matching based on the assignability rules, in addition to the receiver addressability concerns.

It does seem unfortunate to lose the ability to accept a value and call pointer methods on it, though. I mean, strictly, you could enforce that *T have certain methods and accept T values as input, but that means T can never be a pointer type (*T = **U doesn't have any methods). If you want to accept both pointers and values with pointer methods, you're out of luck. I suppose, if embedding type parameters is allowed, you could use a generic wrapper type to mask the double indirection, like

type IndirectionMask(type T) { T }

and then talk about the method set of *IndirectionMask(T).

@rogpeppe
Copy link
Author

rogpeppe commented Sep 4, 2018

@stevenblenkinsop

I find it interesting that you used the implied-but-unnamed types introduced by := syntax to justify restricting short declarations in contracts.

That's just one example. There are implied-but-unnamed type in any multi-expression statement.
For example, in this example (original syntax):

contract T(a A, b B) {
    b = a.B().C().D()
}

the types returned by the C and D methods are implied but unnamed.

It does seem unfortunate to lose the ability to accept a value and call pointer methods on it, though. I mean, strictly, you could enforce that *T have certain methods and accept T values as input, but that means T can never be a pointer type (*T = **U doesn't have any methods). If you want to accept both pointers and values with pointer methods, you're out of luck.

This is the same restriction we have on interface values now, and it seems to work out OK. It would help to have some concrete examples where this would be useful, but I'm struggling to come up with any. Can you think of one?

@stevenblenkinsop
Copy link

@rogpeppe wrote:

This is the same restriction we have on interface values now, and it seems to work out OK. It would help to have some concrete examples where this would be useful, but I'm struggling to come up with any. Can you think of one?

It seems like the use case would be very narrow: you want to be able to rely on value semantics (i.e. assignment creates an independent copy of a value, which doesn't work if there's any indirection) but also need mutating operations (i.e. operations that take a pointer receiver). However, in that case, you don't need to be able to accept pointer types as the argument type, anyways, so it's fine that the method set of *T where T is the type parameter will be empty for pointer argument types.

Also, there's no way to require value semantics for the argument type anyways; a type can always include a pointer. If you need types to support copying behaviour, you'd probably need to specify a method for it.

The reason I brought it up was because one of the underlying assumptions of the original contracts proposal is that generic code should be written in the same way as non-generic code as much as possible. Of course, we've seen all the ways that creates complications, as the various conveniences provided to non-generic code just make it harder to reason about what code is doing in a generic context.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment