Skip to content

Instantly share code, notes, and snippets.

@warpfork
Last active December 10, 2018 19:01
Show Gist options
  • Save warpfork/6df17e791936d1f9b0d5e5483678c8bf to your computer and use it in GitHub Desktop.
Save warpfork/6df17e791936d1f9b0d5e5483678c8bf to your computer and use it in GitHub Desktop.
schemaaee
// Prelude:
// - "essential kinds" are map, array, string, bytes, boolean, integer, etc.
// - "perceived kinds" extends essential kinds to include struct, union, and enum.
// - "scalar kinds" are the essential kinds which are represented as a single token
// (string, bytes, boolean, integer); "vector kinds" are map and array.
// All values can be distilled down to a representation in a sequence of essential kinds.
// Perceived kinds are not inherently representable as essential kinds; we'll use
// them in a moment in constructing our type system.
// Vector kinds are interesting to note because when constructing types out of them,
// such types will be required to have more type parameters (like perceived kinds).
//
// Types can be defined which give additional structural requirements composed out
// of the kinds and other types.
// - "essential types" is a term that can be used to refer to types which are just
// wrappers around the scalar essential kinds (e.g. Integer, Bytes, etc).
// Essential types are always predeclared for reference in every type universe.
// - "types" are defined as either aliases of some existing type, or as a new
// type of one of the compound perceived kinds (structs, unions, maps, arrays,
// or enums), each of which must be defined as compositions of other types.
//
// In the syntax below:
// 1. all of the perceived kinds are keywords which denote the start of a type declaration.
// additionally, the keyword "type" denotes an name a new type which is
// structurally identical to an existing type.
// 2. the name of type follows keyword 1.
// 3. additional parameters may follow the type name -- for example, unions must
// declare what strategy transforms them into essential kinds, as more than
// one valid choice exists.
// 4. the type declaration follows parts 1-3 after an equal sign.
//
// The type declaration content varies by kind (e.g. unions and maps use visually distinctive syntaxes).
// Types which have a vector essential kind (arrays, maps) have terse declaration syntaxes:
// `[valueType]` defines arrays, and `{keyType:valueType}` defines maps.
// Struct fields may be declared to contain any named type, or alternatively may
// be declared to contain an anonymous vector type for which the definition is inlined
// (e.g. arrays can be defined inline for a struct field's type: `fieldName [Type]`).
// Vector kind types may also use inline definitions (e.g. `map Foo {Bar:[Baz]}`).
// Struct fields, map values, and array values can be defined as "nullable".
// Struct fields can also be defined as "optional" (distinct from nullable: the
// key may be absent, but if present, the value must be non-null).
//
// Some types may be annotated with an Advanced Layout parameter (for example, "HAMT").
// The syntax of such a type still indicates what essential kind it is (map or array).
// (-- Future: bytes?)
// The advanced layout is a special value which must be known to the parsing program.
// Types with an advanced layout are still manipulated with the semantics usual for
// their essential kind (e.g. `{String:String}<HAMT>` is still fundamentally a map
// and handled as such), but when serialized to an actual stream of essential-kind
// tokens, may have a significantly more complex structure and span multiple
// nodes (connected by CIDs).
// Advanced Layout types may not be used in nested type definitions (e.g., a
// struct field may have a type `{String:String}<HAMT>`, but *not* `[{String:String}<HAMT>]`).
// There is no reason for this limitation other than limiting syntactic complexity.
// Often, types annotated with an Advanced Layout will still use a simple map or array
// rather than linked objects when the content of the value is small enough; but in
// general this is an optimization based on the assumption link traversal and extra full
// nodes has some cost, and should not be relied upon. Not all Advanced Layouts will do this.
//
// Finally (though this was probably already obvious), it should be stated that
// the entirety of this type system is based on *structural* typing -- not nominative typing.
// (However, nominative typing may be approximated using unions (either keyed,
// envelope, or inline unions can be seen as representing a nominative style).)
union UnixfsNode keyed =
// union content always defined in "{TypeName} {opts...}" order.
| UnixfsFile "f"
| UnixfsDir "d"
| UnixfsSymlink "l"
// So for example:
// {"f":{"name": "the-file-name", "data": ...}} -- is a file.
// {"d":{"name": "the-dir-name", "entries": ...}} -- is a dir.
// If instead we declared this as `union UnixfsNode envelope "type" "msg"`...
// {"type": "f", "msg": {"name": "the-file-name", "data": ...}} -- this would be the file.
type Fname = Bytes // The full byte filename.
type Flabel = String // The key used (may be distinct from Fname; is restricted to UTF-8 NFC).
struct UnixfsFile = {
name optional Fname
data [SeekableDataElement]
size Integer
}
union SeekableData kinded =
| Bytes bytes
| SeekableDataNode map
// quick note on how kinded unions work:
// "map" is inferrable from SeekableDataNode, but for clarity.
// note that it's "map" and not "struct"; kinded unions have to work based on *essential* kinds.
struct SeekableDataNode = {
start Integer
length Integer
content [SeekableData]
}
struct UnixfsDir = {
name optional Fname
entries {Flabel:UnixfsDirEnt}<CHAMT>
size Integer
}
type UnixfsDirEnt = CID:UnixfsNode
// ... should we have a marker for the type expected on the other side? yes? how about this hint after the colon?
@warpfork
Copy link
Author

warpfork commented Dec 5, 2018

some feedback to incorporate on a next draft:

  • Make less of a big call-out of "essential types" in the type system docs... I don't think they're actually Interesting in any way, I just wanted to mention that they're types that are always defined. Having them moderately simplifies (I hope) some of the syntax, because this means (outside of the prelude which introduces the essential types, which is out-of-band Magic) you can always define all new types in terms of existing type names (you never declare a new type as a name for an essential kind). But all this is probably better documented in a section about "prelude" things rather than giving a name to "essential types".
  • "essential kinds types" is probably something a.k.a. "rank 0" types in some communities. We should put a note in the sidebar of our docs about this. But it's a sidenote and not more: it's true that they're "rank 0", but that's not the only reason we gave them a concrete name: they're also the innately serializable types kinds, which makes them categorically different.
  • More documentations about unions should probably include some sidebar notes about how we're always, always talking about "open" unions. IMO this kind of goes without saying in a structurally-typed system which is being applied to data which may have been authored outside the system or under different rules, but it should still be... well... said.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment