Skip to content

Instantly share code, notes, and snippets.

@sirinath
Last active July 17, 2021 21:22
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save sirinath/27dbe5d68f250ffc42add8123ed99458 to your computer and use it in GitHub Desktop.
Save sirinath/27dbe5d68f250ffc42add8123ed99458 to your computer and use it in GitHub Desktop.
Language for Data Centric Development and Systems Programming

* Language for Data Centric Development and Systems Programming

[TOC]

Overview

* language is to write correct high-performance code in processing large volumes of static, changing and streaming data. Currently many programming languages are not well geared to manipulate large and changing data required by modern big data tasks. Many languages used to write Big Data and Database systems use language which were not really designed to build such systems.

A modern built language should be usable in:

  • mass manipulation of data which is
    • streaming
    • changing
  • reasonably low level for systems programming
  • highly efficient and fast for
    • low latency
    • high throughput workloads

Type Systems

The type system would be statically typed and would include:

  • nominal typing
  • structural typing
  • range types

Type Definition

Type definition is by using type keyword. E.g.

type A

In the above case the type inherits the structure from {} with the nominative label A.

Multiple types can be defined:

type A, B, C

Each items can have own elements and common elements. E.g. when sub-typing is involved:

type A{def a : X = x}, B{def b : X = x}, C{def c : X = x} <:: Z <X>(x: X){def common : X = x}

Common elements appear towards the end.

When sub-typing is not involved

type A{def a: X = x}, B{def b: X = x}, C{def c: X = x} : <X>(x: X){def common: X = x}

Only one primary constructors can be specified. Multiple constructors can be simulated. E.g.

type A{def a : X = x}, B{def b: X = x}, C{def c: X = x} : <X <:: Number>(x  X){
   // Factory methods
   def .<X>(x, y : int), .Named1<X>(x, y: int) = <X>(x + y), <X>(x, y)
   def .<X>(x, y : int; z  long) = <X>(x + y, z :: int)
   def .Named2<X>(x, y: int; z: long) = Named1<X>(x + y, z :: int)
   
   // Non memeber constant
   def .c = 1
   
   // Non memeber functions
   def mAddP, mMinusP : <X>(p: X) -> X = common + p + c, mAddP<X>(-p)

   def common: X = x
   
   // Memeber functions
   def mAddP, mMinusP : <X>(p: X) -> X = common + p, mAddP<X>(-p)
}

def A.a' : Int = 1 // Associated with A

Type Operators

Following type operators are available:

  • : this is to define structural type alias
  • :: this is to define nominative type alias
  • ::: this is to asset structural or nominative type. This cannot be used in a type definition.
  • :> this is to define a structural super type of
  • ::> this is to define a nominative super type of
  • :::> this is to asset a structural or nominative super type of
  • <: this is to define a structural sub type of
  • <:: this is to define a nominative sub type of
  • <::: this is to asset a structural or nominative sub type of
  • <:> this is to asset a structural sub or super type of
  • <::> this is to asset a nominative sub or super type of
  • <:::> this is to asset a structural or nominative sub or super type of
  • & intersection type
  • | union or sum type
  • * product type
  • =:= test structural type equivalence
  • =::= test nominative type equivalence
  • >:> test structural super type
  • >::> test nominative super type
  • <:< test structural sub type
  • <::< test nominative sub type
  • in ordered set / range definition
  • # expand type structure

Atomic and Composite Types

Types can be atomic or composite:

  • Atomic types. E.g. Int. These are also language defined. E.g. type <32, BusSize, 0, PlatformEndiannes> Int this means type Int is 32 bits and aligned at BusSize bit boundary and stating at bit 0.
  • Composite types – made up of elements of atomic types
    • unnamed tuples with unnamed elements (tuples). E.g. Type definition: type A(Int, String); value definition def a: A = (1, "Bob")
    • unnamed tuples with named elements (records). E.g. Type definition: type B{ def rank: Int; def name: String }; value definition: def b: B = {rank = 1, name = "Bob"}
    • named tuples with unnamed elements (named types). E.g. Type definition: type ClassRank(Int, String); value definitions def c:: ClassRank = ClassRank(1, "Bob")
    • named tuples with named elements (names records). E.g. Type definition: type ClassRank`{ def rank: Int; def name: String}; value definitions def d :: ClassRank` = ClassRank`{rank = 1, name = "Bob"}

NB: the distinction is in the definitions where :: is used for nominative and : is used for structural definitions. If it is nominative the name is also used as part of the initialization, e.g., ClassRank(1, "Bob"), ClassRank`{rank = 1, name = "Bob"}.

Following typing relationships hold:

  • tuples are super type of records
  • tuples are super type of names tuples
  • records are super type of named records
  • named tuples are super type of named records
  • transitively tuples are super type of named records through records

The named part in named records and named tuples if the nominative portion of the types system while the rest is the structural part.

Atomic Types

Atomic types are language defined. E.g.

type <32, BusSize, 0, PlatformEndiannes> Int

Here <> appear next to the type keyword. 1st parameter is the number of bits followed by the alignment, offset and endiannes.

Tuples

Tuples are constructed as:

type Point : (Int, Int)
def a: Point = (1, 2)

They can be deconstructed as:

def (b, c: Int) = a

The data structure can be extended as follows:

type A : (Int, Int)
type B <: A (String) // <:: can be used for nominative inheritance. Now B will be(Int, Int, String)
type B <: A (String, #A) // Now B will be(String, Int, Int)

Named Tuples

Named tuples are constructed as:

type Point(Int, Int) // Shorthand definition
type Point :: | Point(Int, Int) // Alternative definition
def a = Point(1, 2)

They can be deconstructed as:

def (b, c: Int) = a

Or

def Point(b, c: Int) = a

In the latter case a should be named Point.

Records

Tuples are constructed as:

type Point : {def x, y: Int}
def a: Point = {x = 1, y = 2}

They can be deconstructed as:

def {b, c: Int} = a

or

def {b: Int = a.x, c: Int = a.y}

Named Records

Named tuples are constructed as:

type Point{def x, y: Int}
def a = Point{x = 1, y = 2}

They can be deconstructed as:

def {b, c: Int} = a

or

def {b: Int = a.x, c: Int = a.y}

or

def Point{b, c: Int} = a

or

def Point{b: Int = a.x, c: Int = a.y}

In the latter 2 case a should be named Point.

Type and Value Constructor Parameters

Type and value parameters are specified following . and |. E.g.

type Rectangle<T <:: Number>{def x, y : T}
type Square <:: Rectangle . <S>{def common :: T}
   | <T <:: Number>{def p : T} => <.T = T >{.x = .y = x} // T and p are parameters and not associated with the structure, while S and common are
def s = Square.<Int>{10}

Types can be extended and also multiple parameters supported

type A{def a :: String} . | {def x, y : String} => A{ a = x + y }
type B{b :: String} <:: A . 
   | {x : String, y : String} => A.{x, y}, B{ b = y }
def b = B.{"Hello", "World"}

You can have multiple parameters. They can also be named. E.g.

type C{c : String} . 
   | {x: String} => C{c = x} 
   | Alt{y : String} => C{c = Alt.y}
def c = C.Alt{"C"}

This is loosely analogous to constructors in other languages.

Type Relationships

Following type relations can be defined

  • Aliases
  • Subtype
  • Supertype

Type Aliases

Type aliases are defined using the :, :: or ::: operators.

type B : A // Type B is structural type which is an alias to the structure of A. If A is nominative the nominative component is ignored.
type C :: A // Type C is nominative type which is an alias to the structure and name of A if A is also a nominative type
type X<D ::: A>(d: D) // Type D is structural or nominative type of A

Subtype

Sub types are defined using the <:, <:: or <::: operators.

type B <: A // Type B is structural sub type of A ignoring the nominative part of A if any
type C <:: A // Type C is nominative sub type of A
type X<D <::: A>(D) // Type D is structural or nominative sub type of A

Multiple subtypes can be defined as:

type D{def p, d : Int}, E{def p, d : Int}, F{def p, d : Int} <:: G, H, I . {common : Int}
   | {common : Int} =>  {.common = common}

Supertype

Super types are defined using the :>, ::> or :::> operators.

type B :> A // Type B is structural super type of A ignoring the nominative part of A if any
type C ::> A // Type C is nominative super type of A
type X<D :::> A>(D) // Type D is structural or nominative super type of A

Bi Variant Types

Bi variant types are defined using the <:>, <::> or <:::> operators.

type X<B <:> A>(B) // Type B is structural sub or super type of A ignoring the nominative part of A if any
type X<C <::> A>(C) // Type C is nominative sub or super type of A
type X<D <:::> A>(D) // Type D is structural or nominative sub or super type of A

Structural and Nominative Types

When structural types or variables are defined the nominative component is dropped, i.e.,

  • named tuples unnamed elements become unnamed tuples with unnamed elements
  • named tuples with named elements become unnamed tuples with named elements

Tuples and Records

Tuples are a super type or records with the same type signature.

Discriminated Unions

Named Tuples and Named Records can be used in Discriminated Unions.

type Pet{def name: String} :: | Cat | Dog // name is a common field

If types are pre defined then type unions can be used for similar effect. E.g.

type Cat{def name : String}
type Dog{def name : String}
type Pet :: Cat | Dog

Each of the types can also be extended. But if it is sub-typing all union members can only have sub-typing. Like wise when super-typing.

type PetRecord{def age : Int} <:: Pet, // name (from the previous definition) and age are common to all and the super-type also has name which is initialized.
   | PetCat <:: Cat 
   | PetDog{def pedigree : String} <:: Dog // pedigree is a new field
   | PetParrot

Super-typing:

type PetRecord{def age : Int} ::
   | PetCat
   | PetDog{def pedigree : String}
   | PetParrot
   
type Pet{def name: String} ::> PetRecord, // A name field is added
   | Cat ::> PetCat
   | Dog ::> PetDog

Also, each type can have shared type and value parameters or individual parameters.

type X<T1, T2>{def p1: T1} :: // T1, T2 and p1 belongs to X and is common to A and B also
   | A<T3>{def p2: Int}       // T3, p2 belongs to A
   | B{def p3: String}        // p3 belongs to B

In case of structural types. Elements should also be different. E.g.

type P :
   | A{def a: Int}
   | B{def b: Int}

Following is not supported it cannot be matched on structure alone

type P :
   | A{def p: Int}
   | B{def p: Int}

In all cases, the definition of each item is preceded by |.

Type Checking

Each type of the language will have structure and an optional symbol attached. In nominative type checking symbol hierarchy and structure is checked. This is only applied to named tuples and named records. In structural type checking, of the purpose of this language, the symbol hierarchy is not checked, but the structure is checked. This is applied to records and tuples.

Nominative type system is implemented by maintaining a hierarchy of symbols associated with the type names. In type checking, it is checked to see if the type is a structurally a subtype and the symbol participates in the hierarchy. E.g. type B <:: A will record that symbol B lower in the symbol hierarchy than A.

In structural type checking, the structure is checked to see if:

  • Subtype has 0 or more new field than the supertype
  • All subtype field are the same type or subtype of the respective supertype fields
  • All subtype function arguments are the same or subtypes, while the return values are of the same or supertype

Nominal Typing

The type system will support nominal typing.

type B <:: A

The above denotes B is a subtype of A. Similarly, the above relationships can also be denoted as:

type A ::> B

The latter can also be used to rewrite type hierarchies. Though this might a not viewed well in an OOP context, this many of the use cases which Bytecode generation libraries facilitate. This also gives the language a dynamic language feel, due to flexibility and also avoids certain pitfalls in supporting mixed structural typing and nominative typing.

Similarly, a type can be a subtype of more than one other type. This is akin to multiple inheritance in OOP.

type C <:: A & B

More explicitly you can specify how each type is treated.

type C <:: { #A } & B // C inherits only the structure of A but inherits the structural and labelling of B

Multiple inheritance is denoted by a & separated list of types.

The following scheme can be used to define structure associated with a type:

type A {
   def a : Int
}

Similarly, the structure can be extended as follows:

type B <:: A {
   def b : Int
}

Or

type B {
   def b : Int
} <:: A

Or

type B <:: A . {
   def b : Int
}

In the above case B inherits the structure of A, the nominative relationship and the following structure:

{
   def b : Int
}

Type alias is defined as:

type X :: Y

This means X is of type Y.

NB: When :: appear towards the right in the RHS this is casting or type cohesion. E.g.

def a = x :: Int

Structural Typing

Structural typing works similar to nominative typing. E.g.

type B <: A

The above defines B with the structure of A. In the case,

type C <: B

then C will have the structure from B. In the context of a language like Java, C#, etc. this will be like copying all the body of class A and B into C.

The following scheme can be used to define structure associated with a type:

type A {
   def a : Int
}

Structural type alias is defined as:

type X : Y

In this case X has the same structure of Y but not the sub typing relationships.

Interaction Between Typing Systems

Both structural and nominative types can be mixed in defining other structural and nominative types. E.g.

type A {
   def a : Int
}

type B {
   def b : Int
}

type C <: A // C only gets the structure of A    
type C <:: { #A } // C only gets the structure of A, hence the same as above
type D <:: B // D gets the structure and establish the nominative hierarchical relationship between B and D

Extending Types

type X : { // No sub typing relationship, hence structure is inherited
   #A // expand A
   #B // expand B
}

type Y <: A, B { // Sub typing relationship and structure is inherited 
   def f1: Int
   #B // expand B
   def f2: Int
   #A // expand A
   def f3: Int
}

ADT Types

type A
type B <: A
type C <: A
type X : A | B | C // Sum type
type Y : A * B * C // Product type
type Z<S, T> : S | T

Intersection and Union Types

type X :: A & B & C // X is a sub type of A, B & C
type Y : A & B & C // Y shared structure among types A, B & C
type Z :: A | B | C // Equivalent to sum type: A + B + C

Range Types

type X in 1..1..10 // Ordered range of values between 1 and 10 both inclusive with step 1. This can be also written as: 1..10
type Y <:: X in 2..9 // Any range type between 2 to 9 inclusive of type X or Y can be assigned
type Z <: X in 2..9 // Value of range type X between 2 to 9 inclusive can be assigned regardless of type
type A in P, Q, R, S, T // Ordered set of values
type B in P | Q | R | S | T // Unordered set of values
type C in 1 | 2 | 3 | 4 | 5 // Unordered set of values
type D in 1.|.1.|.10 // Unordered set of values in range 1 to 10 inclusive with step 1. This can be also written as: type D of 1.|.10
type E <: Int in Int // All numbers in the range of Int. Same as: type E <: Int 
type F <: E in 1... // All numbers in the range of Int from 1
type G <: Int in ..|.. // All numbers in the range of Int as nominal numbers
type H <: G in 1.|.. // All numbers in the range of Int from 1 as nominal numbers
type I <: A in P..S // Ordered subrange of A
type J <: A in P.|.S // Unordered subrange of A

Tag Types

Tag types associate value of other types.

type A
def a = 1
type @X = 1 // Literal value tag
type @Y = A // Type tag
type @Z = a // Variable value tag. Variable should be know at compile time.
type I : Int @(1) // Int type with tag
def u : Int @X = 1 @X // Equivalent to def u :: Int @(1) = 1 @(1)
def v : Int @(1) = 1 @(1) // Add tag to type check. Or def v : Int @(1) = 1 : Int @(1)
type @P = _ > 5 // Value tag who's value is greater than 5
type @Q = ? <: A // Type tag with a sub type of A
type @R = ? <: A & _ > 5 // Multiple tags. All tags must be satisfied.
type @S = A & !1 // Must exclude values tagged as 1
def x : Int = v ~@(1) @(A, 2) // remove tag 1 and retag with A and 2
type J : Int @() // No tags allowed
def y : Int = v ~@ // remove all tag 
type @T = A | 1 // Must have tags 1 or A
type @U = ? | _ // Can have any value or type tag
type @V = A & 1 | 3 // Must have tags A and 1 or 3
type @H = @X, 1, 2, 3 // H inherits tags for X plus 1, 2, and 3

If @ appears at the start of type definition it is treated as and annotation. E.g. @my def f = 1. If it appears at the end it is treated as a tag type like in the above examples. Annotations are not used for type checking but all tags should type check for assignment.

In tag type definitions logical operators (&, |, !, ^) are allowed. The values which are assigned can have multiple tags. They are added by one at a time (@A) or as a list (@(A, 1)). Value tags can be removed (~@(A, 1)).

Constraint Types

Constraints can be imposed through the type system.

type A : Int > 5 // A in an Int greater than 5
type B{def b : Int}
type C : B{def  > 5} // C is of B with property b greater than 5

Type Parameters

Type parameters can be values or types. E.g.

Wild Card Types

Wide card types are denoted by ?.

def a : A<?> // ? denotes any type
def b : B<X: ?, X, Y> // X is any type. You just have to mark only on parameter

Existential Types

Existential type can be used by having ? wild cards in the RHS of the type definition.

type C : A<?> // existential type
type D : B<X: ?, X, X> // existential type with constraint that all parameters are the same 

Kinds

type E : A<?<...>> // ?<...>  a type with a type constructors which takes multiple arguments
type F : A<?<?, ...>> // ?<?, ...> a type with a type constructors which takes one or more arguments
type G : A<?<?>> // a single type with a type constructors which takes another type taking zero arguments

Active Objects

Each non constant value which shared among threads is treated as and active object. There 3 elements in the active object stream:

  • Inputs
  • Operations
  • Outputs

Variables

Variables are defined with def. E.g.

def x : Int = 1
def y : Int > 5 = 6
def a : A { a >= 6 } = A { a = 6 } // a should be of type a with field a which should be greater or equals 6
def v : Vector(length == 10) = p // property length must be 10
def b :> B { a :: Int, b :: Int > 0 } = C { a = -1, b = 1 }

Functions

Functions takes a named or unnamed record or tuple to another named or unnamed record or tuple.

type f1<T>(a: T) -> T // A simple function
type f2 <:: f1 // f2 is a sub type of f1, i.e., it takes a T or sub type of T to T or a super type of T

Pattern Matching

_ is used to search for a value in the environment. Similarly, [...] can be used to search in the declaration scope, e.g., [a*] a variable stating with a in local scope or a[a] a variable a within structure a.

Within a symbol * means zero or more and ? exactly 1 character.

def x : Int = _ :: Int // find last defined Int in scope
def x : Int = _ // find last defined Int in scope. Int is inferred.
def x : Int = (_ :: Int)[0] // find the 1st defined int in scope
def x : Int = _[0] // find the 1st defined int in scope. Int is inferred
def x : Int = (_[`a*] :: Int)[0] // find list value this is an Int and starts with a in the environment and select the 1st
def x : Int = _[`a*][0] // find list value this is an Int and starts with 'a' in the environment and select the 1st. In is inferred.
def y : Int[] = a[> 5] // find elements in 'a' with indices greater than 5
def y : Int[] = a[6...] // find elements in 'a' with indices greater than 5
def y : Int[] = a[6..10] // find elements in 'a' with indices between 6 and 10 inclusive
def y : Int[] = a[a > 5] // find elements in 'a' with values greater than 5 
def z : A = _[{ xxx :: Int }] // find unnamed record with field xxx which is an Int
def z : A = _[Person{ age :: Int == 10 }] // find named record with field age which is an Int and equals 10
def a : String = (_ : Person(name == X : String == ~"M")) & (_ : Person(father == X, sex == 'M')) => X[0] // find a the name of the 1st person found who is male and the father's name starts with "M"
def a : Person = (Y : Person(name == X : String == ~"M")) & (Y : Person(father == X, sex == 'M')) => Y[0] // find a the 1st person found who is male and the father's name starts with "M". Y is interpreted as the search value as the expression is in the value position and Y appears on the RHS of => operator
def b : Person(sex =:::= X) => X // find the type of field sex in Person and define as type of b
def c : Int = x :: Int => 1, x :: String => 2, y :: Int => 3 // if x is an Int valuate to 1, else if x is a String evaluate to 2, else if y is an Int evaluate to 3

Annotations

Annotations can be applied on:

  • types
  • variables
  • closures

Quotations

Quotations are denoted by either $" ... " or ${ ... }.

Annotations

Annotations processing can be used to extend the language.

Pre-defined annotations

System defined annotation generally restrict the language as means to give an extra level of safety.

Access Control

The language itself does not implement access modifiers but implemented through annotation. All elements in structures are public by default.

  • @access(grant | limit) - limit or grant access to a class of types
    • @access(grant = ["A", ":> B"]) grant access to A or sub type of B but limit access to any other type
    • @access(grant = ["A", ":> B"]) limit access to A or sub type of B but grant access to any other type

Object Capabilities

@thosakwe
Copy link

I'm curious abut some of this language's design decisions. For example, why the use of symbols like >: to denote type relationships? I think that having too many symbols makes a language hard to read, and also greatly increases the learning curve. Same with A @B(), etc.

Also, in the case of range types, say Int > 5, it becomes impossible to determine whether the literal 6 is an Int, Int > 1, Int > 2, Int > -23456, etc. This then necessitates type annotations, which as mentioned before, are confusing and somewhat cryptic.

Also, why the need for $ when creating strings? I think the symbol " itself is already enough.

Next, I don't think member access should be facilitated through annotations; it's not consistent with the other design decisions, which would also increase learning curve.

Overall, I'm wondering, how does this design accomplish the goal of both data manipulation and systems programming effectively? The concepts introduced are very high-level, which would require much overhead for a systems programming language, and generally the constructs in it (like a very dense type system) are not really useful in such a context. I'm also not sure why the features outlined for this language would be very useful for data manipulation at all. I find that data manipulation is more of an issue of library design and availability than a language issue.

@dumblob
Copy link

dumblob commented Jul 17, 2021

Agreed @thosakwe. Same questions and wondering on my side.

@thosakwe
Copy link

I don't remember commenting on this, but disregard whatever I said here. I've learned a lot since then, and it's possible much of my opinion has changed.

Either way, if you want to make a compiler - go ahead, there's no downside.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment