@Hugobros3
Last active April 9, 2020 14:02

b0neless

b0neless is a pseudo-language/idea I had after reading some interesting discussions on Discord. The idea is to completely separate the notion of types from that of data storage, and to enforce the separation by making the two distinct language constructs.

In b0neless, a type is a name, an optional list of type parameters, and a bunch of properties you can query given an instance of the type. Properties are pure and lazily evaluated (potentially extremely so, as will become clear later).

type List[T: type] {
	size: Nat,
	access: Nat -> T,
}

Here we have an extremely minimal type, List, that declares the interface for a read-only list. Given a List instance, you can query its size and access an element in the list. Note the Nat return value: Nat is a type too, a built-in one. It stands for the natural numbers, including zero. There are a few other built-in types, some of which are subtypes of others:

builtin type Real;
builtin type Int;
builtin type Nat; // This includes zero
builtin type Pos; // Only positive numbers, no zero
builtin type Text;

There is an important thing to point out here: none of these types are storable. You can describe them, use them in code, and do stuff with them, but at no point can there be such a thing as a Nat in computer memory. Nat just means "the set of all natural numbers" and says nothing about storage or practical limits to range. A type is not storable.

Introducing datatypes:

datatype vector[dt: datatype] = struct {
	alloc: *dt,
	size: usize,
}

Datatypes are what you're familiar with: C-style data storage, with structs, unions and good old pointers. Datatypes can have type parameters too: here you see a type parameter 'dt' that is itself a datatype, and the built-in datatype usize makes an appearance. Note the convention: Types start with an uppercase letter, while datatypes don't. This makes it easy to tell the two apart.

Now for the interesting bit: a datatype can implement the interface defined by a Type:

builtin datatype usize;
builtin datatype u32;
builtin datatype f32;

impl List[T] for vector[dt] with Nat == usize, T == dt {
	def size := self.size;
	def access := index: usize, item: T => {
		item := self.alloc[index];
	};
}

There are a fair few things to unpack here: we steal Rust's impl syntax and define List[T]'s properties for a vector[dt]. But in order to have something we can actually use in code, we need to bind all types to actual datatypes; this is what the with keyword is for. We bind Nat to the built-in datatype usize, thereby fixing once and for all the range of natural numbers in that context to [0; 2^32[ or [0; 2^64[ depending on the architecture. We also bind T to dt, meaning whatever dt is, it must implement T's properties.

This is the key idea: write code in terms of abstract types and higher-order functions, very much like functional programming, and then provide the physical data structures as something you bind to the logical types and algorithms.
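This split maps fairly directly onto Rust traits, which makes Rust a convenient sketching ground. Here is a minimal sketch (all names illustrative): the trait plays the role of the List type, Vec plays the role of the vector datatype, and the associated type binding stands in for the with clause.

```rust
// Illustrative Rust analogue: the abstract interface lives in a trait
// (playing the role of a b0neless type), and concrete storage opts in.
trait List {
    type Item;
    fn size(&self) -> usize;
    fn access(&self, index: usize) -> Self::Item;
}

// Vec<T> plays the role of the vector[dt] datatype; binding `Item = T`
// loosely mirrors the `with T == dt` clause.
impl<T: Clone> List for Vec<T> {
    type Item = T;
    fn size(&self) -> usize {
        self.len()
    }
    fn access(&self, index: usize) -> T {
        self[index].clone()
    }
}

fn main() {
    let v = vec![10u32, 20, 30];
    assert_eq!(v.size(), 3);
    assert_eq!(v.access(1), 20);
}
```

The big difference is that Rust traits are still attached to storable values, whereas b0neless types exist independently of any storage.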

Here is another example:

def map : List[A] -> (A -> B) -> List[B] := { a, f, b =>
	b := List[B] {
		size: a.size,
		access: index: Nat => f(a.access(index))
	};
}

The map function works as you'd expect: you feed it a list and a functor, and you get back another list. Note that b is actually our return value, and therefore by assigning to it we're binding something to the return value. What is it that we are returning? It's an implementation of List, which we define anonymously here. But there's no backing datatype?! Indeed: what we are doing is not so much returning something as it is binding the description of a List[B] to the return value.

What actually happens depends on what is done at the call site. For example, maybe all the call site was interested in was getting the size of the new list after mapping it (for some weird reason!):

let list: vector[f32] := // doesn't matter
let new_list_size: usize := map(list, v => { v * 2.0 }).size;
// Can be rewritten to:
let new_list_size: usize := List[Float] {
		size: list.size,
		access: index: usize => list.access(index) * 2.0
	}.size;
// which simplifies to:
let new_list_size: usize := list.size;
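This laziness can be demonstrated with a small Rust sketch (illustrative names; the Cell is only there to count functor evaluations): a mapped "list" that stores its size and an access closure, so querying the size never runs the mapping function.

```rust
use std::cell::Cell;

// A mapped "list" that only remembers how to compute its elements.
struct LazyMap<'a> {
    size: usize,
    access: Box<dyn Fn(usize) -> f32 + 'a>,
}

fn main() {
    let list = vec![1.0f32, 2.0, 3.0];
    let calls = Cell::new(0u32);
    let mapped = LazyMap {
        size: list.len(),
        access: Box::new(|i| {
            calls.set(calls.get() + 1); // count evaluations of the functor
            list[i] * 2.0
        }),
    };
    // Querying the size never evaluates a single element...
    assert_eq!(mapped.size, 3);
    assert_eq!(calls.get(), 0);
    // ...but accessing one does.
    assert_eq!((mapped.access)(1), 4.0);
    assert_eq!(calls.get(), 1);
}
```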

This way of returning stuff by binding means it's perfectly legal to return something that's not storable, as long as you don't try to store it (duh). Sometimes you might want to hold on symbolically to an abstract type instance, for example because you need to query several of its properties; this can be done by using the def keyword inline:

 // assuming t1 is a datatype implementing T1 of course. Same for t2/T2 !
def pair: Pair[T1, T2] := get_pair();
let a: t1 := pair.first; // t1 gets 'bound' to whatever pair.first describes, triggering eager evaluation of the property
let b: t2 := pair.second;

Constructors

An important piece of the puzzle for how this works is constructors. Constructors are defined as types. There is a default constructor:

// Basic, no-arguments ctor
type Constructor[dt: datatype] {
	new: dt
}

which may be implemented for any datatype like so:

impl Constructor[vector[dt]] {
	def new := ret: vector[dt] => {
		ret := {
			alloc := null,
			size := 0
		};
	};
}
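In Rust terms, this no-argument constructor is roughly what the standard Default trait provides. A sketch, with a safe Vec standing in for the raw alloc pointer (names illustrative):

```rust
// Safe stand-in for the vector[dt] datatype: Vec replaces `alloc: *dt`
// plus the explicit size field.
#[derive(Debug, PartialEq)]
struct Vector<T> {
    alloc: Vec<T>,
}

// Implementing Default loosely mirrors `impl Constructor[vector[dt]]`:
// no receiver value is needed to call it.
impl<T> Default for Vector<T> {
    fn default() -> Self {
        Vector { alloc: Vec::new() } // the null-pointer / zero-size analogue
    }
}

fn main() {
    let v: Vector<f32> = Vector::default();
    assert_eq!(v.alloc.len(), 0);
}
```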

Note that the impl of Constructor above isn't bound to a datatype, since if it were we'd need a value of that datatype to call new on, which is not what you want for a constructor. Let's move on to a more interesting constructor type, the AssignCtor:

type AssignCtor[Source: type, target: datatype] {
	new: Source -> target
}

This one has a Source type. Since Source is not a datatype but just a type, any instance of it can be bound to a fresh target value; this is fundamentally how abstract type instances get coerced into something storable: you call a datatype's assignment constructor on them. Here is how you'd initialize a vector from a List:

impl AssignCtor[List[T], vector[rt]] with rt == T, Nat == usize {
	def new := source: List[T], ret: vector[rt] => {
		ret := {
			alloc := malloc(sizeof(rt) * source.size),
			size := source.size
		};
		
		foreach (i: usize in 0 .. source.size) {
			// Binds each element of the source array to the physical vector, triggering eager evaluation of each element
			ret.alloc[i] := source.access(i);
		}
	};
}
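A rough Rust sketch of the same idea (all names assumed): the abstract List source is modeled as a (size, access-function) pair, and the assignment constructor eagerly evaluates every element into physical storage, like the foreach loop above.

```rust
// Assumed names throughout; a sketch of the concept, not the real semantics.
trait AssignCtor<Source> {
    fn new_from(source: Source) -> Self;
}

struct Vector<T> {
    data: Vec<T>,
}

// The "source" abstract list is modeled as a size plus an access function.
impl<T, F: Fn(usize) -> T> AssignCtor<(usize, F)> for Vector<T> {
    fn new_from((size, access): (usize, F)) -> Self {
        // Eagerly evaluate each element into physical memory.
        Vector {
            data: (0..size).map(access).collect(),
        }
    }
}

fn main() {
    let v = Vector::new_from((4, |i| i * 2));
    assert_eq!(v.data, vec![0, 2, 4, 6]);
}
```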

TODO: think through destructors as well

Ranges

Ranges can be handled elegantly in b0neless:

type Range[T: type] {
	start: T,
	end: T,
}

You can use infix operator '..' to create a range:

// Constructs a range using two copies of T
def infix `..` : T -> T -> Range[T] := start, end, range => {
	range := Range[T] {
		start: start,
		end: end
	};
}

Of course you can't store a Range[T]. But you can use the information it stands for, for example in a foreach expression:

foreach(i: usize in 0 .. 600) {
	print(i);
}

What actually happens is: a Range[Nat] type instance is defined, and its start and end properties are then bound to auto-generated start/end values by the compiler:

def _range := Range[Nat] {
	def start : Nat := 0; // as in, literally the natural number zero, not any concrete signed/unsigned/fp representation of zero
	def end : Nat := 600; // ditto
};
let _i_start: usize := _range.start; // Binds literal '0' to a usize (creating something like 0x00000000 in actual memory)
let _i_end: usize := _range.end;
for(let mut i: usize := _i_start; i < _i_end; i := i + 1) {
	print(i);
}
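Rust ranges work much the same way: a Range is just its two endpoints, and concrete values only come into being when the range is iterated. A small sketch:

```rust
fn main() {
    // A Range is only two endpoints; nothing in between exists in memory.
    let r = 0usize..600;
    assert_eq!(r.start, 0);
    assert_eq!(r.end, 600);

    // Iterating is what binds each value to a concrete usize, one at a time.
    let mut sum = 0usize;
    for i in 0..5 {
        sum += i;
    }
    assert_eq!(sum, 10); // 0 + 1 + 2 + 3 + 4
}
```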

AoS vs SoA example

One of the great advantages of something like b0neless is how neatly you can abstract over concepts such as "a list of two-component vectors". Assuming a simple representation of Vec2f as:

type Vec2f {
	x: Float,
	y: Float
}

One can just write a type such as List[Vec2f], and nothing is said about how the data is actually stored, or whether it is stored at all (it could instead be an anonymous type instance defining the contents of the list programmatically). Here are two examples of AoS and SoA datatypes implementing List[Vec2f]:

AoS

datatype vec2f = struct { x: f32, y: f32 }

impl Vec2f for vec2f with Float == f32 {
	def x := self.x;
	def y := self.y;
}

// vector[vec2f] implements List[Vec2f] by way of vector implementing List

SoA

datatype many_vec2fs = struct {
	x : vector[f32],
	y : vector[f32],
	size: usize
}

impl List[Vec2f] for many_vec2fs with Float == f32, Nat == usize {
	def size := self.size;
	def access := index, item => {
		item := Vec2f {
			def x := 'self.x[index]; // ' escapes one level of scope, therefore referring to the 'x' field in datatype many_vec2fs
			def y := 'self.y[index];
		};
	};
}
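The same two layouts can be sketched behind a single interface in Rust (all names illustrative); code written against the trait cannot tell them apart:

```rust
// One abstract interface, two physical layouts.
trait Vec2fList {
    fn size(&self) -> usize;
    fn access(&self, index: usize) -> (f32, f32); // (x, y)
}

// AoS: array of structs.
struct Aos {
    items: Vec<(f32, f32)>,
}

// SoA: struct of arrays.
struct Soa {
    xs: Vec<f32>,
    ys: Vec<f32>,
}

impl Vec2fList for Aos {
    fn size(&self) -> usize {
        self.items.len()
    }
    fn access(&self, index: usize) -> (f32, f32) {
        self.items[index]
    }
}

impl Vec2fList for Soa {
    fn size(&self) -> usize {
        self.xs.len()
    }
    fn access(&self, index: usize) -> (f32, f32) {
        (self.xs[index], self.ys[index])
    }
}

fn main() {
    let aos = Aos { items: vec![(1.0, 2.0), (3.0, 4.0)] };
    let soa = Soa { xs: vec![1.0, 3.0], ys: vec![2.0, 4.0] };
    // Same observable behaviour, different memory layout.
    assert_eq!(aos.access(1), soa.access(1));
    assert_eq!(aos.size(), soa.size());
}
```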

Inheritance

Types can inherit other types. This is done using the refines keyword:

type MutableList[T: type] refines List[T] {
	set: Nat -> T -> Self // Self makes it so refined types will return themselves instead of MutableList
}

Note that such a list carries a mild implicit requirement of being backed by memory somewhere (otherwise set would have to be a no-op).
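Rust's supertraits give a rough analogue of refines. A sketch with illustrative names, where the pure set returns a modified copy and Self plays the same role as above:

```rust
// The base interface: a readable list of f32s.
trait List {
    fn size(&self) -> usize;
    fn access(&self, index: usize) -> f32;
}

// `MutableList refines List` becomes a supertrait bound; returning Self
// models the pure, copy-returning `set`.
trait MutableList: List + Sized {
    fn set(self, index: usize, value: f32) -> Self;
}

impl List for Vec<f32> {
    fn size(&self) -> usize {
        self.len()
    }
    fn access(&self, index: usize) -> f32 {
        self[index]
    }
}

impl MutableList for Vec<f32> {
    fn set(mut self, index: usize, value: f32) -> Self {
        self[index] = value;
        self
    }
}

fn main() {
    let v = vec![1.0f32, 2.0].set(0, 9.0);
    assert_eq!(v.access(0), 9.0);
}
```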

Types can be abstract (you can't construct instances of them, you need to use one of their refinements) and final (you can't refine them anymore).

abstract type Option[T: type] {}
final type Some[T: type] refines Option[T] {
	get: T
}
final type None[T: type] refines Option[T] {}

Mutability

let can be followed by mut; in that case you can re-bind the value (otherwise it's treated as a constant). Since properties on types are pure, mutating operations on them are modeled as returning a modified copy of self. There is some syntactic sugar to help with this:

type Counter {
	increment: Self
}

datatype counter = struct { data: u32 }

impl Counter for counter {
	// By naming the output/return bind 'self', it is pre-initialized to be a copy of 'self', which it then shadows (you cannot modify the actual self, that would break the pure-properties contract!)
	def increment := self => { self.data := self.data + 1 }
	// this is equivalent to doing:
	// def increment := ret => { 
	//	  ret := self;
	//	  ret.data := self.data + 1;
	// }
}

let mut cnter: counter := { data: 0u32 };
cnter~increment;
// is equivalent to
cnter := cnter.increment;
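A minimal Rust sketch of this pattern, where the ~ sugar becomes an explicit re-bind:

```rust
#[derive(Clone, Copy)]
struct Counter {
    data: u32,
}

impl Counter {
    // Pure: returns a modified copy instead of mutating in place.
    fn increment(self) -> Counter {
        Counter { data: self.data + 1 }
    }
}

fn main() {
    let mut cnter = Counter { data: 0 };
    // Equivalent of `cnter~increment;`: re-bind the name to the new copy.
    cnter = cnter.increment();
    assert_eq!(cnter.data, 1);
}
```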

Metaprogramming

Type parameters serve as powerful generics; they can also be Type instances or datatype values. For example, this is legal:

datatype tree_node[arity: Nat] = struct {
	children: [Self; arity];
}

let node : tree_node[2];

And so is this:

datatype hash_map[key_t: datatype, value_t: datatype, key_hasher: key_t -> usize] = struct {
	elements_count: usize,
	internal_size: usize,
	keys: *key_t,
	values: *value_t
}

def find : hash_map[kt, vt, kh] -> kt -> *vt := map, key, ret => {
	let hashed := kh(key);
	if(map.keys[hashed % map.internal_size] == key) {
		ret := &map.values[hashed % map.internal_size];
	} else { ret := null; }
}
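The tree_node example above maps onto Rust's const generics, the closest mainstream analogue to datatype-value type parameters. A sketch (names illustrative; Option<Box<...>> stands in for possibly-absent children):

```rust
// ARITY is a value used as a type parameter, like `tree_node[arity: Nat]`:
// the array length is fixed by the type itself.
struct TreeNode<const ARITY: usize> {
    children: [Option<Box<TreeNode<ARITY>>>; ARITY],
}

fn main() {
    // A binary node: exactly two child slots, checked at compile time.
    let node: TreeNode<2> = TreeNode { children: [None, None] };
    assert_eq!(node.children.len(), 2);
}
```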

Recap of Types vs datatypes

|                             | Types                                                        | datatypes                                                                |
| --------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------------------ |
| naming conv.                | CamelCase                                                    | snake_case                                                               |
| can have type parameters    | yes                                                          | yes                                                                      |
| has physical size           | no                                                           | yes                                                                      |
| can inherit                 | yes                                                          | no, but can contain other datatypes                                      |
| can contain functions       | only pure ones                                               | only fn ptrs                                                             |
| binding declaration keyword | def                                                          | let                                                                      |
| meaning of 'instance'       | (pure) expressions describing how to evaluate the properties | values corresponding to a certain arrangement of bits in physical memory |
| polymorphism                | static                                                       | no                                                                       |

Runtime polymorphism

There are no vtables in b0neless. You can put function pointers in datatypes though to achieve a similar effect (dynamic dispatch).
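A sketch of that pattern in Rust (illustrative names): a fn pointer stored in a plain struct gives runtime dispatch without any compiler-generated vtable.

```rust
// Hand-rolled dispatch: the datatype carries its own function pointer.
struct Shape {
    size: f32,
    area: fn(f32) -> f32, // a plain fn pointer is a storable value
}

fn square_area(s: f32) -> f32 {
    s * s
}

fn triangle_area(s: f32) -> f32 {
    s * s * 0.5
}

fn main() {
    let shapes = [
        Shape { size: 2.0, area: square_area },
        Shape { size: 2.0, area: triangle_area },
    ];
    // Which function runs is decided by the data, at runtime.
    assert_eq!((shapes[0].area)(shapes[0].size), 4.0);
    assert_eq!((shapes[1].area)(shapes[1].size), 2.0);
}
```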

Lambdas / local fns

There is a difference in the type system between a Function, i.e. something like Nat -> Nat -> Text, and a function pointer, i.e. fn(u32, u32) -> string. For one, Functions are like expressions in that they are always pure; function pointers might not be pure! You can store fn pointers, but not Functions, at least not directly. In fact, Functions are Types, while function pointers are datatypes. Delegates are function pointers with a context pointer attached.

let a := 5; // compiler will always assume a == 5 from now on (it's UB to change a)
let mut b := 7; // compiler will never assume it knows b

def fn1 : Nat -> Nat -> Nat := a, b => a * b;

let c := fn1(a, b); // resolved at compile-time, and actually probably executed at compile-time too!

let fnptr1: fn(u32, u32) -> u32 := &fn1; // legal because fn1 is pure, this just creates an extra normal function in the binary and gives the address to it

def fn2: Nat -> Nat := b => fn1(a, b); // legal, applies partial evaluation using the a = 5 constant from the scope
def fn3: Nat -> Nat := a => fn1(a, b); // illegal and dangerous, this is not pure! (b is not a constant)

let fnptr2: fn(u32) -> fn(u32, u32) // illegal because you can't return a fn(...) as a value.
let fnptr2: fn(u32) -> u32 := b: u32 => fn1(a, b); // legal, applies fn1 partially with 'a' known and stores the specialized fn in the binary

let fnptr3: fn(u32) -> u32 := a: u32 => fn1(a, b); // fails, as this would need either a pointer capturing the current stack frame to reference b, OR an extra argument to store b.

let fnptr4: delegate(u32) -> u32 := a: u32 => fn1(a, b); // legal too, fn1 is resolved at compile-time and inlined, and this results in a lambda capturing the current scope by reference
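Rust draws almost exactly the same line, so it makes a convenient sketching ground: a fn(...) pointer carries no environment, while a capturing closure needs something fatter (here Box<dyn Fn>, standing in for a delegate).

```rust
fn mul(a: u32, b: u32) -> u32 {
    a * b
}

fn main() {
    // A plain fn pointer: storable, no captured state.
    let fnptr1: fn(u32, u32) -> u32 = mul;
    assert_eq!(fnptr1(5, 7), 35);

    // A non-capturing closure still coerces to a fn pointer
    // (the 5 is baked in, much like partial evaluation with a constant).
    let fnptr2: fn(u32) -> u32 = |b| mul(5, b);
    assert_eq!(fnptr2(7), 35);

    let b = 7u32;
    // let bad: fn(u32) -> u32 = |a| mul(a, b); // rejected: captures `b`

    // The "delegate": a closure plus its captured environment.
    let delegate: Box<dyn Fn(u32) -> u32> = Box::new(move |a| mul(a, b));
    assert_eq!(delegate(5), 35);
}
```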

More thoughts on this later... (not essential to the core idea)
