Skip to content

Instantly share code, notes, and snippets.

@brosenan
Created October 19, 2018 05:59
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save brosenan/f442472733b8af1b26baf8f67b51628b to your computer and use it in GitHub Desktop.
Save brosenan/f442472733b8af1b26baf8f67b51628b to your computer and use it in GitHub Desktop.
Example code in an imaginary, linearly-typed functional programming language.
;; This is not really Clojure code. I used the .clj extention since it provides the closests syntax highlighting to this imaginary language.
(defabst constantly [?t] (-> ?t (-> ?t))
(fn [c]
(fn []
(copy c))))
(deftype List [?t]
(::nil)
(::item ?t (List ?t)))
(deftype Tree [?k ?v]
(::empty)
(::node ?k ?v (Tree ?k ?v) (Tree ?k ?v)))
(defabst map [?c1 ?t1 ?c2 ?t2]
(-> (& ?c1) (& (-> (& ?t1) ?t2)) ?c2))
(definstance map [?t1 ?t2] [(List ?t1) ?t1 (List ?t2) ?t2]
(fn [l f]
(case l
(& ::nil) ::nil
(& ::item v r) (::item (f v) (map ar f)))))
(definstance map [?k ?v1 ?v2] [(Tree ?k ?v1) ?v1 (Tree ?k ?v2) ?v2]
(fn [t f]
(case t
(& ::empty) ::empty
(& ::node k v l r) (::node k (f v) (map l f) (map r f)))))
(defabst reduce [?c ?t]
(-> (& ?c) (& (-> ?t ?t ?t)) ?t ?t))
(definstance reduce [?t] [(List ?t) ?t]
(fn [l f v0]
(case l
(& ::nil) v0
(& ::item v r) (reduce r f (f v0 (copy v))))))
(definstance reduce [?k ?v] [(Tree ?k ?v) ?v]
(fn [t f v0]
(case t
(& ::empty) v0
(& ::node k v l r) (f (reduce l f (copy (& v0))) (reduce r f v0)))))
(def range (-> (& Int) (& Int) (List Int))
(fn [b e]
(case (compare b e)
:greater ::nil
:equals ::nil
:smaller (::item (copy b) (range (+ b 1) (copy e))))))
(deftest
(= (let [r (range 10)
m (map r (& (fn [x] (+ x 2))))
sum (fn [x y] (+ (& x) (& y)))]
(reduce (& m) (& sum) 0))
(+ 20 (/ (* 10 11) 2))))
(defabst filter [?c ?t]
(-> ?c (& (-> (& ?t) Bool)) ?c))
(definstance filter [?t] [(List ?t) ?t]
(fn [l f]
(case l
::nil ::nil
(::item v r) (case (f v)
:true (::item v (filter r f))
:false (filter r f)))))
(defabst merge [?k ?v] (-> (Tree ?k ?v) (Tree ?k ?v) (Tree ?k ?v))
(fn [t1 t2]
(case t1
::nil t2
(::node k1 v1 l1 r1) (case t2
::nil t1
(::node k2 v2 l2 r2)
(case (compare k1 k2)
:greater (::node k1 v1 (merge l1 t2) r2)
:smaller (::node k1 v1 l1 (merge r1 t2))
:equal (::node k2 v2 (merge l1 l2) (merge r1 r2)))))))
(definstance filter [?k ?v] [(Map ?k ?v) [?k ?v]]
(fn [t f]
(case t
::empty ::emtpy
(::node k v l r) (case (f [k v])
:true (::node k v (filter l f) (filter r f))
:false (merge l r)))))
@brosenan
Copy link
Author

brosenan commented Oct 19, 2018

Some explanations:

Basic Forms

  • def: Define value. This could be a constant or a function. Takes two positional arguments: the type signature, and the value.
  • deftype: Similar to data in Haskell, defines an enumerated type / ADT. Takes a name for the new type (PascalCase), a vector of type arguments (camelCase, prefixed with a '?'), and one or more forms, each being a list starting with a fully-qualified keyword, followed by zero or more types for the parameters.
  • defabst: Defines an abstract value (a function, for all practical purposes). Defines a name (snake-case), a vector of type parameters, a type signature based on these parameters, and optionally, a default value.
  • definstance: Defines a special case for an abstract value. Takes the name of the abstract value to specialize, a vector of type parameters, a vector of the abstract value's type parameters, specialized for this instance, and the specialized value.
  • deftest: Defines a unit-test. Takes a Boolean expression that needs to be true for the test to pass.
  • fn: A lambda abstraction. Takes a vector of arguments (snake-case), and an expression based on these arguments.
  • case: Deconstructs enumerated types (ADTs). Takes an expression, and one or more pairs of cases, consisting of a deconstruction of a specific form, and an expression to be evaluated in its case. There has to be a case for every form of the deconstructed type.
  • let: Takes a vector of assignments and an expression, and evaluates to the expression, under the assignments.

The Type System

So, this unnamed language uses a linear type system, meaning that variables given to a function are consumed by it, unless we explicitly state otherwise. The only way to state otherwise is to mark the parameter's type as a borrow using the (& ...) form, meaning that the function only observes this value, but cannot modify it, and cannot return any part of it.

Function types use the -> form. The arity of the -> form is the arity of the function, plus one for the return value (the last argument). For example, the signature (-> String (& String) String) represents a function that takes two string arguments, the first is consumed and the second is borrowed, and returns a string. This could, for example, represent a concatenation function, which reserves the right to simply append the second string to the first one (thus destroying the original value), if space suffices.

Any non-borrowed value x of any type ?t can be passed to a function requiring type (& ?t) (a borrow of ?t) by passing (& x). The other way, however, is not possible. Instead, the copy function can be used. Its signature is, for every type ?t, (-> (& ?t) t), meaning it borrowed its argument and returns an equivalent value of the same type, fully owned by the caller. copy is an abstract function, and is specialized for every type automatically. Copy-types (all numeric types and all single-form enumerated types that only take copy-types as parameters) are specialized as the identity function, meaning that copy is simply removed after type checking.

case can deconstruct either an owned value or a borrowed one. In the latter case, deconstruction is made with forms looking like (& ::form-name args...), indicating that all arguments are borrowed.

Closures

Closures require special attention. To allow functions to be called multiple times, a function invocation of the form (func params...) only borrows the function. This means that a closure can only borrow its closure parameters, but these parameters are consumed when creating the closure. For example, the constantly function defined in this gist, takes a parameter c and returns a closure. c is consumed by the closure, as it needs to follow its lifecycle, and not the lifecycle of the function constantly itself. However, the closure cannot afford to consume c, because this would hinder further invocations. Therefore, c is exposed within the closure as a borrow ((& ?t)), forcing the function to copy it on each invocation.

Defined Values

Defined values (ones defined using def or defabst) are exposed throughout the program as borrows of their defined types. This is to avoid limitations on the number of times they can be used. This means that a definition such as:

(def pi Float64 3.141592)

will define pi as (& Float64) and not as Float64.

Tuples

Tuples of the form [...] are ad-hoc types. The type [Int32 String], for example, is a pair, containing an integer and a string. A value [3 "hello"] is a proper value of this tuple.

Evaluation

This language design is intended to allow compilation to very efficient machine code. The language uses eager evaluation, making it easy to convert its code to languages such as C. Linear types open the door to using mutable data structures (e.g., arrays and hash-tables), and also allow more cases of tail-recursion elimination (TRE) than other, non-linear, functional programming languages allow.

TRE

TRE is done in two cases. The classical case, when a function calls itself in tail position, and another case, where the function is altering the data structure it is called on. Both example exist in the the filter implementation for the List instance, in the above gist.

The :false case is text-book tail recursion. Because filter returns the result given by calling itself with different parameters (using r instead of l), we can evaluate this case by simply assigning l = r, and jumping to the beginning of the function.

The :true case is slightly more interesting. Here we call filter, but do not return its return value, but rather a construction of a new list ::item, where filter is called on its tail. In most functional programming languages, this would not count as tail recursion, but here it does, because a few conditions are met:

  1. The first argument to filter is owned by filter (not borrowed).
  2. The value constructed in this case is of the same form (::item) as the value deconstructed from the first argument.
  3. A recursive call is made to calculate one of the fields of the constructed ::item.

The first and second conditions tell us we can mutate the ::item we got as filter's first argument, rather than construct a new one. The third condition tells us how we can turn this into a loop. As with the classical case, all we need to do is assign the recursive call's parameters as the arguments (in this case, assign l = r, as with the :false case), and jump to the beginning of the function.

However, there is one subtle point to take care of -- the return value. In classical TRE, the return value is whatever the innermost call returns. Here, this is not the case. If all cases in this functions met the above three conditions, we could simply return the original value of l. However, this will not be the case if, for example, the first element will be filtered out (i.e., during the first iteration we find ourselves in the :false case).

To do this, the compiled version will add, to each function with a signature matching (-> ?t ... ?t) (i.e., a function that takes a first argument of some type ?t (owned), and returns a value of the same type) will include two imperative variables: ret and retptr. ret should have a type corresponding to ?t, and retptr should be a pointer to ret's type, and be initialized to point to ret.

ret contains the value to be returned by the function. This means that after the loop (i.e., after we reach a non-TR case), we return ret. retptr holds the target to where the return value should be written, from the various cases. Initially, it holds the address of ret, meaning that a non-TR case will write its value to ret and exit the loop, making it return the correct value. However, as calls are made to mutational cases (such as the :true case in filter), retptr will no longer point at ret.

We need to distinguish between three cases: Non-TR, classical-TR, and mutational-TR.

  • For non-TR, *retptr is assigned the return value, and the loop is broken. The value of ret is returned.
  • Classical-TR doesn't affect and is not affected by ret and retptr.
  • For mutational-TR, we do the following:
    1. Assign *retptr to be the current object, the one being mutated.
    2. Promote retptr to point to the pointer on which the recursive call is being made. For example, in filter's :true case, that would be &r.

Abstract Values and Instances

Abstract values (functions, for all practical purposes) are inspired by C++ templates, with instances being inspired by template specialization. Code in this hypothetical unnamed language is only generated when all types are known. This is important because types can be low-level (e.g., Int32 or Float64), requiring different storage space and different machine-code instructions to work on them.

When a concrete function is being defined (e.g., using def), all the abstract values it uses are being instantiated. This is done in the following steps:

  1. Based on the concrete type, a matching instance is searched for. If not found, we default to the abstract value's default definition, if given. If not, a compilation error is emitted.
  2. Reference to the abstract value is replaced with a reference to its concrete instance. The name is suffixed with a hash of the type parameters needed to instantiate the abstract value (the type argument vector, with all type variables replaced with concrete types).
  3. If not already defined, a concrete definition of the instantiated value is generated (i.e., a def of the concrete value).

Unlike C++ templates, we do not expect type errors to appear during the instantiation process, because abstract values always define a type signature. The only errors that can appear in this case are missing instances for abstract values with no defaults.

Inlining and Constant Propagation

This hypothetical language uses a deterministic inlining policy. After replacing all abstract values with concrete ones, the compiler goes through all definite execution paths (execution paths that are not conditional) and recursively inlines all functions that are definitely called. Built-in functions (such as the + operator) cannot be inlined, but in case such a function is called with nothing but constants, an equivalent of the function is called at compile-time and the call is replaced with the value. The combination of these methods provides some level of partial evaluation.

For example, consider the expression (range 0 3), calling the range function defined in this gist. In such a case. If this expression is evaluated unconditionally in some function, the call is inlined, replacing the call with the body of the function, with arguments b and e being replaced.

(case (compare 0 3)
  :greater ::nil
  :equals ::nil
  :smaller (::item (copy 0) (range (+ 0 1) (copy 3))))

Now the built-in functions compare, + and copy are called by the compiler, and the calls are being replaced by values.

(case :smaller
  :greater ::nil
  :equals ::nil
  :smaller (::item 0 (range 1 3)))

Now case knows which path to take, so the whole expression is replaced with the :smaller clause.

(::item 0 (range 1 3))

Now the call to (range 1 3) becomes definite, so it can get inlined by itself. The result is:

(::item 0 (::item 1 (::item 2)))

Memory Representation of Objects

Enumerated types (ADTs) are separated into single-form and multi-form. types. Single-form types are represented as structures, containing their respective fields. Multi-form types are represented as pointers, pointing to a structure, in which the first element is a pointer to a destructor function (see Memory Management below), dedicated to this form, and the form's parameter follow as the rest of the members of the structure.

Because the destructor function is unique to each form, the pointer to the destructor can help distinguish which form this is, facilitating case.

Closures are represented as structures, containing a pointer to the closure function as their first element, a destructor function as its second argument, and closure parameters (if exist), as additional elements. The closure function assumes it receives the closure object as its first argument, before the declared arguments. This way it has access to the closure parameters.

Memory Management

Since this language is linearly-typed, it requires no garbage collection. But still, being functional, it creates a lot of small objects (ADT values and closures) and needs to be able to allocate and deallocate them quickly.

We can use a simple, constant-time algorithm for that. First we make an allocation using traditional malloc. When nothing else exists, we allocate objects (ADT values and closures) from that block, by advancing a pointer. When we run out of space in that block, we allocate a new one. This allocation is never going to be freed.

Second, for each target-level type (each ADT form and each closure function) we define three things:

  1. A constructor, responsible for allocating space for this type, and assign values to its fields,
  2. A destructor, responsible for reclaiming the space it takes, along with space taken by all its owned fields, and
  3. A static recycle pointer, beginning a linked list of reclaimed objects of that type (statically initialized to NULL).

Both constructors and destructors work in constant time. The constructor looks at the recycle pointer. If it is NULL, space is allocated from the pre-allocated block, as discussed above. If not, we pop the first element from the linked-list. Because both ADF forms and closures use function pointers as their first field, this pointer is being used to construct the list, and this is the only value being replaced on destruction. This means that the rest of the fields (including pointers to values owned by the recycled value) are still there. Before overriding them with new values, the constructor calls the destructors of all these fields to reclaim their space. This is done during construction and not destruction, to limit both operations to constant time, and avoid potential stack overflows, e.g., in the case when a long list is being destroyed. The price we pay here is that space is not reclaimed right away.

Destructors are much simpler, and simply add the object they destroy to the linked list it belongs to, overriding the first element as the list pointer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment