Created
October 19, 2018 05:59
-
-
Save brosenan/f442472733b8af1b26baf8f67b51628b to your computer and use it in GitHub Desktop.
Example code in an imaginary, linearly-typed functional programming language.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
;; This is not really Clojure code. I used the .clj extention since it provides the closests syntax highlighting to this imaginary language. | |
(defabst constantly [?t] (-> ?t (-> ?t)) | |
(fn [c] | |
(fn [] | |
(copy c)))) | |
(deftype List [?t] | |
(::nil) | |
(::item ?t (List ?t))) | |
(deftype Tree [?k ?v] | |
(::empty) | |
(::node ?k ?v (Tree ?k ?v) (Tree ?k ?v))) | |
(defabst map [?c1 ?t1 ?c2 ?t2] | |
(-> (& ?c1) (& (-> (& ?t1) ?t2)) ?c2)) | |
(definstance map [?t1 ?t2] [(List ?t1) ?t1 (List ?t2) ?t2] | |
(fn [l f] | |
(case l | |
(& ::nil) ::nil | |
(& ::item v r) (::item (f v) (map ar f))))) | |
(definstance map [?k ?v1 ?v2] [(Tree ?k ?v1) ?v1 (Tree ?k ?v2) ?v2] | |
(fn [t f] | |
(case t | |
(& ::empty) ::empty | |
(& ::node k v l r) (::node k (f v) (map l f) (map r f))))) | |
(defabst reduce [?c ?t] | |
(-> (& ?c) (& (-> ?t ?t ?t)) ?t ?t)) | |
(definstance reduce [?t] [(List ?t) ?t] | |
(fn [l f v0] | |
(case l | |
(& ::nil) v0 | |
(& ::item v r) (reduce r f (f v0 (copy v)))))) | |
(definstance reduce [?k ?v] [(Tree ?k ?v) ?v] | |
(fn [t f v0] | |
(case t | |
(& ::empty) v0 | |
(& ::node k v l r) (f (reduce l f (copy (& v0))) (reduce r f v0))))) | |
(def range (-> (& Int) (& Int) (List Int)) | |
(fn [b e] | |
(case (compare b e) | |
:greater ::nil | |
:equals ::nil | |
:smaller (::item (copy b) (range (+ b 1) (copy e)))))) | |
(deftest | |
(= (let [r (range 10) | |
m (map r (& (fn [x] (+ x 2)))) | |
sum (fn [x y] (+ (& x) (& y)))] | |
(reduce (& m) (& sum) 0)) | |
(+ 20 (/ (* 10 11) 2)))) | |
(defabst filter [?c ?t] | |
(-> ?c (& (-> (& ?t) Bool)) ?c)) | |
(definstance filter [?t] [(List ?t) ?t] | |
(fn [l f] | |
(case l | |
::nil ::nil | |
(::item v r) (case (f v) | |
:true (::item v (filter r f)) | |
:false (filter r f))))) | |
(defabst merge [?k ?v] (-> (Tree ?k ?v) (Tree ?k ?v) (Tree ?k ?v)) | |
(fn [t1 t2] | |
(case t1 | |
::nil t2 | |
(::node k1 v1 l1 r1) (case t2 | |
::nil t1 | |
(::node k2 v2 l2 r2) | |
(case (compare k1 k2) | |
:greater (::node k1 v1 (merge l1 t2) r2) | |
:smaller (::node k1 v1 l1 (merge r1 t2)) | |
:equal (::node k2 v2 (merge l1 l2) (merge r1 r2))))))) | |
(definstance filter [?k ?v] [(Map ?k ?v) [?k ?v]] | |
(fn [t f] | |
(case t | |
::empty ::emtpy | |
(::node k v l r) (case (f [k v]) | |
:true (::node k v (filter l f) (filter r f)) | |
:false (merge l r))))) | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Some explanations:
Basic Forms
def
: Define value. This could be a constant or a function. Takes two positional arguments: the type signature, and the value.deftype
: Similar to data in Haskell, defines an enumerated type / ADT. Takes a name for the new type (PascalCase), a vector of type arguments (camelCase, prefixed with a '?'), and one or more forms, each being a list starting with a fully-qualified keyword, followed by zero or more types for the parameters.defabst
: Defines an abstract value (a function, for all practical purposes). Defines a name (snake-case), a vector of type parameters, a type signature based on these parameters, and optionally, a default value.definstance
: Defines a special case for an abstract value. Takes the name of the abstract value to specialize, a vector of type parameters, a vector of the abstract value's type parameters, specialized for this instance, and the specialized value.deftest
: Defines a unit-test. Takes a Boolean expression that needs to be true for the test to pass.fn
: A lambda abstraction. Takes a vector of arguments (snake-case), and an expression based on these arguments.case
: Deconstructs enumerated types (ADTs). Takes an expression, and one or more pairs of cases, consisting of a deconstruction of a specific form, and an expression to be evaluated in its case. There has to be a case for every form of the deconstructed type.let
: Takes a vector of assignments and an expression, and evaluates to the expression, under the assignments.The Type System
So, this unnamed language uses a linear type system, meaning that variables given to a function are consumed by it, unless we explicitly state otherwise. The only way to state otherwise is to mark the parameter's type as a borrow using the
(& ...)
form, meaning that the function only observes this value, but cannot modify it, and cannot return any part of it.Function types use the
->
form. The arity of the->
form is the arity of the function, plus one for the return value (the last argument). For example, the signature(-> String (& String) String)
represents a function that takes two string arguments, the first is consumed and the second is borrowed, and returns a string. This could, for example, represent a concatenation function, which reserves the right to simply append the second string to the first one (thus destroying the original value), if space suffices.Any non-borrowed value
x
of any type?t
can be passed to a function requiring type(& ?t)
(a borrow of?t
) by passing(& x)
. The other way, however, is not possible. Instead, thecopy
function can be used. Its signature is, for every type?t
,(-> (& ?t) t)
, meaning it borrowed its argument and returns an equivalent value of the same type, fully owned by the caller.copy
is an abstract function, and is specialized for every type automatically. Copy-types (all numeric types and all single-form enumerated types that only take copy-types as parameters) are specialized as the identity function, meaning thatcopy
is simply removed after type checking.case
can deconstruct either an owned value or a borrowed one. In the latter case, deconstruction is made with forms looking like(& ::form-name args...)
, indicating that all arguments are borrowed.Closures
Closures require special attention. To allow functions to be called multiple times, a function invocation of the form
(func params...)
only borrows the function. This means that a closure can only borrow its closure parameters, but these parameters are consumed when creating the closure. For example, theconstantly
function defined in this gist, takes a parameterc
and returns a closure.c
is consumed by the closure, as it needs to follow its lifecycle, and not the lifecycle of the functionconstantly
itself. However, the closure cannot afford to consumec
, because this would hinder further invocations. Therefore,c
is exposed within the closure as a borrow ((& ?t)
), forcing the function tocopy
it on each invocation.Defined Values
Defined values (ones defined using
def
ordefabst
) are exposed throughout the program as borrows of their defined types. This is to avoid limitations on the number of times they can be used. This means that a definition such as:will define
pi
as(& Float64)
and not asFloat64
.Tuples
Tuples of the form
[...]
are ad-hoc types. The type[Int32 String]
, for example, is a pair, containing an integer and a string. A value[3 "hello"]
is a proper value of this tuple.Evaluation
This language design is intended to allow compilation to very efficient machine code. The language uses eager evaluation, making it easy to convert its code to languages such as C. Linear types open the door to using mutable data structures (e.g., arrays and hash-tables), and also allow more cases of tail-recursion elimination (TRE) than other, non-linear, functional programming languages allow.
TRE
TRE is done in two cases. The classical case, when a function calls itself in tail position, and another case, where the function is altering the data structure it is called on. Both example exist in the the
filter
implementation for theList
instance, in the above gist.The
:false
case is text-book tail recursion. Becausefilter
returns the result given by calling itself with different parameters (usingr
instead ofl
), we can evaluate this case by simply assigningl = r
, and jumping to the beginning of the function.The
:true
case is slightly more interesting. Here we callfilter
, but do not return its return value, but rather a construction of a new list::item
, wherefilter
is called on its tail. In most functional programming languages, this would not count as tail recursion, but here it does, because a few conditions are met:filter
is owned byfilter
(not borrowed).::item
) as the value deconstructed from the first argument.::item
.The first and second conditions tell us we can mutate the
::item
we got asfilter
's first argument, rather than construct a new one. The third condition tells us how we can turn this into a loop. As with the classical case, all we need to do is assign the recursive call's parameters as the arguments (in this case, assignl = r
, as with the:false
case), and jump to the beginning of the function.However, there is one subtle point to take care of -- the return value. In classical TRE, the return value is whatever the innermost call returns. Here, this is not the case. If all cases in this functions met the above three conditions, we could simply return the original value of
l
. However, this will not be the case if, for example, the first element will be filtered out (i.e., during the first iteration we find ourselves in the:false
case).To do this, the compiled version will add, to each function with a signature matching
(-> ?t ... ?t)
(i.e., a function that takes a first argument of some type?t
(owned), and returns a value of the same type) will include two imperative variables:ret
andretptr
.ret
should have a type corresponding to?t
, andretptr
should be a pointer toret
's type, and be initialized to point toret
.ret
contains the value to be returned by the function. This means that after the loop (i.e., after we reach a non-TR case), we returnret
.retptr
holds the target to where the return value should be written, from the various cases. Initially, it holds the address ofret
, meaning that a non-TR case will write its value toret
and exit the loop, making it return the correct value. However, as calls are made to mutational cases (such as the:true
case infilter
),retptr
will no longer point atret
.We need to distinguish between three cases: Non-TR, classical-TR, and mutational-TR.
*retptr
is assigned the return value, and the loop is broken. The value ofret
is returned.ret
andretptr
.*retptr
to be the current object, the one being mutated.retptr
to point to the pointer on which the recursive call is being made. For example, infilter
's:true
case, that would be&r
.Abstract Values and Instances
Abstract values (functions, for all practical purposes) are inspired by C++ templates, with instances being inspired by template specialization. Code in this hypothetical unnamed language is only generated when all types are known. This is important because types can be low-level (e.g.,
Int32
orFloat64
), requiring different storage space and different machine-code instructions to work on them.When a concrete function is being defined (e.g., using
def
), all the abstract values it uses are being instantiated. This is done in the following steps:def
of the concrete value).Unlike C++ templates, we do not expect type errors to appear during the instantiation process, because abstract values always define a type signature. The only errors that can appear in this case are missing instances for abstract values with no defaults.
Inlining and Constant Propagation
This hypothetical language uses a deterministic inlining policy. After replacing all abstract values with concrete ones, the compiler goes through all definite execution paths (execution paths that are not conditional) and recursively inlines all functions that are definitely called. Built-in functions (such as the
+
operator) cannot be inlined, but in case such a function is called with nothing but constants, an equivalent of the function is called at compile-time and the call is replaced with the value. The combination of these methods provides some level of partial evaluation.For example, consider the expression
(range 0 3)
, calling therange
function defined in this gist. In such a case. If this expression is evaluated unconditionally in some function, the call is inlined, replacing the call with the body of the function, with argumentsb
ande
being replaced.Now the built-in functions
compare
,+
andcopy
are called by the compiler, and the calls are being replaced by values.Now
case
knows which path to take, so the whole expression is replaced with the:smaller
clause.Now the call to
(range 1 3)
becomes definite, so it can get inlined by itself. The result is:Memory Representation of Objects
Enumerated types (ADTs) are separated into single-form and multi-form. types. Single-form types are represented as structures, containing their respective fields. Multi-form types are represented as pointers, pointing to a structure, in which the first element is a pointer to a destructor function (see Memory Management below), dedicated to this form, and the form's parameter follow as the rest of the members of the structure.
Because the destructor function is unique to each form, the pointer to the destructor can help distinguish which form this is, facilitating
case
.Closures are represented as structures, containing a pointer to the closure function as their first element, a destructor function as its second argument, and closure parameters (if exist), as additional elements. The closure function assumes it receives the closure object as its first argument, before the declared arguments. This way it has access to the closure parameters.
Memory Management
Since this language is linearly-typed, it requires no garbage collection. But still, being functional, it creates a lot of small objects (ADT values and closures) and needs to be able to allocate and deallocate them quickly.
We can use a simple, constant-time algorithm for that. First we make an allocation using traditional
malloc
. When nothing else exists, we allocate objects (ADT values and closures) from that block, by advancing a pointer. When we run out of space in that block, we allocate a new one. This allocation is never going to be freed.Second, for each target-level type (each ADT form and each closure function) we define three things:
NULL
).Both constructors and destructors work in constant time. The constructor looks at the recycle pointer. If it is
NULL
, space is allocated from the pre-allocated block, as discussed above. If not, we pop the first element from the linked-list. Because both ADF forms and closures use function pointers as their first field, this pointer is being used to construct the list, and this is the only value being replaced on destruction. This means that the rest of the fields (including pointers to values owned by the recycled value) are still there. Before overriding them with new values, the constructor calls the destructors of all these fields to reclaim their space. This is done during construction and not destruction, to limit both operations to constant time, and avoid potential stack overflows, e.g., in the case when a long list is being destroyed. The price we pay here is that space is not reclaimed right away.Destructors are much simpler, and simply add the object they destroy to the linked list it belongs to, overriding the first element as the list pointer.