Emulating Dependent Types for reasonably efficient compilation

Intro

I have looked for a method to compile dependent types efficiently in a language-runtime-less/VM-less environment for a few years now. I haven't yet found anything that says 'this is how to compile dependent types for $Lang', where $Lang is a native language with C-like semantics. Hopefully this article can be that missing piece.

Motivation

Dependent types are good/useful/nice etc., and I think they should be available to regular programmers/projects, rather than restricted to a few high-barrier-to-entry academic languages/projects. I really enjoy having them, and they enrich a language's expressive capability.

I first tried to add them to the D programming language as a lowly undergraduate fresher, but stumbled over the divide between literature on semantics in FP languages and compilation in imperative languages. Bridging that gap was then my white whale for a few years, until recently I stubbed my toe on this technique.

Jargon

⟦T⟧ means the interpretation of some mathematical T into Rust's semantics. This may then be compared back to the Rust syntax for said semantics. For example, ⟦ {} ⟧ may be the value HashSet::new(), or for the zero/empty type, ⟦ 0 ⟧ = enum Empty {};

repr is the function from a syntax fragment (e.g. a struct L {h:u32,t:*mut L} definition) to the memory layout of the semantic object represented by the syntax fragment (e.g. repr L = 4 bytes-le 'L.h', 4 bytes padding, 8 bytes-le pointer 'L.t'). repr is used to express part of the compilation process, so that we can inspect how individual pieces of the language work.

repr forms a bit of a distributive-category-looking structure, where repr ⟦A × B⟧ is repr (A, B) and repr ⟦A + B⟧ is repr (union {A, B}) or repr (enum {A, B}), depending on whether we wish to keep the union's discriminant (tag). This is effectively the same as the structure for sum and product types in Rust; the transformation from Rust data types to memory layouts is pleasantly thin/repr is a pretty simple functor.
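
As a very small illustration of that correspondence, here is how the three layouts look in plain Rust, with std::mem::size_of to peek at them (the type names are invented for this sketch, and exact sizes are platform- and compiler-dependent):

use std::mem::size_of;

// repr ⟦A × B⟧: a product is laid out as a struct/tuple of its components.
#[allow(dead_code)]
struct Product(u32, u8);

// repr ⟦A + B⟧ keeping the discriminant: a Rust enum (tag plus payload).
#[allow(dead_code)]
enum TaggedSum { A(u32), B(u8) }

// repr ⟦A + B⟧ dropping the discriminant: an untagged union of the payloads.
#[allow(dead_code)]
union UntaggedSum { a: u32, b: u8 }

fn main() {
    println!("repr (A, B):         {} bytes", size_of::<Product>());
    println!("repr enum {{A, B}}:    {} bytes", size_of::<TaggedSum>());
    println!("repr union {{A, B}}:   {} bytes", size_of::<UntaggedSum>());
}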

repr also admits a partial order, where we say repr A ≤ repr B iff we can safely store and retrieve a value of A within the memory layout of B. For example, i32 ≤_repr (i32, u8), whereas u8 !≤_repr Box<Vec<u8>>, because repr Box<Vec<u8>> = repr (*const Vec<u8>) and not every u8 bit pattern is a valid (non-null) pointer.
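
To make the i32 ≤_repr (i32, u8) half of that example concrete, a union over the two layouts shows that a value stored through the smaller type can be read back unchanged. This is only meant to illustrate the ordering; the union name is made up for the sketch:

use std::mem::{align_of, size_of};

// i32 can be stored and retrieved within the memory layout of (i32, u8).
#[allow(dead_code)]
union I32InPair {
    small: i32,
    pair: (i32, u8),
}

fn main() {
    // Size and alignment of i32 both fit within those of (i32, u8).
    assert!(size_of::<i32>() <= size_of::<(i32, u8)>());
    assert!(align_of::<i32>() <= align_of::<(i32, u8)>());

    let u = I32InPair { small: -7 };
    // Reading back the field we initialised is sound: its bytes are untouched.
    assert_eq!(unsafe { u.small }, -7);
}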

Dependent types

Sigma types

Normal sigma types look a bit like

a:A, f:A -> Type, b:f(a) ⊢ Σ(a, b) : (A, f(a))

If we restrict f to a closed finite* set of types rather than the open Type, this becomes more useful (*finite over representations; that is, where repr (dyn Trait) for some Trait has one single type representation for all Trait impls. This is very close to the Rust concept of Sized)

a:A, S: FinSet Type, f:A -> (∃ B:Type ∈ S . B), b:f(a) ⊢ Σ'f(a, b) : (A, B)

This finite set is representable as a sum type (remember repr ⟦A + B⟧ ≡ repr (union { A, B })), so

∀ S, f . repr (⟦ Σ'f(a,b) ⟧) = repr (A, ⟦ ∃ B ∈ S ⟧)
= repr (A, ⟦ S ⟧)
= repr (A, ⋃ S) ( = repr (A, union {S_0, S_1, S_2, ...}))

Discriminating which member of the set/which variant of the union is stored inside the representation is performed by f(a).

This is similar to normal tagged unions/sum types/Rust enum, but instead of a data-discriminated union, the emulated sigma is a computationally-discriminated union. This also subsumes plain tagged unions/enums, as they are equivalent to

∀ Left, Right : Type . Σ'( λi → [Left, Right][i] )(i, union {Left, Right})

The intuition behind this is the idea that for A:Type, B:Type, f:A->B, we have Σ^A B ≤ A×B and thus repr (A,B) ∼ repr Σ'f(A, B)
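
A hand-written computationally-discriminated union in today's Rust might look like the following sketch (all names are illustrative): no tag is stored, and the live union field is recomputed from the first component by f.

// Untagged payload: which field is live is decided by f(a), not by stored data.
union U8OrU64 {
    small: u8,
    big: u64,
}

// ⟦ Σ'f(a, b) ⟧ for A = usize and S = {u8, u64}.
struct SigmaPrime {
    a: usize,
    b: U8OrU64,
}

// f : A -> (which member of S is inhabited); here, `small` iff a < 256.
fn f(a: usize) -> bool {
    a < 256
}

fn get(s: &SigmaPrime) -> u64 {
    // Discriminate by computing f(s.a) instead of reading a stored tag.
    if f(s.a) {
        unsafe { s.b.small as u64 }
    } else {
        unsafe { s.b.big }
    }
}

fn main() {
    let x = SigmaPrime { a: 10, b: U8OrU64 { small: 7 } };
    let y = SigmaPrime { a: 1_000, b: U8OrU64 { big: 1_000_000 } };
    assert_eq!(get(&x), 7);
    assert_eq!(get(&y), 1_000_000);
}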

Pi types

Pi types are a little larger, and are thus a little trickier. Adopting the same intuition (Π^A B ≤ B^A and B^A ≡ A->B) and finite representation technique, from

A: Type, S: FinSet Type, a:A, f:A -> (∃ B:Type ∈ S . B) ⊢ Π'f(a, b): (A, f(a)).

we get

∀ S, f . repr (⟦ Π'f(a,b) ⟧) = repr (A, A -> ⟦ ∃ B ∈ S ⟧)
= repr (A, A -> ⟦ S ⟧)
= repr (A, A -> ⋃ S) ( = repr (A, A -> union {S_0, S_1, S_2, ...}))

Note that this, instead of computationally-discriminated data, is data-discriminated computation.
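
As a sketch of what that looks like by hand in current Rust (names invented for the example), the data selects which piece of computation is meaningful, and the lowered function returns into the union of every representation it can produce:

// Union of every representation the dependent function can return.
union BoolOrU32 {
    flag: bool,
    word: u32,
}

// The lowered discriminating function: which variant is live for a given input.
fn returns_bool(a: usize) -> bool {
    a == 0
}

// The lowered dependent function, A -> ⋃ S.
fn pi(a: usize) -> BoolOrU32 {
    if returns_bool(a) {
        BoolOrU32 { flag: true }
    } else {
        BoolOrU32 { word: a as u32 }
    }
}

fn main() {
    let a = 3;
    let v = pi(a);
    // The caller recomputes the discriminant from the data it already holds.
    if returns_bool(a) {
        println!("{}", unsafe { v.flag });
    } else {
        println!("{}", unsafe { v.word });
    }
}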

Vect by example

For an example, let's look at the venerated Vect of dependent type tutorials' fame. In Idris, the type Vect has the type signature

Vect : Nat -> Type -> Type, and we can define functions returning Vect like repeat : (n:Nat) -> a -> Vect n a. I'll use an explicit Sigma and Pi encoding for clarity, although other expressions of Vect (such as constructor pattern matched types) are equivalent.

Translating Vect through our Σ' and Π' transformation, we see

repr ⟦ Vect (n:usize) a ⟧ =

let f = λm → if m == 0 then Nil else (Cons a (Vect (m-1) a))
in repr ⟦ Σ'f(n, a) ⟧

Remark: repr codomain(f) = repr (⟦⋃ {Nil, Cons}⟧) = repr (enum {Nil(repr_Nil), Cons(repr_Cons)}) for some well-defined repr_Nil and repr_Cons.
Also, the reference to the inner Vect definition is left inline here for clarity rather than lifted out into a mutual recursion or combinator, but that can also be done mechanically.

= let

enum Vect_Tag { Nil, Cons };
union Vect_Holder<A> { Nil, Cons(A, &Sigma_f_Vect<A>) };
f: fn(usize) -> Vect_Tag = |m| {
    if m == 0 { Vect_Tag::Nil } else { Vect_Tag::Cons }
};
struct Sigma_f_Vect<A>(usize, Vect_Holder<A>);

in

repr (Sigma_f_Vect<a>)
= let struct V<A> (usize, A, &V<A>); in repr V<A>
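
For comparison, here is one way the resulting layout can be written out by hand in current Rust. The names and the Copy bound are artefacts of today's union rules rather than of the technique itself; the point is that the length field doubles as the computed discriminant, so no per-node tag is stored.

// Payload: either nothing (Nil) or an element plus a pointer to the tail (Cons).
union VectHolder<A: Copy> {
    nil: (),
    cons: (A, *const SigmaFVect<A>),
}

// ⟦ Σ'f(n, a) ⟧: the length alongside the untagged payload.
struct SigmaFVect<A: Copy> {
    len: usize,
    data: VectHolder<A>,
}

fn head<A: Copy>(v: &SigmaFVect<A>) -> Option<A> {
    // Discriminate by computing f(len) (Nil iff len == 0), not by reading a tag.
    if v.len == 0 {
        None
    } else {
        Some(unsafe { v.data.cons.0 })
    }
}

fn main() {
    let nil: SigmaFVect<u32> = SigmaFVect { len: 0, data: VectHolder { nil: () } };
    let one = SigmaFVect { len: 1, data: VectHolder { cons: (42u32, &nil as *const _) } };
    assert_eq!(head(&nil), None);
    assert_eq!(head(&one), Some(42));
}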

Next steps

Having a method to represent Pi and Sigma types brings us on to how we compile to them. It should be doable with a little introspective magic to create a Value -> Set Type function: the return type is an unlifted set of types specifying precisely the codomain of the function, or equivalently, a tagged union/sum of all types that can be returned by the function.

(Un)-Lifting

For the following section we'll use ⤓ to denote the compiler unlifting or lowering a type down to a runtime term representing the type, and ⤒ to denote the compiler lifting a runtime term representing a type back to its type. I'll use unlifting in most places to make its relation to lifting clear. We wish for the following to hold:

⤓ : {t:Type} -> t -> repr t
⤒ : {t:Type} -> repr t -> t
∀ t:Type . t⤓⤒ ~ t -- equivalent under compilation/t⤓⤒ simulates t
∀ t: Type . (repr t)⤒⤓ ≡ (repr t)

Perhaps this intuition may help:

                t:    Type
∀ t: Type .    ⤒  ⤓ : ⤒  ⤓ 
             (repr t):Value

Or we may call this lifting and unlifting an adjunction between the compile-time type space and the run-time value space. This may just be a clumsy re-wording of the act of compilation.
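
The unlifting half can already be approximated by hand for a closed universe of types, which may help ground the notation. The names and the tiny two-type universe below are made up for the illustration; the inverse ⤒ is precisely the part that only the compiler can perform.

// A runtime term standing for a type: the image of ⤓ over a small closed universe.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum TyRepr {
    Bool,
    U8,
}

// A hand-written t⤓: each type in the universe knows its runtime representative.
trait Unlift {
    const LOWERED: TyRepr;
}

impl Unlift for bool {
    const LOWERED: TyRepr = TyRepr::Bool;
}

impl Unlift for u8 {
    const LOWERED: TyRepr = TyRepr::U8;
}

fn main() {
    // The representatives can be compared and matched on at run time;
    // mapping them back to types (⤒) is the compiler's job, not a runtime function.
    assert_eq!(<bool as Unlift>::LOWERED, TyRepr::Bool);
    assert_ne!(<u8 as Unlift>::LOWERED, TyRepr::Bool);
}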

Compilation example

By our finite-set-of-representations restriction, even if a function's codomain set is infinite, its representation as a tagged union/sum is not.

example:

const fn type_for_len(u:usize) -> Type* {
    match u {
        0 | 1 => bool,
        2..=8 => u8,
        _ => Vec<u64>
    }
}

First, the codomain set is {bool, u8, Vec<u64>}. This is represented by a sum type, split into its constituent tag and contents (repr).

//Note bool⤓ ≡ __bool, u8⤓ ≡ __u8, etc.
enum type_for_len_tag {
    __bool, __u8, __Vec_lt_u64_gt
    //= quote (bool⤓), quote (u8⤓), quote (Vec<u64>⤓)
}
union type_for_len_repr {
    __bool, __u8, __Vec_lt_u64_gt
    //= bool⤓, u8⤓, Vec<u64>⤓
}

(the compiler also retains the inverse mapping from type_for_len_tag -> Type as ⤒. The separation of tag from union is intentional, and useful later on, too)

and type_for_len is composed to inject into the correct variant:

const fn type_for_len(u:usize) -> type_for_len_tag {
    match u {
        0 | 1 => __bool,
        2..=8 => __u8,
        _ => __Vec_lt_u64_gt
    }
}

Now, as long as we can refer to expressions in a type position in code (syntactic type-expression equivalence), we can do

let size = 4;
let val: Π type_for_len(size) = ...;

which represents the equivalent

let size = 4;
let val: (type_for_len_tag, type_for_len_repr) = ...;
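
That pair can be spelled out by hand in today's Rust, which gives a feel for what the compiler would generate. The names below are illustrative, and ManuallyDrop appears only because Vec<u64> is not Copy under current union rules:

use std::mem::ManuallyDrop;

enum TypeForLenTag {
    Bool,
    U8,
    VecU64,
}

union TypeForLenRepr {
    b: bool,
    u: u8,
    v: ManuallyDrop<Vec<u64>>,
}

// The unlifted type_for_len: it computes the tag from the run-time length.
const fn type_for_len(u: usize) -> TypeForLenTag {
    match u {
        0 | 1 => TypeForLenTag::Bool,
        2..=8 => TypeForLenTag::U8,
        _ => TypeForLenTag::VecU64,
    }
}

fn main() {
    let size = 4;
    // What `let val: Π type_for_len(size)` lowers to: a (tag, repr) pair.
    let val: (TypeForLenTag, TypeForLenRepr) = (type_for_len(size), TypeForLenRepr { u: 42 });

    // Matching on the tag tells us which union field is live.
    match val.0 {
        TypeForLenTag::Bool => println!("{}", unsafe { val.1.b }),
        TypeForLenTag::U8 => println!("{}", unsafe { val.1.u }),
        TypeForLenTag::VecU64 => println!("{} items", unsafe { val.1.v.len() }),
    }
}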

Match expressions will need extending to support the separation of tag from union, but that's not much effort. The particular f:(A->(S:Set Type)) is encoded in the dependent type, so f⤓:(A->(repr S)) can be recovered to re-compute the tags as necessary.

let a: A;
let s: Σ (A, f(a)); // = ⟦ Σ'f(A, B) ⟧
let p: Π f(a); // = ⟦ Π'f(A,B) ⟧
match s {
    (a1:A, b:B) => ...
    /*
        B is a valid type here iff f⤓⤒(a1) ~ B
        equivalently, iff f⤓(a1) == B⤓
        This can be typechecked at compile time when a1 is a constant literal.
        Broader patterns in the a1 position would require further research/work

        the generated code becomes
        
        match s.a {
            a1 => if let b = unsafe { s.b.B⤓ } in ...
            //         union access  ~~~~^
            // proven safe by type checking and match
        }
    */
}

match p {
    (b:B) => ...
    /*
        B is a valid type here iff B ∈ codomain(f)
        The generated code becomes

        match f⤓(a) {
            B⤓ => if let b = unsafe { p.B⤓ } in ...
            //       union access ~~~~ ^
            // proven safe by type checking and match
        }

        note this is basically computationally-discriminated unions, just with
        the compiler holding our hand
    */
}

Behaviour

I followed the excellent A tutorial implementation of a dependently typed lambda calculus (Löh, McBride, Swierstra) to see if there was any non-trivial behaviour I needed to iron out and properly define, but I think everything is basically covered: the run-time evaluation of the type terms within a dependently-typed value is just evaluation of the unlifted/lowered function, which is already well-defined in Rust. That is the main point of this article: safely overcoming the challenge of type checking/type discriminants in a compiled/type-erased/runtimeless* setting.
* here, no runtime as in no interpreter or whole-program RTTI or VM required. Computer science over-overloads its terminology sometimes.

Conclusion

Compiling dependent types is not only possible, but straightforward, even in the absence of a language runtime or interpreter.

The chief challenge of compiled dependent types, type-checking at compile time against values only known at run time, is alleviated by restricting the codomain of Pi and Sigma to a finitely-representable subset of Type, rather than all of Type. I believe this reduces typechecking a dependent type from 'requiring complete evaluation of a depended-upon function' (hard during compilation!) to typechecking said function (much simpler), since a returned value (dependent/discriminant) is then definitionally correctly typed iff the depended-upon function is correctly typed.
