(Homotopy) Type Theory Musings, informal and intuitive (hopefully)

MonoidMusicianʼs Type Theory Musings

Welcome to my (un)scrambled thoughts on all things from the intersection of homotopy type theory, functional programming, and set theory! Thereʼs going to be a lot of terminology being thrown around, but donʼt be intimidated: hopefully Iʼll either describe the term, link to a definition/explanation, or youʼll be able to figure it out from context (or just skip it!). I do believe the best way to be clear is to be precise with language and to use established terminology, but because I am synthesizing pretty disparate areas of knowledge, most people wonʼt know all the terms.

My goal with these is to raise some new perspectives on common themes, and to cross-pollinate these fields with each other. Iʼd love it if more mathematicians knew what inductive types were (theyʼre so useful!) and could see how closely their existing terminology is reflected in type theory. Similarly, programmers will be astounded to know how quotients and subtypes can formalize the intuitions that theyʼve been working with for years, and make them rigorous enough for a proof assistant to understand and verify. And of course, everyone can learn a thing or two about functions, these wonderful objects. And I would love it if Univalence were slightly less scary.

But most of all, I believe type theory is an exceptionally extensible framework for conceptualizing some of the most important fundamental questions at all levels of formality:

  • What is data?
  • What is computation?
  • What constitutes a proof?
  • How do we talk about infinity?
  • What is equality?
  • What is a function?
  • What role do axioms play in determining the properties of a formal system?
  • How can we make math usable?

What is Type Theory?

Type theory is an incredibly powerful tool for formalizing mathematical and computational thoughts. Type theory has the semantics of mathematics wrapped up in the syntax of programming, but of course mathematicians can still work in type theory with less formalized notation, and programmers can focus on the computational aspects of type theory to their heartsʼ content.

Type theory has a straightforward syntax that is typically specified in a formal way, often with the intent of being machine-parsable. (I probably wonʼt get into the syntax we use for specifying type theories, unless thereʼs interest.) The syntax is incredibly precise and can also be concise (particularly with anonymous functions!). I believe having this formal foundation is a great boon for readability and teachability.

At the heart of type theory is the type judgment t : T, read as “the term t inhabits (or has) type T”. The user does not get to choose what type a term has; it is simply a consequence of the rules of the language, based on the syntactic structure of the term (also depending on what names are in scope, of course). Typically a term is thought of as having a unique type, although certain parts of the expression may be elided, resulting in the appearance of an ambiguous type, with the details being understood by convention. (This happens in math notation all the time!)

Type theoryʼs power is offered by functions, above all. A function f from type A to B, i.e. f : A -> B, is simply a term b : B that has variable a : A in scope. Note that we often use f a for function application (instead of the more verbose f(a)), and that every variable in scope has a specified type.

The special sauce of dependent type theory is dependent functions, where the output type can depend on the value of the input term(!). So given a type A and a type family B : A -> Type, a dependent function g : Π(a : A), B(a) is specified by giving a term b : B(a) with a : A in scope. Note that the type of the output depends on the value of the input! This is the machinery that powers the use of dependent type theory in proof assistants, and other sophisticated applications.

Beyond functions, type theory is quite extensible and modular, and throughout this document I will cover various extensions. Just like common mathematics usage, thereʼs a set of agreed upon conventions as to what is standard and what notation is used. Furthermore thereʼs also some abuse of notation, where a term is technically ill-typed but understood to refer to a different but related term (a fair number of these exceptions could actually be formalized as coercions that are elaborated based on expected types).

In addition, there are a lot of extensions that donʼt change the semantics of the type theory, but make it nicer to work with as a proof/programming language. These are things like modules, type classes, tactics.

I love to explain parts of the process involved in analyzing type theory (particularly type inference!), but itʼs not necessary for understanding how to use it (just know that things are, for the most part, on solid intellectual ground, with decades of research and practical use behind them).

But Iʼm Used to Set Theory?!?

Understandable! Most of modern mathematical parlance is phrased in terms of set theory when foundational concerns or clarity is necessary, and set theory is overwhelmingly taught to give intuitions for mathematical objects and operations. But type theory is more or less equivalent, and I think it can be clearer once you get the hang of it.

The biggest difference is, of course, that you cannot prove that a term inhabits a type – typing is a judgment, not a proposition you can reason about inside the theory! Instead, youʼll need to use a function, bijection/equivalence, or equality to transport from one type to a related type. For subtypes, youʼll need a proof to construct a term of the subtype from the term of the parent type.

Table of terminology:

(set-theoretic → type-theoretic)

  • set (e.g. “the set of natural numbers” or “the set of prime numbers”) → type or subtype (e.g. “the type of natural numbers” or “the subtype of prime numbers”)
  • subset → subtype or powerset (yes, itʼs not called powertype)
  • family of sets → type family, i.e. a function from a specific type (the index) to the proper type of all types, e.g. T -> Type
  • set of equivalence classes → quotient

Other differences:

  • We often use spaces as function application, e.g. f a reads “apply argument a to function f” (where in math notation we would use f(a)). Since functions have primary importance, and all functions are unary, the parentheses are a lot of extra noise and not necessary. This convention is particularly common in functional programming parlance (e.g. Haskell does function application like this). However, we will often write tuples as (x, y), which does look a lot like applying multiple arguments to a function (but is more versatile): f (x, y).

In this document

Assumptions

Type theories generally follow similar frameworks, so weʼll assume some basics here. The mathematical study of type theories is, in one sense, the study of starting from a common agreement on what a basic type theory looks like and seeing how various choices on top of that interact. For instance, many choices are in conflict: assuming Axiom K and Univalence together would produce an inconsistent theory – but the standing assumption when studying type theories is that we want them to be consistent. There are other choices to be made, such as whether there is an impredicative universe. My favorite example of this is the fact that parametricity (a very nice property of polymorphic functions) fails to hold exactly when excluded middle holds (a convenient property for proofs), at least inside a univalent foundation: https://homotopytypetheory.org/2016/02/24/parametricity-and-excluded-middle/.

Anyways, for our purposes here, letʼs choose a fairly straightforward type theory – letʼs not make tough choices. We will have a hierarchy of universes Type 0 : Type 1 : …, all of which are predicative and proof-relevant (those are the straightforward choices). We have pi types and dependent functions, where pi types lie in the largest universe of the input and output: (Π (a : T : Type n), (P a : Type m)) : Type (max n m). And we will have dependent inductive types – thus we can define sigma types pretty easily:

inductive Sigma {T : Type n} (P : T -> Type m) : Type (max n m)
| mk : Π (a : T), P a -> Sigma P

All inductive types are defined by giving their constructors, and then we receive corresponding recursion/induction principles (used to pattern match/eliminate from a datatype) and computation principles (used to tie the two together).

Thatʼs pretty much all we need! Oh, oops, I forgot the most interesting part: equality!

inductive eq {T : Type n} : T -> T -> Type n
| refl : Π {x : T}, eq x x -- notated x = x

Again, the choice that equality lives in the same universe as its arguments is the most straightforward choice. Another choice would be to make it live in Type 0, but that gets in the way of univalence!

Consistency

Now letʼs talk about consistency. Consistency is a mathematical property, who actually cares, right?? Nope! Consistency is great to talk about computationally. Let me show you.

What is consistency? In a consistent formal system, you should only be able to prove things that are true, not things that are false. Whatʼs false? False is defined to be an empty data type (think Void in Haskell), which means that if you “pattern match” on it, you have zero cases to prove, and so you can prove anything from it (this is called ex falso quodlibet or the Principle of Explosion or absurd). Additionally, anything that implies false is considered to be false, and anything that implies everything is false, so this is why we sometimes define false as a type which implies everything else, or forall a. a.
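In Haskell terms, the Principle of Explosion looks something like this (a minimal sketch; Data.Voidʼs absurd is the library version of the same idea, and exFalso is just my own name for it):

{-# LANGUAGE EmptyCase #-}

import Data.Void (Void)

-- ex falso quodlibet as code: pattern matching on Void has zero cases to handle,
-- so from a (hypothetical) value of Void we can produce a value of any type at all
exFalso :: Void -> a
exFalso v = case v of {}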

This last part is the most important: if we were able to determine that every type has an inhabitant, that poses a problem mathematically: the theory is inconsistent – it means false is true! But computationally it also poses a problem: what if we "had" a value of type forall a. a? Would it be possible to compute that value?? No!

undefined :: forall a. a
undefined = undefined

Thereʼs no way to extract a value of a type, say Int, from that – it just sits and spins in an infinite loop.

So in this way, computation and consistency are related in the sense that they both ban infinite loops of this kind, which are essentially meaningless.

Intuitive Univalence

(hopefully)

I will try to aim this section at programmers, especially those with a Haskell-like background.

Let me start out by defining two functors in both Haskell and GADT/inductive-like syntax:

data List a = Nil | Cons a (List a)
data Tsil a = Lin | Snoc (Tsil a) a
inductive List (a : Type) : Type
| nil : List a
| cons : a -> List a -> List a

inductive Tsil (a : Type) : Type
| lin : Tsil a
| snoc : Tsil a -> a -> Tsil a

(Each has two constructors, a nullary one and one that adds an element to an existing list/tsil.)

Whatʼs the difference between these?

One answer is that they have different associativities: List naturally associates to the right, and Tsil the opposite. A more obvious answer is that they have different names. But what does it matter? Any code that uses one could easily be changed to use the other, with no observable differences.

There are many ways that we could prove the equivalence: we could keep the source order and switch the associativity, or we could swap the order and keep the associativity. It turns out that the mathematical idea of a bijection or equivalence captures this: a function with an inverse.

The latter equivalence (order-reversing) is captured by the two following functions:

list2tsil :: forall a. List a -> Tsil a
list2tsil Nil = Lin
list2tsil (Cons a as) = Snoc (list2tsil as) a

tsil2list :: forall a. Tsil a -> List a
tsil2list Lin = Nil
tsil2list (Snoc as a) = Cons a (tsil2list as)

Note that we can read these functions as source transformations, even:

  • Take every occurrence of Nil and replace it with Lin, and vice-versa.
  • Take every occurrence of Cons and replace it with (flip Snoc), and vice-versa.

(Aside: It turns out that this is just inlining the definition of the functions, and is a form of partial evaluation that functional programming languages are great at.)

The other way is by composing these functions with reversing functions, which take a list and flip it around. Iʼll omit it since it takes a couple more lines to define and is best explained intuitively, rather than in code.

This reversal would ensure that the cons lists read the same left-to-right when converted to snoc lists, e.g. Cons 1 (Cons 2 (Cons 3 Nil)) and Snoc (Snoc (Snoc Lin 1) 2) 3, but at the cost of flipping the associativity: either the 1 or the 3 will be closest to the root in this example. Perhaps one might say that the isomorphisms that flip the order from the computerʼs perspective are the opposite of the ones that flip the order from the humanʼs perspective. Which one is right? Well, theyʼre both right! Types can be equal in more than one way!
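For concreteness, hereʼs a sketch of what that order-preserving isomorphism might look like, assuming the definitions above (reverseList is a helper name of my own):

reverseList :: List a -> List a
reverseList = go Nil
  where
    go acc Nil = acc
    go acc (Cons a as) = go (Cons a acc) as

-- keep the left-to-right reading order, flip only the associativity
list2tsil' :: List a -> Tsil a
list2tsil' = list2tsil . reverseList

tsil2list' :: Tsil a -> List a
tsil2list' = reverseList . tsil2list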

Univalence as program transformation

I already alluded to it above, but I would like to make more explicit the idea that we can think of univalence as rewriting programs to operate on a different type than originally thought.

Letʼs take a simple function:

diagonal :: List a -> List (List a)
diagonal Nil = Cons Nil Nil
diagonal (Cons a as) = Cons (Cons a as) (diagonal as)

By univalence, we can create a similar function that operates on Tsils instead. What would that look like? Well, we actually need to pick which isomorphism we are going to use; letʼs use list2tsil/tsil2list. For this particular type, we can represent it as a composition, or explicitly write it out:

lanogaid = map list2tsil <<< list2tsil <<< diagonal <<< tsil2list

lanogaid :: Tsil a -> Tsil (Tsil a)
lanogaid Lin = Snoc Lin Lin
lanogaid (Snoc as a) = Snoc (lanogaid as) (Snoc as a)

We notice that we start by applying tsil2list to the input – this corresponds to replacing the pattern matching on List with the corresponding pattern matching for Tsil. Then once we get the output of diagonal, we need to apply list2tsil twice to transform List (List a) -> Tsil (Tsil a) (once on the outside, and mapping it on the inside).

This program transformation is not specific to one choice of isomorphism, either.

It turns out that if we want to use the reversing isomorphism, we can write the result neatly too:

lanogaid :: List a -> List (List a)
lanogaid Nil = Cons Nil Nil
lanogaid (Cons a as) = Cons Nil (map (Cons a) (lanogaid as))

(Instead of adding the elements on the head of cons, we wait to add it to the tail, so now we have a larger list each time, instead of a smaller list each time.)

Do you think that you could apply this program transformation for any isomorphism, applied to any function?

Well, thatʼs what univalence tells us: we can rewrite any program to use any equivalent type, and isomorphisms are how we prove that two types are equivalent. Itʼs actually much stronger than programs as we think of them computationally: All properties will be preserved this way, too, even equalities, once you translate them along their types.

The setup is a little complicated to state formally, but say weʼre given a program of type F which mentions some type T, and we want it to instead work with type D, given an isomorphism f : T -> D and g : D -> T – how do we do that practically? It turns out, we just have to follow the types! If the rewritten program F' takes as input an argument of type D, then we just need to apply g to that argument to get something equivalent of type T to pass to F. If F' produces an output of type D, we can get it from the corresponding output of type T in F by applying f. This was just for functions, but for each functor we can also give an argument why it works: it might require map : (a -> b) -> F a -> F b or contramap : (b -> a) -> F a -> F b or invmap : (a -> b) -> (b -> a) -> F a -> F b to make it work, or in fact it may only work when we have a true isomorphism (if we need to transfer a proof of equality), but for any type of functor you can define, you can make it work.
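Hereʼs a minimal Haskell sketch of the invmap case from above (the class and instance names here are mine): a type constructor that uses its parameter as both input and output needs both directions of the isomorphism.

-- a type constructor f that can be transported along an isomorphism a ≅ b
class Invariant f where
  invmap :: (a -> b) -> (b -> a) -> f a -> f b

-- an endomorphism mentions its parameter in both input and output position
newtype Endo' a = Endo' (a -> a)

instance Invariant Endo' where
  invmap to from (Endo' h) = Endo' (to . h . from)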

Now, maybe it seems hard to believe that every F will follow these rules. This is why we often take univalence to be an axiom. But any F you write down will actually follow these rules, and this means that it should be possible to give a computational interpretation to univalence.

Univalence vs set theory: where does equality come from?

I just have to write on this topic, itʼs been on my mind a lot. It is more philosophical and focuses on high-level concepts, but it also assumes a little more background in set theory than the rest of this discussion.

Sets are nice in set theory because they are ridiculously lightweight and universal: everything is a set, with the empty set at the bottom of it all, sets can be constructed out of almost any predicate and contain almost any element (though at some point, and Iʼm never clear when exactly it occurs, one must use “classes” instead of sets, to avoid the usual inconsistency issues), and elements arenʼt constrained to only belong to one particular set, but can instead be used in contexts where different types [sic.] of things are expected (as long as you can prove they belong, of course). And furthermore, equality is very obvious: two sets are equal when they contain the same elements. But wait up – is that really so obvious?

The first issue with the definition of set equality seems to be that it must be recursive, since sets contain other sets as elements. But this is sound, since there is only one set that is empty, and it (the one and only empty set) is obviously equal to itself, so we can start there and build our way up to real sets.

The real problem with set equality, though, is that set equality requires that its elements exist as independent entities with a global notion of equality. It isnʼt a problem within set theory, of course – set theory is consistent (presumably). But it is a problem with set theory itself, one possible theory among other foundational theories we can choose. Iʼll say it one more time, I think itʼs a good quote:

The biggest problem with set theory is that equality of sets requires that its elements exist as independent entities (sets themselves) with a global notion of equality.

This global notion of equality gets in the way of identifying sets in natural ways. Sets that, by rights, ought to be the same (such as sets of three arbitrary but distinct elements), end up being doggedly unable to be identified only because their elements have different identities globally. Itʼs almost like a Platonic ideal of sets as universal objects is baked into the very foundations of set theory, but it couldnʼt be carried through to the end: the sets exist as universal objects, but the details of their construction were foregrounded, and the concepts that they were supposed to represent were cast in deep shadow, almost eclipsed from sight altogether.

I claim that type theory can fix this. How does it do that? It actually makes things stricter, in order for them to be looser: an element can only belong to one type now (and which type it belongs to must be syntactically obvious), but now equality only has to be local, within the type. By making smaller claims about what our types represent, calling them merely local choices in one universe, we open up the system to greater universality in the abstract: knowing that each type is only a local choice makes us free to acknowledge that there are other choices out there, other perspectives, which are also real and valid, even though ours is a true and complete image of the universal concept by itself.

This might seem paradoxical, or ironic at least, but it really isnʼt. Some kind of compromise must be made, and by retracing oneʼs steps, back up before stepping forward again, one can find their way around the wall and get un-stuck. So here the decision is that, in order for equality of types to be universal – where like things can really be substituted for each other anywhere – equality of elements must first become a local notion, grounded in each type.

Univalence to the rescue!

What is Univalence? Univalence gives meaning to equality between types (“universe/typal extensionality”) by saying that any two isomorphic types are equal. This is great for the semantics of category theory, where everything should work up to isomorphism, and this fact is reified in type theory.

It is the very nature of the strict locality of the type system that makes this feasible. You canʼt distinguish between isomorphic types, so why not make them equal? In this way, univalence is the maximal(?) expansion of the notion of equality that was hinted at from the beginning: all statements made internally to type theory are consistent with univalent swapping of isomorphic sets. (Of course, there are certain axioms that can be assumed that contradict that, like Axiom K, which implies that each type only has one equality with itself.)

Proof relevance of equality is necessary here, because once it is no longer obvious how two objects are related (i.e. global identity), it must be specified how they are related. This is another case where set theory falls short: by having such a weak notion of equality, it forces every equality to be “obvious”. In contrast, the rich HoTT equality, which captures isomorphisms, allows for both obvious isomorphisms (such as the identity, which means that the types are definitionally equal) and more intricate isomorphisms.

I would say that the richness of considering equality as isomorphisms is essentially motivated by the question: how do I relate two things which seem different? It would seem very weird, and arbitrary, to designate one equivalence relationship as primary, and all others as extraneous, when one doesnʼt even have an obvious relationship between two (syntactically different) types. But then it also means that a type may be related to itself in myriad ways – even where there is an obvious equality, there must be others. It suggests that equality isnʼt as obvious as we have always pretended it to be, and we must consider every relationship carefully.

Type theoretic equality is locally grounded in its type, and so comparing elements between types that are equal actually requires knowledge of how to relate elements of those types, which is exactly what is captured in the proof of equality.

The implications of this idea may seem absurd, but I think it is very valuable to consider it as a new perspective. It doesnʼt invalidate what we already know, but suggests there may be more nuance there than meets the eye. For example, all finite types with size n are isomorphic to each other in n! ways. Thatʼs alright, but now consider this: all countable types are isomorphic (equal) to each other! Okay, that actually wasnʼt so surprising, but what if I phrase it this way: there is only one type with countably infinite cardinality. Almost all of the basic types we consider in daily programming life are finite or countable: natural numbers are countable, strings are countable, lists of strings are countable, json (along with just about any other data format) is countable, and theyʼre all the same type!

I want you to sit with that cognitive dissonance for a bit.

All countable types are the same.

So whatʼs the difference, what gives??

Each countable type is merely a representation of some ur-countable type, and they really canʼt be distinguished from each other in any significant way, but they do provide different representations of a countable type, and these different perspectives make working with them different experiences. We get to make the choice to work with data in whatever form is most convenient for the task at hand, while recognizing that it is the same as any number of similar definitions, and we can exploit this sameness if we find a meaningful isomorphism between two perspectives.
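To make one such meaningful isomorphism concrete, hereʼs a sketch of the classic Cantor pairing between the naturals and pairs of naturals (helper names are mine):

-- pairUp and unpair are mutually inverse, witnessing ℕ ≅ ℕ × ℕ
pairUp :: Integer -> Integer -> Integer
pairUp x y = (x + y) * (x + y + 1) `div` 2 + y

unpair :: Integer -> (Integer, Integer)
unpair n = (x, y)
  where
    -- the largest w with w*(w+1)/2 <= n
    w = last (takeWhile (\v -> v * (v + 1) `div` 2 <= n) [0 ..])
    y = n - w * (w + 1) `div` 2
    x = w - y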

Make no mistake: most isomorphisms wonʼt yield meaningful insights, but hopefully I have convinced you that having them all available to prove type equalities yields a very nice theory.

These are some of the insights Iʼve been musing over by viewing counting and probability through the lens of type theory. The first part will focus on modelling counting using types, thus the emphasis will be on type-level operations. The second part will model probability as a monad, focusing on value-level operations, guided by types.

I hope to include more introductory materials on type theory, here and in the above document. Note that type theory is usually computable, with strong normalization properties, but it doesnʼt have to be, and in particular, the axiom of choice and the law of excluded middle break computability, for the usual reasons. (Iʼll refrain from wading into constructivity issues, for your sake and mine.) Similarly, type theory is usually very explicit, but the verbosity can be reduced, such as by omitting proofs and inferrable (implicit) arguments. These conventions bring type theory in line with the way mathematics is conventionally done.

Type arithmetic and counting

The ability of types to model arithmetic is well-known. The algebra (and calculus!) of algebraic data types, by Joel Burget, is a good primer in the basic operations of arithmetic at the type level. Product types result in the product of the cardinalities, sum types in the sum, quite naturally, and functions have the cardinality of the exponential of the output (base) to the input (power): |A -> B| = |B|^|A|. Many of these results are known in set theory too (where product types are interpreted as cartesian products, sum types as disjoint unions), but I think type theory is a more natural setting for them, for reasons I wonʼt go into here.
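As a quick sanity check of these rules in Haskell (the tiny example type Three and these helper names are mine):

data Three = T1 | T2 | T3 deriving (Eq, Show, Bounded, Enum)

-- |Bool| = 2 and |Three| = 3, so the product has 6 inhabitants and the sum has 5
pairs :: [(Bool, Three)]
pairs = [ (b, t) | b <- [False, True], t <- [T1, T2, T3] ]   -- length 6 = 2 * 3

sums :: [Either Bool Three]
sums = map Left [False, True] ++ map Right [T1, T2, T3]      -- length 5 = 2 + 3

-- a function Bool -> Three is determined by its two outputs: 9 = 3 ^ 2 of them
funs :: [Bool -> Three]
funs = [ \b -> if b then x else y | x <- [T1, T2, T3], y <- [T1, T2, T3] ]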

When we add in dependent types (types that depend on values), we get richer capabilities by representing predicates, propositions, and proofs. We can form subtypes using Sigma like above.

Counting using types

The goal is to model each important combinatorial function using an appropriate type combinator. In fact, we will see that this approach actually models the semantics of the objects we are counting in a rich way.

How to count

First: what do we mean by counting the elements of a type? Letʼs start with defining a type with a specified number of elements:

-- `fin n` A type that has `n` inhabitants
def fin : ℕ → Type := λ(n : ℕ), Σ(m : ℕ), m < n

It is defined as a dependently-typed function that takes a natural number n to a subtype of the natural numbers which are less than n. This means that fin 0 is a type with no inhabitants, and fin 1 has only one (⟨0, zero_lt_one⟩, where the angle brackets denote constructing an element of the subtype with value 0 and zero_lt_one : 0 < 1 the proof required by the subtype), etc. In order to count the inhabitants, we want to be able to say that a given type is isomorphic to fin n for some n:

@[class] structure fintype (t : Type) :=
(card : ℕ)
(equiv_fin : trunc (t ≃ fin card))

(original: https://github.com/leanprover-community/mathlib/blob/d5de80376088954d592a59326c14404f538050a1/src/data/fintype.lean#L14-L20) (Notes on what finite means constructively: https://ncatlab.org/nlab/show/finite+set)

This is a typeclass; without getting too deep into it, it provides a way to access canonical members of a specific type: in this case, the cardinality card t of a type t along with a bijection that proves that t has as many elements as fin (card t) (trunc here just means that we donʼt get to inspect the details of the bijection, we can only prove results that are independent of its particular value). Note that card t is unique for each type t, if it exists, so we are more or less justified in abusing notation and treating it as a function Type → ℕ, omitting the proof, like in conventional math notation. (There are type-theoretic reasons why this is okay, regarding inferring typeclass instances and their uniqueness, but I wonʼt go into details here.)

notation `|` t `|` := card t

Algebraic types

I will state the corresponding types and propositions without any proofs:

structure prod (α : Type) (β : Type) : Type :=
(fst : α) (snd : β)

notation α × β := prod α β

lemma card_prod (α β : Type) [fintype α] [fintype β] : |α × β| = |α| * |β| .

inductive sum (α : Type) (β : Type) : Type
| inl {} (val : α) : sum
| inr {} (val : β) : sum

notation α ⊕ β := sum α β

lemma card_sum (α β : Type) [fintype α] [fintype β] : |α ⊕ β| = |α| + |β| .

-- functions α → β are built-in and so need no definition

lemma card_exp (α β : Type) [fintype α] [fintype β] : |α → β| = |β| ^ |α| .

(definitions adapted from https://github.com/leanprover-community/lean/blob/6ed0977fcbc14354c86240cec7f8ef20842d0b5c/library/init/core.lean)

It turns out this is about all there is for algebraic data types: they can all be made out of these basic building blocks (products, sums, exponentials); we will need a more powerful type system to encode more combinatorial functions.

Dependent types

(a.k.a. Propositions, predicates, and proofs!)

Subtypes of functions

With dependent types, we can start to characterize various subtypes of functions and their numbers:

def injective {α β : Type} (f : α → β) : Prop := ∀ (a₁ a₂ : α), f a₁ = f a₂ → a₁ = a₂

notation α ↪ β := Σ(f : α → β), injective f

def surjective {α β : Type} (f : α → β) : Prop := ∀ (b : β), ∃ (a : α), f a = b

notation α ↠ β := Σ(f : α → β), surjective f

def bijective {α β : Type} (f : α → β) := injective f ∧ surjective f

notation α ↔︎ β := Σ(f : α → β), bijective f

(types from https://github.com/leanprover-community/lean/blob/6ed0977fcbc14354c86240cec7f8ef20842d0b5c/library/init/function.lean)

It turns out that injective functions are counted by the falling factorial, n P k, with arguments switched like the exponential case:

def F : ℕ → ℕ
| 0 := 1
| (n+1) := (n+1)*F n

notation n `!` := F n

def P : ℕ → ℕ → ℕ := λ(n k : ℕ), if n ≥ k then n! / (n-k)! else 0

lemma card_injective (α β : Type) [fintype α] [fintype β] : |α ↪ β| = P |β| |α| .
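As a rough executable check of card_injective for tiny sizes (all of the helper names here are mine), we can represent a function fin a → fin b by its list of outputs, one per input, and count the injective ones by brute force:

import Data.List (nub)

factorial :: Integer -> Integer
factorial n = product [1 .. n]

fallingFactorial :: Integer -> Integer -> Integer    -- “n P k”
fallingFactorial n k
  | n >= k    = factorial n `div` factorial (n - k)
  | otherwise = 0

countInjective :: Int -> Int -> Int
countInjective a b =
  length [ outs | outs <- mapM (const [0 .. b - 1]) [1 .. a]
                , length (nub outs) == a ]

-- countInjective 3 5 == 60, matching fallingFactorial 5 3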

Surjective functions donʼt have such a well-known function counting them, but using Stirling numbers of the second kind, S(n,k), they are counted by k!*S(n,k), which I will call SP(n,k) (this time the arguments are in the same order as the type constructor):

def S : ℕ → ℕ → ℕ
| 0 0 := 1
| 0 (k+1) := 0
| (n+1) 0 := 0
| (n+1) (k+1) := S n k + (k+1) * S n (k+1)

def SP : ℕ → ℕ → ℕ := λ(n k : ℕ), k! * S n k

lemma card_surjective (α β : Type) [fintype α] [fintype β] : |α ↠ β| = SP |α| |β| .

Bijective functions are comparatively simple, Iʼll let B n k count them:

def B : ℕ → ℕ → ℕ := λ(n k : ℕ), if n = k then n! else 0

lemma card_bijective (α β : Type) [fintype α] [fintype β] : |α ↔︎ β| = B |α| |β| .

Quotients of functions

You might have noticed that weʼre still missing one of the most important combinatorial functions: the binomial coefficients, n C k! Can we find a suitable datatype to model it?

We can, but it is harder to use subtypes; instead, weʼll need to use another way to reduce the quantity of elements in a base type: quotient types (which are named after their tendency to divide the cardinality of a type, as in group quotients, though it is not always so simple).

(Need to explain how quotient types work in type theory. Mostly itʼs quite similar to set theory, though.)

For the binomial coefficients, weʼll quotient the injective functions by a relation based on permutations of the input:

def permutation_related {α β : Type} (f g : α ↪︎ β) : Prop :=
  -- compose the underlying functions, not the whole subtype
  ∃(h : α ↔︎ α), f.fn = g.fn ∘ h.fn

def img_from (α β : Type) := Quotient (α ↪︎ β) permutation_related

-- nonstandard notation:
notation α ↬ β := img_from α β

def C : ℕ → ℕ → ℕ := λ(n k : ℕ), P n k / k!

lemma card_img (α β : Type) [fintype α] [fintype β] : |α ↬ β| = C |β| |α| .

Because of the nice properties of actions of the group of permutations, it turns out that |α ↬ β| = |α ↪︎ β| / |α ↔︎ α|, which is exactly what we wanted: C |β| |α|. (I believe a proof of this would involve showing that permutation_related is an equivalence relation that relates each f to exactly one g = f ∘ h for each h : α ↔︎ α, but I havenʼt worked through the details.)

This is all very nice, but img_from α β seems a bit odd – what is it really?? It turns out that it is the type of subsets of β which are equinumerous with α, encoded as an injection α ↪︎ β with knowledge of the specific mapping from α to β hidden, although the image of the function is still available (in fact, the only information available). This suggests a different encoding, closer to the first wording about subsets of β which are like α:

def img_from' (α β : Type) := Σ(p : β → Bool), trunc (α ↔︎ (Σ(b : β), p b = True))

That was too abstract, so I want to also show you that it matches the intuition of what “choose” means. On the one hand, with the subset method, one wants to choose a subset of the elements of β which has just as many elements as α. This is like marking a bunch of golf balls with a red marker, and then counting to make sure you marked the right number of them. But another way to choose a subset of a type is to specify a function α → β, considering α the pointing set and β the set of objects of interest, require it to be injective (so that it points at exactly α many elements), and also say to ignore the identity of elements in α, given that they should only serve the role of pointer. This is like taking a stack of the right number of stickers, and using each one to mark a different golf ball. Both arrive at the same result – a sized bunch of marked golf balls – but by different methods.

Practically, I think it is more in line with the previous results to still be considering functions α → β, even if it is just designating a subset; type-theoretically, it is a good excuse to show how quotient types work.

As is usual with quotient types, in order to obtain information from a term, you have to prove that you are using it in a well-defined manner that respects the equality relation, and it is precisely this restriction which allows the quotient type to identify formerly distinct elements. (In type theory language, this means that the recursor, through which pattern matching and elimination are defined, requires proofs that the functions for each constructor are well-behaved.) This is very powerful: you get to redefine what equality means for the type! (Subject to restrictions that ensure it is well-behaved, of course: equality always needs to be an equivalence relation.)

Data structures

Of course, we donʼt have to only be considering functions. It turns out that many combinatorial objects are modelled by list-like structures. See the Inductive Types file for more info about inductive types.

Letʼs start with lists, which are a great building block for all sorts of containers:

inductive list (t : Type) : Type
| nil : list
| cons {} : t → list t → list t

(original: https://github.com/leanprover-community/lean/blob/6ed0977fcbc14354c86240cec7f8ef20842d0b5c/library/init/core.lean#L298-L300)

One important function is the length function; in fact, any parametric function that looks like Π {t : Type}, list t → r for some result type r, can be factored through the length function (meaning it must be f ∘ length, for some f : ℕ → r):

def length {t : Type} : list t → ℕ
| nil := zero
| (cons _ t) := succ (length t)

(original: https://github.com/leanprover-community/lean/blob/6ed0977fcbc14354c86240cec7f8ef20842d0b5c/library/init/data/list/basic.lean#L75-L77)

There are two main directions we can go from here, and they are both orthogonal and compatible, and furthermore they both can be expressed via subtypes and quotients:

  1. Get rid of the inherent ordering of the lists, in order to build finite (multi)sets:
    1. As a quotient, this means quotienting by the equivalence relation that identifies two underlying lists that are merely permutations of each other.
    2. As a subtype, it is less elegant: it means requiring an ordering on the elements, and asking that the underlying list be sorted according to that ordering (which is also potentially computationally expensive).
  2. Get rid of the possible duplication of elements in the list, in order to build lists without duplicates and plain finite sets:
    1. As a quotient, this means asking that all eliminators ignore the fact that an element of the underlying list may appear more than once.
    2. As a subtype, this means asking that no element of the underlying list repeats, and this is actually more elegant, since type theory always has a notion of equality on a type.

So we typically see that a multiset is defined as a quotient of a list, and a finite set is defined as a subtype of a multiset (one needs to prove that the property of having no duplicates respects the lack of order of the multiset, of course). And of course, length respects the quotient types, so with some (justified) abuse of notation we will use it for all these types.
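To connect this back to everyday programming, a common computational stand-in for the list-up-to-permutation quotient is to pick a canonical (sorted) representative of each equivalence class – a sketch with names of my own; proof assistants like Lean use genuine quotient types instead:

import Data.List (sort)

-- one representative per equivalence class of lists-up-to-permutation
newtype Multiset a = Multiset [a] deriving (Eq, Show)

fromList :: Ord a => [a] -> Multiset a
fromList = Multiset . sort

-- length respects the quotient, since sorting never changes the length
size :: Multiset a -> Int
size (Multiset xs) = length xs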

But, regardless of the particular definitions, we can see that these types mirror the kinds of questions we were asking about with combinatorics:

  • Lists of fixed length n : ℕ with elements in t : Type (often called vectors) are counted by |t|^n
  • Lists of fixed length n : ℕ with elements in t : Type without duplicates (perhaps called distinct vectors) are counted by P |t| n
  • Sets of fixed length n : ℕ with elements in t : Type are counted by C |t| n

Still, the semantics we gave using functions above are more general, in that n does not have to be a natural number, but can be the cardinality of any type, meaning their semantics possibly extend to infinite cardinalities too. For example, we would have to modify the underlying datatypes from inductive lists to coinductive streams in order to pull off a feat like asking for a countable subset of real numbers, but it is clear that the function-based approach works for that. Whether it is the “right” generalization is a matter of debate, though.

Counting as data analysis with functions (subtypes and quotients)

Iʼve been saying an incredible number of abstract things. Now I just want to consider an example that came up in class:

What is the probability that a group of 100 people has one pair of birthday twins, two birthday triplets, and one birthday quadruplet? (Assume that every year has 365 days.)

It is standard to take a combinatorial approach to this problem. The denominator of the probability is easy: it is 365^100, which we said was modelled nicely by a function fin 100 -> fin 365, which means “assign each of 100 people a birthday”. The trick is figuring out the numerator. What data type could represent the numerator? I claim we can model this by figuring out what kind of data is needed.

The most direct approach would be to figure out how to classify the results of fin 100 -> fin 365 which fit our desired pattern. We define a function groups : (fin n -> fin m) -> (fin m -> set (fin n)) that characterizes outputs by their preimages, and then countgroups : (fin m -> set (fin n)) -> map ℕ (fin m) which counts how many of each type (twins, triplets, etc.) occur, and then we use the composition of those functions and ask that their result be [ 2: 1, 3: 2, 4: 1, 1: 100-2*1-3*2-4*1 ] (that is, one twin, two triplets, one quadruplet). This is great from a data computability perspective, but it doesnʼt make it obvious how many of these objects there are (even though itʼs a finite search, theoretically, it would be incredibly large).

def D := fin 100 -> fin 365

def groups : (fin n -> fin m) -> (fin m -> set (fin n)) .

def countgroups : (fin m -> set (fin n)) -> map ℕ (fin m) .

def desired : map ℕ (fin 100) :=
  [ 2: 1
  , 3: 2
  , 4: 1
  , 1: 88
  ]

def numerator : Type :=
  Σ(bdays : D), countgroups (groups bdays) = desired

Instead, letʼs use our intuition, like we do for classic combinatorics: what information do we need to characterize this type of arrangement?

  • We need 4 different birthdays, one for the twins, two for the triplets, and another for the quadruplet: call it (A,B1,B2,C) : fin 4 ↪︎ fin 365 (we can consider it a 4-tuple, with distinct entries), where |fin 4 ↪︎ fin 365| = P 365 4
  • We need to pick 2 people to be the twins: call it a : fin 2 ↬ fin 100, where |fin 2 ↬ fin 100| = C 100 2
  • We need to pick 3 different people to be the first triplets: call it b1 : fin 3 ↬ fin 98, where |fin 3 ↬ fin 98| = C 98 3
  • We need to pick 3 different people to be the second triplets: call it b2 : fin 3 ↬ fin 95, where |fin 3 ↬ fin 95| = C 95 3
  • We need to pick 4 other people to be the quadruplets: call it c : fin 4 ↬ fin 92, where |fin 4 ↬ fin 92| = C 92 4
  • We need a further 88 distinct birthdays for the remaining people with unique birthdays: p : fin 88 ↪︎ fin 361, where |fin 88 ↪︎ fin 361| = P 361 88

This results in a product type as our numerator space N' := (fin 4 ↪︎ fin 365) × (fin 2 ↬ fin 100) × (fin 3 ↬ fin 98) × (fin 3 ↬ fin 95) × (fin 4 ↬ fin 92) × (fin 88 ↪︎ fin 361).

We then define a function f : N' → D. Iʼll handwave over the details for now, but it does what should be intuitive: for ((A,B1,B2,C), a, b1, b2, c, p) : N', we construct a result bdays : D which assigns the bday A to the two twins in a, B1 to the triplets in b1, B2 to those in b2, and bday C to the quadruplets in c, and the remaining bdays are assigned according to p.

Did you notice the problem, though? We overcounted by a factor of two! We made this mistake in class, and it was hard to explain why, but I think it is very clear with this framework now: we overcount because we can swap the order of the triplets (B1 with B2 and b1 with b2) and we get the same birthday assignments. In symbols, f((A,B1,B2,C), a, b1, b2, c, p) = f((A,B2,B1,C), a, b2, b1, c, p). That is, the image of f (letʼs call it fD := Σ(bdays : D), ∃(n : N'), f n = bdays) is only half as large as the input N'.

So what we found isnʼt optimal, but it does get us pretty close (since the image of f does include all the examples we want, and nothing that doesnʼt belong). But again, this characterization is less than satisfying because we end up with a characterization of the data only as a subset of the denominator, not as a simple product type. Can we come up with an explicit type that is equivalent?

We almost had it: we just need to discard the ordering of B1/b1 and B2/b2. Thereʼs one way to do it easily: quotient N' by the relation that identifies inputs sent to the same output:

def same_output {α β : Type} (f : α → β) : α → α → Prop :=
  λ(a1 a2 : α), f a1 = f a2

def N'' := Quotient N' (same_output f)

We could probably prove that |N''| = |N'|/2, given the niceness of this relation. We could even replace it with an explicit characterization: the smallest relation ~ that identifies ((A,B1,B2,C), a, b1, b2, c, p) ~ ((A,B2,B1,C), a, b2, b1, c, p).
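As a rough numeric sanity check of the count weʼve built up (following the product N' above and dividing by the factor of two; all helper names are mine):

factorial :: Integer -> Integer
factorial n = product [1 .. n]

p :: Integer -> Integer -> Integer    -- falling factorial, “n P k”
p n k = if n >= k then factorial n `div` factorial (n - k) else 0

c :: Integer -> Integer -> Integer    -- binomial coefficient, “n C k”
c n k = p n k `div` factorial k

-- |N'| / 2, the numerator of the probability
numeratorCount :: Integer
numeratorCount =
  (p 365 4 * c 100 2 * c 98 3 * c 95 3 * c 92 4 * p 361 88) `div` 2

probability :: Double
probability = fromIntegral numeratorCount / 365 ^^ 100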

But I also want to go the next step. (WIP)

def finset : Type -> Type .

-- Turn a choice into a finset
def b {α β : Type} [fintype α] : (α ↬ β) → finset β .

lemma b_length {α β : Type} [fintype α] : ∀(f : α ↬ β), length (b f) = |α| .

-- Aggregate two finsets
def agg {β : Type} : finset β → finset β → finset β .

notation b1 <> b2 := agg b1 b2

def Day := fin 365
def Person := fin 100

⟨A, a⟩ : (Day \ {}) × (fin 2 ↬ Person \ {})
⟨B1, b1⟩ : (Day \ {A}) × (fin 3 ↬ Person \ a)
⟨B2, b2⟩ : (Day \ {A,B1}) × (fin 3 ↬ Person \ a <> b1)
⟨C, c⟩ : (Day \ {A,B1,B2}) × (fin 4 ↬ Person \ a <> b1 <> b2)
p : (Person \ a <> b1 <> b2 <> c) ↪︎ (Day \ {A,B1,B2,C})
__________________
bdays : Person ↪︎ Day

Probability monad

I havenʼt delved into the existing research much, but it is a well-known fact that probability distributions form a monad.
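I wonʼt develop it here yet either, but for reference, here is a minimal sketch of the usual finite-distribution encoding (all names are mine; real libraries also normalize and merge duplicate outcomes):

newtype Dist a = Dist { runDist :: [(a, Rational)] }

instance Functor Dist where
  fmap f (Dist xs) = Dist [ (f a, p) | (a, p) <- xs ]

instance Applicative Dist where
  pure a = Dist [(a, 1)]
  Dist fs <*> Dist xs = Dist [ (f a, p * q) | (f, p) <- fs, (a, q) <- xs ]

instance Monad Dist where
  Dist xs >>= k = Dist [ (b, p * q) | (a, p) <- xs, (b, q) <- runDist (k a) ]

-- e.g. a fair die: uniform [1..6]
uniform :: [a] -> Dist a
uniform as = Dist [ (a, 1 / fromIntegral (length as)) | a <- as ]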

Where is the information?

I always find it interesting to track which parts of a definition (mathematical, etc.) are uniquely determined by the other parts. This is a quasi-combinatoric view that data is determined by freedom to make choices, and everything else, while informative in one sense of providing properties/boundaries/restrictions, does not actually provide the information that characterizes an object. Itʼs applicable in many contexts, and Iʼll just go through some hand-wavey examples to show you how I think about it.

Note that I will be assuming univalence in this post, which means isomorphic types are equal. This greatly simplifies how types are treated, as you will see.

Algebraic examples

Letʼs take the algebraic hierarchy of magmas to groups as our first set of examples. At each step we will discuss what each piece of data brings to the picture.

Magmas

Magmas have a carrier type T : Set (an h-set) and a binary operation op : T → T → T. In code it would be magma := Σ(T : Set), T → T → T, or as a structure in Lean-prover syntax:

structure Magma :=
(T : Set)
(op : T → T → T)

It turns out that the type doesnʼt matter as much as the operation does, especially once it is wrapped in a magma that hides the type (I will try to justify this later in the section on categories). For any equivalent types T1 and T2, you can take an operation op1 : T1 → T1 → T1 and define an equivalent operation op2 : T2 → T2 → T2 by transporting along the equivalence as needed. This is essentially the definition of what it means for two operations on possibly different types to be equivalent – that is, for two magmas to be equal:

Π(T1 T2 : Set), Π(op1 : T1 → T1 → T1), Π(op2 : T2 → T2 → T2),
(Magma.mk T1 op1 = Magma.mk T2 op2) = Σ(t : T1 = T2), Π(a b : T1), apply t (op1 a b) = op2 (apply t a) (apply t b)

In words this means for all T1, T2, op1, and op2 as above, the corresponding magmas are equal precisely when the types are equal and the operations are also equal when transported along that equality. (Note that due to proof-relevance in HoTT, there may be many ways of establishing an equality between types, so magmas on the same type may be equal in interesting ways.)

Of course, since we are working in type theory, the type matters, but really it is determined by the function (two functions could not be considered equivalent in any reasonable sense if their types were not equivalent), so for the purposes of our argument, we will treat it as uniquely determined by the operation.

Thus a magma is essentially determined by its operation.

Semigroups

Semigroups are magmas whose operation is associative:

structure Semigroup :=
(T : Set)
(op : T → T → T)
(notation a `•`:65 b:65 := op a b)
(assoc : Π(a b c : T), a • (b • c) = (a • b) • c)

Note that since T is an h-set, assoc will be an h-prop, which means it contributes no data, only a unique proof. So if you accept that magmas are uniquely determined by their operation, semigroups are still uniquely determined by that.

Monoids

Monoids are semigroups with an identity element i:

structure Monoid :=
(T : Set)
(op : T → T → T)
(notation a `•`:65 b:65 := op a b)
(assoc : Π(a b c : T), a • (b • c) = (a • b) • c)
(i : T)
(identity : Π(a : T), a • i = a ∧ i • a = a)

Now we are adding data, i : T, which looks like it could be any element of T, an arbitrarily large type! But it turns out that it is still uniquely determined by the operation: for identities i1 i2 : T, we have a proof that i1 • i2 = i1 (right identity of i2) and another that i1 • i2 = i2 (left identity of i1), so it must be that i1 = i2.

Another way to formulate it would be with the following sigma type replacing the separate fields i and identity:

def IdentityOf {T : Type} (op : T → T → T) :=
  Σ(i : T), Π(a : T), op a i = a ∧ op i a = a

And then one could prove that IdentityOf op is a subsingleton, i.e. h-prop.

So we see that each operation has at most one identity. So a monoid describes a magma with some proofs of nice properties, but no additional choices were made after deciding what operation to use (and on what type).

How will it look for groups?

Groups

Groups are monoids with inverses:

structure Group :=
(T : Set)
(op : T → T → T)
(notation a `•`:65 b:65 := op a b)
(assoc : Π(a b c : T), a • (b • c) = (a • b) • c)
(i : T)
(identity : Π(a : T), a • i = a ∧ i • a = a)
(inv : T → T)
(inverse : Π(a : T), a • inv a = i ∧ inv a • a = i)

Again, thereʼs a standard abstract algebra proof that inverses are unique (for any a, inv1 a = inv1 a • i = inv1 a • (a • inv2 a) = (inv1 a • a) • inv2 a = i • inv2 a = inv2 a), which means the function inv witnessing the totality of the inverse relation is also unique. So again, being a group is a property of the magma operation; there are no additional choices made in the definition of a group.

Like before, inv and inverse could be bundled into a subsingleton sigma type like so:

def InverseOf {T : Type} (op : T → T → T) (i : T) :=
  Σ(inv : T → T), Π(a : T), op a (inv a) = i ∧ op (inv a) a = i

Takeaways

Here weʼve been studying magmas &c. as mathematical objects, and we saw that they were HoTT-equal when their operations were equivalent under the change of type, and, as a related fact, that the extra fields of monoids/groups added information about properties of the operation but no additional data (i.e. choices).

However, these algebraic structures are most often used via typeclasses, in programming languages like Haskell and theorem provers like Lean, where they are parameterized over their carrier type: class magma (T : Type) := (op : T → T → T), etc. In this way, an instance of the class provides a canonical choice of operation for a particular (nominal) type, and easy/implicit access to the properties like associativity. This can be great for brevity, but can be a challenge where there are multiple natural choices for an operation on the same type, such as addition/multiplication, or when you run into other dark corners of typeclass issues (like diamonds, where the same type can have two different instances from generic derivations).

But as it turns out, most of these issues go away when you parameterize the classes by the operation (and leave the type implicit), like class semigroup {T : Type} (op : T → T → T) := (assoc : _). All the data thatʼs hidden away through the implicit resolution of the typeclass system is no surprise, as it can only follow from the operator, so there wonʼt be any chance of two incompatible instances being inferred (although one could still run into issues with a lack of definitional equality in identities and inverses).

Furthermore, I would also argue that this kind of structure philosophically fits better in a Univalent universe, because it acts more like a well-defined-but-partial function from Magma -> {Group,Monoid,Semigroup,...}, one that respects type equalities instead of making decisions based on type names.

While the operators would have to be explicit with this setup, it is compatible with the existence of canonical choices (e.g. a + operator could still be disambiguated by type), and it still retains the ability for identities and inverses to be implicit in some sense (since they are not parameters to the structure), but syntactically disambiguating the identities would get harder in general.

On a final note, of course these results generalize to other algebraic structures, such as the hierarchy from rigs/semirings and rings to fields, which would need to be parameterized over the additive and multiplicative operators together.

Other examples

Categories

Letʼs apply what weʼve learned to (small) categories, which are much richer objects than magmas. Start with the laws:

structure CategoryLaws (obj : Set) (hom : obj → obj → Set)
  (comp : Π{a b c : obj}, hom a b → hom b c → hom a c)
  (id : Π(o : obj), hom o o) :=
(left_id : Π{a b : obj} (f : hom a b), comp (id a) f = f)
(right_id : Π{a b : obj} (f : hom a b), comp f (id b) = f)
(comp_assoc : Π{a b c d : obj} (f : hom a b) (g : hom b c) (h : hom c d), comp f (comp g h) = comp (comp f g) h)

Now a category, as a mathematical object, looks like

structure Category :=
(obj : Set)
(hom : obj → obj → Set)
(comp : Π{a b c : obj}, hom a b → hom b c → hom a c)
(id : Π(o : obj), hom o o)
(lawful : CategoryLaws obj hom comp id)

What information characterizes this? Well, as we were saying earlier, it isnʼt obj, since it can be inferred from hom. Objects are essential for the form and structure of a category, dictating which morphisms can be composed, but they are not the important part. This is exactly what category theorists have been saying for decades: it’s the arrows that matter, not the objects!

But wait a second, is that really right? Actually, I would argue that comp is what determines the category – since hom can be inferred from that! This seems a little weird, since we very rarely talk about what composition looks like in a category. This is because it is usually determined by hom and parametricity (e.g. for functions, there really is only one way to parametrically compose them, the obvious combinator forall a b c. (a -> b) -> (b -> c) -> (a -> c)), and once you add in the restrictions on associativity and identity, there is usually a unique choice (I think relations fall into this category) or a canonical “interesting” choice.

Of course, id does not contribute information (it is uniquely determined by comp).

So it looks like hom often determines what category we are talking about, but it is really comp. All the rest of the fields serve to constrain comp in some way, saying which homs it needs to be able to relate, or what properties it should have.

In fact, another way to look at categories would be to take the comp operation to be partial not in the sense of requiring its arguments to have type indices that align (forall a b c. hom a b -> hom b c -> hom a c), but in the sense of being able to return nothing for certain combinations of arguments (hom -> hom -> Maybe hom). This looks more like a magma operation, but the indices need to be recovered with dom : hom -> obj and cod : hom -> obj, which define where composition is allowed: comp f g = Nothing unless cod f = dom g. It turns out that this is less convenient to work with, but it is mathematically equivalent with the right rules (e.g. id : obj -> hom and comp f (id (cod f)) = Just f). Hopefully this helps clarify why comp is the locus of information in a category.
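A small Haskell sketch of the shape of that “arrows-only” presentation (names are mine, and the label field is only there so we can see what got composed; it does not satisfy the laws on the nose):

data Arrow o = Arrow { dom :: o, cod :: o, label :: String }
  deriving (Eq, Show)

-- composition is partial: it only succeeds when the endpoints line up
compose :: Eq o => Arrow o -> Arrow o -> Maybe (Arrow o)
compose f g
  | cod f == dom g = Just (Arrow (dom f) (cod g) (label f ++ " ; " ++ label g))
  | otherwise      = Nothing

identity :: o -> Arrow o
identity o = Arrow o o "id"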

Cardinals and Ordinals

The rules

The heuristics used informally above gathered in one place:

  1. Consider types to be non-informative when they can be inferred from later arguments (so pretty much most of the time). Behavior is determined by functions and terms, not by types!
  2. Proofs (of h-props) are non-informative. Usually these are pretty obvious.
  3. Sometimes terms, even functions, are non-informative when there are restrictions on them that make them unique (e.g. identities and inverses of operations). This will require more careful examination and arguments than the other cases but is definitely worth figuring out.

I think inductive types are one of the most amazing ways of reasoning about mathematical objects and I wish I saw more explanations of them. Iʼll try to do my best here.

Function types

Trivial types

For most programmers, trivial types donʼt look so useful at first glance – who would ever need a type that has no constructors!! – but they are actually useful for a couple reasons: They ensure that the type system is “complete” and able to represent a wide range of types, and they are useful when you need to plug a type in but want the most minimal thing.

Think of it like this trick question: how would you represent 12 as a product of three integers? One possible answer would be 3 * 4 * 1, or maybe 12 * 1 * 1. If we didnʼt have 1, the multiplicative identity, we couldnʼt do this! Similarly, the trivial types let us plug in something when we absolutely must. They wonʼt convey any information, but they will fill in the hole.

The two most trivial types are the empty type, with zero inhabitants, and the unit type, with one inhabitant. Category theorists love trivial types, and they call the empty type the initial object (in the category of sets) and the unit type the terminal object.

data Void -- yup, thatʼs all

data Unit = MkUnit

Basic data types: (non-recursive) ADTs

“Algebraic data types” is a term for a way of building new types out of existing types, in a simple pattern of constructors filled with values.

Product types

Product types store multiple independent pieces of data together. We can create specific product types, but even better, we can create a generic product type parameterized over its two constituents.

data ProductOfTAndD = MkProductOfTAndD T D

data Product a b = MkProduct a b
-- or equivalently, in GADT syntax:
data Product a b where
  MkProduct :: a -> b -> Product a b

A lot of languages have support for records in various ways, which are essentially products with any number of named fields.

Sum types

Sum types are really cool to work with, especially with pattern matching.

Sum types can represent tagged data, but my favorite part about them is that you donʼt have to care about how they are represented internally.

Like product types, we can think of a sum of specific types or of two generic types.

data SumOfTAndD = SumOfT T | SumOfD D

data Sum a b = InL a | InR b
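
-- or equivalently, in GADT syntax: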
data Sum a b where
  InL :: a -> Sum a b
  InR :: b -> Sum a b

Constructors

In general, we can define a non-recursive ADT to be a sum of products: a list of constructors, each of which bundles up some number of fields of existing types.
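
For example (a sketch with made-up names), the following type is a sum of three constructors, each a product of zero or more fields:

data Shape
  = Circle Double           -- radius
  | Rectangle Double Double -- width and height
  | Point                   -- no fields at all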

Pattern matching

Pattern matching is the bread and butter of inductive types. So far we have seen in GADT syntax the representation of constructors, which allows us to create new values of an ADT, but if we ever want to analyze a value that came from somewhere else, we need pattern matching.

Pattern matching looks at the possible shapes of a value (as dictated by its type) and lets the information contained in the value be used to produce a result. It turns out that a lot of nested conditionals in other languages can be replaced with pattern matching, and the result is almost always cleaner and easier to read!

Functional programming languages usually have a syntax for pattern matching that looks a lot like constructing values, but follows a few different rules. In fact, pattern matching is more restrictive, so not every use of a constructor is a valid pattern match, but most pattern matching is valid as a constructor.
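
For example, pattern matching on the generic Sum from above needs one case per constructor (describeSum is just an illustrative name):

describeSum :: Sum Int String -> String
describeSum (InL n) = "a number: " <> show n
describeSum (InR s) = "a string: " <> s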

For datatypes with only one constructor (that is, product types), there only needs to be one case to analyze. A lot of programming languages make this easier to use by allowing it to be inlined as a function argument, even in an anonymous function:

useProduct :: Product Int Int -> Int
useProduct (MkProduct i j) = i + j * (i - 1)
-- alternative syntax
useProduct = \(MkProduct i j) -> i + j * (i - 1)

Duality: Sums and Products, Constructors and Pattern matching

This is our first chance for a note on duality. Duality is a very common and useful pattern to recognize!

Products and sums are a classic example of a duality: products require two pieces of information to construct but only one case to handle when pattern matching, while sums require only one piece of information to construct but two cases to handle when pattern matching.

Infinite (recursive) data types

Natural numbers are the simplest infinite inductive type, and they can be defined as follows:

inductive nat : Type
| zero : nat
| succ : nat → nat

notation ℕ := nat

It turns out that this is an encoding of Peanoʼs axioms as a datatype. It is the smallest type with an element zero and a function succ satisfying the required properties: succ is injective, and its image is distinct from that of any other constructor (namely, zero ≠ succ n for all n : ℕ). An element of this type looks like succ (succ (… zero)) (for some finite number of succs), but can more generally be thought of as “freely generated” by those constructors. (HITs are where this freely generated bit really matters, but weʼre not there yet.)

How do you use a value of nat? Well, you can still pattern match on it, but itʼs a little more complicated now: there are infinite possible patterns to try to match!

Weʼll see a little later how to package this up more nicely and make it well-behaved, but for now, letʼs just assume that our language has recursion, so we can call our function as we are defining it:
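
data Nat = Zero | Succ Nat -- the same type as above, now in Haskell syntax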

describe :: Nat -> String
describe Zero = "Zero, the additive identity"
describe (Succ Zero) = "One, the multiplicative identity"
describe (Succ (Succ Zero)) = "Two, the smallest prime number" -- my favorite number!
describe (Succ (Succ (Succ Zero))) = "Three, the smallest odd prime number"
describe (Succ (Succ (Succ (Succ n)))) = describe n <> ", plus four"

We can pick out specific numbers to test for, or we can remove a few layers of Succ and delegate the rest to another function (or the same function, recursively).

Quantification over types

Forall quantification

Parameterized data: functors, HKDs

Type equality

GADTs

Generalized Algebraic Data Types are our final step before we get to dependent types, which I consider the ultimate goal of type systems.

GADTs expand the power of parameterized data types with the ability to be more specific about the parameters. It essentially adds two more features: existentials and type equality. (This means that if you have a type theory that doesnʼt have GADT syntax, but does have existentials and type equality, then you can in fact rewrite the GADTs in terms of those.)

As implemented in Haskell, GADTs use a different syntax than normal ADTs, but that syntax is actually more in line with other functional programming languages and theorem provers, which tend to favor GADT-like syntax exclusively over Haskellʼs ADT syntax. Maybe you can see why.

Again, we specify a list of constructors, but this time we give not just each constructorʼs arguments but its full type. For each constructor, we can quantify it however we want, we can take in arguments of any type, and the only restriction is that the return type must be the GADT weʼre constructing, applied to some arguments. This means that the constructor can quantify over types that donʼt appear in the result (existentials) and that the result type can have parameters that are not just arbitrary variables but can be more specific, or even concrete types (type equality).

(insert example)
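
A sketch of what such an example could look like (the names are my own): a small expression language where each constructorʼs return type pins down the parameter (type equality), and one constructor hides a type that does not appear in the result (an existential):

data Expr t where
  IntLit  :: Int -> Expr Int                    -- result is specifically Expr Int
  BoolLit :: Bool -> Expr Bool
  Add     :: Expr Int -> Expr Int -> Expr Int
  If      :: Expr Bool -> Expr t -> Expr t -> Expr t
  Hidden  :: Show a => a -> Expr String         -- a is existential: it only appears in the arguments

eval :: Expr t -> t
eval (IntLit n)  = n
eval (BoolLit b) = b
eval (Add x y)   = eval x + eval y
eval (If c t e)  = if eval c then eval t else eval e
eval (Hidden a)  = show a

Notice how, in the IntLit branch, eval can return an Int: the type equality packed into the constructor is exactly what makes that well-typed.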

Two notes:

  1. It is actually possible to have existentials in regular ADTs in Haskell.
  2. Both existentials and type equality can be encoded in quantified functions, but those encodings are much less convenient to use: type Existential f = (forall r. (forall a. f a -> r) -> r) and type LeibnizEquals a b = forall f. f a -> f b.

Dependent quantification

Now we get to the fun stuff: dependent types!

It turns out that many of the restrictions on what things have to be types and what things have to be values were unnecessary: they can mix together and coëxist on more-or-less equal footing. It is a radical re-imagining of the foundations of programming languages/type theory, and requires a lot more care when designing a compiler, but it actually simplifies a number of things in the theory and syntax.

General equality, DITs, subtypes, quotient types

Univalent equality, HITs, propositional truncation

Induction rules

More on pattern matching

Pattern matching in Agda

More on function types

Recursion

We punted this question earlier: when is function recursion well-behaved?

It turns out to be a very difficult question, in that we canʼt give a truly satisfactory answer, but we can give one answer that is simple enough and another answer that is effective enough to fill in the gaps. The reason it is so difficult is that if we could decide in general when recursion is well-behaved, we could solve the halting problem, which is impossible to do automatically! So we either settle for a simple but leaky heuristic, or we pull out the big guns and give a proof of termination.
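
As a concrete illustration (my own example), the function below is a perfectly ordinary-looking recursive definition, yet whether it terminates on every positive input is the famous Collatz conjecture, which is still open, so no automatic check could be expected to accept it:

-- Counts the steps the Collatz process takes to reach 1.
-- Believed to terminate for every positive input, but nobody has proved it.
collatz :: Integer -> Integer
collatz 1 = 0
collatz n
  | even n    = 1 + collatz (n `div` 2)
  | otherwise = 1 + collatz (3 * n + 1)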

Structural recursion

The simplest answer is that we can apply a function recursively only to things that are contained within or “smaller” than the current argument.
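
For instance (a sketch using the Nat from earlier), addition only ever recurses on the n sitting inside Succ n, so every recursive call is on something structurally smaller:

plus :: Nat -> Nat -> Nat
plus Zero     m = m
plus (Succ n) m = Succ (plus n m) -- the recursive call is on n, which sits inside Succ n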

It turns out that using recursion principles automatically packages up the recursion in a structural way.

Well-founded recursion

Sometimes the function isnʼt structurally recursive, and so we need to prove for the compiler that it will in fact terminate.

The general theory here is well-founded recursion, based on well-founded relations. The most familiar well-founded relation is the less-than relation on natural numbers: thereʼs a base case, 0, and if you start at any natural number and keep asking for another number that is strictly less than it, you eventually reach that base case (you canʼt go on forever).

(insert definition of well-founded relation.)
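
Here is a sketch in Lean-style notation (mirroring the shape of Leanʼs own acc and well_founded, with details elided): an element is accessible when everything below it is accessible, and a relation is well-founded when every element is accessible.

inductive acc {T : Type} (r : T → T → Prop) : T → Prop
| intro : Π (x : T), (Π (y : T), r y x → acc y) → acc x

def well_founded {T : Type} (r : T → T → Prop) : Prop :=
  ∀ x : T, acc r x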

How we “prove” that a function terminates is by recursion on this relation, which means that we start by analyzing the argument, and get to recurse in the function only if we can prove our new argument is smaller with respect to the relation than the current argument, otherwise the recursion might not be well-defined (meaning, computationally, that it might loop). If we cannot find a smaller argument, we must have hit a base case, and should produce a value directly.

(Aside: It turns out that well-ordered sets, taken up to isomorphism, are called ordinals, and they can get really reaaally reaaaally large. Different theories will have larger or smaller ordinals that they can construct.)

Parametricity

This is actually my favorite part!

Böhm–Berarducci encoding

This is related to (and often conflated with) Church encodings, but Böhm–Berarducci encodings are more powerful, and they specifically address what happens in a typed setting.

So it turns out that the recursion principles we looked at above actually capture all we need to know about a datatype.
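
For example (a sketch in Haskell, needing RankNTypes), the Böhm–Berarducci encoding of the natural numbers is literally the type of their recursion principle, and we can translate back and forth between it and the ordinary datatype:

-- A natural number is anything that can be "folded": give it a zero case
-- and a successor case, and it produces a result.
type NatBB = forall r. r -> (r -> r) -> r

toBB :: Nat -> NatBB
toBB Zero     z _ = z
toBB (Succ n) z s = s (toBB n z s)

fromBB :: NatBB -> Nat
fromBB n = n Zero Succ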

I want to write some brief comments on equality in type theories.

If you only take away one thing, let it be this:

Definitional equality t1 ≡ t2 means that t1 and t2 are obviously equal, by a decidable syntactic algorithm, and is the basis for a compilerʼs typechecking process, as well as internal propositional equalities via refl.

One of the basic facts about type theories is that they rely on two primary judgments:

  • A term t is well-typed with a type T: t : T
  • Two terms t1 and t2 with the same type T are equal: t1 ≡ t2

(Note that each judgment takes place in a context, usually labelled Γ, which is just a list pairing variables that have been bound with their types, but we omit it here for brevity.)

A compiler for a type theory will look at a term and follow certain rules to associate it with a type (type inference), or verify that it has a specified type (type checking). The mathematical formalization of type theories doesnʼt really distinguish these, and the typing judgment acts like a bit of both.

Letʼs consider a fundamental example of a typing judgment: function application. If I have a function f, and I want to apply it to an argument a, how does a compiler know that they are compatible? (Note that these stand for well-typed expressions, so they could be variables in scope, but they donʼt have to be.)

Letʼs say that f : T1 and a : T2. Well, first things first, we want f to actually be a function type: we require f : T3 -> T4. Next we want to apply a to it, so its type had better line up with the functionʼs input type. We might ask that a : T3 instead, since that is the functionʼs input type, but itʼs far better to make explicit what we mean: letʼs instead ask for T2 ≡ T3, since a priori we donʼt know they are equal, and we instead want to require that they are indeed. If all these conditions are satisfied, we say that f a : T4.

In the syntax of judgments we would write it this way:

f : T3 -> T4 -- require `f` has a function type
a : T2       -- require `a` is well-typed
T2 ≡ T3      -- require that `a`ʼs type matches the input type
____________
f a : T4     -- then `f` applied to `a` has the output type

This is a simple but fundamental example: in order to make sure everything lines up, compilers are constantly checking whether types are equal. And in dependently-typed languages, where terms can show up in types, this extends to terms too.

Now it turns out that equality is really hard to prove or check; otherwise mathematicians would be out of their jobs! But compilers do their best with a limited set of well-behaved rules.

One obvious starting point is that if two expressions are literally the same, then they are judgmentally equal. Next we ask: what if they bind different variables, but are still really the same? This is called alpha-equivalence: it means that λ(a : t) → a and λ(b : t) → b are definitionally equal, which hopefully makes sense (theyʼre both the identity function!). And so on: there are particular rules that each solve some part of definitional equality (like eta-equivalence, beta-equivalence, etc.).
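
A couple of concrete instances of those rules, in informal notation:

(λ(x : ℕ) → x + 1) 2  ≡  2 + 1   -- beta: substitute the argument into the body
(λ(x : ℕ) → f x)      ≡  f       -- eta: a lambda that just applies f is f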

All this means that the compiler can do a fair amount of work in order to check that two things are equal. But thereʼs no magic to it: it wonʼt solve calculus equations for sure! It really underestimates when two things are equal, which in turn means that it cannot be used to prove that two things are different.

This is where we need to talk about propositional equality, which can be defined like so (in Agda):

data _≡_ {A : Set} (x : A) : A → Set where
  refl : x ≡ x

Or like this, in Lean:

inductive eq {T : Type n} : T -> T -> Type n
| refl : Π {x : T}, eq x x -- notated x = x

Donʼt let Agdaʼs notation fool you: this isnʼt judgmental equality anymore! For clarity I will stick to a normal equals = for propositional equality.

Propositional equality really starts with definitional equality: we really need to use refl to prove that two things are equal – after all, it is the only constructor – and it turns out that refl : x = y is only well-typed when x ≡ y definitionally.

But propositional equality has more flexibility than judgmental equality: since it is internal to the type theory and lives in a type universe, it can be asked for as an argument, it can be negated to state that two things are not equal, and it can be used for complex proofs (including of calculus!). But the fundamental starting point of it all, how you prove anything is equal at all, is definitional equality.
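
As a small taste of that flexibility, here is a sketch in Haskell (with the GADTs and TypeOperators extensions) of a propositional-equality-like type at the type level; base ships essentially this as Data.Type.Equality. Pattern matching on Refl is what lets us derive symmetry and transitivity:

data a :~: b where
  Refl :: a :~: a

-- Equality proofs can be passed around as arguments and combined:
sym :: a :~: b -> b :~: a
sym Refl = Refl

trans :: a :~: b -> b :~: c -> a :~: c
trans Refl Refl = Refl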

A couple miscellaneous notes for now:

  • Judgmental equality should follow the rules of normal equalities, such as symmetry and transitivity, but at least the Lean prover fails to be adequately transitive in some cases (particularly regarding quotients).
  • A (good) compiler will always accept replacing one term with another that is definitionally equal to it.
  • The equality type is semantically the smallest reflexive relation; in HoTT it is more complex than its inductive definition would suggest, but it still works as if it just had that one constructor (I cannot hope to justify that here, but you probably do not need to worry about it unless you are specifically interested in HoTT).
  • Some type theories, called extensional type theories, reflect propositional equalities back to judgmental equalities, so there is no distinction, but it makes type checking undecidable.

The purpose of unification

Unification is a key part of typechecking: itʼs how the type checker proves two things are the same. The trick about unification is that it has to work with syntactic equality, and not semantic equality. This is because semantic equality is undecidable, so no algorithm exists for it, but there are algorithms that can approximate it via syntactic equality. (It means that the algorithm might miss stuff, since semantic equality is richer than syntactic equality, but if two things are syntactically equal they must also be semantically equal, so nothing ill-behaved will happen.)

Basic unification ideas

There are a bunch of rules that define how unification operates. The exact details depend on the many details of how a particular type theory works, but there are some basics that are more or less universal.

The most basic rule is that if two types have the same abstract syntax tree, then they can be unified. This means that two types with the same names unify, of course. It means that function types unify when their input types and their output types unify (i -> o unifies with i' -> o' when i unifies with i' and o with o'). Other type constructors follow a similar pattern to function types.
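
As a sketch (a toy of my own, not what any real compiler does), here is first-order syntactic unification over a miniature type language; unifying a -> b against Int -> Bool solves a := Int and b := Bool:

import           Data.Map (Map)
import qualified Data.Map as Map

-- A toy syntax of types: variables, named constants, and function arrows
data Ty = TVar String | TCon String | TArr Ty Ty
  deriving (Eq, Show)

-- A substitution maps type variables to types
type Subst = Map String Ty

apply :: Subst -> Ty -> Ty
apply s t@(TVar v) = Map.findWithDefault t v s
apply _ t@(TCon _) = t
apply s (TArr i o) = TArr (apply s i) (apply s o)

-- A variable may not be solved by a type mentioning itself
occurs :: String -> Ty -> Bool
occurs v (TVar v')  = v == v'
occurs _ (TCon _)   = False
occurs v (TArr i o) = occurs v i || occurs v o

-- Two types unify when they have the same syntax tree,
-- solving for variables along the way
unify :: Ty -> Ty -> Maybe Subst
unify (TVar v) t = bindVar v t
unify t (TVar v) = bindVar v t
unify (TCon a) (TCon b)
  | a == b    = Just Map.empty
  | otherwise = Nothing
unify (TArr i o) (TArr i' o') = do
  s1 <- unify i i'
  s2 <- unify (apply s1 o) (apply s1 o')
  pure (Map.union s2 (Map.map (apply s2) s1))
unify _ _ = Nothing

bindVar :: String -> Ty -> Maybe Subst
bindVar v t
  | t == TVar v = Just Map.empty
  | occurs v t  = Nothing -- would create an infinite type
  | otherwise   = Just (Map.singleton v t)

-- unify (TArr (TVar "a") (TVar "b")) (TArr (TCon "Int") (TCon "Bool"))
--   == Just (Map.fromList [("a", TCon "Int"), ("b", TCon "Bool")])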

Things get much more interesting when the algorithm solves for (type) variables by unification. I wonʼt get into all the details of how variables work, but the primary cases where this comes up are:

  • In Haskell-like languages, this occurs when applying a polymorphic function, where the polymorphic type variables are solved for implicitly.
  • In most dependently-typed languages, this occurs when applying a function with implicit arguments. (Note that these need not be types, they could be values.)
  • Other cases might be when there are wildcards or holes that should be solved for.

The Diagonal of Universe Level and H-Level

In a type theory without HITs, it seems to me that there is some correspondence between universe level and h-level. In particular (without HITs), types in Type 0 will be at most h-sets, types in Type 1 will have at most h-level 3, etc.

Parametricity and Predicativity

In Reynoldsʼs paper on parametricity, it is stated that, set-theoretically, a naïve definition of functions that are polymorphic over sets is too large to form a set itself and must instead be a class, but, crucially, such functions are equivalent to a set when they are parametric. We see a parallel in (predicative) type theory, where polymorphic functions must lie in a universe higher than the one they quantify over, but of course parametric functions biject with corresponding inductive types in a well-known way (e.g. with Böhm–Berarducci encoding), and in fact the inductive type will lie in a smaller universe because it skirts the issue of quantification. Is there a way to recover this kind of shrinking effect in type theories without inductive types?

For example, if I had a dependent type theory with only universes and pi types, I would still be able to form a type of “natural numbers” and many other inductive types, but they would all live in universes that are too large, and in fact it seems like it would lead to an explosion of universes.

In particular, if we define a few types:

Nat_n : U_(n+1) :=
  (∀T:U_n, (T → T) → T → T, suitably restricted to be parametric)
List_n,m : U_m -> U_((n+1) ∨ m) :=
  λA:U_m, (∀T:U_n, (T → A → T) → T → T, suitably restricted to be parametric)

Prop elimination

More generally, “prove” that a thing lives in a universe?

Internalize lack of injection to higher universe?

Obviously Type 1 and Type 2 are not isomorphic. Is there an internal proof of this?

Type Formers and Cardinality

Letʼs construct some infinite types! We wonʼt get too large, but will cover the beth numbers.

This process will require us to first construct ℶ₀, then we can follow a regular construction to get to ℶ₁, ℶ₂, etc.

We will use functions (of course!), along with either a natural numbers object (NNO/ℕ) or recursive data types (ADTs/inductive types). It turns out that more sophisticated extensions, like GADTs or subtypes/quotients, do not influence the particular results here. As far as the syntax goes, weʼll freely promote cardinalities to types (so 1 is the unit type, 2 is the type of booleans, etc.), and will measure the cardinalities of types with bars: |ℕ| = ℶ₀. We will similarly use the standard operators (+, *, and ^) both for cardinal arithmetic and the corresponding type operators.

Note: in what follows, weʼll be assuming the Axiom of Choice (AC) so the cardinal numbers are well-ordered.

Starting Off: Countable Infinity

Where does infinity come from? Letʼs start with the smallest infinity, which we call countable. Thereʼs two equivalent answers: natural numbers and ADTs/inductive types.

The natural numbers ℕ are special because theyʼre the smallest infinite type. This can be seen with their construction as an inductive type: one constructor to make it inhabited (zero : ℕ), and another to make it infinite (succ : ℕ → ℕ). No bells and whistles.

But, if weʼre talking about cardinals (as weʼre doing here), or alternately if we accept Univalence, then all countable cardinals/types are equal, so the details of this particular construction do not matter.

In fact, it turns out that if you exclude mentioning function types in your inductive definitions, then all recursive data types are countable.

(Quick proof sketch: Assume all existing types are countable. Inductive types have the capabilities of sums, products, and recursion. Sums and products both preserve countability. And recursion also preserves countability, since it constructs the least fixed point, and nothing will push it to be uncountable.)

(Note: this goes for all flavors of ADTs: GADTs and I believe higher inductive types in HoTT too.)

So if you donʼt use function types in your ADTs, I can guarantee they are countable! (And thus serializable and discrete.)

There are thus a lot of equivalent ways to come up with our first infinite cardinal ℶ₀, and hopefully this shows you why it isnʼt an arbitrary decision, although we typically only talk about the natural numbers and countability.

Stepping Up: The Power of Functions

One way of thinking of the cardinality of a type is that itʼs the number of choices you have to pick an inhabitant of that type.

To pick an 8-bit number, you have 2^8 = 256 numbers to choose from; the type of 8-bit numbers has cardinality 256.

To pick a list of 8-bit numbers, you first have to choose the length n, which can be any natural number (so countably many choices), and then you have to fill n slots with 8-bit numbers, so (2^8)^n choices – but these finite choices donʼt matter, and itʼs a countable number of choices overall: ℶ₀.

To pick a list of natural numbers, choose a length n, and then n natural numbers to fill the list. Itʼs a little trickier to see, but this is also a countable type, since we want to compute ℶ₀ × ℶ₀ × … × ℶ₀ (n+1 times) and it turns out that ℶ₀ × ℶ₀ = ℶ₀ so ℶ₀^(n+1) = ℶ₀. (This can be seen by constructing a bijection from pairs of natural numbers to natural numbers, which is also useful for proving that integers and rationals are countably infinite too.)
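
The bijection alluded to here is classically given by the Cantor pairing function; a quick sketch:

-- Walks the diagonals of the ℕ × ℕ grid, hitting every pair exactly once,
-- so ℕ × ℕ has the same cardinality as ℕ.
cantorPair :: Integer -> Integer -> Integer
cantorPair x y = (x + y) * (x + y + 1) `div` 2 + y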

So these constructions only result in countably many choices. How do we get a larger infinity?!

The answer is, of course, function types! Functions let you analyze the input and branch on arbitrary distinctions, but that means for each choice of input, you must make a choice of output to associate to that input. This means that instead of a product of finitely many ℶ₀s, we can get a product of ℶ₀-many ℶ₀s. (This makes sense if we consider how (dependent) functions also moonlight as infinite products.)

So if we want to choose a function of type ℕ → ℕ, we have to first choose the value of f 0 : ℕ which has countably many choices, then we need to repeat this countably many times! This results in ℕ → ℕ having a cardinality of ℶ₀^ℶ₀ which is our long-sought-after uncountable cardinal ℶ₁.

However, it turns out that we can economize on our types, and the cardinality of ℕ → 2 is enough to reach ℶ₁. To summarize the previous process, each function was the product (quite literally) of countably many choices, and there were uncountably many other functions we could make with different choices. The fact that each choice of where f n went was also countable is actually immaterial: it suffices to be a binary choice, the emphasis is on how each function incorporates an infinite number of choices in it. (Note: I think this is just helpful intuition, but weʼll formalize how it works in general and clean up the rough edges in the section on arithmetic.)

Now that weʼve clarified the essential ingredients, we can repeat the construction quite easily: ℶ₂ is the cardinality of (ℕ → 2) → 2, ℶ₃ is the cardinality of ((ℕ → 2) → 2) → 2, and in general, ℶₐ₊₁ = 2^ℶₐ.

Another way of viewing this is as the powerset construction: ℶₐ₊₁ = ℙ(ℶₐ).

Whatʼs Covered and Whatʼs Missed?

If we assume the generalized continuum hypothesis (GCH) and the axiom of choice (AC), beth numbers are essentially the only cardinalities we can construct with this iterated function/powerset construction. (There will still be larger cardinals that can be constructed by other means.)

Without GCH, there will be other cardinals in between the beth numbers, but we can’t give them an explicit construction. (But when has that ever stopped destructive mathematicians? 😉)

Without AC, the cardinals aren’t even well ordered so we can’t say much, other than that there are cardinals other than the beth numbers that we can’t really relate to but aren’t really distinguishable either. (This last part might be wrong.)

Here are some examples of types/sets with the cardinality of each beth number:

ℶ₀

  • The type of natural numbers ℕ, of course
  • Also the type of integers ℤ and the type of rationals ℚ
  • The type of lists with elements of countable type

ℶ₁

  • The powerset of any type of cardinality ℶ₀ (i.e. countable type)
  • The type of sequences of any countable type
  • The type of real numbers ℝ
    • Note that there are various constructions of the real numbers, but at some point they each have to use a function or powerset construction to increase the cardinality to ℶ₁, and further steps must not decrease back to ℶ₀.
    • A naïve view of real numbers as infinite sequences of decimal digits (with a couple corner cases) captures well the intuition that there has to be a process that generates these additional choices.
    • The construction of ℝ via Cauchy sequences is a quotient of a subtype of sequences of rationals (ℕ → ℚ), those sequences which are Cauchy-convergent (intuitively meaning that they eventually bunch together to point to a real number) quotiented by the relation that identifies two real numbers when the difference of their Cauchy sequences goes to 0. This assumption of Cauchy-convergence and the further quotienting does not rule out enough functions to affect the cardinality.
    • The Dedekind construction starts with powersets of rationals (ℚ → 2) and subtypes them to ensure they are well-formed Dedekind cuts, again without changing the cardinality.
  • More remarkably, the cardinality of the type of continuous functions from ℝ to ℝ is also ℶ₁. (Removing the continuity restriction would of course give a cardinality of ℶ₂.)

More examples can be formed either with the powerset construction, or by using functions, as elaborated on in the next section.

Cardinal Arithmetic

import Numeric.Natural (Natural)

-- Use arbitrary-precision naturals for the finite counts and beth indices
type Nat = Natural

-- A model of the cardinality of a type:
-- Either finite with the specified number of inhabitants,
-- or infinite, with the number of inhabitants as indexed by the beth numbers
data Card
  = Finite Nat
  | Beth Nat
  deriving (Eq, Show) -- Eq is needed as a superclass of the Ord instance below

instance Ord Card where
  compare (Finite a) (Finite b) = compare a b
  compare (Finite _) _ = LT
  compare _ (Finite _) = GT
  compare (Beth a) (Beth b) = compare a b

-- On finite arguments, sum and product work as usual;
-- otherwise, take the larger argument
-- (except that a product with the empty type is still empty)
sumCard :: Card -> Card -> Card
sumCard (Finite a) (Finite b) = Finite (a + b)
sumCard a b = max a b
prodCard :: Card -> Card -> Card
prodCard (Finite 0) _ = Finite 0
prodCard _ (Finite 0) = Finite 0
prodCard (Finite a) (Finite b) = Finite (a * b)
prodCard a b = max a b

-- funCard a b is the cardinality of (a -> b)
-- (the flipped version would be the exponential b ^ a)
-- The main rule of interest is the last one
funCard :: Card -> Card -> Card
funCard (Finite 0) _ = Finite 1 -- initial object, b ^ 0 = 1
funCard _ (Finite 1) = Finite 1 -- terminal object, 1 ^ a = 1
funCard _ (Finite 0) = Finite 0 -- 0 ^ a = 0 (a /= 0)
funCard (Finite 1) b = b -- b ^ 1 = b (b /= 0)
funCard (Finite a) (Finite b) = Finite (b ^ a)
funCard (Finite _) (Beth b) = Beth b -- Bb ^ n = Bb
funCard (Beth a) (Finite _) = Beth (a + 1)
  -- n ^ B_a = B_(a+1) (n >= 2)
funCard (Beth a) (Beth b) = Beth (max (a+1) b)
  -- B_b ^ B_(b+n) = B_(b+n+1)
  -- B_(a+n) ^ B_a = B_(a+n)  (for n >= 1)
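
A few sanity checks (using the Eq instance sketched above), matching the constructions from earlier in this section:

-- |ℕ → 2| = ℶ₁,  |(ℕ → 2) → 2| = ℶ₂,  |ℕ → ℕ| = ℶ₁
cardChecks :: Bool
cardChecks = and
  [ funCard (Beth 0) (Finite 2) == Beth 1
  , funCard (Beth 1) (Finite 2) == Beth 2
  , funCard (Beth 0) (Beth 0)   == Beth 1
  ]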