NOTE: This is an article by Chris Taylor, originally posted on his blog, but the blog was removed at some point. I resurected it from the WayBack Machine without the authors permission, but post it here for posterity's sake, so others may be able to find it.
The Algebra of Algebraic Data Types, Part 1
by Chris Taylor
FEB 10TH, 2013
I gave a talk on this subject at London Haskell in November 2012. A video of the talk is on YouTube and slides are on GitHub.
In this series of posts I’ll explain why Haskell’s data types are called algebraic - without mentioning category theory or advanced math.
The algebra you learned in high school starts with numbers (e.g. 1, 2, 3 …) and operators (e.g. addition and multiplication). The operators give you a way to combine numbers and make new numbers from them. For example, combining 1 and 2 with the operation of addition gives you another number, 3 - a fact that we normally express as
When you get a little older you are introduced to variables (e.g. x, y, z …) which can stand for numbers. Further still, and you learn about the laws that algebra obeys. Laws like
which hold for all values of x. There are other laws as well, which define properties of numbers or of operations.
When mathematicans talk about algebra, they mean something more general than this. A mathematical algebra has three parts:
- Objects are the ‘things’ of the algebra. The collection of objects defines what we’re talking about.
- Operations give us ways to combine old things to make new things.
- Laws are relationships between the objects and the operations.
In high school algebra the objects are numbers and the operations are addition, multiplication and friends.
The algebra of Haskell types
In the algebra of Haskell types, the objects are types, for example
Int. The operators take types that already exist, and generate new types from them. An example is the type constructor
Maybe. It’s not a type itself, but you use it to create types - for example
Maybe Bool and
Maybe Int, which are types. Another example is
Either, which creates a new type from two old types - for example
Either Int Bool.
A connection to the more familiar algebra of numbers can be seen by counting the possible values that a type has. Take
Bool, defined by
data Bool = False | True
There are two values that an object of type
Bool can have – it is either
True (technically it could also be
undefined – a fact that I’m going to ignore for the rest of the post). Loosely, the type
Bool corresponds to the number ‘2’ in the algebra of numbers.
Bool is 2, then what is 1? It should be a type with only one value. In the computer science literature such a type is often called
Unit and defined as
data Unit = Unit
In Haskell there is already a type with only one value - it’s called
() (pronounced “unit”). You can’t define it yourself, but if you could it would look like
data () = ()
Using this counting analogy,
Int corresponds to the number 2^32, as this is the number of values of type
Int (at least, it is on my machine).
In principle we could types corresponding to 3, 4, 5 and so on. Sometimes we might have a genuine need to do this - for example, the type corresponding to 7 is useful for encoding days of the week. But it would be nicer if we could build up new types from old. This is where the operators of the algebra come in.
A type corresponding to addition is
data Add a b = AddL a | AddR b
That is, the type
a + b is a tagged union, holding either an
a or a
b. To see why this corresponds to addition, we can revisit the counting argument. Let’s say that
(), so that there are 2 values
a and 1 value for
b. How many values of type
Add Bool () are there? We can list them out:
addValues = [AddL False, AddL True, AddR ()]
There are three values, and 3 = 2 + 1. This is often called a sum type. In Haskell the sum type is often called
Either, defined as
data Either a b = Left a | Right b
but I’ll stick with
A type corresponding to multiplication is
data Mul a b = Mul a b
That is, the type
a · b is a container holding both an
a and a
b. The counting argument justifies the correspondence with multiplication - if we fix
b to both be
Bool, the possible values of the type
Mul Bool Bool are
mulValues = [Mul False False, Mul False True, Mul True False, Mul True True]
There are four values, and 4 = 2 x 2. This is often called a product type. In Haskell the product is the pair type:
data (,) a b = (a, b)
but I’ll stick with Mul.
Using addition and multiplication we can generate types corresponding to all the numbers from 1 upwards - but what about 0? That would be a type with no values. It sounds odd, but you can define such a type:
Notice that there are no constructors in the data definition, so you can’t ever construct a value of type Void – it has zero values, just as we wanted!
Laws in the algebra of Haskell types
What are the laws for the types we’ve just defined? Just like in the algebra of numbers, a law will assert the equality of two objects - in our case, the objects will be types.
However, when I talk about equality, I don’t mean Haskell equality, in the sense of the
(==) function. Instead, I mean that the two types are in one-to-one correspondence – that is, when I say that two types a and b are equal, I mean that you could write two functions
from :: a -> b to :: b -> a
that pair up values of a with values of b, so that the following equations always hold (here the
== is genuine, Haskell-flavored equality):
to (from a) == a from (to b) == b
For example, I contend that the types
Add () () are equivalent. I can demonstrate the equivalence with the following functions:
to :: Bool -> Add () () to False = AddL () to True = AddR ()
from :: Add () () -> Bool from (AddL _) = False from (AddR _) = True
I’ll use the triple equality symbol,
===, to denote this kind of equivalence between types.
Laws for sum types
Here are two laws for addition:
Add Void a === a
which says that there are as many values of type
Add Void a as there are of type a, and
Add a b === Add b a
which says that it doesn’t matter which order you add things in. These laws are probably more familiar to you in the algebra of numbers as
0 + x = x x + y = y + x
If you fancy an exercise, you can demonstrate the correctness of the laws in the Haskell algebra – either with a counting argument, or by writing the functions from and to.
Laws for product types
There are three useful laws for multiplication:
Mul Void a === Void
which says that if you multiply anything by Void, you get Void back,
Mul () a === a
which says that if you multiply by () you don’t change anything, and
Mul a b === Mul b a
which says that it doesn’t matter which order you multiply in. The more familiar forms of these laws are
0 ⋅ x = 0 1 ⋅ x = x x ⋅ y = y ⋅ x
Two more exercises: (i) prove the validity of these laws in the Haskell algebra, and (ii) explain why we don’t need laws of the form
Mul a Void === Void Mul a () === a
There’s also a law that relates the addition and multiplication operators:
Mul a (Add b c) === Add (Mul a b) (Mul a c)
This one is a bit trickier to reason about, but writing the corresponding from and to functions isn’t too hard. The arithmetic version of this law is the friendlier-looking
a ⋅ (b + c) = a ⋅ b + a ⋅ c
called the distributive law.
As well as concrete types like
Bool, in Haskell you also have function types, like
Int -> Bool or
Double -> String. How do these fit into the algebra?
To figure this out we can go back to the counting argument. How many functions of type
a → b are there?
Let’s be concrete, and fix a and b to both be
Bool. The value
False can map to either
True, and similarly for the value
True - thus there are
2 ⋅ 2 = 2^2 = 4 possible functions
Bool -> Bool. To be really explicit, we could enumerate them:
f1 :: Bool -> Bool -- equivalent to 'id' f1 True = True f1 False = False f2 :: Bool -> Bool -- equivalent to 'const False' f2 _ = False f3 :: Bool -> Bool -- equivalent to 'const True' f3 _ = True f4 :: Bool -> Bool -- equivalent to 'not' f4 True = False f4 False = True
What happens if b is still
Bool (with two values) and a is a type with three values, say
data Trio = First | Second | Third
Then each of
Third can map to two possible values, and in total there are
2 ⋅ 2 ⋅ 2 = 2^3 = 8 functions of type
Trio -> Bool.
The same argument holds in general. If there are A values of type a, and B values of type b, then the number of values of type
a → b is
This justifies the common terminology for function types as exponential types.
Laws for functions
There are two laws for function types that involve the unit type. They are
() -> a === a
which says that there are as many functions
() -> a as there are values of type a, and
a -> () === ()
which says that there is only one function
a -> () – in particular, it is
const (). The arithmetic versions of these laws are
a^1 = a 1^a = 1
There is also a law that allows factoring out of common arguments:
(a -> b, a -> c) === a -> (b,c)
whose arithmetic form is
b^a ⋅ c^a = (bc)^a
and a law about functions that return other functions:
a -> (b -> c) === (b,a) -> c
whose arithmetic form is
(c^b)^a = c^(b ⋅ a)
This last law may be more familiar when the order of the variables in the pair on the right-hand side is switched, and the parens on the left hand side are removed:
a -> b -> c === (a,b) -> c
which just says that we can curry and uncurry functions. Again, it’s an interesting exercise to prove all of these laws by writing the corresponding to and from functions.
In the next post I’ll look at recursive types, like lists and binary trees, and show how you can abuse the algebra of types in all kinds of ways to deduce interesting facts about types.