Skip to content

Instantly share code, notes, and snippets.

@gatlin
Last active August 29, 2015 14:06
Show Gist options
  • Save gatlin/2b9f0e23b90a25de6c8d to your computer and use it in GitHub Desktop.
Save gatlin/2b9f0e23b90a25de6c8d to your computer and use it in GitHub Desktop.
What is a Church encoded data type?
===
I wrote a [post][listpost] about a really cool concept in computer science and
programming languages, but I fear that I may have gotten ahead of myself. It
turns out, Haskell is not so well known and notation was proving to be a
problem.
That and the concept is kind of a mind-bender.
So, this post is going to explain some notational things and then give a much,
much simpler example
The language extension
---
> {-# LANGUAGE Rank2Types #-}
I promise I will explain this below. Pinky-swear.
Syntax primer: Data types in Haskell
---
So Haskell has these things called *algebraic data types*. The name is much,
much more intense-sounding than it actually is. An ADT is a way to create a
compound structure like you would in more mainstream languages.
Here is a simple example:
> data Person = Person String Int
This says that a `Person` is a pair of a `String` and an `Int`, and that you
can create a `Person` like so:
> p1 = Person "gatlin" 25
In math terms, `Person` is a *product type*. The set of all `Person`s is a
Cartesian product of `String`s and `Int`s.
How do we get the data out of a `Person`, though? One way is through *pattern
matching* functions:
> getPersonName (Person name age) = name
> getPersonAge (Person name age) = age
*Voila.* You can also define your type this way, though:
> data BetterPerson = BetterPerson { name :: String
> , age :: Int
> }
> p2 = BetterPerson { name = "dorian", age = 23 }
> flatter :: BetterPerson -> String
> flatter bp = "gosh darn " ++ (name bp) ++ " sure is great"
All well and good. The final important thing about ADTs though is that you can
specify multiple different alternatives, or *constructors*. Example:
> data USPerson = Human { hName :: String, hAge :: Int }
> | Corporation { cName :: String
> , cTaxId :: String }
> insult :: USPerson -> String
> insult (Human hName hAge) = hName ++ " is a bad person"
> insult (Corporation cName cTaxId) = cName ++ " is also (legally, a bad person"
The `|` creates a *sum* type. Because `USPerson` is a *sum* of *product* types,
it is said to be an *algebraic* data type.
But what matters most is an ADT is basically a struct-thing in Haskell with
multiple representations.
One more little syntax thing ...
---
You can make ADTs that are *polymorphic*. An example will make it clear:
> data AnyPerson a = AnyPerson a
> p3 = AnyPerson p1
> p4 = AnyPerson p2
Strictly speaking `AnyPerson` is not a type - it requires some helper type to
become real. `p3` is an `AnyPerson Person` and `p4` is an `AnyPerson
BetterPerson`.
Now that you know enough Haskell syntax, let's start the show.
Let's create a Boolean!
---
The simple way:
> data NBoolean = NTrue | NFalse
Congrats! You made a boolean type. Let's make some useful functions for it:
> ifN (NTrue) _then _ = _then
> ifN (NFalse) _ _else = _else
And some contrived examples:
> gt x y = if x > y then NTrue else NFalse
> lt x y = gt y x
> eq x y = if x == y then NTrue else NFalse
> ex1 = ifN (4 `gt` 2) "4 is greater than 2" "run; math is broken"
`ex1` will return "4 is greater than 2".
Types
---
`NTrue` and `NFalse` (and `True` and `False`, the actual Haskell booleans) are
all just functions. *All* value constructors, in fact, are just functions.
They even have types:
True :: Bool
False :: Bool
NTrue :: NBoolean
NFalse :: NBoolean
They accept no arguments and return their one type. Okay, that's cool. Keep
that in mind.
Let's create a second Boolean
---
Alright:
> newtype Boolean = B { if' :: forall a. a -> a -> a }
Whoa, what? Let's break it down:
- `newtype` is where `data` should be, right?
- But sure, `Boolean` is a new type, okay.
- `B` is a value constructor containing some awful nasty type. Sure ...
- `if'` is the accessor function, got it ...
- what in the sam hill is `forall` ?! Why is it shooting arrows?
Okay, breathe.
- `newtype` is identical to `data` except you can only have one constructor
with one field. It has subtle magical properties I won't go into yet.
- `->` means that the type is a function, is all.
- `forall` is for when you want to use a type variable, but you didn't mention
it before the `=`. The `Rank2Types` thing at the top of this file enables
`forall`.
*Why not just mention it before the `=`?*
Because a boolean value *itself* does not depend on any other type. But, as it
will hopefully become clear, it must be aware of other types.
For now, let me make some useful utilities:
> true = B pickFirst where pickFirst t f = t
> false = B pickSecond where pickSecond t f = f
`true` is a function of two variables which returns the first one, wrapped in a
`Boolean`; `false` is analogous.
*Syntax note*: I would normally write these as `true = B $ \t f -> t`, etc but
I really don't want to keep making this a Haskell syntax tutorial.
What can we do with these? As the names suggest ...
> x @> y = if x > y then true else false
> x @< y = y @> x
> x @== y = if x == y then true else false
> ex2 = if' (4 @> 2) "4 is greater than 2" "run; math is broken"
So it's the exact same, but more confusing. What? Why?
`B` and `if'` are both functions. Here is (approximately) what they would look
like:
B :: Boolean -> Boolean
B x = x
if' :: forall a. Boolean -> a -> a -> a
if' (B c) t e = c t e
`if'`'s first argument is a `Boolean`. A `Boolean` is just a wrapper around a
selector function. And `if'` just gives its remaining arguments to the selector
function.
Ready your mind for being blown
---
In essence, while they look and act like *things* they don't exist at all.
Making a function that builds up a Church encoded value like this is actually
just composing functions together in a pipeline, and a smart compiler can
optimize it to the point of *never actually creating anything.* The calls to
`B` and `if'` can be erased as well, so while in your code you get a `Boolean`
value and carry it around until you need to make a decision, the compiled
program will be able to stitch your functions together more intelligently.
Why bother with `B` and `if'` if they're just identity / application functions,
and will get erased anyway? Because they're simple functions with particular
types that allow the compiler to verify correctness and give you intelligent
insight, without changing your program semantics in an unintended way.
*Data is code, code is data.* Much of the time, what seem like data structures
are really control structures. You can (must?) make decisions based on data,
and you can store different data based on decisions made. It's intuitively
obvious.
And sometimes you can use this relation for optimization.
So in my [other post][listpost], I take this idea and extend it to lists.
[listpost]: https://gist.github.com/gatlin/b0c5ed6a94831102da92
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment