Last active
August 29, 2015 14:06
-
-
Save gatlin/2b9f0e23b90a25de6c8d to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
What is a Church encoded data type? | |
=== | |
I wrote a [post][listpost] about a really cool concept in computer science and | |
programming languages, but I fear that I may have gotten ahead of myself. It | |
turns out, Haskell is not so well known and notation was proving to be a | |
problem. | |
That and the concept is kind of a mind-bender. | |
So, this post is going to explain some notational things and then give a much, | |
much simpler example | |
The language extension | |
--- | |
> {-# LANGUAGE Rank2Types #-} | |
I promise I will explain this below. Pinky-swear. | |
Syntax primer: Data types in Haskell | |
--- | |
So Haskell has these things called *algebraic data types*. The name is much, | |
much more intense-sounding than it actually is. An ADT is a way to create a | |
compound structure like you would in more mainstream languages. | |
Here is a simple example: | |
> data Person = Person String Int | |
This says that a `Person` is a pair of a `String` and an `Int`, and that you | |
can create a `Person` like so: | |
> p1 = Person "gatlin" 25 | |
In math terms, `Person` is a *product type*. The set of all `Person`s is a | |
Cartesian product of `String`s and `Int`s. | |
How do we get the data out of a `Person`, though? One way is through *pattern | |
matching* functions: | |
> getPersonName (Person name age) = name | |
> getPersonAge (Person name age) = age | |
*Voila.* You can also define your type this way, though: | |
> data BetterPerson = BetterPerson { name :: String | |
> , age :: Int | |
> } | |
> p2 = BetterPerson { name = "dorian", age = 23 } | |
> flatter :: BetterPerson -> String | |
> flatter bp = "gosh darn " ++ (name bp) ++ " sure is great" | |
All well and good. The final important thing about ADTs though is that you can | |
specify multiple different alternatives, or *constructors*. Example: | |
> data USPerson = Human { hName :: String, hAge :: Int } | |
> | Corporation { cName :: String | |
> , cTaxId :: String } | |
> insult :: USPerson -> String | |
> insult (Human hName hAge) = hName ++ " is a bad person" | |
> insult (Corporation cName cTaxId) = cName ++ " is also (legally, a bad person" | |
The `|` creates a *sum* type. Because `USPerson` is a *sum* of *product* types, | |
it is said to be an *algebraic* data type. | |
But what matters most is an ADT is basically a struct-thing in Haskell with | |
multiple representations. | |
One more little syntax thing ... | |
--- | |
You can make ADTs that are *polymorphic*. An example will make it clear: | |
> data AnyPerson a = AnyPerson a | |
> p3 = AnyPerson p1 | |
> p4 = AnyPerson p2 | |
Strictly speaking `AnyPerson` is not a type - it requires some helper type to | |
become real. `p3` is an `AnyPerson Person` and `p4` is an `AnyPerson | |
BetterPerson`. | |
Now that you know enough Haskell syntax, let's start the show. | |
Let's create a Boolean! | |
--- | |
The simple way: | |
> data NBoolean = NTrue | NFalse | |
Congrats! You made a boolean type. Let's make some useful functions for it: | |
> ifN (NTrue) _then _ = _then | |
> ifN (NFalse) _ _else = _else | |
And some contrived examples: | |
> gt x y = if x > y then NTrue else NFalse | |
> lt x y = gt y x | |
> eq x y = if x == y then NTrue else NFalse | |
> ex1 = ifN (4 `gt` 2) "4 is greater than 2" "run; math is broken" | |
`ex1` will return "4 is greater than 2". | |
Types | |
--- | |
`NTrue` and `NFalse` (and `True` and `False`, the actual Haskell booleans) are | |
all just functions. *All* value constructors, in fact, are just functions. | |
They even have types: | |
True :: Bool | |
False :: Bool | |
NTrue :: NBoolean | |
NFalse :: NBoolean | |
They accept no arguments and return their one type. Okay, that's cool. Keep | |
that in mind. | |
Let's create a second Boolean | |
--- | |
Alright: | |
> newtype Boolean = B { if' :: forall a. a -> a -> a } | |
Whoa, what? Let's break it down: | |
- `newtype` is where `data` should be, right? | |
- But sure, `Boolean` is a new type, okay. | |
- `B` is a value constructor containing some awful nasty type. Sure ... | |
- `if'` is the accessor function, got it ... | |
- what in the sam hill is `forall` ?! Why is it shooting arrows? | |
Okay, breathe. | |
- `newtype` is identical to `data` except you can only have one constructor | |
with one field. It has subtle magical properties I won't go into yet. | |
- `->` means that the type is a function, is all. | |
- `forall` is for when you want to use a type variable, but you didn't mention | |
it before the `=`. The `Rank2Types` thing at the top of this file enables | |
`forall`. | |
*Why not just mention it before the `=`?* | |
Because a boolean value *itself* does not depend on any other type. But, as it | |
will hopefully become clear, it must be aware of other types. | |
For now, let me make some useful utilities: | |
> true = B pickFirst where pickFirst t f = t | |
> false = B pickSecond where pickSecond t f = f | |
`true` is a function of two variables which returns the first one, wrapped in a | |
`Boolean`; `false` is analogous. | |
*Syntax note*: I would normally write these as `true = B $ \t f -> t`, etc but | |
I really don't want to keep making this a Haskell syntax tutorial. | |
What can we do with these? As the names suggest ... | |
> x @> y = if x > y then true else false | |
> x @< y = y @> x | |
> x @== y = if x == y then true else false | |
> ex2 = if' (4 @> 2) "4 is greater than 2" "run; math is broken" | |
So it's the exact same, but more confusing. What? Why? | |
`B` and `if'` are both functions. Here is (approximately) what they would look | |
like: | |
B :: Boolean -> Boolean | |
B x = x | |
if' :: forall a. Boolean -> a -> a -> a | |
if' (B c) t e = c t e | |
`if'`'s first argument is a `Boolean`. A `Boolean` is just a wrapper around a | |
selector function. And `if'` just gives its remaining arguments to the selector | |
function. | |
Ready your mind for being blown | |
--- | |
In essence, while they look and act like *things* they don't exist at all. | |
Making a function that builds up a Church encoded value like this is actually | |
just composing functions together in a pipeline, and a smart compiler can | |
optimize it to the point of *never actually creating anything.* The calls to | |
`B` and `if'` can be erased as well, so while in your code you get a `Boolean` | |
value and carry it around until you need to make a decision, the compiled | |
program will be able to stitch your functions together more intelligently. | |
Why bother with `B` and `if'` if they're just identity / application functions, | |
and will get erased anyway? Because they're simple functions with particular | |
types that allow the compiler to verify correctness and give you intelligent | |
insight, without changing your program semantics in an unintended way. | |
*Data is code, code is data.* Much of the time, what seem like data structures | |
are really control structures. You can (must?) make decisions based on data, | |
and you can store different data based on decisions made. It's intuitively | |
obvious. | |
And sometimes you can use this relation for optimization. | |
So in my [other post][listpost], I take this idea and extend it to lists. | |
[listpost]: https://gist.github.com/gatlin/b0c5ed6a94831102da92 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment