Skip to content

Instantly share code, notes, and snippets.

@gatlin
Last active September 19, 2017 23:28
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save gatlin/b0c5ed6a94831102da92 to your computer and use it in GitHub Desktop.
Save gatlin/b0c5ed6a94831102da92 to your computer and use it in GitHub Desktop.
"Church-encoded" lists and their interesting properties
===
*This is a literate Haskell program which you can run in ghci.*
*Also, I already corrected a glaring problem. Let us speak no more of it.*
One of the beautiful things about computer science is that, fundamentally,
**data is code** and **code is data**. But what does that mean? And how is it
useful?
Below I implement and explain a lazy list data structure using (almost) nothing
but boring old functions. For the most part the actual code is straightforward.
*This is often called the Church encoding but this is not strictly accurate.*
The `L a` type
---
> {-# LANGUAGE Rank2Types #-}
> import Prelude hiding ( foldr -- we will be reimplementing these,
> , foldl -- so I have excluded them.
> , map
> , filter
> , length
> , zip
> , zipWith
> )
> newtype L a = L { foldr :: forall r. (a -> r -> r) -> r -> r }
This says: "a value of type `L a` wraps a strange-looking function. Applying
`foldr` to an `L a` value will extract the strange-looking function."
To get an intuition for how this is related to lists, consider:
data List a = Cons a (List a) | Nil
Then the types of the constructors would be:
Cons :: a -> List a -> List a
Nil :: List a
Replace `List a` with `r`:
Cons :: a -> r -> r
Nil :: r
Curious. So an `L` value wraps a function which accepts two functions and
(presumably) evaluates one or the other. These two functions are oddly
reminiscent of `Cons` and `Nil`.
How does it know, though? And why the non-specific type `r`? We'll get there.
Behind the magic
---
What `newtype` has automatically generated is equivalent to the following:
type L a = forall r. (a -> r -> r) -> r -> r
L :: L a -> L a
L x = x
foldr :: L a -> (a -> r -> r) -> r -> r
foldr xs op x = xs op x
Thus `L` is a narrowly-typed identity function and `foldr` is a
narrowly-typed application function. Composing them together in expressions
shouldn't affect program semantics at all but it will give you guarantees about
type safety. This composition also opens the door to a very special
optimization.
But first let's play with them a bit.
List construction: `cons` and `nil`
---
> cons :: a -> L a -> L a
> cons x xs = L $ \c e -> c x (foldr xs c e)
Fully evaluated, this is what a cons looks like:
cons 1 nil =>
L (\c e -> c 1 (foldr (L (\c e -> e)) c e)) =>
> nil :: L a
> nil = L $ \c e -> e
Example:
> list1 :: L Int
> list1 = cons 1 $ cons 2 $ cons 3 $ cons 4 $ cons 5 $ cons 6 nil
Opening up a list
---
It's all well and good to construct lists but equally, if not more, important
is the ability to extract data from a list.
Just as we construct a list by prepending a head to a tail, we will first
write a function to split a list into its head and tail:
> split :: L a -> (Maybe a, L a)
> split xs = foldr xs f (Nothing, nil) where
> f y ys = (Just y, L (\c e ->
> maybe e (\x -> c x (foldr (snd ys) c e))
> (fst ys)))
This works because `foldr`'s job is to tear down a list and transform it into
some other type. In this case, we want to construct a tuple of `Maybe a` and
`List a`. The last argument of `foldr` is the value to use if the list is
empty.
Otherwise we pass a function which will receive two values. One is indeed the
head of the list, but the other is not quite what you might first expect. It
may seem like something out of *Bill and Ted's Excellent Adventure* but since
this operation is recursive, you must assume that the rest of the list has been
taken care of. **Exercise: why?** Thus, `ys` is already an `(Maybe a, List a)`
value.
In Lisp languages, the traditional name for the function which extracts the
first element of a list is called `car`. Since this doesn't clash with any
Haskell names it's what I'll name the next function:
> car :: L a -> a
> car xs = maybe diverge id (fst $ split xs) where
> fix f = f (fix f)
> diverge = fix id
`fix` computes the *fixpoint* of a function; that is, the value `x` for which
`f x == x`. Diverge computes the fixpoint of the `id` function; the upshot is
that `diverge` is a value which can take any type. `maybe` demands that we
supply a final argument but if the list is non-empty we will never actually
need the last element.
Keeping with our naming scheme, the function to produce the remainder of the
list is `cdr`:
> cdr :: L a -> L a
> cdr = snd . split
For giggles, let's expose a "safe" `car` variant: one which returns `Just` an
element if the list has an element, or `Nothing` otherwise:
> car' :: L a -> Maybe a
> car' = fst . split
Demonstration: a `sum` function
---
Let's get a feel for how this all works by writing a function to sum a
list of numbers.
> sum' xs = foldl xs (+) 0
Evaluated on (cons 1 (cons 2 nil)):
sum' (cons 1 (cons 2 nil)) =>
foldr (cons 1 (cons 2 nil)) (+) 0 =>
foldr (L (\c e -> c 1 (foldr (cons 2 nil) c e))) (+) 0 =>
(\c e -> c 1 (foldr (cons 2 nil) c e)) (+) 0 =>
(+) 1 ((foldr (cons 2 nil) (+) 0)) =>
(+) 1 ((foldr (L (\c e -> c 2 (foldr nil c e))) (+) 0)) =>
(+) 1 ((\c e -> c 2 (foldr nil c e)) (+) 0) =>
(+) 1 ((+) 2 (foldr nil (+) 0)) =>
(+) 1 ((+) 2 (foldr (L (\c e -> e)) (+) 0)) =>
(+) 1 ((+) 2 ((\c e -> e) (+) 0)) =>
(+) 1 ((+) 2 0) =>
(+) 1 2 =>
3
There is an important symmetry to understand here: lists are an inductive
type, and so are operations on lists. When performing some computation on a
list, you must supply a base case (analogous to `Nil`) and an inductive case
(analogous to `Cons`).
The naive encoding of the data type creates an object in memory, a *thing*, and
builds up a structure. The functional encoding, however, simply provides
possible execution paths. `L a` values are control flow operators, deciding at
every element of the list whether to continue or halt.
Thus, data is code and code is data.
Fusion
---
As for the optimization, we know that when we see `foldr (L f) x y` we can
easily rewrite it as `f x y` before we ever generate any code. `foldr` and `L`
disappear when composed correctly. This is called *fusion*. Haskell provides a
system to tell the compiler about such pairs of functions and this can result
in much more efficient programs.
Because ultimately you're not *doing* anything other than proving properties of
your program. These proofs can be discarded once verified.
Other examples
---
For completeness here are a few other traditional list functions implemented in
our scheme. As an exercise, try to figure out how and why they work.
> foldl :: L a -> (r -> a -> r) -> r -> r
> foldl bs f a = foldr bs (\b g x -> g (f x b)) id a
> map :: (a -> b) -> L a -> L b
> map f xs = foldr xs (\y ys -> cons (f y) ys) nil
> filter :: (a -> Bool) -> L a -> L a
> filter pred xs = foldr xs f nil where
> f y ys = L $ \c e -> if (pred y)
> then c y (foldr ys c e)
> else foldr ys c e
> length :: L a -> Int
> length xs = foldr xs (\_ -> (+1)) 0
And here are a few more example uses:
> isEven :: (Integral n, Eq n, Num n) => n -> Bool
> isEven x = x `mod` 2 == 0
> sum_1 = sum' list1 -- => 21
> sum_2 = sum' . map (*2) $ list1 -- => 42
> sum_3 = sum' . map (*2) . cdr $ list1 -- => 40
> sum_4 = sum' . filter isEven $ list1 -- => 12
A note on recursion.
---
One very interesting property of these functions is that none of them are
recursive. Yet, we define maps, filters, and folds which necessarily iterate
the items of a list!
Instead, these functions are *mutually recursive*. No single function calls
itself directly, but different compositions of the basic functions cause
program execution to loop as long as necessary. In this way the very concepts
of iteration and recursion are defined in terms of list processing functions.
And with tail call optimization and sibling call optimization, the compiler can
generate machine code which takes up constant space.
Now if only there were some language dedicated to LISt Processing, I
bet it would be pretty powerful ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment