gatlin/list.lhs

## list.lhs
"Church-encoded" lists and their interesting properties
===

*This is a literate Haskell program which you can run in ghci.*
*Also, I already corrected a glaring problem. Let us speak no more of it.*

One of the beautiful things about computer science is that, fundamentally,
**data is code** and **code is data**. But what does that mean? And how is it
useful?

Below I implement and explain a lazy list data structure using (almost) nothing
but boring old functions. For the most part the actual code is straightforward.

*This is often called the Church encoding but this is not strictly accurate.*

The `L a` type
---

> {-# LANGUAGE Rank2Types #-}

> import Prelude hiding ( foldr  -- we will be reimplementing these,
>                       , foldl  -- so I have excluded them.
>                       , map
>                       , filter
>                       , length
>                       , zip
>                       , zipWith
>                       )

> newtype L a = L { foldr :: forall r. (a -> r -> r) -> r -> r }

This says: "a value of type `L a` wraps a strange-looking function. Applying
`foldr` to an `L a` value will extract the strange-looking function."

To get an intuition for how this is related to lists, consider:

    data List a = Cons a (List a) | Nil

Then the types of the constructors would be:

    Cons :: a -> List a -> List a
    Nil  :: List a

Replace `List a` with `r`:

    Cons :: a -> r -> r
    Nil  :: r

Curious. So an `L` value wraps a function which accepts two functions and
(presumably) evaluates one or the other. These two functions are oddly
reminiscent of `Cons` and `Nil`.

How does it know, though? And why the non-specific type `r`? We'll get there.

Behind the magic
---

What `newtype` has automatically generated is equivalent to the following:

    type L a = forall r. (a -> r -> r) -> r -> r

    L :: L a -> L a
    L x = x

    foldr :: L a -> (a -> r -> r) -> r -> r
    foldr xs op x = xs op x

Thus `L` is a narrowly-typed identity function and `foldr` is a
narrowly-typed application function. Composing them together in expressions
shouldn't affect program semantics at all but it will give you guarantees about
type safety. This composition also opens the door to a very special
optimization.

But first let's play with them a bit.

List construction: `cons` and `nil`
---

> cons :: a -> L a -> L a
> cons x xs = L $ \c e -> c x (foldr xs c e)

Fully evaluated, this is what a cons looks like:

    cons 1 nil                                  =>
    L (\c e -> c 1 (foldr (L (\c e -> e)) c e)) =>

> nil :: L a
> nil = L $ \c e -> e

Example:

> list1 :: L Int
> list1 = cons 1 $ cons 2 $ cons 3 $ cons 4 $ cons 5 $ cons 6 nil

Opening up a list
---

It's all well and good to construct lists but equally, if not more, important
is the ability to extract data from a list.

Just as we construct a list by prepending a head to a tail, we will first
write a function to split a list into its head and tail:

> split :: L a -> (Maybe a, L a)
> split xs = foldr xs f (Nothing, nil) where
>     f y ys = (Just y, L (\c e ->
>         maybe e (\x -> c x (foldr (snd ys) c e))
>                (fst ys)))

This works because `foldr`'s job is to tear down a list and transform it into
some other type. In this case, we want to construct a tuple of `Maybe a` and
`List a`. The last argument of `foldr` is the value to use if the list is
empty.

Otherwise we pass a function which will receive two values. One is indeed the
head of the list, but the other is not quite what you might first expect. It
may seem like something out of *Bill and Ted's Excellent Adventure* but since
this operation is recursive, you must assume that the rest of the list has been
taken care of. **Exercise: why?** Thus, `ys` is already an `(Maybe a, List a)`
value.

In Lisp languages, the traditional name for the function which extracts the
first element of a list is called `car`. Since this doesn't clash with any
Haskell names it's what I'll name the next function:

> car :: L a -> a
> car xs = maybe diverge id (fst $ split xs) where
>     fix f = f (fix f)
>     diverge = fix id

`fix` computes the *fixpoint* of a function; that is, the value `x` for which
`f x == x`. Diverge computes the fixpoint of the `id` function; the upshot is
that `diverge` is a value which can take any type. `maybe` demands that we
supply a final argument but if the list is non-empty we will never actually
need the last element.

Keeping with our naming scheme, the function to produce the remainder of the
list is `cdr`:

> cdr :: L a -> L a
> cdr = snd . split

For giggles, let's expose a "safe" `car` variant: one which returns `Just` an
element if the list has an element, or `Nothing` otherwise:

> car' :: L a -> Maybe a
> car' = fst . split


Demonstration: a `sum` function
---

Let's get a feel for how this all works by writing a function to sum a
list of numbers.

> sum' xs = foldl xs (+) 0

Evaluated on (cons 1 (cons 2 nil)):

    sum' (cons 1 (cons 2 nil))                              =>
    foldr (cons 1 (cons 2 nil)) (+) 0                       =>
    foldr (L (\c e -> c 1 (foldr (cons 2 nil) c e))) (+) 0  =>
    (\c e -> c 1 (foldr (cons 2 nil) c e)) (+) 0            =>
    (+) 1 ((foldr (cons 2 nil) (+) 0))                      =>
    (+) 1 ((foldr (L (\c e -> c 2 (foldr nil c e))) (+) 0)) =>
    (+) 1 ((\c e -> c 2 (foldr nil c e)) (+) 0)             =>
    (+) 1 ((+) 2 (foldr nil (+) 0))                         =>
    (+) 1 ((+) 2 (foldr (L (\c e -> e)) (+) 0))             =>
    (+) 1 ((+) 2 ((\c e -> e) (+) 0))                       =>
    (+) 1 ((+) 2 0)                                         =>
    (+) 1 2                                                 =>
    3

There is an important symmetry to understand here: lists are an inductive
type, and so are operations on lists. When performing some computation on a
list, you must supply a base case (analogous to `Nil`) and an inductive case
(analogous to `Cons`).

The naive encoding of the data type creates an object in memory, a *thing*, and
builds up a structure. The functional encoding, however, simply provides
possible execution paths. `L a` values are control flow operators, deciding at
every element of the list whether to continue or halt.

Thus, data is code and code is data.

Fusion
---

As for the optimization, we know that when we see `foldr (L f) x y` we can
easily rewrite it as `f x y` before we ever generate any code. `foldr` and `L`
disappear when composed correctly. This is called *fusion*. Haskell provides a
system to tell the compiler about such pairs of functions and this can result
in much more efficient programs.

Because ultimately you're not *doing* anything other than proving properties of
your program. These proofs can be discarded once verified.

Other examples
---

For completeness here are a few other traditional list functions implemented in
our scheme. As an exercise, try to figure out how and why they work.

> foldl :: L a -> (r -> a -> r) -> r -> r
> foldl bs f a = foldr bs (\b g x -> g (f x b)) id a

> map :: (a -> b) -> L a -> L b
> map f xs = foldr xs (\y ys -> cons (f y) ys) nil

> filter :: (a -> Bool) -> L a -> L a
> filter pred xs = foldr xs f nil where
>     f y ys = L $ \c e -> if (pred y)
>                            then c y (foldr ys c e)
>                            else foldr ys c e

> length :: L a -> Int
> length xs = foldr xs (\_ -> (+1)) 0

And here are a few more example uses:

> isEven :: (Integral n, Eq n, Num n) => n -> Bool
> isEven x = x `mod` 2 == 0

> sum_1 = sum' list1 -- => 21
> sum_2 = sum' . map (*2) $ list1 -- => 42
> sum_3 = sum' . map (*2) . cdr $ list1 -- => 40
> sum_4 = sum' . filter isEven $ list1 -- => 12

A note on recursion.
---

One very interesting property of these functions is that none of them are
recursive. Yet, we define maps, filters, and folds which necessarily iterate
the items of a list!

Instead, these functions are *mutually recursive*. No single function calls
itself directly, but different compositions of the basic functions cause
program execution to loop as long as necessary. In this way the very concepts
of iteration and recursion are defined in terms of list processing functions.

And with tail call optimization and sibling call optimization, the compiler can
generate machine code which takes up constant space.

Now if only there were some language dedicated to LISt Processing, I
bet it would be pretty powerful ...
	"Church-encoded" lists and their interesting properties
	===

	This is a literate Haskell program which you can run in ghci.
	Also, I already corrected a glaring problem. Let us speak no more of it.

	One of the beautiful things about computer science is that, fundamentally,
	data is code and code is data. But what does that mean? And how is it
	useful?

	Below I implement and explain a lazy list data structure using (almost) nothing
	but boring old functions. For the most part the actual code is straightforward.

	This is often called the Church encoding but this is not strictly accurate.

	The `L a` type
	---

	> {-# LANGUAGE Rank2Types #-}

	> import Prelude hiding ( foldr -- we will be reimplementing these,
	> , foldl -- so I have excluded them.
	> , map
	> , filter
	> , length
	> , zip
	> , zipWith
	> )

	> newtype L a = L { foldr :: forall r. (a -> r -> r) -> r -> r }

	This says: "a value of type `L a` wraps a strange-looking function. Applying
	`foldr` to an `L a` value will extract the strange-looking function."

	To get an intuition for how this is related to lists, consider:

	data List a = Cons a (List a) \| Nil

	Then the types of the constructors would be:

	Cons :: a -> List a -> List a
	Nil :: List a

	Replace `List a` with `r`:

	Cons :: a -> r -> r
	Nil :: r

	Curious. So an `L` value wraps a function which accepts two functions and
	(presumably) evaluates one or the other. These two functions are oddly
	reminiscent of `Cons` and `Nil`.

	How does it know, though? And why the non-specific type `r`? We'll get there.

	Behind the magic
	---

	What `newtype` has automatically generated is equivalent to the following:

	type L a = forall r. (a -> r -> r) -> r -> r

	L :: L a -> L a
	L x = x

	foldr :: L a -> (a -> r -> r) -> r -> r
	foldr xs op x = xs op x

	Thus `L` is a narrowly-typed identity function and `foldr` is a
	narrowly-typed application function. Composing them together in expressions
	shouldn't affect program semantics at all but it will give you guarantees about
	type safety. This composition also opens the door to a very special
	optimization.

	But first let's play with them a bit.

	List construction: `cons` and `nil`
	---

	> cons :: a -> L a -> L a
	> cons x xs = L $ \c e -> c x (foldr xs c e)

	Fully evaluated, this is what a cons looks like:

	cons 1 nil =>
	L (\c e -> c 1 (foldr (L (\c e -> e)) c e)) =>

	> nil :: L a
	> nil = L $ \c e -> e

	Example:

	> list1 :: L Int
	> list1 = cons 1 $ cons 2 $ cons 3 $ cons 4 $ cons 5 $ cons 6 nil

	Opening up a list
	---

	It's all well and good to construct lists but equally, if not more, important
	is the ability to extract data from a list.

	Just as we construct a list by prepending a head to a tail, we will first
	write a function to split a list into its head and tail:

	> split :: L a -> (Maybe a, L a)
	> split xs = foldr xs f (Nothing, nil) where
	> f y ys = (Just y, L (\c e ->
	> maybe e (\x -> c x (foldr (snd ys) c e))
	> (fst ys)))

	This works because `foldr`'s job is to tear down a list and transform it into
	some other type. In this case, we want to construct a tuple of `Maybe a` and
	`List a`. The last argument of `foldr` is the value to use if the list is
	empty.

	Otherwise we pass a function which will receive two values. One is indeed the
	head of the list, but the other is not quite what you might first expect. It
	may seem like something out of Bill and Ted's Excellent Adventure but since
	this operation is recursive, you must assume that the rest of the list has been
	taken care of. Exercise: why? Thus, `ys` is already an `(Maybe a, List a)`
	value.

	In Lisp languages, the traditional name for the function which extracts the
	first element of a list is called `car`. Since this doesn't clash with any
	Haskell names it's what I'll name the next function:

	> car :: L a -> a
	> car xs = maybe diverge id (fst $ split xs) where
	> fix f = f (fix f)
	> diverge = fix id

	`fix` computes the fixpoint of a function; that is, the value `x` for which
	`f x == x`. Diverge computes the fixpoint of the `id` function; the upshot is
	that `diverge` is a value which can take any type. `maybe` demands that we
	supply a final argument but if the list is non-empty we will never actually
	need the last element.

	Keeping with our naming scheme, the function to produce the remainder of the
	list is `cdr`:

	> cdr :: L a -> L a
	> cdr = snd . split

	For giggles, let's expose a "safe" `car` variant: one which returns `Just` an
	element if the list has an element, or `Nothing` otherwise:

	> car' :: L a -> Maybe a
	> car' = fst . split


	Demonstration: a `sum` function
	---

	Let's get a feel for how this all works by writing a function to sum a
	list of numbers.

	> sum' xs = foldl xs (+) 0

	Evaluated on (cons 1 (cons 2 nil)):

	sum' (cons 1 (cons 2 nil)) =>
	foldr (cons 1 (cons 2 nil)) (+) 0 =>
	foldr (L (\c e -> c 1 (foldr (cons 2 nil) c e))) (+) 0 =>
	(\c e -> c 1 (foldr (cons 2 nil) c e)) (+) 0 =>
	(+) 1 ((foldr (cons 2 nil) (+) 0)) =>
	(+) 1 ((foldr (L (\c e -> c 2 (foldr nil c e))) (+) 0)) =>
	(+) 1 ((\c e -> c 2 (foldr nil c e)) (+) 0) =>
	(+) 1 ((+) 2 (foldr nil (+) 0)) =>
	(+) 1 ((+) 2 (foldr (L (\c e -> e)) (+) 0)) =>
	(+) 1 ((+) 2 ((\c e -> e) (+) 0)) =>
	(+) 1 ((+) 2 0) =>
	(+) 1 2 =>
	3

	There is an important symmetry to understand here: lists are an inductive
	type, and so are operations on lists. When performing some computation on a
	list, you must supply a base case (analogous to `Nil`) and an inductive case
	(analogous to `Cons`).

	The naive encoding of the data type creates an object in memory, a thing, and
	builds up a structure. The functional encoding, however, simply provides
	possible execution paths. `L a` values are control flow operators, deciding at
	every element of the list whether to continue or halt.

	Thus, data is code and code is data.

	Fusion
	---

	As for the optimization, we know that when we see `foldr (L f) x y` we can
	easily rewrite it as `f x y` before we ever generate any code. `foldr` and `L`
	disappear when composed correctly. This is called fusion. Haskell provides a
	system to tell the compiler about such pairs of functions and this can result
	in much more efficient programs.

	Because ultimately you're not doing anything other than proving properties of
	your program. These proofs can be discarded once verified.

	Other examples
	---

	For completeness here are a few other traditional list functions implemented in
	our scheme. As an exercise, try to figure out how and why they work.

	> foldl :: L a -> (r -> a -> r) -> r -> r
	> foldl bs f a = foldr bs (\b g x -> g (f x b)) id a

	> map :: (a -> b) -> L a -> L b
	> map f xs = foldr xs (\y ys -> cons (f y) ys) nil

	> filter :: (a -> Bool) -> L a -> L a
	> filter pred xs = foldr xs f nil where
	> f y ys = L $ \c e -> if (pred y)
	> then c y (foldr ys c e)
	> else foldr ys c e

	> length :: L a -> Int
	> length xs = foldr xs (\_ -> (+1)) 0

	And here are a few more example uses:

	> isEven :: (Integral n, Eq n, Num n) => n -> Bool
	> isEven x = x `mod` 2 == 0

	> sum_1 = sum' list1 -- => 21
	> sum_2 = sum' . map (*2) $ list1 -- => 42
	> sum_3 = sum' . map (*2) . cdr $ list1 -- => 40
	> sum_4 = sum' . filter isEven $ list1 -- => 12

	A note on recursion.
	---

	One very interesting property of these functions is that none of them are
	recursive. Yet, we define maps, filters, and folds which necessarily iterate
	the items of a list!

	Instead, these functions are mutually recursive. No single function calls
	itself directly, but different compositions of the basic functions cause
	program execution to loop as long as necessary. In this way the very concepts
	of iteration and recursion are defined in terms of list processing functions.

	And with tail call optimization and sibling call optimization, the compiler can
	generate machine code which takes up constant space.

	Now if only there were some language dedicated to LISt Processing, I
	bet it would be pretty powerful ...