jcoglan/lambda-the-ultimate.md

## lambda-the-ultimate.md

      
    Raw
  

              lambda-the-ultimate.md
            
          
    Lambda the Ultimate

On page 137 of The Little Schemer, this rather convoluted function appears:
(define multirember&co
  (lambda (a lat col)
    (cond
      ((null? lat)
       (col (quote ()) (quote ())))
      ((eq? (car lat) a)
       (multirember&co a
         (cdr lat)
         (lambda (newlat seen)
           (col newlat
             (cons (car lat) seen)))))
      (else
        (multirember&co a
          (cdr lat)
          (lambda (newlat seen)
            (col (cons (car lat) newlat)
              seen)))))))
I’m going to rewrite this in slightly more idiomatic style, replacing (define foo (lambda (x y) ... with (define (foo x y) ..., replacing (quote x) with
'x, and aligning function arguments to make things clearer:
(define (multirember&co a lat col)
  (cond ((null? lat)
         (col '() '()))
        ((eq? (car lat) a)
         (multirember&co a
                         (cdr lat)
                         (lambda (newlat seen)
                           (col newlat
                                (cons (car lat) seen)))))
        (else
          (multirember&co a
                          (cdr lat)
                          (lambda (newlat seen)
                            (col (cons (car lat) newlat)
                                 seen))))))
This function takes an atom a, a list of atoms lat, and a function col
(short for ‘collector’), and invokes the function with two lists: one is the
original list with all occurrences of a removed, and the other is all the
occurrences of a we found. It combines it with this function:
(define a-friend
  (lambda (x y)
    (null? y)))

; or, in modern notation

(define (a-friend x y)
  (null? y))
In the example this returns #f, since tuna appears in the list and so the
second argument to a-friend is not empty.
(define foods '(strawberries tuna and swordfish))

(multirember&co 'tuna foods a-friend)
; => #f
Some discussion of this leads up to the tenth commandment:

Build functions to collect more than one value at a time.

The question is, why? We spent some time on this question, since it was not
clear why this function should emit its filtered list and the list of
occurrences via a callback, rather than simply returning them as a pair of
values in a list. We weren’t sure what the commandment was teaching us, or why
later examples of this pattern tightly couple multiple operations up in the same
function. They seem to be hinting at a need for fold, but never get there. The
callback is simply used to emit the final result of an operation.
It also is not clear whether the output of col is intended to be the return
value of multirember&co, i.e. that col is a data constructor to be consumed
by multirember&co’s caller, or whether col is intended only to deliver the
final lists to a callback as a side-effect, and the output of col should be
ignored. The use of the term ‘continuation’ suggests the latter, although the
purely functional nature of the rest of the book suggests that this is really an
unsual method of composition rather than a ‘side effect’ per se.
Recursing with multiple values

One explanation we came up with is that, even though multirember&co could
return a list containing two lists, getting those inner lists back out again on
recursion would be annoying. Here’s another version that doesn’t use a callback:
(define (partition-eq? a lat)
  (cond ((null? lat)
         '(() ()))
        ((eq? (car lat) a)
         (let ((pair (partition-eq? a (cdr lat))))
           (list (car pair)
                 (cons (car lat)
                       (cadr pair)))))
        (else
          (let ((pair (partition-eq? a (cdr lat))))
            (list (cons (car lat)
                        (car pair))
                  (cadr pair))))))

(partition-eq? 'tuna foods)
; => ((strawberries and swordfish) (tuna))
(In modern Scheme, (cadr x) is a shorthand for (car (cdr x)), i.e. the
second element of x.)
Instead of calling a callback, the base case of this function returns '(() ()), a pair of lists that are added to via recursion. This makes the data flow
a bit more obvious: we’re back to the common pattern of recurring on the ‘rest’
of the list and then adding to the result. But destructuring the return value at
every turn is annoying. Since we need to refer to the list pair generated by
recursion multiple times -- to get its components out -- we must store it in a
local variable using let or else risk making unnecessary function calls. This
concept of local variable binding is never discussed in the book, and without it
handling the destructuring becomes painful (let desugars to a lambda that is
immediately applied to the inputs).
We can clean things up even further with more local variables:
(define (partition-eq? a lat)
  (cond ((null? lat)
         '(() ()))
        ((eq? (car lat) a)
         (let* ((pair   (partition-eq? a (cdr lat)))
                (newlat (car pair))
                (seen   (cadr pair)))
           (list newlat
                 (cons (car lat) seen))))
        (else
          (let* ((pair   (partition-eq? a (cdr lat)))
                 (newlat (car pair))
                 (seen   (cadr pair)))
            (list (cons (car lat) newlat)
                  seen)))))
Now we can see there's a lot of duplicated logic between two branches of the
cond. We can fix that by doing the recursion, then figuring out how to push
our current element onto the result:
(define (partition-eq? a lat)
  (cond ((null? lat)
         '(() ()))
        (else
          (let* ((pair   (partition-eq? a (cdr lat)))
                 (newlat (car pair))
                 (seen   (cadr pair)))
            (cond ((eq? (car lat) a)
                   (list newlat (cons (car lat) seen)))
                  (else
                    (list (cons (car lat) newlat) seen)))))))
This keeps the recursive flow clear, and it makes the construction of the next
value in the loop clearer.
Composing functions with multiple values

So, it is not very difficult to write multirember&co in way that uses a more
natural recursion pattern, assuming you have let available. But, it’s still
hard to compose this with other functions: partition-eq? returns one list, but
a-friend takes two values, so this would be an error:
(a-friend (partition-eq? 'tuna foods))
Instead, we'd have to do:
(let ((pair (partition-eq? 'tuna foods)))
  (a-friend (car pair) (cadr pair)))

; => #f
Actual Scheme has a function called apply that calls a function with its
arguments supplied from a list:
(apply a-friend (partition-eq? 'tuna foods))
; => #f
Or, we can use macros; a macro is an instruction for how to rewrite a source
expression in terms of lower-level features. Macro systems were first proposed
in a 1963 paper by
Hart. Modern
Scheme (since 1998) provides the pattern-matching syntax-rules macro system
for declaring transformations; Common Lisp uses the more procedural defmacro
system where you directly manipulate source trees using the built-in functions.
We can write a macro for destructuring assignment:
(define-syntax let-elements (syntax-rules (in)
  ((let-elements (a b ...) expr in body ...)
   (let* ((args expr)
          (a (car args)))
     (let-elements (b ...) (cdr args) in body ...)))
  ((let-elements () _ in body ...)
   (begin body ...))))
This macro can be used to destructure a list result and hand the pieces off to
another function; x is bound to the first element of the result, y to the
second.
(let-elements (x y) (partition-eq? 'tuna foods) in (a-friend x y))
; => #f
This expression is rewritten into a series of let* expressions:
(let* ((args (partition-eq? 'tuna foods))
       (x (car args)))
  (let* ((args (cdr args))
         (y (car args)))
    (begin
      (a-friend x y))))
And let* is itself a macro that desugars to lambda application:
((lambda (args)
   ((lambda (x)
      ((lambda (args)
         ((lambda (y)
            (a-friend x y))
          (car args)))
       (cdr args)))
    (car args)))
 (partition-eq? 'tuna foods))

; => #f
Which brings us back to the limited set of concepts used in the book. This is an
obtuse way to write the function, and in the absence of a built-in let or a
macro system, it seems reasonable to look for other patterns for using lambda
as a convenient variable binding mechanism.
However, given that apply appears in McCarthy’s original 1960
paper, its omission from The
Little Schemer seems odd. It greatly simplifies the problem of composing
functions that return multiple values.
Diversion: Haskell

It’s instructive to rewrite these examples in Haskell to see which things stay
the same, and which things recommended by the book are there due to the
limitations of the Scheme language. Here’s multirember&co and a-friend:
multiremberCo                 :: (Eq a) => a -> [a] -> ([a] -> [a] -> b) -> b
multiremberCo _ []     col    =  col [] []
multiremberCo a (x:xs) col
    | a == x                  =  multiremberCo a xs colSeen
    | otherwise               =  multiremberCo a xs colRest
    where
        colSeen newlat seen   =  col newlat (x:seen)
        colRest newlat seen   =  col (x:newlat) seen

aFriend         :: [a] -> [a] -> Bool
aFriend xs ys   =  null ys
Here we use a combination of pattern matching, guards and where clauses to
define helper functions. This is literally the same algorithm, but expressed
differently. To explain what’s going on in each line:
-- This is a type signature. It says that multiremberCo takes three arguments:
-- one of type `a`, one of type `[a]` (a list of values of type `a`), and one of
-- type `([a] -> [a] -> b)`: a function that takes two lists of `a` and returns
-- a `b`. `a` is a 'type variable', it means the value can take any type so long
-- all the places that `a` appears in the signature are the same type. You can
-- pass an `Int` and a `[Int]`, but not a `Bool` and a `[Int]`. The `(Eq a) =>`
-- part means that whatever type `a` is, it must implement the equality
-- interface, since we're going to pass `a` to the `==` function.
multiremberCo                 :: (Eq a) => a -> [a] -> ([a] -> [a] -> b) -> b

-- This is a pattern-match for the case where the list is empty. We use `_` for
-- the first parameter because we don't care about its value in this case.
multiremberCo _ []     col    =  col [] []

-- Now we handle the general case. The pipes present different boolean checks on
-- the function arguments and say what to do in each case. The `(x:xs)`
-- expression is destructuring the list argument into its first element `x` and
-- the rest of the list `xs`. `where` lets us set the values of internal
-- variables; these bindings apply across all the guard clauses.
multiremberCo a (x:xs) col
    | a == x                  =  multiremberCo a xs colSeen
    | otherwise               =  multiremberCo a xs colRest
    where
        colSeen newlat seen   =  col newlat (x:seen)
        colRest newlat seen   =  col (x:newlat) seen
For example:
let foods = ["strawberries", "tuna", "and", "swordfish"]
multiremberCo "tuna" foods aFriend
-- => False
This is where the type signature helps you to see how data flows: the fact that
col takes two [a] and returns a b, and multiremberCo takes an a and
[a] and returns a b, means that multiremberCo must produce its result by
handing some derivative of its arguments off to col and returning the result.
We could bring our original Scheme function a little closer by extracting some
helper functions:
(define (multirember&co a lat col)
  (let ((col-seen (lambda (newlat seen)
                    (col newlat (cons (car lat) seen))))
        (col-rest (lambda (newlat seen)
                    (col (cons (car lat) newlat) seen))))
    (cond ((null? lat)
           (col '() '()))
          ((eq? (car lat) a)
           (multirember&co a (cdr lat) col-seen))
          (else
            (multirember&co a (cdr lat) col-rest)))))
This helps separate the decision-making in the cond from the logic for how to
build the next value, which is wrapped up in the helper functions. But again,
this relies on let, which the book does not use.
Currying

Let’s rewrite our function one final time to remove the hard-coded eq?.
Instead, we will let partition take a predicate p and a list lat, and
filter out the elements for which (p x) is #t.
(define (partition p lat)
  (cond ((null? lat)
         '(() ()))
        (else
          (let* ((pair   (partition p (cdr lat)))
                 (newlat (car pair))
                 (seen   (cadr pair)))
            (cond ((p (car lat))
                   (list newlat (cons (car lat) seen)))
                  (else
                    (list (cons (car lat) newlat) seen)))))))
We can filter out the elements matching tuna:
(partition (lambda (a) (eq? 'tuna a)) foods)
; => ((strawberries and swordfish) (tuna))
Writing out lambdas creates a lot of line-noise, so let’s extract that function
and give it a name:
(define (is-tuna? a)
  (eq? 'tuna a))

(partition is-tuna? foods)
; => ((strawberries and swordfish) (tuna))
But it would be even better if we didn’t have to write specialised functions to
check for particular values at all. If we had a curried version of eq?, we
could produce specialised versions of it to check for particular values:
(define (is? a)
  (lambda (b)
    (eq? a b)))

(partition (is? 'tuna) foods)
; => ((strawberries and swordfish) (tuna))
Again, Haskell provides an interesting example. Here’s partition; it takes a
function a -> Bool and a list [a] and returns a tuple contain two lists of
a.
partition             :: (a -> Bool) -> [a] -> ([a], [a])
partition _ []        =  ([], [])
partition p (x:xs)
    | p x             =  (rest, x:seen)
    | otherwise       =  (x:rest, seen)
    where
        (rest, seen)  =  partition p xs
This time, the base case returns a tuple containing two empty lists. Haskell has
built-in destructuring; (rest, seen) = partition p xs takes the tuple returned
by the recursive call and assigns its first member to rest and the second to
seen. x:y is how Haskell says (cons x y), which hugely reduces line noise
and makes the structure more apparent.
We could call this by passing in a lambda:
partition (\x -> x == "tuna") foods
-- => (["strawberries", "and", "swordfish"], ["tuna"])
But this is not necessary. == is a function; symbolic functions in Haskell are
written infix rather than prefix. All functions are curried by default, so you
can pass any function fewer arguments than it expects and get an intermediate
function. Infix functions can be partially applied on both their left and right
operands, so for example == "tuna" is equivalent to the lambda we used above.
partition (== "tuna") foods
-- => (["strawberries", "and", "swordfish"], ["tuna"])
We can do the same with other operators:
partition (> 5) [1..10]
-- => ([1,2,3,4,5], [6,7,8,9,10])
This massively cuts down on the amount of boilerplate needed to write
higher-order functions. But what about composition? aFriend takes two lists,
not a tuple, so it doesn’t compose with partition. There are a couple of
solutions to this; we can either use let to destructure:
let (xs, ys) = partition (== "tuna") foods in aFriend xs ys
-- => False
(This is what I modelled the let-elements macro on.)
Or, Haskell has a function called uncurry:
uncurry :: (a -> b -> c) -> ((a, b) -> c)
uncurry takes a function that takes two arguments, and returns a function that
takes both arguments as a tuple.
uncurry aFriend (partition (== "tuna") foods)
-- => False
Could we do this in Scheme?
(define (uncurry f)
  (lambda (tuple)
    (f (car tuple) (cadr tuple))))

((uncurry a-friend) (partition (is? 'tuna) foods))
; => #f
We haven’t used any concepts outside the book to define this helper function,
and it seems to lead to better separation of concerns.
Tail recursion

I can only think of one other reason that multirember&co is presented in the
way that it is. Here it is again:
(define (multirember&co a lat col)
  (cond ((null? lat)
         (col '() '()))
        ((eq? (car lat) a)
         (multirember&co a
                         (cdr lat)
                         (lambda (newlat seen)
                           (col newlat
                                (cons (car lat) seen)))))
        (else
          (multirember&co a
                          (cdr lat)
                          (lambda (newlat seen)
                            (col (cons (car lat) newlat)
                                 seen))))))
In this definition, the last expression in any branch of this function is a call
to multirember&co itself: the value of a call to multirember&co is the
value of the recusrive call to itself. Similarly, the internal lambdas that wrap
col have a call to col as their very last expression. This means this
definition is tail recursive, so it can be applied to an indefinitely long
input without blowing the stack. The same is not true of the partition
definition, although it would be possible to rewrite it to be tail-recursive.
However, the book doesn’t seem to be making an argument about implementation
details of the platform, and tail calls were initially described in a 1974
paper. It
seems to be telling you that writing functions in this style is a good idea as
part of generally good programming style, but given even the limited tools
available in the book, which includes higher-order functions, I’m not convinced.