Instantly share code, notes, and snippets.

Embed
What would you like to do?

Lambda the Ultimate

On page 137 of The Little Schemer, this rather convoluted function appears:

(define multirember&co
  (lambda (a lat col)
    (cond
      ((null? lat)
       (col (quote ()) (quote ())))
      ((eq? (car lat) a)
       (multirember&co a
         (cdr lat)
         (lambda (newlat seen)
           (col newlat
             (cons (car lat) seen)))))
      (else
        (multirember&co a
          (cdr lat)
          (lambda (newlat seen)
            (col (cons (car lat) newlat)
              seen)))))))

I’m going to rewrite this in slightly more idiomatic style, replacing (define foo (lambda (x y) ... with (define (foo x y) ..., replacing (quote x) with 'x, and aligning function arguments to make things clearer:

(define (multirember&co a lat col)
  (cond ((null? lat)
         (col '() '()))
        ((eq? (car lat) a)
         (multirember&co a
                         (cdr lat)
                         (lambda (newlat seen)
                           (col newlat
                                (cons (car lat) seen)))))
        (else
          (multirember&co a
                          (cdr lat)
                          (lambda (newlat seen)
                            (col (cons (car lat) newlat)
                                 seen))))))

This function takes an atom a, a list of atoms lat, and a function col (short for ‘collector’), and invokes the function with two lists: one is the original list with all occurrences of a removed, and the other is all the occurrences of a we found. It combines it with this function:

(define a-friend
  (lambda (x y)
    (null? y)))

; or, in modern notation

(define (a-friend x y)
  (null? y))

In the example this returns #f, since tuna appears in the list and so the second argument to a-friend is not empty.

(define foods '(strawberries tuna and swordfish))

(multirember&co 'tuna foods a-friend)
; => #f

Some discussion of this leads up to the tenth commandment:

Build functions to collect more than one value at a time.

The question is, why? We spent some time on this question, since it was not clear why this function should emit its filtered list and the list of occurrences via a callback, rather than simply returning them as a pair of values in a list. We weren’t sure what the commandment was teaching us, or why later examples of this pattern tightly couple multiple operations up in the same function. They seem to be hinting at a need for fold, but never get there. The callback is simply used to emit the final result of an operation.

It also is not clear whether the output of col is intended to be the return value of multirember&co, i.e. that col is a data constructor to be consumed by multirember&co’s caller, or whether col is intended only to deliver the final lists to a callback as a side-effect, and the output of col should be ignored. The use of the term ‘continuation’ suggests the latter, although the purely functional nature of the rest of the book suggests that this is really an unsual method of composition rather than a ‘side effect’ per se.

Recursing with multiple values

One explanation we came up with is that, even though multirember&co could return a list containing two lists, getting those inner lists back out again on recursion would be annoying. Here’s another version that doesn’t use a callback:

(define (partition-eq? a lat)
  (cond ((null? lat)
         '(() ()))
        ((eq? (car lat) a)
         (let ((pair (partition-eq? a (cdr lat))))
           (list (car pair)
                 (cons (car lat)
                       (cadr pair)))))
        (else
          (let ((pair (partition-eq? a (cdr lat))))
            (list (cons (car lat)
                        (car pair))
                  (cadr pair))))))

(partition-eq? 'tuna foods)
; => ((strawberries and swordfish) (tuna))

(In modern Scheme, (cadr x) is a shorthand for (car (cdr x)), i.e. the second element of x.)

Instead of calling a callback, the base case of this function returns '(() ()), a pair of lists that are added to via recursion. This makes the data flow a bit more obvious: we’re back to the common pattern of recurring on the ‘rest’ of the list and then adding to the result. But destructuring the return value at every turn is annoying. Since we need to refer to the list pair generated by recursion multiple times -- to get its components out -- we must store it in a local variable using let or else risk making unnecessary function calls. This concept of local variable binding is never discussed in the book, and without it handling the destructuring becomes painful (let desugars to a lambda that is immediately applied to the inputs).

We can clean things up even further with more local variables:

(define (partition-eq? a lat)
  (cond ((null? lat)
         '(() ()))
        ((eq? (car lat) a)
         (let* ((pair   (partition-eq? a (cdr lat)))
                (newlat (car pair))
                (seen   (cadr pair)))
           (list newlat
                 (cons (car lat) seen))))
        (else
          (let* ((pair   (partition-eq? a (cdr lat)))
                 (newlat (car pair))
                 (seen   (cadr pair)))
            (list (cons (car lat) newlat)
                  seen)))))

Now we can see there's a lot of duplicated logic between two branches of the cond. We can fix that by doing the recursion, then figuring out how to push our current element onto the result:

(define (partition-eq? a lat)
  (cond ((null? lat)
         '(() ()))
        (else
          (let* ((pair   (partition-eq? a (cdr lat)))
                 (newlat (car pair))
                 (seen   (cadr pair)))
            (cond ((eq? (car lat) a)
                   (list newlat (cons (car lat) seen)))
                  (else
                    (list (cons (car lat) newlat) seen)))))))

This keeps the recursive flow clear, and it makes the construction of the next value in the loop clearer.

Composing functions with multiple values

So, it is not very difficult to write multirember&co in way that uses a more natural recursion pattern, assuming you have let available. But, it’s still hard to compose this with other functions: partition-eq? returns one list, but a-friend takes two values, so this would be an error:

(a-friend (partition-eq? 'tuna foods))

Instead, we'd have to do:

(let ((pair (partition-eq? 'tuna foods)))
  (a-friend (car pair) (cadr pair)))

; => #f

Actual Scheme has a function called apply that calls a function with its arguments supplied from a list:

(apply a-friend (partition-eq? 'tuna foods))
; => #f

Or, we can use macros; a macro is an instruction for how to rewrite a source expression in terms of lower-level features. Macro systems were first proposed in a 1963 paper by Hart. Modern Scheme (since 1998) provides the pattern-matching syntax-rules macro system for declaring transformations; Common Lisp uses the more procedural defmacro system where you directly manipulate source trees using the built-in functions.

We can write a macro for destructuring assignment:

(define-syntax let-elements (syntax-rules (in)
  ((let-elements (a b ...) expr in body ...)
   (let* ((args expr)
          (a (car args)))
     (let-elements (b ...) (cdr args) in body ...)))
  ((let-elements () _ in body ...)
   (begin body ...))))

This macro can be used to destructure a list result and hand the pieces off to another function; x is bound to the first element of the result, y to the second.

(let-elements (x y) (partition-eq? 'tuna foods) in (a-friend x y))
; => #f

This expression is rewritten into a series of let* expressions:

(let* ((args (partition-eq? 'tuna foods))
       (x (car args)))
  (let* ((args (cdr args))
         (y (car args)))
    (begin
      (a-friend x y))))

And let* is itself a macro that desugars to lambda application:

((lambda (args)
   ((lambda (x)
      ((lambda (args)
         ((lambda (y)
            (a-friend x y))
          (car args)))
       (cdr args)))
    (car args)))
 (partition-eq? 'tuna foods))

; => #f

Which brings us back to the limited set of concepts used in the book. This is an obtuse way to write the function, and in the absence of a built-in let or a macro system, it seems reasonable to look for other patterns for using lambda as a convenient variable binding mechanism.

However, given that apply appears in McCarthy’s original 1960 paper, its omission from The Little Schemer seems odd. It greatly simplifies the problem of composing functions that return multiple values.

Diversion: Haskell

It’s instructive to rewrite these examples in Haskell to see which things stay the same, and which things recommended by the book are there due to the limitations of the Scheme language. Here’s multirember&co and a-friend:

multiremberCo                 :: (Eq a) => a -> [a] -> ([a] -> [a] -> b) -> b
multiremberCo _ []     col    =  col [] []
multiremberCo a (x:xs) col
    | a == x                  =  multiremberCo a xs colSeen
    | otherwise               =  multiremberCo a xs colRest
    where
        colSeen newlat seen   =  col newlat (x:seen)
        colRest newlat seen   =  col (x:newlat) seen

aFriend         :: [a] -> [a] -> Bool
aFriend xs ys   =  null ys

Here we use a combination of pattern matching, guards and where clauses to define helper functions. This is literally the same algorithm, but expressed differently. To explain what’s going on in each line:

-- This is a type signature. It says that multiremberCo takes three arguments:
-- one of type `a`, one of type `[a]` (a list of values of type `a`), and one of
-- type `([a] -> [a] -> b)`: a function that takes two lists of `a` and returns
-- a `b`. `a` is a 'type variable', it means the value can take any type so long
-- all the places that `a` appears in the signature are the same type. You can
-- pass an `Int` and a `[Int]`, but not a `Bool` and a `[Int]`. The `(Eq a) =>`
-- part means that whatever type `a` is, it must implement the equality
-- interface, since we're going to pass `a` to the `==` function.
multiremberCo                 :: (Eq a) => a -> [a] -> ([a] -> [a] -> b) -> b

-- This is a pattern-match for the case where the list is empty. We use `_` for
-- the first parameter because we don't care about its value in this case.
multiremberCo _ []     col    =  col [] []

-- Now we handle the general case. The pipes present different boolean checks on
-- the function arguments and say what to do in each case. The `(x:xs)`
-- expression is destructuring the list argument into its first element `x` and
-- the rest of the list `xs`. `where` lets us set the values of internal
-- variables; these bindings apply across all the guard clauses.
multiremberCo a (x:xs) col
    | a == x                  =  multiremberCo a xs colSeen
    | otherwise               =  multiremberCo a xs colRest
    where
        colSeen newlat seen   =  col newlat (x:seen)
        colRest newlat seen   =  col (x:newlat) seen

For example:

let foods = ["strawberries", "tuna", "and", "swordfish"]
multiremberCo "tuna" foods aFriend
-- => False

This is where the type signature helps you to see how data flows: the fact that col takes two [a] and returns a b, and multiremberCo takes an a and [a] and returns a b, means that multiremberCo must produce its result by handing some derivative of its arguments off to col and returning the result.

We could bring our original Scheme function a little closer by extracting some helper functions:

(define (multirember&co a lat col)
  (let ((col-seen (lambda (newlat seen)
                    (col newlat (cons (car lat) seen))))
        (col-rest (lambda (newlat seen)
                    (col (cons (car lat) newlat) seen))))
    (cond ((null? lat)
           (col '() '()))
          ((eq? (car lat) a)
           (multirember&co a (cdr lat) col-seen))
          (else
            (multirember&co a (cdr lat) col-rest)))))

This helps separate the decision-making in the cond from the logic for how to build the next value, which is wrapped up in the helper functions. But again, this relies on let, which the book does not use.

Currying

Let’s rewrite our function one final time to remove the hard-coded eq?. Instead, we will let partition take a predicate p and a list lat, and filter out the elements for which (p x) is #t.

(define (partition p lat)
  (cond ((null? lat)
         '(() ()))
        (else
          (let* ((pair   (partition p (cdr lat)))
                 (newlat (car pair))
                 (seen   (cadr pair)))
            (cond ((p (car lat))
                   (list newlat (cons (car lat) seen)))
                  (else
                    (list (cons (car lat) newlat) seen)))))))

We can filter out the elements matching tuna:

(partition (lambda (a) (eq? 'tuna a)) foods)
; => ((strawberries and swordfish) (tuna))

Writing out lambdas creates a lot of line-noise, so let’s extract that function and give it a name:

(define (is-tuna? a)
  (eq? 'tuna a))

(partition is-tuna? foods)
; => ((strawberries and swordfish) (tuna))

But it would be even better if we didn’t have to write specialised functions to check for particular values at all. If we had a curried version of eq?, we could produce specialised versions of it to check for particular values:

(define (is? a)
  (lambda (b)
    (eq? a b)))

(partition (is? 'tuna) foods)
; => ((strawberries and swordfish) (tuna))

Again, Haskell provides an interesting example. Here’s partition; it takes a function a -> Bool and a list [a] and returns a tuple contain two lists of a.

partition             :: (a -> Bool) -> [a] -> ([a], [a])
partition _ []        =  ([], [])
partition p (x:xs)
    | p x             =  (rest, x:seen)
    | otherwise       =  (x:rest, seen)
    where
        (rest, seen)  =  partition p xs

This time, the base case returns a tuple containing two empty lists. Haskell has built-in destructuring; (rest, seen) = partition p xs takes the tuple returned by the recursive call and assigns its first member to rest and the second to seen. x:y is how Haskell says (cons x y), which hugely reduces line noise and makes the structure more apparent.

We could call this by passing in a lambda:

partition (\x -> x == "tuna") foods
-- => (["strawberries", "and", "swordfish"], ["tuna"])

But this is not necessary. == is a function; symbolic functions in Haskell are written infix rather than prefix. All functions are curried by default, so you can pass any function fewer arguments than it expects and get an intermediate function. Infix functions can be partially applied on both their left and right operands, so for example == "tuna" is equivalent to the lambda we used above.

partition (== "tuna") foods
-- => (["strawberries", "and", "swordfish"], ["tuna"])

We can do the same with other operators:

partition (> 5) [1..10]
-- => ([1,2,3,4,5], [6,7,8,9,10])

This massively cuts down on the amount of boilerplate needed to write higher-order functions. But what about composition? aFriend takes two lists, not a tuple, so it doesn’t compose with partition. There are a couple of solutions to this; we can either use let to destructure:

let (xs, ys) = partition (== "tuna") foods in aFriend xs ys
-- => False

(This is what I modelled the let-elements macro on.)

Or, Haskell has a function called uncurry:

uncurry :: (a -> b -> c) -> ((a, b) -> c)

uncurry takes a function that takes two arguments, and returns a function that takes both arguments as a tuple.

uncurry aFriend (partition (== "tuna") foods)
-- => False

Could we do this in Scheme?

(define (uncurry f)
  (lambda (tuple)
    (f (car tuple) (cadr tuple))))

((uncurry a-friend) (partition (is? 'tuna) foods))
; => #f

We haven’t used any concepts outside the book to define this helper function, and it seems to lead to better separation of concerns.

Tail recursion

I can only think of one other reason that multirember&co is presented in the way that it is. Here it is again:

(define (multirember&co a lat col)
  (cond ((null? lat)
         (col '() '()))
        ((eq? (car lat) a)
         (multirember&co a
                         (cdr lat)
                         (lambda (newlat seen)
                           (col newlat
                                (cons (car lat) seen)))))
        (else
          (multirember&co a
                          (cdr lat)
                          (lambda (newlat seen)
                            (col (cons (car lat) newlat)
                                 seen))))))

In this definition, the last expression in any branch of this function is a call to multirember&co itself: the value of a call to multirember&co is the value of the recusrive call to itself. Similarly, the internal lambdas that wrap col have a call to col as their very last expression. This means this definition is tail recursive, so it can be applied to an indefinitely long input without blowing the stack. The same is not true of the partition definition, although it would be possible to rewrite it to be tail-recursive.

However, the book doesn’t seem to be making an argument about implementation details of the platform, and tail calls were initially described in a 1974 paper. It seems to be telling you that writing functions in this style is a good idea as part of generally good programming style, but given even the limited tools available in the book, which includes higher-order functions, I’m not convinced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment