On page 137 of The Little Schemer, this rather convoluted function appears:
(define multirember&co
(lambda (a lat col)
(cond
((null? lat)
(col (quote ()) (quote ())))
((eq? (car lat) a)
(multirember&co a
(cdr lat)
(lambda (newlat seen)
(col newlat
(cons (car lat) seen)))))
(else
(multirember&co a
(cdr lat)
(lambda (newlat seen)
(col (cons (car lat) newlat)
seen)))))))
I’m going to rewrite this in slightly more idiomatic style, replacing (define foo (lambda (x y) ...
with (define (foo x y) ...
, replacing (quote x)
with
'x
, and aligning function arguments to make things clearer:
(define (multirember&co a lat col)
(cond ((null? lat)
(col '() '()))
((eq? (car lat) a)
(multirember&co a
(cdr lat)
(lambda (newlat seen)
(col newlat
(cons (car lat) seen)))))
(else
(multirember&co a
(cdr lat)
(lambda (newlat seen)
(col (cons (car lat) newlat)
seen))))))
This function takes an atom a
, a list of atoms lat
, and a function col
(short for ‘collector’), and invokes the function with two lists: one is the
original list with all occurrences of a
removed, and the other is all the
occurrences of a
we found. It combines it with this function:
(define a-friend
(lambda (x y)
(null? y)))
; or, in modern notation
(define (a-friend x y)
(null? y))
In the example this returns #f
, since tuna
appears in the list and so the
second argument to a-friend
is not empty.
(define foods '(strawberries tuna and swordfish))
(multirember&co 'tuna foods a-friend)
; => #f
Some discussion of this leads up to the tenth commandment:
Build functions to collect more than one value at a time.
The question is, why? We spent some time on this question, since it was not
clear why this function should emit its filtered list and the list of
occurrences via a callback, rather than simply returning them as a pair of
values in a list. We weren’t sure what the commandment was teaching us, or why
later examples of this pattern tightly couple multiple operations up in the same
function. They seem to be hinting at a need for fold
, but never get there. The
callback is simply used to emit the final result of an operation.
It also is not clear whether the output of col
is intended to be the return
value of multirember&co
, i.e. that col
is a data constructor to be consumed
by multirember&co
’s caller, or whether col
is intended only to deliver the
final lists to a callback as a side-effect, and the output of col
should be
ignored. The use of the term ‘continuation’ suggests the latter, although the
purely functional nature of the rest of the book suggests that this is really an
unsual method of composition rather than a ‘side effect’ per se.
One explanation we came up with is that, even though multirember&co
could
return a list containing two lists, getting those inner lists back out again on
recursion would be annoying. Here’s another version that doesn’t use a callback:
(define (partition-eq? a lat)
(cond ((null? lat)
'(() ()))
((eq? (car lat) a)
(let ((pair (partition-eq? a (cdr lat))))
(list (car pair)
(cons (car lat)
(cadr pair)))))
(else
(let ((pair (partition-eq? a (cdr lat))))
(list (cons (car lat)
(car pair))
(cadr pair))))))
(partition-eq? 'tuna foods)
; => ((strawberries and swordfish) (tuna))
(In modern Scheme, (cadr x)
is a shorthand for (car (cdr x))
, i.e. the
second element of x
.)
Instead of calling a callback, the base case of this function returns '(() ())
, a pair of lists that are added to via recursion. This makes the data flow
a bit more obvious: we’re back to the common pattern of recurring on the ‘rest’
of the list and then adding to the result. But destructuring the return value at
every turn is annoying. Since we need to refer to the list pair
generated by
recursion multiple times -- to get its components out -- we must store it in a
local variable using let
or else risk making unnecessary function calls. This
concept of local variable binding is never discussed in the book, and without it
handling the destructuring becomes painful (let
desugars to a lambda
that is
immediately applied to the inputs).
We can clean things up even further with more local variables:
(define (partition-eq? a lat)
(cond ((null? lat)
'(() ()))
((eq? (car lat) a)
(let* ((pair (partition-eq? a (cdr lat)))
(newlat (car pair))
(seen (cadr pair)))
(list newlat
(cons (car lat) seen))))
(else
(let* ((pair (partition-eq? a (cdr lat)))
(newlat (car pair))
(seen (cadr pair)))
(list (cons (car lat) newlat)
seen)))))
Now we can see there's a lot of duplicated logic between two branches of the
cond
. We can fix that by doing the recursion, then figuring out how to push
our current element onto the result:
(define (partition-eq? a lat)
(cond ((null? lat)
'(() ()))
(else
(let* ((pair (partition-eq? a (cdr lat)))
(newlat (car pair))
(seen (cadr pair)))
(cond ((eq? (car lat) a)
(list newlat (cons (car lat) seen)))
(else
(list (cons (car lat) newlat) seen)))))))
This keeps the recursive flow clear, and it makes the construction of the next value in the loop clearer.
So, it is not very difficult to write multirember&co
in way that uses a more
natural recursion pattern, assuming you have let
available. But, it’s still
hard to compose this with other functions: partition-eq?
returns one list, but
a-friend
takes two values, so this would be an error:
(a-friend (partition-eq? 'tuna foods))
Instead, we'd have to do:
(let ((pair (partition-eq? 'tuna foods)))
(a-friend (car pair) (cadr pair)))
; => #f
Actual Scheme has a function called apply
that calls a function with its
arguments supplied from a list:
(apply a-friend (partition-eq? 'tuna foods))
; => #f
Or, we can use macros; a macro is an instruction for how to rewrite a source
expression in terms of lower-level features. Macro systems were first proposed
in a 1963 paper by
Hart. Modern
Scheme (since 1998) provides the pattern-matching syntax-rules
macro system
for declaring transformations; Common Lisp uses the more procedural defmacro
system where you directly manipulate source trees using the built-in functions.
We can write a macro for destructuring assignment:
(define-syntax let-elements (syntax-rules (in)
((let-elements (a b ...) expr in body ...)
(let* ((args expr)
(a (car args)))
(let-elements (b ...) (cdr args) in body ...)))
((let-elements () _ in body ...)
(begin body ...))))
This macro can be used to destructure a list result and hand the pieces off to
another function; x
is bound to the first element of the result, y
to the
second.
(let-elements (x y) (partition-eq? 'tuna foods) in (a-friend x y))
; => #f
This expression is rewritten into a series of let*
expressions:
(let* ((args (partition-eq? 'tuna foods))
(x (car args)))
(let* ((args (cdr args))
(y (car args)))
(begin
(a-friend x y))))
And let*
is itself a macro that desugars to lambda application:
((lambda (args)
((lambda (x)
((lambda (args)
((lambda (y)
(a-friend x y))
(car args)))
(cdr args)))
(car args)))
(partition-eq? 'tuna foods))
; => #f
Which brings us back to the limited set of concepts used in the book. This is an
obtuse way to write the function, and in the absence of a built-in let
or a
macro system, it seems reasonable to look for other patterns for using lambda
as a convenient variable binding mechanism.
However, given that apply
appears in McCarthy’s original 1960
paper, its omission from The
Little Schemer seems odd. It greatly simplifies the problem of composing
functions that return multiple values.
It’s instructive to rewrite these examples in Haskell to see which things stay
the same, and which things recommended by the book are there due to the
limitations of the Scheme language. Here’s multirember&co
and a-friend
:
multiremberCo :: (Eq a) => a -> [a] -> ([a] -> [a] -> b) -> b
multiremberCo _ [] col = col [] []
multiremberCo a (x:xs) col
| a == x = multiremberCo a xs colSeen
| otherwise = multiremberCo a xs colRest
where
colSeen newlat seen = col newlat (x:seen)
colRest newlat seen = col (x:newlat) seen
aFriend :: [a] -> [a] -> Bool
aFriend xs ys = null ys
Here we use a combination of pattern matching, guards and where
clauses to
define helper functions. This is literally the same algorithm, but expressed
differently. To explain what’s going on in each line:
-- This is a type signature. It says that multiremberCo takes three arguments:
-- one of type `a`, one of type `[a]` (a list of values of type `a`), and one of
-- type `([a] -> [a] -> b)`: a function that takes two lists of `a` and returns
-- a `b`. `a` is a 'type variable', it means the value can take any type so long
-- all the places that `a` appears in the signature are the same type. You can
-- pass an `Int` and a `[Int]`, but not a `Bool` and a `[Int]`. The `(Eq a) =>`
-- part means that whatever type `a` is, it must implement the equality
-- interface, since we're going to pass `a` to the `==` function.
multiremberCo :: (Eq a) => a -> [a] -> ([a] -> [a] -> b) -> b
-- This is a pattern-match for the case where the list is empty. We use `_` for
-- the first parameter because we don't care about its value in this case.
multiremberCo _ [] col = col [] []
-- Now we handle the general case. The pipes present different boolean checks on
-- the function arguments and say what to do in each case. The `(x:xs)`
-- expression is destructuring the list argument into its first element `x` and
-- the rest of the list `xs`. `where` lets us set the values of internal
-- variables; these bindings apply across all the guard clauses.
multiremberCo a (x:xs) col
| a == x = multiremberCo a xs colSeen
| otherwise = multiremberCo a xs colRest
where
colSeen newlat seen = col newlat (x:seen)
colRest newlat seen = col (x:newlat) seen
For example:
let foods = ["strawberries", "tuna", "and", "swordfish"]
multiremberCo "tuna" foods aFriend
-- => False
This is where the type signature helps you to see how data flows: the fact that
col
takes two [a]
and returns a b
, and multiremberCo
takes an a
and
[a]
and returns a b
, means that multiremberCo
must produce its result by
handing some derivative of its arguments off to col
and returning the result.
We could bring our original Scheme function a little closer by extracting some helper functions:
(define (multirember&co a lat col)
(let ((col-seen (lambda (newlat seen)
(col newlat (cons (car lat) seen))))
(col-rest (lambda (newlat seen)
(col (cons (car lat) newlat) seen))))
(cond ((null? lat)
(col '() '()))
((eq? (car lat) a)
(multirember&co a (cdr lat) col-seen))
(else
(multirember&co a (cdr lat) col-rest)))))
This helps separate the decision-making in the cond
from the logic for how to
build the next value, which is wrapped up in the helper functions. But again,
this relies on let
, which the book does not use.
Let’s rewrite our function one final time to remove the hard-coded eq?
.
Instead, we will let partition
take a predicate p
and a list lat
, and
filter out the elements for which (p x)
is #t
.
(define (partition p lat)
(cond ((null? lat)
'(() ()))
(else
(let* ((pair (partition p (cdr lat)))
(newlat (car pair))
(seen (cadr pair)))
(cond ((p (car lat))
(list newlat (cons (car lat) seen)))
(else
(list (cons (car lat) newlat) seen)))))))
We can filter out the elements matching tuna
:
(partition (lambda (a) (eq? 'tuna a)) foods)
; => ((strawberries and swordfish) (tuna))
Writing out lambdas creates a lot of line-noise, so let’s extract that function and give it a name:
(define (is-tuna? a)
(eq? 'tuna a))
(partition is-tuna? foods)
; => ((strawberries and swordfish) (tuna))
But it would be even better if we didn’t have to write specialised functions to
check for particular values at all. If we had a curried version of eq?
, we
could produce specialised versions of it to check for particular values:
(define (is? a)
(lambda (b)
(eq? a b)))
(partition (is? 'tuna) foods)
; => ((strawberries and swordfish) (tuna))
Again, Haskell provides an interesting example. Here’s partition
; it takes a
function a -> Bool
and a list [a]
and returns a tuple contain two lists of
a
.
partition :: (a -> Bool) -> [a] -> ([a], [a])
partition _ [] = ([], [])
partition p (x:xs)
| p x = (rest, x:seen)
| otherwise = (x:rest, seen)
where
(rest, seen) = partition p xs
This time, the base case returns a tuple containing two empty lists. Haskell has
built-in destructuring; (rest, seen) = partition p xs
takes the tuple returned
by the recursive call and assigns its first member to rest
and the second to
seen
. x:y
is how Haskell says (cons x y)
, which hugely reduces line noise
and makes the structure more apparent.
We could call this by passing in a lambda:
partition (\x -> x == "tuna") foods
-- => (["strawberries", "and", "swordfish"], ["tuna"])
But this is not necessary. ==
is a function; symbolic functions in Haskell are
written infix rather than prefix. All functions are curried by default, so you
can pass any function fewer arguments than it expects and get an intermediate
function. Infix functions can be partially applied on both their left and right
operands, so for example == "tuna"
is equivalent to the lambda we used above.
partition (== "tuna") foods
-- => (["strawberries", "and", "swordfish"], ["tuna"])
We can do the same with other operators:
partition (> 5) [1..10]
-- => ([1,2,3,4,5], [6,7,8,9,10])
This massively cuts down on the amount of boilerplate needed to write
higher-order functions. But what about composition? aFriend
takes two lists,
not a tuple, so it doesn’t compose with partition
. There are a couple of
solutions to this; we can either use let
to destructure:
let (xs, ys) = partition (== "tuna") foods in aFriend xs ys
-- => False
(This is what I modelled the let-elements
macro on.)
Or, Haskell has a function called uncurry
:
uncurry :: (a -> b -> c) -> ((a, b) -> c)
uncurry
takes a function that takes two arguments, and returns a function that
takes both arguments as a tuple.
uncurry aFriend (partition (== "tuna") foods)
-- => False
Could we do this in Scheme?
(define (uncurry f)
(lambda (tuple)
(f (car tuple) (cadr tuple))))
((uncurry a-friend) (partition (is? 'tuna) foods))
; => #f
We haven’t used any concepts outside the book to define this helper function, and it seems to lead to better separation of concerns.
I can only think of one other reason that multirember&co
is presented in the
way that it is. Here it is again:
(define (multirember&co a lat col)
(cond ((null? lat)
(col '() '()))
((eq? (car lat) a)
(multirember&co a
(cdr lat)
(lambda (newlat seen)
(col newlat
(cons (car lat) seen)))))
(else
(multirember&co a
(cdr lat)
(lambda (newlat seen)
(col (cons (car lat) newlat)
seen))))))
In this definition, the last expression in any branch of this function is a call
to multirember&co
itself: the value of a call to multirember&co
is the
value of the recusrive call to itself. Similarly, the internal lambdas that wrap
col
have a call to col
as their very last expression. This means this
definition is tail recursive, so it can be applied to an indefinitely long
input without blowing the stack. The same is not true of the partition
definition, although it would be possible to rewrite it to be tail-recursive.
However, the book doesn’t seem to be making an argument about implementation details of the platform, and tail calls were initially described in a 1974 paper. It seems to be telling you that writing functions in this style is a good idea as part of generally good programming style, but given even the limited tools available in the book, which includes higher-order functions, I’m not convinced.