The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise
Edsger W. Dijkstra
This is my contribution to F# Advent Calendar in English. Thanks to Sergey Tihon for making sure this happens every year.
An abstraction most .NET developers are familiar with is IEnumerable
(aka Seq
in F#). IEnumerable
is an abstraction over a set of values in a particular order and it looks roughly like this:
type IEnumerator<'T> =
interface
inherit System.IDisposable
abstract Current: 'T
abstract MoveNext: unit -> bool
end
type IEnumerable<'T> =
interface
abstract GetEnumerator: unit -> IEnumerator<'T>
end
On top of this abstraction .NET developers have implemented the operations filter
, map
and so on that enables us to build pipelines where values through flow.
To visit all values in an IEnumerable
there are various options such as using for
, Seq.fold
or Seq.iter
, however internally they boil down to this code pattern.
let s: 'T seq = ???
let e = s.GetEnumerator ()
try
while e.MoveNext () do
doStuff e.Current
finally
e.Dispose ()
This is what is sometimes known as a pull pipeline, that is we try to “pull” one value out of the pipeline by calling MoveNext
, if that was successful, we “doStuff” on the pulled value. To support abstracting over a set of values generated from resources that need disposing we need to dispose the enumerator.
There are few drawbacks with IEnumerable
- You need to implement two interfaces,
IEnumerable
andIEnumerator
- Implementing
IEnumerator
can feel a bit awkward as work is performed inMoveNext
but then value is retrieved usingCurrent
. - Each step in the pipeline essentially reimplements the visitation pattern above meaning there are two virtual calls per value and a test, all wrapped in a
try…finally
A benefit of IEnumerable
is that each step in the pipeline does not have to materialize the entire collection as is the case when you use Array
or List
in F#. However, there is an obvious alternative to pull pipelines in push pipelines.
In a push pipeline the values are pushed from the source through the pipeline rather than the receiver pull the values through the pipeline.
A push pipeline tends to have a:
- More elegant implementation
- More efficient implementation by reducing virtual calls and unnecessary checks
There are drawbacks such that once the pipeline is started it will push all data through the pipeline until stopped or reaching the end where IEnumerable
by its design is yieldable.
Nessos Streams pulls most of the benefits of Pull and Push pipelines by combining them into a “pump pipeline” at the expense of increased implementation complexity.
One of my favorite Push pipelines is church encoded lists, as described here as an example of Rank N types in Haskell.
A Church Encoded List is an abstraction over a set of values that exposes a single method runList
which behaves like foldBack except the collection is given by the abstraction.
The name Church Encoded Lists is likely derived from how natural numbers are commonly encoded in lambda calculus.
0 := \fx. x
1 := \fx. f x
2 := \fx. f (f x)
A church encoded numeral n
is a function that applies f
n times over x
. A church encoded list in F# does the same for a list of length n
except it also passes the values of the list through f
as well.
In F# we could try to model the Church Encoded List like this:
type Cel<'T> = ('T -> 'S -> 'S) -> 'S -> 'S
Assuming above was legal in F# we would sum all numbers in a Church Encoded List like this:
let vs: Cel<int> = ???
let sum = vs (+) 0 // Folds all values in vs using + and 0
The type definition isn’t legal F# though because F# don’t support Rank-N types for function types (The folding functions should work for all ‘S?) and including ‘S in the type definition quickly feels clumsy and awkward. .NET do support Rank-2 types in generic methods in generic classes though so an alternative definition for a Church Encoded List could look like this:
type [<AbstractClass>] Cel<'T>() =
abstract RunList: ('T -> 'S -> 'S) -> 'S -> 'S
In this case there isn’t a problem as RunList
is now a generic method over ´S.
Let us define a few simple functions:
module Cel =
// Helper method to run a list over a folder function and a zero value
let inline runList (c: Cel<'T>) folder (zero: 'S) = c.RunList folder zero
// The empty list
[<GeneralizableValue>]
let nil<'T> =
{ new Cel<'T>() with
override _.RunList f z = z
}
// Appends a value
let inline cons x cel =
{ new Cel<'T>() with
override _.RunList f z = f x (runList cel f z)
}
The empty list contains no values so just returns the “zero” value. For the single value list run the folder function once over the value and the zero value.
Implementing an Church Encoded List over an ordinary F# list is easy enough
let inline fromList xs =
{ new Cel<'T>() with
override x.RunList f z = List.foldBack f xs z
}
As runList should behave as foldBack
we just forward the folder function and the zero value to List.foldBack
.
Going from a Church Encoded List to an F# list is easy as well
let inline toList cel = runList cel (fun v s -> v::s) []
The folder function prepends v on list s and the zero value is the empty list. As runList
is reverse folding there is no need to reverse the list at the end.
The Haskell page on Rank-N types list 3 laws for the operations we defined so far:
runList xs cons nil == xs
runList (fromList xs) f z == foldr f z xs
foldr f z (toList xs) == runList xs f z
First law states that given any church encoded list xs
when we fold with cons and nil the resulting list shall be identical, essentially a copy list operation.
Second law states that given any F# list xs
, any folder function f
and any initial value z
when we runList
over a church encoded list created by fromList xs
that is identical to when we do reverse fold over the given list (List.foldBack
in F#).
Third law states that given any church encoded list xs
, any folder function f
and any initial value z
when we do reverse fold over the given list created by toList xs
that is identical to when we runList
over the given list.
If these 3 laws hold for all values then the implementation so far is sound.
The best thing would be to prove our implementation but that can be difficult. Often in the software industry we use testing as a compromise. One way to test is to generate random input values to make sure the laws hold for all given values.
This is sometimes called property-based testing and for F# there is an excellent library called FsCheck that enables property-based testing.
Using FsCheck it’s possible to write the tests of the 3 laws as this:
type Properties =
// Church Encoded List Laws
static member ``For all xs: runList xs cons nil == xs`` (xs: int Cel) =
runList xs cons nil == xs
static member ``For all xs, f and z: runList (fromList xs) f z == foldr f z xs`` (xs: int list) (f: int -> int -> int) (z: int) =
runList (fromList xs) f z == List.foldBack f xs z
static member ``For all cs, f and z: foldr f z (toList cs) == runList cs f z`` (xs: int Cel) (f: int -> int -> int) (z: int) =
List.foldBack f (toList xs) z == runList xs f z
Each test function is called with randomly generated data and returns true if the law holds true for the data given. This is not proof that the laws always hold but tests never provide proofs, good tests just builds confidence. Property based testing properly done can give a high degree of confidence with little effort.
The testing code also looks like the original laws which is a nice touch.
Common operations for data pipelines are collect
, filter
and map
. Let’s look at how they can be implemented for Church Encoded Lists.
let inline collect collector cel =
{ new Cel<'U>() with
override _.RunList f z =
let ff v s = runList (collector v) f s
runList cel ff z
}
let inline filter filterer cel =
{ new Cel<'T>() with
override _.RunList f z = runList cel (fun v s -> if filterer v then f v s else s) z
}
let inline map mapper cel =
{ new Cel<'U>() with
override _.RunList f z = runList cel (fun v s -> f (mapper v) s) z
}
The implementations are short but might be a bit tricky to follow. Let us start with filter
:
override _.RunList f z = runList cel (fun v s -> if filterer v then f v s else s) z
What happens here is that we runList
on the input cel
but we pass it a filter fold function. This function checks if v
passed the condition and if so applies f v s
to get a new state, otherwise it passes the existing state.
map
is similar:
override _.RunList f z = runList cel (fun v s -> f (mapper v) s) z
Here a map folder function is passed to runList
that for each value in cel
maps it to the new value domain and applies it to f
to get a new state.
Finally let us look at collect
:
override _.RunList f z =
let ff v s = runList (collector v) f s
runList cel ff z
collect
aka flatMap
conceptually calls collector
for each value in cel
to get a Church Encoded List for each value. These lists are concatenated to the final Church Encoded List.
The implementation is simpler though as we create a fold function ff
that given a value v
does runList
on the Church Encoded List created from the value.
The definitions of collect
, filter
and map
look to me close to how they would be defined mathematically given foldr
function.
As we already tested the church encoded laws in terms of foldr
for lists in order to test collector, filter and map we can test them by introducing laws saying that for all values they should behave as collect
, filter
and map
for lists.
filter f xs == (fromList >> filter f >> toList) xs
map f xs == (fromList >> map f >> toList) xs
collect f xxs == (fromList >> collect (f >> fromList) >> toList) xxs
fromList >> g >> toList
is a pattern that maps the input list into a Church Encoded list, applies g to the it and then maps the result back to list. This is because ’T list
and ’T cel
aren’t directly comparable.
Given xs
is a ’T list
, f
is any function and xxs
is ’T list list
we can read the rules such as that for all possible combinations of xs
, f
and xxs
- List filter produce the same result as filter for Church Encoded lists
- List map produce the same result as map for Church Encoded lists
- List collect produce the same result as collect for Church Encoded lists
Using FsCheck to check the laws it could look like this.
static member ``For all f and xxs: List.collect and collect produce equal results`` (f: int list -> int list) (xxs: int list list) =
List.collect f xxs == (fromList >> collect (f >> fromList) >> toList) xxs
static member ``For all f and xs: List.filter and filter produce equal results`` (xs: int list) (f: int -> bool) =
List.filter f xs == (fromList >> filter f >> toList) xs
static member ``For all f and xs: List.map and map produce equal results`` (xs: int list) (f: int -> int) =
List.map f xs == (fromList >> map f >> toList) xs
runList
is essentially a foldBack
so here's a suggestion for what map
could be in terms of runList
let map f xs = runList xs (fun v s -> cons (f v) s) nil
We say that map
should be the same as folding back over xs and cons
all mapped values f v
on the empty list nil
.
From this we can then define a property test:
static member ``For all f and cs: map produce equal results to runList concatenating the results of f`` (xs: int Cel) (f: int -> int) =
map f xs == runList xs (fun v s -> cons (f v) s) nil
Filter can also be expressed in terms of runList
with little effort:
let filter f xs = runList xs (fun v s -> if f v then cons v s else s) nil
We say that filter
should be the same as folding back over xs and cons
all values where f v
is true
on the empty list nil
.
From this we can then define a property test:
static member ``For all f and cs: map produce equal results to runList concatenating the true results for f`` (xs: int Cel) (f: int -> bool) =
filter f xs == runList xs (fun v s -> if f v then cons v s else s) nil
Collect aka flatmap can also be defined in terms of runList
as it's conceptually a mapping operation that concatenates the results.
let collect f xs = runList xs (fun v s -> runList (f v) cons s) nil
We have a list that through the function f
generates lists to be concatenated so it makes sense to have two nested runList
. The first folding function runs f v
to get the list from v
and then concatenates on s
using cons
.
The property test:
static member ``For all f and ccs: collect produce equal results to runList that concatenates the results of the lists of f`` (xs: int Cel Cel) =
// FsCheck generating: int Cel -> int Cel function results in a quite slow function
// so use id for now
let f = id
collect f xs == runList xs (fun v s -> runList (f v) cons s) nil
We can simplify our definitions further for map and filter by defining them in terms of collect
let filter f xs = collect (fun v -> if f v then cons v nil else nil) xs
let map f xs = collect (fun v -> cons (f v) nil) xs
cons v nil
creates a single element list which we can use for both filter
and map
when defining them in terms of collect
.
The property tests will be:
// filter and map laws defined in terms of collect
static member ``For all f and cs: filter produce equal results to collect concatenating singletons list for all true results for f`` (xs: int Cel) (f: int -> bool) =
filter f xs == collect (fun v -> if f v then cons v nil else nil) xs
static member ``For all f and cs: map produce equal results to collect concatenating singletons list with results for f`` (xs: int Cel) (f: int -> int) =
map f xs == collect (fun v -> cons (f v) nil) xs
The different tests for collect
, filter
and map
is redundant and which style you prefer is up to you. Using a well-known library as an oracle is a quick way to achieve property tests that are easy to understand.
Taking the extra time to define the property tests in terms of simpler primitives gives the writer more insight into what is actually a filter
function? Coming back a few months later to the tests it might not feel as obvious though.
Finally, we can implement fromRange
and sum
such as this:
let inline fromRange b e =
{ new Cel<int>() with
override _.RunList f z =
let rec loop f s b i =
if i >= b then
loop f (f i s) b (i - 1)
else
s
loop f z b e
}
let inline sum cel = runList cel (+) LanguagePrimitives.GenericZero
While possible to implement fromRange
using F# for
I personally like tail recursion as it allows me to avoid mutable state, note that the loop goes backwards because runList
is essentially a foldBack
.
sum
is trivial as it just forwards a folder function +
and the generic zero to runList
.
The laws for these two functions could be something like this:
[x..y] == toList (fromRange x y)
(n*n + n)/2 == sum (fromRange 0 n)
The first law states that given any integer x
and y
the list generated from the comprehension [x..y]
should always be identical to the list created using toList (fromRange x y)
The second law says that given any integer n
(which is greater or equal to 0) (n*n + n)/2
should always be equal to the sum of all integers between 0 and n inclusive. (n*n + n)/2
is a known expression, the sum of integers from 0 to n inclusive, a sum sequence is known as an arithmetic sequence (https://en.wikipedia.org/wiki/Arithmetic_progression).
The property tests could look like this:
static member ``For all xs: x..y is equal to list created from fromRange x y`` (x: int) (y: int) =
[x..y] == toList (fromRange x y)
static member ``For all n: (n*n + n)/2 is equal to sum (fromRange 0 n)`` (n: int) =
let n = abs n
(n*n + n)/2 == sum (fromRange 0 n)
As they say, I leave it as an exercise to the reader to implement the functions above for IEnumerable
. I think you will find that exercise somewhat more cumbersome than implementing them for a Church Encoded List.
Even though the Church Encoded List definition might seem non-obvious the functions we create become almost the mathematical definitions of what they should be. Testing almost seems unnecessary but if we derive laws and then we can then test these laws using FsCheck and with little effort reach a high degree of confidence.
Working further and changing the definition to use fold
rather than foldRight
the Church Encoded List is quite performant abstraction as well, often out-performing Linq
. Due to technicalities around how tail recursive functions are implemented in .NET to get predictable performance one has to apply some ugly hacks to avoid the function calls from being invoked as tail calls.
In conclusion I hope you can agree with me that the Church Encoded List is an elegant abstraction and testing with FsCheck is just awesome.
Merry Christmas, Mårten
- Rank-N types in Haskell - https://wiki.haskell.org/Rank-N_types
- Church numerals - https://en.wikipedia.org/wiki/Church_encoding
- FsCheck - https://fscheck.github.io/FsCheck/
- Arithmetic progression - https://en.wikipedia.org/wiki/Arithmetic_progression
- Algebraic driven design - https://algebradriven.design/