Skip to content

Instantly share code, notes, and snippets.

Created August 20, 2017 13:08
Show Gist options
  • Save zmactep/c5e167c86fb8d80dcd5532792371863f to your computer and use it in GitHub Desktop.
Save zmactep/c5e167c86fb8d80dcd5532792371863f to your computer and use it in GitHub Desktop.
Number encodings

Alternative to the Church, Scott and Parigot encodings of data on the Lambda Calculus.

When it comes to encoding data on the pure λ-calculus (without complex extensions such as ADTs), there are 3 widely used approaches.

Church Encoding

The Church Encoding, which represents data structures as their folds. Using Caramel’s syntax, the natural number 3 is, for example. represented as:

0 c0 = (f x -> x)
1 c1 = (f x -> (f x))
2 c2 = (f x -> (f (f x)))
3 c3 = (f x -> (f (f (f x))))

The Church Encoding provide iteration, and is typeable in System F, which mean it can be used in a total language. Algorithms on Church-encoded data are strongly normalizing; writing foldNat is trivial. It has a major issue, though: pattern matching takes O(N) time, and so does pred, which isn’t trivial to write.

foldNat = (s z nat -> (nat s z))                    -- Non-recursive. Good.
pred    = (n (g h -> (h (g succ))) (const zero) id) -- O(N). Also, WTF.

Scott Encoding

O(N) pred is considered bad, and solved by the Scott Encoding, which represents data structures as their pattern-match. Here is 3 again:

0 s0 = (f x -> x)
1 s1 = (f x -> (f (f x -> x)))
2 s2 = (f x -> (f (f x -> (f (f x -> x)))))
3 s3 = (f x -> (f (f x -> (f (f x -> (f (f x -> x)))))))

With this representation, we get constant-time pattern-matching, but we need general recursion to implement foldr. That means we can’t implement foldNat in a total language, and no implementation of foldNat for Scott-encoded structures has a beta normal form.

pred    = (nat -> (nat (pred -> pred) _))                       -- O(1). Good.
foldNat = (s z nat -> (nat (pred -> (s (foldNat s z pred))) z)) -- Recursive. Also, WTF.
Parigot Encoding

To solve even this issue, the Parigot encoding is sometimes used. It works like a combination of the Church and Scott encodings:

0 p0 = (f x -> x)
1 p1 = (f x -> (f (f x -> (f x)) (f x -> x)))
2 p2 = (f x -> (f (f x -> (f (f x))) (f x -> (f (f x -> (f x)) (f x -> x)))))
3 p3 = (f x -> (f (f x -> (f (f (f x)))) (f x -> (f (f x -> (f (f x))) (f x -> (f (f x -> (f x)) (f x -> x)))))))

We can make it easier to read using church-number literals:

0 p0 = (f x -> x)
1 p1 = (f x -> (f c1 (f x -> x)))
2 p2 = (f x -> (f c2 (f c -> (f c1 (f x -> x)))))
3 p3 = (f x -> (f c3 (f x -> (f c2 (f x -> (f c1 (f x -> x)))))))

This allows us to see that it is just the Scott encoding with Church-numbers stored inbetween - or, if you prefer, a Scott-encoded list of Church-encoded naturals. This allows us to get pred in constant time (just take the tail of the list), and foldr without recursion (just take the head of the list).

pred    = (nat -> (nat (_ pred -> pred) _)) -- O(1). Good.
foldNat = (nat -> (nat (fold _ -> fold) _)) -- Non-recursive. Good.

But the Parigot Encoding has its own problem: terms take quadratic amount of space.

Alternative Encoding

There is another encoding which combines all good properties from the previous, but I’ve never seen on literature. This is it:

0 a0 = (f x -> x)
1 a1 = (f x -> (f (c -> (c f x)) (f x -> x)))
2 a2 = (f x -> (f (c -> (c f x)) (f x -> (f (c -> (c f x)) (f x -> x)))))
3 a3 = (f x -> (f (c -> (c f x)) (f x -> (f (c -> (c f x)) (f x -> (f (c -> (c f x)) (f x -> x)))))))

Like Parigot encoding, this uses a Scott-list. Instead of Church-numbers, we store continuators, (c -> (c f x)). Each continuator gives us fuel to fold to the next pred - all the way to zero, where there isn’t fuel anymore, so the term has to halt. Pred is still just tail (O(1)), and there is a non-recursive implementation of foldNat with a beta normal form.

foldNat = (s z nat -> (nat (cont pred -> (s (cont pred))) z)) -- Non-recursive. Good.
pred    = (nat -> (nat (_ pred -> pred) _))                   -- O(1). Good.

It can easily be extended to arbitrary data-types and, differently from the Parigot encoding, terms take linear space. That looks like a decent solution. Was this proposed? Is there any merit on that kind of encoding?

Copy link

VictorTaelin commented Jan 14, 2018

Copy link

cwgoes commented Dec 30, 2019

The "alternative encoding" looks almost identical to Stump's Mendler encoding; it seems to have the same "included continuator" structure, I suspect the differences mostly have to do with typing.

@zmactep - Have you tried to type the "alternative encoding"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment