Skip to content

Instantly share code, notes, and snippets.

@VictorTaelin
Forked from zmactep/encodings.md
Created April 5, 2018 21:49
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save VictorTaelin/ae2327ddd9e03c162ef536d72459b6b7 to your computer and use it in GitHub Desktop.
Save VictorTaelin/ae2327ddd9e03c162ef536d72459b6b7 to your computer and use it in GitHub Desktop.
Number encodings

Alternative to the Church, Scott and Parigot encodings of data on the Lambda Calculus.

When it comes to encoding data on the pure λ-calculus (without complex extensions such as ADTs), there are 3 widely used approaches.

Church Encoding

The Church Encoding, which represents data structures as their folds. Using Caramel’s syntax, the natural number 3 is, for example. represented as:

0 c0 = (f x -> x)
1 c1 = (f x -> (f x))
2 c2 = (f x -> (f (f x)))
3 c3 = (f x -> (f (f (f x))))

The Church Encoding provide iteration, and is typeable in System F, which mean it can be used in a total language. Algorithms on Church-encoded data are strongly normalizing; writing foldNat is trivial. It has a major issue, though: pattern matching takes O(N) time, and so does pred, which isn’t trivial to write.

foldNat = (s z nat -> (nat s z))                    -- Non-recursive. Good.
pred    = (n (g h -> (h (g succ))) (const zero) id) -- O(N). Also, WTF.

Scott Encoding

O(N) pred is considered bad, and solved by the Scott Encoding, which represents data structures as their pattern-match. Here is 3 again:

0 s0 = (f x -> x)
1 s1 = (f x -> (f (f x -> x)))
2 s2 = (f x -> (f (f x -> (f (f x -> x)))))
3 s3 = (f x -> (f (f x -> (f (f x -> (f (f x -> x)))))))

With this representation, we get constant-time pattern-matching, but we need general recursion to implement foldr. That means we can’t implement foldNat in a total language, and no implementation of foldNat for Scott-encoded structures has a beta normal form.

pred    = (nat -> (nat (pred -> pred) _))                       -- O(1). Good.
foldNat = (s z nat -> (nat (pred -> (s (foldNat s z pred))) z)) -- Recursive. Also, WTF.
Parigot Encoding

To solve even this issue, the Parigot encoding is sometimes used. It works like a combination of the Church and Scott encodings:

0 p0 = (f x -> x)
1 p1 = (f x -> (f (f x -> (f x)) (f x -> x)))
2 p2 = (f x -> (f (f x -> (f (f x))) (f x -> (f (f x -> (f x)) (f x -> x)))))
3 p3 = (f x -> (f (f x -> (f (f (f x)))) (f x -> (f (f x -> (f (f x))) (f x -> (f (f x -> (f x)) (f x -> x)))))))

We can make it easier to read using church-number literals:

0 p0 = (f x -> x)
1 p1 = (f x -> (f c1 (f x -> x)))
2 p2 = (f x -> (f c2 (f c -> (f c1 (f x -> x)))))
3 p3 = (f x -> (f c3 (f x -> (f c2 (f x -> (f c1 (f x -> x)))))))

This allows us to see that it is just the Scott encoding with Church-numbers stored inbetween - or, if you prefer, a Scott-encoded list of Church-encoded naturals. This allows us to get pred in constant time (just take the tail of the list), and foldr without recursion (just take the head of the list).

pred    = (nat -> (nat (_ pred -> pred) _)) -- O(1). Good.
foldNat = (nat -> (nat (fold _ -> fold) _)) -- Non-recursive. Good.

But the Parigot Encoding has its own problem: terms take quadratic amount of space.

Alternative Encoding

There is another encoding which combines all good properties from the previous, but I’ve never seen on literature. This is it:

0 a0 = (f x -> x)
1 a1 = (f x -> (f (c -> (c f x)) (f x -> x)))
2 a2 = (f x -> (f (c -> (c f x)) (f x -> (f (c -> (c f x)) (f x -> x)))))
3 a3 = (f x -> (f (c -> (c f x)) (f x -> (f (c -> (c f x)) (f x -> (f (c -> (c f x)) (f x -> x)))))))

Like Parigot encoding, this uses a Scott-list. Instead of Church-numbers, we store continuators, (c -> (c f x)). Each continuator gives us fuel to fold to the next pred - all the way to zero, where there isn’t fuel anymore, so the term has to halt. Pred is still just tail (O(1)), and there is a non-recursive implementation of foldNat with a beta normal form.

foldNat = (s z nat -> (nat (cont pred -> (s (cont pred))) z)) -- Non-recursive. Good.
pred    = (nat -> (nat (_ pred -> pred) _))                   -- O(1). Good.

It can easily be extended to arbitrary data-types and, differently from the Parigot encoding, terms take linear space. That looks like a decent solution. Was this proposed? Is there any merit on that kind of encoding?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment