Instantly share code, notes, and snippets.

# VictorTaelin/simple_fast_functional_sieve.md

Last active July 2, 2024 19:15
Show Gist options
• Save VictorTaelin/a5571afaf5ee565689d2b9a981bd9df8 to your computer and use it in GitHub Desktop.
Can a simple functional sieve be fast? Optimizing Tromp's algorithm on HVM.

## Can a simple functional sieve be fast? Optimizing Tromp's algorithm on HVM.

Today, John Tromp - creator of the Binary λ-Calculus, and one of the smartest functional wizards alive - ported his famous prime number generator (first published 12 years ago!) to HVM:

```B0     = λx λy x
B1     = λx λy y
cons   = λx λy λz (z x y)
cons0  = λy λz (z B0 y)
Y      = λf ((λx (x x)) (λx (f (x x))))
S0     = λcont  λx λxs (cons0  (xs cont))
Ssucc  = λSn λc λx λxs (cons x (xs (Sn c)))
ap     = λx λy λz (x z (y z))
sieve  = λSn (cons B1 (ap sieve Y (Ssucc Sn)))
primes = (cons0 (cons0 (sieve S0)))
C2     = λf λx (f (f x))
C5     = λf λx (f (f (f (f (f x)))))
C10    = λf (C2 (C5 f))
C1K    = (C10 C2)
C3     = λf λx (f (f (f x)))
C4     = λf λx (f (f (f (f x))))
C8     = (C3 C2)
C256   = (C8 C2)
C12    = λf (C3 (C4 f))
C4K    = (C12 C2)
Take   = λcont λx λxs (SCons (x '0' '1') (xs cont))
Main   = (primes (C1K Take (B0 (B0 SNil))))```

His algorithm is remarkable for being extremely elegant and small - possibly, the smallest of all! Sadly, it isn't so fast. To generate the `N`th bit, a "siever" has to be applied `N` times to it, which makes its complexity quadratic. So, the question is: can we take advantage of HVM's optimality, to gain an asymptotical speedup on this elegant algorithm? My first intuition was to isolate the simplest siever (i.e., `siever(1)`), and measure its iterated complexity. As expected, it was linear; applying 1 million sievers had a cost of 30,000,040 interactions.

```App   = λn λf λx match n { 0: x; +: (App (- n 1) f (f x)) } // applies a function N times
Y     = λf (f (Y f)) // y-combinator
B0    = λb0 λb1 b0 // bit 0
B1    = λb0 λb1 b1 // bit 1
cons  = λx λxs λcons (cons x xs) // cons on infinite lists
head  = λxs (xs λx λxs x) // gets the first bit
ones  = λc(c B1 ones) // infinite bitstring with 1's
S0    = λcont λx λxs (cons B0 (xs cont)) // the first siever (i.e., siever(1))
apply = λs λx (x (Y s)) // applies a siever
(F n) = (head (App n (apply S0) ones)) // applies siever(1) N times to [1..], and gets the first bit
Main  = (F 1000000) // COST: 30,000,040 interactions```

Note that, on HVM, each interaction (also called "rewrite") is a constant-time operation that takes just a few ASM instructions. As such, it is a precise measure of complexity. That is, instead of measuring time and using the imprecise `big-O` notation (`siever1^N = O(N)`), we can measure and write it in terms of interactions: `siever1^N = 30*N + 40 interactions`. Below are some specific measures:

``````// F(100) =     3040 interactions
// F(200) =     6040 interactions
// F(300) =     9040 interactions
// F(400) =    12040 interactions
// F(500) =    15040 interactions
// F(600) =    18040 interactions
// F(700) =    21040 interactions
// F(800) =    24040 interactions
// F(900) =    27040 interactions
// F(1k)  =    30040 interactions
// F(1kk) = 30000040 interactions
``````

Thankfully, HVM as a clever way to perform repeated application in logarithmic time: fusion! To obtain a fusible siever, I applied 3 transformations:

1. Perform repeated application by squared self-composition.

2. Flip the siever, so that it is a applied to the list (not the other way around).

3. Lift the cons lambda inside the siever, allowing it to be shared.

I then ran the same computation and - wow - it completed in just 1012 interactions! Now, that's an optimization. By merely fusing the siever, we got a massive 29644x speedup. Asymptotics!

```App   = λn λf λx match n { 0: x; +: (App (/ n 2) λx(f (f x)) (match m = (% n 2) { 0: x; +: (f x) })) } // applies a function N times
Y     = λf (f (Y f)) // y-combinator
B0    = λb0 λb1 b0 // bit 0
B1    = λb0 λb1 b1 // bit 1
cons  = λx λxs λcons (cons x xs) // cons on infinite lists
head  = λxs (xs λx λxs x) // gets the first bit
ones  = λc(c B1 ones) // infinite bitstring with 1's
S0    = λcont λbs λcons (bs λxλxs(cons B0 (cont xs))) // the first siever (i.e., siever(1))
apply = λs λx ((Y s) x) // applies a siever
(F n) = (head (App n (Y S0) ones)) // applies siever(1) N times to [1..], and gets the first bit
Main  = (F 1000000) // COST: 1012```

Below are some updated measures:

``````// F(1)   =  381 interactions
// F(2)   =  428 interactions
// F(3)   =  480 interactions
// F(4)   =  475 interactions
// F(5)   =  490 interactions
// F(6)   =  527 interactions
// F(7)   =  537 interactions
// F(8)   =  522 interactions
// F(9)   =  527 interactions
// F(1k)  =  537 interactions
// F(1kk) = 1012 interactions
``````

So, can this trick be used to create a simple and fast pure functional prime sieve? Probably! But one would need to carefully apply this technique to fuse arbitrary sievers during the sieve generation. Tricky, yes; but doable. So, a better question is: who will be the first one to develop it? Could be you!

Note that, as of today (29 Jan 2024), HVM's production-ready version isn't officially available yet, but you can already test the code above by installing the `lazy_mode` branch of HVM-Lang, with the `-L` flag.

This discussion is taking place right now on HOC's Discord:

https://discord.HigherOrderCO.com/ (#hvm channel)

### VictorTaelin commented Jan 29, 2024 • edited Loading

Here is a more HVM-idiomatic port of Tromp's sieve (I wrote it to help me understand his algorithm):

```Y = λf (f (Y f))

// applies a function N times
App = λn λf λx match n {
0: x
+: (App n-1 f (f x))
}

B0 = λb0 λb1 b0
B1 = λb0 λb1 b1

cons = λx λxs λcons (cons x xs)
skip = λbs λf (bs λxλxs(cons B0 (f xs)))
keep = λbs λf (bs λxλxs(cons x  (f xs)))

// Sievers
S0 =   λfλp(skip p f) // siever(1)
SS = λsλfλp(keep p (s f)) // ∀n. siever(n) -> siever(n+1)

sieve  = λs((Y (SS s)) (cons B1 (sieve (SS s))))
primes = (sieve S0)

Take = λn λxs match n {
0: SNil
+: (xs λxλxs (SCons (x '0' '1') (Take n-1 xs)))
}

Main = (Take 200 primes)```

to join this conversation on GitHub. Already have an account? Sign in to comment