OK, so here's a short progression of implementations for comparison, for clarity:
{-# LANGUAGE ViewPatterns #-}
import Data.Array.Unboxed
import Data.List (tails, inits)
import qualified Data.List.Ordered as O -- package 'data-ordlist'
(\\) = O.minus
tr = sv [2..]
where
sv (p:t) = [p] ++ sv [n | n <- t, rem n p > 0]
Turner's sieve is the classic, but it is trial division, and it is non-postponed (more below).
er = sv [2..]
where
sv (p:t) = [p] ++ sv (t \\ [p, p+p..]) -- (t \\ map (p*) [p..])
This basic sieve of Eratosthenes (SoE) is not postponed either. Generating the multiples to be removed starting from p*p
instead of just from p
is a step in the right direction but not much of an improvement, because the removals are still being started way to early, so there'll be way too many of them, just as in the Turner's sieve.
eu = sv [2..]
where
sv (p:t) = [p] ++ sv (t \\ map (p*) (p:t))
Euler's scheme is a detour, arranging to avoid SoE's duplicate composites production, like 5*9 == 3*15
. It comes with a great price of making the postponement and thus segmentation impossible, because the multiples on each sieving step are not generated independently of each other, as it was before.
pe = 2 : sv [[p*p, p*p+p..] | p <- pe] [3..]
where
sv ((q:cs):r) (span (< q) -> (h,_:t)) = h ++ sv r (t \\ cs)
At last, Postponed Eratosthenes starts the removals just at the right moment, so there's no more quadratic explosion in their numbers as it was in the non-postponed variants.
br = 2 : [3..] \\ foldr (\p -> (p*p :) . O.union [p*p+p, p*p+2*p..]) [] br
Bird's sieve is also a classic.
= 2 : [3..] \\ foldr (\(x:xs) -> (x:) . O.union xs) [] [[p*p, p*p+p..] | p <- br]
= (2 :) . ([3..] \\) . foldr (\(x:xs) -> (x:) . O.union xs) []
. map (\p -> [p*p, p*p+p..]) $ br
= fix $ (2 :) . ([3..] \\) . O.unionAll . map (\p -> [p*p, p*p+p..])
tf = (2:) . _Y $ (3 :) . ([5,7..] \\) . O.unionAll . map (\p -> [p*p, p*p+2*p..])
_Y g = g (_Y g)
Folding the multiples in a tree is much better, provides nice logarithmic advantage in complexity.
se = 2 : [n | (r:q:_, px) <- (zip . tails . (2:) . map (^2)) se (inits se),
(n,True) <- assocs ( accumArray (const id) True (r+1,q-1)
[(m,False) | p <- px, s <- [ (r+p)`div`p * p ],
m <- [s,s+p..q-1]] :: UArray Int Bool )]
Segmentwise Eratosthenes is the same algorithm, only "transposed", making it much more of a natural fit to using arrays, for much much improved speed.
Wheel factorization is an orthogonal optimization.
Another is using _Y g = g (_Y g)
instead of fix
to express the data recursion (re: se = .... se ....
) creating a (short) tower of independent primes producers self-arranged by the repeated squaring scheme feeding each into the one above it, to minimize the overall memory retention.