-
-
Save Tener/1224319 to your computer and use it in GitHub Desktop.
{- | |
ghc --make -Odph -fllvm -optlo-globalopt -optlo-loop-unswitch -optlo-mem2reg -optlo-prune-eh -O2 -fllvm -optl-O3 -rtsopts lookandsay.hs | |
time ./lookandsay > /dev/null | |
./lookandsay > /dev/null 6,02s user 0,24s system 99% cpu 6,284 total | |
-} | |
import System.IO | |
import Control.DeepSeq | |
data S = S1 | S2 | S3 -- deriving (Eq,Show,Read,Ord,Enum) | |
instance NFData S where | |
look_and_say :: [[S]] | |
look_and_say = iterate carryOn [S1] | |
carryOn :: [S] -> [S] | |
carryOn (S1 : S1 : S1 : rest) = S3 : S1 : carryOn rest | |
carryOn (S1 : S1 : rest) = S2 : S1 : carryOn rest | |
carryOn (S1 : rest) = S1 : S1 : carryOn rest | |
carryOn (S2 : S2 : S2 : rest) = S3 : S2 : carryOn rest | |
carryOn (S2 : S2 : rest) = S2 : S2 : carryOn rest | |
carryOn (S2 : rest) = S1 : S2 : carryOn rest | |
carryOn (S3 : S3 : S3 : rest) = S3 : S3 : carryOn rest | |
carryOn (S3 : S3 : rest) = S2 : S3 : carryOn rest | |
carryOn (S3 : rest) = S1 : S3 : carryOn rest | |
carryOn [] = [] | |
{-# INLINE render #-} | |
render (x,y) = do | |
let r S1 = '1' | |
r S2 = '2' | |
r S3 = '3' | |
fi :: S -> Int | |
fi S1 = 1 | |
fi S2 = 2 | |
fi S3 = 3 | |
-- hPrint stderr (deepseq y x) | |
hPrint stderr (x, sum $ map fi y) | |
print (x,map r y) | |
-- print (deepseq y x) | |
look_and_say' :: [[S]] | |
look_and_say' = take n1 look_and_say | |
main :: IO () | |
main = do | |
mapM_ render (zip [1..n1] look_and_say') | |
-- print $ look_and_say !! n2 | |
n1 = 60 |
The difference in performance is huge, and hence hard to measure accurately. The slow algorithm takes a lot of time to finish printing the first 50 items. With fast it's almost instant. Furthermore the fast one is actually GC-bound. If you provide the right RTS flags it runs even faster.
The actual numbers:
slow - 117.34 sec.
above, fast - 0.40 sec.
So generallly it's like 300x faster.
Do you know why this is so much faster? I pattern matching faster than using length
? Or is it the absence of show
and read
that does the speed difference?
I think your analysis is quite right. The pattern matching is indeed very fast, while length, show and read are understandably quite slow. I said "quite efficient" because I think one can do even better. For example C++ version is bound to be faster.
Have you run any tests to compare the speeds of the "slow" and "quite efficient" implementations?