-
-
Save oisdk/0822477aaced58a5ba937c3d11c19639 to your computer and use it in GitHub Desktop.
import Data.List (unfoldr, partition) | |
import Data.Maybe (catMaybes) | |
import Criterion.Main (defaultMain, env, bgroup, bench, nf) | |
import System.Random (randomIO) | |
import Control.Monad (replicateM) | |
groupOn :: Eq k => (a -> k) -> [a] -> [(k, [a])] | |
groupOn k = unfoldr f . map (\x -> (k x, x)) | |
where | |
f [] = Nothing | |
f ((k,x):xs) = Just ((k , x : map snd ys), zs) | |
where | |
(ys,zs) = partition ((k==) . fst) xs | |
groupOnOrd :: Ord k => (a -> k) -> [a] -> [(k,[a])] | |
groupOnOrd k = catMaybes . go . map (\x -> (k x, x)) | |
where | |
go [] = [] | |
go ((k,x):xs) = Just (k, x : e) : merge m (go l) (go g) | |
where | |
(e, m, l, g) = foldr split ([],[],[],[]) xs | |
split ky@(k',y) ~(e, m, l, g) = case compare k' k of | |
LT -> ( e, LT : m, ky : l, g) | |
EQ -> (y:e, EQ : m, l, g) | |
GT -> ( e, GT : m, l, ky : g) | |
merge [] lt gt = [] | |
merge (EQ : xs) lt gt = Nothing : merge xs lt gt | |
merge (LT : xs) (l:lt) gt = l : merge xs lt gt | |
merge (GT : xs) lt (g:gt) = g : merge xs lt gt | |
main = | |
defaultMain | |
[ env (replicateM m randomIO) $ \xs -> | |
bgroup (show m) | |
( | |
[ bgroup "id" | |
[ bench "groupOn" $ nf (groupOn id) xs | |
, bench "groupOnOrd" $ nf (groupOnOrd id) xs | |
] | |
] ++ | |
[ bgroup (show (n :: Word)) | |
[ bench "groupOn" $ nf (groupOn (`rem` n)) xs | |
, bench "groupOnOrd" $ nf (groupOnOrd (`rem` n)) xs | |
] | |
| n <- [2,3,100,1000], n < toEnum m ] | |
) | |
| p <- [2,3,4], let m = 10 ^ p ] |
The maybe isn't there to improve laziness really, it's there to make sure that the lists match up when zipped back together.
split 2 [1,2,3,1,2,3] ~= ([2,2],[Just False, Nothing, Just True, Just False, Nothing, Just True],[1,1],[3,3])
When reconstructing the list at the end you need to know when to not pull an element from either list, and you need to preserve the Nothing
s in the generated list at each level so the zipped lists match up in length.
Although it probably is clearer to use Ord
rather than Maybe Bool
But Nothing
only happens when the elements are equal, in which case there is nothing to match up, so it just get's thrown away immediately. Am I missing something?
Although it probably is clearer to use
Ord
rather thanMaybe Bool
Yes, that does look a lot nicer!
Riight, I see now. It's the recursive case where it breaks down, since if the child calls would remove an element of the list, it would no longer line up for the parent. Thanks for clearing it up!
I don’t think the
Maybe
actually does anything to improve laziness, since it’s immediately consumed by thecatMaybes
at the end. Both of the uses ofNothing
could be replaced with literally nothing and not change the behavior. It is fine to be strict in the equals-case of f since the user already requested an element.You could also replace the list of bools with having both of the split lists be
Maybe
, which is kind of a similar strategy to streaming libraries having aSkip
token. Edit: or wait, no, that would probably not get good asymptotics, since those would be traversed multiple times recursively.