Skip to content

Instantly share code, notes, and snippets.

@jonschoning
Created February 2, 2014 22:26
Show Gist options
  • Save jonschoning/8775918 to your computer and use it in GitHub Desktop.
Save jonschoning/8775918 to your computer and use it in GitHub Desktop.
{-
http://www.reddit.com/r/haskell/comments/cs54i/how_would_you_write_du_in_haskell/c0uvqqo
That pattern is called a "hylomorphism", and is one of the more common
recursion schemes. It can always be simplified into fold . unfold because
the mapped function can be pushed into the arguments on either side. For
pedagogical/clarity reasons it can be worth keeping it factored out, if you
so desire. (It can also be worth factoring out a natural transformation to
convert the functor you unfold into a different functor for folding, cf.
category-extras.)
Of course, the hylo function is pretty trivial, so I assume that can't be
what you're after. A hylomorphism exists for every recursive type, though
in this case you'd really want to make up your own simple type for
directory trees instead of abusing lists. And since hylomorphisms exhaust
the space of fold . unfold, if you can't do your other stuff with hylo then
you can't do it with just folds and unfolds. Catamorphisms are the simplest
refolds, and there are plenty of more complex recursion schemes which allow
you to do things like generating/destroying multiple ply at a time. I can't
say which scheme you want without knowing what you really want to do, but a
bunch of this stuff is coded up already in the category-extras library, so
you can look around there for inspiration.
-}
import Control.Applicative (pure, (<$>))
import Control.Monad ((<=<))
import qualified Data.Foldable as F
import qualified Data.Traversable as T
filesize :: String -> IO Integer
filesize = undefined
dirsize :: String -> IO Integer
dirsize = undefined
isdir :: String -> IO Bool
isdir = undefined
direntries :: String -> IO [String]
direntries = undefined
-- | Open-recursive form of a file-system
data FS r = File String | Dir String [r]
instance Functor FS where
fmap f (File x) = File x
fmap f (Dir x ys) = Dir x (map f ys)
instance F.Foldable FS where
-- Fill in optimized implementations for all functions, as desired
foldMap = T.foldMapDefault
instance T.Traversable FS where
-- Fill in optimized implementations for other functions, as desired
sequenceA (File x) = pure (File x)
sequenceA (Dir x ys) = (Dir x) <$> T.sequenceA ys
-- | Generic pure hylomorphism.
hylo :: (Functor f) => (a -> f a) -> (f b -> b) -> a -> b
hylo g f = f . fmap (hylo g f) . g
-- | Generic monadic hylomorphism.
hyloM :: (T.Traversable f, Monad m) => (a -> m (f a)) -> (f b -> m b) -> a -> m b
hyloM g f = f <=< T.mapM (hyloM g f) <=< g
du :: String -> IO Integer
du = hyloM getFiles sumFiles
getFiles :: String -> IO (FS String)
getFiles path = isdir path >>= \b ->
if b
then Dir path <$> direntries path
else return $ File path
sumFiles :: FS Integer -> IO Integer
sumFiles (File x) = filesize x
sumFiles (Dir x ys) = (sum ys +) <$> dirsize x
main :: IO ()
main = du "null" >> return ()
{-
To maximize performance we would specialize hylo and hyloM to f ~ FS so we
could inline fmap and mapM, and then we'd want to do a worker/wrapper transform
and add annotations so that hylo and hyloM are inlined at their use sites. It's
this latter inlining which enables fusing away the intermediate Fix FS.
Otherwise we'll still have to construct the intermediate structure, though
we'll do so one ply at a time and interleaved with destroying it one ply at a
time.
edit: The reason why the latter inlining is necessary to completely get rid of
the intermediate structure is that it's only "complete" functions like du which
can do fusion, because it's only when we bring together the functions that use
the constructors and the functions that match on them that the compiler can
perform that case analysis at compile-time. Lists are a special case because
they don't have branching recursion and they only have a single base case.
Because of this restricted structure it's easy to make the compiler smart
enough to see through the abstractions and glue the right parts together. And
since lists are so common, all this work has already been done in the
libraries. For less restricted data types it takes a bit more work, but it's
still doable (barring certain list-specific fusions).
-}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment