Created
February 2, 2014 22:26
-
-
Save jonschoning/8775918 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{- | |
http://www.reddit.com/r/haskell/comments/cs54i/how_would_you_write_du_in_haskell/c0uvqqo | |
That pattern is called a "hylomorphism", and is one of the more common | |
recursion schemes. It can always be simplified into fold . unfold because | |
the mapped function can be pushed into the arguments on either side. For | |
pedagogical/clarity reasons it can be worth keeping it factored out, if you | |
so desire. (It can also be worth factoring out a natural transformation to | |
convert the functor you unfold into a different functor for folding, cf. | |
category-extras.) | |
Of course, the hylo function is pretty trivial, so I assume that can't be | |
what you're after. A hylomorphism exists for every recursive type, though | |
in this case you'd really want to make up your own simple type for | |
directory trees instead of abusing lists. And since hylomorphisms exhaust | |
the space of fold . unfold, if you can't do your other stuff with hylo then | |
you can't do it with just folds and unfolds. Catamorphisms are the simplest | |
refolds, and there are plenty of more complex recursion schemes which allow | |
you to do things like generating/destroying multiple ply at a time. I can't | |
say which scheme you want without knowing what you really want to do, but a | |
bunch of this stuff is coded up already in the category-extras library, so | |
you can look around there for inspiration. | |
-} | |
import Control.Applicative (pure, (<$>)) | |
import Control.Monad ((<=<)) | |
import qualified Data.Foldable as F | |
import qualified Data.Traversable as T | |
filesize :: String -> IO Integer | |
filesize = undefined | |
dirsize :: String -> IO Integer | |
dirsize = undefined | |
isdir :: String -> IO Bool | |
isdir = undefined | |
direntries :: String -> IO [String] | |
direntries = undefined | |
-- | Open-recursive form of a file-system | |
data FS r = File String | Dir String [r] | |
instance Functor FS where | |
fmap f (File x) = File x | |
fmap f (Dir x ys) = Dir x (map f ys) | |
instance F.Foldable FS where | |
-- Fill in optimized implementations for all functions, as desired | |
foldMap = T.foldMapDefault | |
instance T.Traversable FS where | |
-- Fill in optimized implementations for other functions, as desired | |
sequenceA (File x) = pure (File x) | |
sequenceA (Dir x ys) = (Dir x) <$> T.sequenceA ys | |
-- | Generic pure hylomorphism. | |
hylo :: (Functor f) => (a -> f a) -> (f b -> b) -> a -> b | |
hylo g f = f . fmap (hylo g f) . g | |
-- | Generic monadic hylomorphism. | |
hyloM :: (T.Traversable f, Monad m) => (a -> m (f a)) -> (f b -> m b) -> a -> m b | |
hyloM g f = f <=< T.mapM (hyloM g f) <=< g | |
du :: String -> IO Integer | |
du = hyloM getFiles sumFiles | |
getFiles :: String -> IO (FS String) | |
getFiles path = isdir path >>= \b -> | |
if b | |
then Dir path <$> direntries path | |
else return $ File path | |
sumFiles :: FS Integer -> IO Integer | |
sumFiles (File x) = filesize x | |
sumFiles (Dir x ys) = (sum ys +) <$> dirsize x | |
main :: IO () | |
main = du "null" >> return () | |
{- | |
To maximize performance we would specialize hylo and hyloM to f ~ FS so we | |
could inline fmap and mapM, and then we'd want to do a worker/wrapper transform | |
and add annotations so that hylo and hyloM are inlined at their use sites. It's | |
this latter inlining which enables fusing away the intermediate Fix FS. | |
Otherwise we'll still have to construct the intermediate structure, though | |
we'll do so one ply at a time and interleaved with destroying it one ply at a | |
time. | |
edit: The reason why the latter inlining is necessary to completely get rid of | |
the intermediate structure is that it's only "complete" functions like du which | |
can do fusion, because it's only when we bring together the functions that use | |
the constructors and the functions that match on them that the compiler can | |
perform that case analysis at compile-time. Lists are a special case because | |
they don't have branching recursion and they only have a single base case. | |
Because of this restricted structure it's easy to make the compiler smart | |
enough to see through the abstractions and glue the right parts together. And | |
since lists are so common, all this work has already been done in the | |
libraries. For less restricted data types it takes a bit more work, but it's | |
still doable (barring certain list-specific fusions). | |
-} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment