Skip to content

Instantly share code, notes, and snippets.

@cjay
Last active February 27, 2019 20:01
Show Gist options
  • Save cjay/c15813c911609d355f8e280ee64b3e8f to your computer and use it in GitHub Desktop.
Save cjay/c15813c911609d355f8e280ee64b3e8f to your computer and use it in GitHub Desktop.

List folds in Haskell from a use case oriented perspective

Since the definitions in Data.Foldable are too generic, here are list-specific ones:

-- foldl f z [x1, x2, ..., xn] == (...((z `f` x1) `f` x2) `f`...) `f` xn
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z []     = z
foldl f z (x:xs) = let z' = z `f` x
                   in foldl f z' xs

foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' f z []     = z
foldl' f z (x:xs) = let z' = z `f` x
                    in seq z' $ foldl' f z' xs

-- foldr f z [x1, x2, ..., xn] == x1 `f` (x2 `f` ... (xn `f` z)...)
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z []     = z
foldr f z (x:xs) = x `f` foldr f z xs

left fold

both strict and non-strict variants

  • Always walks the whole spine of the input list
  • Assuming no continuations are used: Can ignore payloads from the right side of the input list based on computation from the left
    • Actually can ignore intermediate payloads, but information flow for the ignore-decision can only be left-to-right
    • Ignore-decision can also be based on the initial accumulator and on more than one read list value
    • Ignoring list payloads can avoid computation if the list contains thunks
  • Information flow can actually be inverted with continuations, so left folds can also ignore payloads from the left side based on computation from the right

foldl (non-strict)

  • Can discard thunks resulting from the left side of the input list based on a read list value
    • If no continuations are used, this can only happen based on one single input list value, no other information
    • Builds a chains of thunks prior to each discarding that get garbage collected
    • Avoids the evaluation time of the discarded thunks that result from the left side of the input list
    • The initial accumulator value is never used when discarding (same for any intermediate accumulator value prior to the discarding, however neither the initial nor the intermediate values get evaluated)
    • When discarding, has to make up an accumulator value based on only the read list value (and nothing else, if no continuations are used) to be used for the computation on the rest of the list
    • If the evaluation time of the discarded thunks is not significant, foldl’ will probably still be faster
  • If it doesn’t discard thunks, it builds up a chain of thunks that in the end get evaluated anyway -> Using foldl’ when not discarding thunks is more efficient

foldl’ (strict)

  • Consumes the whole list (spine) without the time and memory overhead of the thunk chain
  • Probably is the most efficient fold if the accumulation function is sufficiently strict (if you have a choice between folds)

right fold

both strict and non-strict variants

  • Assuming no continuations are used: Can ignore payloads on the left side of the input list based on computation from the right
    • Actually can ignore intermediate payloads, but information flow for the ignore-decision can only be right-to-left
    • Ignore-decision can also be based on the initial accumulator and on more than one read list value
    • Ignoring list payloads can avoid computation if the list contains thunks
  • Information flow can actually be inverted with continuations, so right folds can also ignore payloads from the right side based on computation from the left

foldr (non-strict)

  • If the accumulation function is too strict, it builds a nested ‘call stack’ during evaluation that is as deep as the consumed part of the list spine
  • On the other hand, if the accumulation function is sufficiently lazy in its second arg, foldr doesn’t build up a call stack and is probably the most efficient fold (if you have a choice between folds). Good example is concat, which has quadratic complexity when implemented with left folds, and is O(n) with foldr.
  • Based on a read list value, can ignore the part of the input list to the right of that value
    • In other words, can abort consumption of the input list (both spine and payloads) after reading a prefix of it
    • Avoids computation if what comes after the prefix is a thunk (lazily generated list)
    • Without continuations: This can only happen based on one single input list value, no other information
    • The initial accumulator value is never used when aborting consumption
    • When aborting consumption, has to make up an accumulator value based on only the read list value (and nothing else) to be used for the computation on the consumed part of the list
  • the accumulation function can ‘pause’ and ‘communicate’ with the outside world by returning a continuation. See example section.

foldr’ (strict)

  • Can’t abort consumption
  • Always builds a nested ‘call stack’ during evaluation that is as deep as the length of the list
  • Maybe it is slightly more efficient than foldr when potential abortion isn’t needed?

general

  • Using foldr you can implement all three other variants
  • Neither of the four variants necessarily evaluate all of the list payloads
  • Without continuations:
    • Left fold can ignore input list payloads on the right, right fold can ignore input list payloads on the left
    • Ignoring input list payloads from the wrong side has no meaningful use
      • If it does ignore on the wrong side, then all input list payloads get ignored
      • The whole spine still gets consumed because left fold always does and right fold can only abort based on read input list payloads
      • The result can only be derived from the initial accumulator and the list length
      • right fold: It would be more efficient to compute the list length without building a nested ‘call stack’. Tail recursion!

Examples

Emulating foldl with foldr

From http://www.willamette.edu/~fruehr/haskell/evolution.html, section “leaned so far right he came back left again!”:

fac n = foldr (\x g n -> g (x*n)) id [1..n] 1

-- evaluation:
fac 3
= foldr (\x g n -> g (x*n)) id [1..3] 1
= (\n -> (foldr (\x g n -> g (x*n)) id [2..3]) (1*n)) 1 -- continuation gets 1 from outside
= (foldr (\x g n -> g (x*n)) id [2..3]) (1*1)           -- resumes folding as it pleases
= (\n -> (foldr (\x g n -> g (x*n)) id [3]) (2*n)) (1*1)
= (foldr (\x g n -> g (x*n)) id [3]) (2*(1*1))
= (\n -> id (3*n)) (2*(1*1))
= id (3*(2*(1*1)))
= 3*(2*(1*1)) -- making the n param strict avoids this buildup
= 3*(2*1)
= 3*2
= 6

This also demonstrates how foldr can make use of continuations.

foldr' :: (a -> b -> b) -> b -> t a -> b
foldr' f z0 xs = foldl f' id xs z0
  where f' k x z = k $! f x z

-- evaluation
foldr' (+) 0 [1..3]
= foldl f' id [1..3] 0
    where f' k x z = k $! f x z
= foldl f' (f' id 1) [2..3] 0
= foldl f' (f' (f' id 1) 2) [3] 0
= foldl f' (f' (f' (f' id 1) 2) 3) [] 0
= (f' (f' (f' id 1) 2) 3) 0
= f' (f' (f' id 1) 2) 3 0
= (f' (f' id 1) 2) $! 3 + 0
= f' (f' id 1) 2 3
= (f' id 1) $! 2 + 3
= f' id 1 5
= id $! 1 + 5
= id 6
= 6

Clearly foldr’ for lists can’t avoid building up a call stack. Defining it it terms of foldl’ instead of foldl would not help.

This is more interesting:

foldl' :: (b -> a -> b) -> b -> t a -> b
foldl' f z0 xs = foldr f' id xs z0
  where f' x k z = k $! f z x

-- evaluation
foldl' (+) 0 [1..3]
= foldr f' id [1..3] 0
    where f' x k z = k $! f z x
= f' 1 (foldr f' id [2..3]) 0
= foldr f' id [2..3] $! 0 + 1
= foldr f' id [2..3] 1
= f' 2 (foldr f' id [3]) 1
= foldr f' id [3] $! 1 + 2
= foldr f' id [3] 3
= f' 3 (foldr f' id []) 3
= foldr f' id [] $! 3 + 3
= foldr f' id [] 6
= id 6
= 6

This also demonstrates how foldr can make use of continuations.

@cjay
Copy link
Author

cjay commented Jan 4, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment