cjay/folds.org

## folds.org

      
    Raw
  

              folds.org
            
          
    List folds in Haskell from a use case oriented perspective

Since the definitions in Data.Foldable are too generic, here are list-specific ones:
-- foldl f z [x1, x2, ..., xn] == (...((z `f` x1) `f` x2) `f`...) `f` xn
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z []     = z
foldl f z (x:xs) = let z' = z `f` x
                   in foldl f z' xs

foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' f z []     = z
foldl' f z (x:xs) = let z' = z `f` x
                    in seq z' $ foldl' f z' xs

-- foldr f z [x1, x2, ..., xn] == x1 `f` (x2 `f` ... (xn `f` z)...)
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z []     = z
foldr f z (x:xs) = x `f` foldr f z xs
left fold

both strict and non-strict variants


  Always walks the whole spine of the input list
  Assuming no continuations are used: Can ignore payloads from the right side of
    the input list based on computation from the left
    
      Actually can ignore intermediate payloads, but information flow for the ignore-decision can only be
        left-to-right
      Ignore-decision can also be based on the initial accumulator and on more than one read list value
      Ignoring list payloads can avoid computation if the list contains thunks
    
  
  Information flow can actually be inverted with continuations, so left folds can also
    ignore payloads from the left side based on computation from the right

foldl (non-strict)


  Can discard thunks resulting from the left side of the input list based on a read list value
    
      If no continuations are used, this can only happen based on one single input list value, no other information
      Builds a chains of thunks prior to each discarding that get garbage collected
      Avoids the evaluation time of the discarded thunks that result from the left side of the input list
      The initial accumulator value is never used when discarding (same for any intermediate accumulator value
        prior to the discarding, however neither the initial nor the intermediate values get evaluated)
      When discarding, has to make up an accumulator value based on only the read list value (and nothing else, if no continuations are used)
        to be used for the computation on the rest of the list
      If the evaluation time of the discarded thunks is not significant, foldl’ will probably still be faster
    
  
  If it doesn’t discard thunks, it builds up a chain of thunks that in the end get evaluated anyway
    -> Using foldl’ when not discarding thunks is more efficient

foldl’ (strict)


  Consumes the whole list (spine) without the time and memory overhead of the thunk chain
  Probably is the most efficient fold if the accumulation function is sufficiently strict (if you have a choice
    between folds)

right fold

both strict and non-strict variants


  Assuming no continuations are used: Can ignore payloads on the left side of
    the input list based on computation from the right
    
      Actually can ignore intermediate payloads, but information flow for the ignore-decision can only be
        right-to-left
      Ignore-decision can also be based on the initial accumulator and on more than one read list value
      Ignoring list payloads can avoid computation if the list contains thunks
    
  
  Information flow can actually be inverted with continuations, so right folds can also
    ignore payloads from the right side based on computation from the left

foldr (non-strict)


  If the accumulation function is too strict, it builds a nested ‘call stack’
    during evaluation that is as deep as the consumed part of the list spine
  On the other hand, if the accumulation function is sufficiently lazy in its
    second arg, foldr doesn’t build up a call stack and is probably the most
    efficient fold (if you have a choice between folds). Good example is concat,
    which has quadratic complexity when implemented with left folds, and is O(n)
    with foldr.
  Based on a read list value, can ignore the part of the input list to the right of that value
    
      In other words, can abort consumption of the input list (both spine and payloads) after reading a prefix
        of it
      Avoids computation if what comes after the prefix is a thunk (lazily generated list)
      Without continuations: This can only happen based on one single input list value, no other information
      The initial accumulator value is never used when aborting consumption
      When aborting consumption, has to make up an accumulator value based on only the read list value (and
        nothing else) to be used for the computation on the consumed part of the list
    
  
  the accumulation function can ‘pause’ and ‘communicate’ with the outside world
    by returning a continuation. See example section.

foldr’ (strict)


  Can’t abort consumption
  Always builds a nested ‘call stack’ during evaluation that is as deep as the length of the list
  Maybe it is slightly more efficient than foldr when potential abortion isn’t needed?

general


  Using foldr you can implement all three other variants
  Neither of the four variants necessarily evaluate all of the list payloads
  Without continuations:
    
      Left fold can ignore input list payloads on the right,
        right fold can ignore input list payloads on the left
      Ignoring input list payloads from the wrong side has no meaningful use
        
          If it does ignore on the wrong side, then all input list payloads get ignored
          The whole spine still gets consumed because left fold always does and right fold can only abort based on
            read input list payloads
          The result can only be derived from the initial accumulator and the list length
          right fold: It would be more efficient to compute the list length without building a nested ‘call stack’.
            Tail recursion!
        
      
Examples

Emulating foldl with foldr

From http://www.willamette.edu/~fruehr/haskell/evolution.html, section “leaned
  so far right he came back left again!”:
fac n = foldr (\x g n -> g (x*n)) id [1..n] 1

-- evaluation:
fac 3
= foldr (\x g n -> g (x*n)) id [1..3] 1
= (\n -> (foldr (\x g n -> g (x*n)) id [2..3]) (1*n)) 1 -- continuation gets 1 from outside
= (foldr (\x g n -> g (x*n)) id [2..3]) (1*1)           -- resumes folding as it pleases
= (\n -> (foldr (\x g n -> g (x*n)) id [3]) (2*n)) (1*1)
= (foldr (\x g n -> g (x*n)) id [3]) (2*(1*1))
= (\n -> id (3*n)) (2*(1*1))
= id (3*(2*(1*1)))
= 3*(2*(1*1)) -- making the n param strict avoids this buildup
= 3*(2*1)
= 3*2
= 6
This also demonstrates how foldr can make use of continuations.
From http://hackage.haskell.org/package/base-4.12.0.0/docs/src/Data.Foldable.html

foldr' :: (a -> b -> b) -> b -> t a -> b
foldr' f z0 xs = foldl f' id xs z0
  where f' k x z = k $! f x z

-- evaluation
foldr' (+) 0 [1..3]
= foldl f' id [1..3] 0
    where f' k x z = k $! f x z
= foldl f' (f' id 1) [2..3] 0
= foldl f' (f' (f' id 1) 2) [3] 0
= foldl f' (f' (f' (f' id 1) 2) 3) [] 0
= (f' (f' (f' id 1) 2) 3) 0
= f' (f' (f' id 1) 2) 3 0
= (f' (f' id 1) 2) $! 3 + 0
= f' (f' id 1) 2 3
= (f' id 1) $! 2 + 3
= f' id 1 5
= id $! 1 + 5
= id 6
= 6
Clearly foldr’ for lists can’t avoid building up a call stack. Defining it it
  terms of foldl’ instead of foldl would not help.
This is more interesting:
foldl' :: (b -> a -> b) -> b -> t a -> b
foldl' f z0 xs = foldr f' id xs z0
  where f' x k z = k $! f z x

-- evaluation
foldl' (+) 0 [1..3]
= foldr f' id [1..3] 0
    where f' x k z = k $! f z x
= f' 1 (foldr f' id [2..3]) 0
= foldr f' id [2..3] $! 0 + 1
= foldr f' id [2..3] 1
= f' 2 (foldr f' id [3]) 1
= foldr f' id [3] $! 1 + 2
= foldr f' id [3] 3
= f' 3 (foldr f' id []) 3
= foldr f' id [] $! 3 + 3
= foldr f' id [] 6
= id 6
= 6
This also demonstrates how foldr can make use of continuations.