Saturday, August 10, 2013

foldl-1.0.0: Composable, streaming, and efficient left folds

I'm releasing the foldl library, which builds upon this previous post of mine. This library lets you build and combine multiple folds that traverse streams in constant memory.

I will begin using the same example from the previous post: computing an average. The naive version would be written like this:
average :: [Int] -> Int
average xs = sum xs `div` length xs
This does not work well because it:
  • loads the entire list into memory, and:
  • cannot fuse away the intermediate list.
We can improve on this using foldl where we will compute the average by combining a sum fold with a length fold, using Applicative style:
import Control.Applicative
import qualified Control.Foldl as L

average :: L.Fold Int Int
average = div <$> L.sum <*> L.length
All folds built this way have two nice properties which foldl guarantees:
  • They only traverse the list once
  • They will not leak space
We can use fold to apply this combined fold to the list:
main = print $ L.fold average [0..100000000]
This gives good efficiency at about 1 nanosecond per list element because it fuses away the intermediate list:

real 0m0.956s
user 0m0.952s
sys 0m0.000s
We can see this if we inspect the core, too, which produces this reasonably tight loop (which I've cleaned up):
$wgo =
  \ (x :: Int#)
    (y :: Int#)      -- y is keeping tally of the sum
    (z :: Int#) ->   -- z is keeping tally of the length
    case x of x' {
      __DEFAULT ->
          (x' +# 1 )
          (y  +# x')
          (z  +# 1 );
      1000000000 ->
        (# I# (y +# 1000000000),
           I# (z +# 1) #)
This works because fold is implemented in terms of foldr to trigger build/foldr fusion. However, not all lists will fuse this way. I find, for example, that lists of Doubles refuse to fuse in this way:
import Control.Applicative
import qualified Control.Foldl as L

-- This gives worse performance: ~40 ns per step
average :: L.Fold Double Double
average = (/) <$> L.sum <*> L.genericLength

main = print $ L.fold average [0..1000000000]
I haven't yet figured out how to trick GHC into fusing away these lists. If anybody knows how to do this, please contribute a patch.

Fortunately, these folds preserve the original accumulator, step function, and extraction function, so you can always unpack the Fold type and then implement the list fusion yourself:
main = case average of
    L.Fold step start done -> do
        let go n x =
                if (n > 1000000000)
                then x
                else go (n + 1) $! step x n
        print $ done (go 0 start)
So while reliable list fusion is still an unsolved problem, you can still use foldl to get reasonable performance and guarantee no space leaks. Also, there is always the option of using foldl to do the work of assembling derived strict and streaming folds and then implementing list fusion yourself if you want to squeak out every last drop of performance.

I wrote up this library to dispel the myth that only experts can fold things efficiently in Haskell. foldl lets you begin from simple primitive folds that you can fit in your head and then chain them together into complex folds with little effort.


  1. I'm surprised this isn't done in terms of your pipes library. Couldn't it be massively generalise to that framework?

    1. It's actually *more* general to do it this way. The `Fold` type can be unpackaged and used within any library that needs strict left folds.

      For example, right now `pipes` has this function:

      foldl :: (Monad m) => (x -> a -> x) -> x -> (x -> b) -> Producer a m r -> m b

      You can use `Fold`s to fold `pipes` just by unpackaging the type and passing the fields to that `foldl` command. I was considering making `foldl` do the unpacking automatically but I've currently chosen not to in order to avoid the dependency.

      You can imagine that any library (like `conduit` or `machines`) can reuse `Fold` in the exact same way. It's very reusable.

      The reason that this `Fold` type is more general is that it preserves the minimal information necessary to encode the fold without coupling it to any framework. Also, it wouldn't work anyway if you specialized it to `pipes`. You'd lose the information necessary to combine folds if you did that.

  2. I think your naive version of `average` misses a `xs` as its parameter. ;)