## Saturday, August 10, 2013

### foldl-1.0.0: Composable, streaming, and efficient left folds

I'm releasing the foldl library, which builds upon this previous post of mine. This library lets you build and combine multiple folds that traverse streams in constant memory.

I will begin using the same example from the previous post: computing an average. The naive version would be written like this:
```average :: [Int] -> Int
average xs = sum xs `div` length xs
```
This does not work well because it:
• loads the entire list into memory, and:
• cannot fuse away the intermediate list.
We can improve on this using foldl where we will compute the average by combining a sum fold with a length fold, using Applicative style:
```import Control.Applicative
import qualified Control.Foldl as L

average :: L.Fold Int Int
average = div <\$> L.sum <*> L.length
```
All folds built this way have two nice properties which foldl guarantees:
• They only traverse the list once
• They will not leak space
We can use fold to apply this combined fold to the list:
```main = print \$ L.fold average [0..100000000]
```
This gives good efficiency at about 1 nanosecond per list element because it fuses away the intermediate list:
```500000000

real 0m0.956s
user 0m0.952s
sys 0m0.000s
```
We can see this if we inspect the core, too, which produces this reasonably tight loop (which I've cleaned up):
```\$wgo =
\ (x :: Int#)
(y :: Int#)      -- y is keeping tally of the sum
(z :: Int#) ->   -- z is keeping tally of the length
case x of x' {
__DEFAULT ->
\$wgo
(x' +# 1 )
(y  +# x')
(z  +# 1 );
1000000000 ->
(# I# (y +# 1000000000),
I# (z +# 1) #)
}
```
This works because fold is implemented in terms of foldr to trigger build/foldr fusion. However, not all lists will fuse this way. I find, for example, that lists of Doubles refuse to fuse in this way:
```import Control.Applicative
import qualified Control.Foldl as L

-- This gives worse performance: ~40 ns per step
average :: L.Fold Double Double
average = (/) <\$> L.sum <*> L.genericLength

main = print \$ L.fold average [0..1000000000]
```
I haven't yet figured out how to trick GHC into fusing away these lists. If anybody knows how to do this, please contribute a patch.

Fortunately, these folds preserve the original accumulator, step function, and extraction function, so you can always unpack the Fold type and then implement the list fusion yourself:
```main = case average of
L.Fold step start done -> do
let go n x =
if (n > 1000000000)
then x
else go (n + 1) \$! step x n
print \$ done (go 0 start)
```
So while reliable list fusion is still an unsolved problem, you can still use foldl to get reasonable performance and guarantee no space leaks. Also, there is always the option of using foldl to do the work of assembling derived strict and streaming folds and then implementing list fusion yourself if you want to squeak out every last drop of performance.

I wrote up this library to dispel the myth that only experts can fold things efficiently in Haskell. foldl lets you begin from simple primitive folds that you can fit in your head and then chain them together into complex folds with little effort.

## Saturday, August 3, 2013

### Composable streaming folds

The Haskell Prelude provides multiple ways to fold lists into a single value. For example, you can count the number of elements in a list:
```import Data.List (genericLength)

genericLength :: (Num i) => [a] -> i
```
... or you can add them up:
```import Prelude hiding (sum)

-- I'm deviating from the Prelude's sum, which leaks space
sum :: (Num a) => [a] -> a
sum = foldl' (+) 0
```
Individually, these two folds run in constant memory when given a lazy list as an argument, never bringing more than one element into memory at a time:
```>>> genericLength [1..100000000]
100000000
>>> sum' [1..100000000]
5000000050000000
```
However, we get an immediate space leak if we try to combine these two folds to compute an average:
```>>> let average xs = sum xs / genericLength xs
>>> average [1..100000000]
<Huge space leak>
```
The original isolated folds streamed in constant memory because Haskell is lazy and does not compute each element of the list until the fold actually requests the element. After the fold traverses each element the garbage collector detects the element will no longer be used and collects it immediately, preventing any build-up of elements.

However, when we combine these two folds naively like we did with average then our program leaks space while we compute sum and before we get a chance to compute genericLength. As sum traverses the list, the garbage collector cannot collect any of the elements because we have to hold on to the entire list for the subsequent genericLength fold.

Unfortunately, the conventional solution to this is not pretty:
```mean :: [Double] -> Double
mean = go 0 0
where
go s l []     = s / fromIntegral l
go s l (x:xs) = s `seq` l `seq`
go (s+x) (l+1) xs
```
Here we've sliced open the guts of each fold and combined their individual step functions into a new step function so we can pass over the list just once. We also had to pay a lot of attention to detail regarding strictness. This is what newcomers to Haskell complain about when they say you need to be an expert at Haskell to produce highly efficient code.

#### The Fold type

Let's fix this by reformulating our original folds to preserve more information so that we can transparently combine multiple folds into a single pass over the list:
```{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')
import Data.Monoid

data Fold a b = forall w.  (Monoid w) => Fold
{ tally     :: a -> w
, summarize :: w -> b
}

fold :: Fold a b -> [a] -> b
fold (Fold t c) xs =
c (foldl' mappend mempty (map t xs))
```
Here I've taken a fold and split it into two parts:
• tally: The step function that we use to accumulate each element of the list
• summarize: The final function we call at the end of the fold to convert our accumulator into the desired result
The w type variable represents the internal accumulator that our Fold will use as it traverses the list. The Fold can use any accumulator of its choice as long as the accumulator is a Monoid of some sort. We specify that in the types by existentially quantifying the accumulator using the ExistentialQuantification extension.

The end user also doesn't care what the internal accumulator is either, because the user only interacts with Folds using the fold function. The type system enforces that fold (or any other function) cannot use any specific details about a Fold's accumulator other than the fact that the accumulator is a Monoid.

We'll test out this type by rewriting out our original folds using the new Fold type:
```genericLength :: (Num i) => Fold a i
genericLength =
Fold (\_ -> Sum 1) (fromIntegral . getSum)

sum :: (Num a) => Fold a a
sum = Fold Sum getSum
```
Notice how the Monoid we choose implicitly encodes how to accumulate the result. genericLength counts the number of elements simply by mapping them all to Sum 1, and then the Monoid instance for Sum just adds up all these ones to get the list length. sum is even simpler: just wrap each element in Sum and the Monoid instance for Sum adds up every element of the list. When we're done, we unwrap the final result using getSum.

We can now apply these folds to any list using the fold function, which handles all the details of accumulating each element of the list and summarizing the result:
```>>> fold genericLength [(1::Int)..100000000]
100000000
>>> fold sum [(1::Int)..100000000]
5000000050000000
```
So far, so good, but how do we combine them into an average?

#### Combining Folds

Fold has the nice property that it is an Applicative, given by the following definition:
```import Control.Applicative
import Data.Strict.Tuple

instance (Monoid a, Monoid b) => Monoid (Pair a b) where
mempty = (mempty :!: mempty)
mappend (aL :!: aR) (bL :!: bR) =
(mappend aL bL :!: mappend aR bR)

instance Functor (Fold a) where
fmap f (Fold t k) = Fold t (f . k)

instance Applicative (Fold a) where
pure a    = Fold (\_ -> ()) (\_ -> a)
(Fold tL cL) <*> (Fold tR cR) =
let t x = (tL x :!: tR x)
c (wL :!: wR) = (cL wL) (cR wR)
in  Fold t c
```
Note that this uses strict Pairs from Data.Strict.Tuple to ensure that the combined Fold still automatically runs in constant space. You only need to remember that (x :!: y) is the strict analog of (x, y).

With this Applicative instance in hand, we can very easily combine our sum and genericLength folds into an average fold:
```average :: (Fractional a) => Fold a a
average = (/) <\$> sum <*> genericLength
```
This combines the two folds transparently into a single fold that traverses the list just once in constant memory, computing the average of all elements within the list:
```>>> fold average [1..1000000]
500000.5
```
Now we're programming at a high-altitude instead of hand-writing our own accumulators and left folds and praying to the strictness gods.

What if we wanted to compute the standard deviation of a list? All we need is one extra primitive fold that computes the sum of squares:
```sumSq :: (Num a) => Fold a a
sumSq = Fold (\x -> Sum (x ^ 2)) getSum
```
Now we can write a derived fold using Applicative style:
```std :: (Floating a) => Fold a a
std =  (\ss s len -> sqrt (ss / len - (s / len)^2))
<\$> sumSq
<*> sum
<*> genericLength
```
... which still traverses the list just once:
```fold std [1..10000000]
2886751.345954732
```
In fact, this is the exact same principle that the BIRCH data clustering algorithm uses for clustering features. You keep a tally of the length, sum, and sum of squares, and you can compute most useful statistics in O(1) time from those three tallies.

Similarly, what if we wanted to compute both the sum and product of a list in a single pass?
```product :: (Num a) => Fold a a
product = Fold Product getProduct
```
Once again, we can just use Applicative style:
```>>> fold ((,) <\$> sum <*> product) [1..100]
(5050,9332621544394415268169923885626670049071596826438162146859
2963895217599993229915608941463976156518286253697920827223758251
185210916864000000000000000000000000)
```

#### Conclusion

Contrary to conventional wisdom, you can program in Haskell at a high level without leaking space. Haskell gives you the tools to abstract away efficient idioms behind a convenient and composable interface, so use them!

#### Appendix

I've included the full code so that people can play with this themselves:
```{-# LANGUAGE ExistentialQuantification #-}

import Control.Applicative
import Data.List (foldl')
import Data.Monoid
import Data.Strict.Tuple
import Prelude hiding (sum, length)

data Fold a b = forall w.  (Monoid w) => Fold
{ tally   :: a -> w
, compute :: w -> b
}

fold :: Fold a b -> [a] -> b
fold (Fold t c) xs =
c (foldl' mappend mempty (map t xs))

instance (Monoid a, Monoid b) => Monoid (Pair a b) where
mempty = (mempty :!: mempty)
mappend (aL :!: aR) (bL :!: bR) =
(mappend aL bL :!: mappend aR bR)

instance Functor (Fold a) where
fmap f (Fold t k) = Fold t (f . k)

instance Applicative (Fold a) where
pure a    = Fold (\_ -> ()) (\_ -> a)
(Fold tL cL) <*> (Fold tR cR) =
let t x = (tL x :!: tR x)
c (wL :!: wR) = (cL wL) (cR wR)
in  Fold t c

genericLength :: (Num b) => Fold a b
genericLength =
Fold (\_ -> Sum (1::Int)) (fromIntegral . getSum)

sum :: (Num a) => Fold a a
sum = Fold Sum getSum

sumSq :: (Num a) => Fold a a
sumSq = Fold (\x -> Sum (x ^ 2)) getSum

average :: (Fractional a) => Fold a a
average = (\s c -> s / c) <\$> sum <*> genericLength

product :: (Num a) => Fold a a
product = Fold Product getProduct

std :: (Floating a) => Fold a a
std =  (\ss s len -> sqrt (ss / len - (s / len)^2))
<\$> sumSq
<*> sum
<*> genericLength
```

## Friday, August 2, 2013

### Sometimes less is more in language design

Haskell programmers commonly say that Haskell code is easy to reason about, but rarely explain why. This stems from one simple guiding principle: Haskell is simple by default.

Wait, what? Are we talking about the same language? The one with monads and zygohistomorphic prepromorphisms? Yes, I mean that Haskell.

For example, what does this type signature tell us:
```x :: Bool
```
This type signature says that x is a boolean value ... and that's it! Type signatures are stronger in Haskell than other languages because they also tells us what values are not:
• x is not a time-varying value
• x is not nullable
• x is not a String being coerced to a truthy value
• x does not have any side effects
In other words, complexity is opt-in when you program in Haskell.

Imperative languages, object-oriented languages, and most other functional languages begin from a more complex baseline than Haskell. They all compete for which language provides the most built-in bells and whistles, because they all begin from the premise that more built-in features is better.

However, the problem is that you can't opt out of these features and you can't assume that any library function you call doesn't use all of them. This means that you must either rely on careful documentation like:
• "This function need not be reentrant. A function that is not required to be reentrant is not required to be thread-safe."
• "Throws: will not throw"
• "The array is changed every time the block is called, not after the iteration is over"
• "If x is not a Python int object, it has to define an __index__() method that returns an integer.
• "Great care must be exercised if mutable objects are map keys"
... or you must carefully inspect the original source code.

Haskell takes a different design tradeoff: you begin from a simpler baseline and explicitly declare what you add. If you want statefulness, you have to declare it in the type:
```-- Read and write an Int state to compute a String
stateful :: State Int String
```
If you want side effects, you have to declare it in the type:
```takeMedicine :: IO ()
```toBeOrNotToBe :: Maybe Be