## Wednesday, December 25, 2013

### Equational reasoning

You may have heard that Haskell is "great for equational reasoning", but perhaps you didn't know precisely what that meant. This post will walk through an intermediate example of equational reasoning to show how you can interpret Haskell code by hand by applying substitutions as if your code were a mathematical expression. This process of interpreting code by substituting equals-for-equals is known as equational reasoning.

#### Equality

In Haskell, when you see a single equals sign it means that the left-hand and right-hand side are substitutable for each other both ways. For example, if you see:

``x = 5``

That means that wherever you see an `x` in your program, you can substitute it with a `5`, and, vice versa, anywhere you see a `5` in your program you can substitute it with an `x`. Substitutions like these preserve the behavior of your program.

In fact, you can "run" your program just by repeatedly applying these substitutions until your program is just one giant `main`. To illustrate this, we will begin with the following program, which should print the number `1` three times:

``````import Control.Monad

main = replicateM_ 3 (print 1)``````

`replicateM_` is a function that repeats an action a specified number of times. Its type when specialized to `IO` is:

``replicateM_ :: Int -> IO () -> IO ()``

The first argument is an `Int` specifying how many times to repeat the action and the second argument is the action we wish to repeat. In the above example, we specify that we wish to repeat the `print 1` action three times.

But what if you don't believe me? What if you wanted to prove to yourself that it repeated the action three times? How would you prove that?

#### Use the source!

You can locate the source to Haskell functions using one of three tricks:

• Use Hoogle, which can also search for functions by type signature

• Use Hayoo!, which is like hoogle, but searches a larger package database and is more strict about matches

• Use Google and search for "hackage <function>". This also works well for searching for packages.

Using either of those three tricks we would locate `replicateM_` here and then we can click the Source link to the right of it to view its definition, which I reproduce here:

``replicateM_ n x = sequence_ (replicate n x)``

The equals sign means that any time we see something of the form `replicateM_ n x`, we can substitute it with `sequence_ (replicate n x)`, for any choice of `n` or `x`. For example, if we choose the following values for `n` and `x`:

``````n = 3

x = print 1``````

... then we obtain the following more specific equation:

``replicateM_ 3 (print 1) = sequence_ (replicate 3 (print 1))``

We will use this equation to replace our program's `replicateM_` command with an equal program built from `sequence` and `replicate`:

``main = sequence_ (replicate 3 (print 1))``

The equation tells us that this substitution is safe and preserves the original behavior of our program.

Now, in order to simplify this further we must consult the definition of both `replicate` and `sequence_`. When in doubt which one to pick, always pick the outermost function because Haskell is lazy and evaluates everything from outside to in.

In this case, our outermost function is `sequence_`, defined here:

``````-- | Combine a list of actions into a single action
sequence_ :: [IO ()] -> IO ()  -- I've simplified the type
sequence_ ms = foldr (>>) (return ()) ms``````

We will substitute this into our `main`, noting that `ms` is `replicate 3 (print 1)` for the purpose of this substitution:

``main = foldr (>>) (return ()) (replicate 3 (print 1))``

Now `foldr` is our outermost function, so we'll consult the definition of `foldr`:

``````-- | 'foldr k z' replaces all `(:)`s with `k`
--   and replaces `[]` with `z`
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr k z  []    = z
foldr k z (x:xs) = k x (foldr k z xs)``````

Here we see two equations. Both equations work both ways, but we don't really know which equation to apply unless we know whether or not the third argument to `foldr` is an empty list or not. We must evaluate the call to `replicate` to see whether it will pass us an empty list or a non-empty list, so we consult the definition of `replicate`:

``````-- | Create a list containing `x` repeated `n` times
replicate :: Int -> a -> [a]
replicate n x = take n (repeat x)

-- Apply the substitution, using these values:
-- n = 3
-- x = print 1

main = foldr (>>) (return ()) (take 3 (repeat (print 1)))``````

Boy, this rabbit hole keeps getting deeper and deeper! However, we must persevere. Let's now consult the definition of `take`:

``````-- | Take the first `n` elements from a list
take :: Int -> [a] -> [a]
take n _      | n <= 0 =  []
take _ []              =  []
take n (x:xs)          =  x : take (n-1) xs``````

Here we see three equations. The first one has a predicate, saying that the equality is only valid if `n` is less than or equal to 0. In our case `n` is 3, so we skip that equation. However, we cannot distinguish which of the latter two equations to use unless we know whether `repeat (print 1)` produces an empty list or not, so we must consult the definition of `repeat`:

``````-- | `repeat x` creates an infinite list of `x`s
repeat :: a -> [a]
repeat x = x:repeat x  -- Simplified from the real definition

-- Apply the substitution, using these values:
-- x = print 1

main = foldr (>>) (return ()) (take 3 (print 1:repeat (print 1)))``````

The buck stops here! Although `repeat` is infinitely recursive, we don't have to fully evaluate it. We can just evaluate it once and lazily defer the recursive call, since all we need to know for now is that the list has at least one value. This now provides us with enough information to select the third equation for `take` that requires a non-empty list as its argument:

``````take n (x:xs)          =  x : take (n-1) xs

-- Apply the substitution, using these values:
-- n  = 3
-- x  = print 1
-- xs = repeat (print 1)

main = foldr (>>) (return ()) (print 1:take 2 (repeat (print 1)))``````

Now we know for sure that `foldr`'s third argument is a non-empty list, so we can select the second equation for `foldr`:

``````foldr k z (x:xs) = k x (foldr k z xs)

-- Apply the substitution, using these values:
-- k  = (>>)
-- z  = return ()
-- x  = print 1
-- xs = take 2 (repeat (print 1))

main =
(>>) (print 1)
(foldr (>>) (return ()) (take 2 (repeat (print 1))))

-- Note: "(>>) x y" is the same thing as "x >> y"
main = print 1 >> foldr (>>)
(return ())
(take 2 (repeat (print 1)))

-- Note: "x >> y" is the same thing as "do { x; y }
main = do
print 1
foldr (>>) (return ()) (take 2 (repeat (print 1)))``````

Now our Haskell runtime knows enough information to realize that it needs to `print` a single 1. The language is smart and will execute the first `print` statement before further evaluating the call to `foldr`.

We can repeat this process two more times, cycling through evaluating `repeat`, `take`, and `foldr`, which emit two additional `print` commands:

``````-- Evaluate `repeat`
main = do
print 1
foldr (>>) (return ()) (take 2 (print 1:repeat (print 1)))

-- Evaluate `take`
main = do
print 1
foldr (>>) (return ()) (print 1:take 1 (repeat (print 1)))

-- Evaluate `foldr`
main = do
print 1
print 1
foldr (>>) (return ()) (take 1 (repeat (print 1)))

-- Evaluate `repeat`
main = do
print 1
print 1
foldr (>>) (return ()) (take 1 (print 1:repeat (print 1)))

-- Evaluate `take`
main = do
print 1
print 1
foldr (>>) (return ()) (print 1:take 0 (repeat (print 1)))

-- Evaluate `foldr`
main = do
print 1
print 1
print 1
foldr (>>) (return ()) (take 0 (repeat (print 1)))``````

Now something different will happen. Let's revisit our definition of `take`:

``````take n _      | n <= 0 =  []
take _ []              =  []
take n (x:xs)          =  x : take (n-1) xs``````

This time the first equation matches, because `n` is now equal to `0`. However, we also know the third equation will match because `repeat` will emit a new element. Whenever more than one equation matches Haskell takes the first one by default, so we use the first equation to substitute the call to `take` with an empty list:

``````main = do
print 1
print 1
print 1
foldr (>>) (return ()) []``````

This then triggers the first equation for `foldr`:

``````foldr k z [] = z

-- Therefore:
main = do
print 1
print 1
print 1
return ()``````

There it is! We've fully expanded out our use of `replicateM_` to prove that it `print`s the number 1 three times. For reasons I won't go into, we can also remove the final `return ()` and finish with:

``````main = do
print 1
print 1
print 1``````

#### Equational reasoning - Part 1

Notice how Haskell has a very straightforward way to interpret all code: substitution. If you can substitute equals-for-equals then you can interpret a Haskell program on pen and paper. Substitution is the engine of application state.

Equally cool: we never needed to understand the language runtime to know what our code did. The Haskell language has a very limited role: ensure that substitution is valid and otherwise get out of our way.

Unlike imperative languages, there are no extra language statements such as `for` or `break` that we need to understand in order to interpret our code. Our printing "loop" that repeated three times was just a bunch of ordinary Haskell code. This is a common theme in Haskell: "language features" are usually ordinary libraries.

Also, had we tried the same pen-and-paper approach to interpreting an imperative language we would have had to keep track of temporary values somewhere in the margins while evaluating our program. In Haskell, all the "state" of our program resides within the expression we are evaluating, and in our case the "state" was the integer argument to `take` that we threaded through our subsitutions. Haskell code requires less context to understand than imperative code because Haskell expressions are self-contained.

We didn't need to be told to keep track of state. We just kept mindlessly applying substitution and perhaps realized after the fact that what we were doing was equivalent to a state machine. Indeed, state is implemented within the language just like everything else.

#### Proof reduction

Proving the behavior of our code was really tedious, and we're really interested in proving more generally reusable properties rather than deducing the behavior of specific, disposable programs. However, we spent a lot of effort to prove our last equation, so we want to pick our battles wisely and only spend time proving equations that we can reuse heavily.

So I will propose that we should only really bother to prove the following four equations in order to cover most common uses of `replicateM_`:

``````-- replicateM_ "distributes" over addition
replicateM_  0      x = return ()
replicateM_ (m + n) x = replicateM_ m x >> replicateM_ n x

-- replicateM_ "distributes" over multiplication
replicateM_  1      = id
replicateM_ (m * n) = replicateM_ m . replicateM_ n``````

The last two equations are written in a "point-free style" to emphasize that `replicateM_` distributes in a different way over multiplication. If you expand those two equations out to a "point-ful style" you get:

``````replicateM_ (m * n) x = replicateM_ m (replicateM_ n x)
replicateM   1      x = x``````

If we could prove these equations, then we could much more easily deduce how `replicateM_` behaves for our example program, because we could transform our code more rapidly like this:

``````replicateM_ 3 (print 1)

-- 3 = 1 + 1 + 1
= replicateM_ (1 + 1 + 1) (print 1)

= do replicateM_ 1 (print 1)
replicateM_ 1 (print 1)
replicateM_ 1 (print 1)

-- replicateM_ 1 x = x
= do print 1
print 1
print 1``````

These four master equations are still very tedious to prove all in one go, but we can break this complex task into smaller bite-sized tasks. As a bonus, this divide-and-conquer approach will also produce several other useful and highly reusable equations along the way.

Let's begin by revisiting the definition of `replicateM_`:

``replicateM_ n x = sequence_ (replicate n x)``

Like many Haskell functions, `replicateM_` is built from two smaller composable pieces: `sequence_` and `replicate`. So perhaps we can also build our proofs from smaller and composable proofs about the individual behaviors of `sequence_` and `replicate`.

Indeed, `replicate` possesses a set of properties that are remarkably similar to `replicateM_`:

``````-- replicate distributes over addition
replicate  0      x = []
replicate (m + n) x = replicate m x ++ replicate n x

-- replicate distributes over multiplication
replicate  1      = return
replicate (m * n) = replicate m <=< replicate n``````

Again, the last two equations can be expanded to a "point-ful" form:

``````-- `return x` is `[x]` for lists
replicate 1 x = [x]

-- `(<=<)` is like a point-free `concatMap` for lists
replicate (m * n) x = concatMap (replicate m) (replicate n x)``````

These four `replicate` equations are easier to prove than the corresponding `replicateM_` equations. If we can somehow prove these simpler equations, then all we have to do is then prove that `sequence_` lifts all of the `replicate` proofs into the equivalent `replicateM_` proofs:

``````-- sequence_ lifts list concatenation to sequencing
sequence_  []        = return ()
sequence_ (xs ++ ys) = sequence_ xs >> sequence_ ys

-- (sequence_ .) lifts list functions to ordinary functions
sequence_ .  return   = id
sequence_ . (f <=< g) = (sequence_ . f) . (sequence_ . g)

-- Point-ful version of last two equations:
sequence_ [x] = x
sequence_ (concatMap f (g x)) = sequence_ (f (sequence_ (g x)))``````

It might not be obvious at first, but the above four equations for `sequence_` suffice to transform the `replicate` proofs into the analogous `replicateM_` proofs. For example, this is how we would prove the first `replicateM_` equation in terms of the first `replicate` equation and the first `sequence` equation:

``````replicateM_ 0 x

-- Definition of `replicateM_`
= sequence_ (replicate 0 x)

-- replicate 0 x = []
= sequence_ []

-- sequence_ [] = return ()
= return ()``````

That was simple. It is only slightly more tricky to prove the second equation:

``````replicateM_ (m + n) x

-- Definition of `replicateM_`
= sequence_ (replicate (m + n) x)

-- replicate (m + n) x = replicate m x ++ replicate n x
= sequence_ (replicate m x ++ replicate n x)

-- sequence (xs ++ ys) = sequence xs >> sequence ys
= sequence_ (replicate m x) >> sequence_ (replicate n x)

-- Definition of `replicateM_`, in reverse!
= replicateM_ m x >> replicateM_ n x``````

Notice how the last step of the proof involves using the original `replicateM_` equation, but instead substituting from right-to-left!

``````--  +-- We can substitute this way --+
--  |                                |
--  ^                                v

replicateM_ n x = sequence_ (replicate n x)

--  ^                                v
--  |                                |
--  +--------- Or this way! ---------+``````

This is a nice example of the utility of bidirectional substitution. We can replace the body of a function with the equivalent function call.

The third `replicateM_` equation is also simple to prove in terms of the third `replicate` and `sequence_` equations. I will use the point-ful forms of all these equations for simplicity:

``````replicateM_ 1 x

-- Definition of `replicateM_`
= sequence_ (replicate 1 x)

-- replicate 1 x = [x]
= sequence_ [x]

-- sequence_ [x] = x
= x``````

The fourth property is surprisingly short, too:

``````replicateM_ (m * n) x

-- Definition of `replicateM_`
= sequence_ (replicate (m * n) x)

-- replicate (m * n) x = concatMap (replicate m) (replicate n x)
= sequence_ (concatMap (replicate m) (replicate n x))

-- sequence_ (concatMap f (g x)) = sequence_ (f (sequence_ (g x)))
= sequence_ (replicate m (sequence_ (replicate n x)))

-- Definition of `replicateM_`, in reverse
= replicateM_ m (replicateM_ n x)``````

#### Equational reasoning - Part 2

We reduced our proofs of the `replicateM_` properties to smaller proofs for `replicate` and `sequence` properties. The overhead of this proof reduction is tiny and we can gain the benefit of reusing proofs for `replicate` and `sequence`.

As programmers we try to reuse code when we program and the way we promote code reuse is to divide programs into smaller composable pieces that are more reusable. Likewise, we try to reuse proofs when we equationally reason about code and the way we encourage proof reuse is to divide larger proofs into smaller proofs using proof reduction. In the above example we reduced the four equations for `replicateM_` into four equations for `replicate` and four equations for `sequence_`. These smaller equations are equally useful in their own right and they can be reused by other people as sub-proofs for their own proofs.

However, proof reuse also faces the same challenges as code reuse. When we break up code into smaller pieces sometimes we take things too far and create components we like to think are reusable but really aren't. Similarly, when we reduce proofs sometimes we pick sub-proofs that are worthless and only add more overhead to the entire proof process. How can we sift out the gold from the garbage?

I find that the most reusable proofs are category laws or functor laws of some sort. In fact, every single proof from the previous section was a functor law in disguise. To learn more about functor laws and how they arise everywhere you can read another post of mine about the functor design pattern.

#### Proof techniques

This section will walk through the complete proofs for the `replicate` equations to provide several worked examples and to also illustrate several useful proof tricks.

I deliberately write these proofs to be reasonably detailed and to skip as few steps as possible. In practice, though, proofs become much easier the more you equationally reason about code because you get better at taking larger steps.

Let's revisit the equations we wish to prove for `replicate`, in point-ful form:

``````replicate  0      x = []
replicate (m + n) x = replicate m x ++ replicate n x

replicate  1      x = [x]
replicate (m * n) x = concatMap (replicate m) (replicate n x)``````

... where `concatMap` is defined as:

``````concatMap :: (a -> [b]) -> [a] -> [b]
concatMap f = foldr ((++) . f) []``````

Now we must use the same equational reasoning skills we developed in the first section to prove all four of these equations.

The first equation is simple:

``````replicate 0 x

-- Definition of `replicate`
= take 0 (repeat x)

-- Definition of `take`
= []

-- Proof complete``````

The second equation is trickier:

``````replicate (m + n) x

-- Definition of `take`
= take (m + n) (repeat x)``````

We can't proceed further unless we know whether or not `m + n` is greater than `0`. For simplicity we'll assume that `m` and `n` are non-negative.

We then do something analogous to "case analysis" on `m`, pretending it is like a Peano number. That means that `m` is either `0` or positive (i.e. greater than 0). We'll first prove our equation for the case where `m` equals `0`:

``````-- Assume: m = 0
= take (0 + n) (repeat x)

-- 0 + n = n
= take n (repeat x)

-- Definition of `(++)`, in reverse
= [] ++ take n (repeat x)

-- Definition of `take`, in reverse
= take 0 (repeat x) ++ take n (repeat x)

-- m = 0
= take m (repeat x) ++ take n (repeat x)

-- Definition of `replicate`, in reverse
= replicate m x ++ replicate n x

-- Proof complete for m = 0``````

Then there is the second case, where `m` is positive, meaning that we can represent `m` as `1` plus some other non-negative number `m'`:

``````-- Assume: m = 1 + m'
= take (1 + m' + n) (repeat x)

-- Definition of `repeat`
= take (1 + m' + n) (x:repeat x)

-- Definition of `take`
= x:take (m' + n) (repeat x)

-- Definition of `replicate` in reverse
= x:replicate (m' + n) x``````

Now we can use induction to reuse the original premise since `m'` is strictly smaller than `m`. Since we are assuming that `m` is non-negative this logical recursion is well-founded and guaranteed to eventually bottom out at the base case where `m` equals `0`:

``````-- Induction: reuse the premise
= x:(replicate m' x ++ replicate n x)

-- Definition of `(++)`, in reverse
= (x:replicate m' x) ++ replicate n x

-- Definition of `replicate`
= (x:take m' (repeat x)) ++ replicate n x

-- Definition of `take`, in reverse
= take (1 + m') (repeat x) ++ replicate n x

-- Definition of `replicate`, in reverse
= replicate (1 + m') x ++ replicate n x

-- m = 1 + m', in reverse
= replicate m x ++ replicate n x

-- Proof complete for m = 1 + m'``````

This completes the proof for both cases so the proof is "total", meaning that we covered all possibilities. Actually, that's a lie because really rigorous Haskell proofs must account for the possibility of non-termination (a.k.a. "bottom"). However, I usually consider proofs that don't account for non-termination to be good enough for most practical purposes.

The third `replicate` law is very straightforward to prove:

``````replicate 1 x

-- Definition of `replicate`
= take 1 (repeat x)

-- Definition of `repeat`
= take 1 (x:repeat x)

-- Definition of `take`
= x:take 0 (repeat x)

-- Definition of `take`
= x:[]

-- [x] is syntactic sugar for `x:[]`
= [x]``````

The fourth equation for `replicate` also requires us to split our proof into two branches. Either `n` is zero or greater than zero. First we consider the case where `n` is zero:

``````replicate (m * n) x

-- Assume: n = 0
= replicate 0 x

-- replicate 0 x = []
= []

-- Definition of `foldr`, in reverse
= foldr ((++) . replicate m) [] []

-- Definition of `concatMap`, in reverse
= concatMap (replicate m) []

-- replicate 0 x = [], in reverse
= concatMap (replicate m) (replicate 0 x)

-- n = 0, in reverse
= concatMap (replicate m) (replicate n x)

-- Proof complete for n = 0``````

Then we consider the case where `n` is greater than zero:

``````replicate (m * n) x

-- Assume: n = 1 + n'
= replicate (m * (1 + n')) x

-- m * (1 + n') = m + m * n'
= replicate (m + m * n') x

= replicate m x ++ replicate (m * n') x

-- Induction: reuse the premise
= replicate m x ++ concatMap (replicate m) (replicate n' x)

-- Definition of `concatMap`
= replicate m x ++ foldr ((++) . replicate m) [] (replicate n' x)

-- Definition of `foldr`, in reverse
= foldr ((++) . replicate m)) [] (x:replicate n' x)

-- Definition of `concatMap`, in reverse
= concatMap (replicate m) (x:replicate n' x)

-- Definition of `replicate`
= concatMap (replicate m) (x:take n' (repeat x))

-- Definition of `take`, in reverse
= concatMap (replicate m) (take (1 + n') (x:repeat x))

-- n = 1 + n', in reverse
= concatMap (replicate m) (take n (x:repeat x))

-- Definition of `repeat`, in reverse
= concatMap (replicate m) (take n (repeat x))

-- Definition of `replicate`, in reverse
= concatMap (replicate m) (replicate n x)

-- Proof complete for n = 1 + n'``````

Hopefully these proofs give an idea for the amount of effort involved to prove properties of moderate complexity. I omitted the final part of proving the `sequence_` equations in the interest of space, but they make for a great exercise.

#### Equational Reasoning - Part 3

• building a formal model of a program's algorithm,
• reasoning about the behavior of the formal model using its axioms, and
• proving that the program matches the model.

In a purely functional language like Haskell you formally reason about your code within the language itself. There is no need for a separate formal model because the code is already written as a bunch of formal equations. This is what people mean when they say that Haskell has strong ties to mathematics, because you can reason about Haskell code the same way you reason about mathematical equations.

This is why Haskell syntax for function definitions deviates from mainstream languages. All function definitions are just equalities, which is why Haskell is great for equational reasoning.

This post illustrated how equational reasoning in Haskell can scale to larger complexity through the use of proof reduction. A future post of mine will walk through a second tool for reducing proof complexity: type classes inspired by mathematics that come with associated type class laws.

## Thursday, December 19, 2013

### Lift error handling with lens-like syntax

One of the main deficiencies of `transformers` is that the `MonadTrans` class does not let you lift functions like `catchError`. The `mtl` library provides one solution to this problem, which is to type-class `catchError` and `throwError` using the `MonadError` type class.

That's not to say that `transformers` has no solution for lifting `catchError` and `throwError`; it's just really verbose. Each module provides a `liftCatch` function that you use to lift a `catchError` function from the base monad to the transformed monad.

For example, `Control.Monad.Trans.State` provides the following `liftCatch` function:

``````Control.Monad.Trans.State.liftCatch
:: (m (a, s) -> (e -> m (a, s)) -> m (a, s))
-> StateT s m a -> (e -> StateT s m a) -> StateT s m a``````

``myStack :: (Monad m, Error e) => StateT s (ErrorT e m)``

... then you would apply `liftCatch` directly to `catchError` to get a catch function that worked at the `StateT` layer:

``````Control.Monad.Trans.State.liftCatch catchError
=> StateT s (ErrorT e m) a
-> (e -> StateT s (ErrorT e m) a)
-> StateT s (ErrorT e m) a``````

But what if you had a `WriterT` layer in between like this:

``myStack :: (Monad m) => StateT s (WriterT w (ErrorT e m))``

You'd have to use `liftCatch` from the `Control.Monad.Trans.Writer` module to further lift the `catchError` to work with this stack:

``````import Control.Monad.Trans.State as S

S.liftCatch (W.liftCatch catchError)
=> StateT s (WriterT w (ErrorT e m)) a
-> (e -> StateT s (WriterT w (ErrorT e m)) a)
-> StateT s (WriterT w (ErrorT e m)) a``````

The advantage of this solution is that type inference works extraordinarily well and it is Haskell98. The disadvantage of this solution is that it is very verbose.

So I will propose a less verbose solution that has a syntax resembling the `lens` library, although there will be no true lenses involved (I think). Define the following `catching` function:

``````catching
=> ((ErrorT e m a -> (e -> ErrorT e m a) -> ErrorT e m a) -> r)
-> r
catching k = k catchError``````

The `k` in the above function will be a series of composed `liftCatch` functions that we will apply to `catchError`. However, in the spirit of the `lens` library we will rename these `liftCatch` functions to be less verbose and more hip and sexy:

``````import qualified Control.Monad.Trans.Maybe  as M
import qualified Pipes.Lift                 as P

state  = S.liftCatch

writer = W.liftCatch

maybe  = M.liftCatch

pipe   = P.liftCatchError``````

Now let's say that we have a monad transformer stack of type:

``````myStack
=> StateT s (MaybeT (ErrorT e m)) r``````

If I want to catch something at the outer `StateT` level, I would just write:

``````myStack = catching (state.maybe) action \$ \e -> do
-- The handler if `action` fails
...``````

If I add more layers to my monad transformer stack, all I have to do is point deeper to the stack by composing more references:

``````myStack
:: (Monad m, Error e, Monoid w)
=> Pipe a b (StateT s (WriterT w (MaybeT m))) r
myStack = catching (pipe.state.writer.maybe) action \$ \e -> ...``````

Also, these references are first-class Haskell values, so you can bundle them and manipulate them with your favorite functional tools:

``````myStack = do
let total = pipe.state.writer.maybe
catching total action1 \$ \e -> ...
catching total action2 \$ \e -> ...``````

This approach has a few advantages over the traditional `MonadError` approach:

• You get improved type inference
• You get type errors earlier in development. With `MonadError` the compiler will not detect an error until you try to run your monad transformer stack.
• You get better type errors. `MonadError` errors will arise at a distance where you call `runErrorT` even though the logical error is probably at the site of the `catchError` function.
• Functional references are first class and type classes are not

However, we don't want to lift just `catchError`. There are many other functions that `transformers` can lift such as `local`, `listen`, and `callCC`. An interesting question would be whether this is some elegant abstraction that packages all these lifting operations into a simple type in the same way that lenses package getters, setters, traversals, maps, and zooms into a single abstraction. If there were, then we could reuse the same references for catching, listening, and other operations that are otherwise difficult to lift:

``````listening (state.error) m

passing (maybe.writer) m

catching (writer.maybe) m \$ \e -> ...``````

I have no idea if such an elegant abstraction exists, though, which is why I'm writing this post to solicit ideas.

## Friday, November 1, 2013

### Test stream programming using Haskell's `QuickCheck`

`pipes` is a stream programming library built on top of a foundation of basic category theory. The core of the library consists of a set of five categories that all intersect in a single streaming data type and the library's contract is the set of associated category laws.

For example, one such category is the "respond category", which `pipes` uses to implement `for` loops and `ListT`. The two key operations are `(/>/)`, which is the category composition operator, and `respond`, which is the identity. These must satisfy the following category laws:

``````respond />/ f = f                  -- Left Identity

f />/ respond = f                  -- Right Identity

(f />/ g) />/ h = f />/ (g />/ h)  -- Associativity``````

Previously, I described how I manually proved the category laws using equational reasoning, which elicited a strong response from readers that I should also verify the laws empirically using Haskell's `QuickCheck` library. After all, Haskell's `QuickCheck` library shines at automated property-based testing. Why not just fire up `QuickCheck` and test something like:

``> quickCheck \$ \f -> f />/ respond == f``

However, this leads to several problems:

• You can't compare pipes for `Eq`uality
• `QuickCheck` can't `Show` pipes when it discovers a counter-example
• You can't generate `Arbitrary` pipes

The latter is the most challenging problem of the three to solve: how do we generate random pipes to test? This has to be done in such a way that it exercises the system and efficiently discovers corner cases.

#### Randomizing pipes

I decided to try encoding a random pipe as random sequence of operations (specifically Kleisli arrows). This actually works out quite well, because we already have two natural operations built into the core bidirectional API: `request` and `respond`. Both of them allow information to pass through twice, as illustrated by the following ASCII diagrams:

``````request :: Monad m => a' -> Proxy a' a y' y m a

a'
|
+----|----+
|    |    |
a' <=====/    |
|         |
a  ======\    |
|    |    |
+----|----+
v
a

respond :: Monad m => a  -> Proxy x' x a' a m a'

a
|
+----|----+
|    |    |
|    \ /==== a'
|     X   |
|    / \===> a
|    |    |
+----|----+
v
a'``````

If we generate a random sequence of them them we just connect them head to toe. For simplicity, I will just assume that all inputs and outputs are of type `Int`:

``````pRandom1 = respond >=> request >=> respond

Int
|
+----|----+
|    |    |
|    \ /==== Int
|     X   |
|    / \===> Int
|    |    |
+----|----+
v
Int
|
+----|----+
|    |    |
Int <=====/    |
|         |
Int ======\    |
|    |    |
+----|----+
v
Int
|
+----|----+
|    |    |
|    \ /==== Int
|     X   |
|    / \===> Int
|    |    |
+----|----+
v
Int``````

When we compose proxies using something like `(>+>)` (the bidirectional generalization of `(>->)`), we conceptually place these random chains side by side and match inputs with outputs. For example, if we generate the following two random sequences of `request` and `respond`:

``````pRandom1 = respond >=> request >=> respond
pRandom2 = request >=> request``````

... then this generates the following flow of information, beginning from the top-right `Int`.

``````pRandom1 >+> pRandom2:

Int
|
+----|----+
|    |    |
Int<======================/    |
|                   |         |
+----|----+            /======\    |
|    |    |           /  |    |    |
|    \ /==== Int <=\ /   +----|----+
|     X   |         X         |
|    / \===> Int ==/ \        v
|    |    |          |       Int
+----|----+          |        |
v               \   +----|----+
Int               \  |    |    |
|                 \======/    |
+----|----+              |         |
|    |    |            /======\    |
... <=====/    |           /  |    |    |
|         |          /   +----|----+
... ======\    |          |        v
|    |    |          |       Int
+----|----+          |
v               |
Int              |
|               |
+----|----+          |
|    |    |          |
|    \ /==== Int     /
|     X   |         /
|    / \===> Int ==/
|    |    |
+----|----+
v
Int``````

This shows how you can easily generate complex flows of information to exercise the API with just a few simple permutations of `request` and `respond`.

However, notice how the `Int` flows through the pipeline without interruption until the pipeline terminates. This means that unless we do something else we will always return the same `Int` we began with.

So we add two additional actions to our random repertoire which will provide diagnostic information about which paths we took. The first one is `inc`, which increments the `Int` flowing through it:

``````inc :: (Monad m) => Int -> m Int
inc n = return (n + 1)``````

The second is `log`, which stores the current value of the `Int` using a `Writer []` base monad and reforwards the `Int` further along:

``````log :: Int -> Proxy a' a b' b (Writer [Int]) Int
log n = do
lift (tell [n])
return n``````

Now we can easily generate random `Proxy`s just by assembling random chains of these four actions:

``pRandom = log >=> inc >=> request >=> inc >=> inc >=> ...``

We can use this same trick to generate arbitrary `Server`s and `Client`s (i.e. one-sided `Proxy`s) by just selectively removing either `request` or `respond` from the list of permissible actions.

Once we have random `Proxy`s, `Server`s, and `Client`s, we can test for equality by testing the values that each side returns and logs when given an initial value of `0`:

``````p1 === p2 = run p1 == run p2
where
run p = runWriter (runEffect (p 0))

infix 4 ===``````

This comparison is pure because the `Writer []` base monad is pure, so we can pass it as suitable property that `QuickCheck` can test. Well, almost...

We also need to be able to `Show` the randomized values that we selected so that `QuickCheck` can print any counter-examples it discovers. The solution, though, is pretty simple. We can use an intermediate representation that is just an enumeration. This just stores placeholders for each action in our chain, and these placeholders are `Show`able:

``````-- Place holder for a single `Proxy` action
data ProxyStep
= ProxyRequest
| ProxyRespond
| ProxyLog
| ProxyInc
deriving (Enum, Bounded)

instance Arbitrary ProxyStep where
arbitrary = arbitraryBoundedEnum
shrink _  = []

instance Show ProxyStep where
show x = case x of
ProxyRequest -> "request"
ProxyRespond -> "respond"
ProxyLog     -> "log"
ProxyInc     -> "inc"

-- Place holder for a single `Client` action
data ClientStep
= ClientRequest
| ClientLog
| ClientInc
deriving (Enum, Bounded)

instance Arbitrary ClientStep where
arbitrary = arbitraryBoundedEnum
shrink _  = []

instance Show ClientStep where
show x = case x of
ClientRequest -> "request"
ClientLog     -> "log"
ClientInc     -> "inc"

-- Place holder for a single `Server` action
data ServerStep
= ServerRespond
| ServerLog
| ServerInc
deriving (Enum, Bounded)

instance Arbitrary ServerStep where
arbitrary = arbitraryBoundedEnum
shrink _  = []

instance Show ServerStep where
show x = case x of
ServerRespond -> "respond"
ServerLog     -> "log"
ServerInc     -> "inc"``````

Then we tell `QuickCheck` to generate random lists of these actions which will form our complete `Proxy`s, `Client`s, or `Server`s:

``````correct :: String -> String
correct str = case str of
[] -> "return"
_  -> str

-- Place holder for a `Proxy`
newtype AProxy = AProxy { unAProxy :: [ProxyStep] }

instance Arbitrary AProxy where
arbitrary = fmap AProxy arbitrary
shrink = map AProxy . shrink . unAProxy

instance Show AProxy where
show = correct . intercalate " >=> " . map show . unAProxy

-- Place holder for a `Client`
newtype AClient = AClient { unAClient :: [ClientStep] }

instance Arbitrary AClient where
arbitrary = fmap AClient arbitrary
shrink = map AClient . shrink . unAClient

instance Show AClient where
show = correct . intercalate " >=> " . map show . unAClient

-- Place holder for a `Server`
newtype AServer = AServer { unAServer :: [ServerStep] }

instance Arbitrary AServer where
arbitrary = fmap AServer arbitrary
shrink = map AServer . shrink . unAServer

instance Show AServer where
show = correct . intercalate " >=> " . map show . unAServer``````

Once `QuickCheck` generates these randomized lists of actions we can convert them to honest-to-god `Proxy`s/`Server`s/`Client`s for testing purposes using the following conversion functions:

``````aProxy
:: AProxy -> Int -> Proxy Int Int Int Int (Writer [Int]) Int
aProxy = foldr (>=>) return . map f . unAProxy
where
f x = case x of
ProxyRequest -> request
ProxyRespond -> respond
ProxyLog     -> log
ProxyInc     -> inc

aClient :: AClient -> Int -> Client Int Int (Writer [Int]) Int
aClient = foldr (>=>) return . map f . unAClient
where
f x = case x of
ClientRequest -> request
ClientLog     -> log
ClientInc     -> inc

aServer :: AServer -> Int -> Server Int Int (Writer [Int]) Int
aServer = foldr (>=>) return . map f . unAServer
where
f x = case x of
ServerRespond -> respond
ServerLog     -> log
ServerInc     -> inc``````

#### Test-driving `pipes`

Combined with the previous `(===)` function for testing equality, I can now test high-level properties like associativity:

``````main = quickCheck \$ \p1' p2' p3' ->
let p1 = aServer p1'
p2 = aProxy  p2'
p3 = aClient p3'
in  p1 >+> (p2 >+> p3) === (p1 >+> p2) >+> p3``````

`QuickCheck` then generates 100 random test cases and verifies that all of them obey the associativity law:

``````>>> main
++ OK, passed 100 tests.``````

However, this is still not enough. Perhaps my randomization scheme is simply not exercising corner cases sufficiently well. To really convince myself that I have a good randomization scheme I must try a few negative controls to see how effectively `QuickCheck` uncovers property violations.

For example, let's suppose that some enterprising young NSA employee were to try to commit a modification to the identity pipe `pull'` to try to `log` the second value flowing upstream. We could set up a test case to warn us if the modified `pull'` function failed to obey the identity law:

``````main = quickCheck \$ \p1' p2' ->
let p1    = aServer p1'

-- The maliciously modified `pull` function
pull' = request >=> respond >=> log >=> pull'

p2    = aClient p2'
in  p1 >+> (pull' >+> p2) === p1 >+> p2``````

`QuickCheck` would then snoop this out in a hurry and catch the law violation.

``````*** Failed! Falsifiable (after 12 tests and 6 shrinks):
respond
request >=> request``````

Not only does `QuickCheck` detect violations, but it also goes out of its way to minimize the violation to the minimum reproducing test case. In this case, the way you read the `QuickCheck` output is that the minimal code necessary to trigger the violation is when:

``````p1 = respond
p2 = request >=> request``````

In other words, `QuickCheck` detects a spurious `log` on the left-hand side of the equation if `p2` `request`s two values and `p1` `respond`s with at least one value. Notice that if `p1` did not `respond` with at least one value then the left pipeline would terminate before `p2`'s second request and avoid triggering the `log` statement.

`QuickCheck` can do this kind of minimization because of purity. Since our test case is pure, `QuickCheck` can safely run it repeatedly as it tries to `shrink` the counter-example, without having to worry that repeated runs will interfere with each other because of side effects or statefulness.

Here's another example, where we accidentally wrote `pull` wrong and inserted one extra `request` too many:

``````main = quickCheck \$ \p1' p3' ->
let p1    = aServer p1'
pull' = request >=> respond >=> request >=> pull
p3    = aClient p3'
in  p1 >+> (pull' >+> p3) === p1 >+> p3``````

`QuickCheck` immediately pulls out this clever counter-example:

``````*** Failed! Falsifiable (after 15 tests and 8 shrinks):
respond >=> respond
request >=> request >=> inc``````

This instructs `p1` and `p2` to exchange information twice. This triggers `pull'` to accidentally `request` one value too many after the second exchange and terminate early before `p2` can call `inc`.

Examples like these give me confidence that permutations on these four actions suffice to build most useful counter-examples. I could probably even narrow it down to three commands by eliminating `log`, but for now I will keep it.

#### Conclusion

I believe this is the first example of a useful `Arbitrary` instance for randomizing a stream programming data type. This allows `pipes` to test more powerful properties than most stream programming libraries, which would normally settle for randomizing input to the system instead of randomizing the flow of control.

I want to thank Csernik Flaviu Andrei who took the time to write up all the `pipes` laws into a `QuickCheck` suite integrated with `cabal test`. Thanks to him you can now have even greater confidence in the correctness of the `pipes` library.

The next step would be to machine-check the `pipes` laws using Agda, which would provide the greatest assurance of correctness.

## Saturday, October 12, 2013

This post discusses a Haskell-based project that is the central component of my thesis: a fast, atomic-level structural search engine named "Suns". I will discuss what problem this search engine solves, why I chose Haskell for this project, and what were the pros and cons of my language choice.

#### The Challenge

Proteins are amazing molecules that control biology on the molecular level. If you are new to cellular biology, you can get an immediate sense for how diverse and sophisticated they are by watching The Inner Life of the Cell, an artistic rendition of several diverse processes controlled by proteins, including:

• cellular adhesion (how cells recognize and stick to each other)
• forming a skeleton for the cell
• assembling and disassembling cellular highways
• trafficking cargo along those highways

• sense the molecular environment
• perform chemistry
• are cellular weapons
• and much, much more!

I am a graduate student in a research lab that designs new proteins for both medical treatments and scientific study. Unfortunately, proteins are significantly harder to design than software. A typical protein can be very chemically complex and tightly integrated:

Proteins are tangled ropes of amino acids, so they are the epitome of spaghetti code. However, it's not as bad as it sounds. Within these tangled balls we see several patterns emerge, both on small, medium, and large scales.

On a larger scale you have "protein domains". I liken these to high-level scripts: self-contained blobs of functionality with little coupling or state. They are often easy to connect together to generate new functionality:

On the medium scale you have "secondary structure", consisting primarily of alpha helices and beta strands, which form locally repetitive structures. I liken these to C code: low-level, but still reasonably portable. Our lab has historically had success transplanting patterns in secondary structure to new contexts, generating new functionality.

On the small scale you have "motifs", small configurations of a few amino acids that frequently occur in natural proteins. I liken these to assembly code: they are very tightly integrated and pack a lot of utility into a small space, but they are not very portable because they have very precise chemical and geometric requirements and little room for customization:

Our lab needs to design proteins on this small and compact scale, so they use my search engine to discover and incorporate these small motifs into their designed proteins. The search engine overcomes the "portability issues" of these low-level motifs by finding existing motifs that are exactly geometrically and chemically compatible with partial blueprints.

#### Overview

The most popular interface to the search engine is through `PyMOL`, a molecular graphics system that lets you visualize protein structures three-dimensionally. You can find a `PyMOL` search plugin on the official Suns web site, which also includes an extended tutorial for how to use the search plugin.

The search plugin lets you point and click on multiple protein fragments and then click "Search". Typically within less than a second it will stream up to 100 matches to the fragment of interest, superimposed on your original search query. You can then expand on your original query by incorporating new fragments from the search results: An exmple iterative design process that begins from a guanidinium group and which grows to an N-terminal helix-capping motif. Black dashes highlight fragments incorporated from search results.

The `PyMOL` plugin is a thin Python-based client that connects to a remote search backend written in Haskell.

I initially did not begin this project in Haskell. I actually began it in C. The reason why was that at the time I did not have a solid background in data structures and algorithms, so I would reflexively turn to C for high-performance projects to try to make up for it by improving the constant factors of my code. However, all-atom searching is the kind of problem where the algorithm matters and the bare-metal performance does not. My initial implentation in C attempted to brute force the solution to no avail, taking hours to perform a single search.

The second problem that I encountered was that when I got the first slow draft working I immediately got additional feature requests and I was not prepared to handle them. C is a very brittle language to program in because of manual memory management and no standard library of data structures. However, like many inexperienced programmers I had trouble saying "no" and bit off more than I could chew trying (and failing) to satisfy these feature requests.

At that time I had been dabbling in Haskell in my free time and took interest in the language. Things that intrigued me about the language included:

• expressive and strong types
• an elegant mix of imperative and functional style (using monads)
• high-level idioms and design patterns

Plus, Haskell is garbage collected and has a rich library of data structures, which solved my two main qualms with C, so I decided to rewrite my code base in Haskell.

This rewrite was not a simple translation of the original C code. One of the main reasons that I transitioned was that I wanted to familiarize myself with improved data structures and algorithms. Haskell made it very easy to prototype new code for several reasons:

• You have a rich set of high-performance purely functional data structures in the `containers`, `unordered-containers`, and `vector` libraries
• Garbage collection, type inference, and type classes automate common menial tasks
• Strong static types make it a breeze to refactor code without having to write tests

Consequently, Haskell gave me much greater confidence to rewrite large swaths of code because the compiler caught more of my mistakes and automated more details. This encouraged me to try out new ideas and rapidly gravitate towards a good solution instead of getting complacent with my first working solution. These changes improved performance and now queries took milliseconds instead of hours. At this point, `PyMOL` became the bottleneck and could not load search results fast enough, so there was no point improving performance further.

Another major advantage of Haskell for my project is parsing. Haskell libraries excel at parsing by necessity: there is a huge pressure to convert untyped textual data to more strongly typed data structures in order to take advantage of the type system. I want to give a special shoutout to `attoparsec` and `aeson`, which are fantastic for this purpose.

Another big benefit of programming in Haskell was that my code was remarkably stable and devoid of bugs. This meant that I spent less time maintaining the code and more time adding features.

The first issue I encountered is that if you program in Haskell you need to be prepared to fill in the missing gaps in the library ecosystem. I began at a time when the Haskell ecosystem was beginning to mature, but there were still several important gaps. The foremost of these was a stream programming library, which I needed to write myself because I was unsatisfied with existing iteratee-based solutions, and this eventually evolved into my `pipes` stream programming library. If you are considering starting a large project in Haskell you need to be prepared to do some library work of your own.

Another issue was numerical code. This is limited almost exclusively to `hmatrix` (Haskell bindings to the GNU scientific library) and `repa` (a fast and native Haskell library, but without a standard library of numerical algorithms). `hmatrix` was problematic because it is GPLv3 licensed, and my university, UCSF, forbids releasing code with a GPLv3 license because of the patent clause. This meant that I was forced to write my own LAPACK bindings for performance-critical code. Haskell's foreign-function interface (FFI) is nice, but not having to use it at all is even nicer.

Next, space leaks cause problems. Space leaks are like the Haskell analog to C's segmentation faults: they require programmer discipline to prevent and they are not easy to debug. With experience I learned how to avoid them, mainly by:

• Defining data types with strict fields
• Ensuring all folds are strict in the accumulator

... but this is still a problem because it bites beginners to the language really early. Also, space leaks are the biggest source of program instability, which is a shame because it compromises what would be an otherwise rock-solid stability story for the language.

The next problem is the lack of standardized error-handling idioms. I ended up writing my own library for this, too, called the `errors` library, but this did not completely solve the problem either. I'm currently collaborating with John Wiegley and Michael Snoyman (of Yesod fame) to try to solve this problem.

Another problem is the pre-existing ecosystem of functionality dependent on lazy `IO`. I'm not going to name any names, but one library that I depend on uses lazy `IO` internally and it is still a source of bugs. Lazy `IO` causes these sort of problems because it does not guarantee ordering of effects and also results in synchronous exceptions being thrown asynchronously in pure code (which is a big no-no in the Haskell world).

#### Conclusions

The code for this project is open source and hosted on Github. It consists of three separate projects:

People interested in learning Haskell may find the code for the search engine useful to see what a medium-sized Haskell application looks like. Haskell gets a bad reputation for being overly academic and I hope people can see past that to discover an excellent programming language for rapid development with high reliability.

## Wednesday, October 9, 2013

### How to reimplement the conduit parsing API in 50 lines of pipes code

Michael's recent blog posts highlighted several deficiences of `pipes`-based parsing. Particularly, he emphasized that it was incompatible with idioms from the core `pipes` library, and I agree with that assessment. Programming with `pipes-parse` is a different beast from programming with vanilla `pipes` and `pipes-parse` idioms more closely resemble `conduit` idioms.

Several comments in response to Michael's post asked if one could internally implement `conduit` on top of `pipes`, in order to simplify `conduit` internals. This post answers half that question by showing how to implement conduit sans finalization on top `pipes` using the tools from `pipes-parse`.

This code is short, but very dense, so I will walk through the implementation step-by-step, explaining the underlying `pipes-parse` idioms that I'm using to reconstruct conduit functionality. If you just want to skip to the complete code then go straight to the Appendix at the end of this post.

#### The Conduit Type

The way you internally represent a conduit-like parser using `pipes` is the following data type:

``````import Pipes

newtype ConduitM i o m r = ConduitM
{ unConduitM ::
forall x . Producer o (StateT (Producer i m x) m) r }``````

To recap, a `ConduitM i o m r` has an input of type `i` and an output of type `o`, and the output is distinct from the return value, just like `pipes`.

I model this as a `Producer` of `o`s that reads from and writes to a `Producer` of `i`s. The `Producer` of `i`s is our conduit's upstream input end. `await`ing a value will pop an elements off of this `Producer` and adding back a `leftover` pushes an element back onto this `Producer`.

This representation differs from conduit's implementation in one important way: it makes no distinction between leftovers and future input. Both are stored together within the inner `Producer`. This is one neat trick of reifying future input as a `Producer`: you now have an obvious place to store leftovers.

#### Primitives

The next step is to implement `await`, which is just a thin wrapper around `draw` from `pipes-parse`:

``````type Consumer i m r = forall o . ConduitM i o m r

await :: (Monad m) => Consumer i m (Maybe i)
await = ConduitM \$ lift \$ liftM f draw
where
f (Left  _) = Nothing
f (Right r) = Just r``````

However, this doesn't explain what `draw` is actually doing, so let's refer to its implementation:

``````draw = do
p <- get                 -- Retrieve our source of input
x <- lift (next p)       -- Attempt to pop one element off
case x of
Right (a, p') -> do  -- Success: bind the element
put p'           -- Save the remainder
return (Right a)
Left   r      -> do  -- Failure: No more input
put (return r)
return (Left  r)``````

If you are more comfortable with `StateT` you might prefer the following less verbose form which inlines all the state passing:

``````draw = StateT \$ \p -> do
x <- next p
return \$ case x of
Right (a, p') -> (Right a, p'      )
Left      r   -> (Left  r, return r)``````

If you think of a `Producer a m ()` as isomorphic to `ListT m a`, then `next` is the equivalent of `uncons` for `ListT`.

Similarly, we can add elements back to the producer, using `leftover`, which is just a thin wrapper around `unDraw` from `pipes-parse`:

``````leftover :: (Monad m) => i -> ConduitM i o m ()
leftover i = ConduitM \$ lift \$ unDraw i``````

`unDraw` has a simple implementation:

``unDraw a = modify (Pipes.yield a >>)``

It just prepends a `yield` statement onto the `Producer`. This is the equivalent of `cons` for `ListT`.

What about `yield`? Well, that's exactly the same:

``````yield :: (Monad m) => o -> ConduitM i o m ()
yield o = ConduitM (Pipes.yield o)``````

#### Composition

Now for the interesting part: conduit composition, which has the following type:

``````(=\$=) :: (Monad m)
=> ConduitM a b m ()
-> ConduitM b c m r
-> ConduitM a c m r``````

If we were to replace these `ConduitM`s with the underlying pipe type, we would get:

``````(=\$=) :: (Monad m)
=> forall x . Producer b (StateT (Producer a m x) m) ()
-> forall y . Producer c (StateT (Producer b m y) m) r
-> forall z . Producer c (StateT (Producer a m z) m) r``````

How do we even begin to approach that?

The key is the `runStateP` function from `Pipes.Lift`, which has the following (simplified) type:

``````runStateP
:: s -> Producer a (StateT s m) r -> Producer a m (r, s)``````

Compare this with the type for `runStateT`:

``runStateT :: StateT s m r -> s -> m (r, s)``

`runStateP` differs from `runStateT` in two ways:

• `runStateP` unwraps a `StateT` buried inside of a pipe

• `runStateP` takes arguments in the opposite order from `runStateT`

`runStateP` takes care to thread state as it wraps the internal `StateT` so it behaves just like `runStateT`. Once you familiarize yourself with how `runStateP` works, the solution is a matter of type-chasing. In fact, what you will discover is that if you restrict yourself to `runStateP`, there is only one solution that type-checks.

We begin with two arguments two our operator:

``ConduitM pL =\$= ConduitM pR = ConduitM \$ ...``

`pL` has type:

``pL :: forall x . Producer b (StateT (Producer a m x) m) ()``

Let's look at what type we get when we unwrap `pL` using `runStateP`:

``````parseL
=> Producer a m x -> Producer b m ((), Producer a m x)
parseL as = runStateP as pL``````

This now looks just like a parser combinator. It takes an input stream of values of type `a` and generates an output stream of values of type `b`, returning unused input alongside the `()` return value. We're not interested in this `()` return value, so we'll use `execStateP` instead:

``````parseL
=> Producer a m x -> Producer b m (Producer a m x)
parseL as = execStateP as pL``````

Similarly, we'll convert `pR` to a parser:

``````parseR
=> Producer b m y -> Producer c m (r, Producer b m y)
parseR bs = runStateP bs pR``````

Now what's our goal? We're trying to build a `ConduitM a c m r`, which is equivalent to the following parser:

``````parse
=> Producer a m z -> Producer c m (r, Producer a m z)``````

This means that we need to introduce a stream of `as`:

``````parse as = do
-- as :: Producer a m x``````

We can now feed that stream to `parseL`

``````parse as = do
-- as        :: Producer a m x
-- parseL as :: Producer b m (Producer a m x)``````

We can then feed that to `parseR`. This works because `parseR` is universally quantified in `y`, which type-checks as `Producer a m x`:

``````parse as = do
-- as  :: Producer a m x
-- parseL as
--     :: Producer b m (Producer a m x)
-- parseR (parseL as)
--     :: Producer c m (r, Producer b m (Producer a m x))``````

This is almost what we want. We just need to discard the unused stream of `b`s:

``````parse as = do
(r, pp) <- parseR (parseL as)
p'      <- lift \$ drain pp
return (r, p')
where
drain p = runEffect (for p discard)``````

If we inline all of that logic, we get the following 5-line implementation of `conduit` composition:

``````ConduitM pL =\$= ConduitM pR = ConduitM \$
stateP \$ \as -> do
(r, pp) <- runStateP (execStateP as pL) pR
p'      <- lift \$ drain pp
return (r, p')``````

This gives a birds-eye view of how `conduit` composition works. When we compose two conduits, we:

• Feed the input stream of `as` to the upstream conduit
• Feed that to the downstream conduit
• Discard all leftovers from their intermediate interface

Once we have this composition operator, the right and left fuse are just type-specializations (the same as in `conduit`):

``````type Conduit  i m o = ConduitM i  o    m ()
type Source     m o = ConduitM () o    m ()
type Sink     i m r = ConduitM i  Void m r

(\$=) :: (Monad m) => Source m a -> Conduit a m b -> Source m b
(\$=) = (=\$=)

(=\$) :: (Monad m) => Conduit a m b -> Sink b m c -> Sink a m c
(=\$) = (=\$=)``````

What about `(\$\$)`? That is even simpler:

``````empty :: (Monad m) => Producer () m r
empty = return () >~ cat  -- equivalent to "forever \$ yield ()"

(\$\$) :: (Monad m) => Source m a -> Sink a m b -> m b
ConduitM pL \$\$ ConduitM pR =
evalStateT (runEffect pR) (evalStateP empty pL)``````

This implementation says at a high-level:

• Feed an unused leftover stream to `pL` (unused because it's a `Source`)
• Feed that to `pR`
• There is no step 3

#### Identity

If that is composition, what is the identity? Why, it's just `input` from `pipes-parse`:

``````idP :: (Monad m) => ConduitM a a m ()
idP = ConduitM (void input)``````

Neat how that works out. This is equivalent in behavior to:

``````idP = do
ma <- await
case ma of
Nothing -> return ()
Just a  -> do
yield a
idP``````

#### Connect and Resume

Last but not least we need connect and resume. Like I said before, this will ignore finalization concerns, so I will only implement a variation on `(\$\$+)` that returns a new `Source`, rather than a `ResumableSource` (which is just a `Source` tagged with a finalizer).

``````(\$\$+)
=> Source m a -> Sink a m b -> m (b, Source m a)
ConduitM pL \$\$+ ConduitM pR = do
(b, as) <- runEffect \$ runStateP (execStateP empty pL) pR
let as' = ConduitM \$ stateP \$ \p -> ((), p) <\$ as
return (b, as')``````

This says:

• Feed an unused input stream to `pL` (it's a `Source`)
• Feed that to `pR`
• Discard `pR`'s inexistent output (it's a `Sink`)
• Create a new `Source` that also ignores its input stream

#### Conclusion

The purpose of this post is not to suggest that Michael necessarily should implement `conduit` in terms of `pipes`, especially since this does not contain finalization code, yet. Rather, I wanted to exhibit that `pipes` is a powerful tool that you can use to build other abstractions concisely and with less room for error.

#### Appendix

The minimal test implementation is 50 lines of code, which I've included here:

``````{-# LANGUAGE RankNTypes #-}

import Control.Applicative ((<\$))
import Pipes hiding (await, yield, Consumer)
import qualified Pipes
import Pipes.Lift
import Pipes.Parse

newtype ConduitM i o m r = ConduitM
{ unConduitM :: forall x .
Producer o (StateT (Producer i m x) m) r }

return r = ConduitM (return r)
m >>= f  = ConduitM \$ unConduitM m >>= \r -> unConduitM (f r)

instance MonadTrans (ConduitM i o) where
lift m = ConduitM (lift (lift m))

type Consumer i m r = forall o . ConduitM i  o    m r
type Source     m o =            ConduitM () o    m ()
type Sink     i m r =            ConduitM i  Void m r
type Conduit  i m o =            ConduitM i  o    m ()

await :: (Monad m) => Consumer i m (Maybe i)
await = ConduitM \$ lift \$ liftM f draw
where
f (Left  _) = Nothing
f (Right r) = Just r

yield :: (Monad m) => o -> ConduitM i o m ()
yield o = ConduitM (Pipes.yield o)

leftover :: (Monad m) => i -> ConduitM i o m ()
leftover i = ConduitM \$ lift \$ unDraw i

(=\$=)
=> Conduit a m b
-> ConduitM b c m r
-> ConduitM a c m r
ConduitM pL =\$= ConduitM pR = ConduitM \$
stateP \$ \p -> do
(r, pp) <- runStateP (execStateP p pL) pR
p'      <- lift \$ drain pp
return (r, p')

drain :: (Monad m) => Producer a m r -> m r
drain p = runEffect (for p discard)

(\$=) :: (Monad m) => Source m a -> Conduit a m b -> Source m b
(\$=) = (=\$=)

(=\$) :: (Monad m) => Conduit a m b -> Sink b m c -> Sink a m c
(=\$) = (=\$=)

empty :: (Monad m) => Producer () m r
empty = return () >~ cat

(\$\$) :: (Monad m) => Source m a -> Sink a m b -> m b
ConduitM pL \$\$ ConduitM pR =
evalStateT (runEffect pR) (evalStateP empty pL)

idP :: (Monad m) => ConduitM a a m ()
idP = ConduitM (void input)

(\$\$+)
=> Source m a -> Sink a m b -> m (b, Source m a)
ConduitM pL \$\$+ ConduitM pR = do
(b, pa) <- runEffect \$ runStateP (execStateP empty pL) pR
let p' = ConduitM \$ stateP \$ \p -> ((), p) <\$ pa
return (b, p')``````

## Sunday, October 6, 2013

### Manual proofs for the `pipes` laws

Out of all of Haskell's streaming libraries, `pipes` is the only that does not have a test suite. This is because I prefer to use equational reasoning to prove the correctness of `pipes`. For those new to Haskell, equational reasoning is the use of equations to prove code is correct without ever running it. Haskell makes this possible by strictly enforcing purity.

Equational reasoning works well for a library like `pipes` because most properties of interest can easily be specified as category theory laws. If you are unfamiliar with category theory, you can read this short primer to learn how it relates to programming.

Every primitive in the `pipes` library is related to a category in some way:
• `yield` and `(~>)` are the identity and composition operator of a category
• `await` and `(>~)` are the identity and composition operator of a category
• `cat` and `(>->)` are the identity and composition operator of a category
This means that we state the expected behavior for all six primitives just by stating their category laws:
```-- Left Identity
yield ~> f = f

-- Right Identity
f ~> yield = f

-- Associativity
(f ~> g) ~> h = f ~> (g ~> h)

-- Left Identity
await >~ f = f

-- Right Identity
f ~> await = f

-- Associativity
(f >~ g) >~ h = f >~ (g >~ h)

-- Left Identity
cat >-> f = f

-- Right Identity
f >-> cat = f

-- Associativity
(f >-> g) >-> h = f >-> (g >-> h)
```
Category theory laws are like tests: they are properties that we expect our code to obey if we wrote it correctly. These kinds of laws tend to have the nice property that they tend to uncannily summarize many useful properties we would like to know in a remarkably small set of equations. For example, `(~>)` is defined in terms of `for` like this:
```(f ~> g) x = for (f x) g
```
When you restate the category laws for `yield` and `(~>)` in terms of `for`, you get the "for loop laws":
```-- Looping over a single yield simplifies to function
-- application

for (yield x) \$ \e -> do
f e

= f x

-- Re-yielding every element of a generator returns the
-- original generator

for gen \$ \e -> do
yield e

= gen

-- Nested for loops can become a sequential for loops if the
-- inner loop body ignores the outer loop variable

for gen0 \$ \i -> do
for (f i) \$ \j -> do
g j

= for gen1 \$ \j -> do
g j
where
gen1 = for gen0 \$ \i -> do
f i
```
These are common sense laws that we would expect any language with generators and for loops to obey. Amazingly, the for loop laws emerge naturally from category laws, almost as if they were handed down to us on stone tablets.

For a while I've been writing these proofs out by hand informally in notebooks, but now that I'm writing up my thesis I needed to organize my notes and fill in all the little steps that I would previously skip over. You can find the full set of my notes as an organized markdown file that I've committed to the `pipes` repository that contains all my manual proofs of the pipe laws.

#### Caveats

There are several caveats that I need to state about these proofs:

First, there is the obvious caveat that these proofs are still not machine-checked, but that's the next step. Paolo Capriotti has been patiently teaching me how to encode my reasoning into Agda. However, I believe this should be possible and that the proofs are very close to sound, taking into account the following caveats below.

Second, these proofs do not account for bottom (i.e. `_|_`). My intuition is that the `pipes` laws do hold even in the presence of bottom, by I am still new to reasoning about bottom, so I omitted that for now. If anybody can point me to a good reference on this I would be grateful.

Third, these proofs use a non-standard coinductive reasoning. To explain why, it's important to note that typical coinductive reasoning in Haskell requires the result to be productive and to guard coinduction behind the production of a constructor. However, not all pipelines are productive, as the following examples illustrate:
```infinite = forever \$ yield ()

infinite >-> forever await = _|_

```
So instead of using productivity to impose a total ordering on coinductive reasoning I use "consuming" to order my coinductive proofs, meaning that all uses of coinduction are guarded behind the consumption of a constructor. In other words, my proofs do not guarantee that pipe composition makes progress in production, but they do guarantee that pipe composition makes progress in consumption.

Fourth, another departure from typical coinductive proofs is notation. Coinductive reasoning uses bisimilarity to prove "equality". I also use bisimilarity, but notationally I just write it down as a single chain of equations that is bisected by a reference to coinduction. The two halves of the equation chain, above and below the bisection, constitute the two bisimilar halves.

Fifth, proofs notationally discharge other proofs using headers as the namespace. If you see a reference like `[Kleisli Category - Left Identity Law - Pointful]`, that means you should look under top-level header `Kleisli Category` for sub-header `Left Identity Law` and sub-sub-header `Pointful`.

Sixth, I've only proven the laws for the bidirectional API in the `Pipes.Core` module. However, the category laws for the unidirectional API are very easy to derive from the laws for the bidirectional API, so I leave those as an exercise until I have time to add them myself.

#### Conclusions

My goal is to eventually machine-check these proofs so that `pipes` will be a practical and instructive example of a library with a statically verified kernel. Moreover, I hope that these proofs will allow other people to equationally reason about even more sophisticated systems built on top of `pipes` and prove higher-level properties by discharging the proofs that I have assembled for them.

## Friday, September 20, 2013

### Perfect streaming using `pipes-bytestring`

`pipes-bytestring-1.0.0` is complete, providing `pipes` utilities for reading and writing effectul streams of `ByteString`s. Most people who have been waiting on this library were mainly interested in the following four pipes:

``````stdin :: MonadIO m => Producer' ByteString m ()

stdout :: MonadIO m => Consumer' ByteString m ()

fromHandle :: MonadIO m => Handle -> Producer' ByteString m ()

toHandle :: MonadIO m => Handle -> Consumer' ByteString m r``````

However, I delayed releasing `pipes-bytestring` for so long because there was a separate problem I wanted to solve first, which I like to call the Perfect Streaming Problem.

#### The Problem

Here is the statement of the Perfect Streaming Problem:

How do you partition an effectful stream into logical units while strictly enforcing an upper bound on memory use?

This problem will resonate with anybody who has ever built a streaming web service: how do you group your stream of bytes into logical units like files or messages? If you naively load each file or message into memory then a malicious input that is one extremely long file or message will take down your server by exhausting all available memory. Most web services work around this by imposing artificial limits on file or message length, or by writing brittle and non-compositional code or protocols that are difficult to understand, debug, and maintain.

However, we don't have to invoke web services to illustrate this problem. Just consider this simpler problem: how do you implement the Unix `head` utility in Haskell so that it runs in constant space on any input, including infinite inputs without any line boundaries like `/dev/zero`:

``\$ head < /dev/zero > /dev/null  # This must run in constant space``

Let's keep it simple and assume that our `head` utility only needs to forward exactly 10 lines from `stdin` and to `stdout`.

Before `pipes-bytestring`, the only way to do this was using lazy `IO`, like this:

``````takeLines :: String -> String
takeLines n = unlines . take n . lines

main = do
str <- getContents
putStr (takeLines 10 str)``````

However, `pipes` and other streaming libraries were built to replace lazy `IO` so that Haskell programmers could easily reason about the order of effects and decouple them from Haskell's evaluation order. Yet, this simple `head` challenge remains a perennial problem for streaming libraries. To illustrate why, let's consider how `conduit` does this, using `Data.Conduit.Text.lines`:

``lines :: Monad m => Conduit Text m Text``

This conduit receives a stream of `Text` chunks of arbitrary size and outputs `Text` chunks one line long. This sounds reasonable until you feed your program `/dev/zero`. The `lines` conduit would then attempt to load an infinitely large chunk of zeroes as the first output and exhaust all memory.

Michael Snoyman already realized this problem and added the `linesBounded` conduit as a temporary workaround:

``linesBounded :: MonadThrow m => Int -> Conduit Text m Text``

`linesBounded` throws an exception when given a line that exceeds the specified length. This is the same workaround that web services use: artificially limit your input stream. I wasn't satisfied with this solution, though. How can I claim that `pipes` is a mature replacement for lazy `IO` if lazy `IO` soundly beats `pipes` on something as simple as `head`?

#### The Solution

I will introduce the solution by beginning from the final code:

Note: There was a last minute bug which I introduced in the `lines` function before release. Use `pipes-bytestring-1.0.1`, which fixes this bug, to run this example.
``````-- head.hs

import Pipes
import Pipes.ByteString
import Pipes.Parse (takeFree)
import Prelude hiding (lines, unlines)

takeLines
=> Int
-> Producer ByteString m () -> Producer ByteString m ()
takeLines n = unlines . takeFree n . lines

main = runEffect \$ takeLines 10 stdin >-> stdout``````

Compile and run this to verify that it takes the first ten lines of input for any file:

``````\$ ./head < head.hs

import Pipes
import Pipes.ByteString
import Pipes.Parse (takeFree)
import Prelude hiding (lines, unlines)

takeLines
=> Int``````

... while still handling infinitely long lines in constant space:

``\$ ./head < /dev/zero >/dev/null  # Runs forever in constant space``

To see why this works, first take a look at the type of `Pipes.ByteString.lines`:

``````lines
=> Producer ByteString m ()
-> FreeT (Producer ByteString m) m ()``````

Now compare to that the type of `Data.ByteString.Lazy.Char8.lines`:

``lines :: ByteString -> [ByteString]``

`pipes` treats a `Producer` of `ByteString`s as the effectful analog of a lazy `ByteString`:

``````-- '~' means "is analogous to"
Producer ByteString m ()  ~  ByteString``````

Similarly, `pipes` also treats a `FreeT` of `Producer`s as the effectful analog of a list of lazy `ByteString`s:

``FreeT (Producer ByteString m) m ()  ~  [ByteString]``

You can think of `FreeT` as a "linked list" of zero or more `Producer`s, where each `Producer`'s return value contains either the next `Producer` or the final return value (`()` in this case). So if a `Producer` is analogous to a lazy `ByteString` then a `FreeT`-based "linked list" of `Producer`s is analogous to a true linked list of lazy `ByteString`s.

Each layer of our `FreeT` is a `Producer` that streams one line's worth of chunks. This differs from a single chunk one line long because it's still in `Producer` form so we haven't actually read anything from the file yet. Also, `FreeT` is smart and statically enforces that we cannot read lines from the next `Producer` (i.e. the next line) until we finish the first line.

`FreeT` has a very important property which other solutions do not have: we can use `FreeT` to sub-divide the `Producer` into logical units while still keeping the original chunking scheme. Once we have these nice logical boundaries we can manipulate the `FreeT` using high-level list-like functions such as `Pipes.Parse.takeFree`:

``````takeFree
=> Int -> FreeT f m () -> FreeT f m ()``````

`takeFree` is the `FreeT` analog of `Prelude.take`. We keep the first three `f` layers of the `FreeT` and discard the rest. In this case our `f` is `(Producer ByteString m)` so if each `Producer` represents one line then we can take a fixed number of lines.

This works because `FreeT (Producer ByteString m) m ()` is just an ordinary Haskell data type. This data type doesn't contain any actual chunks. Instead, it is just a description of how we might traverse our input stream. When we call `takeFree` on it we are just saying: "Now that I think about it, I don't intend to traverse as much of the input as I had originally planned".

`unlines` completes the analogy, collapsing our logical units back down into an unannotated stream of bytes:

``````unlines
=> FreeT (Producer ByteString m) m r
-> Producer ByteString m r``````

We can then combine `lines`, `takeFree` and `unline`s to mirror the lazy `IO` approach:

``````-- Lazy IO:
takeLines n = unlines . take n . lines

-- pipes:
takeLines n = unlines . takeFree n . lines``````

The main difference is that the intermediate types are larger because we're moving more information in to the type system and, more importantly, moving that same information out of implicit magic.

#### Conclusions

The next `pipes` library in development is `pipes-text`. The main development bottle-neck (besides my own graduation) is that the `text` library does not export functions to partially decode incomplete `ByteString` fragments as much as possible without throwing errors. If somebody were to write that up it would speed up the release of `pipes-text` significantly.

As always, you can follow or contribute to `pipes` development or just ask questions by joining the haskell-pipes mailing list.

## Saturday, September 7, 2013

### pipes-4.0: Simpler types and API

I'm announcing `pipes-4.0` which greatly simplifies the types and API of the `pipes` ecosystem. For people new to `pipes`, `pipes` is a compositional streaming library that decouples streaming programs into simpler, reusable components.

The purpose behind `pipes` is to simplify effectful stream programming for both library authors and application writers. Library authors can package streaming components into a highly reusable interface, and application writers can easily connect together streaming components to build correct, efficient, and low-memory streaming programs. For example, the following program connects three reusable components to stream lines from standard input to standard output until the user types `"quit"`:

```import Pipes
import qualified Pipes.Prelude as P

main = runEffect \$
P.stdinLn >-> P.takeWhile (/= "quit") >-> P.stdoutLn
```

`pipes` distinguishes itself from other stream programming libraries in three main ways:

• Insistence on elegance, symmetry, and theoretical purity
• Careful attention to correctness, documentation, and detail
• Emphasis on a high power-to-weight ratio

#### Credits

This release was made possible due to the suggestions and contributions of many people and I want to give special mention to several people:

• Renzo Carbonara, who is the largest contributor of downstream libraries, building and maintaining `pipes-network`, `pipes-network-tls`, `pipes-zlib`, `pipes-binary`, `pipes-attoparsec` and `pipes-aeson`. He also provided lots of useful feedback on design proposals because of his experience maintaining these libraries.

• Ben Gamari, who contributed `pipes-vector` and `pipes-interleave`.

• Oliver Charles who contributed to the design of the new `pipes-parse` library in the process of developing the `pipes-tar` library.

• Csernik Flaviu Andrei, who contributed the complete benchmark suite for `pipes`.

• Merijn Verstraaten, who contribute the new `mtl` instances for `pipes`.

• Gergely Risko, who fixed a concurrency bug in `pipes-concurrency`.

• Mihaly Barasz, who contributed the complete test suite for `pipes-concurrency`.

• Tony Day who helped automate the `pipes` ecosystem and contributed lots of useful feedback on documentation.

• Aleksey Khudyakov first proposed the idea to remove the old proxy transformer system and outsource the same functionality to monad transformers in the base monad. This change alone accounted for an almost 60% reduction in the library size and the greatest simplification of the types.

• Johan Tibell proposed the initial idea to provide a simpler unidirectional subset of the API by default. This removed the warts that bidirectionality introduced.

• Florian Hofmann whose work on `pipes-eep` led to the discovery of an `Arrow` instance for push-based pipes.

• Aristid Breitkreuz and Pierre Radermecker, who both caught important bugs in `pipes-parse` and `pipes-safe`.

• Oliver Batchelor, whose work on integrating `pipes` with Cloud Haskell improved the design of `pipes-safe`.

Also, I would like to also thank everybody who provided feedback on the library and its documentation and also contributed code.

#### Change Log

People familiar with `pipes` will notice that the biggest change to the library is the elimination of the proxy transformer system. This was made possible by an insight of Aleksey Khudyakov that the proxy transformers were isomorphism to monad transformers in the base monad if you ignored their ability to be unwrapped before the `Proxy` layer. I later discovered how to unwrap these base monad transformers while preserving the `Proxy` layer, which made possible the complete elimination of the proxy transformer system.

This had the largest impact on simplifying the API:

• The number of exported functions dropped to approximately 1/3 of the original size (from about 300+ to 100+)

• The number of modules dropped to 1/3 of the original size (from 18 to 6)

• The `p` type parameter in type signatures disappeared, along with the `Proxy` type class, which became the concrete `Proxy` type (which was the old `ProxyFast` implementation).

• No need for `runIdentityP` any longer

The next most important change was a simplification of the API to a unidirectional subset which is the new default. This fixed several warts of the previous API:

• No more gratuitous `()` parameters

• The pipe monad and category now overlap

• Polymorphic type synonyms can now be used to simplify the types

The original bidirectional functionality still remains intact within the `Pipes.Core` module. The only difference is that it is not exported by default.

The next important change was the realization that bind in the `respond` Category (i.e. `(//>)`) was exactly equivalent to a `for` loop, so the unidirectional API now uses `for` as a synonym for `(//>)` and produces really beautiful `for` loops.

Other important syntactic changes:

• The unidirectional API uses `yield` instead of `respond` like it was back in `pipes-1.0`

• The unidirectional API uses `await` instead of `request` like it was back in `pipes-1.0`

• `runProxy` is now `runEffect`

• `(>->)` is the unidirectional pull-based composition, instead of bidirectional composition

`Pipes.Prelude` has also changed to remove the suffix from all utilities, but is no longer re-exported from the main `Pipes` module.

The downstream libraries have been updated as well to use the `pipes-4.0` API and several of these now have much simpler APIs, too, particularly `pipes-safe`. I will discuss these libraries in separate library-specific posts later on.

#### Future Goals

This release is intended to be the last major version bump. The next development priorities are:

• Stabilize the core `pipes` library

• Improve distribution by packaging up `pipes` for several package managers

• Continue to build out the `pipes` ecosystem, particularly dedicated `Text` and `ByteString` libraries

The long-term goal is to get `pipes` into the Haskell platform once the API has proven itself stable and the ecosystem matures.

People interested in learning more about `pipes` or contributing to development can join the official mailing list.