Haskell for all: Manual proofs for the `pipes` laws

Sunday, October 6, 2013

Manual proofs for the `pipes` laws

Out of all of Haskell's streaming libraries, pipes is the only that does not have a test suite. This is because I prefer to use equational reasoning to prove the correctness of pipes. For those new to Haskell, equational reasoning is the use of equations to prove code is correct without ever running it. Haskell makes this possible by strictly enforcing purity.

Equational reasoning works well for a library like pipes because most properties of interest can easily be specified as category theory laws. If you are unfamiliar with category theory, you can read this short primer to learn how it relates to programming.

Every primitive in the pipes library is related to a category in some way:

yield and (~>) are the identity and composition operator of a category
await and (>~) are the identity and composition operator of a category
cat and (>->) are the identity and composition operator of a category

This means that we state the expected behavior for all six primitives just by stating their category laws:

-- Left Identity
yield ~> f = f

-- Right Identity
f ~> yield = f

-- Associativity
(f ~> g) ~> h = f ~> (g ~> h)


-- Left Identity
await >~ f = f

-- Right Identity
f ~> await = f

-- Associativity
(f >~ g) >~ h = f >~ (g >~ h)


-- Left Identity
cat >-> f = f

-- Right Identity
f >-> cat = f

-- Associativity
(f >-> g) >-> h = f >-> (g >-> h)

Category theory laws are like tests: they are properties that we expect our code to obey if we wrote it correctly. These kinds of laws tend to have the nice property that they tend to uncannily summarize many useful properties we would like to know in a remarkably small set of equations. For example, (~>) is defined in terms of for like this:

(f ~> g) x = for (f x) g

When you restate the category laws for yield and (~>) in terms of for, you get the "for loop laws":

-- Looping over a single yield simplifies to function
-- application

for (yield x) $ \e -> do
    f e

= f x


-- Re-yielding every element of a generator returns the
-- original generator

for gen $ \e -> do
    yield e

= gen


-- Nested for loops can become a sequential for loops if the
-- inner loop body ignores the outer loop variable

for gen0 $ \i -> do
    for (f i) $ \j -> do
        g j

= for gen1 $ \j -> do
    g j
  where
    gen1 = for gen0 $ \i -> do
        f i

These are common sense laws that we would expect any language with generators and for loops to obey. Amazingly, the for loop laws emerge naturally from category laws, almost as if they were handed down to us on stone tablets.

For a while I've been writing these proofs out by hand informally in notebooks, but now that I'm writing up my thesis I needed to organize my notes and fill in all the little steps that I would previously skip over. You can find the full set of my notes as an organized markdown file that I've committed to the pipes repository that contains all my manual proofs of the pipe laws.

Caveats

There are several caveats that I need to state about these proofs:

First, there is the obvious caveat that these proofs are still not machine-checked, but that's the next step. Paolo Capriotti has been patiently teaching me how to encode my reasoning into Agda. However, I believe this should be possible and that the proofs are very close to sound, taking into account the following caveats below.

Second, these proofs do not account for bottom (i.e. _|_). My intuition is that the pipes laws do hold even in the presence of bottom, by I am still new to reasoning about bottom, so I omitted that for now. If anybody can point me to a good reference on this I would be grateful.

Third, these proofs use a non-standard coinductive reasoning. To explain why, it's important to note that typical coinductive reasoning in Haskell requires the result to be productive and to guard coinduction behind the production of a constructor. However, not all pipelines are productive, as the following examples illustrate:

infinite = forever $ yield ()

infinite >-> forever await = _|_

for infinite discard = _|_

So instead of using productivity to impose a total ordering on coinductive reasoning I use "consuming" to order my coinductive proofs, meaning that all uses of coinduction are guarded behind the consumption of a constructor. In other words, my proofs do not guarantee that pipe composition makes progress in production, but they do guarantee that pipe composition makes progress in consumption.

Fourth, another departure from typical coinductive proofs is notation. Coinductive reasoning uses bisimilarity to prove "equality". I also use bisimilarity, but notationally I just write it down as a single chain of equations that is bisected by a reference to coinduction. The two halves of the equation chain, above and below the bisection, constitute the two bisimilar halves.

Fifth, proofs notationally discharge other proofs using headers as the namespace. If you see a reference like [Kleisli Category - Left Identity Law - Pointful], that means you should look under top-level header Kleisli Category for sub-header Left Identity Law and sub-sub-header Pointful.

Sixth, I've only proven the laws for the bidirectional API in the Pipes.Core module. However, the category laws for the unidirectional API are very easy to derive from the laws for the bidirectional API, so I leave those as an exercise until I have time to add them myself.

Conclusions

My goal is to eventually machine-check these proofs so that pipes will be a practical and instructive example of a library with a statically verified kernel. Moreover, I hope that these proofs will allow other people to equationally reason about even more sophisticated systems built on top of pipes and prove higher-level properties by discharging the proofs that I have assembled for them.

8 comments:

AlexOctober 7, 2013 at 7:01 AM
Much as I respect your use of equational reasoning and your CT-inspired design philosophy, I can't help but disagree with you here. Not having a test suite is not something to be proud of.

I'm sure you're aware of the following quotation:

> Beware of bugs in the above code; I have only proved it correct, not tried it.
> -- Donald Knuth

Proofs are not a replacement for tests, in the same way that a type system is not. Proofs definitely reduce the burden on the test suite, but there are things that can be tested that cannot be reasonably proved (that is, it is enough effort to prove them that it would be better to have tests, at least in the interim, than to wait).

Here are some things I would expect to see in a test suite if you develop it:

* Do your combinators behave as expected in the cases of bottom? In the case of async exceptions?
* Do they use and free resources as you expect?
* Are fixed bugs still fixed (regression tests)?

To reiterate: I love the formal reasoning approach, but I think it's naive to eschew tests in light of it.
ReplyDelete
Replies
Gabriella GonzalezOctober 7, 2013 at 8:15 AM
What makes you think I'm proud of it? I've had an open issue for adding a test suite to pipes for ages:

https://github.com/Gabriel439/Haskell-Pipes-Library/issues/49

Also, note that some downstream libraries (like `pipes-concurrency`) do have test suites:

https://github.com/Gabriel439/Haskell-Pipes-Concurrency-Library/blob/master/tests/tests-main.hs

I meant that first sentence in the following way: "Here's my excuse for not having written unit tests for so long". I was prioritizing equational reasoning before tackling unit tests.

There is also a second reason why I have delayed writing unit tests, particularly QuickCheck, which is that it is very difficult to do property-based testing of pipes because every constructor holds either a polymorphic functor or a function, and I don't have a good way to enumerate polymorphic functors or functions. In other words, I can't write an `Arbitrary` instance for the `Proxy` type.

I could also do non-property-based testing, too, and if somebody can contribute patches for that then I would welcome them.
ReplyDelete
Replies
AlexOctober 7, 2013 at 10:37 AM
Forgive. I misread the opening sentence as pride.

I was actually advocating non-property-based testing (or at least, not randomized property-based testing); quickcheck is sort of a middle ground between unit tests and proofs.
ReplyDelete
Replies
UnknownNovember 11, 2013 at 10:20 PM
Hi Gabriel, thanks for this enlightening post! I do have one question. Do you design your programs with these proofs as an end result, and if so what do you do during the design phase to facilitate that?
ReplyDelete
Replies

Add comment

Sunday, October 6, 2013

Manual proofs for the `pipes` laws

Caveats

Conclusions

8 comments:

Followers