Saturday, February 22, 2014

Reasoning about stream programming

This post answers a question people sometimes ask me about pipes, which I will paraphase here:

If resource management is not a core focus of pipes, why should I use pipes instead of lazy IO?

Many people who ask this question discovered stream programming through Oleg, who framed the lazy IO problem in terms of resource management. However, I never found this argument compelling in isolation; you can solve most resource management issues simply by separating resource acquisition from the lazy IO, like this:

import System.IO

main = withFile "example.txt" ReadMode $ \handle -> do
    str <- hGetContents handle
    doSomethingWith str

You can now easily reason about resource management: the handle lifetime matches the duration of the callback provided to withFile. This solves 95% of resource management scenarios, and libraries like resourcet/conduit/pipes-safe handle the more exotic cases.

If lazy IO were only a problem for 5% of all use cases, I wouldn't object to its use. Rather, resource management issues are symptoms of a larger problem: lazy IO conflicts with equational reasoning about ordering of effects.

The issue with lazy IO

Lazy IO is teaching an entire generation of Haskell programmers to "not think very hard" about ordering of side effects. Some consider this a feature, but I believe it is a disadvantage because it conflicts with the original spirit of Haskell IO: preserving equational reasoning. Lazy IO ties the effect order to evaluation order, and equational reasoning does not usually preserve evaluation order.

In my eyes, Haskell's treatment of IO was one of the major revolutions in programming language theory, because for the first time ever one could formally reason about side effects within the language rather than building an external formal model. This empowered lay programmers to reason about side effects using simple substitution and greatly lowered the entry barrier to formal reasoning in general.

Consequently, I was very disappointed to see Haskell squander this immense progress by teaching new programmers to abandon reasoning about ordering of side effects through the use of lazy IO. We should be democratizing computer science by helping the average programmer be precise, rather than encouraging imprecision and instructing newcomers to defer precision to the experts.

Pipes

I built pipes to restore equational reasoning to stream programming. Besides eschewing lazy IO, pipes also promotes equational reasoning in other ways:

  • pipes obeys several laws inspired by category theory that simplify equational rearrangements
  • The core pipes implementation is minimal
  • pipes extensions are pipes-independent, so you can reason about them orthogonally

pipes does not come with every bell and whistle pre-installed because pipes is a minimal substrate upon which other streaming abstractions can be built. This reflects the Haskell ethos of only using the minimum power necessary for the task at hand, which improves correctness and minimizes surprise.

Therefore, you can think of pipes as a gateway drug to equationally reasoning about stream processing. pipes empowers you to be correct and rigorous so that you can spend less time chasing bugs and more time releasing features.

Those who are interested in equationally reasoning about stream programming can read the pipes tutorial, which illustrates how many common streaming abstractions that we take for granted possess beautiful equational properties that we can harness for real applications.

No comments:

Post a Comment