Haskell for all: pipes-3.0 - A simpler, unified API

Introduction

I'm releasing pipes-3.0, which significantly simplifies the entire library. This release began as the misnamed 2.6 branch of my Github repository, but then I finally cleanly solved the polymorphic constraints issue and this solution unlocked several features that I could finally implement.

The large change log includes:

Type-classing the entire Proxy API
Offering both fast and correct base implementations
Fixing type synonyms
Unifying the Pipes and Proxy APIs and dropping Frames
Adding the PFunctor type class (functors over proxies)
Performance improvements
Expanded laws and guarantees
Smaller dependencies

As always, if you want to learn how to use pipes, just consult the tutorial.

Code

I can demonstrate a lot of the new improvements just by taking the take' function from the pipes-2.4 announcement post and showing the difference before and after the new changes.

Here is the previous version:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.Trans.Class (MonadTrans(lift))
import Control.Proxy
import Control.Proxy.Trans.Either as E
import Control.Proxy.Trans.State as S
import Data.Monoid ((<>))
import Data.Text as T

newtype ParseP p a' a b' b m r = ParseP {
    unParseP :: StateP Text (EitherP Text p) a' a b' b m r }
    deriving (Monad, MonadTrans, Channel)

take' :: (Monad m, Monad (p () Text b' b m), Interact p,
          Channel p)
      => Int -> () -> ParseP p () Text b' b m Text
take' n () = ParseP go where
    go = do
        s <- S.get
        if (T.length s < n)
        then do
            s' <- liftP $ liftP $ request ()
            S.put (s <> s')
            go
        else do
            let (h, t) = T.splitAt n s
            S.put t
            return h

Here is the new version:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- Smaller import list
import Control.Proxy
import Control.Proxy.Trans.Either as E
import Control.Proxy.Trans.State as S
import Data.Monoid ((<>))
import Data.Text as T

newtype ParseP p a' a b' b m r = ParseP {
    unParseP :: StateP Text (EitherP Text p) a' a b' b m r }
    deriving (Monad, MonadTrans, Proxy)

instance ProxyTrans ParseP where
    liftP = liftP . liftP

--                 +-- Cleaner constraints
--                 |
--                 v
take' :: (Monad m, Proxy p)
      => Int -> () -> Consumer (ParseP p) Text m Text
--                    ^         ^
--                    |         |
--                    +-- Type synonyms work
--                        with extensions!

take' n () = ParseP go where
    go = do
        s <- S.get
        if (T.length s < n)
        then do
            s' <- request () -- No more liftP!
            S.put (s <> s')
            go
        else do
            let (h, t) = T.splitAt n s
            S.put t
            return h

You can already see several differences:

Type synonyms work with everything, so you can always use them now.
Constraints are MUCH simpler and significantly more polymorphic
Proxy transformers can now request and respond natively
Control.Proxy now imports useful things from other libraries (like lift).

However, this release includes several more great changes as well.

Polymorphic Constraints

The big issue that held back the library in the wake of the 2.4 release was that I could not type class the request and respond operations. The pipes library really only has three fundamental operations:

(>->)
request
respond

Unfortunately, I could not type class request and respond to work with proxy transformers, meaning any standard libraries would be substantially crippled.

The problem boiled down to my inability to write polymorphic type class contexts like this:

instance  -- Not valid Haskell
    (forall a' a b' b m .
        (Monad m) => Monad (p a' a b' b m), Proxy p)
    => Proxy (EitherP e p) where
    ...

The solution (in hindsight) is simple. You define a higher-kinded type class and you copy the lower-kinded class's functions:

class MonadP p where
    (?>=) :: (Monad m)
          => p a' a b' b m r1
          -> (r1 -> p a' a b' b m r2)
          -> p a' a b' b m r2
    return_P :: (Monad m) => r -> p a 'a b' b m r

-- In practice, 'MonadP' is part of 'Proxy'

Then you can write Haskell98 contexts like:

instance (MonadP p, Proxy p) => Proxy (EitherP e p) where ...

... but at the same time you can still keep the original lower-kinded type class for the user API:

instance (MonadP p, Monad m)
    => Monad (EitherP e p a' a b' b m) where ...

This method has one drawback, which is that sometimes you need to embed the code in a newtype if all the type variables are polymorphic. The IdentityP proxy transformer fits this purpose perfectly, meaning that if you don't use any proxy transformers, you just write:

myPipe () = runIdentityP $ pipeCode

The () and runIdentityP are the only syntactic noise now. The () is essential, as I will describe below, and the runIdentityP avoids OverlappingInstances when all the type variables are polymorphic.

With this change I could then lift request and respond over proxy transformers and consolidate the entire Proxy API into the single Proxy type class. this solution is entirely Haskell98 and no longer requires FlexibleContexts. If you use pipes already you will discover that your new type signatures will become gorgeous now.

Type-classed API

pipes-3.0 type classes the entire Proxy API using the Proxy type class, which defines the following three essential functions:

(>->)
request
respond

The entire library implement EVERYTHING using these three functions and the upcoming prompt finalization library also is built entirely on these three functions. I can safely say that you don't need to use any other functions to write fully featured pipes libraries, which dramatically simplifies the API.

I type-classed all the utilities which means that I can now offer two base proxy implementations:

ProxyFast The fast proxy implementation from pipes-2.5
ProxyCorrect The correct proxy implementation from pipes-2.4

So now you can choose which implementation you prefer. Control.Proxy exports ProxyFast implementation by default, but you can easily switch to the correct implementation. Because all pipes code builds on the Proxy type class you are completely free to pick whichever implementation you prefer and they will both work transparently with all standard libraries.

However, these utilities don't just work with both base proxy implementations; they also now work as proxy transformers, too! This means you can now use these utilities seamlessly within any feature set without any lifting.

Laws

I briefly touched on the proxy laws in my previous post: Concurrency = Lists of Kleisli Arrows. However, there were two mistakes with those sets of laws:

The double request law was incorrect
The laws were insufficiently general

Fixing the double request law was an interesting challenge, but it was ultimately very rewarding and led to the correct formulation of the laws in terms of two symmetric proxy composition categories:

Pull-based category: (>->) and idT (the current one)
Push-based category: (>~>) and coidT (the new one)

The pull-based category composes proxies blocked on respond and returns a new proxy blocked on respond. The push-based category composes proxies blocked on request and returns a new proxy blocked on request.

The proxy laws say that these two categories are perfectly symmetric and that your code's behavior never changes under the following functor, which is an isomorphism between the two categories:

(>~>) to (>->)
(>->) to (>~>)
request to respond
respond to request

This means that the choice of composition operator is arbitrary, and I standardize on using the pull-based (>->) operator because the library already uses it and also matches the lazy demand-based expectations of Haskell programmers. Therefore, you never have to use the (>~>) operator, but I include in the library as a theoretical curiosity.

I've already spoken about the practical benefit of centralizing on proxies (duplex channels), but I also standardized on proxies because of their elegance, even more elegant than even pipes. For example:

The Proxy laws have a purely categorical formulation. Pipes do not permit such a formulation.
Proxy Kleisli arrows form morphisms in at least 5 categories, all of which have the same shape. Pipes are morphisms in just two categories (composition and Kleisli categories), and their morphisms do not overlap.
You can assemble most useful proxies purely using composition operators: i.e. (>=>) and (>->), unlike Pipes, which have an impedance mismatch between the two categories.
You cannot implement a push-based category using Pipes.

Type synonyms

Now type synonyms work incredibly well after I parametrized them to take the Proxy instance as a type parameter. This means that you never have to give them up when things get hairy and if you loved your good old-fashioned and simple Pipe, Consumer and Producer types, you can now use them transparently with the entire proxy ecosystem and they mesh perfectly with everything.

There is another reason I really like proxies: The Producer type synonym forbids requests. You can't do that with the unidirectional pipe implementation. I special case the Producer type synonym to close the upstream end to forbid communication upstream, so now you have a stronger guarantee.

I also went back to using () and C for type synonyms instead of Rank2Types to universally quantify unused ends. Rank2Types caused all sorts of problems when you tried to write proxy combinators that accepted pipes as arguments, whereas the simple approach always works and gives clearer types. The only disadvantage is that if you want to insert a pipe with a blocked end within an open one (like a Producer within a Pipe), you must explicitly reopen the end using the unitD and unitU helper functions.

Also, I chose not to include the initial parameter of proxies in the type synonym. The main reason is that you then lose the type synonym any time you define anything other than a composable proxy, which defeats all the other gains I just mentioned. Also, proxies lie along 5 categories, and special-casing the type synonyms for just one of those categories is a bad idea, especially when at least one of those other categories (the Kleisli category) is commonly used as well.

I also empirically experimented with both approaches in my own projects, and I can pretty confidently state that the type synonyms should never include the initial parameter. It does mean that your type signatures will be slightly longer, but it's worth it. Also, it makes it much easier for people less familiar with the pipes library to consume pipes utilities because they don't need to remember what the initial argument for any given type synonym is supposed to be.

API Consolidation

I went to a great deal of trouble to clean up the fragmentation that I began when I released Frames. Frames are gone and I've merged Pipe functionality into Proxys, so the library now only has three operations you ever need to use:

(>->)
request
respond

There is now just one way to compose things, something which a lot of users have requested.

I've deprecated the old Pipe API and included a "Control.Proxy.Pipe" module which helps users transition from Pipes to Proxys. This module not only provides the transition API but also has detailed instructions for how to upgrade your library while still keeping the niceties of the Pipes API.

In addition, I've consolidated all the tutorials into a single coherent tutorial in Control.Proxy.Tutorial with a single logical progression. The tutorial is long, but describes a great deal of the pipe idioms I've collected through my extensive usage of pipes in my own projects. More importantly, the tutorial collects all of the documentation for the library into a single location again so that users don't need to hunt over several tutorial modules or several blog posts of mine to learn about some key concept.

I also spent a lot of effort into guiding people towards a single natural coding style for pipes to make it easier for pipes users to read each other's code. The API exported from Control.Proxy has very little redundancy, and many old redundant API features (like the old await and yield) require you to pay an "import tax" to the tune of one extra import if you want to deviate from the new pipes "style guide". The tutorial also spends a lot of time talking about common idioms to encourage a uniform coding style.

Performance

The pipes library now includes rewrite rules that rewrite unoptimized user code into the equivalent hand-tuned code. Despite type-classing the entire standard library these rewrite rules fire very robustly without any assistance on your part. Just enjoy the extra performance. There are no performance regressions in this release and all the standard library utilities perform just as damn fast as they did in pipes-2.5, despite now being fully polymorphic and written using ordinary do notation.

Like always, "Control.Proxy.Prelude.Base" provides tons of examples of how to write idiomatic and high-performance pipes code, so feel free to consult that if you are learning how to write your own utilities.

Lightweight

I've dropped Frames, which means that I can now drop the index-core dependency. This means that the library only has one dependency: transformers-0.2. pipes is now the lightest dependency for any streaming library.

I will not add any extra dependencies to the main package and I will release all additional features in separate packages that build on top of pipes. I would like to see the core pipes package go as viral as possible. This is quite easy because in addition to having only a single dependency, pipes is:

Quick to compile
Safe Haskell
Light on extensions: Only KindSignatures and Rank2Types (for MFunctor and PFunctor)

Convenience imports

I added some extra imports from other modules to Control.Proxy, like lift and forever. I was pretty conservative in what I added and stuck to things that I considered essential for using pipes idiomatically.

`hoist` and `lift`

I renamed MFunctor's mapT to hoist. Sorry for the inconvenience, but mapT was a terrible name for several reasons and instead went with Edward's naming convention which I felt was more tasteful.

One of the really nice things to come out of pipes is a clean mechanism for mixing arbitrary monad transformer stacks and arbitrary proxy transformer stacks. It turns out that if two transformer stacks have the same base monad, then you can always interleave them in any way using the right combination of hoist and lift.

The exact same trick also applies to proxy transformer stacks. You can always interleave them using the right combination of hoistP (from the new PFunctor class) and liftP, which are just the higher-kinded versions of hoist and lift (and they also obey more laws, too).

The tutorial describes this trick in more detail and I find that this works phenomenally well and is very easy to reason about. For this reason I think MFunctor needs to be somewhere in the standard libraries as soon as possible because it is incredibly useful for this purpose.

Stability

I can state confidently that the pipes API can now handle any problem you throw at it and that the core is feature complete. The tutorial shows how you can combine the incredibly small pipes core into all the functionality you need. The only exception is prompt finalization, which I will describe in a separate upcoming library, but finalization similarly builds entirely on this core and does not add any new concepts or primitives.

There are only a few changes I foresee making to the main library in the future, which are:

Moving MFunctor to a better home
Adding a base proxy implementation with strictness annotations (never leaks, but half the performance of the fast implementation)

Note that this doesn't mean that pipes frequently leaks space. I've only found one case of leaking, which is the following code segment:

runProxy $ enumFromS 1

Everything else I have tried does not leak and I currently use pipes in long-running server applications of my own that also do not leak without any strictness annotations. However, if users do encounter leaks then I will go ahead and add the strict base implementation.

I will also be pretty conservative about expanding the core library because I want to stabilize it now. Almost all new functions will go in separate libraries now.

Conclusion

At this point I am very happy with the state of the main pipes library and I'm currently devoting my next efforts to standard libraries. The next libraries I will release will be (in this order):

Finalization support
bytestring / text support
Parsing extensions

12 comments:

AnonymousDecember 12, 2012 at 4:04 PM
Nice. Is there any particular reason for wanting to have your own uninhabited type (C)? I think it'd be nice to see people consolidate on http://hackage.haskell.org/package/void for empty types; unless there's a real reason for distinguishing different notions of emptiness.
Quentin MoserDecember 13, 2012 at 5:54 AM
Props for the excellent tutorial. I'm finally seeing the beauty behind that inscrutable API.

Will be using from now on :).
rirelanDecember 23, 2012 at 9:21 PM
In your tutorial, withFile needs a second argument of ReadMode for the code to compile.
Ørjan JohansenDecember 28, 2012 at 2:05 PM
Hi, I am just reading the tutorial, and shouldn't
type Pipe p a b = p () a () b
really have been
type Pipe p a b = p C a () b
?

Haskell for all

Wednesday, December 12, 2012

pipes-3.0 - A simpler, unified API

Introduction

Code

Polymorphic Constraints

Type-classed API

Laws

Type synonyms

API Consolidation

Performance

Lightweight

Convenience imports

`hoist` and `lift`

Stability

Conclusion

12 comments:

Followers

Wednesday, December 12, 2012

pipes-3.0 - A simpler, unified API

Introduction

Code

Polymorphic Constraints

Type-classed API

Laws

Type synonyms

API Consolidation

Performance

Lightweight

Convenience imports

hoist and lift

Stability

Conclusion

12 comments:

Followers

`hoist` and `lift`