Wednesday, December 12, 2012

pipes-3.0 - A simpler, unified API

Introduction


I'm releasing pipes-3.0, which significantly simplifies the entire library. This release began as the misnamed 2.6 branch of my Github repository, but then I finally cleanly solved the polymorphic constraints issue and this solution unlocked several features that I could finally implement.

The large change log includes:
  • Type-classing the entire Proxy API
  • Offering both fast and correct base implementations
  • Fixing type synonyms
  • Unifying the Pipes and Proxy APIs and dropping Frames
  • Adding the PFunctor type class (functors over proxies)
  • Performance improvements
  • Expanded laws and guarantees
  • Smaller dependencies
As always, if you want to learn how to use pipes, just consult the tutorial.


Code


I can demonstrate a lot of the new improvements just by taking the take' function from the pipes-2.4 announcement post and showing the difference before and after the new changes.

Here is the previous version:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.Trans.Class (MonadTrans(lift))
import Control.Proxy
import Control.Proxy.Trans.Either as E
import Control.Proxy.Trans.State as S
import Data.Monoid ((<>))
import Data.Text as T

newtype ParseP p a' a b' b m r = ParseP {
    unParseP :: StateP Text (EitherP Text p) a' a b' b m r }
    deriving (Monad, MonadTrans, Channel)

take' :: (Monad m, Monad (p () Text b' b m), Interact p,
          Channel p)
      => Int -> () -> ParseP p () Text b' b m Text
take' n () = ParseP go where
    go = do
        s <- S.get
        if (T.length s < n)
        then do
            s' <- liftP $ liftP $ request ()
            S.put (s <> s')
            go
        else do
            let (h, t) = T.splitAt n s
            S.put t
            return h
Here is the new version:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- Smaller import list
import Control.Proxy
import Control.Proxy.Trans.Either as E
import Control.Proxy.Trans.State as S
import Data.Monoid ((<>))
import Data.Text as T

newtype ParseP p a' a b' b m r = ParseP {
    unParseP :: StateP Text (EitherP Text p) a' a b' b m r }
    deriving (Monad, MonadTrans, Proxy)

instance ProxyTrans ParseP where
    liftP = liftP . liftP

--                 +-- Cleaner constraints
--                 |
--                 v
take' :: (Monad m, Proxy p)
      => Int -> () -> Consumer (ParseP p) Text m Text
--                    ^         ^
--                    |         |
--                    +-- Type synonyms work
--                        with extensions!

take' n () = ParseP go where
    go = do
        s <- S.get
        if (T.length s < n)
        then do
            s' <- request () -- No more liftP!
            S.put (s <> s')
            go
        else do
            let (h, t) = T.splitAt n s
            S.put t
            return h
You can already see several differences:
  • Type synonyms work with everything, so you can always use them now.
  • Constraints are MUCH simpler and significantly more polymorphic
  • Proxy transformers can now request and respond natively
  • Control.Proxy now imports useful things from other libraries (like lift).
However, this release includes several more great changes as well.


Polymorphic Constraints


The big issue that held back the library in the wake of the 2.4 release was that I could not type class the request and respond operations. The pipes library really only has three fundamental operations:
  • (>->)
  • request
  • respond
Unfortunately, I could not type class request and respond to work with proxy transformers, meaning any standard libraries would be substantially crippled.

The problem boiled down to my inability to write polymorphic type class contexts like this:
instance  -- Not valid Haskell
    (forall a' a b' b m .
        (Monad m) => Monad (p a' a b' b m), Proxy p)
    => Proxy (EitherP e p) where
    ...
The solution (in hindsight) is simple. You define a higher-kinded type class and you copy the lower-kinded class's functions:
class MonadP p where
    (?>=) :: (Monad m)
          => p a' a b' b m r1
          -> (r1 -> p a' a b' b m r2)
          -> p a' a b' b m r2
    return_P :: (Monad m) => r -> p a 'a b' b m r

-- In practice, 'MonadP' is part of 'Proxy'
Then you can write Haskell98 contexts like:
instance (MonadP p, Proxy p) => Proxy (EitherP e p) where ...
... but at the same time you can still keep the original lower-kinded type class for the user API:
instance (MonadP p, Monad m)
    => Monad (EitherP e p a' a b' b m) where ...
This method has one drawback, which is that sometimes you need to embed the code in a newtype if all the type variables are polymorphic. The IdentityP proxy transformer fits this purpose perfectly, meaning that if you don't use any proxy transformers, you just write:
myPipe () = runIdentityP $ pipeCode
The () and runIdentityP are the only syntactic noise now. The () is essential, as I will describe below, and the runIdentityP avoids OverlappingInstances when all the type variables are polymorphic.

With this change I could then lift request and respond over proxy transformers and consolidate the entire Proxy API into the single Proxy type class. this solution is entirely Haskell98 and no longer requires FlexibleContexts. If you use pipes already you will discover that your new type signatures will become gorgeous now.


Type-classed API


pipes-3.0 type classes the entire Proxy API using the Proxy type class, which defines the following three essential functions:
  • (>->)
  • request
  • respond
The entire library implement EVERYTHING using these three functions and the upcoming prompt finalization library also is built entirely on these three functions. I can safely say that you don't need to use any other functions to write fully featured pipes libraries, which dramatically simplifies the API.

I type-classed all the utilities which means that I can now offer two base proxy implementations:
  • ProxyFast The fast proxy implementation from pipes-2.5
  • ProxyCorrect The correct proxy implementation from pipes-2.4
So now you can choose which implementation you prefer. Control.Proxy exports ProxyFast implementation by default, but you can easily switch to the correct implementation. Because all pipes code builds on the Proxy type class you are completely free to pick whichever implementation you prefer and they will both work transparently with all standard libraries.

However, these utilities don't just work with both base proxy implementations; they also now work as proxy transformers, too! This means you can now use these utilities seamlessly within any feature set without any lifting.


Laws


I briefly touched on the proxy laws in my previous post: Concurrency = Lists of Kleisli Arrows. However, there were two mistakes with those sets of laws:
  • The double request law was incorrect
  • The laws were insufficiently general
Fixing the double request law was an interesting challenge, but it was ultimately very rewarding and led to the correct formulation of the laws in terms of two symmetric proxy composition categories:
  • Pull-based category: (>->) and idT (the current one)
  • Push-based category: (>~>) and coidT (the new one)
The pull-based category composes proxies blocked on respond and returns a new proxy blocked on respond. The push-based category composes proxies blocked on request and returns a new proxy blocked on request.

The proxy laws say that these two categories are perfectly symmetric and that your code's behavior never changes under the following functor, which is an isomorphism between the two categories:
  • (>~>) to (>->)
  • (>->) to (>~>)
  • request to respond
  • respond to request
This means that the choice of composition operator is arbitrary, and I standardize on using the pull-based (>->) operator because the library already uses it and also matches the lazy demand-based expectations of Haskell programmers. Therefore, you never have to use the (>~>) operator, but I include in the library as a theoretical curiosity.

I've already spoken about the practical benefit of centralizing on proxies (duplex channels), but I also standardized on proxies because of their elegance, even more elegant than even pipes. For example:
  • The Proxy laws have a purely categorical formulation. Pipes do not permit such a formulation.
  • Proxy Kleisli arrows form morphisms in at least 5 categories, all of which have the same shape. Pipes are morphisms in just two categories (composition and Kleisli categories), and their morphisms do not overlap.
  • You can assemble most useful proxies purely using composition operators: i.e. (>=>) and (>->), unlike Pipes, which have an impedance mismatch between the two categories.
  • You cannot implement a push-based category using Pipes.

Type synonyms


Now type synonyms work incredibly well after I parametrized them to take the Proxy instance as a type parameter. This means that you never have to give them up when things get hairy and if you loved your good old-fashioned and simple Pipe, Consumer and Producer types, you can now use them transparently with the entire proxy ecosystem and they mesh perfectly with everything.

There is another reason I really like proxies: The Producer type synonym forbids requests. You can't do that with the unidirectional pipe implementation. I special case the Producer type synonym to close the upstream end to forbid communication upstream, so now you have a stronger guarantee.

I also went back to using () and C for type synonyms instead of Rank2Types to universally quantify unused ends. Rank2Types caused all sorts of problems when you tried to write proxy combinators that accepted pipes as arguments, whereas the simple approach always works and gives clearer types. The only disadvantage is that if you want to insert a pipe with a blocked end within an open one (like a Producer within a Pipe), you must explicitly reopen the end using the unitD and unitU helper functions.

Also, I chose not to include the initial parameter of proxies in the type synonym. The main reason is that you then lose the type synonym any time you define anything other than a composable proxy, which defeats all the other gains I just mentioned. Also, proxies lie along 5 categories, and special-casing the type synonyms for just one of those categories is a bad idea, especially when at least one of those other categories (the Kleisli category) is commonly used as well.

I also empirically experimented with both approaches in my own projects, and I can pretty confidently state that the type synonyms should never include the initial parameter. It does mean that your type signatures will be slightly longer, but it's worth it. Also, it makes it much easier for people less familiar with the pipes library to consume pipes utilities because they don't need to remember what the initial argument for any given type synonym is supposed to be.


API Consolidation


I went to a great deal of trouble to clean up the fragmentation that I began when I released Frames. Frames are gone and I've merged Pipe functionality into Proxys, so the library now only has three operations you ever need to use:
  • (>->)
  • request
  • respond
There is now just one way to compose things, something which a lot of users have requested.

I've deprecated the old Pipe API and included a "Control.Proxy.Pipe" module which helps users transition from Pipes to Proxys. This module not only provides the transition API but also has detailed instructions for how to upgrade your library while still keeping the niceties of the Pipes API.

In addition, I've consolidated all the tutorials into a single coherent tutorial in Control.Proxy.Tutorial with a single logical progression. The tutorial is long, but describes a great deal of the pipe idioms I've collected through my extensive usage of pipes in my own projects. More importantly, the tutorial collects all of the documentation for the library into a single location again so that users don't need to hunt over several tutorial modules or several blog posts of mine to learn about some key concept.

I also spent a lot of effort into guiding people towards a single natural coding style for pipes to make it easier for pipes users to read each other's code. The API exported from Control.Proxy has very little redundancy, and many old redundant API features (like the old await and yield) require you to pay an "import tax" to the tune of one extra import if you want to deviate from the new pipes "style guide". The tutorial also spends a lot of time talking about common idioms to encourage a uniform coding style.


Performance


The pipes library now includes rewrite rules that rewrite unoptimized user code into the equivalent hand-tuned code. Despite type-classing the entire standard library these rewrite rules fire very robustly without any assistance on your part. Just enjoy the extra performance. There are no performance regressions in this release and all the standard library utilities perform just as damn fast as they did in pipes-2.5, despite now being fully polymorphic and written using ordinary do notation.

Like always, "Control.Proxy.Prelude.Base" provides tons of examples of how to write idiomatic and high-performance pipes code, so feel free to consult that if you are learning how to write your own utilities.


Lightweight


I've dropped Frames, which means that I can now drop the index-core dependency. This means that the library only has one dependency: transformers-0.2. pipes is now the lightest dependency for any streaming library.

I will not add any extra dependencies to the main package and I will release all additional features in separate packages that build on top of pipes. I would like to see the core pipes package go as viral as possible. This is quite easy because in addition to having only a single dependency, pipes is:
  • Quick to compile
  • Safe Haskell
  • Light on extensions: Only KindSignatures and Rank2Types (for MFunctor and PFunctor)

Convenience imports


I added some extra imports from other modules to Control.Proxy, like lift and forever. I was pretty conservative in what I added and stuck to things that I considered essential for using pipes idiomatically.


hoist and lift


I renamed MFunctor's mapT to hoist. Sorry for the inconvenience, but mapT was a terrible name for several reasons and instead went with Edward's naming convention which I felt was more tasteful.

One of the really nice things to come out of pipes is a clean mechanism for mixing arbitrary monad transformer stacks and arbitrary proxy transformer stacks. It turns out that if two transformer stacks have the same base monad, then you can always interleave them in any way using the right combination of hoist and lift.

The exact same trick also applies to proxy transformer stacks. You can always interleave them using the right combination of hoistP (from the new PFunctor class) and liftP, which are just the higher-kinded versions of hoist and lift (and they also obey more laws, too).

The tutorial describes this trick in more detail and I find that this works phenomenally well and is very easy to reason about. For this reason I think MFunctor needs to be somewhere in the standard libraries as soon as possible because it is incredibly useful for this purpose.


Stability


I can state confidently that the pipes API can now handle any problem you throw at it and that the core is feature complete. The tutorial shows how you can combine the incredibly small pipes core into all the functionality you need. The only exception is prompt finalization, which I will describe in a separate upcoming library, but finalization similarly builds entirely on this core and does not add any new concepts or primitives.

There are only a few changes I foresee making to the main library in the future, which are:
  • Moving MFunctor to a better home
  • Adding a base proxy implementation with strictness annotations (never leaks, but half the performance of the fast implementation)
Note that this doesn't mean that pipes frequently leaks space. I've only found one case of leaking, which is the following code segment:
runProxy $ enumFromS 1
Everything else I have tried does not leak and I currently use pipes in long-running server applications of my own that also do not leak without any strictness annotations. However, if users do encounter leaks then I will go ahead and add the strict base implementation.

I will also be pretty conservative about expanding the core library because I want to stabilize it now. Almost all new functions will go in separate libraries now.


Conclusion


At this point I am very happy with the state of the main pipes library and I'm currently devoting my next efforts to standard libraries. The next libraries I will release will be (in this order):
  • Finalization support
  • bytestring / text support
  • Parsing extensions

12 comments:

  1. Nice. Is there any particular reason for wanting to have your own uninhabited type (C)? I think it'd be nice to see people consolidate on http://hackage.haskell.org/package/void for empty types; unless there's a real reason for distinguishing different notions of emptiness.

    ReplyDelete
    Replies
    1. There are three reasons I hand-roll my own empty type.

      First, the `void` package brings in several dependencies transitively by virtue of its `semigroups` dependency, such as `containers` and `deepseq` and `array`, and I was trying to minimize dependencies.

      Second, you don't gain anything from reusing the `Void` type. The classic reason to use Edward's `void` instead of rolling your own is so that you can reuse his absurd function and his variations on it, but you don't actually need any of those with `pipes`. You can verily cast blocked ends to polymorphic ends without using any tricks. See the implementation of `unitD` and `unitU` in `Control.Proxy.Prelude.Base`, neither of which require any of the fancy tricks from the `void` package.

      Finally, it's a shorter type name. This helps a lot when users get a type error or ask for an inferred type and the type synonym does not stick. It's just one less thing to intimidate users of the library.

      Delete
    2. *very easily

      Not "verily"; I don't want to come across as pompous :)

      Delete
    3. By using void you get to use "Void", which is good because it's a known name for the empty data type. "C" in undoubtedly shorter, but doesn't mean anything (I read it as "constructor" on my head).

      Delete
    4. Yes, but there is still the dependency issue, plus the fact that `void` is not in the Haskell platform yet. However, if it does get in the Haskell platform then I probably would reconsider and switch `C` to a type synonym around `Void` or maybe just get rid of `C` completely.

      Delete
  2. Props for the excellent tutorial. I'm finally seeing the beauty behind that inscrutable API.

    Will be using from now on :).

    ReplyDelete
  3. In your tutorial, withFile needs a second argument of ReadMode for the code to compile.

    ReplyDelete
    Replies
    1. Yeah. I fixed all the tutorial examples in the Github repository and the fixed versions will be included in the next minor release in January.

      Delete
  4. Hi, I am just reading the tutorial, and shouldn't
    type Pipe p a b = p () a () b
    really have been
    type Pipe p a b = p C a () b
    ?

    ReplyDelete
    Replies
    1. If you use `p C a () b`, then you can't compose two pipes. For example, if you had:

      p1 :: () -> p C a () b m r
      p2 :: () -> p C b () c m r

      ... then composition would not type-check because their connected interface would not match:

      p1 >-> p2 -- Type error

      This implies that Producers are NOT special cases of pipes. A producer can never await, but a pipe always can. This is actually a feature of the Proxy model over the original Pipe model. With classic pipes, there was no way to forbid awaits for producers.

      Delete
    2. Oh, thanks!

      Delete