Haskell for all: pipes-2.4: Proxy transformers, extra categories, utilities, and benchmarks

This release packs a LOT of new features, so I will begin with the most significant feature: proxy transformers. The proxy transformer pattern provides a very simple extension framework that cleanly solves many problems that iteratee library authors face.

Users of the library should read Control.Proxy.Trans.Tutorial, which explains how proxy transformers work. This post also provides a decent introduction to them.

Introduction

Wouldn't it be nice if you could catch and handle errors within a proxy? Now you can! It's as simple as:

import Control.Monad (forever)
import Control.Monad.Trans (lift)
import Control.Proxy
import Control.Proxy.Trans.Either as E
import Safe (readMay)

promptInts :: () -> EitherP String Proxy C () () Int IO r
promptInts () = recover $ forever $ do
    str <- lift getLine
    case readMay str of
        Nothing -> E.throw "Could not parse an integer"
        Just n  -> liftP $ respond n

recover p =
    p `E.catch` (\str -> lift (putStrLn str) >> recover p)

main = runProxy $ runEitherK $ mapP printD <-< promptInts

>>> main
1<Enter>
1
Test<Enter>
Could not parse an integer
Apple<Enter>
Could not parse an integer
5<Enter>
5

The above program condenses many new features of this release into a nice compact example, and I'll use it to showcase each feature.

Proxy transformers

The above program uses the EitherP proxy transformer. To access this feature, you just import the transformer you wish to use:

import Control.Proxy.Trans.Either as E

Control.Proxy imports all the remaining machinery you need.

EitherP extends any proxy-like type with the ability to throw and catch errors locally, as if it lived inside a native EitherT block. It does so in a way that preserves composition (and the category laws!), so you can directly compose the result without unwrapping the EitherP.
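To make "throw and catch errors locally" concrete, here is a minimal sketch of the idea using only base. It is not the library's implementation: EitherM, throwE, catchE, parseInt and parseOr are hypothetical names for this illustration, and the sketch wraps an ordinary monad rather than a proxy, so it leaves out the channel type parameters and composition.

```haskell
import Data.Functor.Identity (runIdentity)

-- Simplified stand-in for EitherP: wraps any base monad 'm' and stops at
-- the first 'Left'. The real EitherP also carries the proxy's channel type
-- parameters, which this sketch leaves out.
newtype EitherM e m r = EitherM { runEitherM :: m (Either e r) }

instance Functor m => Functor (EitherM e m) where
    fmap f (EitherM m) = EitherM (fmap (fmap f) m)

instance Monad m => Applicative (EitherM e m) where
    pure = EitherM . return . Right
    EitherM mf <*> EitherM mx = EitherM $ do
        ef <- mf
        case ef of
            Left e  -> return (Left e)
            Right f -> fmap (fmap f) mx

instance Monad m => Monad (EitherM e m) where
    EitherM m >>= f = EitherM $ do
        er <- m
        case er of
            Left e  -> return (Left e)
            Right r -> runEitherM (f r)

-- Abort the current computation with an error value
throwE :: Monad m => e -> EitherM e m r
throwE = EitherM . return . Left

-- Run a handler if the first computation failed
catchE :: Monad m => EitherM e m r -> (e -> EitherM f m r) -> EitherM f m r
catchE (EitherM m) handler = EitherM $ do
    er <- m
    case er of
        Left e  -> runEitherM (handler e)
        Right r -> return (Right r)

-- Fail with a message unless the whole string parses as an Int
parseInt :: Monad m => String -> EitherM String m Int
parseInt str = case reads str of
    [(n, "")] -> return n
    _         -> throwE ("Could not parse an integer: " ++ str)

-- Recover locally: a failed parse is replaced by a default value
parseOr :: Monad m => Int -> String -> EitherM String m Int
parseOr def str = parseInt str `catchE` \_ -> return def

demo :: [Either String Int]
demo = map (runIdentity . runEitherM)
    [ parseInt "5", parseInt "Test", parseOr 0 "Test" ]
```

Evaluating demo gives a successful parse, an uncaught error, and an error that was caught and replaced by the default, which is the same throw/recover shape that promptInts uses above.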

When you are done composing, just use runEitherK to convert it back to the underlying proxy:

runEitherK
  :: (q -> EitherP e p a' a b' b m r )
  -> (q -> p a' a b' b m (Either e r))

Utilities

This release introduces the "proxy prelude", a set of convenience functions for users of the library. Control.Proxy automatically exports these and they don't clash with the Prelude or any common libraries.

Our old friend printer got a name-change and now goes by printD. This utility function prints all values bound downstream:

printD :: Show a => x -> Proxy x a x a IO r

I provide many more utility functions under the Control.Proxy.Prelude hierarchy, and people who enjoyed my functor design pattern post will also enjoy the abundance of cute trivial examples of the functor pattern in the documentation for Control.Proxy.Prelude.Base.

Proxy transformers are functors

However, this release includes a far more sophisticated set of functors: the proxy transformers themselves. Each proxy transformer implements the ProxyTrans class which defines two functions: mapP and liftP, related by the equation:

mapP = (liftP .)

mapP defines two separate functors.

The first functor behaves like a traditional monad transformer, converting the base Kleisli category to the extended Kleisli category:

mapP return = return

mapP (f >=> g) = mapP f >=> mapP g

You can write these laws using liftP to see that our proxy transformers behave like ordinary monad transformers:

liftP $ return x = return x

do x <- liftP m
   liftP $ f x
= liftP $ do x <- m
             f x

The above program uses this capacity of liftP to lift operations from the Proxy monad to the EitherP String Proxy monad.
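The first pair of functor laws can be spot-checked on a toy transformer. The sketch below is an assumption-laden model, not pipes code: StateM, liftS and mapK are hypothetical stand-ins for a proxy transformer, liftP and mapP, and the base monad is Maybe so that results can be compared with (==). It only covers the Kleisli functor, and checking a handful of inputs is evidence, not a proof.

```haskell
import Control.Monad ((>=>))

-- A toy state transformer standing in for a proxy transformer
newtype StateM s m r = StateM { runStateM :: s -> m (r, s) }

instance Functor m => Functor (StateM s m) where
    fmap f (StateM k) = StateM $ \s -> fmap (\(r, s') -> (f r, s')) (k s)

instance Monad m => Applicative (StateM s m) where
    pure r = StateM $ \s -> return (r, s)
    StateM kf <*> StateM kx = StateM $ \s -> do
        (f, s')  <- kf s
        (x, s'') <- kx s'
        return (f x, s'')

instance Monad m => Monad (StateM s m) where
    StateM k >>= f = StateM $ \s -> do
        (r, s') <- k s
        runStateM (f r) s'

-- Plays the role of liftP: run a base action, leave the state untouched
liftS :: Monad m => m r -> StateM s m r
liftS m = StateM $ \s -> fmap (\r -> (r, s)) m

-- Plays the role of mapP, defined exactly as in the post: mapP = (liftP .)
mapK :: Monad m => (a -> m r) -> (a -> StateM s m r)
mapK = (liftS .)

-- Two sample Kleisli arrows in the base monad
half :: Int -> Maybe Int
half n
    | even n    = Just (n `div` 2)
    | otherwise = Nothing

dec :: Int -> Maybe Int
dec n
    | n > 0     = Just (n - 1)
    | otherwise = Nothing

-- mapK return = return, and mapK (f >=> g) = mapK f >=> mapK g,
-- checked pointwise on a few inputs
lawsHold :: Bool
lawsHold = and
    [ run (mapK return x) == run (return x)
      && run (mapK (half >=> dec) x) == run ((mapK half >=> mapK dec) x)
    | x <- [0 .. 10] ]
  where
    run :: StateM Int Maybe Int -> Maybe (Int, Int)
    run m = runStateM m 0
```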

The second functor lifts the base proxy composition to the extended proxy composition:

mapP idT = idT

mapP (p1 >-> p2) = mapP p1 >-> mapP p2

This latter functor lets you compose simpler proxies with extended proxies. The above program uses mapP in this capacity to promote printD for composition with promptInts:

mapP printD <-< promptInts

This demonstrates a concrete application of the functor design pattern, allowing seamless interoperability between proxies written to varying feature sets. The proxy transformers lift both the monad instance and the composition instance correctly so that simpler proxies play nicely with extended proxies.

Type signatures

The above program demos the new replacement for Void: C. This shortens type signatures and also removes the dependency on the void package.

Also, now the Proxy type is a newtype around the underlying FreeT implementation. This gives nicer type errors when things go wrong.

Proxy transformer stacks

Just like monad transformers, you can stack proxy transformers to automatically combine their effects. By combining the StateP and EitherP proxy transformers, you can implement non-backtracking parsers for free:

{-# LANGUAGE GeneralizedNewtypeDeriving, OverloadedStrings #-}

import Control.Monad.Trans
import Control.Proxy
import Control.Proxy.Trans.Either as E
import Control.Proxy.Trans.State
import Data.Text as T hiding (take)

newtype ParseP p a' a b' b m r =
    ParseP { unParseP ::
        StateP Text (EitherP Text p) a' a b' b m r }
    deriving (Monad, MonadTrans, Channel)

instance ProxyTrans ParseP where
    liftP = ParseP . liftP . liftP

runParseK
  :: (q -> ParseP p a' a b' b m r)
  -> (q -> p a' a b' b m (Either Text (r, Text)))
runParseK = runEitherK . runStateK T.empty . (unParseP .)

The Channel type class defines proxy composition, so we can compose our parsing proxies seamlessly.

Let's write a few parsing primitives:

import Data.Monoid
import Data.Text.IO as T
import Prelude hiding (take)

take n = ParseP go where
    go = do
        s <- get
        if (T.length s < n)
        then do
            s' <- liftP $ liftP $ request ()
            put (s <> s')
            go
        else do
            let (h, t) = T.splitAt n s
            put t
            return h

parseFail str = ParseP $ liftP $ E.throw str

string str = do
    str' <- take (T.length str)
    if (str' == str)
    then return str
    else parseFail $
        "Expected: " <> str <> " -- Found: " <> str'

You wouldn't even know those were proxies if it were not for that single request statement.

Let's write a contrived parser based off of those primitives:

parser () = do
    string "Hello"
    str <- take 5
    lift $ T.putStrLn str

... and supply it with some input:

source () = do
    respond "Hell"
    respond "o, world!"

Now compose!

>>> runProxy $ runParseK $ parser <-< mapP source
, wor
Right ((),"ld!")

Let's see how failed parses turn out:

invalid () = do
    respond "A"
    respond "AAAAAAAA"

>>> runProxy $ runParseK $ parser <-< mapP invalid
Left "Expected: Hello -- Found: AAAAA"

I didn't include parsers in the library because I didn't want to add a bytestring or text dependency to the main pipes package. Instead, I will release the parsing extension as a separate library. This library will provide you with the streaming benefits of attoparsec with the ability to interleave effects.

Pushback

The above parsing example suggests my solution to push-back, which is to give each proxy its own local state using the StateP proxy transformer. You can then use the local state to keep track of unused input, as the above parsing example did.

Like all proxy transformers, this extension requires no special integration with the underlying proxy type and you can layer it anywhere within a proxy transformer stack with no special considerations.
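The push-back idea can be modelled without any proxies at all. In the sketch below (plain base, hypothetical names throughout) the Input record plays the role of the StateP buffer plus whatever upstream has not sent yet, and pulling the next list element stands in for request (). Failing with "Unexpected end of input" when upstream runs dry is an assumption of this model, not the library's behaviour.

```haskell
-- Parser state: unused input (the push-back buffer) plus the chunks that
-- upstream has not delivered yet
data Input = Input { buffer :: String, upstream :: [String] }

newtype Parser r = Parser { runParser :: Input -> Either String (r, Input) }

instance Functor Parser where
    fmap f (Parser p) = Parser $ \i -> fmap (\(r, i') -> (f r, i')) (p i)

instance Applicative Parser where
    pure r = Parser $ \i -> Right (r, i)
    Parser pf <*> Parser px = Parser $ \i -> do
        (f, i')  <- pf i
        (x, i'') <- px i'
        return (f x, i'')

instance Monad Parser where
    Parser p >>= f = Parser $ \i -> do
        (r, i') <- p i
        runParser (f r) i'

parseFail :: String -> Parser r
parseFail = Parser . const . Left

-- Take exactly n characters, pulling chunks until the buffer is long enough
takeN :: Int -> Parser String
takeN n = Parser go
  where
    go (Input buf ups)
        | length buf >= n =
            let (h, t) = splitAt n buf
            in  Right (h, Input t ups)      -- 't' is kept for the next parser
        | otherwise = case ups of
            []     -> Left "Unexpected end of input"
            c : cs -> go (Input (buf ++ c) cs)

string :: String -> Parser String
string str = do
    str' <- takeN (length str)
    if str' == str
        then return str
        else parseFail ("Expected: " ++ str ++ " -- Found: " ++ str')

-- Same shape as the contrived parser above, minus the printing
parser :: Parser String
parser = string "Hello" >> takeN 5

-- Run the parser over a list of chunks; return the result and the leftovers
run :: [String] -> Either String (String, String)
run chunks =
    fmap (\(r, i) -> (r, buffer i)) (runParser parser (Input "" chunks))
```

Here run ["Hell", "o, world!"] yields Right (", wor", "ld!") and run ["A", "AAAAAAAA"] yields Left "Expected: Hello -- Found: AAAAA", mirroring the two sessions shown earlier.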

Extra categories

The library now provides two additional categories for interacting with the Proxy type. These are term-rewriting categories (I believe the technical term is "sesquicategory", but I may be mistaken).

The first category's composition operator replaces all request statements within a Proxy with a suitably typed replacement:

f /</ g -- Replace all occurrences of 'request' in 'f' with 'g'

request is the identity of this category, so we expect that:

-- Replacing 'request' with 'request' changes nothing
f /</ request = f

-- Replacing 'request' with 'f' gives 'f'
request /</ f = f

Also, this substitution is associative:

(f /</ g) /</ h = f /</ (g /</ h)

Similarly, the respond command has its own substitution operator, (\<\), and together they form their own category:

f \<\ g  -- Replaces all 'respond's in 'g' with 'f'

f \<\ respond = f

respond \<\ f = f

(f \<\ g) \<\ h = f \<\ (g \<\ h)

Each category distributes in one direction over the Kleisli category:

-- Distributivity
r \<\ (f <=< g) = (r \<\ f) <=< (r \<\ g)

-- Zero
r \<\ return = return

-- Distributivity
(f <=< g) /</ r = (f /</ r) <=< (g /</ r)

-- Zero
return /</ r = return
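The laws for the request category can be spot-checked on a cut-down model. The sketch below is not the Proxy type: Req is a hypothetical free-monad-style type that can only request, runWith is a hypothetical interpreter that answers every request with a pure function and logs the requests, and the (/</) defined here is my own reconstruction of "replace every request in f with g". Agreement under one interpreter on a few inputs is a sanity check, not a proof.

```haskell
import Control.Monad ((<=<))

-- Stand-in for Proxy that can only 'request': send an a' upstream,
-- wait for an a, and eventually return an r
data Req a' a r
    = Request a' (a -> Req a' a r)
    | Pure r

instance Functor (Req a' a) where
    fmap f (Pure r)       = Pure (f r)
    fmap f (Request a' k) = Request a' (fmap f . k)

instance Applicative (Req a' a) where
    pure = Pure
    pf <*> px = pf >>= \f -> fmap f px

instance Monad (Req a' a) where
    Pure r       >>= f = f r
    Request a' k >>= f = Request a' (\a -> k a >>= f)

request :: a' -> Req a' a a
request a' = Request a' Pure

-- Replace every 'request' in the left argument with the right argument
(/</) :: (c -> Req b' b r) -> (b' -> Req a' a b) -> (c -> Req a' a r)
(f /</ g) c = go (f c)
  where
    go (Pure r)       = Pure r
    go (Request b' k) = g b' >>= go . k

-- Answer every request with a pure function, logging the requests made
runWith :: (a' -> a) -> Req a' a r -> ([a'], r)
runWith answer = go
  where
    go (Pure r)       = ([], r)
    go (Request a' k) =
        let (rest, r) = go (k (answer a')) in (a' : rest, r)

-- Three sample arrows
f, g, h :: Int -> Req Int Int Int
f x = do
    a <- request x
    b <- request (x + 1)
    return (a + b)
g x = request (x * 2)
h x = do
    a <- request (x + 10)
    return (a - 1)

lawsHold :: Bool
lawsHold = all check [0 .. 5]
  where
    same p q = runWith (* 3) p == runWith (* 3) q
    check x = and
        [ same ((f /</ request) x) (f x)                  -- right identity
        , same ((request /</ f) x) (f x)                  -- left identity
        , same (((f /</ g) /</ h) x) ((f /</ (g /</ h)) x) -- associativity
        , same (((f <=< g) /</ h) x)
               (((f /</ h) <=< (g /</ h)) x)              -- distributivity
        ]
```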

Lifting `request` and `respond`

I originally envisioned that proxy transformers would also automatically lift request and respond statements. The laws for this lifting are quite simple:

mapP request = request

mapP respond = respond

In other words, these are the functor laws applied to the identities of the two new categories I just introduced. Unfortunately, Haskell's type class system severely got in my way and I could not solve the issue before the release. I have a tentative plan for how to solve this using Edward's constraints package, but it will take time. Until then, you will have to manually lift request and respond statements from the base Proxy type.

Overall, I was pretty disappointed with Haskell's type class system (more so than usual). This library exercised it considerably, and I even had to drop an additional proxy transformer because it was unimplementable due to the broken constraint system.

Performance

Raw proxies give performance comparable to conduit when doing simple IO:

import Control.Monad
import Control.Monad.Trans
import Control.Proxy hiding (await)
import Data.Conduit
import Data.Conduit.List as L
import Data.Maybe (fromJust) -- You did not see this

n = 100000 :: Int

-- Choose your poison
main = runProxy $ printD <-< enumFromToS 1 n
main = L.enumFromTo 1 n
    $$ forever (await >>= lift . print . fromJust)

Using pipes:

real    0m1.761s
user    0m0.384s
sys     0m0.712s

Using conduit:

real    0m1.528s
user    0m0.224s
sys     0m0.660s

Conduit is 15% faster.

The margin is substantially larger for entirely pure code:

import Control.Monad
import Control.Monad.Trans
import Control.Proxy hiding (await)
import Data.Conduit
import Data.Conduit.List as L

n = 100000 :: Int

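-- Choose your poison again: the pipes version...
main = runProxy $ discard <-< enumFromToS 1 n

-- ... or the conduit version, with a hand-written sink
discard' = do
    a <- await
    case a of
        Nothing -> return ()
        Just _  -> discard'

main = L.enumFromTo 1 n $$ discard'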

Using pipes:

real    0m0.085s
user    0m0.088s
sys     0m0.000s

Using conduit:

real    0m0.011s
user    0m0.004s
sys     0m0.004s

Conduit is almost 8(!) times faster.

Conduit dramatically improves for entirely pure code since it bends the monad transformer laws to skip binds in the base monad. This is one reason that this pipes release type-classes all the Proxy operations. If people request that I copy conduit's approach, I will release a separate library that copies conduit's optional monad bind and have it implement all the same type-classes. Then all the proxy transformers are guaranteed to work transparently with it because they abstract completely over the type classes.

Additionally, I want to note that the pipes library currently has only one optimization pragma in the entire library:

{-# INLINABLE mapK #-} -- An obscure utility function

... whereas conduit uses a considerable number of rewrite rules and INLINABLE pragmas. I don't know how much these contribute to conduit's speed, but I will copy Michael's optimizations in the next few releases and benchmark how much they contribute to performance.

Additionally, I've also benchmarked the overhead of proxy transformers. First, comparing performance for some trivial IO:

import Control.Monad
import Control.Monad.Trans
import Control.Proxy
import Control.Proxy.Trans.Writer
import Data.Monoid

n = 100000 :: Int

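-- Pick one: the baseline on the plain Proxy monad...
main = runProxy $ without <-< enumFromToS 1 n

-- ... or the same pipeline layered under WriterP
main :: IO ((), Sum Int)
main = runProxy $ runWriterK $ with <-< mapP (enumFromToS 1 n)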

with
 :: (Monoid w, Show a)
 => () -> WriterP w Proxy () a () C IO r
with () = forever $ do
    n <- liftP $ request ()
    lift $ print n

without :: (Show a) => () -> Proxy () a () C IO r
without () = forever $ do
    n <- request ()
    lift $ print n

Using the bind in the WriterP w Proxy monad (i.e. with):

real    0m1.739s
user    0m0.396s
sys     0m0.680s

Using the bind in the Proxy monad (i.e. without):

real    0m1.704s
user    0m0.368s
sys     0m0.668s

A difference of 2%(!).

Again, the difference widens if you switch to pure code:

import Control.Monad
import Control.Monad.Trans
import Control.Proxy
import Control.Proxy.Trans.Writer
import Data.Monoid

n = 100000 :: Int

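-- Pick one: the baseline on the plain Proxy monad...
main = runProxy $ without <-< enumFromToS 1 n

-- ... or the same pipeline layered under WriterP
main :: IO ((), Sum Int)
main = runProxy $ runWriterK $ with <-< mapP (enumFromToS 1 n)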

with
 :: (Monoid w, Show a)
 => () -> WriterP w Proxy () a () C IO r
with () = forever $ liftP $ request ()

without :: (Show a) => () -> Proxy () a () C IO r
without () = forever $ request ()

Using WriterP w Proxy's bind:

real    0m0.134s
user    0m0.124s
sys     0m0.008s

Using Proxy's bind:

real    0m0.084s
user    0m0.076s
sys     0m0.004s

Now the gap is about a factor of 1.6 (0.134s versus 0.084s).

So I can summarize these benchmarks by saying that if you are doing even a little bit of IO, the performance differences are pretty small, and as I aggressively optimize the library, they should get even smaller.
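For reference, the ratios quoted in this section can be recomputed from the wall-clock ("real") times listed above. This is only arithmetic on the numbers already shown; the labels and the names ratios and report are mine.

```haskell
-- Wall-clock ("real") times in seconds, copied from the timings above.
-- Each ratio is slower time / faster time.
ratios :: [(String, Double)]
ratios =
    [ ("pipes / conduit, IO",    1.761 / 1.528)  -- roughly 1.15
    , ("pipes / conduit, pure",  0.085 / 0.011)  -- roughly 7.7
    , ("WriterP / Proxy, IO",    1.739 / 1.704)  -- roughly 1.02
    , ("WriterP / Proxy, pure",  0.134 / 0.084)  -- roughly 1.6
    ]

-- Print one line per comparison
report :: IO ()
report = mapM_ (\(label, r) -> putStrLn (label ++ ": " ++ show r)) ratios
```

Keep in mind that these are single runs of tiny programs, so the second decimal place should not be taken too seriously.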

Switch to `free`

Edward was kind enough to migrate my transformers-free functionality into his free package, so now pipes uses free for its free monad transformer dependency.

Resource management

I plan on releasing a Proxy-like type that implements resource management and will replace the Frame type. This type will include functions to promote existing Proxy code to this resource-managed version. Until then, you will have to manually manage resources by opening all file handles before composition, and closing them all afterwards, like so:

import Control.Proxy
import System.IO

main = do
    h <- openFile "test.txt" WriteMode
    runProxy $ hPrintD h <-< enumFromToS 1 10
    hClose h

... or you can use Michael's ResourceT in the base monad, if that is your thing.

You won't get the benefit of conserving handles, but you will still get predictable streaming performance.
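One caveat with opening and closing by hand is that the close step is skipped if the session throws. A hedged sketch of the usual fix is bracket from Control.Exception (withFile in System.IO follows the same acquire/use/release shape for handles). To keep the example free of real files, the "resource" below is just a log in an IORef; withLoggedResource and demo are hypothetical names, and the failing body stands in for a proxy session that throws midway.

```haskell
import Control.Exception (ErrorCall (..), bracket, throwIO, try)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Run 'body' between an acquire step and a release step, logging each one.
-- 'bracket' runs the release step even if 'body' throws.
withLoggedResource :: IORef [String] -> IO r -> IO r
withLoggedResource ref body =
    bracket
        (modifyIORef ref ("open" :))
        (\_ -> modifyIORef ref ("close" :))
        (\_ -> body)

-- A session that fails midway; the log shows the order of events
demo :: IO [String]
demo = do
    ref <- newIORef []
    result <- try (withLoggedResource ref (throwIO (ErrorCall "stream failed")))
                  :: IO (Either ErrorCall ())
    case result of
        Left _   -> modifyIORef ref ("caught" :)
        Right () -> return ()
    fmap reverse (readIORef ref)
```

Running demo returns ["open", "close", "caught"]: the release step runs before the exception reaches the caller.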

Library writers

If you are considering building off the pipes library, I recommend implementing any functionality using the Proxy type, which I guarantee will be promotable to any future extensions. I also plan on personally writing several Proxy-based libraries over the next few months.

While I still preserve the Pipe type, I fully endorse the Proxy type as the type to standardize on as it has many more nice theoretical properties than the Pipe type and also supports greater functionality.

Conclusions

This release is very close to the final state I envisioned for the core pipes library. Most existing features won't disappear, with the exception of Control.Frame, which I will phase out once I release a suitable replacement in a separate library.

Most additional features that I plan on implementing will go into separate libraries that build on top of this one. I only plan on adding functionality to the core library if I discover additional interesting structure for the Proxy type.

5 comments:

Singpolyma, October 7, 2012 at 12:29 PM
Seems strange to add a whole new kind of thing (proxy transformers) for pushback/error handling since pushback can be done with raw proxies and error handling can be done in the base Monad.

Also, you seem to imply that a ResourceT-like approach is not great with Proxy. Could you elaborate on that? It seems to work really well.

Posted on Haskell for all, Saturday, October 6, 2012
