Saturday, October 6, 2012

pipes-2.4: Proxy transformers, extra categories, utilities, and benchmarks

This release packs a LOT of new features, so I will begin with the most significant feature: proxy transformers. The proxy transformer pattern provides a very simple extension framework that cleanly solves many problems that iteratee library authors face.

Users of the library should read Control.Proxy.Trans.Tutorial, which explains how proxy transformers work. However, this post also provides a decent introduction to them, too.

Introduction


Wouldn't it be nice if you could catch and handle errors within a proxy? Now you can! It's as simple as:
import Control.Monad (forever)
import Control.Monad.Trans (lift)
import Control.Proxy
import Control.Proxy.Trans.Either as E
import Safe (readMay)

promptInts :: () -> EitherP String Proxy C () () Int IO r
promptInts () = recover $ forever $ do
    str <- lift getLine
    case readMay str of
        Nothing -> E.throw "Could not parse an integer"
        Just n  -> liftP $ respond n

recover p =
    p `E.catch` (\str -> lift (putStrLn str) >> recover p)

main = runProxy $ runEitherK $ mapP printD <-< promptInts
>>> main
1<Enter>
1
Test<Enter>
Could not parse an integer
Apple<Enter>
Could not parse an integer
5<Enter>
5
The above program condenses many new features of this release into a nice compact example and I'll use it to show-case each feature.


Proxy transformers


The above program uses the EitherP proxy transformer. To access this feature, you just import the transformer you wish to use:
import Control.Proxy.Trans.Either as E
Control.Proxy imports all the remaining machinery you need.

EitherP extends any proxy-like type with the ability to throw and catch errors locally, as if it lived inside a native EitherT block. It does so in such a way that preserves composition (and the category laws!), so you can directly compose the result without unwrapping the EitherP.

When you are done composing, just use runEitherK to convert it back to the underlying proxy:
runEitherK
  :: (q -> EitherP e p a' a b' b m r )
  -> (q -> p a' a b' b m (Either e r))

Utilities


This release introduces the "proxy prelude", a set of convenience functions for users of the library. Control.Proxy automatically exports these and they don't clash with the Prelude or any common libraries.

Our old friend printer got a name-change and now goes by printD. This utility function prints all values bound downstream:
printD :: Show a => x -> Proxy x a x a IO r
I provide many more utility functions under the Control.Proxy.Prelude hierarchy, and people who enjoyed my functor design pattern post will also enjoy the abundance of cute trivial examples of the functor pattern in the documentation for Control.Proxy.Prelude.Base.


Proxy transformers are functors


However, this release includes a far more sophisticated set of functors: the proxy transformers themselves. Each proxy transformer implements the ProxyTrans class which defines two functions: mapP and liftP, related by the equation:
mapP = (liftP .)
mapP defines two separate functors.

The first functor behaves like a traditional monad transformer, converting the base Kleisli category to the extended Kleisli category:
mapP return = return

mapP (f >=> g) = mapP f >=> mapP g
You can write these laws using liftP to see that our proxy transformers behave like ordinary monad transformers:
liftP $ return x = return x

do x <- liftP m
   liftP $ f x
= liftP $ do x <- m
             f x
The above program uses this capacity of liftP to lift operations from the Proxy monad to the EitherP String Proxy monad.

The second functor lifts the base proxy composition to the extended proxy composition:
mapP idT = idT

mapP (p1 >-> p2) = mapP p1 >-> mapP p2
This latter functor lets you compose simpler proxies with extended proxies. The above program uses mapP in this capacity to promote printD for composition with promptInts:
mapP printD <-< promptInts
This demonstrates a concrete application of the functor design pattern, allowing seamless interoperability between proxies written to varying feature sets. The proxy transformers lift both the monad instance and the composition instance correctly so that simpler proxies play nicely with extended proxies.


Type signatures


The above program demos the new replacement for Void: C. This will shorten type signatures and also removes the dependency on void.

Also, now the Proxy type is a newtype around the underlying FreeT implementation. This gives nicer type errors when things go wrong.


Proxy Transformer Stacks


Just like monad transformers, you can stack proxy transformers to automatically combine their effects. By combining the StateP and EitherP proxy transformers, you can implement non-backtracking parsers for free:
{-# LANGUAGE GeneralizedNewtypeDeriving, OverloadedStrings #-}

import Control.Monad.Trans
import Control.Proxy
import Control.Proxy.Trans.Either as E
import Control.Proxy.Trans.State
import Data.Text as T hiding (take)

newtype ParseP p a' a b' b m r =
    ParseP { unParseP ::
        StateP Text (EitherP Text p) a' a b' b m r }
    deriving (Monad, MonadTrans, Channel)

instance ProxyTrans ParseP where
    liftP = ParseP . liftP . liftP

runParseK
  :: (q -> ParseP p a' a b' b m r)
  -> (q -> p a' a b' b m (Either Text (r, Text)))
runParseK = runEitherK . runStateK T.empty . (unParseP .)
The Channel type class defines proxy composition, so we can compose our parsing proxies seamlessly.

Let's write a few parsing primitives:
import Data.Monoid
import Data.Text.IO as T
import Prelude hiding (take)

take n = ParseP go where
    go = do
        s <- get
        if (T.length s < n)
        then do
            s' <- liftP $ liftP $ request ()
            put (s <> s')
            go
        else do
            let (h, t) = T.splitAt n s
            put t
            return h

parseFail str = ParseP $ liftP $ E.throw str

string str = do
    str' <- take (T.length str)
    if (str' == str)
    then return str
    else parseFail $
        "Expected: " <> str <> " -- Found: " <> str'
You wouldn't even know those were proxies if it were not for that single request statement.

Let's write a contrived parser based off of those primitives:
parser () = do
    string "Hello"
    str <- take 5
    lift $ T.putStrLn str
... and supply it with some input:
source () = do
    respond "Hell"
    respond "o, world!"
Now compose!
>>> runProxy $ runParserK $ parser <-< mapP source
, wor
Right ((),"ld!")
Let's see how failed parses turn out:
invalid () = do
    respond "A"
    respond "AAAAAAAA"
>>> runProxy $ runParseK $ parser <-< mapP invalid
Left "Expected: Hello -- Found: AAAAA"
I didn't include parsers in the library because I didn't want to add a bytestring or text dependency to the main pipes package. Instead, I will release the parsing extension as a separate library. This library will provide you with the streaming benefits of attoparsec with the ability to interleave effects.


Pushback


The above parsing example suggests my solution to push-back, which is to give each proxy its own local state using the StateP proxy transformer. You can then use the local state to keep track of unused input, as the above parsing example did.

Like all proxy transformers, this extension requires no special integration with the underlying proxy type and you can layer it anywhere within a proxy transformer stack with no special considerations.


Extra categories


The library now provides two additional categories for interacting with the Proxy type. These are term-rewriting categories (I believe the technical term is "sesquicategory", but I may be mistaken).

The first category's composition operator replaces all request statements within a Proxy with a suitably typed replacement:
f /</ g -- Replace all occurrences of 'request' in 'f' with 'g'
request is the identity of this category, so we expect that:
-- Replacing 'request' with 'request' changes nothing
f /</ request = f

-- Replacing 'request' with 'f' gives 'f'
request /</ f = f
Also, this substitution is associative:
(f /</ g) /</ h = f /</ (g /</ h)
Similarly, the respond command has its own substition operator, (\<\), and they form their own category:
f \<\ g  -- Replaces all 'respond's in 'g' with 'f'

f \<\ respond = f

respond \<\ f = f

(f \<\ g) \<\ h = f \<\ (g \<\ h)
Each category distributes in one direction over the Kleisli category:
-- Distributivity
r \<\ (f <=< g) = (r \<\ f) <=< (r \<\ g)

-- Zero
r \<\ return = return

-- Distributivity
(f <=< g) /</ r = (f /</ r) <=< (g /</ r)

-- Zero
return /</ r = return

Lifting request and respond


I originally envisioned that proxy transformers would also automatically lift request and respond statements. The laws for this lifting are quite simple:
mapP request = request

mapP respond = respond
In other words, the functor laws, applied to the identity of the two new categories I just introduced. However, unfortunately Haskell's type class system severely got in my way and I could not solve the issue before the release. I have a tentative plan for how to solve this using Edward's constraint package but it will take time. Until then, you will have to manually lift request and respond statements from the base Proxy type.

Overall, I was pretty disappointed with Haskell's type class system (more so than usual). This library really exercised it considerably and I even had to drop an additional proxy transformer because it was unimplementable due to the broken constraint system.


Performance


Raw proxies give performance comparable to conduit when doing simple IO:
import Control.Monad
import Control.Monad.Trans
import Control.Proxy hiding (await)
import Data.Conduit
import Data.Conduit.List as L
import Data.Maybe (fromJust) -- You did not see this

n = 100000 :: Int

-- Choose your poison
main = runProxy $ printD <-< enumFromToS 1 n
main = L.enumFromTo 1 n
    $$ forever (await >>= lift . print . fromJust)
Using pipes:
real    0m1.761s
user    0m0.384s
sys     0m0.712s
Using conduit:
real    0m1.528s
user    0m0.224s
sys     0m0.660s
Conduit is 15% faster.

The margin is substantially larger for entirely pure code:
import Control.Monad
import Control.Monad.Trans
import Control.Proxy hiding (await)
import Data.Conduit
import Data.Conduit.List as L

n = 100000 :: Int

main = runProxy $ discard <-< enumFromToS 1 n

discard' = do
    a <- await
    case a of
        Nothing -> return ()
        Just _  -> discard'

main = L.enumFromTo 1 n $$ discard'
Using pipes:
real    0m0.085s
user    0m0.088s
sys     0m0.000s
Using conduit:
real    0m0.011s
user    0m0.004s
sys     0m0.004s
Conduit is almost 8(!) times faster.

Conduit dramatically improves for entirely pure code since it bends the monad transformer laws to skip binds in the base monad. This is one reason that this pipes release type-classes all the Proxy operations. If people request that I copy conduit's approach, I will release a separate library that copies conduit's optional monad bind and have it implement all the same type-classes. Then all the proxy transformers are guaranteed to work transparently with it because they abstract completely over the type classes.

Additionally, I want to note that the pipes library currently has only one optimization PRAGMA in the entire library:
{-# INLINABLE mapK #-} -- An obscure utility function
... whereas conduit uses a considerable number of rewrite rules and INLINABLE statements. I don't know how much these contribute to conduit's speed, but I will copy Michael's optimizations in the next few releases and benchmark how much they contribute to performance.

Additionally, I've also benchmarked the overhead of proxy transformers. First, comparing performance for some trivial IO:
import Control.Monad
import Control.Monad.Trans
import Control.Proxy
import Control.Proxy.Trans.Writer
import Data.Monoid

n = 100000 :: Int

main = runProxy $ without <-< enumFromToS 1 n

main :: IO ((), Sum Int)
main = runProxy $ runWriterK $ with <-< mapP (enumFromToS 1 n)

with
 :: (Monoid w, Show a)
 => () -> WriterP w Proxy () a () C IO r
with () = forever $ do
    n <- liftP $ request ()
    lift $ print n

without :: (Show a) => () -> Proxy () a () C IO r
without () = forever $ do
    n <- request ()
    lift $ print n
Using the bind in the WriterP w Proxy monad (i.e. with):
real    0m1.739s
user    0m0.396s
sys     0m0.680s
Using the bind in the Proxy monad (i.e. without):
real    0m1.704s
user    0m0.368s
sys     0m0.668s
A difference of 2%(!).

Again, the difference widens if you switch to pure code:
import Control.Monad
import Control.Monad.Trans
import Control.Proxy
import Control.Proxy.Trans.Writer
import Data.Monoid

n = 100000 :: Int

main = runProxy $ without <-< enumFromToS 1 n

main :: IO ((), Sum Int)
main = runProxy $ runWriterK $ with <-< mapP (enumFromToS 1 n)

with
 :: (Monoid w, Show a)
 => () -> WriterP w Proxy () a () C IO r
with () = forever $ liftP $ request ()

without :: (Show a) => () -> Proxy () a () C IO r
without () = forever $ request ()
Using WriterP w Proxy's bind:
real    0m0.134s
user    0m0.124s
sys     0m0.008s
Using Proxy's bind:
real    0m0.084s
user    0m0.076s
sys     0m0.004s
Now it's about a factor of 2.

So I can summarize these benchmarks by saying that if you are doing even a little bit of IO, the performance differences are pretty small, and as I aggressively optimize the library, they should get even smaller.


Switch to free


Edward was kind enough to migrate my transformers-free functionality into his free package, so now pipes uses free for its free monad transformer dependency.


Resource management


I plan on releasing a Proxy-like type that implements resource management that will replace the Frame type. This type will include functions to promote existing Proxy code to this resource-managed version. Until then, you will have to manually manage resources by opening all file handles before composition, and closing them all afterwards, like so:
import Control.Proxy
import System.IO

main = do
    h <- openFile "test.txt" WriteMode
    runProxy $ hPrintD h <-< enumFromToS 1 10
    hClose h
... or you can use Michael's ResourceT in the base monad, if that is your thing.

You won't get the benefit of conserving handles, but you will still get predictable streaming performance.


Library writers


If you are considering building off the pipes library, I recommend implementing any functionality using the Proxy type, which I guarantee will be promotable to any future extensions, and I plan on personally writing several Proxy-based libraries over the next few months.

While I still preserve the Pipe type, I fully endorse the Proxy type as the type to standardize on as it has many more nice theoretical properties than the Pipe type and also supports greater functionality.


Conclusions


This release is very close to the final state I envisioned for the core pipes library. Most existing features won't disappear, with the exception of Control.Frame, which I will phase out once I release a suitable replacement in a separate library.

Most additional features that I plan on implementing will go into separate libraries that build on top of this one. I only plan on adding functionality to the core library if I discover additional interesting structure for the Proxy type.

5 comments:

  1. Seems strange to add a whole new kind of thing (proxy transformers) for pushback/errot handling since pushback can be done with raw proxies and error handling can be done in the base Monad.

    Also, you seem to imply that a ResourceT-like approach is not great with Proxy. Could you elaborate on that? It seems to work really well.

    ReplyDelete
    Replies
    1. Pushback and error handling don't work in the base monad. If you use the base monad to save unused input, it is shared across the entire chain and you get interference between components. Error handling in the base monad cannot be caught within a proxy since there is no way to access the base monad until you have run the session. Many people, myself included, tried exactly what you suggest, which is what motivated switching the order of the transformers.

      Delete
    2. Pushback wouldn't be in the base Monad, you just put a proxy upstream that accepts requests and pushbacks.

      I guess I can see what you mean about error handling in the base Monad. I would have used an Either as the response type, but I can see why the transformer can make that more clean.

      Delete
    3. The big advantage of Proxy transformers is the ability to transparently promote proxies with different feature sets to interoperate with each other. This matters a LOT to both library writers and extension writers (i.e. me).

      For example, I will shortly release a zlib library for proxies, and I can safely write that library to the smallest feature set that library needs, which is just the ordinary Proxy type. When I write that code, I need not concern myself with what extensions may be released in the future or which feature sets users prefer, because I know that I can always transparently promote my zlib proxies to work with any extended proxy types, both as a term in the extended type's monad (since liftP correctly lifts it to the extended monad) or as a term in a composition chain (since mapP correctly promotes it to the extended composition).

      This was a big reason that I had no released any libraries before now, because there was always the specter that users might demand some extension to the Proxy type which would invalidate all the existing library code that depended on the old type. Now I don't have that problem at all so I can write extensions without invalidating existing library code. More importantly, every library can write their library using the smallest feature set they actually need.

      That transparent interoperability is what distinguishes these proxy transformers from the solution you just suggested. You can't cleanly layer your protocols (both pushback or Either) transparently on top of existing proxies. Proxy library authors would have to rewrite all their existing code to standardize on that particular protocol you just defined if they wanted to work with the rest of the proxy ecosystem. On the other hand, with proxy transformers there is no "buy-in" to any feature set. Every library writer can use whichever feature set they prefer and users can always make different feature sets play nice with each other instead of forcing the proxy ecosystem to standardize on a particular monolithic feature set.

      Also, I forgot to answer your original question about ResourceT. My only qualm with it is that it's inelegant (because of the use of unique keys to identify resources), but inelegant is better than nothing.

      Delete
    4. Thans for the explanations. I agree that transformers make automatic promotion a bit nicer than inserting proxies to add/remove features (which my solution would have required), though at the expense of slightly more complex types :)

      Thanks for all your work!

      Delete