Wednesday, September 5, 2012

pipes-2.3 - Bidirectional pipes

One thing I love about Blogger is the detailed traffic information it provides out of the box. I enjoy seeing which keywords direct people to my blog, and one particular search query has come up a lot recently: bidirectional pipes. Every time I saw somebody searching for bidirectional pipes I would think to myself "You and me both!", since I've been wanting bidirectional pipes for quite some time now to implement features that users have been requesting.

Well, anonymous googlers, today is your day! I'm releasing pipes-2.3, which introduces a new bidirectional pipe type that I call a Proxy, and I've proven the category laws for Proxy composition.

This blog post is not a proper tutorial but rather a meta-discussion of the release, covering context for people who follow iteratee development. If you just want to see cool examples, read the Proxy tutorial over at Control.Proxy.Tutorial.

Also, this post is not technically part of my category theory series that I'm writing, but it does fortuitously tie in to it. The Proxy type provides an elegant framework for composing reusable client/proxy/server primitives into powerful applications, so if you started following my blog because of my discussion about compositionality, then I recommend you read the Proxy tutorial.


Generalizing Pipes


The Proxy terminology is built on the client-server metaphor, and if you already understand Pipes, the following translations will help you map your Pipe intuition onto Proxy terms:
-- Types
Pipe     -> Proxy

Producer -> Server
Consumer -> Client
Pipeline -> Session

-- commands
await    -> request
yield    -> respond
Clients resemble Consumers, except you replace await with request, which provides an argument to upstream:
myClient () = do
    ...
    answer <- request argument
Servers resemble Producers, except you replace yield with respond. Composition requires a parameter to pass in the first request:
--       +-- 1st request
--       |
--       v
myServer argument = do
    ...
... and every subsequent request is bound to the return value of respond:
myServer argument = do
    x <- computeSomething argument
    -- "respond" binds the next argument
    nextArgument <- respond x
    myServer nextArgument

-- or: myServer = computeSomething >=> respond >=> myServer
I provide the foreverK function, which abstracts away this common recursion pattern:
-- i.e. forever 'K'leisli arrow
foreverK f = f >=> foreverK f

myServer = foreverK $ \argument -> do
    result <- computeSomething argument
    respond result

-- or: myServer = foreverK (computeSomething >=> respond)
That looks just like the way you'd write a server's loop: get some argument, compute some result, respond with the result. However, you can do significantly more sophisticated things than just loop.
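As a standalone illustration (independent of the pipes package), here is foreverK in miniature: in a monad that can short-circuit, such as Either, the "server loop" runs until the Kleisli arrow bails out.

```haskell
import Control.Monad ((>=>))

-- foreverK repeats a Kleisli arrow forever, feeding each return
-- value back in as the next argument:
foreverK :: (Monad m) => (a -> m a) -> a -> m b
foreverK f = f >=> foreverK f

-- A toy arrow that short-circuits once the counter reaches 3:
step :: Int -> Either Int Int
step x = if x >= 3 then Left x else Right (x + 1)

main :: IO ()
main = print (foreverK step 0 :: Either Int ())  -- prints: Left 3
```

With a Proxy base monad the loop instead suspends at each respond, which is why the same one-liner serves as a server skeleton.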

A Proxy sits between servers and clients. It can query servers on its upstream interface, and respond to clients on its downstream interface:
      | Upstream  | Downstream |
      | interface | interface  |
Proxy   arg1 ret1    arg2 ret2   m r
As with Pipes, the intermediate Proxy type is the unifying compositional type which generalizes the endpoint types. Server and Client are just type synonyms around the Proxy type with one of its two ends closed.

You can then string together as many components as you please into a single Session and use runSession to convert the result back to the base monad:
runSession $ client <-< proxy_1 <-< ... <-< proxy_n <-< server
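To make that composition concrete without installing pipes, here is a minimal, self-contained sketch of a bidirectional proxy type in the same spirit. The names and representation are simplified for illustration: the real Control.Proxy builds on FreeT and closes the outer ends with an uninhabited type, whereas this sketch closes both ends with ().

```haskell
import Control.Monad ((>=>))

data Proxy a' a b' b m r
    = Request a' (a  -> Proxy a' a b' b m r)  -- send a' upstream, await an a
    | Respond b  (b' -> Proxy a' a b' b m r)  -- send b downstream, await a b'
    | M (m (Proxy a' a b' b m r))             -- an action in the base monad
    | Pure r

instance (Monad m) => Functor (Proxy a' a b' b m) where
    fmap f p = p >>= (Pure . f)

instance (Monad m) => Applicative (Proxy a' a b' b m) where
    pure = Pure
    pf <*> px = pf >>= \f -> fmap f px

instance (Monad m) => Monad (Proxy a' a b' b m) where
    Request a' fa  >>= f = Request a' (\a  -> fa  a  >>= f)
    Respond b  fb' >>= f = Respond b  (\b' -> fb' b' >>= f)
    M m            >>= f = M (fmap (>>= f) m)
    Pure r         >>= f = f r

request :: a' -> Proxy a' a b' b m a
request a' = Request a' Pure

respond :: b -> Proxy a' a b' b m b'
respond b = Respond b Pure

lift' :: (Monad m) => m r -> Proxy a' a b' b m r
lift' m = M (m >>= \r -> return (Pure r))

foreverK :: (Monad m) => (a -> m a) -> a -> m b
foreverK f = f >=> foreverK f

-- Pull-based composition: the downstream component drives
(<-<) :: (Monad m)
      => (c' -> Proxy b' b c' c m r)
      -> (b' -> Proxy a' a b' b m r)
      -> (c' -> Proxy a' a c' c m r)
(p1 <-< p2) c' = p2 +>> p1 c'

(+>>) :: (Monad m)
      => (b' -> Proxy a' a b' b m r)
      -> Proxy b' b c' c m r
      -> Proxy a' a c' c m r
fb' +>> p = case p of
    Request b' fb -> fb' b' >>~ fb
    Respond c fc' -> Respond c (\c' -> fb' +>> fc' c')
    M m           -> M (fmap (fb' +>>) m)
    Pure r        -> Pure r

(>>~) :: (Monad m)
      => Proxy a' a b' b m r
      -> (b -> Proxy b' b c' c m r)
      -> Proxy a' a c' c m r
p >>~ fb = case p of
    Request a' fa -> Request a' (\a -> fa a >>~ fb)
    Respond b fb' -> fb' +>> fb b
    M m           -> M (fmap (>>~ fb) m)
    Pure r        -> Pure r

runSession :: (Monad m) => (() -> Proxy () () () () m r) -> m r
runSession k = go (k ())
  where
    go (Request _ f) = go (f ())
    go (Respond _ f) = go (f ())
    go (M m)         = m >>= go
    go (Pure r)      = return r

-- A server that doubles every request, and a client that queries it twice
doubler :: (Monad m) => Int -> Proxy () () Int Int m r
doubler = foreverK (\n -> respond (2 * n))

client :: () -> Proxy Int Int () () IO Int
client () = do
    a <- request 1          -- receives 2
    b <- request 2          -- receives 4
    lift' (putStrLn ("sum = " ++ show (a + b)))
    return (a + b)

main :: IO ()
main = runSession (client <-< doubler) >>= print
```

The key point of the sketch is that request carries a payload upstream and respond's return value carries the next request back down, which is exactly the round trip that unidirectional Pipes cannot express.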
In the following sections, I will motivate this upgrade to bidirectional pipes by providing some examples of trivial problems that have embarrassed the entire iteratee community (myself included) up until now.


Dumb sources


The simplest example is a file reader. Using any iteratee implementation out there, it is very awkward to specify how many bytes you wish to pull from the upstream source on a request-to-request basis. Most implementations either:
  • Hard-code the number of bytes delivered on each request (i.e. conduit/iterIO)
  • Initialize the source with a given buffer size and then fix it from that point onward (i.e. enumerator/iteratee)
Now, there's nothing wrong with hard-coding the read size, since there is typically an optimal buffer size for disk I/O, but you'd still like to be able to layer another component downstream that parcels the data out into the chunk sizes the user actually wants.

Unfortunately, the gold standard solution (pushback) is unsatisfactory because it:
  • only solves this narrow use case and does not generalize,
  • cannot push back portions of input without imposing some sort of Monoid restriction on the iteratee type itself, and
  • requires that the user maintain certain invariants to prevent breaking the Category laws.
Wouldn't it be nice if we could just directly tell upstream what we wanted instead of playing all these games? Proxys let you do that through the argument you supply to request.


Remote-procedure call


The next example is interfacing with some server. This is a real-world example from my own work. I've written a protein structural search engine and I've set it up as an RPC service: protein structure goes in, a bunch of search results come out. I'd like to write a Pipes interface to this so I can stream the results coming out of the server, but unfortunately I can't. If I tried, I might do something like this:
searchEngine? :: Pipe Structure [Structure] IO r
I can't really accomplish this because Pipes only permit a unidirectional flow of information. I can't both provide the query and receive the results within the same component without resorting to brittle non-compositional tricks like IORefs that defeat the entire point of the iteratee abstraction. However, with Proxys, the solution is incredibly easy:
The input ---------+-------------------+          +- The results
                   |                   |          |
                   v                   v          v
searchEngine :: Structure -> Server Structure [Structure] IO r
searchEngine = foreverK $ \structure -> do
    -- "search" might send a network query to the actual server
    results <- lift $ search structure
    respond results

-- searchEngine = foreverK ((lift . search) >=> respond)
Note that this time the query and response occupy the same interface, rather than two opposing interfaces, so I can now hook up a Client to it that sends in requests and receives responses within the same block of code.

No other iteratee implementation out there can accomplish this. Instead, they restrict us to using blind sources that don't know what downstream actually wants.


Closures


You can also implement imperative-style closures using Proxys. Simply define:
type Closure = Server
... and you are good to go! Consider the Python example from the Wikipedia article on closures:
def counter():
    x = 0
    def increment(y):
        nonlocal x
        x += y
        print(x)
    return increment
We can translate this directly into Proxys:
counter :: Int -> Closure Int () IO r
counter = counter' 0

counter' x y = do
    let x' = x + y
    lift $ print x'
    y' <- respond ()
    counter' x' y'
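For contrast, the same closure written imperatively in Haskell needs an IORef for the mutable captured variable (a standalone illustration, not part of the pipes API), whereas the Proxy version above achieves the same statefulness with no mutation at all:

```haskell
import Data.IORef

-- counterIO returns an "increment" action that closes over a
-- hidden mutable cell, just like the Python version:
counterIO :: IO (Int -> IO Int)
counterIO = do
    ref <- newIORef 0
    return $ \y -> do
        modifyIORef ref (+ y)
        readIORef ref

main :: IO ()
main = do
    increment <- counterIO
    mapM_ (\y -> increment y >>= print) [1, 7, 1, 1]  -- prints 1, 8, 9, 10
```

In the Proxy version the "captured variable" is just the argument threaded through each recursive call, so the closure's state is visible in its type.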
We can then consume the closure in a structured way using composition:
type Opening = Client -- The opposite of a closure?

useClosure :: () -> Opening Int () IO ()
useClosure () = mapM_ request [1, 7, 1, 1]

main = runSession $ useClosure <-< counter
... or we can manually peel off individual elements from the closure using runFreeT:
pop :: (Monad m)
 => a
 -> (a -> Closure a b m r)
 -> m (Maybe (b, a -> Closure a b m r))
pop y k = do
    f <- runFreeT (k y)
    case f of
        Pure            _  -> return   Nothing
        Free (Respond x c) -> return $ Just (x, c)
Proxy internals are all exposed without compromising any safety, so if you choose not to buy in to the whole composition framework you can always manually deconstruct Proxys by hand and go along your way.


Compositional message passing


As far as I can tell, this is the only bidirectional message passing framework that satisfies the category laws. This guarantees several nice properties:
  • The identity laws enforce that composition of components must be completely transparent.
  • The associativity law guarantees that each component can be written completely context-free.
Unlike most message passing frameworks, Proxys promote component decoupling by structuring message passing through typed interfaces and composing those interfaces to mix and match components. This promotes code reuse and makes it easy to encapsulate complete functionality into single black-box objects instead of exposing a bunch of initialization/push/pull/finalization routines that your user must worry about threading together correctly with every other component.

When you have compositional components, combining them together is as easy as snapping a bunch of legos together.


Extensions


Another motivation for this upgrade is finalization. With the ability to send information back upstream, I can now implement bidirectional finalization using ordinary monads and not indexed monads. This will replace Frames, which I will deprecate and either remove or migrate to a separate library.


Pipe compatibility


Pipes are a strict subset of Proxys, so if you have existing Pipe code you can replace Control.Pipe with Control.Proxy, which provides backwards-compatible definitions for all Pipe primitives, and your previous code will still work.

You can understand the relationship between Pipes and Proxys by checking out the type synonym for Pipes provided by Control.Proxy:
type Pipe a b = Proxy () a () b
In other words, a Pipe is a Proxy that never sends any information upstream when it requests input.

There is another advantage of Proxys over Pipes: it is now possible to forbid awaits. The Proxy implementation is highly symmetric and fills many of the elegance holes that Pipes had.

However, if you love Pipes, never fear, because Control.Pipe will never be deprecated, ever. It provides the simplest iteratee API on Hackage, and I plan to continue to upgrade it with all features compatible with the Pipe type.


Kleisli arrow


One of the surprising results of the bidirectional implementation is that it unifies Kleisli composition and Proxy composition, whose arguments overlap. The more you program with Proxys, the more you will discover that most useful Proxy components end up being Kleisli arrows, and a lot of your code simplifies to the following point-free style:
-- Not that I necessarily recommend writing it this way
((p1 <=< p2 <=< p3) <-< (p4 <=< p5)) <=< (p6 <-< p7)
This isn't a coincidence. A very abstract way to understand Proxy composition is that it is just merging lists of Kleisli arrows in a structured way.
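The Kleisli half of that expression is ordinary monadic composition; as a standalone reminder (no pipes required), >=> is associative, which is what lets you drop the parentheses in chains like the one above:

```haskell
import Control.Monad ((>=>))

incr :: Int -> Maybe Int
incr x = Just (x + 1)

halveEven :: Int -> Maybe Int
halveEven x = if even x then Just (x `div` 2) else Nothing

main :: IO ()
main = do
    print (((incr >=> halveEven) >=> incr) 1)  -- Just 2
    print ((incr >=> (halveEven >=> incr)) 1)  -- Just 2, by associativity
```

Proxy composition satisfies the same kind of category laws for <-<, which is why the mixed chains above can be reasoned about piecewise.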


Conclusions


I know in the past I've stated that bidirectional information flow does not form a category, so now I'm publicly eating my own words.

There will be two more releases in the next two months. The first release will provide the first general mechanism for extending Pipes with your own custom extensions and will include error handling and parsing extensions implemented using this approach.

The second release will provide a second way to customize pipes and will include finalization/reinitialization and stack traces implemented using that approach.

15 comments:

  1. Excellent work. Bidirectional will make the package usable in many more domains, and the client-server metaphor will help people understand the explanation (so long as they aren't thinking of multi-client concurrency).

    Regarding your comment about bidirectional with category laws: a paradigm I am developing, Reactive Demand Programming, achieves that (and generalized arrow laws). RDP is not message passing; it involves bidirectional data synchronization ('reactive' paradigm) and I elide events (such as messages) because they encourage a great deal of non-essential state. However, it is easy to model events with short-lived data.

    1. Actually, as I explained in a comment to Paul below, you can interact with multiple clients by nesting the Proxy monad transformer within itself.

      I really like the reactive demand programming idea. I toyed around with using pipes/proxies for reactive programming, but I've never really developed the idea significantly. I'd really like to see what you do with it.

  2. One limitation of this API is the server must always respond with a value of the same type, which loses out on some really obvious use cases that come up when a client conceptually needs to pull from multiple servers. Ed Kmett's (very recently released) machines package has a solution to this problem. What do you think of it, and could you see pipes going in a similar direction in the future?

    1. Oops, I meant to reply directly to this but my response ended up as a separate top-level comment:

      http://www.haskellforall.com/2012/09/pipes-23-bidirectional-pipes.html?showComment=1347246470907#c5026677331628820503

  3. Actually, you can interact with multiple servers (or multiple clients, or multiple proxies). You just nest a Proxy within a Proxy. The outer proxy interacts with one session while the inner proxy interacts with an entirely different session. This isn't just hypothetical as I do this in my own code all the time and it works really well for me. This is also the exact same approach that Oleg and John Millikin advocate for enumerator and I'll explain why I agree with them.

    First off, this approach automatically works for interacting with multiple servers, clients, or proxies, whereas Edward's machines currently only support multiple servers. I assume that he'll write machines to handle the other two cases, but right off the bat you can see that even if he does then his approach leads to API complexity and overhead. Now you have three concepts instead of one, similar to the situation before pipes where people had three APIs for sources/sinks/transducers and three sets of semantics that the users had to keep track of. It would seem strange to liberate ourselves from that distinction only to reinvent it again.

    Ok, but say that I ignore that and let's concern ourselves with just servers for now. I will draw an analogy between the two alternative approaches and simple functions. You can imagine that the proxy approach of layering two proxies to interact with multiple inputs is analogous to curried functions:

    a -> b -> c

    Why curried functions? Because I can partially apply a layered proxy by composing and running just the outermost session, leaving behind a proxy that interacts with just a single session. Edward's machines are analogous to uncurrying functions:

    (a, b) -> c

    ... except that he only really sweetens the special case of two inputs. For more inputs you end up with something analogous to:

    (((a, b), c), d) -> e

    ... in which case you aren't sweetening that much and just trading one type of layering for another one.

    However, while I wouldn't use Edward's approach in my code, I still think it has a lot of potential for other purposes (such as possibly efficiency) and I can see how other people might like it for reasons of personal preference.

    In principle, anybody can write machines that compile to proxies (just like Edward's machines currently compile to pipes). However, I wouldn't write it myself, only because I can't actively maintain and advocate for something I don't strongly believe in. Somebody who believes in machines strongly should be the one to implement that.

    1. Very interesting, thanks for the response. I'm a little hazy on how nesting proxies lets you combine values from multiple servers. Could you give a concrete example -- let's say I have two external sources of integers, and I'd like to merge these two into a single stream by always emitting the smaller of the two values from each source.

      One thing I like about Ed's library is that it lets you write pure (non-monadic) stream transducers that operate on multiple streams. Ed has a cap combinator that lets you partially apply a two-input machine, but that results in a transducer that is no longer pure. It sounds like nesting of proxies would have to do something similar. Are there any difficulties with ensuring resource-safety with this approach?

    2. I've hpasted the code here:

      http://hpaste.org/74530

The code has inline commentary.

Resource safety is not a problem. Although I haven't released it yet, the general gist is that I write an extended composition that lets pipes register monoids associated with code segments, and when the pipeline terminates it collects all the currently registered monoids and includes them in the return value.

      For the special case of the monoid being a finalizer (i.e. the "(Monad m) => m ()" monoid), then you register finalizers and the result contains all the remaining finalizers to run. Then you just extend runPipe to auto-run this last collected finalizer.

      This plays nice with nesting since the finalizer that is returned only belongs to the outermost pipe layer, so you don't accidentally get multiple frees.

    3. Thanks for your reply. The code you posted makes total sense, and I can see how it could be extended to send requests (containing actual values) to multiple servers and merging their results somehow.

      Have you considered making Pipe/Proxy into a MonadPlus, so the Await constructor takes a Pipe/Proxy to run if the awaiting fails? You then add a Stop constructor to PipeF, and the usual behavior is to Stop if an Await fails, but operations like merge can do something more intelligent. You can even make the fallback case a Producer, rather than a Pipe, to prevent it from awaiting on a nontrivial value (you'd probably have to use regular recursion rather than going through FreeT, though I'm not sure FreeT is buying you much anyway).

    4. Yes, this is possible to write as an extension to the Pipe type. I found the only way to correctly intercept upstream termination and preserve the Category is the following semantics:

      *) Wrap ordinary yielded values in a Just. When a pipe terminates, yield a final Nothing (which functions like your "Stop")
      *) If a pipe receives a Nothing, it can still await again, but the next time it awaits it yields a Nothing first (so that downstream still has a chance to handle termination).

      The default "await" unwraps all Justs and just ignores Nothings and reawaits again. The more sophisticated await returns the raw Maybe value so you can handle upstream termination appropriately.

      This solution works with ordinary monads and the FreeT type and doesn't require indexed monads. Also, you can define a functor from the ordinary Pipe type to the extended Pipe type so all classic Pipe code is automatically compatible with this extended version.

      The problem is that it does not mix well with finalization, at least not for the Pipe type, and so the only way I was able to mix the two in the Frame type was to use indexed monads. I still haven't revisited it for the Proxy type to see if I can get it to mesh correctly now, so it's an open question I still have to address. However, finalization is the higher of the two priorities at the moment, after which I'm going to try my hand again at guarding against upstream termination.

    5. I see, you are just using the fact that (a -> b, b) is isomorphic to Maybe a -> b. Though then you need a new type to handle the special rules for composing these pipes. Is there a good reason not to just add a Stop constructor and the extra fallback argument to the Await constructor?

    6. A Stop constructor version is fine so long as it is isomorphic to the Maybe version I just described. However, a Stop constructor with no continuation doesn't work (I think... unless I misunderstand what you are saying).

      The corner case you have to consider is how to write a proper downstream identity pipe. If you use a Stop with no continuation to handle awaiting a terminated pipe, then there is no correct identity for:

      idP <+< return r = return r

      Practically speaking this means that the only pipe that can return is the most downstream one.

      However, the notion of most downstream is context-dependent, since it depends on where a pipe is in the final assembled pipeline, namely whether or not it is in the most downstream position. The category laws guarantee that you can reason about each pipe's behavior independent of its context so usually law violations like these indicate to me that there is something context-dependent about an implementation. A downstream identity law violation indicates that the most downstream position is being given special treatment, as in the above example.

    7. I was actually thinking of Stop as being nullary, but I think I see now why you settled on the solution that you did. This might be a longer discussion, but my feeling is that it isn't necessary for Pipes to have a Return constructor and it somewhat confuses the model. You can have a separate type to give you monad syntax sugar for building up Pipes, but the Return isn't actually needed and any Pipe defined using monad sugar could in principle have been written directly - the monad for Pipe adds no expressive power. Ed calls this monad used for building up transducers a 'Plan', and it exists purely for convenience.

      With that removed, your transducers become _only_ a category rather than a category and a monad at the same time, which I think is cleaner and avoids tricky questions.

      By the way, this has been a great discussion, I'm really enjoying it! :)

    8. Check Control.Proxy.Tutorial for a concrete example. (the mixedClient function) of how you would want the result of composition to still be embeddable within a larger monad.

      The other reason I keep the monad is that I consider it inseparable from the core of why composition works in the first place. As I describe in the post immediately following this one, composition just weaves lists of Kleisli arrows together into a new list of Kleisli arrows and demonstrates a lot of elegant and non-trivial interactions between Kleisli composition and Proxy composition. This leads me to believe that the Kleisli category is not playing a supporting role to composition but rather is significantly intertwined with the very meaning of composition.

I'm going to include several standard library functions in the next release and you will see a LOT of really elegant interactions between Kleisli composition and Proxy composition when you see the source code. Proxies lend themselves to very elegant code, much more so than even Pipes.

    9. I'm not sure I share your viewpoint, but I will keep an open mind and I look forward to seeing the future release.
