`pipes-parse` is finally out! `pipes` users know that `pipes` has lagged behind `conduit` and `io-streams` in the parsing arena, and this library provides the utilities necessary to close the gap. You can find the `pipes-parse` library here, and I recommend reading the tutorial. This post will mainly discuss the development of `pipes-parse` and compare it to parsing solutions from other streaming libraries.

#### End of Input

`pipes-parse` copies both `io-streams` and `conduit` for the end-of-input protocol: wrap values in `Just` and end with a stream of `Nothing`s. There are two ways you can modify an input stream to obey this protocol.
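As a concrete illustration of what a consumer of this protocol looks like, here is a minimal sketch in pipes-3.x style. The name `printAll` and its body are my own illustration, not taken from the library:

```haskell
-- Sketch (pipes-3.x style, names illustrative): consume wrapped
-- values until the stream signals end of input with 'Nothing'.
printAll :: (Proxy p) => () -> Consumer p (Maybe String) IO ()
printAll () = runIdentityP loop
  where
    loop = do
        ma <- request ()
        case ma of
            Nothing  -> return ()          -- end of input reached
            Just str -> do
                lift (putStrLn str)
                loop
```

Either of the two approaches below can then feed this consumer, because both present the input as a stream of `Maybe` values.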

The first approach is to use the `wrap` function, which enforces this protocol:

```haskell
wrap :: (Monad m, Proxy p) => p a' a b' b m r -> p a' a b' (Maybe b) m s
```

Then you can just write:

```haskell
wrap . producer >-> consumer
```

`wrap` proves its termination safety by having a polymorphic return value (because it ends with a never-ending stream of `Nothing`s).

The second approach is to rewrite your producer as an input stream in the style of `io-streams` (see this post for details):

```haskell
source' :: (Monad m, Proxy p) => () -> Session p m (Maybe a)
```

... and use `request` composition to connect the producer:

```haskell
source' \>\ consumer
```

This approach proves its termination safety by virtue of using `request` composition. The composition operator specializes to:

```haskell
(\>\) :: (Monad m, Proxy p)
      => (() -> Session  p           m (Maybe a))
      -> (() -> Consumer p (Maybe a) m b)
      -> (() -> Session  p           m b)
```

The composite pipe's return value only derives from the downstream pipe (i.e. `consumer` in this case). This is because `request` composition is automatically safe against termination from the upstream pipe. In the above example, `source'` just replaces every `request` within `consumer`, and if `source'` terminates, all that means is that the `request` completes.

What's nice is that both approaches are 100% compatible with each other. You, the pipe writer, do not need to anticipate which way users will supply input. You just write a pipe that consumes values of type `Maybe a`, and both of the above approaches will work with your pipe. Also, both of these approaches guarantee that you can return values directly from the downstream pipe without guarding the return value with a `Maybe`.

#### Pushback and Leftovers

`pipes` implements pushback using the `StateP` proxy transformer:

```haskell
-- Like @request ()@, except try to use the leftovers buffer first
draw :: (Monad m, Proxy p) => StateP [a] p () (Maybe a) y' y m (Maybe a)

-- Push an element back onto the leftovers buffer
unDraw :: (Monad m, Proxy p) => a -> StateP [a] p x' x y' y m ()
```

This is a great example of how the proxy transformer system makes it easy to extend `pipes` with new features without baking them into the core implementation. I can use the (newly-fixed) `StateP` proxy transformer to add a leftovers buffer that `draw` and `unDraw` both use.
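To see how these two primitives compose, here is a sketch of a `peek` built on top of them: it draws the next element and immediately pushes it back, so the stream is left unchanged. Treat this as an illustration of the idiom rather than the library's actual definition:

```haskell
-- Sketch: inspect the next element without consuming it, by drawing
-- it and immediately pushing it back onto the leftovers buffer.
peek :: (Monad m, Proxy p) => StateP [a] p () (Maybe a) y' y m (Maybe a)
peek = do
    ma <- draw
    case ma of
        Just a  -> unDraw a   -- restore the element we just drew
        Nothing -> return ()  -- end of input: nothing to push back
    return ma
```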

Pushback is where `pipes-parse` **significantly** improves on the competition. To motivate the `pipes-parse` solution, consider the type for `conduit`'s most general composition operator:

```haskell
(>+>) :: Monad m
      => Pipe l    a b r0 m r1
      -> Pipe Void b c r1 m r2
      -> Pipe l    a c r0 m r2
--            ^
--            |
--            +-- Leftovers
```

The downstream conduit cannot provide leftovers because they will be lost after composition. With `pipes-parse` you can save leftovers from both composed pipes very easily. To see how, imagine we have the following two pipe types:

```haskell
p1 :: (Monad m, Proxy p) => () -> Pipe (StateP [a] p) (Maybe a) (Maybe b) m r
p2 :: (Monad m, Proxy p) => () -> Pipe (StateP [b] p) (Maybe b) (Maybe c) m r
--                                             ^
--                                             |
--                                             +-- Leftovers
```

Each of these pipes stores a leftovers buffer equal to its input type, but we can't yet compose these pipes because their leftovers buffers don't match. However,

`pipes-parse` provides lens support in the form of the `zoom` function so that you can easily unify two leftovers buffers in order to compose them:

```haskell
zoom _fst . p1
    :: (Monad m, Proxy p) => () -> Pipe (StateP ([a], [b]) p) (Maybe a) (Maybe b) m r

zoom _snd . p2
    :: (Monad m, Proxy p) => () -> Pipe (StateP ([a], [b]) p) (Maybe b) (Maybe c) m r

zoom _fst . p1 >-> zoom _snd . p2
    :: (Monad m, Proxy p) => () -> Pipe (StateP ([a], [b]) p) (Maybe a) (Maybe c) m r
```

But you can do more than that! You can still access the leftovers buffers afterwards, too, again using `zoom`:

```haskell
example = do
    (zoom _fst . p1 >-> zoom _snd . p2) ()

    -- Draw, reusing the leftovers from @p1@
    ma <- zoom _fst draw

    -- Retrieve the leftovers from @p2@
    mb <- zoom _snd get
    ...
```

This kind of multiple-buffer management isn't possible using `conduit`.

`zoom` is a perfect example of the functor design pattern. We lift two existing proxies to agree on a common global state for compatibility purposes. Therefore, we expect that there should be functor laws at play:

```haskell
zoom id = id

zoom (f . g) = zoom f . zoom g
```

`pipes` also improves upon `io-streams` pushback. With `io-streams` all the pushback is done using `IORef`s, meaning that:

- It isn't pure
- You can't easily control which streams share leftovers and which ones do not
- None of the state is reflected in the types

With `pipes-parse` you get pure and precise control over leftovers. Moreover, you do not need to instrument streams to correctly forward values that you push back upstream, because `StateP` abstracts over that for you.

#### Nesting and delimiting parsers

Like other streaming libraries, `pipes-parse` makes it very easy to run a parser on a subset of the stream. This was probably the #1 requested feature, followed shortly by...
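One way to delimit a sub-parser is to insert a pipe that forwards only a bounded prefix of the wrapped elements and then yields `Nothing` forever, so the downstream parser sees an early end of input. The name `passUpTo` echoes the library's vocabulary, but treat this exact definition and signature as my own sketch:

```haskell
-- Hypothetical sketch: forward at most n wrapped elements, then
-- signal end of input to the downstream parser by yielding 'Nothing'
-- forever (pipes-3.x style).
passUpTo :: (Monad m, Proxy p) => Int -> () -> Pipe p (Maybe a) (Maybe a) m r
passUpTo n () = runIdentityP $ do
    replicateM_ n (request () >>= respond)  -- pass through n elements
    forever (respond Nothing)               -- then fake end of input
```

A composition like `wrap . producer >-> passUpTo 3 >-> subParser` would then let `subParser` consume at most three elements before it observes end of input.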

#### Resumable parsing

`pipes-parse` uses `StateP`, so if you want to interrupt parsing you can just use `runStateK` to return the current state of the leftovers for use in a later computation. Simple!
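A hedged sketch of the idea, assuming pipes-3.x style where `runStateK` takes an initial buffer and returns the session result paired with the final buffer; `producer`, `producer'`, `parser`, and `parser'` are hypothetical:

```haskell
-- Sketch: interrupt a parse, save the leftovers, and resume later.
main = do
    -- Start with an empty leftovers buffer; get back the result and
    -- whatever leftovers the parser pushed back.
    (r, leftovers) <- runProxy $ runStateK [] $
        wrap . producer >-> parser

    -- ... later, seed a new session with the saved leftovers:
    (r', _) <- runProxy $ runStateK leftovers $
        wrap . producer' >-> parser'

    print (r, r')
```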

#### Perfect streaming

One of the more advanced features to come out of the last wave of development was what I like to call "perfect streaming". This has a very specific meaning: grouping the input and interacting with each group as a stream without bringing more than one chunk into memory.

For example, consider the following conduit:

```haskell
lines :: Monad m => Conduit ByteString m ByteString
```

This will load each line into memory, which means that if your file is one long line then you will load the entire file into memory, defeating the purpose of streaming!

`io-streams` has the same problem but, unlike `conduit`, `io-streams` can easily fix its `lines` utility to stream perfectly, and I plan to show Greg how to do this so that `io-streams` users can benefit from the same trick.

`pipes-parse` does not teach how to use this trick, but it does lay the groundwork for it, and the upcoming `pipes-bytestring` library will provide examples of this idiom. If you want to see a concrete example of this trick in action, check out Oliver Charles's upcoming `pipes-tar` library on Github, where he streams individual files from a TAR archive without ever loading more than one chunk into memory. His very interesting use case was the inspiration for this trick, and I also preview the idiom in this Stack Overflow answer.

More generally, perfect streaming uses the `respond` category's composition operator, which has the following general type:

```haskell
(/>/) :: (Monad m, Proxy p)
      => (a -> p x' x b' b m a')
      -> (b -> p x' x c' c m b')
      -> (a -> p x' x c' c m a')
```

When you use `respond` composition, both pipes share the same upstream interface, meaning that you can group the input into subsections but still allow each subsection to access the original upstream interface. With appropriate information hiding you can set up pipes which behave like lenses onto specific subsections of the stream and allow the user to stream from each subsection independently.

#### Compatibility

`pipes-parse` takes great care to ensure that non-parsing pipes are completely interoperable with parsing pipes, thanks to the following compatibility functions:

```haskell
fmapPull :: (Monad m, Proxy p)
         => (x -> p x        a  x        b  m r)
         -> (x -> p x (Maybe a) x (Maybe b) m r)

returnPull :: (Monad m, Proxy p)
           => x -> p x a x (Maybe a) m r

bindPull :: (Monad m, Proxy p)
         => (x -> p x        a  x (Maybe b) m r)
         -> (x -> p x (Maybe a) x (Maybe b) m r)
```

These three functions define functors and monads in the category where the objects are the downstream components of each proxy interface and the morphisms are pull-based pipes.
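For instance, an ordinary pipe written without end-of-input in mind can be lifted over the `Maybe`-based protocol with `fmapPull`. A sketch, using `mapD` from the `pipes` prelude (the name `upperD` is my own):

```haskell
import Data.Char (toUpper)

-- Sketch: 'fmapPull' applies the inner pipe to the 'Just' values and
-- forwards every 'Nothing' along unchanged, so the lifted pipe still
-- propagates end of input correctly.
upperD :: (Monad m, Proxy p) => x -> p x (Maybe Char) x (Maybe Char) m r
upperD = fmapPull (mapD toUpper)
```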

As you might guess, `fmapPull` satisfies the functor laws:

```haskell
fmapPull f >-> fmapPull g = fmapPull (f >-> g)

fmapPull pull = pull
```

Similarly, `returnPull` and `bindPull` satisfy the monad laws:

```haskell
-- Using: f >>> g = f >-> bindPull g

returnPull >>> f = f

f >>> returnPull = f

(f >>> g) >>> h = f >>> (g >>> h)
```

... which are equivalent to:

```haskell
returnPull >-> bindPull f = f

bindPull returnPull = pull

bindPull (f >-> bindPull g) = bindPull f >-> bindPull g
```

... and we can derive the functor from the monad:

```haskell
fmapPull f = bindPull (f >-> returnPull)

-- i.e. fmap f = (>>= return . f)
```

These functions could be generalized over `Maybe` by parametrizing them with a type class like:

```haskell
class FunctorPull f where
    fmapPull
        :: (Monad m, Proxy p)
        => (x -> p x    a  x    b  m r)
        -> (x -> p x (f a) x (f b) m r)
```

... and there is a sensible instance for `Either`, too (in fact, that's how `rightD` from the `pipes` prelude works). However, I decided to keep them monomorphic for now for simplicity.

#### Conclusion

`pipes-parse`, like most `pipes` libraries, keeps the best spirit of Haskell programming by:

- composing features from smaller, simpler, and correct building blocks,
- using higher-order functions to lift existing functions for compatibility, and
- isolating features from each other to statically prevent accidental complexity.

`pipes-parse` is the last of the three core libraries, the other two being `pipes-safe` and `pipes-concurrency`. These libraries define the central idioms for the `pipes` ecosystem, and they were all designed to be instructive and convention-setting in areas where there isn't a perfectly elegant solution and some pragmatic trade-offs had to be made.

The completion of these libraries marks the point where I feel the core `pipes` API has proven itself to be sufficiently versatile and future-proof. The proxy transformer system makes the central API unusually stable because I don't need to bake in any new features that I want to add.

This means I will be upgrading `pipes` to version `4.0` soon to mark the transition to a stabler API in preparation for eventual inclusion in the Haskell Platform. Also, most development work will now shift to derived libraries.

That does not mean that the derived libraries are complete yet. For example, I am currently writing up `pipes-safe-2.0.0`, which will feature improved promptness guarantees and eliminate the need for unsafe finalization primitives. Similarly, I am about to release `pipes-concurrency-1.2.0` at the end of this week, which will add broadcasts and continuous behaviors. More generally, I will only consider the derived core libraries to be mature when more code is built on top of them on the scale of what `conduit` has right now.

The next library on the development docket is `pipes-bytestring`. Now that `pipes-parse` is complete, I feel much more comfortable about the stability of the `pipes-bytestring` API. Also, `pipes` now has an official mailing list where you can ask questions, follow `pipes` development, or offer feedback and contribute to upcoming `pipes` libraries.
