Monday, May 6, 2013

pipes-3.3.0: Folds and uniting ListT with Proxy

pipes-3.3.0 simultaneously resolves two long-standing problems in the library:
  • Not all proxy transformers implemented ListT
  • Folds required using the base monad
It turns out that fixing ListT for the remaining hold-outs also solved the fold problem, and this post explains how.


ListT


pipes-2.4 first identified the existence of three extra categories, two of which I call the "request" and "respond" categories. These categories are enormously useful, especially since you can implement ListT with both of them. However, I then discovered that you couldn't implement the (/>/) and (\>\) composition operators for certain proxy transformers, specifically:
  • MaybeP
  • EitherP
  • StateP
  • WriterP
This was really disconcerting, and it seemed really odd that every proxy transformer could lift the identities for those two categories (i.e. request and respond), but not always lift the corresponding composition operators. I settled for a temporary solution, which was to split out (/>/) and (\>\) into a separate ListT type class.
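
As a rough analogy for what "lifting a category" demands, here is a minimal sketch in the ordinary list monad, where return plays the role of the identity and (>=>) plays the role of the composition operator. This is not the pipes API: neighbours, doubled, and agree are hypothetical names made up for illustration, and the laws are only spot-checked on a few sample inputs.

```haskell
import Control.Monad ((>=>))

-- Two hypothetical non-deterministic steps (illustration only)
neighbours :: Int -> [Int]
neighbours n = [n - 1, n + 1]

doubled :: Int -> [Int]
doubled n = [n, n * 2]

-- Compare two Kleisli arrows on a handful of sample inputs
agree :: (Int -> [Int]) -> (Int -> [Int]) -> Bool
agree f g = all (\x -> f x == g x) [-3 .. 3]

-- The three category laws: an identity is only useful together with
-- a composition operator that satisfies these
lawsHold :: Bool
lawsHold = and
    [ agree (return >=> neighbours) neighbours          -- left identity
    , agree (neighbours >=> return) neighbours          -- right identity
    , agree ((neighbours >=> doubled) >=> neighbours)
            (neighbours >=> (doubled >=> neighbours))   -- associativity
    ]

main :: IO ()
main = print lawsHold
```

A proxy transformer that lifts request and respond but not (/>/) and (\>\) is in the position of having return without a lawful (>=>).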

However, several recent events made me suspect that something was amiss and caused me to revisit this solution. I received my first clue while working on the pipes-directory library, where I wanted to model getDirectoryContents using the following type:
getDirectoryContents
    :: (Proxy p)
    => FilePath -> ProduceT (ExceptionP p) SafeIO FilePath
This would let users bind directories non-deterministically in ProduceT so that they could describe effectful directory traversals at a high level. This required pipes-safe so that the directory stream would be properly finalized in the event of exceptions or termination, which is why it uses ExceptionP and SafeIO.
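
To give a feel for what "binding directories non-deterministically" looks like, here is a small sketch that uses the plain list monad as a stand-in for ProduceT and a hard-coded in-memory tree instead of a real file system. Everything in it (fs, contents, twoLevels) is a hypothetical name for illustration; it performs no IO and does no finalization, which is exactly the part pipes-safe is needed for.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical in-memory directory tree (stand-in for the real file system)
fs :: [(FilePath, [FilePath])]
fs =
    [ ("/",  ["/a", "/b"])
    , ("/a", ["/a/x", "/a/y"])
    , ("/b", ["/b/z"])
    ]

-- Pure stand-in for getDirectoryContents: unknown paths have no children
contents :: FilePath -> [FilePath]
contents dir = fromMaybe [] (lookup dir fs)

-- Each bind ranges over every entry of the directory it is given
twoLevels :: FilePath -> [FilePath]
twoLevels root = do
    child      <- contents root
    grandchild <- contents child
    return grandchild

main :: IO ()
main = print (twoLevels "/")  -- prints ["/a/x","/a/y","/b/z"]
```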

However, ExceptionP is just a type synonym for EitherP, and EitherP did not implement the ListT type class, which meant that I could not use the ProduceT monad. So I revisited EitherP and discovered that there was a law-abiding ListT instance for EitherP that I had missed the first time around. Moreover, I could use the exact same trick to implement ListT for MaybeP, too.

This meant that only two proxy transformers remained which did not implement ListT:
  • StateP
  • WriterP
Moreover, WriterP was implemented using StateP under the hood, meaning that if I could solve StateP then I could finally merge the ListT class back into the Proxy class.

Simultaneously, while working on pipes-parse I encountered several buggy corner cases with StateP, all of which gave the wrong behavior. Similarly, WriterP also gave the wrong behavior in a wide variety of cases and this Stack Overflow question gives a great example of how useless WriterP was. This suggested that I had implemented both proxy transformers incorrectly, since both of them gave the wrong behavior in many corner cases and both of them resisted a correct ListT implementation.

This observation led me to discover the correct solution: make StateP and WriterP share their effects globally across the pipeline, instead of locally. This fix solved both problems:
  • Both of them now implement ListT and obey the ListT laws
  • Both of them now give correct behavior in all corner cases
Consequently, I can now merge the ListT class into the Proxy class and reunite request and respond with their respective composition operators. Also, now all proxy transformers lift all four categories correctly.


Folds


The WriterP fix leads to a big improvement in the pipes folding API. Now you can do folds using WriterP within the pipeline and without using the base monad.

For example, if you want to fold the leading run of positive elements from upstream into a list, you can now write:
somePipe = do
    -- The unitU discards values that 'toListD' reforwards
    xs <- execWriterK (takeWhileD (> 0) >-> toListD >-> unitU) ()
    respond xs
    ...
You can now access the result of folds within pipes! You no longer have to wait until the Session is complete to retrieve the folded data.
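
The idea of getting a fold result back as an ordinary value in the middle of a computation can be modeled without pipes at all. The sketch below uses a hand-rolled Writer monad over plain lists; it is not WriterP, and collectPositives and summarise are hypothetical names that only mimic the shape of takeWhileD (> 0) >-> toListD.

```haskell
-- A minimal hand-rolled Writer monad (base only); not the pipes WriterP type
newtype Writer w a = Writer { runWriter :: (a, w) }

instance Functor (Writer w) where
    fmap f (Writer (a, w)) = Writer (f a, w)

instance Monoid w => Applicative (Writer w) where
    pure a = Writer (a, mempty)
    Writer (f, w1) <*> Writer (a, w2) = Writer (f a, w1 <> w2)

instance Monoid w => Monad (Writer w) where
    Writer (a, w1) >>= k =
        let Writer (b, w2) = k a in Writer (b, w1 <> w2)

-- Append a value to the accumulated output
tell :: w -> Writer w ()
tell w = Writer ((), w)

-- Run the computation and keep only the accumulated output
execWriter :: Writer w a -> w
execWriter = snd . runWriter

-- Log each element of the leading positive run as a singleton list
collectPositives :: [Int] -> Writer [Int] ()
collectPositives = mapM_ (\x -> tell [x]) . takeWhile (> 0)

-- The folded list is an ordinary value that later steps can use
summarise :: [Int] -> (Int, [Int])
summarise input =
    let xs = execWriter (collectPositives input)
    in  (length xs, xs)

main :: IO ()
main = print (summarise [3, 1, 4, -1, 5])  -- prints (3,[3,1,4])
```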

Also, since folds don't use the base monad you no longer need to hoist stages that you compose with a fold. For example, if you want to sum the first ten lines of user input, you can just write:
runProxy $ execWriterK $ readLnS >-> takeB_ 10 >-> sumD
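
As a pure model of that one-liner, and assuming the fold accumulates in the Sum monoid from Data.Monoid, the same computation over an in-memory list looks like the sketch below. The name sumFirstTen is hypothetical, and the list stands in for the lines that readLnS would read from the user.

```haskell
import Data.Monoid (Sum(..))

-- Keep the first ten elements, then fold them with the Sum monoid
sumFirstTen :: [Int] -> Int
sumFirstTen = getSum . foldMap Sum . take 10

main :: IO ()
main = print (sumFirstTen [1 .. 20])  -- prints 55
```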

Deprecation


I've also started to deprecate several parts of the API in preparation for an eventual pipes-4.0.0 release. These are the main things I deprecated:
  • The classic pipes API (i.e. await, yield, and (>+>))
  • raise functions (i.e. raise and raiseP)
  • K functions, like hoistK and liftK (exception: I keep the run...K functions)
  • Many bidirectional utilities from the pipes prelude and some upstream utilities
  • idT and coidT are renamed to pull and push
If you disagree with any of these deprecations, please let me know since I'm always open to suggestions to keep them or migrate them to a pipes-extras library.

I renamed idT and coidT because this allows for a nice convention where every category is named after its identity operator. Also, the new names are more suggestive of their behavior: pull (formerly idT) begins by pulling information, while push (formerly coidT) first pushes information.

This rename becomes even more advantageous when you use the io-streams style I discussed in a previous post, but I will save the full explanation of why for later.


Modules


I've also tightened up the module hierarchy, which has gone down from 23 modules to 18, and will go down further to 16 when I remove the deprecated Control.Pipe and Control.Proxy.Pipe modules in pipes-4.0.0. Hopefully this makes the library a bit less intimidating to newcomers and easier to navigate.


Future Work


As always, I'm still working on pipes-parse. The big holdup is that I have been experimenting with more elegant solutions to pushback, mainly because I would like to implement many non-trivial features like nested sub-parsers. If worst comes to worst, I will just drop those advanced features and push the simpler version out the door in the next few weeks.
