Wednesday, October 31, 2012

pipes-2.5: Faster and slimmer

Introduction


I optimized the entire pipes library very aggressively for version 2.5, and now the library runs faster than conduit on my micro-benchmarks. I'll begin with the purest benchmark, which gives the greatest difference in speed since it only measures the efficiency of each library's implementation, without any IO bottlenecks:
-- Pipes (one complete program)
import Control.Proxy

n = 1000000 :: Int

main = runProxy $ discard <-< enumFromToS 1 n

-- Conduit (a separate program)
import Data.Conduit
import Data.Conduit.List as L

n = 1000000 :: Int

main = L.enumFromTo 1 n $$ L.sinkNull
Note some differences from last time. This time I'm using conduit's built-in optimized discard equivalent: sinkNull. Also, I've multiplied n by 10 to more accurately measure throughput. I compile both implementations with -O2.

pipes now spends about 112 ns per round-trip:
real    0m0.112s
user    0m0.104s
sys     0m0.004s
... while conduit spends about 167 ns per round-trip:
real    0m0.167s
user    0m0.156s
sys     0m0.008s
I achieved this speed increase by reverting to the pipes-1.0 trick of making the base monad optional, at the expense of breaking the monad transformer laws. I spent a considerable amount of effort trying to get the correct version to work, but I was led inexorably to the same conclusion Michael had already reached: the original approach was best, and the gain in performance is worth bending the monad transformer laws.

Note that the above benchmark exaggerates the differences and is not indicative of real-world performance. For typical code you will not observe measurable differences between pipes and conduit when IO is the bottleneck.

There is also one area in which conduit may still give (very slightly) better performance, which is in speeding up user-defined pipes. One goal I did not complete for this release was copying Michael's trick of using rewrite RULES to inline the Monad instance for user-defined pipes. I plan to copy this same trick in a separate release because I want to take the time to ensure that I can get the rewrite RULES to always fire without interfering with other optimizations.


Light-weight


The big focus of this release was to make pipes a very light-weight dependency, both in terms of performance and transitive dependencies. In the rewrite I dropped the free dependency so now the package only has two non-base dependencies:
  • transformers >= 0.2.0.0
  • index-core
... and I plan on dropping index-core along with Frames once I complete my resource management solution, leaving just a transformers dependency, which is about as light-weight as it could possibly get.

I also stole a page from Michael's book by removing the -O2 flag from the pipes.cabal file. This flag no longer has any effect on performance after the rewrite, so you should see quicker compile times, making the pipes dependency even lighter.

There is still one other way I could make the pipes dependency even lighter, which is to remove the MFunctor class. The Control.MFunctor module requires Rank2Types, which might rule out pipes for projects that use non-GHC compilers, so if this is an issue for you, just let me know and I will try to migrate MFunctor to a separate library. Frames use a lot of extensions, but those will be on the way out, leaving behind just FlexibleContexts and KindSignatures, which are very mild extensions.


Resource management


I also wanted to use this update to point out that you can get deterministic resource management with pipes today if you use Michael's ResourceT in the base monad. So if you want to use pipes and all you care about is resource determinism then you can switch over already.

However, that alone will NOT give you prompt finalization, and if you want promptness you will have to wait until I complete my own resource management extension. The extension I have in mind will be released as a proxy transformer that you can layer into any proxy transformer stack, so any proxy code you currently write can be transparently upgraded to work with resource management once I release the extension.

Another thing I want to mention is that while I will release the tools to manage resources promptly and deterministically, I do not plan on using these tools in the proxy standard libraries that I will release. The main reasons for this are:
  • There is no one true solution to finalization and I don't want people to have to buy in to my finalization approach to use the standard libraries I provide.
  • Most people I've talked to who care about finalization usually take the initiative to write their own higher-level abstractions on top of whatever finalization primitives I provide them.
So my plan is that the standard libraries I write will focus purely on the streaming part of the problem and leave initialization/finalization to the end user, who can implement it using my resource management solution or whatever other approach they prefer (such as ResourceT, for example).

If you want to know what I personally use in my own projects at the moment, I just use the following pattern:
withResource $ \h ->
    runProxy $ ... <-< streamFromResource h
This gives "good enough" behavior for my purposes and out of all the finalization alternatives I've tried, it is by far the easiest one to understand and use.
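
As a concrete, pipes-free instance of the same bracket pattern: withFile acquires the resource, the streaming work runs inside the callback, and the handle is guaranteed to be closed afterwards, even if an exception is thrown. (countLinesIn is just an illustrative helper, not part of any library.)

```haskell
import System.IO

-- Count the lines in a file, acquiring and releasing the handle with
-- withFile.  The handle is closed when the callback returns or throws,
-- which is exactly the determinism the pattern above provides.
countLinesIn :: FilePath -> IO Int
countLinesIn path =
    withFile path ReadMode $ \h ->
        let go n = do
                eof <- hIsEOF h
                if eof
                    then return n
                    else do
                        _ <- hGetLine h
                        go $! n + 1
        in go (0 :: Int)
```

The streaming proxy in the pattern above plays the role of the callback body here: everything that touches the handle stays inside the bracket.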

The other reason I'm trying out this agnostic approach to finalization is due to discussion with Gregory Collins about his upcoming io-streams library, where he takes a very similar approach to the one I just described of leaving initialization/finalization to the end user to avoid cross-talk between abstractions and to emphasize handling the streaming aspect correctly.


Goals for the near future


I focused on improving the performance of pure code because I plan to release bytestring/text standard libraries and their corresponding parsing proxy transformers very soon, which demand exceptional pure performance. The goal of the upcoming proxy-based parsing libraries is not to beat attoparsec in speed (which I'm reasonably sure is impossible), but rather to:
  • Interleave parsing with effects
  • Provide a low-memory streaming parser by allowing the user to selectively control backtracking
  • Still be really fast
The first two features are sorely missing from attoparsec, which can't interleave effects and always backtracks, so the input isn't freed from memory until parsing completes. For my own projects I need the second feature the most because I get a lot of requests to parse huge files (e.g. 20 GB) that do not fit in my computer's memory. More generally, I want pipes to be the fallback parser of choice for all problems that attoparsec does not solve.

13 comments:

  1. > I spent a considerable amount of effort trying to get the correct version to work, but I was led inexorably to the same conclusion that Michael already reached, which was that the original approach was best and that the gain in performance is worth bending the monad transformer laws.

    Can you go into more detail about exactly what the difficulty is and what "law bending" had to take place to overcome it?

    Replies
    1. I first described the issue here, back when I was first considering using the free package for pipes:

      https://github.com/ekmett/free/issues/3

      It is not exactly the same, but it is still the same basic idea: The monad transformer laws basically say that you should not be able to distinguish how many calls in a row were made to the base monad. The faster implementation does keep track of how many times you invoke the base monad, which violates the laws.

      When you call runProxy, though, that information vanishes because runProxy fuses all calls to the base monad together. This is called observational equivalence: the laws hold when viewed through the lens of runProxy.

      The only way you will notice a law violation is if you apply some function that counts the number of calls to the base monad.

      If you are worried about code breaking, that will not happen if you don't assume the monad transformer laws are correct. I would have preferred to use a separate function to replace lift and make clear the laws do not hold, but I believe users would not like that.

      Also, even if you DO assume the monad transformer laws hold, it still won't break if you don't use any functions that distinguish how many times you invoke the base monad. However, I would prefer that users just don't assume the laws hold, to be safe.
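
      To make the law violation concrete, here is a stripped-down toy model of the fast approach (the type and function names are made up for illustration; these are not the actual pipes internals). Each call to the base monad becomes an explicit Step constructor, so an observer can count them, which is exactly what the transformer laws forbid; fusing the steps back together, the way runProxy does, erases the difference:

```haskell
import Data.Functor.Identity (Identity(..))

-- Toy model: every base-monad action is stored as a separate Step.
data Steps m a = Pure a | Step (m (Steps m a))

instance Functor m => Functor (Steps m) where
    fmap f (Pure a) = Pure (f a)
    fmap f (Step m) = Step (fmap (fmap f) m)

instance Functor m => Applicative (Steps m) where
    pure = Pure
    Pure f <*> x = fmap f x
    Step m <*> x = Step (fmap (<*> x) m)

instance Functor m => Monad (Steps m) where
    Pure a >>= f = f a
    Step m >>= f = Step (fmap (>>= f) m)

-- lift, written out for the toy type
liftS :: Functor m => m a -> Steps m a
liftS m = Step (fmap Pure m)

-- An observer that counts base-monad invocations.  Its mere existence
-- breaks the law  liftS (m >>= f) == liftS m >>= (liftS . f).
countSteps :: Steps Identity a -> Int
countSteps (Pure _)            = 0
countSteps (Step (Identity s)) = 1 + countSteps s

-- Fusing the steps back into the base monad, like runProxy does,
-- makes the two sides indistinguishable again.
run :: Monad m => Steps m a -> m a
run (Pure a) = return a
run (Step m) = m >>= run
```

      With this model, countSteps distinguishes liftS m >> liftS n (two steps) from liftS (m >> n) (one step), while run maps both to the same base-monad action.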

    2. Is this something you are certain there is no solution to, or is it possible there is a nice fix?

    3. The issue isn't so much that the binds are intrinsically expensive, but rather that they interfere with a LOT of compiler optimizations since the compiler doesn't know about the monad laws or the monad transformer laws. For example, in the correct version I would end up having to return a value in the base monad like:

      let p1' = return x

      ... and consequently rebind the result in the next step of composition, and the compiler doesn't know that binding a return's result is equivalent to just a pure let binding, so a lot of optimizations just disappear.
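
      The specific equation in play is the left-identity monad law, which GHC cannot assume when optimizing code that is polymorphic in the monad. A tiny checker states it directly (checkLeftIdentity is just an illustrative helper, not from pipes):

```haskell
import Data.Functor.Identity (Identity(..))

-- Left identity:  return x >>= f  ==  f x
-- GHC does not know this equation holds for an abstract monad, so a
-- materialized `return x` in the base monad blocks rewrites that a
-- human applying the law by hand would perform.  This helper merely
-- tests the equation at a concrete monad.
checkLeftIdentity :: (Monad m, Eq (m b)) => a -> (a -> m b) -> Bool
checkLeftIdentity x f = (return x >>= f) == f x
```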

      The monad transformer gets in the way in a lot of other respects, too. In the non-monad-transformer version I can very aggressively rewrite composition into a very tight and compact form that generates really excellent core. With the monad transformer version, even if I take advantage of as many laws as possible, I cannot achieve anywhere near as compact and simple an implementation by hand.

      I also tried lots of rewrite rule tricks to try to help the compiler along and try to do this kind of work automatically, but getting them to reliably fire is very difficult and even when they did work it still didn't help that much.

      The best solution is to offer both implementations and let the user choose whether they want monad transformer laws or speed. However, I can't do that until I can type-class the utilities and that requires some form of polymorphic constraints. I did a lot of work with Edward's constraints library and other tricks to try to type-class my utilities, but that was not working for various reasons that I am still trying to figure out and I felt I needed to move on to fleshing out the standard libraries for now.

    4. Thanks for the in-depth explanation!

  2. I'm also curious about the specifics of the monad transformer law bending. A link to read up on it would be great.

    It sounds like pipes and conduit are moving closer together in many ways. I'm curious what ways they are still different - the resource finalization being abstracted away is obviously one, and a good one - but I'm curious about where else they still differ.

    Replies
      The main advantages of pipes (I really should rename it to proxies at this point) are:

      * Proxies let you communicate bidirectionally
      * The API is small: only one way to compose them
      * Careful attention to laws to avoid bugs from corner cases.
      * An extension framework that doesn't change the user-facing API.

      That last one is actually quite significant and probably the most underappreciated feature of the library. For example, using the EitherP extension you can throw and catch errors locally within a pipe without disturbing the rest of the pipeline (see the `pipes-2.4` release post for an example), something that no other iteratee implementations can do.

      The advantage of this approach will be more apparent when I release the parsing extension that lets you interleave arbitrary effects with parsing, something that other iteratee implementations cannot do. What they do is simply convert an attoparsec parser into a consumer rather than let you mix parsing commands with other effects. However, I will probably also provide that feature, just for completeness.

      All of these extensions don't change the composition operation or the request/respond commands. For example, you'll notice that conduit provides the $$+/$$++/$$- operators for dealing with leftovers. With proxies, the `StateP` extension handles all leftover state for you correctly, guaranteeing that it is never dropped, without any additional operators (you still use <-< for composition). I will also release a push-back-specific proxy transformer that is just a glorified wrapper around StateP [a], which makes it easier for users to understand what is going on and how pushback works with proxies.

      A lot of features that conduit provides can be replicated in proxies, but I haven't had the time to explain how yet, mainly because I'm focusing on getting the basic standard libraries out. Once the standard libraries are out I will post a "proxy cookbook" that explains the one true way to do everything you are used to doing with conduit, like folding or pushback or parsing. Until then, I mainly field questions from users and explain to them how to do things, so if you have any specific questions for how to do a particular task using proxies just let me know.

      There is one feature that I also plan to release for pipes-3.0 that will vastly simplify the API, which is to improve the way I structure all the type synonyms. Once I do this, you will be able to use the Pipe type synonyms from Control.Proxy.Pipe transparently with proxy transformers, so that you can still get the nicer unidirectional API to play nice with everything else. However, changing the type synonyms to fix this is a sufficiently breaking change that I'm delaying it until I bump the major version number.

    2. Oh, I forgot to mention that I answered the question about bending the monad transformer laws in response to the comment before you, so just read that. Long story short, this won't affect you if you don't mind that the internal implementation keeps track of how many times you invoked the base monad.

    3. Cool, thanks. Good stuff, looking forward to more.

  3. I agree that benchmarks which perform no IO are useful for comparing the overheads of libraries, but I often wonder how well they translate to real-world (i.e. IO-performing) code. In particular, I suspect GHC's optimizer does a much better job of unrolling/inlining/etc. pure code, so I wouldn't be surprised if there are some major differences in the generated core of IO-performing code and non-IO code. I've seen this in the past, but haven't checked recently.

    Partly because of this, I think it's important to focus on the performance of IO-based code. This is especially true as one of the major benefits of these libraries, deterministic interleaving of effects, isn't relevant to non-IO code, so a user can often write traditional, lazy functions in those cases (or more likely, already has lazy functions that can be used directly).

    I'm also not entirely sure what you mean when you state that other iteratee implementations can't implement arbitrary effects interleaved with parsing. In iteratee at least, an iteratee *is* a parser, and can be interleaved freely with other arbitrary effects. This has been true since the beginning. I'm probably misunderstanding your argument here.

    Replies
    1. Yeah, the parsing thing was a complete oversight. I've been so focused on comparisons to conduit lately that I forget about the rest of the field. I'll fix it after I post this comment.

      You are mostly right about IO, which is a sufficiently large bottleneck that the pure differences don't matter. In fact, that was my original justification for switching to the transformers version, because I thought the differences were negligible for IO code. However, there were more reasons I've been concentrating on improving pure performance lately.

      The first important reason is that I would like this library to replace lazy bytestrings and lazy text, even the purely-generated kind. For example, sometimes you want to purely generate a bytestring but you don't want to bring the entire bytestring into memory immediately, so you use a lazy bytestring. You could in principle use the existing lazy bytestring type for this (it is safe, since no lazy IO is involved), but then you would have to use a separate data type for bytestrings generated by IO and bytestrings generated by pure computations. I would like pipes to be used everywhere one needs a generator of any sort, pure or not.

      The second reason is that I want people to feel as comfortable about performance as possible, so comfortable that they use proxies pervasively. I consider the underlying abstraction of proxies to be a very fundamental building block that should play a central role in almost all programs. This is why I don't contaminate the core implementation with cross-cutting, domain-specific concerns like state/error-handling/finalization/parsing, and instead decompose those features into extensions: I see the underlying abstraction as much more general in scope than what people envisioned for it, and I think its use should be completely ubiquitous in modern Haskell programming, even for completely pure computations.

      I encounter these pure applications of the pipe abstraction all the time, but just haven't blogged about them because all of my efforts right now are on making the library as feature complete as possible before I begin promoting its use really heavily.

  4. Will you release an http client that uses pipes underneath?

    Replies
    1. Yes, but probably not before the end of the year. The highest priority libraries for me at the moment are bytestring/text support and parsing.

      Jeremy Shaw is working on a pipes-http server, but he is waiting on the parsing library which would make his life a lot easier. The parsing extension is not rocket science, and I already demonstrated how it works in principle in the pipes-2.4 announcement post. It's just a matter of writing it up and polishing it.

      Also, I just released pipes-3.0, which will be the base library for all these expansion libraries, so please check that out. It makes everything a lot cleaner.
