Monday, January 14, 2013

pipes-safe-1.0 - Resource management and exception handling for pipes

I promised in pipes-3.0 that I would release a resource management library for pipes and now I'm delivering on that promise. pipes-safe extends pipes with resource management and exception handling. Here are the big highlights that I want to emphasize:
  • You can now lazily manage resources and conserve handles
  • The library also adds exception handling, meaning that you can catch and resume from any exception within a pipeline
  • pipes-safe interoperates cleanly with existing unmanaged pipes code
  • The central code is reasonably simple and many people should be able to read the source and reason about its safety
As always, check out the tutorial if you want to learn the library.


Lazy Initialization


Now you can package up allocation information with streaming resources, which simplifies their presentation. You don't have to say "run these allocation routines before the session to expose this resource, now stream from that resource, and then run these close routines afterwards".

This means that you can now just concatenate multiple resources and trust that they only open in response to demand and only one resource is open at any given time.
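To make the "one resource open at a time" behavior concrete, here is a toy model that uses only base. It is a sketch under stated assumptions and not the pipes-safe API: the Source type and the managed, append, and demo names are hypothetical, and the "resource" is just an event log.

```haskell
import Control.Exception (bracket_)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- A toy "managed source" (hypothetical, not a pipes type): it receives a
-- callback for each element and opens and closes its own resource around
-- the stream.
type Source a = (a -> IO ()) -> IO ()

-- Hypothetical helper: a named resource that records open/close events.
managed :: IORef [String] -> String -> [a] -> Source a
managed logRef name xs yield =
  bracket_ (record ("open " ++ name))
           (record ("close " ++ name))
           (mapM_ yield xs)
  where
    record event = modifyIORef logRef (event :)

-- Concatenation: the second resource is not opened until the first has
-- been closed.
append :: Source a -> Source a -> Source a
append s1 s2 yield = s1 yield >> s2 yield

-- Run two concatenated sources and return the event log in order.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  let source = managed logRef "a" [1, 2 :: Int] `append` managed logRef "b" [3]
  source (\x -> modifyIORef logRef (("item " ++ show x) :))
  reverse <$> readIORef logRef

main :: IO ()
main = demo >>= mapM_ putStrLn
```

In the resulting log, "close a" appears before "open b", so the two handles never overlap. The allocation details live inside each source rather than in the code that concatenates them.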


Prompt Finalization


There was one issue with finalization: in order to guarantee safety, I cannot always guarantee prompt finalization when composition terminates. I can only safely run dropped finalizers at the end of the Session. However, the library lets you trade safety for prompt finalization when you can prove that the prompt finalization is safe.

In practice, this will not be an issue for most users since the dominant use case is a session that is just one linear chain like this:
session = p1 >-> p2 >-> ... >-> pn
In that case, the end of composition coincides with the end of the Session, so there is no delay in finalization. You only need concern yourself with it if you try to do fancier things, and the documentation explains how to safely use the prompt finalization primitives in those cases.

In the documentation for the prompt finalization primitives I outline the "pathological" case that foils all attempts to safely finalize things promptly. I'll repeat it here since it is very illuminating:
p1 >-> ((p2 >-> p3) >=> p4)
When p3 finalizes, we might naively expect that it finalizes p2 promptly, but not p1. After all, if we finalized p1, we might accidentally access the finalized resource if p4 were to request more input.

However, this intuition leads to a contradiction when we carefully select p2 to be idT and p4 to be return:
  p1 >-> ((idT >-> p3) >=> return)
= p1 >-> p3
In this scenario, if we don't finalize p1 when p3 terminates, then we are not being prompt! You don't even necessarily have to use idT. Setting p4 to return suffices to trigger the problem, thanks to associativity:
  p1 >-> ((p2 >-> p3) >=> return)
= p1 >-> (p2 >-> p3)
= (p1 >-> p2) >-> p3
= p12 >-> p3
Associativity guarantees that we can combine the two upstream pipes and treat them like a black box. Again, if p3 terminates, we would have to finalize p12 which contains p1. This contradicts our assumption that we could not finalize p1.

The old Frames implementation used indexed monads to avoid this problem because the result of composition had to end in a closed state. Therefore, when (p2 >-> p3) would terminate, it would end in the closed state and would consequently forbid p4 from requesting more input, thus guaranteeing that you could safely finalize p1 promptly.

This example demonstrates something that I had difficulty articulating up until recently: There is no meaningful way to distinguish between pipes that are "directly" composed (like p2 and p3) and "indirectly" composed (like p1 and p3). This foils any attempt to finalize things both promptly and safely.

Note that the second example applies to conduit, too, and I suspect that conduit has the same latent problem and cannot guarantee both prompt finalization and associativity. When I have more time I will dig back into conduit's source and see if my intuition is correct.

Update and clarification: pipes-safe DOES promptly finalize if any bracketed block terminates normally or receives an exception. The finalizer is only delayed if another pipe composed with it terminates before the bracketed block completes.


Native exception handling


pipes-safe improves on conduit in one important way: You can catch and resume from exceptions in pipes code so that you can continue streaming where you left off. pipes-safe builds on the EitherP proxy transformer to integrate exception handling natively within proxies.
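The idea of catching an exception and then continuing the stream can be illustrated in plain IO with base's Control.Exception. This is only an analogy and does not use EitherP or any pipes-safe function; step and resumable are hypothetical names.

```haskell
import Control.Exception (ArithException (..), evaluate, try)

-- Process one element; dividing by zero throws an ArithException.
step :: Int -> IO Int
step d = evaluate (100 `div` d)

-- Catch the exception per element, so a failure on one element is
-- reported as a Left and the remaining elements are still processed.
resumable :: [Int] -> IO [Either ArithException Int]
resumable = mapM (try . step)

main :: IO ()
main = resumable [2, 0, 5] >>= print
```

The failure on the second element shows up as a Left value, and the third element is still processed afterwards.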

In fact, EitherP gave me the strongest motivation to complete this library. I felt that it would be a really big shame to be the only streaming library with an elegant error-handling framework but then not use it to handle exceptions.


Backwards Compatibility


Another way that pipes-safe improves on conduit is that the resource management system does not require any integration with the core Proxy type or with the standard libraries. It is a well-behaved member of the pipes ecosystem that requires no buy-in from other pipes libraries in order to interoperate with them.

I provide the try function, which upgrades "unmanaged" proxies to "managed" proxies. try is a "proxy morphism", meaning that the corresponding functor preserves all five of these categories:
  • The Kleisli category
  • The pull-based proxy composition category
  • The push-based proxy composition category
  • The "request" category
  • The "respond" category
This solution is a perfect example of practical category theory, specifically the functor design pattern. I don't have to require a rewrite of every existing proxy to take into account resource management. I instead just define a functor that automatically promotes unmanaged proxies to managed ones as if they had been written from the ground up with resource management in mind.

Code that doesn't need resource management just proceeds as before, blissfully unaware that there is such a thing as a pipes-safe library or exceptions or resource management. If it ever needs to be used in a safe context, try automatically promotes it to behave correctly, avoiding unnecessary code duplication.

My big objective when designing this library was that pipes-safe would require zero buy-in from the community and from the standard libraries. Fortunately, that's precisely the problem that functors solve by providing well-behaved compatibility layers. In this case, the try function provides that compatibility layer.
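For readers who want to see what "preserving the Kleisli category" means, here is a minimal sketch with an ordinary monad morphism from base. It assumes nothing about the real try function: promote is a hypothetical stand-in that maps Maybe to lists, and half and nonZero are made-up Kleisli arrows.

```haskell
import Control.Monad ((>=>))
import Data.Maybe (maybeToList)

-- Hypothetical stand-in for 'try': promotes a computation in one monad
-- (Maybe) to another ([]). maybeToList is a monad morphism.
promote :: Maybe a -> [a]
promote = maybeToList

-- Two made-up Kleisli arrows in the "unpromoted" monad.
half :: Int -> Maybe Int
half n = if even n then Just (n `div` 2) else Nothing

nonZero :: Int -> Maybe Int
nonZero n = if n /= 0 then Just n else Nothing

-- Composing first and then promoting...
composeThenPromote :: Int -> [Int]
composeThenPromote = promote . (half >=> nonZero)

-- ...agrees with promoting each piece and then composing.
promoteThenCompose :: Int -> [Int]
promoteThenCompose = (promote . half) >=> (promote . nonZero)

-- Spot-check both functor laws (composition and identity) on a few inputs.
lawsHold :: Bool
lawsHold =
  all (\n -> composeThenPromote n == promoteThenCompose n) [-4 .. 4]
    && all (\n -> promote (return n) == (return n :: [Int])) [-4 .. 4]

main :: IO ()
main = print lawsHold
```

Because the two orders agree, code written against the unpromoted monad can be promoted piece by piece or all at once without changing its behavior. That property is what makes a compatibility layer of this kind well-behaved.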


Simple Implementation


pipes-safe is very simple and has a clear implementation. In fact, I encourage you to read the source yourself if you want to reason about the safety of the library. The only non-trivial function is the internal registerK function, which serves a similar purpose to the resourcet library.

registerK saves pending finalizers from other proxies so they don't get lost if composition drops them. Unlike resourcet, it uses an elegant zipper-like behavior to keep track of finalizers rather than a Map that requires globally unique IDs. This also means that every finalization operation runs in constant time, O(1). In fact, you could actually implement it using just StateT in the base monad if it were not for exceptions. However, I had to use IORefs in order to ensure that the finalizer state survived exceptions, so it is similar to resourcet in that regard.
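The following sketch shows why keeping finalizer state in an IORef lets it survive exceptions, and why no unique IDs are needed when finalizers are tracked positionally. It is a simplified stand-in that uses a plain stack, not the actual registerK implementation or its zipper. All names (Finalizers, register, release, runAll, withFinalizers) are hypothetical, and asynchronous exceptions are not masked.

```haskell
import Control.Exception (ErrorCall (..), finally, throwIO, try)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef, writeIORef)

-- Pending finalizers, most recently registered first.
type Finalizers = IORef [IO ()]

-- Push a finalizer: O(1), no unique ID required.
register :: Finalizers -> IO () -> IO ()
register ref fin = modifyIORef ref (fin :)

-- Pop and run the most recent finalizer, if any: O(1).
release :: Finalizers -> IO ()
release ref = do
  fins <- readIORef ref
  case fins of
    []         -> return ()
    fin : rest -> writeIORef ref rest >> fin

-- Run every finalizer that is still pending.
runAll :: Finalizers -> IO ()
runAll ref = do
  fins <- readIORef ref
  writeIORef ref []
  sequence_ fins

-- The IORef outlives any exception thrown by the body, so the pending
-- finalizers are still available when 'finally' fires.
withFinalizers :: (Finalizers -> IO a) -> IO a
withFinalizers body = do
  ref <- newIORef []
  body ref `finally` runAll ref

demo :: IO [String]
demo = do
  logRef <- newIORef []
  let record event = modifyIORef logRef (event :)
      body fins = do
        register fins (record "close a")
        register fins (record "close b")
        release fins                      -- promptly finalizes "b"
        register fins (record "close c")
        throwIO (ErrorCall "boom")        -- "a" and "c" are still pending
  result <- try (withFinalizers body)
  case result of
    Left (ErrorCall msg) -> record ("caught " ++ msg)
    Right ()             -> record "finished"
  reverse <$> readIORef logRef

main :: IO ()
main = demo >>= mapM_ putStrLn
```

With StateT in place of the IORef, the list of pending finalizers would be lost at the point of the throw. Here the cleanup in withFinalizers still sees the finalizers for "c" and "a" and runs them.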

pipes-safe does not use monad-control and it doesn't use any ad-hoc or unprincipled type classes. Instead it just reuses the Proxy class and the EitherP proxy transformer to do everything so that you don't have to learn any new concepts to understand how it works.


Conclusion


With pipes-safe complete, my next major targets are:
  • Native parsing for proxies with optional backtracking
  • Bytestring support
I actually already have the bytestring library up on GitHub, but I haven't released it yet. The reason is that I've been doing a lot of work recently on distinguishing between pipes as a bytestring (or builder) transport layer and as an ordinary session layer. The former is quite challenging to implement correctly, but it will ultimately be rewarding because it will allow people to control the properties of the stream without affecting the payload, and also allow people to stream irregular payloads instead of just list-like things.

I will elaborate more on this in a later post, but the point is that the direction of that work affects what proxies I include in the bytestring library and what proxies will go in a separate transport layer library and that's why I haven't published it yet.
