Tuesday, July 31, 2012

Free monad transformers

I'm spinning off the Control.Monad.Trans.Free module from pipes into its own package: transformers-free. Some people requested this because they wanted to experiment with the FreeT type in their own code without making pipes a full-blown dependency.


Free monad transformers


Recently I've evangelized the use of free monads for building abstract syntax trees that let you abstract away the interpreter. This lets you seamlessly build custom domain-specific languages in Haskell.

However, sometimes we can't specify the syntax tree all at once. Often we want to interleave the syntax tree with some other monad to generate streaming or interactive computations. FreeT solves this problem by allowing us to mix building steps of the abstract syntax tree with calling actions in some base monad:
data FreeF f a x = Pure a | Free (f x)

newtype FreeT f m a = FreeT {
    runFreeT :: m (FreeF f a (FreeT f m a)) }
For example, let's say we want to write our own Python-style generator:
type Generator b m r = FreeT ((,) b) m r
In this case, our syntax tree is an ordinary list where we run the base monad to generate the next element. We can even duplicate Python syntax:
yield :: b -> Generator b m ()
yield b = liftF (b, ())
We can set the base monad to IO if we want to prompt the user to enter each subsequent element of the list:
prompt :: Generator String IO r
prompt = forever $ do
    lift $ putStrLn "Enter a string:"
    str <- lift getLine
    yield str
We can then demand the next element from our generator using runFreeT:
main = do
    x <- runFreeT prompt
    case x of
        Pure       _  -> return ()
        Free (str, _) -> putStrLn $ "User entered: " ++ str
This does not prompt the user for an infinite number of values. Instead, we only prompt the user to enter as many values as we demand from the generator, which was just one value in the above example. However, if we wanted to bleed our user dry, we could gratuitously demand the entire generator:
main = putStrLnAllTheThings prompt

putStrLnAllTheThings gen = do
    x <- runFreeT gen
    case x of
        Pure         _   -> return ()
        Free (str, gen') -> do
            putStrLn str
            putStrLnAllTheThings gen'
However, even that still streams the values and never retains them in memory.


Denotation


Sometimes we want to give our free monad a complete denotation, but our base functor does not cut it. The classic example is simulating the State monad using a free monad. We could try adding the following two terms to our base functor to act like a State monad:
data BaseF s x
  = Get (s -> x)
  | Put s x
  | Something x
  ...
... and we even can write the following State-like primitives:
get' :: Free (BaseF s) s
get' = liftF $ Get id

put' :: s -> Free (BaseF s) ()
put' s = liftF $ Put s ()
... but these simulated primitives do not necessarily obey the State monad equations such as:
put x >> put y = put y
get >>= put = return ()
However, we can instead outsource part of the denotation to the actual State monad. All we do is delete the Get and Put constructors from our base functor:
data BaseF x
  = Something x
  ...
... and instead just use State (or StateT m) as the base monad:
doSomething :: FreeT BaseF (State s) ()
doSomething = do
    x <- lift get
    lift $ put x
    something
The FreeT monad transformer is correct by construction, so we can use the monad transformer laws to equationally reason about our program to eliminate the dead code at the end of our function:
-- lift m >>= lift . f = lift (m >>= f)
doSomething = do
    lift $ do
        x <- get
        put x
    something

-- get >>= put = return ()
doSomething = do
    lift $ return ()
    something

-- lift (return x) = return x
doSomething = do
    return ()
    something

-- return () >> m = m
doSomething = something
FreeT offers us a nice way to selectively outsource our denotation to other monads whenever we need stronger equational guarantees.


free compatibility


I do not intend to use transformers-free to replace the free package for free monads that are not monad transformers. Even in my own code I still use free for ordinary free monads. I only provide the ordinary Free type to keep the transformers tradition of formulating the ordinary monad in terms of the corresponding monad transformer.

When designing the transformers-free library, I tried to adhere to the free package as closely as possible for naming conventions. I really only disagree with one naming convention Edward used, which is naming one of the constructors Free, for two reasons:
  • It shares the same name as the type, which is confusing since it's not the only constructor.
  • It does not share the same name as its smart constructor, wrap, which is confusing because Pure does share the same name as its smart constructor, pure.
So he probably should have named the constructor Wrap, but I decided to stick with his name and not buck convention.

I also structured the FreeT type so that the derived Free type would resemble Free from the free package as closely as possible. The only difference is that in transformers-free you have to use the runFree observation function first before you can pattern match on the constructors, but otherwise it's identical:
-- using the "free" package
f :: Free f r -> ...
f x = case x of
    Pure r -> ...
    Free w -> ...

-- using the "transformers-free" package
f :: Free f r -> ...
f x = case runFree x of
    Pure r -> ...
    Free w -> ...
I also found that this was the most natural way to write the FreeT type and the easiest to use, based on my experience using it within the pipes library. Of course, your own mileage might vary!

I haven't copied all the functions that the free package provides. I mainly omitted the recursion schemes because there are a few other recursion schemes that I was also considering:
foldFree1 :: (a -> r) -> (f r -> r) -> Free f a -> r
-- or
foldFree2 :: (FreeF f a r -> r) -> Free f a -> r
... and their FreeT equivalents, which use m r as the carrier of the algebra instead:
foldFreeT1 :: (a -> m r) -> (f (m r) -> m r) -> FreeT f m a -> r
-- or
foldFreeT2 :: (FreeF f a (m r) -> m r) -> FreeT f m a -> r
In the end, I chose not to include anything yet and leave it up to discussion, since it is easier to add API functions than remove them.


Performance


I also wrote up a codensity version implemented using the following type:
newtype FreeT {
    foldFreeT :: (a -> m r) -> (f (m r) -> m r) -> m r }
This gives large speed differences for code that never uses the base monad. I'm on vacation, so I don't have access to my work computer to reproduce the benchmarks, but I vaguely remember that they ranged from a best case improvement of roughly 50% faster for operations on the tail of the list to a worst case penalty of roughly 30% slower for operations on the head of the list. However, for real world use cases where you intermingle IO operations like simple print statements at each step then the performance differences drop to less than 5%.

The main reasons I haven't released the codensity version yet are:
  • If I release multiple implementations, I want to type-class them with appropriate laws.
  • Type-class signatures cannot be easily updated, so I want to solidify the API first.
Note: Some reddit readers may remember me remarking on the naive implementation giving linear time complexity for left-associative code, but this was a false alarm: the code was actually right-associative. Since do notation is right-associative, the time complexity difference between left-associative code doesn't arise frequently in practice, but one should always still keep it in mind, especially when writing code like:
do x <- longFreeMonad
   somethingElse

transformers compatibility


I'm proposing this library as a candidate for inclusion in the transformers package, so the documentation is structured very similarly.

I also provide a MonadIO instance for the type, something I was very reluctant to do since I consider MonadTrans to be sufficient for that purpose. However, I relented since:
  • I've gotten a lot of feedback in favor of a MonadIO instance for this type
  • If this ever does get accepted into transformers, it will get a MonadIO instance anyway.

Iteratees


The FreeT type (or its isomorphic codensity version) arises very frequently in iteratee libraries. For example, the Iteratee type from the enumerator package is isomorphic to:
data IterateeF a next
  = Continue (a -> next)
  | Error SomeException
  
Iteratee a m r ~ FreeT (IterateeF (Stream a)) (r, Stream a)
The Iteratee type from the iteratee package is isomorphic to:
data IterateeF a next
  = Await (a -> next) (Maybe SomeException)

Iteratee a m r ~ FreeT (IterateeF (Stream a)) (r, Stream a)
The Pipe type from pipes is:
data PipeF a b next
  = Await (a -> next)
  | Yield (b, next)

type Pipe a b m r = FreeT (PipeF a b) m r
And the Pipe type from conduit is (sort of) isomorphic to:
data ConduitF l i o u m next
  = HaveOutput (m ()) o next
  | NeedInput (i -> next) (u -> next)
  | LeftOver l next

Pipe l i o u m r ~ FreeT (ConduitF l i o u m) m r
I say "sort of" because conduit followed the pipes-1.0 approach that made the base monad optional, which only gives a correct monad transformer when viewed through the lens of runPipe. However, in principle, conduit with congruence laws implements something isomorphic to FreeT since conduit requires its users only use runPipe to inspect conduits.


pipes


The pipes-2.2 package now uses transformers-free to provide Control.Monad.Trans.Free meaning that Pipe now has a MonadIO instance, something which many people have requested before.

Also, if you install transformers-free on its own and have an existing pipes installation, make sure that you upgrade pipes to version 2.2 to avoid conflicting exports for Control.Monad.Trans.Free.

However, I have no plans yet to shift Control.IMonad.Trans.Free to an external package since I doubt other people will use it any time in the near future.


Conclusion


The transformers-free package provides the unifying theoretical abstraction for streaming or interactive programs: the free monad transformer. Once you learn the abstraction, you will start to see it everywhere, even in non-iteratee libraries.

Thursday, July 19, 2012

First-class modules without defaults

Recently Chris Doner proposed a first-class module approach which uses the Default type-class and then I revised his approach to not use type-classes at all, encoding all the information in dictionaries. I'm using this post to expand upon my variation of Chris's approach and show how one would translate his approach to my approach and explain what I believe are the advantages of this improvement.


Dictionaries


This approach builds off the classic encoding of a type-class as a dictionary. The trick is simple, first you convert a module to an equivalent type-class representing that module's interface, then you convert that type-class to a dictionary.

As an example, I'm going to start from Dan Burton's Modular Prelude and rework his ByteStringModule type to use my improvement and then show how they differ.

Dan's ByteStringModule looks like this:
data ByteStringModule = S
  { map :: (Word8 -> Word8) -> ByteString -> ByteString
  , concatMap :: (Word8 -> ByteString) -> ByteString -> ByteString
  , filter :: (Word8 -> Bool) -> ByteString -> ByteString
  , length :: ByteString -> Int
  , singleton :: Word8 -> ByteString
  , null :: ByteString -> Bool
  , pack :: [Word8] -> ByteString
  , unpack :: ByteString -> [Word8]
  , empty :: ByteString
  , readFile :: FilePath -> IO ByteString
  , writeFile :: FilePath -> ByteString -> IO ()
  , break :: (Word8 -> Bool) -> ByteString -> (ByteString, ByteString)
  , span :: (Word8 -> Bool) -> ByteString -> (ByteString, ByteString)
  , dropWhile :: (Word8 -> Bool) -> ByteString -> ByteString
  , takeWhile :: (Word8 -> Bool) -> ByteString -> ByteString
  , any :: (Word8 -> Bool) -> ByteString -> Bool
  , all :: (Word8 -> Bool) -> ByteString -> Bool
  , splitAt :: Int -> ByteString -> (ByteString, ByteString)
  }


_Data_ByteString_ :: ByteStringModule
_Data_ByteString_ = S
  { null = ...
    ... }

instance Default ByteStringModule where
  def = _Data_ByteString_
To use my approach, you instead parametrize the dictionary on the type of the "string-like" thing:
data StringModule s = String
  { map :: (Word8 -> Word8) -> s -> s
  , concatMap :: (Word8 -> s) -> s -> s
  , filter :: (Word8 -> Bool) -> s -> s
  , length :: s -> Int
  , singleton :: Word8 -> s
  , null :: s -> Bool
  , pack :: [Word8] -> s
  , unpack :: s -> [Word8]
  , empty :: s
  , readFile :: FilePath -> IO s
  , writeFile :: FilePath -> s -> IO ()
  , break :: (Word8 -> Bool) -> s -> (s, s)
  , span :: (Word8 -> Bool) -> s -> (s, s)
  , dropWhile :: (Word8 -> Bool) -> s -> s
  , takeWhile :: (Word8 -> Bool) -> s -> s
  , any :: (Word8 -> Bool) -> s -> Bool
  , all :: (Word8 -> Bool) -> s -> Bool
  , splitAt :: Int -> s -> (s, s)
  }
Using this approach you can encode String, ByteString and Text, all using the same dictionary type:
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import qualified Data.Text as T
import qualified Prelude as P

lazyByteString :: StringModule L.ByteString
lazyByteString = String {
    null = L.null,
    ... }

strictByteString :: StringModule S.ByteString
strictByteString = String {
    null = S.null,
    ... }

text :: StringModule T.Text
text = String {
    null = T.null,
    ... }

string :: StringModule P.String
string = String {
    null = P.null,
    ... }
Here I've parametrized the dictionary on the "string-like" type, so we can reuse the same dictionary type for all of them. This represents the dictionary equivalent of the following type-class:
class StringModule s where
    map :: (Word8 -> Word8) -> s -> s
    concatMap :: (Word8 -> s) -> s -> s
    filter :: (Word8 -> Bool) -> s -> s
    length :: s -> Int
    singleton :: Word8 -> s
    null :: s -> Bool
    pack :: [Word8] -> s
    unpack :: s -> [Word8]
    empty :: s
    readFile :: FilePath -> IO s
    writeFile :: FilePath -> s -> IO ()
    break :: (Word8 -> Bool) -> s -> (s, s)
    span :: (Word8 -> Bool) -> s -> (s, s)
    dropWhile :: (Word8 -> Bool) -> s -> s
    takeWhile :: (Word8 -> Bool) -> s -> s
    any :: (Word8 -> Bool) -> s -> Bool
    all :: (Word8 -> Bool) -> s -> Bool
    splitAt :: Int -> s -> (s, s)    

Comparison


This improvement has several advantages over Chris's original approach:
  • You don't have to define a new data structure for each "instance" of the module.
  • The shared dictionary type guarantees that all "instance"s expose the same signature
  • You can program generically over the StringModule "class"
  • No type-classes are required, so you can define multiple competing "instances" of the same module without conflicts.
  • You can qualify first-class modules instead of completely unpacking them.

Syntax


Now I'll show how you syntactically translate all the features of Chris's modules to my improved version. Let's assume we have some module Data.String, that exports the above four StringModule instances (i.e. lazyByteString, strictByteString, text, and string.

Using Chris's approach, you distinguish which module you wish to unpack into the current scope by choosing which constructor you unpack into:
zot :: (L.ByteString, S.ByteString)
zot = (a,b) where
  a = pack [1,2,3] where L{..} = def
  b = pack [1,2,3] where B{..} = def
Using the improved version, you distinguish which module you wish to unpack by explicitly selecting which dictionary you unpack:
zot = (a,b) where
  a = pack [1,2,3] where String{..} = lazyByteString
  b = pack [1,2,3] where String{..} = strictByteString
Hmmm, those are long names. Wouldn't it be nice if we could somehow alias them?
l = lazyByteString
s = strictByteString

zot = (a,b) where
  a = pack [1,2,3] where String{..} = l
  b = pack [1,2,3] where String{..} = s
Oh yeah! Dictionaries are first-class because they are ordinary Haskell values, so renaming them is easy. We can even locally alias modules, something which the ordinary module system cannot do:
zot = (a,b) where
  l = lazyByteString
  s = strictByteString
  a = pack [1,2,3] where String{..} = l
  b = pack [1,2,3] where String{..} = s
Also, using my improvement we can program generically over the string dictionary type:
zot :: StringModule s -> (s, s)
zot s = (a,b) where
  String{..} = s
  a = pack [1,2,3]
  b = pack [1,2,3]
This is just the dictionary version of a class constraint, where you pass the instance as an ordinary parameter. This is not possible using Chris's approach, which is one reason I am advocating this change.

Another advantage of this improvement is that you don't even need to unpack the module at all. You get qualification for free:
l = lazyByteString
s = strictByteString

zot = (a,b) where
  a = pack l [1,2,3]
  b = pack s [1,2,3]
In fact, if you alias first-class modules to single-letter names (as you might do with ordinary modules), then the syntactic overhead is identical to using ordinary modules: 2 extra characters, except as a suffix instead of a prefix.

Using Chris's approach, you would have to use the ordinary module system to qualify which pack you meant:
zot s = (a, b) where
  a = L.pack def [1,2,3]
  b = S.pack def [1,2,3]
So using his approach there is actually more syntactic overhead for qualifying imports, plus you must rely on the ordinary module system to namespace qualified imports.

Also, using his approach there is no good way to nest a module within a module and still qualify a nested import, since you can't program generically over the outer module. So there would be no good way to do something like this:
data OuterModule a = Out { outVal :: a }
data InnerModule a = In  {  inVal :: a }

dict :: OuterModule (InnerModule String)
dict = Out (In "Hello, world!")

contrived :: String
contrived = inVal (outVal dict) -- nested qualified import


Common features


No matter which approach you like, there are several cool features that both approaches share. For example, you can unpack unqualified names into the top-level global namespace. For Chris's approach you would insert the following top-level declaration:
L {..} = def
... and for my variation you would use:
String {..} = lazyByteString
Also, I will argue (to the death!) that both approaches are superior to type-class-based approaches. While I believe type-classes are okay for theoretically-grounded constructs (like Monad or Category), I believe that the dictionary approach is superior for banal interfaces like ListLike/ListModule or StringLike/StringModule since it is completely first-class.


Conclusions


This post is NOT intended to rip on Chris, but simply to improve on his original proposal. He had the two brilliant ideas of both using "type-classes" as modules and using the RecordWildCards extension to unpack names unqualified. I think the only mistake he made was unnecessarily filtering everything through the `Default` type-class and I only want to say that I think the pure dictionary approach is a strict improvement on his otherwise already brilliant idea.

Wednesday, July 18, 2012

Purify code using free monads

Experienced Haskell programmers commonly advise newcomers to keep as much of their program as pure as possible. Purity confers many practical advantages:
  • You can formally prove things about your code
  • Barring that, you can reason easily about your code
  • If all else fails, you can QuickCheck your code
To demonstrate this, I'll use the following simple echo program:
import System.Exit

main = do x <- getLine
          putStrLn x
          exitSuccess
          putStrLn "Finished"
The above program only makes one mistake: It mixes business logic with side-effects. Now, there's nothing wrong with that and I program like that all the time for simple programs that I can fit in my head. However, I hope to interest you in all the cool things you can do by factoring out the side-effects from the program logic.


Free monads


First, we must purify our program, and we can always factor out impure parts from any code by using free monads. Free monads let you decompose any impure program into a pure representation of its behavior and a minimal impure interpreter:
import Control.Monad.Free
import System.Exit hiding (ExitSuccess)

data TeletypeF x
  = PutStrLn String x
  | GetLine (String -> x)
  | ExitSuccess

instance Functor TeletypeF where
    fmap f (PutStrLn str x) = PutStrLn str (f x)
    fmap f (GetLine      k) = GetLine (f . k)
    fmap f  ExitSuccess     = ExitSuccess

type Teletype = Free TeletypeF

putStrLn' :: String -> Teletype ()
putStrLn' str = liftF $ PutStrLn str ()

getLine' :: Teletype String
getLine' = liftF $ GetLine id

exitSuccess' :: Teletype r
exitSuccess' = liftF ExitSuccess

run :: Teletype r -> IO r
run (Pure r) = return r
run (Free (PutStrLn str t)) = putStrLn str >>  run t
run (Free (GetLine  f    )) = getLine      >>= run . f
run (Free  ExitSuccess    ) = exitSuccess

echo :: Teletype ()
echo = do str <- getLine'
          putStrLn' str
          exitSuccess'
          putStrLn' "Finished"

main = run echo
The above rewrite concentrates all of the impure code within the run function. I like to call this process "purifying the code" because we distill out all of the program's logic into pure code leaving behind only the bare minimum impure residue in our run interpreter.


Proofs


Now, it seems like we didn't gain much from purifying our code and we paid a price in code size. However, now we can prove things about our code using equational reasoning.

For example, everybody reading this probably noticed the obvious bug in the original echo program:
import System.Exit

main = do x <- getLine
          putStrLn x
          exitSuccess
          putStrLn "Finished" <-- oops!
The last command never executes ... or does it? How would we prove that?

Actually, we can't prove that because it's not true. The author of the System.Exit module could simply change the definition of exitSuccess to:
exitSuccess :: IO ()
exitSuccess = return ()
The above program would still type-check, but now it would also successfully print Finished to the console.

However, for our purified version, we can prove that any command after exitSuccess' never executes:
exitSuccess' >> m = exitSuccess'
We do so using "equational reasoning", which is the most important benefit of purity. "Equational reasoning" means that we can take any expression and just substitute in the function definitions of the components. Each step of the following proof uses a comment to indicate the specific function definition used to reach the next step:
exitSuccess' >> m

-- exitSuccess' = liftF ExitSuccess
= liftF ExitSuccess >> m

-- m >> m' = m >>= \_ -> m'
= liftF ExitSuccess >>= \_ -> m

-- liftF f = Free (fmap Pure f)
= Free (fmap Pure ExitSuccess) >>= \_ -> m

-- fmap f ExitSuccess = ExitSuccess
= Free ExitSuccess >>= \_ -> m

-- Free m >>= f = Free (fmap (>>= f) m)
= Free (fmap (>>= \_ -> m) ExitSuccess)

-- fmap f ExitSuccess = ExitSuccess
= Free ExitSuccess

-- fmap f ExitSuccess = ExitSuccess
= Free (fmap Pure ExitSuccess)

-- liftF f = Free (fmap Pure f)
= liftF ExitSuccess

-- exitSuccess' = liftF ExitSuccess
= exitSuccess'
Notice how in the last steps we reversed the equality and worked backwards from the function definition to the defined expression. This is perfectly legitimate because thanks to purity the equals sign in Haskell doesn't mean assignment: it actually means equality! This means we can translate both ways across an equals sign when equationally reasoning about code. That's pretty amazing!

Equally important, the above proof is true no matter how the free monad is interpreted. We could swap out the run function for any other interpreter, including a pure interpreter, and the equation still holds:
exitSuccess' >> m = exitSuccess'
This means that we could safely use this as a GHC rewrite rule for an optimization pass over our program:
{-# RULES
  "exit" forall m. exitSuccess' >> m = exitSuccess'
    #-}
...and we can guarantee that the rule is always safe to apply and will never be broken.


Reasoning


We'd like to prove that our program always outputs the same string it received as input. Unfortunately, we can't prove this because it's not true. The maintainer of the putStrLn function could always have a change of heart and redefine it to:
putStrLn str = return () -- worst maintainer, ever
In fact, we can't even prove this is true for our free monad version, either. run uses the same putStrLn, so it would suffer from the same bug. However, we can still prove things about the free monad itself by instead studying it through a pure interpreter:
runPure :: Teletype r -> [String] -> [String]
runPure (Pure r)                  xs  = []
runPure (Free (PutStrLn y  t))    xs  = y:runPure t xs
runPure (Free (GetLine     k))    []  = []
runPure (Free (GetLine     k)) (x:xs) = runPure (k x) xs
runPure (Free  ExitSuccess   )    xs  = []
Now, we can prove that:
runPure echo = take 1
... based off of the Haskell98 Prelude's take. I leave this one as an exercise for the reader, because this post is already pretty long.

Notice that by examining echo through a pure lens, we caught a small corner case we didn't originally anticipate: The user might not enter any input! In that scenario our interpreter must return an empty list, just like take. Equational reasoning forces us to be rigorous in a way that simply saying "our program always outputs the same string it received" can never be. The more you work through these kinds of equations, the more you improve your ability to reason about your own code and catch mistakes in your own assumptions.

Equally important, equational transformations let you view your program in a whole new light. We saw our program was a glorified take 1 when filtered through runPure, meaning we could leverage our intuition about take for understanding our program and catching that corner case.

Once you factor your code through the Free monad and prove its soundness it acts like a trusted kernel and then you need only maintain the interpreter from that point forward. So while we couldn't quite prove the entire program was correct, we were able to prove that everything except the interpreter is correct. Equally important, we reduced the interpreter to the absolute minimal attack surface possible so that we can fit it in our head and maintain it by hand.


Testing


We can't prove anything about code in the IO monad. How would we even do that? We could say something like: "If you compile this and run the program and type some string into the prompt, you will get the same string back", but that's not really an equation, so there's nothing rigorous about it. However, we could write it as a test for our program.

Unfortunately, tests for impure code don't really scale to large and complicated programs and in test-dependent programming languages writing these tests consumes a significant fraction of the programmer's time.

Fortunately, though, we can easily exercise pure code with the QuickCheck library, which pathologically scours pure assertions for a violation in a completely automated fashion.

For example, let's assume that you were too lazy to prove that runPure echo = take 1. We can instead ask QuickCheck to test that proposition for us:
>>> import Test.QuickCheck
>>> quickCheck (\xs -> runPure echo xs == take 1 xs)
+++ OK, passed 100 tests.
How cool is that! You can test your code much more aggressively if you keep it as pure as possible.


Conclusions


Equational reasoning only works for pure code, so the pure component of your code base serves as a trusted kernel for which you can:
  • prove soundness,
  • reason about behavior, and
  • aggressively test.
This is why we always strive to maximize the pure portions of our code bases and minimize the impure parts.

Fortunately, the Free monad guarantees that you can always easily achieve the absolute maximal level of purity possible and the absolute minimal amount of impure code. This is why every experienced Haskell programmer should be fluent in free monads.

Wednesday, July 11, 2012

Breaking from a loop

This post describes how to break from a code block by using EitherT/MaybeT instead of ContT. This technique isn't new, and has already been described at least once before here. However, there is still some weird culture of teaching ContT for exiting from loops, which is incredibly over-kill and bad practice because it makes beginners think it's complicated when it's not.


The Trick


Exiting from a code block is ridiculously simple:
import Control.Error -- from the 'errors' package
import Control.Monad
import Control.Monad.Trans

exit = left -- or rename 'exit' to 'break' if you prefer

main = runEitherT $ forever $ do
    str <- lift getLine
    when (str == "exit") $ exit ()
I find this significantly easier to understand than the equivalent ContT version. Normally when you use Either/EitherT, you terminate on the first Left you encounter. All that exit does is return its argument wrapped in a Left, halting the surrounding EitherT block.

You can even use the value returned from the block, which will be a Left if the block exited with a exit statement, or a Right if the block terminated normally:
main = do
    e <- runEitherT $ forever $ do
        str <- lift getLine
        when (str == "exit") $ exit ()
    case e of
        Left  a -> ... -- Loop terminated with an 'exit'
        Right b -> ... -- Loop terminated normally
This approach is incredibly simple and lets you distinguish how the loop terminates. If you don't care about distinguishing the two code paths and they return the same result type, then just use:
main = do
    r <- fmap (either id id) $ runEitherT $ ...
    ...
... which also works if the loop is non-terminating.

If you want to exit without returning any value, just use MaybeT instead:
exit = mzero

main = runMaybeT $ forever $ do
    str <- lift getLine
    when (str == "exit") exit
Or you could stick with EitherT and just use exit (), as the first example did, since Either () r is isomorphic to Maybe r.


EitherT vs. ErrorT


You might wonder why people historically teach ContT instead of EitherT for exiting from loops. I can only guess that they did this because of the terrible EitherT implementation in the transformers package that goes by the name ErrorT. The ErrorT implementation makes two major mistakes:
  • It constrains the Left value with the Error class.
  • The name mistakenly misleads users to believe that the Left value is only useful for returning errors.
The first mistake prevents you from exiting with any old value unless you first make it an instance of the Error class. The second mistake insidiously misleads new Haskell programmers to miss the opportunity to exit from EitherT blocks using ordinary non-exceptional values and then they get led astray by people peddling ContT.

The above examples worked because they didn't use ErrorT at all and instead used the superior implementation in the either package which doesn't constrain the behavior of the Left value, either practically or semantically. This is why you should always give data types neutral names to encourage people to use them in ways you didn't anticipate.


Nested blocks


You might think you need ContT if you want to do anything more complicated such as multiple levels of nesting and exiting. However, if you thought so you'd be wrong! Nesting multiple EitherT blocks works perfectly fine:
-- I'm too lazy to add type signatures for this
{-# LANGUAGE NoMonomorphismRestriction #-}

import Control.Error
import Control.Monad
import Control.Monad.Trans

exit = left

main =
    runEitherT $ forM_ [1..3] $ \i ->
        runEitherT $ forM_ [4..6] $ \j -> do
            when (j == 6) $ liftInner $ exit ()
            liftIO $ print (i, j)
  where liftInner = id
        liftOuter = lift
        liftIO    = lift . lift
Let's check it out:
$ ./break
(1,4)
(1,5)
(2,4)
(2,5)
(3,4)
(3,5)
I can choose to break out of two nested levels by replacing liftInner with liftOuter, which just lifts the exit call to the surrounding EitherT block:
            ...
            when (j == 6) $ liftOuter $ exit ()
            ...
$ ./break
(1, 4)
(1, 5)
Nice! Mainstream languages require extended syntax to let you break out of multiple nested loops. Haskell does it using ordinary functions. I really can't convey how amazing that is without being entirely unprofessional.

Sunday, July 8, 2012

errors-1.0: Simplified error handling

Update


I just released a quick patch changing a dependency from the EitherT package to the Either package. What this means is that if you already installed version 1.0 and upgrade to 1.1, you will have two packages exporting conflicting modules for Control.Monad.Trans.Either. The fix is very simple. Just type:
ghc-pkg hide EitherT
Which will hide the EitherT package so it does not conflict.

I tried to make this fix as rapidly as possible to mitigate the damage. I apologize, but somebody notified me that either is a higher quality dependency for the EitherT type.


Introduction


This post marks the release of errors-1.0, a package designed to fill all the gaps in type-safe error handling. I target this package specifically towards Haskell programmers that prefer to use Maybe and Either for type-safe exception handling, yet get frustrated by thousands of paper-cuts every time some convenient function is missing.

Now, all those pain points are gone, and instead of an import list that looks something like this:
import Control.Monad.Trans.Either
import Data.Maybe
import Safe
...
You now can just import:
import Control.Error


Utility Functions


Control.Error re-exports Control.Error.Util, which provides some very useful utility functions:
hush       :: Either a b -> Maybe b
note       :: a -> Maybe b -> Either a b
hushT      :: Monad m => EitherT a m b -> MaybeT m b
noteT      :: Monad m => a -> MaybeT m b -> EitherT a m b

liftMaybe  :: Monad m => Maybe b    -> MaybeT m b
liftEither :: Monad m => Either a b -> EitherT a m b
I'm pretty sure every experienced Haskell programmer has desired either the hush or note functions at some point and ended up either in-lining it into their code or defining it in some sort of Util module of their project.

It's quite frustrating, actually. We frequently encounter code that uses a mixture of Maybe or Either and want to combine them both within the same monad, but when we search using Hoogle or Hayoo for a function like hush, we get just one result:
Precis.Utils.ControlOperators.suppress :: Either e a -> Maybe a
This exemplifies one such utility function that the precis package wrote to scratch its own itch. Who would want to make the precis package a dependency for such a trivial function, especially when the precis package has nothing to do with error-handling?

I also couldn't even find an equivalent to the note function, although perhaps that's just because my Hoogle-fu isn't strong enough. However, I find it surprising that such a useful conversion function does not reside in some standard location on Hackage. Well, now it officially resides in the errors package.


Teaching


Also, most Haskell aficionados know that liftMaybe and liftEither are easy to write:
liftMaybe  = MaybeT  . return
liftEither = EitherT . return
However, many beginners to Haskell don't know how to lift a Maybe to a MaybeT and would benefit from the above function. Imagine a beginner Hoogling for:
(Monad m) => Maybe a -> MaybeT m a
... and instead of getting a bunch of almost-correct matches, they get an exact match and they can click the link to the liftMaybe function, consult the source code and have an "AHA!" moment where they learn a bit more about how monad transformers work. I'll wager there are many Haskell beginners right now that give up on using Maybes within MaybeT because they don't even know it's possible to do so. Well, now they do.


Social Confirmation


I designed the errors library to encourage type-safe error-handling style in Haskell, showing beginners that they can use the type system to handle errors painlessly, without using out-of-band language features like Control.Exception.

While I have nothing against Control.Exception, I'm worried that the proliferation of libraries revolving around it along with the absence of high-quality and instructive Either/EitherT libraries might lead beginners to believe that Haskell does not have a type-safe, simple, and easy way to handle errors. They might even lose interest in the language because they mistakenly feel it does not live up to their expectations of elegance and simplicity. After all, how can functional programming be so great if it can't do something as simple as error handling elegantly?

A recent post on reddit emphasizes that we shouldn't take for granted that beginners even know how to integrate high quality Maybe/Either code in complicated applications. All the expert knowledge of how to seamlessly inter-convert the various error-handling styles types gets locked away in people's pet utility libraries. This package solves that problem by providing easily accessible source code that beginner's can use and learn from.

More importantly, I designed the library to encourage beginners to use the type-safe error-handling style by providing a sort of social confirmation that is lacking on Hackage. The lack of a type-safe error handling ecosystem acts like a social cue to beginners that perhaps they are going down the wrong path and should reconsider. After all, if Either is supposedly the right way to handle errors, why does it seem like nobody cares about it enough to create a proper package covering common use cases? Well, now they know at least one person cares.


Scripting


Control.Error also exports Control.Error.Script, which covers the common use case of simple scripting:
type Script a = EitherT String IO a
The entire module provides a way to convert between Control.Exception and EitherT, so that people who prefer to not use Control.Exception now have a standard way to interface it with EitherT-style code. The most important function in the module is:
{- NOTE: This has been renamed to "scriptIO" and "tryIO" now resides in
         Control.Error.Util and only catches IO exceptions -}
tryIO :: IO a -> Script a
... which is like lift except that it also catches exceptions and translates them into Lefts. It also provides convenience functions to bind Maybes and Eithers within the monad:
tryMaybe :: String -> Maybe a -> Script a
tryEither :: Either String r -> Script r
These seem a little bit cumbersome to use at first, since you'd have to take all your favorite partial functions and convert them to Maybe or Either first and then pass them to tryMaybe/tryEither ... or do you?

Fortunately for you, this library has "batteries included", and Control.Error also exports the Control.Error.Safe module which wraps all your favorite partial functions both in the Either and EitherT monad.

The Either variants close a gaping hole in the safe library, where the only way you could attach a descriptive error message was using exceptions (again, this is the kind of social cue I'm talking about). Also, the EitherT variants are incredibly useful within the Script monad, where you can now seamlessly bind all your favorite error-handling functions of all types in the same monad:
import Control.Error
import System.Environment

main = runScript $ do
    as <- tryIO getArgs
    (file, n) <- case as of
        [file, n'] -> do
            n <- tryRead "Could not parse LINENUM" n'
            return (file, n)
        _ -> throwT "Usage: MYPROGRAM FILE LINENUM"
    str <- tryIO $ readFile file
    line <- tryAt "Line not found" (lines str) n
    tryIO $ putStrLn line
Notice the pattern? Everything that can be bound in the Script monad begins with the prefix try. A future release of the library will also provide versions with the default error messages (i.e. "Prelude.read: no parse"), but I just haven't decided what to name those versions, yet.

Also, notice that you don't have to import Control.Monad.Trans since for the special case of the Script monad the tryIO function serves the role of lift. The compiler error for the lack of Control.Monad.Trans is a nice reminder that you used lift instead of tryIO by mistake.


Why not ErrorT?


I'm a huge fan of the transformers package, with one major exception: ErrorT.

The first reason is that ErrorT comes with the additional baggage of the Error class, which really has no place in a proper EitherT implementation. I don't know how many times I've tried using that type, got hit by the Error class constraint, and threw up my hands in disgust and went back to EitherT (provided by the appropriately-named EitherT package and re-exported by Control.Error).

However, there is a deeper reason that I'm basing my library on EitherT and not ErrorT, which is the implementation of catchError and throwError.

For the longest time, I never appreciated that throw and catch in any Either-like block are actually return and (>>=) in the dual Either monad (the one with the type variables swapped). I missed this because the types of catchError and throwError I would see in every library were never made fully polymorphic:
-- The non-monad-transformer versions, for simplicity
throwError :: e -> Either e r
catchError :: Either e r -> (e -> Either e r) -> Either e r
If I generalized the two above function signatures to be fully polymorphic, you'd have something that looks remarkably like the signatures for return and (>>=):
throwE/return :: a -> Either a r
catchE/(>>=)  :: Either a r -> (a -> Either b r) -> Either b r
In fact, if you just used a newtype to swap the type variables, you'd have something that works exactly like a monad:
newtype EitherR r e = EitherR { runEitherR :: Either e r }

return :: a -> EitherR r a
return :: a ->         m a
return = throwE -- except with newtypes

(>>=)  :: EitherR r a -> (a -> EitherR r b) -> EitherR r b
(>>=)  ::         m a -> (a ->         m b) ->         m b
(>>=) = catchE -- except with newtypes
So now we have the ability to not only throw and catch exceptional values, but to even change the type of the exceptional value. I like to call this the "success" monad (after ehird's term for it, since he was the one who pointed out this awesome symmetry to me). In this monad, each statement is an exception handler and the monad terminates when you "throw" a successful result:
runEitherRT $ do
    e2 <- ioExceptionHandler e1
    bool <- arithmeticExceptionhandler e2
    when bool $ lift $ putStrLn "DEBUG: Something happened"
    succeed () -- the dual of "throw"
    -- Statements beyond here will not be evaluated
    notEvaluated
If any handler succeeds (by returning a Right result), the entire monad terminates with the successful result.


Laws for throwE and catchE


In fact, since throwE and catchE are just return and (>>=) in disguise, we can use the monad laws to state the behavior that throwE and catchE are expected to satisfy:
-- return x >>= f = f x
throwE x `catchE` f = f x

-- m >>= return = m
m `catchE` throwE = m

-- (m >>= f) >>= g = m >>= (\x -> f x >>= g)
(m `catchE` f) `catchE` g = m `catchE` (\e -> f e `catchE` g)
If you spend a moment to think about all of those laws they all make intuitive sense. They each say, respectively:
  • If you throw a value, the catch block processes it
  • If your catch block just rethrows the error, it's the same as not catching in the first place
  • Catch blocks are associative
These laws exactly match our intuition for how throwE and catchE should behave!

If all of this excites you, then you'll love Data.EitherR, which is one of the modules exported by Control.Error. It provides all the machinery necessary for working in the EitherR monad and also provides the convenience functions for the generalized throw and catch:
throwE :: e -> Either e r
catchE :: Either a r -> (a -> Either b r) -> Either b r
These are just newtype wrappers around the return and (>>=) in the success monad. If all you want is throwE and catchE, then you never need to actually use EitherR directly and the above functions are sufficient.

Also, technically you could just use Either itself to implement the "success" monad by just reversing the convention for the type variables and use throwE and catchE to implement the error monad. However, the main reason you might not want to do that is to not confuse other people who are familiar with the traditional convention for Either. People will understand your code better if you stick to Either for code that terminates on errors and EitherR for code that terminates on successes.

Additionally, if you want to actually switch between both monads in your code and use ordinary do notation for each one, then the EitherR and EitherRT newtypes will be absolutely essential to convert between both monads.


Simplicity


There's one last thing that's nice about Control.Error: it re-exports Data.Either and Data.Maybe. It's just one nice little feature that helps trim down your import list.

I always enjoy coming up with compelling and interesting Haskell code examples in as few lines as possible, and keeping the import list clean is just one of those "nice" things when showing people from other languages how clean and simple Haskell can be. My rule of thumb is that if you really want to impress other people with Haskell, then show them how much power you can fit clearly and expressively into a single 80x24 terminal window, including all imports and extensions.


Conclusions


Hopefully this library will help shrink up a lot of people's utility modules and encourage the Either/EitherT style of error-handling. Let me know if there is a feature that you think is missing from this library, because the goal of this library is to make error handling as simple and painless as possible.

Sunday, July 1, 2012

pipes-2.1 and index-core-1.0 - Indexed types

This new release marks a major upgrade to the pipes Frame implementation that unifies the entire implementation within a single type using indexed types. The most important benefits of this are:
  • Frames are now unified into a single type
  • The newtype is gone
  • do notation can be rebound to work with monads on indexed types.
Now, Frame code is virtually indistinguishable from Pipe code. For example, the take' Frame looks like this:
take' n = do
    replicateMR_ $ do
        a <- await
        yield a
    close
    liftU $ putStrLn "You shall not pass"
However, before I continue, I want to make good on a promise to notify readers of my blog that I made some mistakes in my previous post on bugs in conduit. I corrected some of the mistakes in the original post and in Michael's response post he explained several of his design decisions.


Indexed types


To upgrade pipes to use indexed types, I wrote a base library that provides the foundation for indexed types: index-core-1.0, which you can find here. This library is strongly based on the functional pearl "Kleisli arrows of outrageous fortune" by Conor McBride.

The other significant alternative for indexed monads was the indexed package, but I chose not to use it for several reasons.

First, Conor's approach is strictly more powerful, allowing computations that can end in multiple states while still preserving type safety. Although pipes does not make use of this facility, I preferred using a stronger framework for indexed types as a dependency to encourage others to use his approach.

Second, Conor's approach makes it MUCH easier to translate ordinary types into indexed types mechanically. The best example of this is the "indexed free monad transformer" (what a mouthful) used to implement Frame, which is the indexed equivalent to the free monad transformer used for Pipes. However, not only are the types mechanically translatable, but so is the code, which is almost indistinguishable from unindexed code.

However, I make one important deviation from Conor's approach, which is terminology. Unfortunately, Conor's paper uses the term "indexed monads" to refer to the conventional approach represented in the indexed package, and uses the term "monads on indexed types" to describe his approach, and it actually took me my third read-through of his paper before I even caught that distinction at all. As a result, I decided to use more distinctive terminology to distinguish them so I prefer to use the term "indexed monad" to refer to Conor's approach and "restricted monad" to refer to the more restrictive approach found in indexed. All the documentation in index-core follows this naming convention.

Besides providing indexed and restricted monads, index-core also provides the tools to switch between ordinary monads and restricted monads. For example, you can always upgrade an ordinary monad to a restricted monad using the u (for 'u'pgrade) function, and downgrade it again using unU. This is useful when you enable do notation for indexed monads, but you still want to also use do notation for ordinary monads:
-- Upgrading IO to work in an indexed do block
unU $ do
    str <- u getLine
    u $ putStrLn str
Alternatively, you can choose to not enable indexed do notation and instead downgrade index-preserving restricted monads to ordinary monads using the D (for 'D'owngrade) function and upgrade it again using unD:
-- Downgrading Frame to work in an ordinary do block
unD $ do
    a <- D await
    D $ yield a
However, if you took the latter approach, you would need to use the indexed monad bind directly anytime you use an operation that changes the index (such as the close operation). It's up to you which approach you prefer.

Also, index-core exports restricted monad versions of the functions found in Control.Monad, except with 'R'-suffixed names (for 'R'estricted), like foreverR and replicateMR.

Note that index-core does not yet export a restricted Applicative class. It can be done, but I just haven't gotten a chance to do it, so all the examples in this post and in the pipes tutorial don't yet use the Applicative style.


Sugar


Frames are now on par with Pipes in terms of elegance and syntactic sugar. All of the complaints raised for pipes-2.0 are fixed by this upgrade. There is no two-stage monad any longer, meaning that you can do elegant things like:
strict = toList >>= fromList
... and the implementation of toList is now very succinct, exactly the way you'd expect to write it:
toList = do
    a' <- awaitF
    case a' of
        Nothing -> return []
        Just a  -> do
            as <- toList
            return (a:as)
Note that I've switched the roles of awaitF and await. This is so that Frame await is now a drop-in replacement for Pipe await and also to bring pipess in line with pipes-core, notationally, which uses await to denote the default request which does not return a Maybe. awaitF becomes the Frame equivalent to tryAwait from pipes-core.


Miscellany


The hierarchy has been reorganized a bit. I've moved Frames to Control.Frame because they are no longer built on top of the Pipe type. Also, now I've moved the tutorials to Control.Pipe.Tutorial and Control.Frame.Tutorial and made a lot of updates to the Frame tutorial, especially the strictness part, which now does a really good job of demonstrating how you can be selectively strict in the input.

Now that conduit is also expanding its documentation, I thought it would be a good time to choose a better module hierarchy to set as an example for other libraries. I think that a simple ".Tutorial" extension for a module's corresponding tutorial is perhaps the most flexible and straightforward way to navigate to the tutorial for a particularly module. The advantage of splitting the tutorial module from the actual API module is so that you can feel free to write a long tutorial without worrying about getting in the way of navigating the API.


The Streaming Haskell Group


Besides cleaning up the Frame implementation, there's another reason I'm switching to indexed types. Recently, Michael founded the streaming-haskell group and after a some of discussion with the other members of the Streaming Haskell group, I'm beginning to believe that the final type that we will end up agreeing on will require some form of indexed types, so this release is my effort to lay the groundwork for an indexed type ecosystem so that people will be less afraid to experiment with indexed types. With this release I hope to give people:
  • The tools to work with indexed types
  • An instructive library to consult as a reference
  • A motivating example for the practical benefits of indexed types
In a future post I will discuss in greater length why indexed types are the future of more advanced iteratee implementations, perhaps even for ones that rely on implementation hiding.


Future Directions


As many people know, I make a big deal about avoiding implementation hiding when enforcing class laws. I believe strongly in keeping the entire library correct by construction so that it is easy for other people to extend without worrying about breaking any laws. At the same time, I strive to simplify the interface the user must understand to use the library and I think this version is a big step in that direction.

I'm a big advocate for free monads and one of the big benefits of free monads is the power to choose your observation function. Historically, runPipe (or runFrame) has been the canonical observation function, but I believe people's creativity has been filtered through the lens of what they could accomplish through runPipe and they don't think of all the things they could do by writing their own observation functions. By sticking to strong guarantees I try to promote experimentation in the observation function and in the future I will write a post with some motivating examples to get people to think outside the runPipe box.

Now that I'm happier with the state of the Frame API you will soon see a library of utility functions included in upcoming minor releases so that you don't have to write your own (although, it's very easy to do so now!). Also, parsing is a major goal in the near future and a complete parsing implementation is my target for version 3.0.

As far as exceptions go, you can still have the same exception-handling power that conduit does by just layering Frame on top of ResourceT, but I choose not to do so yet because I have other ideas for how to implement it. Because advanced Frame users can in theory implement exception handling using the API I currently present, it's lower on my priority list than parsing, which is non-trivial. However, if you disagree with my priorities, feel free to let me know. I always strive to follow feedback from users.