Wednesday, July 11, 2012

Breaking from a loop

This post describes how to break from a code block by using EitherT/MaybeT instead of ContT. This technique isn't new, and has already been described at least once before here. However, there is still some weird culture of teaching ContT for exiting from loops, which is incredibly over-kill and bad practice because it makes beginners think it's complicated when it's not.


The Trick


Exiting from a code block is ridiculously simple:
import Control.Error -- from the 'errors' package
import Control.Monad
import Control.Monad.Trans

exit = left -- or rename 'exit' to 'break' if you prefer

main = runEitherT $ forever $ do
    str <- lift getLine
    when (str == "exit") $ exit ()
I find this significantly easier to understand than the equivalent ContT version. Normally when you use Either/EitherT, you terminate on the first Left you encounter. All that exit does is return its argument wrapped in a Left, halting the surrounding EitherT block.

You can even use the value returned from the block, which will be a Left if the block exited with a exit statement, or a Right if the block terminated normally:
main = do
    e <- runEitherT $ forever $ do
        str <- lift getLine
        when (str == "exit") $ exit ()
    case e of
        Left  a -> ... -- Loop terminated with an 'exit'
        Right b -> ... -- Loop terminated normally
This approach is incredibly simple and lets you distinguish how the loop terminates. If you don't care about distinguishing the two code paths and they return the same result type, then just use:
main = do
    r <- fmap (either id id) $ runEitherT $ ...
    ...
... which also works if the loop is non-terminating.

If you want to exit without returning any value, just use MaybeT instead:
exit = mzero

main = runMaybeT $ forever $ do
    str <- lift getLine
    when (str == "exit") exit
Or you could stick with EitherT and just use exit (), as the first example did, since Either () r is isomorphic to Maybe r.


EitherT vs. ErrorT


You might wonder why people historically teach ContT instead of EitherT for exiting from loops. I can only guess that they did this because of the terrible EitherT implementation in the transformers package that goes by the name ErrorT. The ErrorT implementation makes two major mistakes:
  • It constrains the Left value with the Error class.
  • The name mistakenly misleads users to believe that the Left value is only useful for returning errors.
The first mistake prevents you from exiting with any old value unless you first make it an instance of the Error class. The second mistake insidiously misleads new Haskell programmers to miss the opportunity to exit from EitherT blocks using ordinary non-exceptional values and then they get led astray by people peddling ContT.

The above examples worked because they didn't use ErrorT at all and instead used the superior implementation in the either package which doesn't constrain the behavior of the Left value, either practically or semantically. This is why you should always give data types neutral names to encourage people to use them in ways you didn't anticipate.


Nested blocks


You might think you need ContT if you want to do anything more complicated such as multiple levels of nesting and exiting. However, if you thought so you'd be wrong! Nesting multiple EitherT blocks works perfectly fine:
-- I'm too lazy to add type signatures for this
{-# LANGUAGE NoMonomorphismRestriction #-}

import Control.Error
import Control.Monad
import Control.Monad.Trans

exit = left

main =
    runEitherT $ forM_ [1..3] $ \i ->
        runEitherT $ forM_ [4..6] $ \j -> do
            when (j == 6) $ liftInner $ exit ()
            liftIO $ print (i, j)
  where liftInner = id
        liftOuter = lift
        liftIO    = lift . lift
Let's check it out:
$ ./break
(1,4)
(1,5)
(2,4)
(2,5)
(3,4)
(3,5)
I can choose to break out of two nested levels by replacing liftInner with liftOuter, which just lifts the exit call to the surrounding EitherT block:
            ...
            when (j == 6) $ liftOuter $ exit ()
            ...
$ ./break
(1, 4)
(1, 5)
Nice! Mainstream languages require extended syntax to let you break out of multiple nested loops. Haskell does it using ordinary functions. I really can't convey how amazing that is without being entirely unprofessional.

16 comments:

  1. "You might wonder why people historically teach ContT instead of EitherT for exiting from loops."

    My guess is because continuations in CS were discovered earlier than monads, and historically breaking loops has been associated with call/cc.

    It's a nice meta-observation, btw.

    ReplyDelete
  2. Thank you for writing this article!

    As a Haskell noob, learning idioms is quite useful to me. It took an appreciable effort for me to learn about the the types and functions in the three modules in your examples and to wrap my head around MonadTrans. But I think that what I learned made it worthwhile.

    ReplyDelete
    Replies
    1. If you are new to monad transformers, the best tutorial is the following paper:

      http://www.grabmueller.de/martin/www/pub/Transformers.pdf

      It's really well written and targeted towards "noobs". :)

      Delete
  3. In your first code block, there is a typo. The line reading `exit = left` should be `exit = Left`.

    ReplyDelete
    Replies
    1. It's actually supposed to be lowercase because it is in the `EitherT` monad and not the `Either` monad. The `either` package provides the `left` combinator, defined as:

      left :: (Monad m) => e -> EitherT e m r
      left e = EitherT (return (Left e))

      However, if it were in the `Either` monad you'd be correct. :)

      Delete
    2. Thanks. That makes a lot more sense to me now.

      Delete
  4. Also, the box for the Maybe case needs `list getLine` instead of `getLine`.

    ReplyDelete
    Replies
    1. You're right! I fixed it. Thanks for pointing that out! :)

      Delete
  5. This is exactly what i was looking for. If you have an cumulative result, let's say a list, how do you modify it on each function call?

    ReplyDelete
    Replies
    1. That's what folds, filters, and maps are designed for. Building a result from iterating over a collection is an example where I would recommend the functional style over an imperative loop.

      Delete
    2. How do you combine a fold with this solution (Either) to stop folding, in case you find an chain action that can't go further, just like an exception?

      Delete
    3. So if you really want to stick to the `EitherT` solution then you combine it with `WriterT` in a monad transformer stack. The order would be `EitherT e (WriterT w m) r`. The idea is that you use `WriterT` to accumulate values into the fold and then use `EitherT` to break out of the loop when you're done accumulating. It would look something like this:

      loop $ do
      lift $ tell
      when someCondition $ exit ()
      ...

      Delete
    4. Hope i'm not disturbing you. Would you recommend other approach than the EitherT monad? I can think of a few cases which go perfectly with what you just wrote:
      - Dependency graph resolution where nodes are IO actions
      - Handling of data which doesn't fit in memory
      - Transactions (database, services)

      Delete
  6. You're not disturbing me! :)

    So for dependency graph resolution I would recommend you use `ListT` from my `pipes` library.

    For handling data which doesn't fit in memory I would also recommend `pipes`.

    For transactions, I'm not sure exactly what you mean. You would need to elaborate more.

    ReplyDelete
    Replies
    1. I was just reading your pipes tutorial and realized that. Nevermind about services. Thank you very much, Gabriel.

      Delete