Wednesday, July 11, 2012

Breaking from a loop

This post describes how to break from a code block by using EitherT/MaybeT instead of ContT. This technique isn't new, and has already been described at least once before here. However, there is still some weird culture of teaching ContT for exiting from loops, which is incredibly over-kill and bad practice because it makes beginners think it's complicated when it's not.


The Trick


Exiting from a code block is ridiculously simple:
import Control.Error -- from the 'errors' package
import Control.Monad
import Control.Monad.Trans

exit = left -- or rename 'exit' to 'break' if you prefer

main = runEitherT $ forever $ do
    str <- lift getLine
    when (str == "exit") $ exit ()
I find this significantly easier to understand than the equivalent ContT version. Normally when you use Either/EitherT, you terminate on the first Left you encounter. All that exit does is return its argument wrapped in a Left, halting the surrounding EitherT block.

You can even use the value returned from the block, which will be a Left if the block exited with a exit statement, or a Right if the block terminated normally:
main = do
    e <- runEitherT $ forever $ do
        str <- lift getLine
        when (str == "exit") $ exit ()
    case e of
        Left  a -> ... -- Loop terminated with an 'exit'
        Right b -> ... -- Loop terminated normally
This approach is incredibly simple and lets you distinguish how the loop terminates. If you don't care about distinguishing the two code paths and they return the same result type, then just use:
main = do
    r <- fmap (either id id) $ runEitherT $ ...
    ...
... which also works if the loop is non-terminating.

If you want to exit without returning any value, just use MaybeT instead:
exit = mzero

main = runMaybeT $ forever $ do
    str <- getLine
    when (str == "exit") exit
Or you could stick with EitherT and just use exit (), as the first example did, since Either () r is isomorphic to Maybe r.


EitherT vs. ErrorT


You might wonder why people historically teach ContT instead of EitherT for exiting from loops. I can only guess that they did this because of the terrible EitherT implementation in the transformers package that goes by the name ErrorT. The ErrorT implementation makes two major mistakes:
  • It constrains the Left value with the Error class.
  • The name mistakenly misleads users to believe that the Left value is only useful for returning errors.
The first mistake prevents you from exiting with any old value unless you first make it an instance of the Error class. The second mistake insidiously misleads new Haskell programmers to miss the opportunity to exit from EitherT blocks using ordinary non-exceptional values and then they get led astray by people peddling ContT.

The above examples worked because they didn't use ErrorT at all and instead used the superior implementation in the either package which doesn't constrain the behavior of the Left value, either practically or semantically. This is why you should always give data types neutral names to encourage people to use them in ways you didn't anticipate.


Nested blocks


You might think you need ContT if you want to do anything more complicated such as multiple levels of nesting and exiting. However, if you thought so you'd be wrong! Nesting multiple EitherT blocks works perfectly fine:
-- I'm too lazy to add type signatures for this
{-# LANGUAGE NoMonomorphismRestriction #-}

import Control.Error
import Control.Monad
import Control.Monad.Trans

exit = left

main =
    runEitherT $ forM_ [1..3] $ \i ->
        runEitherT $ forM_ [4..6] $ \j -> do
            when (j == 6) $ liftInner $ exit ()
            liftIO $ print (i, j)
  where liftInner = id
        liftOuter = lift
        liftIO    = lift . lift
Let's check it out:
$ ./break
(1,4)
(1,5)
(2,4)
(2,5)
(3,4)
(3,5)
I can choose to break out of two nested levels by replacing liftInner with liftOuter, which just lifts the exit call to the surrounding EitherT block:
            ...
            when (j == 6) $ liftOuter $ exit ()
            ...
$ ./break
(1, 4)
(1, 5)
Nice! Mainstream languages require extended syntax to let you break out of multiple nested loops. Haskell does it using ordinary functions. I really can't convey how amazing that is without being entirely unprofessional.

3 comments:

  1. "You might wonder why people historically teach ContT instead of EitherT for exiting from loops."

    My guess is because continuations in CS were discovered earlier than monads, and historically breaking loops has been associated with call/cc.

    It's a nice meta-observation, btw.

    ReplyDelete
  2. Thank you for writing this article!

    As a Haskell noob, learning idioms is quite useful to me. It took an appreciable effort for me to learn about the the types and functions in the three modules in your examples and to wrap my head around MonadTrans. But I think that what I learned made it worthwhile.

    ReplyDelete
    Replies
    1. If you are new to monad transformers, the best tutorial is the following paper:

      http://www.grabmueller.de/martin/www/pub/Transformers.pdf

      It's really well written and targeted towards "noobs". :)

      Delete