Wednesday, May 5, 2021

The trick to avoid deeply-nested error-handling code

either-trick

This post documents a small trick that I use to avoid deeply-nested error-handling code. This trick is a common piece of Haskell folklore that many people either learn from others or figure out on their own, but I’m not sure what the official name of this trick is (so I had difficulty searching for prior art explaining this trick). However, I’ve taught this trick to others enough times that I think it merits a blog post of its own.

This post assumes some familiarity with Haskell’s Either type and do notation, but the Appendix at the end of the post will walk through all of the details using equational reasoning if you’re having trouble following along with how things work.

The motivating example

Let’s begin from the following contrived Either-based example that uses deeply nested error-handling:

{-# LANGUAGE NamedFieldPuns #-}

import Text.Read (readMaybe)

data Person = Person { age :: Int, alive :: Bool } deriving (Show)

example :: String -> String -> Either String Person
example ageString aliveString = do
    case readMaybe ageString of
        Nothing -> do
            Left "Invalid age string"

        Just age -> do
            if age < 0
                then do
                    Left "Negative age"

                else do
                    case readMaybe aliveString of
                        Nothing -> do
                            Left "Invalid alive string"

                        Just alive -> do
                            return Person{ age, alive }

… which we can use like this:

>>> example "24" "True"
Right (Person {age = 24, alive = True})

>>> example "24" "true"
Left "Invalid alive string"

>>> example "" "True"
Left "Invalid age string"

>>> example "-5" "True"
Left "Negative age"

Notice how the above coding style increases the nesting / indentation level every time we parse or validate the input in some way. We could conserve indentation by using 2-space indents or indenting only once for each level of nesting:

{-# LANGUAGE NamedFieldPuns #-}

import Text.Read (readMaybe)

data Person = Person { age :: Int, alive :: Bool } deriving (Show)

example :: String -> String -> Either String Person
example ageString aliveString = case readMaybe ageString of
  Nothing  -> Left "Invalid age string"
  Just age -> if age < 0
    then Left "Negative age"
    else case readMaybe aliveString of
      Nothing    -> Left "Invalid alive string"
      Just alive -> return Person{ age, alive }

However, I think most people writing code like that would prefer to keep the indentation level stable, no matter how many validations the code requires.

The trick

Fortunately, we can avoid nesting the code with the following change:

{-# LANGUAGE NamedFieldPuns #-}

import Text.Read (readMaybe)

data Person = Person { age :: Int, alive :: Bool } deriving (Show)

example :: String -> String -> Either String Person
example ageString aliveString = do
    age <- case readMaybe ageString of
        Nothing  -> Left "Invalid age string"
        Just age -> return age

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe aliveString of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

Here we make use of several useful properties:

  • return in Haskell does not short-circuit or exit from the surrounding code

    In fact, some Haskell programmers prefer to use pure (a synonym for return) to better convey the absence of short-circuiting behavior:

    {-# LANGUAGE NamedFieldPuns #-}
    
    import Text.Read (readMaybe)
    
    data Person = Person { age :: Int, alive :: Bool } deriving (Show)
    
    example :: String -> String -> Either String Person
    example ageString aliveString = do
        age <- case readMaybe ageString of
            Nothing  -> Left "Invalid age string"
            Just age -> pure age
    
        if age < 0
            then Left "Negative age"
            else pure ()
    
        alive <- case readMaybe aliveString of
            Nothing    -> Left "Invalid alive string"
            Just alive -> pure alive
    
        pure Person{ age, alive }
  • Left does short-circuit from the surrounding code

    In fact, we can define a synonym for Left named throw if we want to better convey the presence of short-circuiting behavior::

    {-# LANGUAGE NamedFieldPuns #-}
    
    import Text.Read (readMaybe)
    
    data Person = Person { age :: Int, alive :: Bool } deriving (Show)
    
    example :: String -> String -> Either String Person
    example ageString aliveString = do
        age <- case readMaybe ageString of
            Nothing  -> throw "Invalid age string"
            Just age -> pure age
    
        if age < 0
            then throw "Negative age"
            else pure ()
    
        alive <- case readMaybe aliveString of
            Nothing    -> throw "Invalid alive string"
            Just alive -> pure alive
    
        pure Person{ age, alive }
    
    throw :: String -> Either String a
    throw = Left
  • Left / throw “return” any type of value

    If you look at the type of throw, the “return” type is a polymorphic type a because throw short-circuits (and therefore doesn’t need to return a real value of that type).

    This is why the type checker doesn’t complain when we do this:

    age <- case readMaybe ageString of
        Nothing  -> throw "Invalid age string"
        Just age -> pure age

    … because both branches of the case expression type-check as an expression that “return”s an Int. The left branch type-checks as a branch that returns an Int because it cheats and never needs to return anything and the right branch returns a bona-fide Int (the age).

    We can make this explicit by giving both branches of the case expression an explicit type annotation:

    age <- case readMaybe ageString of
        Nothing  -> throw "Invalid age string" :: Either String Int
        Just age -> pure age                   :: Either String Int

    This means that both branches of a case expression will always share the same return type if at least one branch is a Left / throw. Similarly, both branches of an if expression will share the same return type if at least one branch is a Left / throw:

    if age < 0
        then throw "Negative age" :: Either String ()
        else pure ()              :: Either String ()
  • We can return a value from a case expression

    New Haskell programmers might not realize that case expressions can return a value (just like any other expression), which means that their result can be stored as a new value within the surrounding scope:

    age <- case readMaybe ageString of
        Nothing  -> throw "Invalid age string" :: Either String Int
        Just age -> pure age                   :: Either String Int

    The type-checker doesn’t mind that only the second branch returns a valid age because it knows that the outer age is unreachable if the first branch short-circuits. This understanding is not built into the compiler, but is rather an emergent property of how do notation works for Either. See the Appendix for a fully-worked equational reasoning example showing why this works.

Combinators

You can build upon this trick by defining helpful utility functions to simplify things further. For example, I sometimes like to define the following helper function:

orDie :: Maybe a -> String -> Either String a
Just a  `orDie` _      = return a
Nothing `orDie` string = Left string

{- Equivalent, more explicit, implementation:

maybe `orDie` string =
    case maybe of
        Nothing -> Left string
        Just a  -> return a
-}

… which you can use like this:

{-# LANGUAGE NamedFieldPuns #-}

import Text.Read (readMaybe)

data Person = Person { age :: Int, alive :: Bool } deriving (Show)

orDie :: Maybe a -> String -> Either String a
Just a  `orDie` _      = Right a
Nothing `orDie` string = Left string

example :: String -> String -> Either String Person
example ageString aliveString = do
    age <- readMaybe ageString `orDie` "Invalid age string"

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- readMaybe aliveString `orDie` "Invalid alive string"

    return Person{ age, alive }

… which is even more clear (in my opinion).

Conclusion

That is the entirety of the trick. You can return values from case expressions to avoid deeply-nesting your Either code, or you can define utility functions (such as orDie) which do essentially the same thing.

This trick applies equally well to any other Monad that supports some notion of short-circuiting on failure, such as ExceptT (from transformers / mtl). The only essential ingredient is some throw-like function that short-circuits and therefore has a polymorphic return type.

Appendix

You can build a better intuition for why this works using equational reasoning. We’ll begin from an example usage of our function and at each step of the process we will either desugar the code or substitute one subexpression with another equal subexpression.

In all of the examples, we’ll begin from this definition for the example function:

example :: String -> String -> Either String Person
example ageString aliveString = do
    age <- case readMaybe ageString of
        Nothing  -> Left "Invalid age string"
        Just age -> return age

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe aliveString of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

Let’s first begin with the example that fails the most quickly:

example "" "True"

-- Substitute `example` with its definition:
--
-- … replacing `ageString` with `""`
--
-- … replacing `aliveString` with `"True"`
do  age <- case readMaybe "" of
        Nothing  -> Left "Invalid age string"
        Just age -> return age

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

-- Definition of `readMaybe` (not shown):
--
-- (readMaybe "" :: Maybe Int) = Nothing
do  age <- case Nothing of
        Nothing  -> Left "Invalid age string"
        Just age -> return age

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

-- Simplify the first `case` expression
do  age <- Left "Invalid age string"

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

-- Desugar `do` notation one step
--
-- (do age <- m; n) = (m >>= \age -> n)
Left "Invalid age string" >>= \age ->
    do  if age < 0
            then Left "Negative age"
            else return ()

        alive <- case readMaybe "True" of
            Nothing    -> Left "Invalid alive string"
            Just alive -> return alive

        return Person{ age, alive }

-- Definition of `>>=` for `Either`
--
-- (Left x >>= _) = (Left x)
Left "Invalid age string"

… and we’re done! The key step is the last bit where we simplify (Left … >>= _) to (Left …), which is how Either short-circuits on failure. Because this simplification does not use the downstream code everything type-checks just fine even though we never supply a valid age.

For completeness, let’s also walk through the example where everything succeeds:

example "24" "True"

-- Substitute `example` with its definition:
--
-- … replacing `ageString` with `"24"`
--
-- … replacing `aliveString` with `"True"`
do  age <- case readMaybe "24" of
        Nothing  -> Left "Invalid age string"
        Just age -> return age

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

-- Definition of `readMaybe` (not shown):
--
-- (readMaybe "24" :: Maybe Int) = (Just 24)
do  age <- case Just 24 of
        Nothing  -> Left "Invalid age string"
        Just age -> return age

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

-- Simplify the first `case` expression
do  age <- return 24

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

-- return = Right
do  age <- Right 24

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

-- Desugar `do` notation one step
--
-- (do age <- m; n) = (m >>= \age -> n)
Right 24 >>= \age ->
    do  if age < 0
            then Left "Negative age"
            else return ()

        alive <- case readMaybe "True" of
            Nothing    -> Left "Invalid alive string"
            Just alive -> return alive

        return Person{ age, alive }

-- Definition of `>>=` for `Either`
--
-- (Right x >>= f) = (f x)
--
-- This means that we substitute `age` with `24`
do  if 24 < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age = 24, alive }

-- (24 < 0) = False
do  if False
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age = 24, alive }

-- (if False then l else r) = r
do  return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age = 24, alive }

-- return = Right
do  Right ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age = 24, alive }

-- (do m; n) = (do _ <- m; n)
do  _ <- Right ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age = 24, alive }

-- Desugar `do` notation one step
--
-- (do age <- m; n) = (m >>= \age -> n)
Right () >>= _ -> 
    do  alive <- case readMaybe "True" of
            Nothing    -> Left "Invalid alive string"
            Just alive -> return alive

        return Person{ age = 24, alive }

-- Definition of `>>=` for `Either`
--
-- (Right x >>= f) = (f x)
--
-- This means that we substitute `_` (unused) with `()`
do  alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age = 24, alive }

-- Definition of `readMaybe` (not shown):
--
-- (readMaybe "True" :: Maybe Bool) = (Just True)
do  alive <- case Just True of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age = 24, alive }

-- Simplify the `case` expression
do  alive <- return True

    return Person{ age = 24, alive }

-- return = Right
do  alive <- Right True

    return Person{ age = 24, alive }

-- Desugar `do` notation one step
--
-- (do age <- m; n) = (m >>= \age -> n)
Right True >>= \alive ->
    do  return Person{ age = 24, alive }

-- Definition of `>>=` for `Either`
--
-- (Right x >>= f) = (f x)
--
-- This means that we substitute `alive` with `True`
do  return Person{ age = 24, alive = True }

-- Desugar `do` notation
--
-- do m = m
return Person{ age = 24, alive = True }

-- return = Right
Right Person{ age = 24, alive = True }

As an exercise, you can try to extrapolate between those two examples to reason through what happens when we evaluate the remaining two examples which fail somewhere in between:

>>> example "24" "true"

>>> example "-5" "True"

6 comments:

  1. It's also nice to use `when` to test age:

    when (age < 0) $ Left "Negative age"

    ReplyDelete
    Replies
    1. Someone on Reddit pointed out that:

      guard (age < 0) `orDie` "Negative age"

      … would also work

      Delete
    2. should the guard arg be (age => 0)?

      Delete
  2. Rust calls this pattern "Early returns": https://doc.rust-lang.org/rust-by-example/error/result/early_returns.html

    ReplyDelete
  3. I'm surprised you didn't mention the note and hush functions in your errors package?

    https://hackage.haskell.org/package/errors-2.3.0/docs/Control-Error-Util.html#v:note

    I love these functions. note is the same as orDie.

    ReplyDelete