Haskell for all: May 2024

Prefer do notation over Applicative operators when assembling records

This is a short post explaining why you should prefer do notation when assembling a record, instead of using Applicative operators (i.e. (<$>)/(<*>)). This advice applies both for type constructors that implement Monad (e.g. IO) and also for type constructors that implement Applicative but not Monad (e.g. the Parser type constructor from the optparse-applicative package). The only difference is that in the latter case you would need to enable the ApplicativeDo language extension.

The guidance is pretty simple. Instead of doing this:

data Person = Person
    { firstName :: String
    , lastName :: String
    }

getPerson :: IO Person
getPerson = Person <$> getLine <*> getLine

… you should do this:

{-# LANGUAGE RecordWildCards #-}

{-# OPTIONS_GHC -Werror=missing-fields #-}

data Person = Person
    { firstName :: String
    , lastName :: String
    }

getPerson :: IO Person
getPerson = do
    firstName <- getLine
    lastName <- getLine
    return Person{..}

Why is the latter version better? There are a few reasons.

Ergonomics

It’s more ergonomic to assemble a record using do notation because you’re less pressured to try to cram all the logic into a single expression.

For example, suppose we wanted to explicitly prompt the user to enter their first and last name. The typical way people would do extend the former example using Applicative operators would be something like this:

getPerson :: IO Person
getPerson =
        Person
    <$> (putStrLn "Enter your first name:" *> getLine)
    <*> (putStrLn "Enter your last name:"  *> getLine)

The expression gets so large that you end up having to split it over multiple lines, but if we’re already splitting it over multiple lines then why not use do notation?

getPerson :: IO Person
getPerson = do
    putStrLn "Enter your first name:"
    firstName <- getLine

    putStrLn "Enter your last name:"
    lastName <- getLine

    return Person{..}

Wow, much clearer! Also, the version using do notation doesn’t require that the reader is familiar with all of the Applicative operators, so it’s more approachable to Haskell beginners.

Order insensitivity

Suppose we take that last example and then change the Person type to reorder the two fields:

data Person = Person
    { lastName :: String
    , firstName :: String
    }

… then the former version using Applicative operators would silently break: the first name and last name would now be read in the wrong order. The latter version (using do notation) is unaffected.

More generally, the approach using do notation never breaks or changes its behavior if you reorder the fields in the datatype definition. It’s completely order-insensitive.

Better error messages

If you add a new argument to the Person constructor, like this:

data Person = Person
    { alive :: Bool
    , firstName :: String
    , lastName :: String
    }

… and you don’t make any other changes to the code then the former version will produce two error messages, neither of which is great:

Example.hs:
    • Couldn't match type ‘String -> Person’ with ‘Person’
      Expected: Bool -> String -> Person
        Actual: Bool -> String -> String -> Person
    • Probable cause: ‘Person’ is applied to too few arguments
      In the first argument of ‘(<$>)’, namely ‘Person’
      In the first argument of ‘(<*>)’, namely ‘Person <$> getLine’
      In the expression: Person <$> getLine <*> getLine
  |
  | getPerson = Person <$> getLine <*> getLine
  |             ^^^^^^

Example.hs:
    • Couldn't match type ‘[Char]’ with ‘Bool’
      Expected: IO Bool
        Actual: IO String
    • In the second argument of ‘(<$>)’, namely ‘getLine’
      In the first argument of ‘(<*>)’, namely ‘Person <$> getLine’
      In the expression: Person <$> getLine <*> getLine
  |
  | getPerson = Person <$> getLine <*> getLine
  |                        ^^^^^^^

… whereas the latter version produces a much more direct error message:

Example.hs:…
    • Fields of ‘Person’ not initialised:
        alive :: Bool
    • In the first argument of ‘return’, namely ‘Person {..}’
      In a stmt of a 'do' block: return Person {..}
      In the expression:
        do putStrLn "Enter your first name: "
           firstName <- getLine
           putStrLn "Enter your last name: "
           lastName <- getLine
           ....
   |
   |     return Person{..}
   |            ^^^^^^^^^^
 ^^^^^^^^^^

… and that error message more clearly suggests to the developer what needs to be fixed: the alive field needs to be initialized. The developer doesn’t have to understand or reason about curried function types to fix things.

Caveats

This advice obviously only applies for datatypes that are defined using record syntax. The approach I’m advocating here doesn’t work at all for datatypes with positional arguments (or arbitrary functions).

However, this advice does still apply for type constructors that are Applicatives and not Monads; you just need to enable the ApplicativeDo language extension. For example, this means that you can use this same trick for defining command-line Parsers from the optparse-applicative package:

{-# LANGUAGE ApplicativeDo #-}
{-# LANGUAGE RecordWildCards #-}

{-# OPTIONS_GHC -Werror=missing-fields #-}

import Options.Applicative (Parser, ParserInfo)

import qualified Options.Applicative as Options

data Person = Person
    { firstName :: String
    , lastName :: String
    } deriving (Show)

parsePerson :: Parser Person
parsePerson = do
    firstName <- Options.strOption
        (   Options.long "first-name"
        <>  Options.help "Your first name"
        <>  Options.metavar "NAME"
        )

    lastName <- Options.strOption
        (   Options.long "last-name"
        <>  Options.help "Your last name"
        <>  Options.metavar "NAME"
        )

    return Person{..}

parserInfo :: ParserInfo Person
parserInfo =
    Options.info parsePerson
        (Options.progDesc "Parse and display a person's first and last name")

main :: IO ()
main = do
    person <- Options.execParser parserInfo

    print person

All error messages are necessarily bad to some degree

This is something I feel like enough people don’t appreciate. One of the ways I like to explain this is by this old tweet of mine:

The evolution of an error message:

No error message

A one-line message

“Expected: … / Actual: …”

“Here’s what went wrong: …”

“Here’s what you should do: …”

I automated away what you should do

The invalid state is no longer representable

One of the common gripes I will hear about error messages is that they don’t tell the user what to do, but if you stop to think about it: if the error message knew exactly what you were supposed to do instead then your tool could just fix it for you (by automatically doing the right thing instead).

“But wait!”, you might say, “sometimes an error message can’t automatically fix the problem for you because there’s not necessarily a right or obvious way to fix the problem or the user’s intent is not clear.” Yes, exactly, which brings us back to the original point:

Error messages are necessarily bad because they cannot anticipate what you should have done instead. If an error message could read your mind then they’d eventually evolve into something better than an error message. This creates a selection bias where the only remaining error messages are the ones that can’t read your mind.

Haskell for all

Monday, May 20, 2024