Wednesday, October 6, 2021

The "return a command" trick

return-command

This post illustrates a trick that I’ve taught a few times to minimize the “change surface” of a Haskell program. By “change surface” I mean the number of places Haskell code needs to be updated when adding a new feature.

The motivation

I’ll motivate the trick through the following example code for a simple REPL:

import Control.Applicative ((<|>))
import Data.Void (Void)
import Text.Megaparsec (Parsec)

import qualified Data.Char            as Char
import qualified System.IO            as IO
import qualified Text.Megaparsec      as Megaparsec
import qualified Text.Megaparsec.Char as Megaparsec

type Parser = Parsec Void String

data Command = Print String | Save FilePath String

parsePrint :: Parser Command
parsePrint = do
    Megaparsec.string "print"

    Megaparsec.space1

    string <- Megaparsec.takeRest

    return (Print string)

parseSave :: Parser Command
parseSave = do
    Megaparsec.string "save"

    Megaparsec.space1

    file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)

    Megaparsec.space1

    string <- Megaparsec.takeRest

    return (Save file string)

parseCommand :: Parser Command
parseCommand = parsePrint <|> parseSave

main :: IO ()
main = do
    putStr "> "

    eof <- IO.isEOF

    if eof
        then do
            putStrLn ""

        else do
            text <- getLine

            case Megaparsec.parse parseCommand "(input)" text of
                Left e -> do
                    putStr (Megaparsec.errorBundlePretty e)

                Right command -> do
                    case command of
                        Print string -> do
                            putStrLn string

                        Save file string -> do
                            writeFile file string

            main

This REPL supports two commands: print and save:

> print Hello, world!
Hello, world!
> save number.txt 42

print echoes back whatever string you supply and save writes the given string to a file.

Now suppose that we wanted to add a new load command to read and display the contents of a file. We would need to change our code in four places.

First, we would need to change the Command type to add a new Load constructor:

data Command = Print String | Save FilePath String | Load FilePath

Second, we would need to add a new parser to parse the load command:

parseLoad :: Parser Command
parseLoad = do
    Megaparsec.string "load"

    Megaparsec.space1

    file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)

    return (Load file)

Third, we would need to add this new parser to parseCommand:

parseCommand :: Parser Command
parseCommand = parsePrint <|> parseSave <|> parseLoad

Fourth, we would need to add logic for handling our new Load constructor in our main loop:

                    case command of
                        Print string -> do
                            putStrLn string

                        Save file string -> do
                            writeFile file string
                        
                        Load file -> do
                            string <- readFile file 

                            putStrLn string

I’m not a fan of this sort of program structure because the logic for how to handle each command isn’t all in one place. However, we can make a small change to our program structure that will not only simplify the code but also consolidate the logic for each command.

The trick

We can restructure our code by changing the type of all of our parsers from this:

parsePrint :: Parser Command

parseSave :: Parser Command

parseLoad :: Parser Command

parseCommand :: Parser Command

… to this:

parsePrint :: Parser (IO ())

parseSave :: Parser (IO ())

parseLoad :: Parser (IO ())

parseCommand :: Parser (IO ())

In other words, our parsers now return an actual command (i.e. IO ()) instead of returning a Command data structure that still needs to be interpreted.

This entails the following changes to the implementation of our three command parsers:

{-# LANGUAGE BlockArguments #-}

parsePrint :: Parser (IO ())
parsePrint = do
    Megaparsec.string "print"

    Megaparsec.space1

    string <- Megaparsec.takeRest

    return do
        putStrLn string

parseSave :: Parser (IO ())
parseSave = do
    Megaparsec.string "save"

    Megaparsec.space1

    file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)

    Megaparsec.space1

    string <- Megaparsec.takeRest

    return do
        writeFile file string

parseLoad :: Parser (IO ())
parseLoad = do
    Megaparsec.string "load"

    Megaparsec.space1

    file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)

    return do
        string <- readFile file
        
        putStrLn string

Now that each parser returns an IO () action, we no longer need the Command type, so we can delete the following datatype definition:

data Command = Print String | Save FilePath String | Load FilePath

Finally, our main loop gets much simpler, because we no longer need to specify how to handle each command. That means that instead of handling each Command constructor:

            case Megaparsec.parse parseCommand "(input)" text of
                Left e -> do
                    putStr (Megaparsec.errorBundlePretty e)

                Right command -> do
                    case command of
                        Print string -> do
                            putStrLn string

                        Save file string -> do
                            writeFile file string

                        Load file -> do
                            string <- readFile file

                            putStrLn string

… we just run whatever IO () command was parsed, like this:

            case Megaparsec.parse parseCommand "(input)" text of
                Left e -> do
                    putStr (Megaparsec.errorBundlePretty e)

                Right io -> do
                    io

Now we only need to make two changes to the code any time we add a new command. For example, all of the logic for the load command is right here:

parseLoad :: Parser (IO ())
parseLoad = do
    Megaparsec.string "load"

    Megaparsec.space1

    file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)

    return do
        string <- readFile file

        putStrLn string

… and here:

parseCommand :: Parser (IO ())
parseCommand = parsePrint <|> parseSave <|> parseLoad
                                         -- ↑

… and that’s it. We no longer need to change our REPL loop or add a new constructor to our Command datatype (because there is no Command datatype any longer).

What’s neat about this trick is that the IO () command we return has direct access to variables extracted by the corresponding Parser. For example:

parseLoad = do
    Megaparsec.string "load"

    Megaparsec.space1

    -- The `file` variable that we parse here …
    file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)

    return do
        -- … can be referenced by the corresponding `IO` action here
        string <- readFile file

        putStrLn string

There’s no need to pack our variables into a data structure and then unpack them again later on when we need to use them. This technique promotes tight and “vertically integrated” code where all of the logic is one place.

Final encodings

This trick is a special case of a more general trick known as a “final encoding” and the following post does a good job of explaining what “initial encoding” and “final encoding” mean:

To briefly explain initial and final encodings in my own words:

  • An “initial encoding” is one where you preserve as much information as possible in a data structure

    This keeps your options as open as possible since you haven’t specified what to do with the data yet

  • A “final encoding” is one where you encode information by how you intend to use it

    This tends to simplify your program if you know in advance how the information will be used

The initial example from this post was an initial encoding because each Parser returned a Command type which preserved as much information as possible without specifying what to do with it. The final example from this post was a final encoding because we encoded our commands by directly specifying what we planned to do with them.

Conclusion

This trick is not limited to returning IO actions from Parsers. For example, the following post illustrates a similar trick in the context of implementing configuration “wizards”:

… where a wizard has type IO (IO ()) (a command that returns another command).

More generally, you will naturally rediscover this trick if you stick to the principle of “keep each component’s logic all in one place”. In the above example the “components” were REPL commands, but this consolidation principle is useful for any sort of plugin-like system.

Appendix

Here is the complete code for the final version of the running example if you would like to test it out yourself:

{-# LANGUAGE BlockArguments #-}

import Control.Applicative ((<|>))
import Data.Void (Void)
import Text.Megaparsec (Parsec)

import qualified Data.Char            as Char
import qualified System.IO            as IO
import qualified Text.Megaparsec      as Megaparsec
import qualified Text.Megaparsec.Char as Megaparsec

type Parser = Parsec Void String

parsePrint :: Parser (IO ())
parsePrint = do
    Megaparsec.string "print"

    Megaparsec.space1

    string <- Megaparsec.takeRest

    return do
        putStrLn string

parseSave :: Parser (IO ())
parseSave = do
    Megaparsec.string "save"

    Megaparsec.space1

    file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)

    Megaparsec.space1

    string <- Megaparsec.takeRest

    return do
        writeFile file string

parseLoad :: Parser (IO ())
parseLoad = do
    Megaparsec.string "load"

    Megaparsec.space1

    file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)

    return do
        string <- readFile file

        putStrLn string

parseCommand :: Parser (IO ())
parseCommand = parsePrint <|> parseSave <|> parseLoad

main :: IO ()
main = do
    putStr "> "

    eof <- IO.isEOF

    if eof
        then do
            putStrLn ""

        else do
            text <- getLine

            case Megaparsec.parse parseCommand "(input)" text of
                Left e -> do
                    putStr (Megaparsec.errorBundlePretty e)

                Right io -> do
                    io

            main

3 comments:

  1. The trade-off here is that the parser's correctness is now harder to test. That seems worth it in this case, since the parser has been tested to be rock-solid.

    Zach Tellman has an interesting talk[1] about some similar ideas around the spectrum from data to execution, although it's geared towards imperative programmers and so leans in the opposite direction.

    [1] https://www.youtube.com/watch?v=3oQTSP4FngY

    ReplyDelete
  2. Is this similar to Elm's commands that it sends to the update method?

    ReplyDelete
  3. > Now we only need to make two changes to the code any time we add a new command.

    IIRC, one of the motivations behind "hooks" in React was similar: to reduce the number of places in the code that should be modified when adding some new behavior.

    ReplyDelete