This post illustrates a trick that I’ve taught a few times to minimize the “change surface” of a Haskell program. By “change surface” I mean the number of places Haskell code needs to be updated when adding a new feature.
The motivation
I’ll motivate the trick through the following example code for a simple REPL:
import Control.Applicative ((<|>))
import Data.Void (Void)
import Text.Megaparsec (Parsec)
import qualified Data.Char as Char
import qualified System.IO as IO
import qualified Text.Megaparsec as Megaparsec
import qualified Text.Megaparsec.Char as Megaparsec
type Parser = Parsec Void String
data Command = Print String | Save FilePath String
parsePrint :: Parser Command
parsePrint = do
Megaparsec.string "print"
Megaparsec.space1
string <- Megaparsec.takeRest
return (Print string)
parseSave :: Parser Command
parseSave = do
Megaparsec.string "save"
Megaparsec.space1
file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
Megaparsec.space1
string <- Megaparsec.takeRest
return (Save file string)
parseCommand :: Parser Command
parseCommand = parsePrint <|> parseSave
main :: IO ()
main = do
putStr "> "
eof <- IO.isEOF
if eof
then do
putStrLn ""
else do
text <- getLine
case Megaparsec.parse parseCommand "(input)" text of
Left e -> do
putStr (Megaparsec.errorBundlePretty e)
Right command -> do
case command of
Print string -> do
putStrLn string
Save file string -> do
writeFile file string
mainThis REPL supports two commands: print and save:
> print Hello, world!
Hello, world!
> save number.txt 42
print echoes back whatever string you supply and save writes the given string to a file.
Now suppose that we wanted to add a new load command to read and display the contents of a file. We would need to change our code in four places.
First, we would need to change the Command type to add a new Load constructor:
data Command = Print String | Save FilePath String | Load FilePathSecond, we would need to add a new parser to parse the load command:
parseLoad :: Parser Command
parseLoad = do
Megaparsec.string "load"
Megaparsec.space1
file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
return (Load file)Third, we would need to add this new parser to parseCommand:
parseCommand :: Parser Command
parseCommand = parsePrint <|> parseSave <|> parseLoadFourth, we would need to add logic for handling our new Load constructor in our main loop:
case command of
Print string -> do
putStrLn string
Save file string -> do
writeFile file string
Load file -> do
string <- readFile file
putStrLn stringI’m not a fan of this sort of program structure because the logic for how to handle each command isn’t all in one place. However, we can make a small change to our program structure that will not only simplify the code but also consolidate the logic for each command.
The trick
We can restructure our code by changing the type of all of our parsers from this:
parsePrint :: Parser Command
parseSave :: Parser Command
parseLoad :: Parser Command
parseCommand :: Parser Command… to this:
parsePrint :: Parser (IO ())
parseSave :: Parser (IO ())
parseLoad :: Parser (IO ())
parseCommand :: Parser (IO ())In other words, our parsers now return an actual command (i.e. IO ()) instead of returning a Command data structure that still needs to be interpreted.
This entails the following changes to the implementation of our three command parsers:
{-# LANGUAGE BlockArguments #-}
parsePrint :: Parser (IO ())
parsePrint = do
Megaparsec.string "print"
Megaparsec.space1
string <- Megaparsec.takeRest
return do
putStrLn string
parseSave :: Parser (IO ())
parseSave = do
Megaparsec.string "save"
Megaparsec.space1
file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
Megaparsec.space1
string <- Megaparsec.takeRest
return do
writeFile file string
parseLoad :: Parser (IO ())
parseLoad = do
Megaparsec.string "load"
Megaparsec.space1
file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
return do
string <- readFile file
putStrLn stringNow that each parser returns an IO () action, we no longer need the Command type, so we can delete the following datatype definition:
data Command = Print String | Save FilePath String | Load FilePathFinally, our main loop gets much simpler, because we no longer need to specify how to handle each command. That means that instead of handling each Command constructor:
case Megaparsec.parse parseCommand "(input)" text of
Left e -> do
putStr (Megaparsec.errorBundlePretty e)
Right command -> do
case command of
Print string -> do
putStrLn string
Save file string -> do
writeFile file string
Load file -> do
string <- readFile file
putStrLn string… we just run whatever IO () command was parsed, like this:
case Megaparsec.parse parseCommand "(input)" text of
Left e -> do
putStr (Megaparsec.errorBundlePretty e)
Right io -> do
ioNow we only need to make two changes to the code any time we add a new command. For example, all of the logic for the load command is right here:
parseLoad :: Parser (IO ())
parseLoad = do
Megaparsec.string "load"
Megaparsec.space1
file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
return do
string <- readFile file
putStrLn string… and here:
parseCommand :: Parser (IO ())
parseCommand = parsePrint <|> parseSave <|> parseLoad
-- ↑… and that’s it. We no longer need to change our REPL loop or add a new constructor to our Command datatype (because there is no Command datatype any longer).
What’s neat about this trick is that the IO () command we return has direct access to variables extracted by the corresponding Parser. For example:
parseLoad = do
Megaparsec.string "load"
Megaparsec.space1
-- The `file` variable that we parse here …
file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
return do
-- … can be referenced by the corresponding `IO` action here
string <- readFile file
putStrLn stringThere’s no need to pack our variables into a data structure and then unpack them again later on when we need to use them. This technique promotes tight and “vertically integrated” code where all of the logic is one place.
Final encodings
This trick is a special case of a more general trick known as a “final encoding” and the following post does a good job of explaining what “initial encoding” and “final encoding” mean:
To briefly explain initial and final encodings in my own words:
An “initial encoding” is one where you preserve as much information as possible in a data structure
This keeps your options as open as possible since you haven’t specified what to do with the data yet
A “final encoding” is one where you encode information by how you intend to use it
This tends to simplify your program if you know in advance how the information will be used
The initial example from this post was an initial encoding because each Parser returned a Command type which preserved as much information as possible without specifying what to do with it. The final example from this post was a final encoding because we encoded our commands by directly specifying what we planned to do with them.
Conclusion
This trick is not limited to returning IO actions from Parsers. For example, the following post illustrates a similar trick in the context of implementing configuration “wizards”:
… where a wizard has type IO (IO ()) (a command that returns another command).
More generally, you will naturally rediscover this trick if you stick to the principle of “keep each component’s logic all in one place”. In the above example the “components” were REPL commands, but this consolidation principle is useful for any sort of plugin-like system.
Appendix
Here is the complete code for the final version of the running example if you would like to test it out yourself:
{-# LANGUAGE BlockArguments #-}
import Control.Applicative ((<|>))
import Data.Void (Void)
import Text.Megaparsec (Parsec)
import qualified Data.Char as Char
import qualified System.IO as IO
import qualified Text.Megaparsec as Megaparsec
import qualified Text.Megaparsec.Char as Megaparsec
type Parser = Parsec Void String
parsePrint :: Parser (IO ())
parsePrint = do
Megaparsec.string "print"
Megaparsec.space1
string <- Megaparsec.takeRest
return do
putStrLn string
parseSave :: Parser (IO ())
parseSave = do
Megaparsec.string "save"
Megaparsec.space1
file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
Megaparsec.space1
string <- Megaparsec.takeRest
return do
writeFile file string
parseLoad :: Parser (IO ())
parseLoad = do
Megaparsec.string "load"
Megaparsec.space1
file <- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
return do
string <- readFile file
putStrLn string
parseCommand :: Parser (IO ())
parseCommand = parsePrint <|> parseSave <|> parseLoad
main :: IO ()
main = do
putStr "> "
eof <- IO.isEOF
if eof
then do
putStrLn ""
else do
text <- getLine
case Megaparsec.parse parseCommand "(input)" text of
Left e -> do
putStr (Megaparsec.errorBundlePretty e)
Right io -> do
io
main
The trade-off here is that the parser's correctness is now harder to test. That seems worth it in this case, since the parser has been tested to be rock-solid.
ReplyDeleteZach Tellman has an interesting talk[1] about some similar ideas around the spectrum from data to execution, although it's geared towards imperative programmers and so leans in the opposite direction.
[1] https://www.youtube.com/watch?v=3oQTSP4FngY
Is this similar to Elm's commands that it sends to the update method?
ReplyDelete> Now we only need to make two changes to the code any time we add a new command.
ReplyDeleteIIRC, one of the motivations behind "hooks" in React was similar: to reduce the number of places in the code that should be modified when adding some new behavior.