This post illustrates a trick that I’ve taught a few times to minimize the “change surface” of a Haskell program. By “change surface” I mean the number of places Haskell code needs to be updated when adding a new feature.
The motivation
I’ll motivate the trick through the following example code for a simple REPL:
import Control.Applicative ((<|>))
import Data.Void (Void)
import Text.Megaparsec (Parsec)
import qualified Data.Char as Char
import qualified System.IO as IO
import qualified Text.Megaparsec as Megaparsec
import qualified Text.Megaparsec.Char as Megaparsec
type Parser = Parsec Void String
data Command = Print String | Save FilePath String
parsePrint :: Parser Command
= do
parsePrint "print"
Megaparsec.string
Megaparsec.space1
<- Megaparsec.takeRest
string
return (Print string)
parseSave :: Parser Command
= do
parseSave "save"
Megaparsec.string
Megaparsec.space1
<- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
file
Megaparsec.space1
<- Megaparsec.takeRest
string
return (Save file string)
parseCommand :: Parser Command
= parsePrint <|> parseSave
parseCommand
main :: IO ()
= do
main putStr "> "
<- IO.isEOF
eof
if eof
then do
putStrLn ""
else do
<- getLine
text
case Megaparsec.parse parseCommand "(input)" text of
Left e -> do
putStr (Megaparsec.errorBundlePretty e)
Right command -> do
case command of
Print string -> do
putStrLn string
Save file string -> do
writeFile file string
main
This REPL supports two commands: print
and save
:
> print Hello, world!
Hello, world!
> save number.txt 42
print
echoes back whatever string you supply and save
writes the given string to a file.
Now suppose that we wanted to add a new load
command to read and display the contents of a file. We would need to change our code in four places.
First, we would need to change the Command
type to add a new Load
constructor:
data Command = Print String | Save FilePath String | Load FilePath
Second, we would need to add a new parser to parse the load
command:
parseLoad :: Parser Command
= do
parseLoad "load"
Megaparsec.string
Megaparsec.space1
<- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
file
return (Load file)
Third, we would need to add this new parser to parseCommand
:
parseCommand :: Parser Command
= parsePrint <|> parseSave <|> parseLoad parseCommand
Fourth, we would need to add logic for handling our new Load
constructor in our main
loop:
case command of
Print string -> do
putStrLn string
Save file string -> do
writeFile file string
Load file -> do
<- readFile file
string
putStrLn string
I’m not a fan of this sort of program structure because the logic for how to handle each command isn’t all in one place. However, we can make a small change to our program structure that will not only simplify the code but also consolidate the logic for each command.
The trick
We can restructure our code by changing the type of all of our parsers from this:
parsePrint :: Parser Command
parseSave :: Parser Command
parseLoad :: Parser Command
parseCommand :: Parser Command
… to this:
parsePrint :: Parser (IO ())
parseSave :: Parser (IO ())
parseLoad :: Parser (IO ())
parseCommand :: Parser (IO ())
In other words, our parsers now return an actual command (i.e. IO ()
) instead of returning a Command
data structure that still needs to be interpreted.
This entails the following changes to the implementation of our three command parsers:
{-# LANGUAGE BlockArguments #-}
parsePrint :: Parser (IO ())
= do
parsePrint "print"
Megaparsec.string
Megaparsec.space1
<- Megaparsec.takeRest
string
return do
putStrLn string
parseSave :: Parser (IO ())
= do
parseSave "save"
Megaparsec.string
Megaparsec.space1
<- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
file
Megaparsec.space1
<- Megaparsec.takeRest
string
return do
writeFile file string
parseLoad :: Parser (IO ())
= do
parseLoad "load"
Megaparsec.string
Megaparsec.space1
<- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
file
return do
<- readFile file
string
putStrLn string
Now that each parser returns an IO ()
action, we no longer need the Command
type, so we can delete the following datatype definition:
data Command = Print String | Save FilePath String | Load FilePath
Finally, our main
loop gets much simpler, because we no longer need to specify how to handle each command. That means that instead of handling each Command
constructor:
case Megaparsec.parse parseCommand "(input)" text of
Left e -> do
putStr (Megaparsec.errorBundlePretty e)
Right command -> do
case command of
Print string -> do
putStrLn string
Save file string -> do
writeFile file string
Load file -> do
<- readFile file
string
putStrLn string
… we just run whatever IO ()
command was parsed, like this:
case Megaparsec.parse parseCommand "(input)" text of
Left e -> do
putStr (Megaparsec.errorBundlePretty e)
Right io -> do
io
Now we only need to make two changes to the code any time we add a new command. For example, all of the logic for the load command is right here:
parseLoad :: Parser (IO ())
= do
parseLoad "load"
Megaparsec.string
Megaparsec.space1
<- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
file
return do
<- readFile file
string
putStrLn string
… and here:
parseCommand :: Parser (IO ())
= parsePrint <|> parseSave <|> parseLoad
parseCommand -- ↑
… and that’s it. We no longer need to change our REPL loop or add a new constructor to our Command
datatype (because there is no Command
datatype any longer).
What’s neat about this trick is that the IO ()
command we return has direct access to variables extracted by the corresponding Parser
. For example:
= do
parseLoad "load"
Megaparsec.string
Megaparsec.space1
-- The `file` variable that we parse here …
<- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
file
return do
-- … can be referenced by the corresponding `IO` action here
<- readFile file
string
putStrLn string
There’s no need to pack our variables into a data structure and then unpack them again later on when we need to use them. This technique promotes tight and “vertically integrated” code where all of the logic is one place.
Final encodings
This trick is a special case of a more general trick known as a “final encoding” and the following post does a good job of explaining what “initial encoding” and “final encoding” mean:
To briefly explain initial and final encodings in my own words:
An “initial encoding” is one where you preserve as much information as possible in a data structure
This keeps your options as open as possible since you haven’t specified what to do with the data yet
A “final encoding” is one where you encode information by how you intend to use it
This tends to simplify your program if you know in advance how the information will be used
The initial example from this post was an initial encoding because each Parser
returned a Command
type which preserved as much information as possible without specifying what to do with it. The final example from this post was a final encoding because we encoded our commands by directly specifying what we planned to do with them.
Conclusion
This trick is not limited to returning IO
actions from Parser
s. For example, the following post illustrates a similar trick in the context of implementing configuration “wizards”:
… where a wizard has type IO (IO ())
(a command that returns another command).
More generally, you will naturally rediscover this trick if you stick to the principle of “keep each component’s logic all in one place”. In the above example the “components” were REPL commands, but this consolidation principle is useful for any sort of plugin-like system.
Appendix
Here is the complete code for the final version of the running example if you would like to test it out yourself:
{-# LANGUAGE BlockArguments #-}
import Control.Applicative ((<|>))
import Data.Void (Void)
import Text.Megaparsec (Parsec)
import qualified Data.Char as Char
import qualified System.IO as IO
import qualified Text.Megaparsec as Megaparsec
import qualified Text.Megaparsec.Char as Megaparsec
type Parser = Parsec Void String
parsePrint :: Parser (IO ())
= do
parsePrint "print"
Megaparsec.string
Megaparsec.space1
<- Megaparsec.takeRest
string
return do
putStrLn string
parseSave :: Parser (IO ())
= do
parseSave "save"
Megaparsec.string
Megaparsec.space1
<- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
file
Megaparsec.space1
<- Megaparsec.takeRest
string
return do
writeFile file string
parseLoad :: Parser (IO ())
= do
parseLoad "load"
Megaparsec.string
Megaparsec.space1
<- Megaparsec.takeWhile1P Nothing (not . Char.isSpace)
file
return do
<- readFile file
string
putStrLn string
parseCommand :: Parser (IO ())
= parsePrint <|> parseSave <|> parseLoad
parseCommand
main :: IO ()
= do
main putStr "> "
<- IO.isEOF
eof
if eof
then do
putStrLn ""
else do
<- getLine
text
case Megaparsec.parse parseCommand "(input)" text of
Left e -> do
putStr (Megaparsec.errorBundlePretty e)
Right io -> do
io
main
The trade-off here is that the parser's correctness is now harder to test. That seems worth it in this case, since the parser has been tested to be rock-solid.
ReplyDeleteZach Tellman has an interesting talk[1] about some similar ideas around the spectrum from data to execution, although it's geared towards imperative programmers and so leans in the opposite direction.
[1] https://www.youtube.com/watch?v=3oQTSP4FngY
Is this similar to Elm's commands that it sends to the update method?
ReplyDelete> Now we only need to make two changes to the code any time we add a new command.
ReplyDeleteIIRC, one of the motivations behind "hooks" in React was similar: to reduce the number of places in the code that should be modified when adding some new behavior.