This post walks through the development of a small Haskell program for aligning equals symbols in a block of text. This walkthrough targets a beginning programmer by describing several steps and concepts in extra detail.
Note that this post describes how to author, compile, and run single-file Haskell programs for ease of experimentation and learning. For larger Haskell projects you will want to use cabal
or stack
to scaffold and run the project and to share your work with others. I mainly introduce things this way because this provides a lightweight way to get started and try the language.
Background
I obsess about visual cleanliness of the code I write because I optimize for ease of reading rather than ease of writing. One way I try to improve appearance is aligning equals signs. For example, the code might begin like this:
address = "192.168.0.44"
port = 22
hostname = "wind"
… and then I manually indent the equals signs to all align with each other:
address = "192.168.0.44"
port = 22
hostname = "wind"
My editor is vim
, and you can install support for that using the Tabular plugin. However, I thought this would be an instructive example to implement from scratch to illustrate how to program in a functional style.
One neat feature of vim
is that you can transform text inside your editor using any command line program. For example, I can select text using visual mode and then type:
:!some-command
… and vim
will feed the selected text as standard input to the command-line program named some-command
and replace the selected text with whatever the command emits to standard output.
This means that I only need to write a program that takes text to align on standard input and then emits the aligned text on standard output. I’ll call this program: align-equals
.
Development environment
The command line is my “IDE”,so I will usually keep up to three terminal windows open:
- One window edits text using
vim
- One window displays type errors using
ghcid
- One window tests the Haskell code I write in a REPL
I also use Nix, specifically a nix-shell
to provision my development tools. I prefer Nix to provision development tools is because I don’t want to accumulate globally installed cruft on my system. Using Nix, I can provision any development tool or library transiently using nix-shell
.
I will run all of the following examples inside of the following Nix shell (created for each new terminal):
That creates a transient shell that provides ghc
, ghci
, ghcid
, and hoogle
with the text
and safe
Haskell packages available. If I need to change the set of available Haskell packages I edit the command line and recreate the shell.
I turn on live type-checking by running:
… which will then automatically reload align-equals.hs
every time the file changes and display any errors or warnings that the Haskell compiler has found.
In another terminal, I open the code I’m editing inside the ghci
REPL so that I can interactively test the functions that I am writing:
$ ghci align-equals.hs
GHCi, version 8.2.2: http://www.haskell.org/ghc/ :? for help
[1 of 1] Compiling Main ( test.hs, interpreted )
Ok, one module loaded.
*Main>
Then in the third terminal, I actually edit the file:
The program
First, let’s describe what we plan to do using English prose, before we translate that to code:
We want to find the longest prefix preceding an =
sign and then add spaces to the end of all other prefixes to match the length of the longest prefix.
Prefix length
In order to do that, we first need a function that computes the number of characters preceding an =
sign for a given line. This function should have the following type:
You can read that type signature as saying: “prefixLength
is a function whose input is a value of type Text
(i.e. a line of input) and whose output is an Int
(the number of characters preceding the first =
sign)”. We can even add comments to that effect:
prefixLength
:: Text
-- ^ Line of input
-> Int
-- ^ Number of characters preceding the first @=@ sign
I import
Data.Text
because I prefer not to use the default String
type provided by Haskell’s Prelude, which is inefficient. The text
package provides a high-performance alternative type named Text
along with many useful utilities.
The implementation of the prefixLength
function directly follows the description:
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text
prefixLength :: Text -> Int
prefixLength line = Data.Text.length prefix
where
(prefix, suffix) = Data.Text.breakOn "=" line
As the name suggests, the prefixLength
is the length
of the prefix
. The only non-trivial part is just knowing how to discover the Data.Text.breakOn
function created for this purpose.
I usually browse the online documentation for Haskell packages by doing a Google search for “hackage ${package name}” (i.e. “hackage text”). That’s how I found the breakOn
function here:
However, some people also prefer to use hoogle
, which indexes and searches Haskell functions by name and by type. For example, if I’m looking for a function that splits a Text
value into a prefix and suffix, I can run:
$ hoogle 'Text -> (Text, Text)'
Data.Text breakOn :: Text -> Text -> (Text, Text)
Data.Text breakOnEnd :: Text -> Text -> (Text, Text)
Data.Text.Lazy breakOn :: Text -> Text -> (Text, Text)
Data.Text.Lazy breakOnEnd :: Text -> Text -> (Text, Text)
Data.Text transpose :: [Text] -> [Text]
Data.Text.Lazy transpose :: [Text] -> [Text]
Data.Text intercalate :: Text -> [Text] -> Text
Data.Text.Lazy intercalate :: Text -> [Text] -> Text
Data.Text splitOn :: Text -> Text -> [Text]
Data.Text.Lazy splitOn :: Text -> Text -> [Text]
-- plus more results not shown, pass --count=20 to see more
I can also verify that my function works as expected by testing my function in the long-running REPL I have open:
*Main> :reload
Ok, one module loaded.
*Main> :set -XOverloadedStrings
*Main> Data.Text.breakOn "=" "foo = 1"
("foo ","= 1")
*Main> prefixLength "foo = 1"
4
I have to enable theOverloadedStrings
extension because I’m not using the default String
type from the Haskell Prelude. This extension allows other packages to override how string literals behave to work with alternative implementations like Text
.
One thing that’s neat about Haskell is how order-insensitive the language is. You can define things out of order and the compiler doesn’t mind. That’s why we can write our code as if it says: “prefixLength
is the length
of the prefix
… and by the way you can compute the prefix
and suffix
by splitting the string using breakOn
”.
This out-of-order coding style also jives well with lazy evaluation. Haskell is a “lazy” language, meaning that the program can evaluate things out-of-order, too, or even not at all if they are never used. For example, our prefixLength
function never actually uses the suffix
, so the program will never actually bother to compute or allocate that value.
The more you program in Haskell the less you think about your program as a sequence of statements and the more you think about your program as a graph of dependent computations.
Indenting a line
Now we need a function that pads a prefix with spaces up to a desired number of characters:
… or with comments:
adjustLine
:: Int
-- ^ The desired prefix length
-> Text
-- ^ The line to pad
-> Text
-- ^ The padded line
This function is slightly longer, but still pretty straightforward:
adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
where
(prefix, suffix) = Data.Text.breakOn "=" oldLine
actualPrefixLength = Data.Text.length prefix
additionalSpaces = desiredPrefixLength - actualPrefixLength
spaces = Data.Text.replicate additionalSpaces " "
newLine = Data.Text.concat [ prefix, spaces, suffix ]
Note that we can reorder all the lines after the where
with no change to the program’s behavior. However, to keep things readable we order them so that the reader can understand them from top to bottom:
- Split the line into a prefix and suffix
- Compute the actual length of the prefix
- Compute the number of spaces to pad by subtracting the actual length from the desired length
- Create the padding by repeating a space the specified number of times
- Create a new line by inserting the padding in between the prefix and suffix
Note that this reads a lot like a function defined in an imperative language. For example, the analogous Python code is not much different:
def adjustLine(desiredPrefixLength, oldLine):
(prefix, suffix) = oldLine.split("=")
actualPrefixLength = len(prefix)
additionalSpaces = desiredPrefixLength - actualPrefixLength
spaces = " " * additionalSpaces
# Python's `split` strips the `=`, so we have to reintroduce it
newLine = "".join([ prefix, spaces, "=", suffix ])
return newLine
In general, you can port functional programs to imperative programs in this way if the functional program sticks to simple types (i.e. strings, numbers, records, etc.). For these simple programs, the functional program is essentially an imperative program where you forbid reassigning a value (i.e. “mutation”), which is generally good practice anyway for ease of program comprehension.
We can confirm that our function works by saving the complete program so far:
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text
prefixLength :: Text -> Int
prefixLength line = Data.Text.length prefix
where
(prefix, suffix) = Data.Text.breakOn "=" line
adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
where
(prefix, suffix) = Data.Text.breakOn "=" oldLine
actualPrefixLength = Data.Text.length prefix
additionalSpaces = desiredPrefixLength - actualPrefixLength
spaces = Data.Text.replicate additionalSpaces " "
newLine = Data.Text.concat [ prefix, spaces, suffix ]
… and then :reload
ing that program in our REPL:
Indenting multiple lines
Now we can define a function to indent multiple lines:
import Safe
adjustText :: Text -> Text
adjustText oldText = newText
where
oldLines = Data.Text.lines oldText
prefixLengths = map prefixLength oldLines
newLines =
case Safe.maximumMay prefixLengths of
Nothing ->
[]
Just desiredPrefixLength ->
map (adjustLine desiredPrefixLength) oldLines
newText = Data.Text.unlines newLines
This function takes advantage of two convenient utilities called lines
and unlines
.
Data.Text.lines
splits a block of Text
into a list of lines:
… and Data.Text.unlines
combines a list of lines back into a block of Text
:
You can use these two utilities to simplify line-oriented Text
transformations in Haskell:
- Split the block of
Text
into a list of lines - Process the list of lines into a new list of lines
- Join the new list of lines back into a block of
Text
The interesting part of our adjustText
function is how we process the list of lines:
prefixLengths = map prefixLength oldLines
newLines =
case Safe.maximumMay prefixLengths of
Nothing ->
[]
Just desiredPrefixLength ->
map (adjustLine desiredPrefixLength) oldLines
You can read the above code as saying:
- Apply (“map”) the
prefixLength
function over each element of our list of lines to get a list of prefix lengths - Find the maximum length
- If there is no maximum length, return an empty list of lines
- If there is a maximum length, pad each line using that length
You might wonder: “Why would there not be a maximum length?” Well, consider the case where we begin with 0 lines of input: what is the maximum value of an empty list? The maximumMay
function doesn’t throw an exception or return a bogus sentinel value that might be confused for real data. Instead, the maximumMay
function returns an optional result:
The a
in the type of maximumMay
can be any type that is comparable ( i.e. implements Ord
), and in the context of our code that type is Int
, so we can pretend that the maximumMay
function actually has this more specific type:
In other words, given a list of Int
s, the maximumMay
function will Maybe
return an Int
(i.e. it might return an Int
or it might not). This means that the result will either be a Nothing
(i.e. no result) or an Int
value wrapped inside of a Just
.
We consume the result of maximumMay
by “pattern matching” on the result, like this:
case Safe.maximumMay prefixLengths of
Nothing ->
... -- First branch
Just desiredPrefixLength ->
... -- Second branch
The first branch covers the case where the list is empty. Here, the desiredPrefixLength
is not in scope, so if we try to use that value we will get a type error. This provides a nice safeguard so that we can’t attempt to access a result that does not exist. In other languages this might be caught at runtime as a java.lang.NullPointerException
or AttributeError: 'NoneType' object has no attribute 'x'
, but with pattern matching the type-checker can detect these sorts of bugs before we even attempt to run the program.
The second branch covers the case where the list is non-empty and does have a sensible maximum length. There we use that length to adjust each line.
The nice thing about pattern matching is that you can’t forget to handle these cases. If you try to use the result of maximumMay
directly as an Int
then you will get a type error. The maximumMay
function wraps its own result in a Maybe
to force downstream users to carefully consider how to handle the case where the list may be empty.
Tying it all together
So far all of our functions are “pure” functions, meaning that they are deterministic conversions from their inputs to their outputs that don’t mutate any variables and don’t have any side effects that we care about.
Carefully note the key phrase: “side effects that we care about”. These functions do technically have side effects, including:
- Allocating memory/registers
- Taking a non-zero amount of time to compute
Sometimes we do care about these types of side effects in certain contexts, like cryptography (where secure information can leak through these side channels) or embedded programming (where programs require careful attention to time/memory). However, for our simple utility these functions are effectively “pure”.
We can’t use our utility functions from the command line, though, unless we wrap them in a main
function which a program can run, like this:
The interact
function converts any pure Text
transformation into a runnable program that transforms standard input into standard output using the supplied transformation:
This is an example of a “higher-order” function: a function whose input is another function. The input to the interact
function is another function of type Text -> Text
. Conveniently, our adjustText
function has exactly the correct type to be supplied as an argument to the interact
function:
… and then any value of type IO ()
can be assigned to main
, which is what our program will run when invoked as a command-line executable.
If we save the following complete example to align-equals.hs
:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Text (Text)
import qualified Data.Text
import qualified Data.Text.IO
import qualified Safe
prefixLength :: Text -> Int
prefixLength line = Data.Text.length prefix
where
(prefix, suffix) = Data.Text.breakOn "=" line
adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
where
(prefix, suffix) = Data.Text.breakOn "=" oldLine
actualPrefixLength = Data.Text.length prefix
additionalSpaces = desiredPrefixLength - actualPrefixLength
spaces = Data.Text.replicate additionalSpaces " "
newLine = Data.Text.concat [ prefix, spaces, suffix ]
adjustText :: Text -> Text
adjustText oldText = newText
where
oldLines = Data.Text.lines oldText
prefixLengths = map prefixLength oldLines
newLines =
case Safe.maximumMay prefixLengths of
Nothing ->
[]
Just desiredPrefixLength ->
map (adjustLine desiredPrefixLength) oldLines
newText = Data.Text.unlines newLines
main :: IO ()
main = Data.Text.IO.interact adjustText
… then we can compile that using:
… and verify that the executable works using:
That means that I can now use ./align-equals
to transform my text buffer by selecting a text block like this:
address = "192.168.0.44"
port = 22
hostname = "wind"
… and running :!./align-equals
to align the block:
address = "192.168.0.44"
port = 22
hostname = "wind"
… and now I don’t have to tediously align the code manually.
Conclusion
Hopefully this illustrates one way to learn the Haskell language in the course of building small but practical programs. The language introduces many cool features and concepts and this post only touches the tip of the iceberg.