This post walks through the development of a small Haskell program for aligning equals symbols in a block of text. This walkthrough targets a beginning programmer by describing several steps and concepts in extra detail.
Note that this post describes how to author, compile, and run single-file Haskell programs for ease of experimentation and learning. For larger Haskell projects you will want to use cabal or stack to scaffold and run the project and to share your work with others. I mainly introduce things this way because this provides a lightweight way to get started and try the language.
Background
I obsess about visual cleanliness of the code I write because I optimize for ease of reading rather than ease of writing. One way I try to improve appearance is aligning equals signs. For example, the code might begin like this:
address = "192.168.0.44"
port = 22
hostname = "wind"… and then I manually indent the equals signs to all align with each other:
address  = "192.168.0.44"
port     = 22
hostname = "wind"My editor is vim, and you can install support for that using the Tabular plugin. However, I thought this would be an instructive example to implement from scratch to illustrate how to program in a functional style.
One neat feature of vim is that you can transform text inside your editor using any command line program. For example, I can select text using visual mode and then type:
:!some-command… and vim will feed the selected text as standard input to the command-line program named some-command and replace the selected text with whatever the command emits to standard output.
This means that I only need to write a program that takes text to align on standard input and then emits the aligned text on standard output. I’ll call this program: align-equals.
Development environment
The command line is my “IDE”,so I will usually keep up to three terminal windows open:
- One window edits text using vim
- One window displays type errors using ghcid
- One window tests the Haskell code I write in a REPL
I also use Nix, specifically a nix-shell to provision my development tools. I prefer Nix to provision development tools is because I don’t want to accumulate globally installed cruft on my system. Using Nix, I can provision any development tool or library transiently using nix-shell.
I will run all of the following examples inside of the following Nix shell (created for each new terminal):
That creates a transient shell that provides ghc, ghci, ghcid, and hoogle with the text and safe Haskell packages available. If I need to change the set of available Haskell packages I edit the command line and recreate the shell.
I turn on live type-checking by running:
… which will then automatically reload align-equals.hs every time the file changes and display any errors or warnings that the Haskell compiler has found.
In another terminal, I open the code I’m editing inside the ghci REPL so that I can interactively test the functions that I am writing:
$ ghci align-equals.hs
GHCi, version 8.2.2: http://www.haskell.org/ghc/  :? for help
[1 of 1] Compiling Main             ( test.hs, interpreted )
Ok, one module loaded.
*Main>Then in the third terminal, I actually edit the file:
The program
First, let’s describe what we plan to do using English prose, before we translate that to code:
We want to find the longest prefix preceding an = sign and then add spaces to the end of all other prefixes to match the length of the longest prefix.
Prefix length
In order to do that, we first need a function that computes the number of characters preceding an = sign for a given line. This function should have the following type:
You can read that type signature as saying: “prefixLength is a function whose input is a value of type Text (i.e. a line of input) and whose output is an Int (the number of characters preceding the first = sign)”. We can even add comments to that effect:
prefixLength
    :: Text
    -- ^ Line of input
    -> Int
    -- ^ Number of characters preceding the first @=@ signI import Data.Text because I prefer not to use the default String type provided by Haskell’s Prelude, which is inefficient. The text package provides a high-performance alternative type named Text along with many useful utilities.
The implementation of the prefixLength function directly follows the description:
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text
prefixLength :: Text -> Int
prefixLength line = Data.Text.length prefix
  where
    (prefix, suffix) = Data.Text.breakOn "=" lineAs the name suggests, the prefixLength is the length of the prefix. The only non-trivial part is just knowing how to discover the Data.Text.breakOn function created for this purpose.
I usually browse the online documentation for Haskell packages by doing a Google search for “hackage ${package name}” (i.e. “hackage text”). That’s how I found the breakOn function here:
However, some people also prefer to use hoogle, which indexes and searches Haskell functions by name and by type. For example, if I’m looking for a function that splits a Text value into a prefix and suffix, I can run:
$ hoogle 'Text -> (Text, Text)'
Data.Text breakOn :: Text -> Text -> (Text, Text)
Data.Text breakOnEnd :: Text -> Text -> (Text, Text)
Data.Text.Lazy breakOn :: Text -> Text -> (Text, Text)
Data.Text.Lazy breakOnEnd :: Text -> Text -> (Text, Text)
Data.Text transpose :: [Text] -> [Text]
Data.Text.Lazy transpose :: [Text] -> [Text]
Data.Text intercalate :: Text -> [Text] -> Text
Data.Text.Lazy intercalate :: Text -> [Text] -> Text
Data.Text splitOn :: Text -> Text -> [Text]
Data.Text.Lazy splitOn :: Text -> Text -> [Text]
-- plus more results not shown, pass --count=20 to see moreI can also verify that my function works as expected by testing my function in the long-running REPL I have open:
*Main> :reload
Ok, one module loaded.
*Main> :set -XOverloadedStrings
*Main> Data.Text.breakOn "=" "foo = 1"
("foo ","= 1")
*Main> prefixLength "foo = 1"
4I have to enable theOverloadedStrings extension because I’m not using the default String type from the Haskell Prelude. This extension allows other packages to override how string literals behave to work with alternative implementations like Text.
One thing that’s neat about Haskell is how order-insensitive the language is. You can define things out of order and the compiler doesn’t mind. That’s why we can write our code as if it says: “prefixLength is the length of the prefix … and by the way you can compute the prefix and suffix by splitting the string using breakOn”.
This out-of-order coding style also jives well with lazy evaluation. Haskell is a “lazy” language, meaning that the program can evaluate things out-of-order, too, or even not at all if they are never used. For example, our prefixLength function never actually uses the suffix, so the program will never actually bother to compute or allocate that value.
The more you program in Haskell the less you think about your program as a sequence of statements and the more you think about your program as a graph of dependent computations.
Indenting a line
Now we need a function that pads a prefix with spaces up to a desired number of characters:
… or with comments:
adjustLine
    :: Int
    -- ^ The desired prefix length
    -> Text
    -- ^ The line to pad
    -> Text
    -- ^ The padded lineThis function is slightly longer, but still pretty straightforward:
adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine
    actualPrefixLength = Data.Text.length prefix
    additionalSpaces = desiredPrefixLength - actualPrefixLength
    spaces = Data.Text.replicate additionalSpaces " "
    newLine = Data.Text.concat [ prefix, spaces, suffix ]Note that we can reorder all the lines after the where with no change to the program’s behavior. However, to keep things readable we order them so that the reader can understand them from top to bottom:
- Split the line into a prefix and suffix
- Compute the actual length of the prefix
- Compute the number of spaces to pad by subtracting the actual length from the desired length
- Create the padding by repeating a space the specified number of times
- Create a new line by inserting the padding in between the prefix and suffix
Note that this reads a lot like a function defined in an imperative language. For example, the analogous Python code is not much different:
def adjustLine(desiredPrefixLength, oldLine):
    (prefix, suffix) = oldLine.split("=")
    actualPrefixLength = len(prefix)
    additionalSpaces = desiredPrefixLength - actualPrefixLength
    spaces = " " * additionalSpaces
    # Python's `split` strips the `=`, so we have to reintroduce it
    newLine = "".join([ prefix, spaces, "=", suffix ])
    return newLineIn general, you can port functional programs to imperative programs in this way if the functional program sticks to simple types (i.e. strings, numbers, records, etc.). For these simple programs, the functional program is essentially an imperative program where you forbid reassigning a value (i.e. “mutation”), which is generally good practice anyway for ease of program comprehension.
We can confirm that our function works by saving the complete program so far:
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text
prefixLength :: Text -> Int
prefixLength line = Data.Text.length prefix
  where
    (prefix, suffix) = Data.Text.breakOn "=" line
adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine
    actualPrefixLength = Data.Text.length prefix
    additionalSpaces = desiredPrefixLength - actualPrefixLength
    spaces = Data.Text.replicate additionalSpaces " "
    newLine = Data.Text.concat [ prefix, spaces, suffix ]… and then :reloading that program in our REPL:
Indenting multiple lines
Now we can define a function to indent multiple lines:
import Safe
adjustText :: Text -> Text
adjustText oldText = newText
  where
    oldLines = Data.Text.lines oldText
    prefixLengths = map prefixLength oldLines
    newLines =
        case Safe.maximumMay prefixLengths of
            Nothing ->
                []
            Just desiredPrefixLength ->
                map (adjustLine desiredPrefixLength) oldLines
    newText = Data.Text.unlines newLinesThis function takes advantage of two convenient utilities called lines and unlines.
Data.Text.lines splits a block of Text into a list of lines:
… and Data.Text.unlines combines a list of lines back into a block of Text:
You can use these two utilities to simplify line-oriented Text transformations in Haskell:
- Split the block of Textinto a list of lines
- Process the list of lines into a new list of lines
- Join the new list of lines back into a block of Text
The interesting part of our adjustText function is how we process the list of lines:
    prefixLengths = map prefixLength oldLines
    newLines =
        case Safe.maximumMay prefixLengths of
            Nothing ->
                []
            Just desiredPrefixLength ->
                map (adjustLine desiredPrefixLength) oldLinesYou can read the above code as saying:
- Apply (“map”) the prefixLengthfunction over each element of our list of lines to get a list of prefix lengths
- Find the maximum length
- If there is no maximum length, return an empty list of lines
- If there is a maximum length, pad each line using that length
You might wonder: “Why would there not be a maximum length?” Well, consider the case where we begin with 0 lines of input: what is the maximum value of an empty list? The maximumMay function doesn’t throw an exception or return a bogus sentinel value that might be confused for real data. Instead, the maximumMay function returns an optional result:
The a in the type of maximumMay can be any type that is comparable ( i.e. implements Ord), and in the context of our code that type is Int, so we can pretend that the maximumMay function actually has this more specific type:
In other words, given a list of Ints, the maximumMay function will Maybe return an Int (i.e. it might return an Int or it might not). This means that the result will either be a Nothing (i.e. no result) or an Int value wrapped inside of a Just.
We consume the result of maximumMay by “pattern matching” on the result, like this:
        case Safe.maximumMay prefixLengths of
            Nothing ->
                ...  -- First branch
            Just desiredPrefixLength ->
                ...  -- Second branchThe first branch covers the case where the list is empty. Here, the desiredPrefixLength is not in scope, so if we try to use that value we will get a type error. This provides a nice safeguard so that we can’t attempt to access a result that does not exist. In other languages this might be caught at runtime as a java.lang.NullPointerException or AttributeError: 'NoneType' object has no attribute 'x', but with pattern matching the type-checker can detect these sorts of bugs before we even attempt to run the program.
The second branch covers the case where the list is non-empty and does have a sensible maximum length. There we use that length to adjust each line.
The nice thing about pattern matching is that you can’t forget to handle these cases. If you try to use the result of maximumMay directly as an Int then you will get a type error. The maximumMay function wraps its own result in a Maybe to force downstream users to carefully consider how to handle the case where the list may be empty.
Tying it all together
So far all of our functions are “pure” functions, meaning that they are deterministic conversions from their inputs to their outputs that don’t mutate any variables and don’t have any side effects that we care about.
Carefully note the key phrase: “side effects that we care about”. These functions do technically have side effects, including:
- Allocating memory/registers
- Taking a non-zero amount of time to compute
Sometimes we do care about these types of side effects in certain contexts, like cryptography (where secure information can leak through these side channels) or embedded programming (where programs require careful attention to time/memory). However, for our simple utility these functions are effectively “pure”.
We can’t use our utility functions from the command line, though, unless we wrap them in a main function which a program can run, like this:
The interact function converts any pure Text transformation into a runnable program that transforms standard input into standard output using the supplied transformation:
This is an example of a “higher-order” function: a function whose input is another function. The input to the interact function is another function of type Text -> Text. Conveniently, our adjustText function has exactly the correct type to be supplied as an argument to the interact function:
… and then any value of type IO () can be assigned to main, which is what our program will run when invoked as a command-line executable.
If we save the following complete example to align-equals.hs:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Text (Text)
import qualified Data.Text
import qualified Data.Text.IO
import qualified Safe
prefixLength :: Text -> Int
prefixLength line = Data.Text.length prefix
  where
    (prefix, suffix) = Data.Text.breakOn "=" line
adjustLine :: Int -> Text -> Text
adjustLine desiredPrefixLength oldLine = newLine
  where
    (prefix, suffix) = Data.Text.breakOn "=" oldLine
    actualPrefixLength = Data.Text.length prefix
    additionalSpaces = desiredPrefixLength - actualPrefixLength
    spaces = Data.Text.replicate additionalSpaces " "
    newLine = Data.Text.concat [ prefix, spaces, suffix ]
adjustText :: Text -> Text
adjustText oldText = newText
  where
    oldLines = Data.Text.lines oldText
    prefixLengths = map prefixLength oldLines
    newLines =
        case Safe.maximumMay prefixLengths of
            Nothing ->
                []
            Just desiredPrefixLength ->
                map (adjustLine desiredPrefixLength) oldLines
    newText = Data.Text.unlines newLines
main :: IO ()
main = Data.Text.IO.interact adjustText… then we can compile that using:
… and verify that the executable works using:
That means that I can now use ./align-equals to transform my text buffer by selecting a text block like this:
address = "192.168.0.44"
port = 22
hostname = "wind"… and running :!./align-equals to align the block:
address  = "192.168.0.44"
port     = 22
hostname = "wind"… and now I don’t have to tediously align the code manually.
Conclusion
Hopefully this illustrates one way to learn the Haskell language in the course of building small but practical programs. The language introduces many cool features and concepts and this post only touches the tip of the iceberg.
Your setup is very interesting. Thanks for sharing. Will read the rest at a later time...
ReplyDeleteThanks! This is an awesome exercise for beginners like me :)
ReplyDeleteStill in the middle of trying this out, but so far I've noticed that you forgot a couple of instructions:
1. Each terminal needs to call nix-shell before running ghcid, ghci, and vi.
2. Need to import Safe in order to be able to compile.
Thanks again!
Fixed! Thanks :)
DeleteGreat post. Just to remind you the little command that can do the same `column -ts= -o=`
ReplyDelete"… and Data.Text.lines combines a list of lines back into a block of Text:" should read "... and Data.Text.unlines"?
ReplyDeleteThank you! Fixed
DeleteGreat intro to haskell. I am a haskell beginner and I have been reading a few books. Whats your suggestion for advancing to the intermediate level? Thanks!
ReplyDeleteI wrote up a post with advice for Haskell beginners: http://www.haskellforall.com/2017/10/advice-for-haskell-beginners.html
DeleteThanks for another great blog post!
DeleteYou're welcome! :)
Deleteway too long
ReplyDeleteThanks for this, lots of great info!
ReplyDeleteSeeing a fully worked-out, detailed application is very helpful, thank you!
ReplyDeleteOne minor note: in my experience, aligning equals signs actually makes code *harder* to read, because it emphasizes the columns of names and values (which we don’t care about), rather than the individual assignments. In other words, we mostly care about reading assignments horizontally, but the extra space introduced by the = alignment actually makes it *harder* to do that.
ReplyDeleteAbout my earlier comment: I think this is a great example for a basic Haskell program; I just wish people would stop claiming that there’s any value to aligning equals signs in particular. :) If this had been an example for aligning Markdown tables on | , day, I’d have been right there with you.
ReplyDeleteVery Thanks!
ReplyDeleteOn a line with no "=", doesprefixLength return the length of the entire line?
ReplyDeleteYes
DeleteDoes that mean it will try to make that many spaces?
DeleteYes, I believe so
Delete