Monday, October 8, 2018

Detailed walkthrough for a beginner Haskell program

post

This post walks through the development of a small Haskell program for aligning equals symbols in a block of text. This walkthrough targets a beginning programmer by describing several steps and concepts in extra detail.

Note that this post describes how to author, compile, and run single-file Haskell programs for ease of experimentation and learning. For larger Haskell projects you will want to use cabal or stack to scaffold and run the project and to share your work with others. I mainly introduce things this way because this provides a lightweight way to get started and try the language.

Background

I obsess about visual cleanliness of the code I write because I optimize for ease of reading rather than ease of writing. One way I try to improve appearance is aligning equals signs. For example, the code might begin like this:

address = "192.168.0.44"
port = 22
hostname = "wind"

… and then I manually indent the equals signs to all align with each other:

address  = "192.168.0.44"
port     = 22
hostname = "wind"

My editor is vim, and you can install support for that using the Tabular plugin. However, I thought this would be an instructive example to implement from scratch to illustrate how to program in a functional style.

One neat feature of vim is that you can transform text inside your editor using any command line program. For example, I can select text using visual mode and then type:

:!some-command

… and vim will feed the selected text as standard input to the command-line program named some-command and replace the selected text with whatever the command emits to standard output.

This means that I only need to write a program that takes text to align on standard input and then emits the aligned text on standard output. I’ll call this program: align-equals.

Development environment

The command line is my “IDE”,so I will usually keep up to three terminal windows open:

  • One window edits text using vim
  • One window displays type errors using ghcid
  • One window tests the Haskell code I write in a REPL

I also use Nix, specifically a nix-shell to provision my development tools. I prefer Nix to provision development tools is because I don’t want to accumulate globally installed cruft on my system. Using Nix, I can provision any development tool or library transiently using nix-shell.

I will run all of the following examples inside of the following Nix shell (created for each new terminal):

That creates a transient shell that provides ghc, ghci, ghcid, and hoogle with the text and safe Haskell packages available. If I need to change the set of available Haskell packages I edit the command line and recreate the shell.

I turn on live type-checking by running:

… which will then automatically reload align-equals.hs every time the file changes and display any errors or warnings that the Haskell compiler has found.

In another terminal, I open the code I’m editing inside the ghci REPL so that I can interactively test the functions that I am writing:

Then in the third terminal, I actually edit the file:

The program

First, let’s describe what we plan to do using English prose, before we translate that to code:

We want to find the longest prefix preceding an = sign and then add spaces to the end of all other prefixes to match the length of the longest prefix.

Prefix length

In order to do that, we first need a function that computes the number of characters preceding an = sign for a given line. This function should have the following type:

You can read that type signature as saying: “prefixLength is a function whose input is a value of type Text (i.e. a line of input) and whose output is an Int (the number of characters preceding the first = sign)”. We can even add comments to that effect:

I import Data.Text because I prefer not to use the default String type provided by Haskell’s Prelude, which is inefficient. The text package provides a high-performance alternative type named Text along with many useful utilities.

The implementation of the prefixLength function directly follows the description:

As the name suggests, the prefixLength is the length of the prefix. The only non-trivial part is just knowing how to discover the Data.Text.breakOn function created for this purpose.

I usually browse the online documentation for Haskell packages by doing a Google search for “hackage ${package name}” (i.e. “hackage text”). That’s how I found the breakOn function here:

However, some people also prefer to use hoogle, which indexes and searches Haskell functions by name and by type. For example, if I’m looking for a function that splits a Text value into a prefix and suffix, I can run:

I can also verify that my function works as expected by testing my function in the long-running REPL I have open:

I have to enable theOverloadedStrings extension because I’m not using the default String type from the Haskell Prelude. This extension allows other packages to override how string literals behave to work with alternative implementations like Text.

One thing that’s neat about Haskell is how order-insensitive the language is. You can define things out of order and the compiler doesn’t mind. That’s why we can write our code as if it says: “prefixLength is the length of the prefix … and by the way you can compute the prefix and suffix by splitting the string using breakOn”.

This out-of-order coding style also jives well with lazy evaluation. Haskell is a “lazy” language, meaning that the program can evaluate things out-of-order, too, or even not at all if they are never used. For example, our prefixLength function never actually uses the suffix, so the program will never actually bother to compute or allocate that value.

The more you program in Haskell the less you think about your program as a sequence of statements and the more you think about your program as a graph of dependent computations.

Indenting a line

Now we need a function that pads a prefix with spaces up to a desired number of characters:

… or with comments:

This function is slightly longer, but still pretty straightforward:

Note that we can reorder all the lines after the where with no change to the program’s behavior. However, to keep things readable we order them so that the reader can understand them from top to bottom:

  • Split the line into a prefix and suffix
  • Compute the actual length of the prefix
  • Compute the number of spaces to pad by subtracting the actual length from the desired length
  • Create the padding by repeating a space the specified number of times
  • Create a new line by inserting the padding in between the prefix and suffix

Note that this reads a lot like a function defined in an imperative language. For example, the analogous Python code is not much different:

In general, you can port functional programs to imperative programs in this way if the functional program sticks to simple types (i.e. strings, numbers, records, etc.). For these simple programs, the functional program is essentially an imperative program where you forbid reassigning a value (i.e. “mutation”), which is generally good practice anyway for ease of program comprehension.

We can confirm that our function works by saving the complete program so far:

… and then :reloading that program in our REPL:

Indenting multiple lines

Now we can define a function to indent multiple lines:

This function takes advantage of two convenient utilities called lines and unlines.

Data.Text.lines splits a block of Text into a list of lines:

… and Data.Text.unlines combines a list of lines back into a block of Text:

You can use these two utilities to simplify line-oriented Text transformations in Haskell:

  • Split the block of Text into a list of lines
  • Process the list of lines into a new list of lines
  • Join the new list of lines back into a block of Text

The interesting part of our adjustText function is how we process the list of lines:

You can read the above code as saying:

  • Apply (“map”) the prefixLength function over each element of our list of lines to get a list of prefix lengths
  • Find the maximum length
  • If there is no maximum length, return an empty list of lines
  • If there is a maximum length, pad each line using that length

You might wonder: “Why would there not be a maximum length?” Well, consider the case where we begin with 0 lines of input: what is the maximum value of an empty list? The maximumMay function doesn’t throw an exception or return a bogus sentinel value that might be confused for real data. Instead, the maximumMay function returns an optional result:

The a in the type of maximumMay can be any type that is comparable ( i.e. implements Ord), and in the context of our code that type is Int, so we can pretend that the maximumMay function actually has this more specific type:

In other words, given a list of Ints, the maximumMay function will Maybe return an Int (i.e. it might return an Int or it might not). This means that the result will either be a Nothing (i.e. no result) or an Int value wrapped inside of a Just.

We consume the result of maximumMay by “pattern matching” on the result, like this:

The first branch covers the case where the list is empty. Here, the desiredPrefixLength is not in scope, so if we try to use that value we will get a type error. This provides a nice safeguard so that we can’t attempt to access a result that does not exist. In other languages this might be caught at runtime as a java.lang.NullPointerException or AttributeError: 'NoneType' object has no attribute 'x', but with pattern matching the type-checker can detect these sorts of bugs before we even attempt to run the program.

The second branch covers the case where the list is non-empty and does have a sensible maximum length. There we use that length to adjust each line.

The nice thing about pattern matching is that you can’t forget to handle these cases. If you try to use the result of maximumMay directly as an Int then you will get a type error. The maximumMay function wraps its own result in a Maybe to force downstream users to carefully consider how to handle the case where the list may be empty.

Tying it all together

So far all of our functions are “pure” functions, meaning that they are deterministic conversions from their inputs to their outputs that don’t mutate any variables and don’t have any side effects that we care about.

Carefully note the key phrase: “side effects that we care about”. These functions do technically have side effects, including:

  • Allocating memory/registers
  • Taking a non-zero amount of time to compute

Sometimes we do care about these types of side effects in certain contexts, like cryptography (where secure information can leak through these side channels) or embedded programming (where programs require careful attention to time/memory). However, for our simple utility these functions are effectively “pure”.

We can’t use our utility functions from the command line, though, unless we wrap them in a main function which a program can run, like this:

The interact function converts any pure Text transformation into a runnable program that transforms standard input into standard output using the supplied transformation:

This is an example of a “higher-order” function: a function whose input is another function. The input to the interact function is another function of type Text -> Text. Conveniently, our adjustText function has exactly the correct type to be supplied as an argument to the interact function:

… and then any value of type IO () can be assigned to main, which is what our program will run when invoked as a command-line executable.

If we save the following complete example to align-equals.hs:

… then we can compile that using:

… and verify that the executable works using:

That means that I can now use ./align-equals to transform my text buffer by selecting a text block like this:

address = "192.168.0.44"
port = 22
hostname = "wind"

… and running :!./align-equals to align the block:

address  = "192.168.0.44"
port     = 22
hostname = "wind"

… and now I don’t have to tediously align the code manually.

Conclusion

Hopefully this illustrates one way to learn the Haskell language in the course of building small but practical programs. The language introduces many cool features and concepts and this post only touches the tip of the iceberg.