Sunday, July 16, 2017

Demystifying Haskell assignment

This post clarifies the distinction between <- and = in Haskell, which sometimes mystifies newcomers to the language. For example, consider the following contrived code snippet:

main = do
    input <- getLine
    let output = input ++ "!"
    putStrLn output
    putStrLn (input ++ "!")

The above program reads one line of input, and then prints that line twice with an exclamation mark at the end, like this:

$ ./example
Hello<Enter>
Hello!
Hello!

Why does the first line use the <- symbol to assign a value to input while the second line uses the = symbol to define output? Most languages use only one symbol to assign values (such as = or :=), so why does Haskell use two?

Equality

Haskell bucks the trend because the = symbol does not mean assignment and instead means something stronger than in most programming languages. Whenever you see an equality sign in a Haskell program that means that the two sides are truly equal. You can substitute either side of the equality for the other side and this substitution works in both directions.

For example, we define output to be equal (i.e. synonymous) with the expression input ++ "!" in our original program. This means that anywhere we see output in our program we can replace output with input ++ "!" instead, like this:

main = do
    input <- getLine
    let output = input ++ "!"
    putStrLn (input ++ "!")
    putStrLn (input ++ "!")

Vice versa, anywhere we see input ++ "!" in our program we can reverse the substitution and replace the expression with output instead, like this:

main = do
    input <- getLine
    let output = input ++ "!"
    putStrLn output
    putStrLn output

The language enforces that these sorts of substitutions do not change the behavior of our program (with caveats, but this is mostly true). All three of the above programs have the same behavior because we always replace one expression with another equal expression. In Haskell, the equality symbol denotes true mathematical equality.

Assignment

Once we understand equality we can better understand why Haskell uses a separate symbol for assignment: <-. For example, lets revisit this assignment in our original program:

main = do
    -- Assign result of `getLine` to `input`
    input <- getLine
    ...

input and getLine are not equal in any sense of the word. They don't even have the same type!

The type of input is String:

input :: String

... whereas the type of getLine is IO String:

getLine :: IO String

... which you can think of as "a subroutine whose return value is a String". We can't substitute either one for the other because we would get a type error. For example, if we substitute all occurrences of input with getLine we would get an invalid program which does not type check:

main = do
    let output = getLine ++ "!"  -- Type error!
    putStrLn output
    putStrLn (getLine ++ "!")    -- Type error!

However, suppose we gloss over the type error and accept values of type IO String where the program expected just a String. Even then this substitution would still be wrong because our new program appears to request user input twice:

main = do
    let output = getLine ++ "!"  -- ← 1st request for input
    putStrLn output
    putStrLn (getLine ++ "!")    -- ← 2nd request for input

Contrast this with our original program, which only asks for a single line of input and reuses the line twice:

main = do
    input <- getLine  -- ← 1st and only request for input
    let output = input ++ "!"
    putStrLn output
    putStrLn (input ++ "!")

We cannot substitute the left-hand side of an assignment for the right-hand side of the assignment without changing the meaning of our program. This is why Haskell uses a separate symbol for assignment, because assignment does not denote equality.

Also, getLine and input are not even morally equal. getLine is a subroutine whose result may change every time, and to equate getLine with the result of any particular run doesn't make intuitive sense. That would be like calling the Unix ls command "a list of files".

Conclusion

Haskell has two separate symbols for <- and = because assignment and equality are not the same thing. Haskell just happens to be the first mainstream language that supports mathematical equality, which is why the language requires this symbolic distinction.

Language support for mathematical equality unlocks another useful language feature: equational reasoning. You can use more sophisticated equalities to formally reason about the behavior of larger programs, the same way you would reason about algebraic expressions in math.