Saturday, July 13, 2013

Statements vs Expressions

Many programming languages have separate concepts for statements and expressions. I like to think of the distinction between them as follows:
Statement: What code does
Expression: What code is
I want to clarify that in this post I will deliberately use the term "statement" very broadly to refer to anything that is not an expression or type declaration. If you disagree with this terminology then I welcome suggestions for an alternative name.

The distinction between statements and expressions closely parallels the difference between imperative languages and functional languages:
Imperative: A language that emphasizes statements
Functional: A language that emphasizes expressions
C lies at one end of the spectrum, relying heavily on statements to accomplish everything. The classic example is iterating over an array:
#include <stdio.h>

int main(int argc, char *argv[]) {
    int elems[5] = {1, 2, 3, 4, 5}; // Statement

    int total = 0;
    int i;

    //   +- Statement
    //   |             +- Statement
    //   |             |
    //   v             v
    for (i = 0; i < 5; i++) {
        total += elems[i];  // Statement
    }
    printf("%d\n", total);  // Statement

    return 0;
}
Haskell lies at the exact opposite extreme, using expressions heavily:
main = print (sum [1..5])  -- Expression
In fact, Haskell pushes this principle as far as it will go: everything in Haskell is an expression, and even statements are expressions.
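As a small sketch of what "everything is an expression" looks like in practice, the constructs that would be statements in C (`if`, `case`, local bindings, and loops) all produce values in Haskell and can be nested inside larger expressions. The names `classify`, `sizeOf`, and `total` are made up for this illustration:

```haskell
-- `if` yields a value, so it can sit inside a larger expression.
classify :: Int -> String
classify n = "The number is " ++ (if even n then "even" else "odd")

-- `case` and `let` are expressions as well.
sizeOf :: [a] -> String
sizeOf xs = case xs of
    []  -> "empty"
    [_] -> "singleton"
    _   -> let n = length xs in show n ++ " elements"

-- The C loop from earlier, written as a single fold expression.
total :: Int
total = foldr (+) 0 [1, 2, 3, 4, 5]

main :: IO ()
main = do
    putStrLn (classify 4)
    putStrLn (sizeOf "abc")
    print total
```

Nothing here mutates a variable; each definition simply names the value of an expression.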

For example, the following code might appear to be a traditional imperative-style sequence of statements:
main = do
    putStrLn "Enter a number:"         -- Statement?
    str <- getLine                     -- Statement?
    putStrLn ("You entered: " ++ str)  -- Statement?
... but do notation is merely syntactic sugar for nested applications of (>>=), which is itself nothing more than an infix higher-order function:
main =
    putStrLn "Enter a number:" >>= (\_   ->  -- Expression
     getLine                   >>= (\str ->  -- Sub-expression
      putStrLn ("You entered: " ++ str) ))   -- Sub-expression
In Haskell, "statements" are actually nested expressions, and sequencing statements just builds larger and larger expressions.
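One consequence worth sketching: because an IO "statement" is an ordinary value, you can store it in a list, name it, or reorder it before anything runs. The names `greetings` and `echoTwice` below are hypothetical and exist only for this example:

```haskell
-- Each element is an ordinary value of type IO (); building the list
-- runs nothing.
greetings :: [IO ()]
greetings =
    [ putStrLn "first"
    , putStrLn "second"
    , putStrLn "third"
    ]

-- A whole do block is itself one expression, so it can be named and
-- passed around like any other value.
echoTwice :: String -> IO ()
echoTwice msg = do
    putStrLn msg
    putStrLn msg

main :: IO ()
main = do
    -- Reversing the list reorders the actions before any of them run.
    sequence_ (reverse greetings)
    echoTwice "done"
```

Running this should print the three greetings in reverse order, followed by "done" twice.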

This statement-as-expression paradigm promotes consistency and prevents arbitrary language limitations, such as Python's restriction of lambdas to a single expression with no statements allowed. In Haskell, you cannot limit the number of statements a term uses any more than you can limit the number of sub-expressions.
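To make that concrete, here is a minimal sketch of an anonymous function whose body is a multi-line do block. It uses `for_` from `Data.Foldable`; the helper `describe` is a hypothetical name introduced only for this example:

```haskell
import Data.Foldable (for_)

-- A hypothetical pure helper used inside the lambda below.
describe :: Int -> String
describe n = show n ++ " squared is " ++ show (n * n)

main :: IO ()
main =
    -- The lambda body is a do block with three "statements", which is
    -- still just one expression as far as the language is concerned.
    for_ [1, 2, 3] (\n -> do
        putStrLn ("Processing " ++ show n)
        putStrLn (describe n)
        putStrLn "Done")
```

No special syntax is needed for a "multi-statement lambda", because the do block is just another sub-expression.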


Monads


do notation works for more than just IO. Any type that implements the Monad class can be "sequenced" in statement form, as long as it supports the following two operations:
-- Simplified: since GHC 7.10, Monad also has Applicative as a superclass
class Monad m where
    (>>=)  :: m a -> (a -> m b) -> m b

    return :: a -> m a
This provides a uniform interface for translating imperative statement-like syntax into expressions under the hood.
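As a rough sketch of that uniformity, a function written once against the Monad interface can be reused unchanged at several types. `pairUp` is a hypothetical helper, not a library function:

```haskell
-- Written once, using only what the Monad class provides.
pairUp :: Monad m => m a -> m b -> m (a, b)
pairUp ma mb = do
    a <- ma
    b <- mb
    return (a, b)

main :: IO ()
main = do
    -- Maybe: succeeds only if both inputs are present.
    print (pairUp (Just (1 :: Int)) (Just 'x'))
    -- List: every combination of the two inputs.
    print (pairUp [1 :: Int, 2] "ab")
    -- Either: stops at the first Left.
    print (pairUp (Right 1) (Left "oops") :: Either String (Int, Int))
```

The same three lines of do notation mean "fail if anything is missing", "try every combination", or "stop at the first error", depending only on which monad they are used at.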

For example, the Maybe type (Haskell's version of nullable) implements the Monad class:
data Maybe a = Nothing | Just a

instance Monad Maybe where
    m >>= f = case m of
        Nothing -> Nothing
        Just a  -> f a

    return = Just
This lets you assemble Maybe-based computations using do notation, like so:
example :: Maybe Int
example = do
    x <- Just 1
    y <- Nothing
    return (x + y)
The above code desugars to nested calls to (>>=):
example =
    Just 1   >>= (\x ->
     Nothing >>= (\y ->
      return (x + y) ))
The compiler then substitutes in our definitions of (>>=) and return, which produces the following expression:
example = case (Just 1) of
    Nothing -> Nothing
    Just x  -> case Nothing of
        Nothing -> Nothing
        Just y  -> Just (x + y)
We can then hand-evaluate this expression to prove that it short-circuits when it encounters Nothing:
-- Evaluate the outer `case`
example = case Nothing of
    Nothing -> Nothing
    Just y  -> Just (1 + y)

-- Evaluate the remaining `case`
example = Nothing
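If you want to check this derivation with a compiler, here is a self-contained sketch. It makes two assumptions that are not part of the code above: the type is renamed to a hypothetical `Option` so that it does not clash with the Prelude's own Maybe, and it includes the Functor and Applicative instances that GHC 7.10 and later require before a Monad instance is accepted:

```haskell
-- `Option` is a hypothetical stand-in for Maybe, renamed to avoid
-- clashing with the Prelude's type and instances.
data Option a = None | Some a
    deriving (Eq, Show)

-- Required superclass instances on GHC 7.10 and later.
instance Functor Option where
    fmap _ None     = None
    fmap f (Some a) = Some (f a)

instance Applicative Option where
    pure = Some
    None   <*> _ = None
    Some f <*> m = fmap f m

instance Monad Option where
    m >>= f = case m of
        None   -> None
        Some a -> f a

-- The do-notation version.
sugared :: Option Int
sugared = do
    x <- Some 1
    y <- None
    return (x + y)

-- The same computation with explicit (>>=).
desugared :: Option Int
desugared =
    Some 1 >>= (\x ->
     None  >>= (\y ->
      return (x + y) ))

-- A variant in which every step succeeds.
allPresent :: Option Int
allPresent = do
    x <- Some 1
    y <- Some 2
    return (x + y)

main :: IO ()
main = do
    print sugared
    print desugared
    print allPresent
```

The first two values should both print as `None`, matching the hand evaluation, while the last prints `Some 3`.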

Semantics


Notice that we can evaluate these Maybe "statements" without invoking any sort of abstract machine. When everything is an expression, everything is simple to evaluate and does not require understanding or invoking an execution model.

In fact, the distinction between statements and expressions also closely parallels another important divide: the difference between operational semantics and denotational semantics.
Operational semantics: Translates code to abstract machine statements
Denotational semantics: Translates code to mathematical expressions
Haskell teaches you to think denotationally in terms of expressions and their meanings instead of statements and an abstract machine. This is why Haskell makes you a better programmer: you separate your mental model from the underlying execution model, so you can more easily identify common patterns between diverse programming languages and problem domains.
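To illustrate the two styles side by side, here is a toy sketch of a tiny arithmetic language given both kinds of meaning. Everything in it (`Expr`, `denote`, `Instr`, `compile`, `run`, `sample`) is a hypothetical name invented for this example, and the stack machine is deliberately minimal:

```haskell
-- A tiny arithmetic language.
data Expr = Lit Int | Add Expr Expr | Mul Expr Expr

-- Denotational view: each term maps directly to the integer it means.
denote :: Expr -> Int
denote (Lit n)   = n
denote (Add a b) = denote a + denote b
denote (Mul a b) = denote a * denote b

-- Operational view: compile to instructions for a small stack machine...
data Instr = Push Int | AddI | MulI

compile :: Expr -> [Instr]
compile (Lit n)   = [Push n]
compile (Add a b) = compile a ++ compile b ++ [AddI]
compile (Mul a b) = compile a ++ compile b ++ [MulI]

-- ...and run them one step at a time against an explicit stack.
run :: [Instr] -> [Int] -> Maybe Int
run []              [result]       = Just result
run (Push n : rest) stack          = run rest (n : stack)
run (AddI   : rest) (y : x : more) = run rest (x + y : more)
run (MulI   : rest) (y : x : more) = run rest (x * y : more)
run _               _              = Nothing  -- malformed program

sample :: Expr
sample = Add (Lit 1) (Mul (Lit 2) (Lit 3))

main :: IO ()
main = do
    print (denote sample)
    print (run (compile sample) [])
```

Both routes arrive at 7 for `sample`, but only the second requires you to keep a machine state in your head while reading it.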