Saturday, July 13, 2013

Statements vs Expressions

Many programming languages have separate concepts for statements and expressions. I like to think of the distinction between them as follows:
Statement: What code does
Expression: What code is
I want to clarify that in this post I will deliberately use the term "statement" very broadly to refer to anything that is not an expression or type declaration. If you disagree with this terminology then I welcome suggestions for an alternative name.

The distinction between statements and expressions closely parallels the difference between imperative languages and functional languages:
Imperative: A language that emphasizes statements
Functional: A language that emphasizes expressions
C lies at one end of the spectrum, relying heavily on statements to accomplish everything. The classic example is iterating over an array:
#include <stdio.h>

int main(int argc, char *argv[]) {
    int elems[5] = {1, 2, 3, 4, 5}; // Statement

    int total = 0;
    int i;

    //   +- Statement
    //   |             +- Statement
    //   |             |
    //   v             v
    for (i = 0; i < 5; i++) {
        total += elems[i];  // Statement
    }
    printf("%d\n", total);  // Statement

    return 0;
}
Haskell lies at the exact opposite extreme, using expressions heavily:
main = print (sum [1..5])  -- Expression
In fact, everything in Haskell is an expression; even what look like statements are expressions.

For example, the following code might appear to be a traditional imperative-style sequence of statements:
main = do
    putStrLn "Enter a number:"         -- Statement?
    str <- getLine                     -- Statement?
    putStrLn ("You entered: " ++ str)  -- Statement?
... but do notation is merely syntactic sugar for nested applications of (>>=), which is itself nothing more than an infix higher-order function:
main =
    putStrLn "Enter a number:" >>= (\_   ->  -- Expression
     getLine                   >>= (\str ->  -- Sub-expression
      putStrLn ("You entered: " ++ str) ))   -- Sub-expression
In Haskell, "statements" are actually nested expressions, and sequencing statements just builds larger and larger expressions.
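Because a do block is itself an expression, it denotes a first-class value that you can name, pass to functions, and reuse. A minimal sketch of this idea (the names `action` and `twice` are mine, not from the post):

```haskell
-- A do block is an ordinary expression, here of type IO (), so we can
-- bind it to a name and treat it like any other value.
action :: IO ()
action = do
    putStrLn "one"
    putStrLn "two"

-- Sequencing a "statement" with itself just builds a bigger expression.
twice :: IO () -> IO ()
twice io = io >> io
```

Running `twice action` prints one, two, one, two: the block named `action` is a value that can be passed to `twice` like any other argument.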

This statement-as-expression paradigm promotes consistency and prevents arbitrary language limitations, such as Python's restriction of lambda bodies to a single expression. In Haskell, you cannot limit the number of statements a term uses any more than you can limit the number of sub-expressions.
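For instance, nothing stops a Haskell lambda from containing an entire do block, since the block is just another expression. A small sketch of mine (the name `describe` is not from the post):

```haskell
-- The body of this lambda is a two-"statement" do block; because the
-- block is just an expression, there is no single-expression limit.
describe :: Int -> IO ()
describe = \n -> do
    putStrLn ("Processing " ++ show n)
    print (n * n)
```

`mapM_ describe [1, 2, 3]` then runs the whole two-line block once per list element.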


Monads


do notation works for more than just IO. Any type that implements the Monad class can be "sequenced" in statement form, as long as it supports the following two operations:
class Monad m where
    (>>=)  :: m a -> (a -> m b) -> m b

    return :: a -> m a
This provides a uniform interface for translating imperative statement-like syntax into expressions under the hood.
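To see how uniform that interface is, here is a single do block that works for any Monad (`pair` is my own illustrative helper, not from the post):

```haskell
-- One do block, many monads: `pair` sequences two computations and
-- tuples up their results, for whichever Monad instance you pick.
pair :: Monad m => m a -> m b -> m (a, b)
pair mx my = do
    x <- mx
    y <- my
    return (x, y)
```

The same definition serves every instance: `pair (Just 1) (Just 'a')` yields `Just (1, 'a')`, `pair [1, 2] "ab"` yields all four pairings, and `pair getLine getLine` reads two lines.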

For example, the Maybe type (Haskell's version of nullable) implements the Monad class:
data Maybe a = Nothing | Just a

instance Monad Maybe where
    m >>= f = case m of
        Nothing -> Nothing
        Just a  -> f a

    return = Just
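We can exercise this instance directly with (>>=) before bringing do notation back in. A quick sketch (`halve` is a hypothetical helper of mine, not from the post):

```haskell
-- A computation that can fail: halving only succeeds on even numbers.
halve :: Int -> Maybe Int
halve n = if even n then Just (n `div` 2) else Nothing
```

Chaining behaves exactly as the instance dictates: `Just 8 >>= halve >>= halve` gives `Just 2`, `Just 6 >>= halve >>= halve` gives `Nothing` (3 is odd), and `Nothing >>= halve` gives `Nothing` without ever calling `halve`.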
This lets you assemble Maybe-based computations using do notation, like so:
example :: Maybe Int
example = do
    x <- Just 1
    y <- Nothing
    return (x + y)
The above code desugars to nested calls to (>>=):
example =
    Just 1   >>= (\x ->
     Nothing >>= (\y ->
      return (x + y) ))
We can then substitute in our definitions of (>>=) and return, which produces the following expression:
example = case (Just 1) of
    Nothing -> Nothing
    Just x  -> case Nothing of
        Nothing -> Nothing
        Just y  -> Just (x + y)
We can then hand-evaluate this expression to prove that it short-circuits when it encounters Nothing:
-- Evaluate the outer `case`
example = case Nothing of
    Nothing -> Nothing
    Just y  -> Just (1 + y)

-- Evaluate the remaining `case`
example = Nothing
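GHC agrees with this hand evaluation. A quick check, pairing the failing pipeline from above with a succeeding variant (`example2` is my own name for it):

```haskell
-- The failing pipeline short-circuits at the Nothing...
example :: Maybe Int
example = do
    x <- Just 1
    y <- Nothing
    return (x + y)

-- ...while the same pipeline with both binds succeeding returns a Just.
example2 :: Maybe Int
example2 = do
    x <- Just 1
    y <- Just 2
    return (x + y)
```

`example` evaluates to `Nothing`, matching the hand evaluation, and `example2` evaluates to `Just 3`.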

Semantics


Notice that we can evaluate these Maybe "statements" without invoking any sort of abstract machine. When everything is an expression, evaluation is just substitution: you never need to understand or invoke a separate execution model.

In fact, the distinction between statements and expressions also closely parallels another important divide: the difference between operational semantics and denotational semantics.
Operational semantics: Translates code to abstract machine statements
Denotational semantics: Translates code to mathematical expressions
Haskell teaches you to think denotationally in terms of expressions and their meanings instead of statements and an abstract machine. This is why Haskell makes you a better programmer: you separate your mental model from the underlying execution model, so you can more easily identify common patterns between diverse programming languages and problem domains.

19 comments:

  1. missing a Just in the def of >>= for Maybe

    1. I double-checked it and I think it is correct. Here is the `Monad` instance from `Data.Maybe` for comparison:

      instance Monad Maybe where
          (Just x) >>= k = k x
          Nothing  >>= _ = Nothing

          return = Just

      Where do you think the extra `Just` belongs?

    2. This comment has been removed by the author.

    3. my bad :-) I thought you needed `Just (k x)`, but `k` already has codomain `Maybe b`...

    4. No apologies necessary! I appreciate corrections and I don't mind if you make mistakes. :)

  2. But one day you discover that you want jumps. Like, imagine you walk around a tree and calculate the product of all its nodes. Obviously, when you encounter a zero, you can return zero right away.

    And that's where statement-oriented languages and operational semantics of an abstract machine perform better than expression-oriented languages and denotational semantics: in the former case, jumps are extremely straightforward, clear, and intuitive. But try to model them in a lambda-calculus! Introduce a denotational semantics for them! Well, okay, that's not impossible, and people figured it out by the middle of the seventies — you need continuations. But the result is pretty gross. CPS-transformed programs look awful, with all that plumbing lying in plain sight, and continuations themselves are kinda confusing.

    Sure, with the right syntactic sugar you can do this. But reasoning about them is still not pleasant, especially in a lazy language: jumps mess up evaluation order, and there is little you can do about that.

    1. Breaking from a loop does not require continuation passing style. Read this other post of mine which shows a simple alternative using the `Either`/`EitherT` monads:

      http://www.haskellforall.com/2012/07/breaking-from-loop.html

    2. You can actually use the Maybe monad that was introduced in this article exactly for that. For example like this:

      data Tree a = Nil | Node (Tree a) a (Tree a)

      treeProductMaybe :: (Eq a, Num a) => Tree a -> Maybe a
      treeProductMaybe Nil = Just 1
      treeProductMaybe (Node left value right)
          | value == 0 = Nothing
          | otherwise = do
              leftProduct <- treeProductMaybe left
              rightProduct <- treeProductMaybe right
              return (leftProduct * value * rightProduct)

      treeProduct :: (Eq a, Num a) => Tree a -> a
      treeProduct tree = case treeProductMaybe tree of
          Nothing -> 0
          Just x -> x

  3. Great post. I wrote about statements vs expressions in a blog post a while back too, but I think yours was much more illustrative. If you're interested in an F# vs C# perspective on this you can find my post here:
    http://richardminerich.com/2012/07/functional-programming-is-dead-long-live-expression-oriented-programming/

    1. Yeah, I really like your post. It does a really good job illustrating how common imperative features are not first-class in the sense that they do not have a value associated with them.

    2. This comment has been removed by the author.

    3. Vindicating to hear you saw my post. Sometimes it feels like blogging into the void.

      Reading yours I felt like I should have gone deeper. Thanks for doing the thorough treatment.

    4. I don't know how WordPress works, but Blogger provides really nice statistics that you can use to track how many people visit your site and where they come from. This helps you know how broad your readership is and I also use these kinds of statistics to hone my writing by studying which of my posts are well received and get reshared frequently.

      Also, when you first start out it helps to pick a specific social network that is appropriate for your blog and submit posts there a few times. Having a thick skin helps, too. :) For example, I began by tailoring my posts to /r/Haskell and regularly submitting there while listening to their suggestions and criticisms (mostly criticisms) in the comments to improve the quality of my posts. That's how I slowly grew a reader base and developed a more personal connection with people who read my blog.

      Once you build an active and personal connection with your readers like that it greatly improves the joy of blogging and then you don't feel like you are talking into an empty void. Instead it feels like chatting in public with friends.

  4. Nice post, but I don't agree with your analogy/distinction between operational semantics vs denotational semantics. For example, I don't think that this rule from operational semantics of simply typed lambda-calculus has anything to do with statements or imperative programming:

    e1 => (\a -> e2)    e3 => v    e2[v/a] => v2
    --------------------------------------------
                  (e1 e3) => v2

    It merely says how we can evaluate an _expression_ from its subexpressions. Of course, operational semantics for imperative languages involve some sort of abstract machine most of the time. But equally you can build a denotational semantics for an imperative language: interpret statements as functions on memory stores, and the ';' operator as composition of those functions.

    Thanks.

    - Dan

    1. Oops, looks like Blogger has jammed the formatting. Here's the rule in monofont: http://lpaste.net/8226532637376774144

    2. Yes, you're right. Maybe a better way I could have phrased it is that operational semantics and denotational semantics overlap in the same way that statements and expressions overlap in Haskell.

  5. Thank you for sharing this informative post.
