Saturday, July 13, 2013

Statements vs Expressions

Many programming languages have separate concepts for statements and expressions. I like to think of the distinction between them as follows:
Statement: What code does
Expression: What code is
I want to clarify that in this post I will deliberately use the term "statement" very broadly to refer to anything that is not an expression or type declaration. If you disagree with this terminology then I welcome suggestions for an alternative name.

The distinction between statements and expressions closely parallels the difference between imperative languages and functional languages:
Imperative: A language that emphasizes statements
Functional: A language that emphasizes expressions
C lies at one end of the spectrum, relying heavily on statements to accomplish everything. The classic example is iterating over an array:
#include <stdio.h>

int main(int argc, char *argv[]) {
    int elems[5] = {1, 2, 3, 4, 5}; // Statement

    int total = 0;
    int i;

    //   +- Statement
    //   |             +- Statement
    //   |             |
    //   v             v
    for (i = 0; i < 5; i++) {
        total += elems[i];  // Statement
    }
    printf("%d\n", total);  // Statement

    return 0;
}
Haskell lies at the exact opposite extreme, using expressions heavily:
main = print (sum [1..5])  -- Expression
In fact, Haskell takes this principle to the extreme: everything in Haskell is an expression, and even statements are expressions.

For example, the following code might appear to be a traditional imperative-style sequence of statements:
main = do
    putStrLn "Enter a number:"         -- Statement?
    str <- getLine                     -- Statement?
    putStrLn ("You entered: " ++ str)  -- Statement?
... but do notation is merely syntactic sugar for nested applications of (>>=), which is itself nothing more than an infix higher-order function:
main =
    putStrLn "Enter a number:" >>= (\_   ->  -- Expression
     getLine                   >>= (\str ->  -- Sub-expression
      putStrLn ("You entered: " ++ str) ))   -- Sub-expression
In Haskell, "statements" are actually nested expressions, and sequencing statements just builds larger and larger expressions.

This statement-as-expression paradigm promotes consistency and prevents arbitrary language limitations, such as Python's restriction of lambdas to single statements. In Haskell, you cannot limit the number of statements a term uses any more than you can limit the number of sub-expressions.


Monads


do notation works for more than just IO. Any type that implements the Monad class can be "sequenced" in statement form, as long as it supports the following two operations:
class Monad m where
    (>>=)  :: m a -> (a -> m b) -> m b

    return :: a -> m a
This provides a uniform interface for translating imperative statement-like syntax into expressions under the hood.

For example, the Maybe type (Haskell's version of nullable) implements the Monad class:
data Maybe a = Nothing | Just a

instance Monad Maybe where
    m >>= f = case m of
        Nothing -> Nothing
        Just a  -> f a

    return = Just
This lets you assemble Maybe-based computations using do notation, like so:
example :: Maybe Int
example = do
    x <- Just 1
    y <- Nothing
    return (x + y)
The above code desugars to nested calls to (>>=):
example =
    Just 1   >>= (\x ->
     Nothing >>= (\y ->
      return (x + y) ))
The compiler then substitutes in our definition of (>>=) and return, which produces the following expression:
example = case (Just 1) of
    Nothing -> Nothing
    Just x  -> case Nothing of
        Nothing -> Nothing
        Just y  -> Just (x + y)
We can then hand-evaluate this expression to prove that it short-circuits when it encounters Nothing:
-- Evaluate the outer `case`
example = case Nothing of
    Nothing -> Nothing
    Just y  -> Just (1 + y)

-- Evaluate the remaining `case`
example = Nothing

Semantics


Notice that we can evaluate these Maybe "statements" without invoking any sort of abstract machine. When everything is an expression, everything is simple to evaluate and does not require understanding or invoking an execution model.

In fact, the distinction between statements and expressions also closely parallels another important divide: the difference between operational semantics and denotational semantics.
Operational semantics: Translates code to abstract machine statements
Denotational semantics: Translates code to mathematical expressions
Haskell teaches you to think denotationally in terms of expressions and their meanings instead of statements and an abstract machine. This is why Haskell makes you a better programmer: you separate your mental model from the underlying execution model, so you can more easily identify common patterns between diverse programming languages and problem domains.

15 comments:

  1. missing a Just in the def of >>= for Maybe

    ReplyDelete
    Replies
    1. I double-checked it and I think it is correct. Here is the `Monad` instance from `Data.Maybe` for comparison:

      instance Monad Maybe where
      ....(Just x) >>= k = k x
      ....Nothing >>= _ = Nothing
      ....
      ....return = Just

      Where do you think the extra `Just` belongs?

      Delete
    2. This comment has been removed by the author.

      Delete
    3. my bad :-) thought you need Just k x, but k already has codomain Maybe b...

      Delete
    4. No apologies necessary! I appreciate corrections and I don't mind if you make mistakes. :)

      Delete
  2. But one day you discover that you want jumps. Like, imagine you walk around the tree and calculate the product of all its nodes. Apparently, when you encounter a zero, you can return zero right away.

    And that's where statement-oriented languages and operational semantics of an abstract machine perform better than expression-oriented languages and denotational semantics: in the former case, jumps are extremely straightforward, clear, and intuitive. But try to model them in a lambda-calculus! Introduce a denotational semantics for them! Well, okay, that's not impossible, and people figured it out by the middle of the seventies — you need continuations. But the result is pretty gross. CPS-transformed programs look awful, with all that plumbing lying in plain sight, and continuations themselves are kinda confusing.

    Sure, with right syntactic sugar you can do this. But reasoning about them is still not pleasant, especially in a lazy language — jumps mess up evaluation order, and there is little you can do about this.

    ReplyDelete
    Replies
    1. Breaking from a loop does not require continuation passing style. Read this other post of mine which shows a simple alternative using the `Either`/`EitherT` monads:

      http://www.haskellforall.com/2012/07/breaking-from-loop.html

      Delete
  3. Great post. I wrote about statements vs expressions in a blog post a while back too, but I think your's was much more illustrative. If you're interested in an F# vs C# perspective on this you can find my post here:
    http://richardminerich.com/2012/07/functional-programming-is-dead-long-live-expression-oriented-programming/

    ReplyDelete
    Replies
    1. Yeah, I really like your post. It does a really good job illustrating how common imperative features are not first-class in the sense that they do not have a value associated with them.

      Delete
    2. This comment has been removed by the author.

      Delete
    3. Vindicating to hear you saw my post. Sometimes it feels like blogging into the void.

      Reading yours I felt like I should have gone deeper. Thanks for doing the thorough treatment.

      Delete
    4. I don't know how WordPress works, but Blogger provides really nice statistics that you can use to track how many people visit your site and where they come from. This helps you know how broad your readership is and I also use these kinds of statistics to hone my writing by studying which of my posts are well received and get reshared frequently.

      Also, when you first start out it helps to pick a specific social network that is appropriate for your blog and submit posts there a few times. Having a thick skin helps, too. :) For example, I began by tailoring my posts to /r/Haskell and regularly submitting there while listening to their suggestions and criticisms (mostly criticisms) in the comments to improve the quality of my posts. That's how I slowly grew a reader base and developed a more personal connection with people who read my blog.

      Once you build an active and personal connection with your readers like that it greatly improves the joy of blogging and then you don't feel like you are talking into an empty void. Instead it feels like chatting in public with friends.

      Delete
  4. Nice post, but I don't agree with your analogy/distinction between operational semantics vs denotational semantics. For example, I don't think that this rule from operational semantics of simply typed lambda-calculus has anything to do with statements or imperative programming:

    e1 => (\a -> e2) e3 => v e2[v/a] => v2
    -----------------------------------------------
    (e1 e3) => v2

    It merely says how can we evaluate an _expression_ from subexpressions. Of course, operational semantics for imperative languages involve some sort of abstract machinery most of the time. But equally you can build denotational semantics for imperative languages: you can interpret statements as functions on the memory stores and then ';' operator as composition of functions.

    Thanks.

    - Dan

    ReplyDelete
    Replies
    1. Oops, looks like Blogger has jammed the formatting. Here's the rule in monofont: http://lpaste.net/8226532637376774144

      Delete
    2. Yes, you're right. Maybe a better way I could have phrased it is that operational semantics and denotational semantics overlap in the same way that statements and expressions overlap in Haskell.

      Delete