Sunday, January 1, 2012

Haskell for C Programmers - For Loops

C programmers make incredibly heavy use of for loops and when they switch over to Haskell they have trouble writing idiomatic code because Haskell doesn't provide an analagous programming construct. Simon Peyton Jones calls Haskell "the world's finest imperative programming language", so I'll take a few simple C code examples and translate them to exactly equivalent Haskell and then compare them to idiomatic Haskell.

Folds


Let's begin with a sum function in C:
double sum(double *xs, size_t num_xs)
{
    size_t i;
    double result = 0.0;
    for (i = 0; i < num_xs; i++)
    { result += xs[i]; }
    return result;
}
For an exact translation, I'll first define a while and for function in Haskell:
while :: (Monad m) => m Bool -> m a -> m ()
while cond action = do
    c <- cond
    when c $ do
        action
        while cond action

for :: (Monad m) => m a -> m Bool -> m b -> m c -> m ()
for init cond post action = do
    init
    while cond $ do
        action
        post
Then we need to define a global state with default values:
data Status = Status { _i :: Int, _result :: Double }

class Default a where def :: a

instance Default Int    where def = 0
instance Default Double where def = 0.0
instance Default Status where def = Status def def
... and we can use the data-lens package, specifically Data.Lens.Lazy, to make state updates much easier if we define a few lenses first:
i      = lens _i      (\x s -> s { _i      = x })
result = lens _result (\x s -> s { _result = x })
Now we are good to go! The final translation is:
sum :: [Double] -> Int -> Double
sum xs n = flip evalState def $ do
    result ~= 0
    for (i ~= 0) (liftM (< n) (access i)) (i += 1) $
        i' <- access i
        result += (xs !! i')
    access result
However, we don't really need to use the length, because Control.Monad provides a way to call an action once for each element of a list, namely forM. This eliminates the n and i variables completely and removes any chance of an out-of-bounds run-time error by providing the wrong list length. forM resembles other languages' foreach constructs, except it is more general:
sum :: [Double] -> Double
sum xs = flip evalState def $ do
    -- i.e. foreach (double x in xs) { ... }
    forM xs $ \x ->
        modify (\a -> a + x)
    get
Much better! But we're just getting started. For monadic functions that fold a list into a single value, Control.Monad provides the foldM function:
sum xs = flip evalState def $
    foldM (\a x -> return (a + x)) 0.0 xs
But now we're not even using any features of the State monad anymore. The above function is equivalent to:
sum xs = foldl (\x a -> x + a) 0.0 xs
... and we can contract it further to:
sum = foldl (+) 0.0
... which is how the Prelude defines it. So any time you need to simply fold a bunch of values into a single value, take a look at foldr/foldl first.

Loops


What if I want to print a list of values, one value per line? In C, it would be:
void printer(double *xs, size_t num_xs)
{
    for (i = 0; i < num_xs; i++)
    { printf("%f\n", xs[i]); }
}
And in Haskell, the exact translation would be:
printer xs n = flip execStateT 0 $
    for (put 0) (liftM (< n) get) (modify (+ 1)) $ do
        i' <- get
        print $ xs !! i'
Using forM_, we can do better:
printer xs = forM_ xs $ \x -> print x
... or just:
printer xs = forM_ xs print
... or we can use mapM_, which is forM_ with its arguments reversed:
printer = mapM_ print
Hopefully you can see that Haskell's forM/mapM replace C's for loops for monadic code, but you shouldn't rely on them like a crutch. For example, if I wanted to print all pair-wise combinations of two numbers from 1 to 10, I could write:
forM_ [1..10] $ \i ->
    forM_ [1..10] $ \j ->
        print (i, j)
... which looks an awful lot like a nested for-loop you'd write in C. But I could also write it like:
mapM_ print $ do
    i <- [1..10]
    j <- [1..10]
    return (i, j)

-- or use list comprehension syntactic sugar
mapM_ print [(i, j) | i <- [1..10], j <- [1..10]]
... or you could get even fancier and use the Applicative class:
mapM_ print $ (,) <$> [1..10] <*> [1..10]
... which almost seems like cheating.

Modularity


Idiomatic Haskell separates data generation from data consumption and the last function demonstrates that. You can partition the function into a generator and consumer of data:
generator = (,) <$> [1..10] <*> [1..10]
consumer = mapM_ print
This design pattern separate concerns and creates more modular code.

You might think this doesn't work when you need to process data before all of it is available, but you'd be wrong. Haskell uses enumerators and "iteratees" to solve precisely that problem. You write generator and consumer code independently, yet they still magically "fuse" into the correct data-processing pipeline when you combine them, handling data as it is generated. I will discuss these in a later post.

This leads to a good rule of thumb: strive to separate data generation from consumption, even when not programming in Haskell.

3 comments:

  1. For a newcomer to Haskell, this blog is not easy. Without introducing type system of haskell everything seems hazy. May be if you could write a blog to mark the correct order of studying your other blogs then it could be immensely helpful.
    Thanks for your efforts!

    ReplyDelete
  2. This is a big problem of Haskell. Unless you completely scrap everything you know, and learn EVERYTHING in haskell from examples. Only THEN can you look at your old code, spend a few dozen of hours figuring out what the program has to do (as it was initially designed in a testable imperative way, and haskell does not support imperative constructs), then you can, using the big dictionary of expressions defined in Haskell, write your program in a way which makes sense in terms of functional programming. After it works, you can rewrite it back as imperative program (this transition is easier) while knowing that it does exactly what you want it to do, without any distractions arising from data-loss during optimization step which sacrifices expressiveness for efficiency.

    In other words, it's a pain.

    ReplyDelete
    Replies
    1. You'd be right if you had made that remark a few years ago, back when Haskell was essentially playing catch-up with imperative languages. Things are different now and Haskell is slowly but surely beating imperative languages at their own game. I can think of at least four killer Haskell features off of the top of my head where Haskell is best-in-class, many of which are virtually inexistent in mainstream languages:

      * Software Transactional Memory
      * Modern Lenses (i.e. the `lens` package)
      * Coroutines (Shameless plug: including my `pipes` library)
      * Parser combinators

      All of those are pretty self-explanatory except for "Lenses", which are very poorly documented, so here's a lightning lens example to give you a taste of their power and expressiveness:

      >>> import Control.Lens
      >>> sumOf both (1, 3)
      4
      >>> sumOf (folded . folded) [[1, 2], [3, 4]]
      10
      >>> forMOf_ (folded . _1) [('A', 1), ('B, 2)] $ \c -> print c
      'A'
      'B'
      >>> set (_1 . mapped) 7 ([1, 2, 3], 'A')
      ([7,7,7],'A')

      I think you would have difficulty writing anything even remotely as powerful as that in an imperative language.

      Now combine that with a type system that makes programs immensely more bug-free and maintainable. Sure, you can translate a Haskell program back to an imperative one, but you'll wish you hadn't the moment your client requests new features or for you to rearchitect your code, both of which are significantly easier in Haskell.

      Haskell's strong and expressive type system acts like a silent guardian protecting you from your mistakes, so you feel much braver about adding new features or re-engineering large swaths of code, even more so than if you had constructed a very large and comprehensive test suite in another language. This alone is the greatest contributor to productivity in Haskell, since you spend less time writing tests and fixing bugs and more time actually implementing features, the way programming was meant to be.

      However, I want to caution you that pure functions are not the solution to everything. Haskell is not an excellent programming language because everything is done purely and functionally. Rather, Haskell is an excellent language because functional programming is the perfect foundation for rigorously building more sophisticated abstractions. Imperative languages can't do this because they can't opt-out of state and side effects, making it incredibly difficult to reason about the correctness of abstractions, whereas Haskell gets to opt-in whenever it wants.

      If you haven't read my category design pattern post then please do, because there I argue that the key to powerful, easy, and reliable programming is not just functions, but rather categories in general. Of the four killer features I mentioned above, only one actually uses the category of Haskell functions (lenses). I consider the overemphasis on pure functional style to be the biggest cultural weakness of the Haskell community.

      Delete