Haskell for all: Haskell for C Programmers

Sunday, January 1, 2012

Haskell for C Programmers - For Loops

C programmers make incredibly heavy use of for loops and when they switch over to Haskell they have trouble writing idiomatic code because Haskell doesn't provide an analagous programming construct. Simon Peyton Jones calls Haskell "the world's finest imperative programming language", so I'll take a few simple C code examples and translate them to exactly equivalent Haskell and then compare them to idiomatic Haskell.

Folds

Let's begin with a sum function in C:

double sum(double *xs, size_t num_xs)
{
    size_t i;
    double result = 0.0;
    for (i = 0; i < num_xs; i++)
    { result += xs[i]; }
    return result;
}

For an exact translation, I'll first define a while and for function in Haskell:

while :: (Monad m) => m Bool -> m a -> m ()
while cond action = do
    c <- cond
    when c $ do
        action
        while cond action

for :: (Monad m) => m a -> m Bool -> m b -> m c -> m ()
for init cond post action = do
    init
    while cond $ do
        action
        post

Then we need to define a global state with default values:

data Status = Status { _i :: Int, _result :: Double }

class Default a where def :: a

instance Default Int    where def = 0
instance Default Double where def = 0.0
instance Default Status where def = Status def def

... and we can use the data-lens package, specifically Data.Lens.Lazy, to make state updates much easier if we define a few lenses first:

i      = lens _i      (\x s -> s { _i      = x })
result = lens _result (\x s -> s { _result = x })

Now we are good to go! The final translation is:

sum :: [Double] -> Int -> Double
sum xs n = flip evalState def $ do
    result ~= 0
    for (i ~= 0) (liftM (< n) (access i)) (i += 1) $
        i' <- access i
        result += (xs !! i')
    access result

However, we don't really need to use the length, because Control.Monad provides a way to call an action once for each element of a list, namely forM. This eliminates the n and i variables completely and removes any chance of an out-of-bounds run-time error by providing the wrong list length. forM resembles other languages' foreach constructs, except it is more general:

sum :: [Double] -> Double
sum xs = flip evalState def $ do
    -- i.e. foreach (double x in xs) { ... }
    forM xs $ \x ->
        modify (\a -> a + x)
    get

Much better! But we're just getting started. For monadic functions that fold a list into a single value, Control.Monad provides the foldM function:

sum xs = flip evalState def $
    foldM (\a x -> return (a + x)) 0.0 xs

But now we're not even using any features of the State monad anymore. The above function is equivalent to:

sum xs = foldl (\x a -> x + a) 0.0 xs

... and we can contract it further to:

sum = foldl (+) 0.0

... which is how the Prelude defines it. So any time you need to simply fold a bunch of values into a single value, take a look at foldr/foldl first.

Loops

What if I want to print a list of values, one value per line? In C, it would be:

void printer(double *xs, size_t num_xs)
{
    for (i = 0; i < num_xs; i++)
    { printf("%f\n", xs[i]); }
}

And in Haskell, the exact translation would be:

printer xs n = flip execStateT 0 $
    for (put 0) (liftM (< n) get) (modify (+ 1)) $ do
        i' <- get
        print $ xs !! i'

Using forM_, we can do better:

printer xs = forM_ xs $ \x -> print x

... or just:

printer xs = forM_ xs print

... or we can use mapM_, which is forM_ with its arguments reversed:

printer = mapM_ print

Hopefully you can see that Haskell's forM/mapM replace C's for loops for monadic code, but you shouldn't rely on them like a crutch. For example, if I wanted to print all pair-wise combinations of two numbers from 1 to 10, I could write:

forM_ [1..10] $ \i ->
    forM_ [1..10] $ \j ->
        print (i, j)

... which looks an awful lot like a nested for-loop you'd write in C. But I could also write it like:

mapM_ print $ do
    i <- [1..10]
    j <- [1..10]
    return (i, j)

-- or use list comprehension syntactic sugar
mapM_ print [(i, j) | i <- [1..10], j <- [1..10]]

... or you could get even fancier and use the Applicative class:

mapM_ print $ (,) <$> [1..10] <*> [1..10]

... which almost seems like cheating.

Modularity

Idiomatic Haskell separates data generation from data consumption and the last function demonstrates that. You can partition the function into a generator and consumer of data:

generator = (,) <$> [1..10] <*> [1..10]
consumer = mapM_ print

This design pattern separate concerns and creates more modular code.

You might think this doesn't work when you need to process data before all of it is available, but you'd be wrong. Haskell uses enumerators and "iteratees" to solve precisely that problem. You write generator and consumer code independently, yet they still magically "fuse" into the correct data-processing pipeline when you combine them, handling data as it is generated. I will discuss these in a later post.

This leads to a good rule of thumb: strive to separate data generation from consumption, even when not programming in Haskell.

6 comments:

Niket KumarMay 17, 2012 at 9:13 PM
For a newcomer to Haskell, this blog is not easy. Without introducing type system of haskell everything seems hazy. May be if you could write a blog to mark the correct order of studying your other blogs then it could be immensely helpful.
Thanks for your efforts!
ReplyDelete
Replies
DmitryJanuary 20, 2013 at 2:27 PM
This is a big problem of Haskell. Unless you completely scrap everything you know, and learn EVERYTHING in haskell from examples. Only THEN can you look at your old code, spend a few dozen of hours figuring out what the program has to do (as it was initially designed in a testable imperative way, and haskell does not support imperative constructs), then you can, using the big dictionary of expressions defined in Haskell, write your program in a way which makes sense in terms of functional programming. After it works, you can rewrite it back as imperative program (this transition is easier) while knowing that it does exactly what you want it to do, without any distractions arising from data-loss during optimization step which sacrifices expressiveness for efficiency.

In other words, it's a pain.
ReplyDelete
Replies
UnknownMay 7, 2018 at 9:24 AM
This comment has been removed by the author.
ReplyDelete
Replies
UnknownMay 7, 2018 at 10:50 AM
It took some time for me to realize that this examples don't work anymore. Probably because of some old version of lens. Here is my attemp to rewrite them (this is one of my first experiences of using lenses ;) ):

i :: Lens' Status Int
i = lens _i (\s x -> s { _i = x } )

result :: Lens' Status Double
result = lens _result (\s x -> s { _result = x } )

-- worst variant (have to use length and index variables)
sum' :: [Double] -> Int -> Double
sum' xs n = flip evalState def $ do
result .= 0
for (i .= 0) (liftM (< n) (use i)) (i += 1) $ do
i' <- use i
result += (xs !! i')
use result

-- without length and index
sum'' :: [Double] -> Double
sum'' xs = flip evalState def $ do
forM xs (result +=)
use result

-- using foldM
sum''' :: [Double] -> Double
sum''' = flip evalState (Status 0 0.0) . foldM ((return .) . (+)) 0.0
ReplyDelete
Replies

Add comment