Saturday, December 31, 2011

Haskell for Mainstream Programmers - Code reuse

Haskell code reuse is extraordinarily high because the language builds on a strong foundation of category theory, which is the mathematician's version of "code reuse". Mathematicians developed category theory in order to identify patterns common to every branch of mathematics, and several brilliant computer scientists discovered that those same patterns appear in programming languages as well.

Haskell programmers work much more rapidly because of this phenomenally high code reuse. Other languages penalize you for using a new library because you must spend a considerable amount of time learning the library's API, reading documentation, and learning library-specific conventions. A proficient Haskeller adopts libraries in minutes because they all share common interfaces derived from category theory.

I'll illustrate this using Haskell's Functor interface, derived from functors in category theory.

Maybe Functor


Java programmers must constantly check for null pointers, which forces them to write code like the following:
String s = null;
if (null != x) // x is some possibly-null reference
{
    s = x.toString();
}
C programmers have the same problem, where a function might return a special value or null pointer to indicate failure:
FILE *fp = NULL;
fp = fopen("poem.txt", "w");
if (fp)
{
    fprintf(fp, "Roses are red\n");
}
In Haskell, you model values that might be empty using the following type:
data Maybe a = Just a | Nothing
Maybe a denotes something that "maybe" holds a value of type a because there are two ways you can build a value of type Maybe a. Either you provide a value of type a using the Just constructor or you provide no value at all with the Nothing constructor. For example, a map lookup might fail, so the type of Haskell's lookup function is:
lookup :: (Ord k) => k -> Map k a -> Maybe a
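For example, in a hypothetical ghci session (assuming Data.Map is in scope):
> import qualified Data.Map as Map
> let ages = Map.fromList [("alice", 30), ("bob", 25)]
> Map.lookup "alice" ages
Just 30
> Map.lookup "carol" ages
Nothing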
This works great except for one problem. All our existing functions that expect arguments of type a will reject arguments of type Maybe a. For example, the following code will not compile:
square :: Int -> Int
square x = x ^ 2

four = square (Just 2) -- wrong
This fails to compile because square expects its argument to be of type Int and not Maybe Int, so let's give square a hand with a helper function that converts it into a function that works on Maybes. Haskell uses the name fmap for this function:
fmap :: (a -> b) -> Maybe a -> b -- wrong
fmap f (Just x) = f x
Oops! We forgot to cover all possible constructors. fmap's second argument has type Maybe a, which means that fmap must also accept Nothing for its second argument. If we try to lift our function to operate on a Nothing, fmap will fail with a run-time error:
> fmap square (Just 2)
4
> fmap square Nothing
*** Exception: Non-exhaustive patterns in function fmap
Hmm, it seems that our fmap function can sometimes fail, so why not make its return type use Maybe to encapsulate that failure:
fmap :: (a -> b) -> Maybe a -> Maybe b
fmap f (Just x) = Just (f x)
fmap f Nothing  = Nothing
Much better! Now our function handles errors transparently by simply propagating them:
> fmap square (Just 2)
Just 4
> fmap square Nothing
Nothing
We just unwittingly defined our first functor, the Maybe functor. To see why, let's look at the definition of the Functor class:
class Functor f where
    fmap :: (a -> b) -> f a -> f b
If you substitute Maybe for f then you get the type signature for the fmap function we just derived. Using Haskell terminology, Maybe is an instance of the Functor class. Haskell classes are analogous to C#/Java interfaces, so using their terminology you would say that Maybe implements the Functor interface.
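In other words, the fmap we just wrote is precisely the Functor instance for Maybe, which the Prelude already provides:
instance Functor Maybe where
    fmap f (Just x) = Just (f x)
    fmap f Nothing  = Nothing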

Collection Functors


Haskell lists are ordered sequences of values, and a Haskell list containing values of type a is denoted [a], which is just syntactic sugar for [] a. You can treat [] as a Functor:
instance Functor [] where
    fmap :: (a -> b) -> [a] -> [b]
    fmap = map
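For example, reusing the square function from above in a hypothetical ghci session:
> fmap square [1, 2, 3]
[1,4,9]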
Python programmers might recognize this as their map function. Just like Python's map, fmap works on most homogeneous Haskell data structures and generator-like types (a.k.a. iterables in Python) because they implement the Functor interface:
instance Functor Set where
    fmap :: (a -> b) -> Set a -> Set b
    fmap = ... -- elided (illustrative; the real Data.Set.map needs an Ord constraint, so Set has no true Functor instance)
instance Functor Tree where
    fmap :: (a -> b) -> Tree a -> Tree b
    fmap = ... -- implementation elided
instance Functor (Map k) where
    fmap :: (a -> b) -> Map k a -> Map k b
    fmap = ... -- implementation elided
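fmap on a Map transforms every value while leaving the keys alone. Continuing the hypothetical ghci session from the Maybe section:
> fmap square (Map.fromList [("a", 2), ("b", 3)])
fromList [("a",4),("b",9)]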
Haskell differs from C++ and Java in that it does not have a unified collections interface, because many operations on collections are actually instances of more general classes. For example, list concatenation uses mappend from the Monoid class.
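For example, concatenating lists through the Monoid interface:
> mappend [1, 2] [3, 4]
[1,2,3,4]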

Since our fmap function surprisingly works on collections, you might wonder what else it works on.

State Functor


Haskell has a dead-simple implementation of state:
-- actually a newtype and I've flipped the output
type State s a = s -> (s, a)
In other words, a stateful function is one that takes a state s and returns a new state s along with an output a. In fact, this is how Haskell handles I/O, except that s is the state of the outside world. Seriously:
type IO a = State RealWorld a -- actually a newtype
As you might have guessed, State s and IO are functors:
instance Functor (State s) where
    fmap :: (a -> b) -> State s a -> State s b
    fmap f x s = second f (x s) -- a pure function
instance Functor IO where
    fmap :: (a -> b) -> IO a -> IO b
    fmap f x s = second f (x s) -- no different
I only included the implementations to emphasize that they are ordinary, pure functions and that there is nothing special or magic about how either functor works.
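If second is new to you, it simply applies a function to the second component of a pair. Here is a sketch specialized to pairs (the library version in Control.Arrow is more general), plus a hypothetical counter that shows fmap on State in action:
-- second, specialized to ordinary pairs:
second :: (a -> b) -> (x, a) -> (x, b)
second f (x, a) = (x, f a)

-- A hypothetical counter, using the flipped convention above:
tick :: State Int Int
tick s = (s + 1, s) -- returns the new state and the old state as output

-- fmap show tick :: State Int String
-- (fmap show tick) 0 evaluates to (1, "0")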

Haskell's getLine function reads a line from standard input and returns the String as the output of type a that I mentioned above. That means its type is:
getLine :: IO String -- i.e. (RealWorld -> (RealWorld, String))
Because it is a functor, I can apply a function to its output (the String) before I even run it, and it will return the transformed value when it is finally executed. So if I have a function named parseInteger that converts Strings to Integers, then I can just write:
parseInteger :: String -> Integer

getInteger :: IO Integer -- i.e. (RealWorld -> (RealWorld, Integer))
getInteger = fmap parseInteger getLine
Voila! I just created a new function that reads a line from standard input and returns the parsed Integer. I don't think the standard libraries of mainstream languages have anything matching the generality of fmap.
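For completeness, a minimal parseInteger might be the following sketch, which assumes well-formed input (a real program would handle parse failures):
parseInteger :: String -> Integer
parseInteger = read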

Parser Functor


In Haskell, a function that parses Strings has the type:
type Parser a = String -> [(a, String)] -- actually a newtype
To understand the type: a parser takes a String and returns a list of possible parsed values, each paired with its unconsumed remainder of the input. This is a functor, too, which means that if I have something that parses a list of values and I want to convert that list to a tree, I just write:
convertToTree :: [a] -> Tree a

parseList :: Parser [a]

(<$>) = fmap -- an infix operator synonymous with fmap

convertToTree <$> parseList :: Parser (Tree a)
Convenient! If you squint, the infix version of fmap, <$>, resembles function application. It's almost as if we were applying convertToTree directly to parseList, and <$> makes the type incompatibility just magically go away.
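Pretending, as with State, that we can ignore the newtype wrapper, the Parser instance is a one-liner (a sketch):
instance Functor Parser where
    -- apply f to every successful parse; leave the unconsumed input alone
    fmap f p = \s -> [(f a, rest) | (a, rest) <- p s]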

Functors


So if you see something resembling f a in a type signature and you guess that f is some sort of functor, chances are you will be right. However, you don't have to guess. You can just fire up ghci and type something like:
> :info Maybe
...
instance Functor Maybe -- Defined in Data.Maybe
...
... or if f is a type variable in a function's type signature, the signature itself will say something like:
functorsForAll :: (Functor f, ...) => ... -> f a -> ...
Functors are just the tip of the iceberg. Haskell programmers reuse classes like crazy, and Functor isn't even the most heavily used class (that honor arguably goes to the Monad class). The typical Haskell programmer reuses a few functions like fmap over and over again to great effect, and once you learn those few functions your Haskell productivity will soar. These functions "just work" and always do the "right thing", all in a completely type-safe manner.
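To make that reuse concrete, here is one hypothetical definition that works unchanged on every functor in this post:
describe :: (Functor f) => f Int -> f String
describe = fmap (\n -> "got " ++ show n)

-- describe (Just 2)              == Just "got 2"
-- describe [1, 2, 3]             == ["got 1","got 2","got 3"]
-- describe (fmap length getLine) :: IO String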

Equally important, Haskell programmers reason more easily about their programs because they reuse the same concepts over and over again. They can rapidly form a mental model of how a function or library works because everything is assembled from the same set of core primitives derived from category theory.

Haskell classes, like C#/Java interfaces, promote generality and code reuse, but they are much more general and reusable than their mainstream counterparts because they have a strong mathematical foundation in category theory. However, you need no actual knowledge of category theory to use them.

15 comments:

  1. Is there an error in the following?

    "type State s a = s -> (s, a) -- actually a newtype"

    Should (s,a) be (a,s)? I just read http://www.haskellforall.com/2012/01/haskell-for-mainstream-programmers_04.html and there it is

    "type State s a = s -> (a, s) -- actually a newtype, but whatever"


    Also, on http://www.haskellforall.com/2012/01/haskell-for-mainstream-programmers_04.html, there is an error in one of the add functions, where it should say "put (x+n) s1" instead of "put x s1".

    I much appreciate your blog posts, by the way, and I'm working my way through them. :)

    1. Yeah, I actually take liberty with the State type and reverse the output to simplify the implementation of the monad instance. Both versions form a monad, but the official one is not as clean. However, I'll change both posts so that they make clear that I'm reversing the convention for simplicity.

      There's another reason I prefer this different ordering, which is that when you define the state monad using the Kleisli category formulation (where `return` is `id` and `(<=<)` is `(.)`), then you get:

      return = curry id
      f <=< g = curry (uncurry f . uncurry g)

      ... whereas if those variables are rearranged then it ends up being:

      return = (flip . curry) id
      f <=< g = (flip . curry) ((uncurry . flip) f . (uncurry . flip) g)

      You are also right about the mistake in the other post. I will fix it.

      Some of the more recent posts are more advanced, but I will write some more intermediate posts soon.

    2. Ok, thanks. :) I realized just after hitting submit that you could most probably define it either way ((a,s) or (s,a)).

      I actually noticed this difference in the process of trying to work through your post on state in GHCi. I got stuck on the conversion to do notation and tried to find out what I was doing wrong. If I do "let add = (\n -> do x <- get; put (x+n)) :: Int -> State Int ()" at the prompt, I get an error. The other versions of add have the type "Int -> State Int ()". Could you give me some hints? I guess there is some difference between your "bind" and "(>>=)".

      I'm quite new to Haskell and find some of the details of state to be a bit difficult, although I understand the main concept.

    3. Jonathan Johnsson: It should work when you add a 'return ()' to the definition of 'add' in the following manner:

      "let add = (\n -> do x <- get; put (x+n); return ()) :: Int -> State Int ()",

      or equivalently:

      "let add = (\n -> get >>= put . (n +) >> return ()) :: Int -> State Int ()"

  2. This is a good attempt to explain fmap. I've been reading Real World Haskell and it kept referring to fmap, dropping it here and there with not much of an explanation except a laconic "here, we use fmap", as if that should be obvious.

    But your explanation only goes halfway. You say that State s and IO are functors, but fail to explain what fmap actually does in each case, instead resorting to just showing the definition and a simple example.

    Now, throwing an equation at the reader this way is probably ok for people who have an intuitive grasp of math; I'm not one of them. And math people won't be reading this particular blog entry anyway - it's all obvious to them. For me this was annoying.

    My suggestion: Why not explicitly say, in plain English:
    1) fmap as defined for the State function takes the current state and the current result, changes the result using the given function, and returns the same state but with a different result.

    2) fmap as defined for IO takes the result of the IO action, changes it using the given function, and returns another IO action with the new result.

    Here are some questions which popped up into my mind while reading this:

    - Why is fmap changing the State's result, and not the state itself?
    - What if I need a more complex transformation of state or IO, something that is more complex than a simple (a -> b) ?
    - How IS exactly changing the type of a tree, a set, a list or a map, "the same" as changing the result of a state? Those are obviously two different things.
    - Are there any rules for knowing what fmap will DO, given a certain context?
    All the examples given here seem to indicate they change "something" "inside" the "context", but this is very, very vague. Is there a better definition? Also, "inside" is poorly defined.
    - Speaking of which, I don't see any example of reuse here. Care to elaborate?
    - What IS the result of the state, anyway? Why is it necessary? Isn't it a state like in a state machine? All you need then, in terms of data, is the state variable which is some of an enum, is this the "s" thing here?
    - What's so special about the getInteger example? I

    I think I know the answers to these questions by now, I'm just mentioning them because that's the kind of questions I asked myself when first reading about the state monad and the IO monad in RWH. I'm not too happy with the answers, BTW, but I'm willing to accept them.

    So, I dunno, you seem to have a sort of disconnect from what a "real-world" imperative programmer thinks like. Again, throwing equations at the reader and then waxing about

    Also, Real World Haskell calls these fmap uses "lifting operators". This should be mentioned here, I think, unless RWH's name is not the conventional one.

    1. Sorry for the delayed response. I agree with all your points and I will update the post to build a better practical intuition of what is going on. It takes a bit of time because I'm a slow writer, which is why I haven't fixed it yet.

    2. Ok, before I rewrite this post, let me first take an initial stab at answering your questions and once I've answered them to your satisfaction I'll work it into the post.

      I have to split these answers into multiple replies because they are so long:

      > Why is fmap changing the State's result, and not the state itself?

      The best way to understand a lot of category theory concepts is that they are just a way of "organizing" types and operations on those types.

      For example, a Functor is nothing more than a type that exposes some way to modify its very last type variable. So when somebody says that `State s` is a `Functor`, that's just a fancy way of saying "I can change the `a` in `State s a`".

      For example, let's say that we reorganized the type variables and instead put them in the order `State a s`. You wouldn't be able to make a `Functor` out of that, because there is no function that has the type: `(s1 -> s2) -> (State a s1 -> State a s2)`.

      Note that a type can be `Functor` in more than one way. The simplest example is the pair:

      data Pair a b = P a b

      It's obviously a functor over `b` since there exists a function:

      second :: (b1 -> b2) -> (Pair a b1 -> Pair a b2)

      ... but if we switch the type variables around it is also a `Functor` over `a`, too:

      data Pair' b a = P' a b

      first :: (a1 -> a2) -> (Pair' b a1 -> Pair' b a2)

      Category theory is totally abstract. It doesn't say what that last type variable should represent. All it says is that if it is a functor then we can modify whatever that type variable points to. Functors are just a structured way of interfacing with specific type variables.

      This is one of the reasons that you almost never get a straight answer when you ask "But what IS a Functor and what does that last type variable represent?" The real answer is that it can be anything that supports modification of some sort.

      - What if I need a more complex transformation of state or IO, something that is more complex than a simple (a -> b) ?

      By that I assume you mean that instead of:

      fmap :: (a -> b) -> f a -> f b

      You instead want something like:

      fmap :: C a b -> C (f a) (f b)

      ... where `C` is some arbitrary instance of the `Category` class. Yes, that's perfectly reasonable. The only reason the `Functor` class uses just Haskell functions is just a historical artifact. With multi-parameter type classes you could generalize the `Functor` class to do what you ask:

      class GFunctor c f where
          fmap :: (Category c) => c a b -> c (f a) (f b)

    3. > How IS exactly changing the type of a tree, a set, a list or a map, "the same" as changing the result of a state? Those are obviously two different things.

      They aren't! That's the point! The reason that Functor is so useful is precisely because it is so abstract. The definition of a Functor is completely "positional". The trick to "getting" category theory is to just ignore what everything means and think of it as just a way of positionally manipulating types. Once you get in the "Haskell zone", you start approaching type-chasing "spatially". For example, when I program I might say to myself:

      "I have an IO Int, but I want an IO String. All I need to change is the Int and I don't actually care whether or not it is wrapped in IO or anything else. That means I want to use a Functor so I can just ignore the whole IO part, and now I only need to supply fmap with a function of type `Int -> String`."

      When you think in terms of these higher-order abstractions, you basically bypass all the complex machinery you don't need to deal with (i.e. the IO in the above example) and focus on the part you actually care about (i.e. the Int). By doing so, I guarantee that I don't get tripped up by some accidental complexity involving `IO`.

      For example, if I write a function of type: `IO Int -> IO String`, I can't actually be sure that such a function isn't doing something weird involving `IO`, but when I force myself to only use `fmap` to modify the `Int`, I'm guaranteeing to myself that the ONLY thing I'm modifying is the `Int` and I'm not accidentally using the extra complexity that `IO` provides. In short, many category theory abstractions like Functor are a principled way to reduce software complexity by sandboxing some function's effect and reducing complex problems into simpler, narrower problems that have fewer opportunities to screw things up.


      > Are there any rules for knowing what fmap will DO, given a certain context?

      Yes! Actually, there are exactly two rules that all instances of the Functor class must obey, no exceptions:

      fmap id = id
      -- i.e.: fmap id x = x

      fmap (f . g) = fmap f . fmap g
      -- i.e.: fmap (f . g) x = fmap f (fmap g x)

      In plain English, those two rules just say:

      * If you don't modify the wrapped value, you don't modify anything else either

      * If you do two modifications at once, it's the same as doing each one successively

      The first rule is the "sandboxing" rule. That's the rule that guarantees that `fmap` doesn't actually trigger any accidental complexity from the surrounding context. The idea is that you give 'fmap' an empty modification function (i.e. 'id'), sort of like a test to see if it changes anything by mistake. If it doesn't modify anything, then you wrote the `Functor` instance correctly and you can guarantee that `fmap`'s effect is truly sandboxed.

      The second rule is the "refactoring" rule. It basically says that we can always group up or split apart multiple uses of `fmap` so that we can define whatever modularity boundaries we require.

      The really fascinating part is that it turns out that those are the ONLY two rules you ever need to prove. Those two rules alone always magically guarantee that you wrote it correctly and they always uniquely define the one correct implementation.

      The more important point is that those are the ONLY two rules you ever need to know about `fmap`. After all, the whole point of `fmap` is that it completely ignores the surrounding context (i.e. the `IO` or `State`), so by definition, any functions that modify the wrapped value cannot use any features from that surrounding context in their implementation.

    4. > All the examples given here seem to indicate they change "something" "inside" the "context", but this is very, very vague. Is there a better definition? Also, "inside" is poorly defined.

      The definition of inside is literally "whatever is the last type variable". Any intuition more specific than that is bound to fail for some functor. All you need to know is that `fmap` changes the last type variable in something's type, whatever that type variable happens to be. That type variable *could* be a value that is wrapped, but that's not necessarily the case. For example, here's a Functor that doesn't wrap anything:

      data U a = U

      instance Functor U where
          fmap _ U = U

      Notice that the above definition of `fmap` satisfies the functor laws I described:

      fmap id = id

      fmap (f . g) = fmap f . fmap g

      ... but there is no `a` value stored inside the `U`. It's purely a phantom type.

      > Speaking of which, I don't see any example of reuse here. Care to elaborate?

      Yes, that was an omission on my part. The reuse part comes about when we program generically over the `Functor` class. For example, I can write:

      setToOne :: (Functor f) => f a -> f Int
      setToOne = fmap (\_ -> 1)

      ... and that will work on anything that implements `Functor`, so I can reuse it for a wide range of types.

      An even more powerful example of Functor reuse is the `lens` library, but I can't fit the explanation of that within this comment.

      > What IS the result of the state, anyway? Why is it necessary? Isn't it a state like in a state machine? All you need then, in terms of data, is the state variable which is some of an enum, is this the "s" thing here?

      In `State s a`, the result is the `a`, not the `s`. The reason is that the `s` type is fixed throughout the computation and cannot change. So, for example, our state type might be an `Int` (i.e. the `s` is `Int`), but we may be interested in computing things using that state which are not `Int`s. This is why we distinguish between the state type and the result we are computing.

      There are a few cases when the state and the result coincide, like the `get` function:

      get :: State s s

      ... but that's just a special case.

      > What's so special about the getInteger example?

      The special part is that I was able to modify the result of an `IO` action without even understanding how `IO` works under the hood. Even more importantly, if the internal implementation details of `IO` were to change, `parseInteger` would still work because it doesn't use any implementation-specific details about `IO`. It just uses `fmap` and then lets `fmap` do all the heavy lifting to make sure that the parser gets applied to the correct place (the result, in this case).

      If those answers are still confusing, feel free to ask more questions. I'd like to improve my explanation of this.

  3. Hi, sorry for not responding for such a long time; I just checked and saw you answered me back in March.

    Unfortunately, recently I've had very little time left for Haskell between my two young children and an insane workload (in fact, I wrote half the post above back in February with my baby son slung over my shoulder trying to get him to sleep, while very tired myself; that's why it has some incomplete sentences).

    Anyway, thanks for the detailed answers, I'll try to read them again later this week and respond. From reading them just now some things were clearly explained, but others were even more confusing than the original post. I'll try reading it again next week and see if I can point out the confusing parts.

    By the way, back in February I also wrote a response to your blog post about the State monad, however due to some bug (in Blogger, I assume) it got lost somehow when I pressed "Publish", and I was too tired to rewrite everything.

    To summarize, my response there essentially said that your explanation there was the best one I've read about the State monad yet. It just seemed a shame to me that you never saw it. I think you're doing a great job on this blog.

    I also started reading your series of posts on your Lenses package, however I didn't get very far, and I realized I obviously missed some concepts, so I went back to try and finish reading Real World Haskell.

    1. I have two kids, too, so I understand. :)

      The lens packages are not my packages. They both belong to Edward Kmett.

      The key thing you need to read if you want to understand StateT is a paper called "Monad Transformers - Step by step". Google that and the PDF will be the first result. It is very beginner friendly.

      If you need more introductory concepts, then I recommend "Learn You a Haskell for Great Good", which walks through all the basic Haskell concepts for people new to functional programming.

      Also, thanks for the kind words!

  4. I am trying to understand the State example but there is something which puzzles me both with the type of fmap and its implementation.

    Shouldn't
    instance Functor (State s) where
    fmap :: (a -> b) -> State a -> State b
    fmap f x s = second f (x s) -- a pure function
    actually be:

    instance Functor (State s) where
    fmap :: (a -> b) -> State s a -> State s b
    fmap f x s = (s, second f (x s))

    Also, something bothers me about defining fmap using three parameters when it only has two. I kind of understand that this has to do with the fact that fmap is, in this case, a function which returns a function, so the last parameter is meant for the returned function. But being a complete rookie, I am not too sure that this is the correct interpretation, and if it is, then I have to admit that this formulation bothers me, since I cannot (yet?) grasp how the compiler can generate code for a function using parameters meant for the function it returns.

    I suspect that if State s a was expanded into its equivalent type then this would likely make sense but it is quite a steep shortcut to take for me at this stage.

    Finally I have two remarks regarding explanations you posted in the comments:
    You say: "there is no function that has the type: `(s1 -> s2) -> (State a s1 -> State a s2)", but I guess you mean that it would not make much sense right? It seems to me that it is possible to write such function even if I have a hard time figuring out if it could be useful at all.

    Also, I'm confused when you say that the result of State s a is the a and not the s. I do not understand why since the state seems to me the most important part of the transition and a seems secondary since it's not taken into account for future transitions.

    1. I actually have to correct my first remark. After looking up "second" I realized that I mistook it for "and" and that it actually seems to be a completely different beast (arrow), one which is completely out of my league (rookie).

      Maybe then, is it a little bit too advanced for an example?

    2. Damn autocorrect, by "and" I meant "snd".

    3. Sorry for the really delayed reply!

      Yes, the real `fmap` for `State` only has two parameters if you insert all the newtype wrapping and unwrapping.

      However, to answer your more general question, it's perfectly legitimate to define a function in terms of the parameters of the function it returns. A great example of this is how function composition is defined:

      (.) :: (b -> c) -> (a -> b) -> (a -> c)

      There are two equally valid ways to define it:

      (f . g) = \x -> f (g x)

      (f . g) x = f (g x)

      The reason this works is that the latter notation is syntactic sugar for the former version. All function definitions with parameters just get desugared to lambdas:

      f x y = z

      -- is equivalent to:

      f = \x -> \y -> z

      Regarding the `State` function question, I should have been more precise and said that there is no way to define a function of type:

      (s1 -> s2) -> (s1 -> (a, s1)) -> (s2 -> (a, s2))

      The way I phrased it before was really confusing.

      Regarding `second`, I find it helps to ignore the whole `Arrow` class stuff and just think of it as a function of the following type:

      second :: (a -> b) -> (x, a) -> (x, b)

      This is a special case of the `Arrow` type, which is:

      second :: Arrow arr => arr a b -> arr (x, a) (x, b)

      Just replace `arr` with `(->)` and you get the simpler version. I personally find this `Arrow` generalization of `second` to be really unfortunate since it's very non-obvious to beginners that it can be used in the simpler way.
