Saturday, December 31, 2011

Haskell for Mainstream Programmers - Code reuse

Haskell code reuse is extraordinarily high because the language builds on a strong foundation of category theory, which is the mathematician's version of "code reuse". Mathematicians developed category theory in order to identify patterns common to every branch of mathematics, but several brilliant computer scientists discovered that those same patterns are also common in programming languages as well.

Haskell programmers work much more rapidly because of this phenomenally high code reuse. Other languages penalize you for using a new library because you must spend a considerable amount of time learning the library's API, reading documentation, and learning library-specific conventions. A proficient Haskeller adopts libraries in minutes because they all share common interfaces derived from category theory.

I'll illustrate this using Haskell's Functor interface, derived from functors in category theory.

Maybe Functor

Java programmers must constantly check for null pointers. This forces them to constantly write code like the following:
String s = null;
if (null != x)
    s = x.toString();
C programmers have the same problem, where a function might return a special value or null pointer to indicate failure:
FILE *fp = NULL;
fp = fopen("poem.txt", "w");
if (fp)
    fprintf(fp, "Roses are red\n");
In Haskell, you model values that might be empty using the following type:
data Maybe a = Just a | Nothing
Maybe a denotes something that "maybe" holds a value of type a because there are two ways you can build a value of type Maybe a. Either you provide a value of type a using the Just constructor or you provide no value at all with the Nothing constructor. For example, a map lookup might fail, so the type of Haskell's lookup function is:
lookup :: (Ord k) => k -> Map k a -> Maybe a
This works great except for one problem. All our existing functions that expect arguments of type a will reject arguments of type Maybe a. For example, the following code will not compile:
square :: Int -> Int
square x = x ^ 2

four = square (Just 2) -- wrong
This fails to compile because square expects its argument to be of type Int and not Maybe Int, so let's write a helper function to give square a hand by converting square into a function that works on Maybes. Haskell uses the name fmap for this function:
fmap:: (a -> b) -> Maybe a -> b -- wrong
fmap f (Just x) = f x
Oops! We forgot to cover all possible constructors. fmap's second argument has type Maybe a, which means that fmap must also accept Nothing for its second argument. If we try to lift our function to operate on a Nothing, fmap will fail with a run-time error:
> fmap square (Just 2)
> fmap square Nothing
*** Exception: Non-exhaustive patterns in function fmap
Hmm, it seems that our fmap function can sometimes fail, so why not make its return type use Maybe to encapsulate that failure:
fmap :: (a -> b) -> Maybe a -> Maybe b
fmap f (Just x) = Just (f x)
fmap f Nothing  = Nothing
Much better! Now our function handles errors transparently by simply propagating them:
> fmap square (Just 2)
Just 4
> fmap square Nothing
We just unwittingly defined our first functor, the Maybe functor. To see why, let's look at the definition of the Functor class:
class Functor f where
    fmap :: (a -> b) -> f a -> f b
If you substitute Maybe for f then you get the type signature for the fmap function we just derived. Using Haskell terminology, Maybe is an instance of the Functor class. Haskell classes are analagous to C#/Java interfaces, so using their terminology you would say that Maybe implements the Functor interface.

Collection Functors

Haskell lists are ordered sets of objects and a Haskell list containing objects of type a is denoted [a], which is just syntactic sugar for [] a. You can treat [] as a Functor:
instance Functor [] where
    fmap :: (a -> b) -> [a] -> [b]
    fmap = map
Python programmers might recognize this is the same as their map function. Just like Python's map, fmap will work on all homogeneous Haskell data structures and generators (a.k.a. iterables in Python) because they all implement the Functor interface:
instance Functor Set where
    fmap :: (a -> b) -> Set a -> Set b
    fmap = ... -- implementation elided
instance Functor Tree where
    fmap :: (a -> b) -> Tree a -> Tree b
    fmap = ... -- implementation elided
instance Functor (Map k) where
    fmap :: (a -> b) -> Map k a -> Map k b
    fmap = ... -- implementation elided
Haskell differs from C++ and Java in that it does not have a unified collections interface, because many operations on collections are actually instances of more general classes. For example, list concatenation uses mappend from the Monoid class.

Since our fmap function surprisingly works on collections, you might wonder what else it works on.

State Functor

Haskell has a dead-simple implementation of state:
-- actually a newtype and I've flipped the output
type State s a = s -> (s, a)
In other words, a stateful function is one that takes a state s and returns a new state s along with a possible output a. In fact, this is how Haskell handles I/O, except that s is the state of the outside world. Seriously:
type IO a = State RealWorld a -- actually a newtype
As you might have guessed, State s and IO are functors:
instance Functor (State s) where
    fmap :: (a -> b) -> State a -> State b
    fmap f x s = second f (x s) -- a pure function
instance Functor IO where
    fmap :: (a -> b) -> IO a -> IO b
    fmap f x s = second f (x s) -- no different
I only included the implementations to emphasize that they are ordinary, pure functions and that there is nothing special or magic about how either functor works.

Haskell's getLine function reads a line from standard input and returns the String as that output of type a that I mentioned above. That means its type is:
getLine :: IO String -- i.e. (RealWorld -> (RealWorld, String))
Because it is a functor, I can apply a function to its output (the String), before I even run it and it will now return the transformed value when it's finally executed. So if I have a function named parseInteger that converts Strings to Integers, then I can just write:
parseInteger :: String -> Int

getInteger :: IO Integer -- i.e. (RealWorld -> (RealWorld, Int))
getInteger = fmap parseInteger getLine
Voila! I just created a new function that now reads a line from standard input and returns the parsed Integer. I don't think there is anything equivalent to the generality of fmap in the standard libraries of mainstream languages.

Parser Functor

In Haskell, a function that parses Strings has the type:
type Parser a = String -> [(a, String)] -- actually a newtype
To understand the type, it takes a string and returns a list of possible parsed values along with their corresponding unconsumed Strings. This is a functor, too, which means that if I have something that parses a list of values and I want to convert the list to a tree, I just write:
convertToList :: [a] -> Tree a

parseTree :: Parser [a]

<$> = fmap -- an infix operator synonymous with fmap

convertToList <$> parseTree :: Parser (Tree a)
Convenient! If you squint, the infix version of fmap, <$>, resembles function application. It's almost as if we were applying convertToList directly to parseTree and the <$> makes the type incompatibility just magically go away.


So if you see something resembling f a in a type signature and you guess that f is some sort of functor, chances are you will be right. However, you don't have to guess. You can just fire up ghci and type something like:
> :info Maybe
instance Functor Maybe -- Defined in Data.Maybe
... or if f is a type variable in a function's type signature, it will say something like:
functorsForAll :: (Functor f, ...) => ... -> f a -> ...
Functors are just the tip of iceberg. Haskell programmers reuse classes like crazy and Functor isn't even the most used class (which would arguably be the Monad class). The typical Haskell programmer reuses a few functions like fmap over and over again to great effect, and once you learn those few functions your Haskell productivity will soar. These functions "just work" and always do the "right thing", yet in a completely type-safe manner.

Equally important, Haskell programmers reason more easily about their programs because they reuse the same concepts over and over again. They can rapidly form a mental model of how a function or library works because everything is assembled from the same set of core primitives derived from category theory.

Haskell classes, like C#/Java interfaces, promote generality and code reuse, but they are much more general and reusable than their mainstream counterparts because they have a strong mathematical foundation in category theory. However, Haskell requires no actual knowledge of category theory to use it.