Saturday, August 18, 2012

The category design pattern

Functional programming is all the rage these days, but in this post I want to emphasize that functional programming is a subset of a more important overarching programming paradigm: compositional programming.

If you've ever used Unix pipes, you'll understand the importance and flexibility of composing small reusable programs to get powerful and emergent behaviors. Similarly, if you program functionally, you'll know how cool it is to compose a bunch of small reusable functions into a fully featured program.

Category theory codifies this compositional style into a design pattern, the category. Moreover, category theory gives us a precise prescription for how to create our own abstractions that follow this design pattern: the category laws. These laws differentiate category theory from other design patterns by providing rigorous criteria for what does and does not qualify as compositional.

One could easily dismiss this compositional ideal as just that: an ideal, something unsuitable for "real-world" scenarios. However, the theory behind category theory provides the meat that shows that this compositional ideal appears everywhere and can rise to the challenge of messy problems and complex business logic.

This post is just one of many posts that I will write over time where I will demonstrate how to practically use this compositional style in your programs, even for things that may seem like they couldn't possibly lend themselves to compositional programming. This first post starts off by introducing the category as a compositional design pattern.


Categories


I'm going to give a slightly different introduction to category theory than most people give. I'm going to gloss over the definition of what a morphism or an object is and skip over domains and codomains and instead just go straight to composition, because from a programmer's point of view a category is just a compositional design pattern.

Category theory says that for any given category there must be some sort of composition operator, which I will denote (.). The first rule is that this composition operator is associative:
(f . g) . h = f . (g . h) -- Associativity law
This is useful because it means we can completely ignore the order of grouping and write it without any parentheses at all:
f . g . h
Category theory also says that this composition operator must have a left and right identity, which I will denote id. Being an identity means that:
id . f = f  -- Left  identity law

f . id = f  -- Right identity law
The associativity law and the two identity laws are known as the category laws.

Notice that the definition of a category does not define:
  • what (.) is,
  • what id is, or
  • what f, g, and h might be.
Instead, category theory leaves it up to us to discover what they might be.

The brilliance behind the category design pattern is that any composition operator that observes these laws will be:
  • easy to use,
  • intuitive,
  • and free from edge cases.
This is why we try to formulate abstractions in terms of the category design pattern whenever possible.


The function category


Let's define our first category: the category of Haskell functions!
id  :: (a -> a)
id x = x

(.) :: (b -> c) -> (a -> b) -> (a -> c)
(f . g) x = f (g x)
Let's prove to ourselves that these obey the category laws:
-- Left identity: id . f = f
id . f
= \x -> id (f x)
= \x -> f x
= f

-- Right identity: f . id = f
f . id
= \x -> f (id x)
= \x -> f x
= f

-- Associativity: (f . g) . h = f . (g . h)
(f . g) . h
= \x -> (f . g) (h x)
= \x -> f (g (h x))
= \x -> f ((g . h) x)
= \x -> (f . (g . h)) x
= f . (g . h)
Function composition is very easy to use, yet so powerful, precisely because it forms a category! This lets us express complex transformations simply by composing a bunch of reusable parts:
bestLangs :: [Language] -> [Language]
bestLangs = take 3 . sortBy (comparing speed) . filter isCool
Unfortunately, we can't express all of our programs as chains of ordinary functions. I guess we just give up, right? Wrong!


The Kleisli category


The next most common category we encounter on a daily basis is the category of monadic functions, which generalize ordinary functions:
return :: (Monad m) => (a -> m a)

(<=<)  :: (Monad m) => (b -> m c) -> (a -> m b) -> (a -> m c)
Mathematicians call this the "Kleisli" category, and Control.Monad provides both of the above functions.

Notice how the type signatures of return and (<=<) resemble their functional counterparts:
id     ::              (a ->   a)
return :: (Monad m) => (a -> m a)

(.)    ::              (b ->   c) -> (a ->   b) -> (a ->   c)
(<=<)  :: (Monad m) => (b -> m c) -> (a -> m b) -> (a -> m c)
The implementation for (<=<) also resembles the implementation for function composition:
(f  .  g) x = f     (g x)
(f <=< g) x = f =<< (g x)

-- Note (=<<) is the same as (>>=), but with the arguments flipped
Not a coincidence! Monadic functions just generalize ordinary functions and the Kleisli category demonstrates that monadic functions are composable, too. They just use a different composition operator: (<=<), and a different identity: return.

Well, let's assume that category theorists aren't bullshitting us and that (<=<) really is some sort of composition and return really is its identity. If that were true, we'd expect the following laws to hold:
return <=< f = f                   -- Left  identity

f <=< return = f                   -- Right identity

(f <=< g) <=< h = f <=< (g <=< h)  -- Associativity
Well, we already have the definition for (<=<):
(f <=< g) x = f =<< (g x)
... so let's use that definition to expand those laws:
return =<< (f x) = (f x)

f =<< (return x) = f x

(\y -> f =<< (g y)) =<< h x = f =<< (g =<< (h x))
If we simplify those a little and use (>>=) to flip the order of arguments, we get:
m >>= return = m

return x >>= f = f x

m >>= (\y -> g y >>= f) = (m >>= g) >>= f
Look familiar? Those are just the monad laws, which all Monad instances are required to satisfy. If you have ever wondered where those monad laws came from, now you know! They are just the category laws in disguise.

Consequently, every new Monad we define gives us a category for free! Let's try out some of these brave new categories:
-- The Kleisli category for the Maybe monad
lookup  :: k -> [(k, v)] -> Maybe v
maximumByMay :: (a -> a -> Ordering) -> [a] -> Maybe a

bestSuitor :: [(String, [Suitor])] -> Maybe Suitor
bestSuitor = maximumByMay (comparing handsome) <=< lookup "Tall"


-- The Kleisli category for the [] monad
children :: Person -> [Person]

greatGrandChildren :: Person -> [Person]
greatGrandChildren = children <=< children <=< children


-- The Kleisli category for the IO monad
-- * Stolen from /r/haskell today
spawn      ::  IO a  -> IO (IO a)
mapM spawn :: [IO a] -> IO [IO a]
sequence   :: [IO a] -> IO    [a]

concurrentSequence :: [IO a] -> IO [a]
concurrentSequence = sequence <=< mapM spawn
Monads that don't observe these laws are buggy and unintuitive. Don't believe me? Just ask the people who tried to use ListT , which breaks the monad laws.


The Pipe category


Not all categories are functions. I'll use a primitive version of my Pipe type (from the pipes package) with effects removed to simplify the example:
data Pipe a b r
  = Pure r
  | Await (a -> Pipe a b r)
  | Yield b (Pipe a b r)

Pure    r  <-< _          = Pure r
Yield b p1 <-< p2         = Yield b (p1 <-< p2)
Await   f  <-< Yield b p2 = f b <-< p2
p1         <-< Await   f  = Await $ \a -> p1 <-< f a
_          <-< Pure    r  = Pure r

cat = Await $ \a -> Yield a cat
Let's check out what types the compiler infers:
cat   :: Pipe a a r

(<-<) :: Pipe b c r -> Pipe a b r -> Pipe a c r
Those look an awful lot like an identity and composition. I leave it as an exercise for the reader to prove that they actually do form a category:
cat <-< p = p                            -- Right identity

p <-< cat = p                            -- Left  identity

(p1 <-< p2) <-< p3 = p1 <-< (p2 <-< p3)  -- Associativity
Pipes show how more complicated things that don't fit neatly into the functional programming paradigm can still be achieved with a compositional programming style. I won't belabor the compositionality of pipes, though, since my tutorial already does that.

So if you find something that doesn't seem like it could be compositional, don't give up! Chances are that a compositional solution exists just beneath the surface!


Conclusions


All category theory says is that composition is the best design pattern, but then leaves it up to you to define what precisely composition is. It's up to you to discover new and interesting ways to compose things besides just composing functions. As long as the composition operator you define obeys the category laws, you're golden.

Also, I'm really glad to see a resurgence in functional programming (since functions form a category), but in the long run we really need to think about more interesting composition operators than just function composition if we are serious about tackling more complicated problem domains.

Hopefully this post gets you a little bit excited about category theory. In future posts, I will expand upon this post with the following topics:
  • Why the category laws ensure that code is easy, intuitive, and free of edge cases
  • How functors let you mix and match different categories
  • How to use categories to optimize your code
  • How to use categories to simplify equational reasoning

20 comments:

  1. " I leave it as an exercise for the reader to prove that they actually do form a category"

    OK, I'll bite. But let's try a special case of the identity laws, i.e. showing that idP <+< idP = idP:

    idP <+< idP
    = Await (\a -> Yield a idP) <+< Await (\a' -> Yield a' idP)
    = Await (\x -> idP <+< (\a' -> Yield a' idP) x)
    = Await (\x -> idP <+< Yield x idP)
    = Await (\x -> Await (\a -> Yield a idP) <+< Yield x idP)
    = Await (\x -> (\a -> Yield a idP) x <+< idP)
    = Await (\x -> Yield x idP <+< idP)
    = Await (\x -> Yield x (idP <+< idP))

    Now, if I had (idP <+< idP) = idP on the inside, this would reduce to
    = Await (\x -> Yield x idP)
    = idP,

    but that would just be assuming what I want to prove.

    More generally, when I try to prove the category laws for Pipe a b r, I run into cases where if I could prove the law for a sub-expression, I could prove it for the whole expression, which makes me want to do induction; but presumably I also want to be able to prove results for circular, self-recursive pipes like idP in which there is no guarantee that the sub-expressions are 'smaller' than the original expression, so I don't know what to do.

    I figure this must have something to do with domain theory and least fixed points and supremums and all that, but I haven't worked all the way through that material so I'm not sure how the proof would look...

    ReplyDelete
    Replies
    1. I believe the correct term for this style of proof is bisimulation. Rather than use induction to show that the type is finite and use induction, you instead use coinduction to show that the generation of each subsequent constructor is finite and never bottoms. I believe the only example where composition fails to generate a constructor in a finite number of steps is:

      discard = Await $ \_ -> discard
      supply = Yield () supply

      discard <+< supply

      ... and that is precisely the only situation where pipe composition bottoms and fails to produce a result.

      However, I never intended this to be that complicated. It was merely an exercise so that users could practice equationally reasoning about laws. I assumed most of them would accept that the recursion in the proof was legitimate even if perhaps they could not precisely articulate the reason why.

      Delete
    2. You don't even have to assume what you want to prove! You have the following:

      idP <+< idP = Await (\x -> Yield x (idP <+< idP))

      let f = idP <+< idP and do the substitution:

      f = Await (\x -> Yield x f)

      this is merely the definition of idP under a simple substitution. Voila, you have your proof.

      Delete
    3. Hi Saynte,
      Thanks! I think you are right, but I was wondering if I can always conclude from

      f = G f
      and g = G g

      that f = g. It's certainly not always true -- for instance
      if I make the _definition_

      foo :: a
      foo = id foo

      then I have defined foo to be bottom; but I can't conclude
      from the fact that

      7 = id 7

      that 7 = foo.

      Delete
    4. This comment has been removed by the author.

      Delete
    5. Well, remember that here you are *defining* (for example):

      foo = id foo

      that's what foo *means*. You cannot say that 7 = id 7 because 7 *means* something else: it has another definition. The previous was fine because I was picking a fresh name (for example "f") that had no previous definition.

      The substitution was actually just to show you that your proof worked because your equality had the right 'structure', not because you already knew that "idP <+< idP = idP".

      I hope that helps!

      Delete
    6. OK, let me see if I've got this straight.
      From the *definition*

      foo = id foo

      I can conclude that foo is the *least* (well-defined) fixed point of id, i.e. bottom.

      From the mere (easy to prove) equality

      7 = id 7

      I can only conclude that 7 is *some* fixed point of id, not that 7 = foo.

      Similarly, from the definition

      idP = Await (\a -> Yield a idP)

      I can conclude that idP is the least well-defined fixed point of the function G = \f -> Await (\a -> Yield a f),
      whereas from the mere equality (not definition)

      idP <+< idP = Await (\a -> Yield a (idP <+< idP))

      I can only conclude that idP <+< idP is *some* fixed point of G, not (without more) that idP <+< idP = idP.

      The missing piece might be that there is only one fixed point of G. That seems plausible to me, although right now I'm not sure how to prove it. It might be related to Gabriel's comment about bisimulation and coinduction.

      Delete
    7. Sorry, I made a mistake (read your reply too quickly), of course you can say 7 = id 7 because this is just 7 = 7.

      Well, at least from my way of thinking, you have two equalities:

      idP = Await (\a -> Yield a idP)

      and

      idP <+< idP = Await (\a -> Yield a (idP <+< idP))

      These are sort of "clearly" the same under a renaming. It may be true that you need deeper aspects of coinduction to prove it, but I'm not a good enough theorist to know :).

      Delete
  2. I hate to be the snooty mathematician who corrects terminology, but here goes.

    When you say something is the "category of X" you really mean the objects in that category are X and you have to define the morphisms separately. For instance, your first category of Haskell functions is actually (probably) the category of Haskell types, and you defined the morphisms to be Haskell functions. If your category was really the category of Haskell functions, then your morphisms would need to be morphisms of Haskell functions.

    Of course, with the right tinkering you can blur the distinction between an object and the identity morphism on that object, but then the objects are still not *all* morphisms.

    Another thing to note is that the Kleisli category is defined per monad. So it's not the category of all monadic functions across all possible monads.

    I like the explanations though! I need to get back into Haskell myself.

    ReplyDelete
    Replies
    1. The correction is appreciated. If I didn't enjoy being corrected I wouldn't be programming in Haskell in the first place!

      I suppose there needs to be a better word for describing the "the category whose morphisms are X" because I agree that the way I phrased it bastardizes the terminology. From my perspective, what the programmer sees and works with are the morphisms, which is why I try to present categories in terms of the morphisms. Obviously the objects in the category affect which morphisms are compatible for composition, and I see that as being the second step of the presentation, not the first. I have to lead with the motivation (i.e. composition) otherwise it will read like every other category theory introduction which presents a bunch of rules and formalisms with no discussion of why we even need category theory in the first place.

      I tried to make it clear that each monad generates its own Kleisli category, but in retrospect I see that I described it the wrong way. I need to think about how to word it better so that the description still flows.

      Delete
    2. You don't really need to apologize. Saying the "category of X" where X is the objects is a trope. (Consider the usual description of a monoid as a category: how many monoids can be defined on a given set? All of them, and you don't even need any elements in the set as long as you don't ask that the arrows be functions!) OTOH, if you have the morphisms, then (via the source and target maps of the identities) you have the objects.

      As MacLane Saunders points out, it takes courage to name your categories by the common names of their arrows, rather than by the common names of their objects. I honor your bravery!

      Your point about "what the programmer works with" is well-taken, except that one reason why so many programmers don't get functional programming (at least at first) is because they want to work with objects. What you could say here is something like "the morphisms are what you work with if you want to achieve elegant, compact programs expressing complex relationships."

      Delete
    3. oops, speaking of noncommutativity, that's "Saunders MacLane", of course!

      Delete
    4. Thanks for the kind words! I've actually been learning from Awodey's book since I don't have a very strong mathematical background for MacLane's book, yet, but I plan to work through it eventually.

      I was plan on doing a follow-up post discussing objects (mainly in the context of Haskell types). Even in my own code with long composition chains, I often annotate the type of each intermediate step as a comment to help keep track of what is going on.

      Delete
    5. So, what *are* the objects for the Pipe category? Are they Haskell types, just like in Hask? Are the objects on either side of the Pipe morphism represented by 'a' and 'b' in "Pipe a b r"? and if so, what does 'r' represent? If not, how far off the mark am I, exactly? :)

      Delete
    6. You are correct. The a and b are the objects and they are Haskell types. So, interestingly, when you generalize pipes to proxies then you get two extra categories (see the Interact class in Control.Proxy.Class) and they do make use of the r parameter.

      Delete
  3. Hi Gabriel, nice post.

    I was planning to write something on the same topic, but in french, sometime in the near future. Haskell documentation is lacking in this dialect, and I hope that adding more content will create more opportunities for people to get into the purely functional way of programming.

    If you agree, I could also translate this good article, and credit it clearly of course. What do you think about that ?

    ReplyDelete
    Replies
    1. That sounds great! I really appreciate it.

      Delete
    2. done at http://blog.demotera.com/published/2012-08-22-Programmation-Fonctionnelle-et-Theorie-des-Categories.html

      Delete
    3. Thank you very much! I understand enough French to know you did an excellent job of translation.

      Delete
  4. By the way, talking about "order of grouping" is a little awkward (composition is generally not commutative). I think if you rephrase to something like "nesting of groups" the flow will be more natural for the reader.

    ReplyDelete