Wednesday, April 10, 2013

Defaults

Many programs require default values of some sort and normally we would consider this aspect of programming "business logic" and not give it a second thought. However, mathematics provides some some surprising insight into the banal task of choosing appropriate defaults, so let's temporarily put on our theory hats and try to over-think this problem.


Patterns


Let's begin by trying to identify a common pattern unifying all default values. If I were to name a few types, most people would agree on the following default values:
  • Int: 0
  • Bool: False
  • [a]: [], the empty list
  • Text: "", the empty Text string (if you use OverloadedStrings)
Why not choose 5 as the default Int or "ridiculous" as the default Text string? What makes us gravitate towards choosing those particular values?

Well, we can discern a common pattern: all these default values seem to correspond to something "empty". But what does it really mean to be "empty"?

Well, for numbers it is obvious why we consider 0 empty. If you add 0 to any number n, you get back n, signifying that 0 must be empty:
0 + n = n
n + 0 = n
We can extend this same reasoning to the other values to justify why we consider them empty. For example, if you append "" to any Text string str, you get back str, signifying that "" added nothing:
"" `append` str = str
str `append` "" = str
Similarly, the empty list adds nothing when you concatenate it:
[] ++ xs = xs
xs ++ [] = xs
... and False does nothing when you (||) it:
False || b = b
b || False = b
The pattern is obvious: in every case we have some empty default value, which we will call mempty, and some combining function, which we will call mappend, that satisfy the following two equations:
mempty `mappend` x = x
x `mappend` mempty = x
This is a Monoid, and Haskell's Data.Monoid module defines mempty and mappend in the Monoid type class:
class Monoid m where
    mempty  :: m
    mappend :: m -> m -> m
... and also provides a convenient infix operator for mappend:
(<>) :: (Monoid a) => a -> a -> a
m1 <> m2 = m1 `mappend` m2

Emptiness


Not all types have a unique Monoid instance. For example, Bool has two separate Monoid instances and we must use the Any or All newtypes from Data.Monoid to distinguish which one we prefer.

The Any monoid corresponds to the one we chose above, where False is the empty value and (||) is the combining operation:
newtype Any = Any { getAny :: Bool }

instance Monoid Any where
    mempty = Any False
    (Any b1) `mappend` (Any b2) = Any (b1 || b2)
However, there is a dual monoid where True is the "empty" value and (&&) is the combining operation:
newtype All = All { getAll :: Bool }

instance Monoid All where
    mempty = All True
    (All b1) `mappend` (All b2) = All (b1 && b2)
Similarly, numbers have two separate Monoid instances, and we use the Sum or Product monoids to distinguish which one we prefer.

newtype Sum a = Sum { getSum :: a }

instance (Num a) => Monoid (Sum a) where
    mempty = Sum 0
    (Sum n1) `mappend` (Sum n2) = Sum (n1 + n2)


newtype Product a = Product { getProduct :: a }

instance (Num a) => Monoid (Sum a) where
    mempty = Product 1
    (Product n1) `mappend` (Product n2) = Product (n1 * n2)
Monoids teach a valuable lesson: there is no such thing as an intrinsically empty value. Values are only empty with respect to a specific combining operation. We can choose more exotic default values to be mempty, but if we must select an unusual mappend to justify their emptiness then that suggests that we chose the wrong defaults.

So the next time you find yourself needing a Default type class, chances are that you actually should use the Monoid type class instead. The Monoid class forces us to demonstrate why our default is truly empty by also providing the associated combining operator. This basic sanity check discourages us from defining "ridiculous" defaults.

18 comments:

  1. Replies
    1. Yeah. :)

      Surprisingly useful for such a simple concept, especially once you start generalizing it to Alternatives and Categories.

      Delete
  2. FYI readers: there's actually a Default type class [1] which uses mempty for most of the stuff.

    [1] http://hackage.haskell.org/packages/archive/data-default/0.5.3/doc/html/Data-Default.html

    ReplyDelete
  3. The mention of "business logic" at the beginning of this blog entry led me to believe it would address the sort of "default" one sees in typical "business logic", e.g. a default preference such as a blank screen saver. But the clear context (in the post, once I actually read it, as well as in Monoid itself), is of a value that causes a binary operation to simply yield its other argument, a "no effect" element, which of course describes "mempty".

    The Default typeclass itself requires no such context of a binary operator, yet all the -- ahem -- default instances provided dutifully supply either numeric zero or mempty, appearing to demonstrate that really, we're again intending the "noEffect" mathematical kind of default, not default screen savers. (And why would you specify a default screen saver using a typeclass instance anyhow, LOL?) Seems to me a strong case for using Monoid instead of Default (which as I understand it, is Gabriel's point).

    Maybe a nicer name for Monoid's identity element would have been "noEffect". Of course, when we get to Group, it takes on a somewhat expanded role; it's now also the result of combining an element with its inverse. What's the best name for that role -- "ash", "detritus"...? What one name covers both roles? Maybe "nothing"? Oops, taken. Maybe not. How about just "mappend"? Sigh.

    ReplyDelete
    Replies
    1. One of the reason that the `Default` instances tend to provide empty values anyway, even though they are not required to, is because they have to pick something that is universally useful since the type class instance has to be canonical for that type. If you have to pick only one value then the most obvious choice is the a value which is more distinguished than the rest, which is the "empty" value.

      I think "empty" would be a fine name for the empty value (if it weren't already taken by the Alternative class)

      Delete
  4. That last "mappend" should have been "mempty".

    ReplyDelete
  5. I sympathize with the main teaching of this post. But i consider it not totally true. The mistake is the assumption that the default value should be empty somehow. That is false sometimes. An example is type-classes. Defining an instance of a class C for type T is mathematically equivalent to defining a default value for the tuple type (T, dictionary_type_of C). For example : defining a Foldable instance is equivalent to specifying default value of ((a -> b -> b) -> b -> t a -> b) [a 'foldr' function]. I can not think about foldr for List as empty somehow.

    ReplyDelete
    Replies
    1. I'm not saying that all type classes should behave like `Monoid`, but rather just the `Default` class

      However, even if you broaden the scope of the discussion to classes other than `Monoid` I still disagree. Without laws you can't justify one instance over another. For example, I could define the following alternative `Foldable` instance for lists:

      ....instance Foldable [] where
      ........foldr cons nil [] = nil
      ........foldr cons nil (x:_) = cons x nil

      ... and without any laws how could I argue for one instance over another?

      Delete
    2. I made a mistake.
      Instead of
      "for the tuple type (T, dictionary_type_of C)"
      i should have written
      "of type (dictionary_type_of C T)"

      Delete
    3. >>> " Without laws you can't justify one instance over another." <<<
      ---
      Laws do not help me in this justification. A type has multiple values probably because all are needed and different values are right in different situations. A default is not right in some theoretical way, but rather just a value that is assumed most of the times [even just by convention].
      A good example on the law-less side is (Foldable List). It is a problem-less instance, because probably everyone would implement it as it is in package "base".
      A good examples on the law-ed side is (Monoid Bool). I not only can not justify one possible instance over the other but i can not even pick one for default.
      So it is not the presence of laws that make a value good default of a type, or an instance implementation good choice for a (type-class, type) pair.

      Delete
    4. The problem isn't the number of implementations, but rather the lack of a specification. The reason laws matter is that they serve as a specification for a class.

      Without a specification a type class instance is "not even wrong". You can't discern whether or not the instance satisfies the intended purpose because you never precisely stated the intended purpose for the class in the form of laws.

      Yes, there are two `Monoid` instances for `Bool`, but that's fine because the specification says that any implementation is okay as long as `mempty` is empty with respect to `mappend`. However, there is no specification for the `Default` class and therefore no acceptance/rejection criteria for instances.

      Delete
    5. Forgive me if I'm mistaken, but I believe libeako's point was a bit different. The class `Traversable` can be imagined as:

      data TraversableD t = TD
      { traverse :: forall f a b. Applicative f => (a -> f b) -> t a -> f (t b)
      , ...
      , proofOfLaws :: ....}

      An instance of Traversable can then be thought of as an instance of Default:

      instance Default (TraversableD []) where
      def = T {...}

      There's nothing really special about the left to right traversal order.

      Delete
  6. What about types that behave like state machines? To clarify: I am talking about any datastructure whose internals are potentially hidden, and whose external interface provides (a) an 'initial' or 'new' function and (b) functions that take the structure as input and return an updated structure as output.

    In some cases, the 'initial' function will have some parameters but in many cases it will not. It might make sense to standardize the naming of this initial function by using a typeclass.

    I feel like this has not been addressed in your post. What are your thoughts on this matter?

    ReplyDelete
    Replies
    1. I think the point from the post still holds in that scenario. For example, if the state machine's internal state is an Int, then how would you know whether to select 0 or 1 as the initial state?

      Delete
    2. Thank you for your response!
      I'm not entirely convinced yet.
      Say we have a state machine called `Counter`. I'd represent its state as e.g. `newtype Counter = Counter {getCount :: Int}`, with `initial` being defined as `Counter 0`.

      There is no dispatching to the `Default` instance for `Int` (which, as you pointed out, would be ambiguous because we could pick the additive identity, multiplicative identity, etc) but this is not needed for the abstraction of a `Default` typeclass to be useful.

      Semigroup provides `<>` (i.e. `mappend`) for Monoid. But why is there no typeclass for structures that only provide the other half (i.e. only `mempty`)? This would allow functions like `fromMaybeDefault :: Default a => Maybe a -> a` to be written.

      I believe there are more than enough types for which a single well-defined identity value exists even though there is no associated binary operation. I also believe that there types for which multiple arbitrary identity values are possible but it does not matter which one we pick specifically.

      What do you think?

      Delete
    3. This StackOverflow question and answer summarize my view on this: https://stackoverflow.com/a/17100055

      For the example you cite, you could supply the starting state as an ordinary function argument and you don't benefit from the alternative approach using a Default typeclass. Quite the opposite: the global coherence constraint that typeclasses require works against you because if somebody supplies the wrong `Default` instance for a type you are out of luck.

      Delete