Friday, October 2, 2015

Polymorphism for dummies

This tutorial explains how polymorphism is implemented under the hood in Haskell using the least technical terms possible.

The simplest example of a polymorphic function is the identity function:

id :: a -> a
id x = x

The identity function works on any type of value. For example, I can apply id to an Int or to a String:

$ ghci
Prelude> id 4
4
Prelude> id "Test"
"Test"

Under the hood, the id function actually takes two arguments, not one.

-- Under the hood:

id :: forall a . a -> a
id @a x = x

The first argument of id (the @a) is the same as the a in the type signature of id. The type of the second argument (x) can refer to the value of the first argument (the @a).

If you don't believe me, you can prove this yourself by just taking the following module:

module Id where

import Prelude hiding (id)

id :: a -> a
id x = x

... and ask ghc to output the low-level "core" representation of the above id function:

$ ghc -ddump-simpl id.hs
[1 of 1] Compiling Id               ( id.hs, id.o )

==================== Tidy Core ====================
Result size of Tidy Core = {terms: 4, types: 5, coercions: 0}

Id.id :: forall a_apw. a_apw -> a_apw
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType]
Id.id = \ (@ a_aHC) (x_apx :: a_aHC) -> x_apx

The key part is the last line, which if you clean up looks like this:

id = \(@a) (x :: a) -> x

ghc prefixes types with @ when using them as function arguments.

In other words, every time we "generalize" a function (i.e. make it more polymorphic), we add a new hidden argument to that function corresponding to the polymorphic type.

Specialization

We can "specialize" id to a narrower type that is less polymorphic (a.k.a. "monomorphic"):

idString :: String -> String
idString = id

Under the hood, what actually happened was that we applied the id function to the String type, like this:

-- Under the hood:

idString :: String -> String
idString = id @String

We can prove this ourselves by taking this module:

module Id where

import Prelude hiding (id)

id :: a -> a
id x = x

idString :: String -> String
idString = id

... and studying the core that this module generates:

$ ghc -ddump-simpl id.hs
[1 of 1] Compiling Id               ( id.hs, id.o )

==================== Tidy Core ====================
Result size of Tidy Core = {terms: 6, types: 8, coercions: 0}

Id.id :: forall a_apx. a_apx -> a_apx
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType]
Id.id = \ (@ a_aHL) (x_apy :: a_aHL) -> x_apy

Id.idString :: GHC.Base.String -> GHC.Base.String
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType]
Id.idString = Id.id @ GHC.Base.String

If we clean up the last line, we get:

idString = id @String

In other words, every time we "specialize" a function (i.e. make it less polymorphic), we apply it to a hidden type argument. Again, ghc prefixes types with @ when using them as values.

So back in the REPL, when we ran code like this:

>>> id 4
4
>>> id "Test"
"Test"

... ghc was implicitly inserting hidden type arguments for us, like this:

>>> id @Int 4
4
>>> id @String "Test"

Conclusion

That's it! There's really nothing more to it than that.

The general trick for passing around type parameters as ordinary function arguments was first devised as part of System F.

This is the same way that many other languages encode polymorphism (like ML, Idris, Agda, Coq) except that some of them use a more general mechanism. However, the basic principles are the same:

  • When you make something more polymorphic you are adding additional type arguments to your function
  • When you make something less polymorphic you apply your function to type values

In ghc-8.0, you will be allowed to explicitly provide type arguments yourself if you prefer. For example, consider the read function:

read :: Read a => String -> a

If you wanted to specify what type to read without a type annotation, you could provide the type you desired as an argument to read:

read @Int :: String -> Int

Here are some other examples:

[] @Int :: [Int]

show @Int :: Int -> String

Previously, the only way to pass a type as a value was to use the following Proxy type:

data Proxy a = Proxy

... which reified a type as a value. Now you will be able to specialize functions by providing the type argument directly instead of adding a type annotation.

14 comments:

  1. I find it easier to explain Haskell polymorphism to C++/C#/Java developers by showing that it's not very different from generics/templates:

        id :: a -> a
        id x = x
    in Java/C# it would be:
        A id<A>(A a) {
             return a;
        }

    specialization:
        String idString(String a) {
             return id<String>(a);
        }

        id 4
        id "Test"
    become:
        id<Integer>(4)
        id<String>("Test")

    C++, C# and Java sometimes allows to omit type arguments, allowing to write
        id(4)
        id("Test")

    ReplyDelete
    Replies
    1. While you're right that there is a clear correspondence here, the implementations differ significantly. C++, for example, definitely does not add an implicit argument to the function. It actually compiles different versions of the function.

      Delete
    2. This comment has been removed by the author.

      Delete
  2. I think you made a mistake with the type signature for read.

    ReplyDelete
  3. In relation to what Elliot Cameron said – will this be optimized by the compiler – or will you be paying a performance penalty for the polymorphism? With C++ templates you don't.
    As the types are known by the compiler (at compile time) shouldn't it know which function (`Id@String`, `Id@Integer`...) to call?

    By performance penalty I mean: the extra type parameter has passed to the function in a register or on the stack, and afterwards a branch on the type has to be made.

    ReplyDelete
    Replies
    1. Let me clarify that these type values and arguments do not persist to the generated native code. They are only values/arguments in the sense of System F functions, not low-level C-style functions.

      However, unnecessary polymorphism **does** deteriorate performance because GHC has to box all polymorphic arguments by default. GHC can only unbox the arguments if the compiler can statically specialize the function.

      In general, GHC cannot statically specialize all polymorphic functions due to polymorphic recursion. See this Stack Overflow question and answer which explains this in more detail:

      http://stackoverflow.com/questions/10534744/static-types-polymorphism-and-specialization

      However, in a large number of cases GHC *will* do static call site specialization of these polymorphic functions. The reason this works is that when GHC compiles a module into object code the compiler also emits a `*.hi` file which contains source code for some values/functions defined within that file. GHC uses that source code to specialize things on demand at the call site if the polymorphic type can be statically determined.

      By default GHC will use certain heuristics to determine what source code to export from a module. You can also force the compiler to export the source for a given symbol by using the `INLINE` or `INLINABLE` annotations.

      There is also a `SPECIALIZE` annotation which is more like C++ explicit ahead-of-time specialization (i.e. emit code for specialized versions up front), but most people prefer to lazily specialize things on the fly using the `INLINE` or `INLINABLE` annotations.

      Delete
  4. It is only about parametric polymorphism; what about: inclusion,overloading and coercion?

    ReplyDelete
  5. Thanks for all the great Haskell blogging! A quick question about the explicit specialization in ghc 8.0+. If your function is polymorphic in 2 arguments, e.g.
    f::forall a b . a->b->a->(a,b,a)
    f a1 b a2 = (a1,b,a2)

    what will determine the order for a specialization?

    f @Int @Double :: Int->Double->Int->(Int,Double,Int)

    or

    f @Int @Double :: Double->Int->Double->(Double,Int,Double)

    ?

    Or will it work some other way?

    ReplyDelete
    Replies
    1. I think it will be in the same order in which they are universally quantified, but I'm not 100% sure of the details

      Delete
  6. Thanks for all the great Haskell blogging! A quick question about the explicit specialization in ghc 8.0+. If your function is polymorphic in 2 arguments, e.g.
    f::forall a b . a->b->a->(a,b,a)
    f a1 b a2 = (a1,b,a2)

    what will determine the order for a specialization?

    f @Int @Double :: Int->Double->Int->(Int,Double,Int)

    or

    f @Int @Double :: Double->Int->Double->(Double,Int,Double)

    ?

    Or will it work some other way?

    ReplyDelete
  7. Bookmarked to read again later - great post, thanks!

    ReplyDelete