This tutorial explains how polymorphism is implemented under the hood in Haskell using the least technical terms possible.
The simplest example of a polymorphic function is the identity function:
id :: a -> a
id x = x
The identity function works on any type of value. For example, I can apply id
to an Int
or to a String
:
$ ghci
Prelude> id 4
4
Prelude> id "Test"
"Test"
Under the hood, the id
function actually takes two arguments, not one.
-- Under the hood:
id :: forall a . a -> a
id @a x = x
The first argument of id
(the @a
) is the same as the a
in the type signature of id
. The type of the second argument (x
) can refer to the value of the first argument (the @a
).
If you don't believe me, you can prove this yourself by just taking the following module:
module Id where
import Prelude hiding (id)
id :: a -> a
id x = x
... and ask ghc
to output the low-level "core" representation of the above id
function:
$ ghc -ddump-simpl id.hs
[1 of 1] Compiling Id ( id.hs, id.o )
==================== Tidy Core ====================
Result size of Tidy Core = {terms: 4, types: 5, coercions: 0}
Id.id :: forall a_apw. a_apw -> a_apw
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType]
Id.id = \ (@ a_aHC) (x_apx :: a_aHC) -> x_apx
The key part is the last line, which if you clean up looks like this:
id = \(@a) (x :: a) -> x
ghc
prefixes types with @
when using them as function arguments.
In other words, every time we "generalize" a function (i.e. make it more polymorphic), we add a new hidden argument to that function corresponding to the polymorphic type.
Specialization
We can "specialize" id
to a narrower type that is less polymorphic (a.k.a. "monomorphic"):
idString :: String -> String
idString = id
Under the hood, what actually happened was that we applied the id
function to the String
type, like this:
-- Under the hood:
idString :: String -> String
idString = id @String
We can prove this ourselves by taking this module:
module Id where
import Prelude hiding (id)
id :: a -> a
id x = x
idString :: String -> String
idString = id
... and studying the core that this module generates:
$ ghc -ddump-simpl id.hs
[1 of 1] Compiling Id ( id.hs, id.o )
==================== Tidy Core ====================
Result size of Tidy Core = {terms: 6, types: 8, coercions: 0}
Id.id :: forall a_apx. a_apx -> a_apx
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType]
Id.id = \ (@ a_aHL) (x_apy :: a_aHL) -> x_apy
Id.idString :: GHC.Base.String -> GHC.Base.String
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType]
Id.idString = Id.id @ GHC.Base.String
If we clean up the last line, we get:
idString = id @String
In other words, every time we "specialize" a function (i.e. make it less polymorphic), we apply it to a hidden type argument. Again, ghc
prefixes types with @
when using them as values.
So back in the REPL, when we ran code like this:
>>> id 4
4
>>> id "Test"
"Test"
... ghc
was implicitly inserting hidden type arguments for us, like this:
>>> id @Int 4
4
>>> id @String "Test"
Conclusion
That's it! There's really nothing more to it than that.
The general trick for passing around type parameters as ordinary function arguments was first devised as part of System F.
This is the same way that many other languages encode polymorphism (like ML, Idris, Agda, Coq) except that some of them use a more general mechanism. However, the basic principles are the same:
- When you make something more polymorphic you are adding additional type arguments to your function
- When you make something less polymorphic you apply your function to type values
In ghc-8.0
, you will be allowed to explicitly provide type arguments yourself if you prefer. For example, consider the read
function:
read :: Read a => String -> a
If you wanted to specify what type to read
without a type annotation, you could provide the type you desired as an argument to read
:
read @Int :: String -> Int
Here are some other examples:
[] @Int :: [Int]
show @Int :: Int -> String
Previously, the only way to pass a type as a value was to use the following Proxy
type:
data Proxy a = Proxy
... which reified a type as a value. Now you will be able to specialize functions by providing the type argument directly instead of adding a type annotation.
I find it easier to explain Haskell polymorphism to C++/C#/Java developers by showing that it's not very different from generics/templates:
ReplyDeleteid :: a -> a
id x = x
in Java/C# it would be:
A id<A>(A a) {
return a;
}
specialization:
String idString(String a) {
return id<String>(a);
}
id 4
id "Test"
become:
id<Integer>(4)
id<String>("Test")
C++, C# and Java sometimes allows to omit type arguments, allowing to write
id(4)
id("Test")
While you're right that there is a clear correspondence here, the implementations differ significantly. C++, for example, definitely does not add an implicit argument to the function. It actually compiles different versions of the function.
DeleteThis comment has been removed by the author.
DeleteI think you made a mistake with the type signature for read.
ReplyDeleteIn relation to what Elliot Cameron said – will this be optimized by the compiler – or will you be paying a performance penalty for the polymorphism? With C++ templates you don't.
ReplyDeleteAs the types are known by the compiler (at compile time) shouldn't it know which function (`Id@String`, `Id@Integer`...) to call?
By performance penalty I mean: the extra type parameter has passed to the function in a register or on the stack, and afterwards a branch on the type has to be made.
*has to be passed
DeleteLet me clarify that these type values and arguments do not persist to the generated native code. They are only values/arguments in the sense of System F functions, not low-level C-style functions.
DeleteHowever, unnecessary polymorphism **does** deteriorate performance because GHC has to box all polymorphic arguments by default. GHC can only unbox the arguments if the compiler can statically specialize the function.
In general, GHC cannot statically specialize all polymorphic functions due to polymorphic recursion. See this Stack Overflow question and answer which explains this in more detail:
http://stackoverflow.com/questions/10534744/static-types-polymorphism-and-specialization
However, in a large number of cases GHC *will* do static call site specialization of these polymorphic functions. The reason this works is that when GHC compiles a module into object code the compiler also emits a `*.hi` file which contains source code for some values/functions defined within that file. GHC uses that source code to specialize things on demand at the call site if the polymorphic type can be statically determined.
By default GHC will use certain heuristics to determine what source code to export from a module. You can also force the compiler to export the source for a given symbol by using the `INLINE` or `INLINABLE` annotations.
There is also a `SPECIALIZE` annotation which is more like C++ explicit ahead-of-time specialization (i.e. emit code for specialized versions up front), but most people prefer to lazily specialize things on the fly using the `INLINE` or `INLINABLE` annotations.
Cool, thanks!
DeleteYou're welcome!
DeleteIt is only about parametric polymorphism; what about: inclusion,overloading and coercion?
ReplyDeleteThanks for all the great Haskell blogging! A quick question about the explicit specialization in ghc 8.0+. If your function is polymorphic in 2 arguments, e.g.
ReplyDeletef::forall a b . a->b->a->(a,b,a)
f a1 b a2 = (a1,b,a2)
what will determine the order for a specialization?
f @Int @Double :: Int->Double->Int->(Int,Double,Int)
or
f @Int @Double :: Double->Int->Double->(Double,Int,Double)
?
Or will it work some other way?
I think it will be in the same order in which they are universally quantified, but I'm not 100% sure of the details
DeleteThanks for all the great Haskell blogging! A quick question about the explicit specialization in ghc 8.0+. If your function is polymorphic in 2 arguments, e.g.
ReplyDeletef::forall a b . a->b->a->(a,b,a)
f a1 b a2 = (a1,b,a2)
what will determine the order for a specialization?
f @Int @Double :: Int->Double->Int->(Int,Double,Int)
or
f @Int @Double :: Double->Int->Double->(Double,Int,Double)
?
Or will it work some other way?
Bookmarked to read again later - great post, thanks!
ReplyDelete