Thursday, November 14, 2024

The Haskell inlining and specialization FAQ

This post is an FAQ answering the most common questions people ask me related to inlining and specialization. I’ve also structured it as a blog post that you can read from top to bottom.

What is inlining?

“Inlining” means a compiler substituting a function call or a variable with its definition when compiling code. A really simple example of inlining is if you write code like this:

module Example where

x :: Int
x = 5

y :: Int
y = x + 1

… then at compile time the Haskell compiler can (and will) substitute the occurrence of x in y with its definition (i.e. 5):

y :: Int
y = 5 + 1

… which then allows the compiler to further simplify the code to:

y :: Int
y = 6

In fact, we can verify that for ourselves by having the compiler dump its intermediate “core” representation like this:

$ ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all Example.hs

… which will produce this output:

==================== Tidy Core ====================
Result size of Tidy Core
  = {terms: 20, types: 7, coercions: 0, joins: 0/0}

x = I# 5#

$trModule4 = "main"#

$trModule3 = TrNameS $trModule4

$trModule2 = "Example"#

$trModule1 = TrNameS $trModule2

$trModule = Module $trModule3 $trModule1

y = I# 6#

… which we can squint at a little bit and read as:

x = 5

y = 6

… and ignore the other stuff.

A slightly more interesting example of inlining is a function call, like this one:

f :: Int -> Int
f x = x + 1

y :: Int
y = f 5

The compiler will be smart enough to inline f by replacing f 5 with 5 + 1 (here x is 5):

y :: Int
y = 5 + 1

… and just like before the compiler will simplify that further to y = 6, which we can verify from the core output:

y = I# 6#

What is specialization?

“Specialization” means replacing a “polymorphic” function with a “monomorphic” function. A “polymorphic” function is a function whose type has a type variable, like this one:

-- Here `f` is our type variable
example :: Functor f => f Int -> f Int
example = fmap (+ 1)

… and a “monomorphic” version of the same function replaces the type variable with a specific (concrete) type or type constructor:

example2 :: Maybe Int -> Maybe Int
example2 = fmap (+ 1)

Notice that example and example2 are defined in the same way, but they are not exactly the same function:

  • example is more flexible and works on strictly more type constructors

    example works on any type constructor f that implements Functor, whereas example2 only works on the Maybe type constructor (which implements Functor).

  • example and example2 compile to very different core representations

In fact, they don’t even have the same “shape” as far as GHC’s core representation is concerned. Under the hood, the example function takes two extra “hidden” function arguments compared to example2, which we can see if we dump the core output (and I’ve tidied up the output a lot for clarity):

example @f $Functor = fmap $Functor (\v -> v + 1)

example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)

The two extra function arguments are:

  • @f: This represents the type variable f

    Yes, the type variable that shows up in the type signature also shows up at the term level in the GHC core representation. If you want to learn more about this you might be interested in my Polymorphism for Dummies post.

  • $Functor: This represents the Functor instance for f

    Yes, the Functor instance for a type like f is actually a first-class value passed around within the GHC core representation. If you want to learn more about this you might be interested in my Scrap your Typeclasses post.
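
In fact, you can supply the first of those hidden arguments yourself in ordinary Haskell code by enabling the TypeApplications extension (the instance argument has no source-level syntax; the compiler always fills that one in for you). Here is a small sketch reusing the example function from above (the name exampleAtMaybe is just for illustration):

{-# LANGUAGE TypeApplications #-}

-- Explicitly supplying the type argument that is normally hidden.
-- This mirrors the @f argument in the core representation; the
-- $Functor argument is still solved by the compiler.
exampleAtMaybe :: Maybe Int -> Maybe Int
exampleAtMaybe = example @Maybe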

Notice how the compiler cannot optimize example as well as it can optimize example2 because the compiler doesn’t (yet) know which type constructor f we’re going to call example on and also doesn’t (yet) know which Functor f instance we’re going to use. However, once the compiler does know which type constructor we’re using it can optimize a lot more.

In fact, we can see this for ourselves by changing our code a little bit to simply define example2 in terms of example:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)

example2 :: Maybe Int -> Maybe Int
example2 = example

This compiles to the exact same code as before (you can check for yourself if you don’t believe me).

Here we would say that example2 is “example specialized to the Maybe type constructor”. When we write something like this:

example2 :: Maybe Int -> Maybe Int
example2 = example

… what’s actually happening under the hood is that the compiler is doing something like this:

example2 = example @Maybe $FunctorMaybe

In other words, the compiler is taking the more general example function (which works on any type constructor f and any Functor f instance) and then “applying” it to a specific type constructor (@Maybe) and the corresponding Functor instance ($FunctorMaybe).

In fact, we can see this for ourselves if we generate core output with optimization disabled (-O0 instead of -O2) and if we remove the -dsuppress-all flag:

$ ghc -O0 -fforce-recomp -ddump-simpl Example.hs

This outputs (among other things):

…

example2 = example @Maybe $FunctorMaybe
…

And when we enable optimizations (with -O2):

$ ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all Example.hs

… then GHC inlines the definition of example and simplifies things further, which is how it generates this much more optimized core representation for example2:

example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)

In fact, specialization is essentially the same thing as inlining under the hood (I’m oversimplifying a bit, but they are morally the same mechanism). The main distinction between inlining and specialization is:

  • specialization simplifies function calls with “type-level” arguments

    By “type-level” arguments I mean (hidden) function arguments that are types, type constructors, and type class instances

  • inlining simplifies function calls with “term-level” arguments

    By “term-level” arguments I mean the “ordinary” (visible) function arguments you know and love

Does GHC always inline or specialize code?

NO. GHC does not always inline or specialize code, for two main reasons:

  • Inlining is not always an optimization

    Inlining can sometimes make code slower. In particular, it can often be better to not inline a function with a large implementation, because keeping a single shared copy of the code makes better use of the CPU’s instruction cache.

  • Inlining a function requires access to the function’s source code

    In particular, if the function is defined in a different module from where the function is used (a.k.a. the “call site”) then the call site does not necessarily have access to the function’s source code.
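
For example, consider this hypothetical pair of modules (the module and function names are made up for illustration):

-- Upstream.hs
module Upstream where

increment :: Int -> Int
increment x = x + 1

-- Downstream.hs
module Downstream where

import Upstream (increment)

two :: Int
two = increment 1

Whether GHC can rewrite increment 1 to 2 while compiling Downstream depends on whether the source code for increment is available at that point.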

To expand on the latter point, Haskell modules are compiled separately (in other words, each module is a separate “compilation unit”), and the compiler generates two outputs when compiling a module:

  • a .o file containing object code (e.g. Example.o)

    This object code is what is linked into the final executable to generate a runnable program.

  • a .hi file containing (among other things) source code

    The compiler can optionally store the source code for any compiled functions inside this .hi file so that it can inline those functions when compiling other modules.

However, the compiler does not always save the source code for all functions that it compiles because there are downsides to storing source code for functions:

  • this slows down compilation

    This slows down compilation both for the “upstream” module (the module defining the function we might want to inline) and the “downstream” module (the module calling the function we might want to inline). The upstream module takes longer to compile because now the full body of the function needs to be saved in the .hi file and the downstream module takes longer to compile because inlining isn’t free (all optimizations, including inlining, generate more work for the compiler).

  • this makes the .hi file bigger

    The .hi file gets bigger because it’s storing the source code of the function.

  • this can make the object code larger, too

    Inlining a function multiple times can lead to duplicating the corresponding object code for that function.

This is why by default the compiler uses its own heuristic to decide which functions are worth storing in the .hi file. The compiler does not indiscriminately save the source code for all functions.
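
If you’re curious about which functions a module actually exposed for inlining, you can ask GHC to print the contents of an interface file (the exact output format varies between GHC versions):

$ ghc --show-iface Example.hi

Functions whose source code was saved show up in that output with an “unfolding” attached.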

You can override the compiler’s heuristic, though, using …

Compiler directives

There are a few compiler directives (a.k.a. “pragmas”) related to inlining and specialization that we’ll cover here:

  • INLINABLE
  • INLINE
  • NOINLINE
  • SPECIALIZE

My general rule of thumb for these compiler directives is:

  • don’t use any compiler directive until you benchmark your code to show that it helps (a minimal benchmark sketch follows this list)
  • if you do use a compiler directive, INLINABLE is probably the one you should pick
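
For the first point, a minimal benchmark using the criterion library might look something like this (the f function here is a stand-in for whatever code you’re measuring):

import Criterion.Main (bench, defaultMain, whnf)

f :: Int -> Int
f x = x + 1

main :: IO ()
main = defaultMain
  [ bench "f 5" (whnf f 5)  -- evaluate f 5 to weak head normal form
  ]

You would then compare the results with and without the compiler directive you’re considering.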

I’ll still explain what all the compiler directives mean, though.

INLINABLE

INLINABLE is a compiler directive that you use like this:

f :: Int -> Int
f x = x + 1
{-# INLINABLE f #-}

The INLINABLE directive tells the compiler to save the function’s source code in the .hi file in order to make that function available for inlining downstream. HOWEVER, INLINABLE does NOT force the compiler to inline that function. The compiler will still use its own judgment to decide whether or not the function should be inlined (and the compiler’s judgment tends to be fairly good).
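
The INLINABLE pragma is most useful for polymorphic functions: once the source code is available downstream, the compiler can specialize the function at whatever concrete types downstream modules actually use. A small sketch, reusing the example function from earlier:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)
{-# INLINABLE example #-}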

INLINE

INLINE is a compiler directive that you use in a similar manner as INLINABLE:

f :: Int -> Int
f x = x + 1
{-# INLINE f #-}

INLINE behaves like INLINABLE except that it also heavily biases the compiler in favor of inlining the function. There are still some cases where the compiler will refuse to fully inline the function (for example, if the function is recursive), but generally speaking the INLINE directive overrides the compiler’s own judgment for whether or not to inline the function.

I would argue that you usually should prefer the INLINABLE pragma over the INLINE pragma because the compiler’s judgment for whether or not to inline things is usually good. If you override the compiler’s judgment there’s a good chance you’re making things worse unless you have benchmarks showing otherwise.

NOINLINE

If you mark a function as NOINLINE:

f :: Int -> Int
f x = x + 1
{-# NOINLINE f #-}

… then the compiler will refuse to inline that function. It’s pretty rare to see people use a NOINLINE annotation for performance reasons (although there are circumstances where NOINLINE can be an optimization). It’s far, far, far more common to see people use NOINLINE in conjunction with unsafePerformIO because that’s what the unsafePerformIO documentation recommends:

Use {-# NOINLINE foo #-} as a pragma on any function foo that calls unsafePerformIO. If the call is inlined, the I/O may be performed more than once.
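
The classic instance of this pattern is a top-level mutable reference (a minimal sketch; the name counter is just for illustration):

import Data.IORef (IORef, newIORef)
import System.IO.Unsafe (unsafePerformIO)

-- A top-level "global variable": the NOINLINE pragma ensures that the
-- unsafePerformIO call is not duplicated by inlining, so the IORef is
-- created only once.
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)
{-# NOINLINE counter #-}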

SPECIALIZE

SPECIALIZE lets you hint to the compiler that it should compile a polymorphic function for a monomorphic type ahead of time. For example, if we define a polymorphic function like this:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)

… we can tell the compiler to go ahead and specialize the example function for the special case where f is Maybe, like this:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)
{-# SPECIALIZE example :: Maybe Int -> Maybe Int #-}

This tells the compiler to go ahead and compile the more specialized version, too, because we expect some other module to use that more specialized version. This is nice if we want to get the benefits of specialization without exporting the function’s source code (so we don’t bloat the .hi file) or if we want more precise control over when specialization does and does not happen.
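
You can also declare more than one specialization for the same function if you expect call sites at several concrete types, like this (assuming downstream code uses both Maybe and lists):

example :: Functor f => f Int -> f Int
example = fmap (+ 1)
{-# SPECIALIZE example :: Maybe Int -> Maybe Int #-}
{-# SPECIALIZE example :: [Int] -> [Int] #-}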

In practice, though, I find that most Haskell programmers don’t want to go to the trouble of anticipating and declaring all possible specializations, which is why I endorse INLINABLE as the more ergonomic alternative to SPECIALIZE.