Saturday, October 20, 2012

"Hello, core!"

Haskell optimization is very opaque and getting tight inner loops is something of a black art for people who don't take the time to learn GHC's core output. If you like fine-grained optimization, this post will walk you through a very simple example to help you learn to read core.

I'm planning to write some posts in the future discussing optimizing Haskell code and I would like to use this post as a foundation for those latter discussions.


ghc-core


When you practice reading core, stick to really simple programs. In fact, let's study the core generated by the simplest possible Haskell program:
main :: IO ()
main = return ()
The ghc-core tool produces a (slightly) readable core output. To install it, just type:
cabal install ghc-core
... and cabal should install it to ~/.cabal/bin/ghc-core.

We generate core output by running:
~/.cabal/bin/ghc-core --no-cast --no-asm program.hs
The --no-cast flag improves the readability by removing type casts and the --no-asm flag says not to include the final assembly code.

The above command automatically enables all optimizations and outputs to a pager the following two sections:
  • A list of optimizations that fired
  • The core section, which contains the intermediate core representation
The optimization section should look something like this:
0 Lets floated to top level; 0 Lets floated elsewhere; from ...

0 Lets floated to top level; 0 Lets floated elsewhere; from ...

Total ticks:     15

2 PreInlineUnconditionally
  1 x_abD
  1 s_abE
5 UnfoldingDone
  1 returnIO
  1 runMainIO
  1 main
  1 returnIO1
  1 $fMonadIO_$creturn
1 RuleFired 1 Class op return
2 LetFloatFromLet 2
2 EtaExpansion
  1 :main
  1 main
3 BetaReduction
  1 a_abC
  1 x_abD
  1 s_abE
10 SimplifierDone 10
... and the core should look something like this:
Result size = 20

main1
  :: State# RealWorld
     -> (# State# RealWorld, () #)
[GblId,
 Arity=1,

 Unf=Unf{Src=, TopLvl=True, Arity=1, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=ALWAYS_IF(unsat_ok=True,boring_ok=True)}]
main1 =
  \ (eta_B1 :: State# RealWorld) ->
    (# eta_B1, () #)

main :: IO ()
[GblId,
 Arity=1,

 Unf=Unf{Src=, TopLvl=True, Arity=0, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=ALWAYS_IF(unsat_ok=True,boring_ok=True)}]
main =
  main1
  

main2
  :: State# RealWorld
     -> (# State# RealWorld, () #)
[GblId,
 Arity=1,

 Unf=Unf{Src=, TopLvl=True, Arity=1, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [0] 30 0}]
main2 =
  \ (eta_X7 :: State# RealWorld) ->
    runMainIO1
      @ ()
      (main1
       )
      eta_X7

:main :: IO ()
[GblId,
 Arity=1,

 Unf=Unf{Src=, TopLvl=True, Arity=0, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=ALWAYS_IF(unsat_ok=True,boring_ok=True)}]
:main =
  main2
That still looks pretty ugly, so let's filter it a bit.


Core


You'll notice that every function has a type signature, without exception. For example, main1's type is:
main1
  :: State# RealWorld
     -> (# State# RealWorld, () #)
However, each function also has a set of attributes that ghc uses to guide various optimization passes. main1's attributes are:
[GblId,
 Arity=1,

 Unf=Unf{Src=, TopLvl=True, Arity=0, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=ALWAYS_IF(unsat_ok=True,boring_ok=True)}]
For example, Cheap=True says that it is cheap to inline main1. Similarly, ConLike=True says it is okay if a rewrite rule requires duplicating main1's code.

However, for the purposes of this post, we will just completely ignore these annotations and just focus on the type signatures and code. I'll remove the attributes to leave behind something easier on the eyes:
main1
  :: State# RealWorld
     -> (# State# RealWorld, () #)
main1 =
  \ (eta_B1 :: State# RealWorld) ->
    (# eta_B1, () #)

main :: IO ()
main =
  main1

main2
  :: State# RealWorld
     -> (# State# RealWorld, () #)
main2 =
  \ (eta_X7 :: State# RealWorld) ->
    runMainIO1
      @ ()
      (main1
       )
      eta_X7

:main :: IO ()
:main =
  main2
Now we see that there are four functions total, but a lot of noise still remain.


Primitive types and values


Let's walk through this core, beginning with the type signature of main1:
main1
  :: State# RealWorld
     -> (# State# RealWorld, () #)
Somebody sprinkled #s all over our type signature. These #s denote primitive types and values exposed by the compiler that are not defined within the Haskell language. You can check out the GHC.Prim module which declares these values and type. Notice how it cheats and declares all types empty and the values are all set to let x = x in x just so that everything type-checks.

You can learn more about these primitive types and operations by reading the section on Unboxed types and primitive operations in the GHC user guide.


IO is a state monad


GHC implements IO as a newtype wrapper around a state monad:
newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))
Notice that this is just the right type to wrap main1, which uses the same stateful pattern:
main1 :: State# RealWorld -> (# State# RealWorld, () #)
IOs state-based implementation ensures proper sequencing of computations and the State# RealWorld token is an empty token that exists solely to ensure that each sequenced computation depends on the previous one so that the compiler does not rearrange, merge or duplicate them.


The chain of command


Now, let's study the definition of main1:
main1 =
  \ (eta_B1 :: State# RealWorld) ->
    (# eta_B1, () #)
If we tidy this up and remove the type annotation, we get:
main1 = \s -> (s, ())
This is just return () in the State (State# RealWorld) monad. The compiler translated our original return () statement into the equivalent stateful function under hood.

However, the next function seems to accomplish nothing:
main :: IO ()
main =
  main1
... but that's only because the --no-casts flag removed type casts. If we add them back in, we get:
main =
  main1
  `cast` (Sym (NTCo:IO <()>)
          :: (State# RealWorld
              -> (# State# RealWorld, () #))
               ~#
             IO ())
That `cast` statement corresponds to the IO constructor from our IO newtype:
IO :: (State# RealWorld -> (# State# RealWorld, a #)) -> IO a
This constructor hides the underlying stateful representation behind the opaque IO newtype. However, unlike data constructors, newtype constructors cost nothing and don't translate into functions with a run-time overhead. Instead, they translate into free compile-time cast statements which the compiler erases once compilation is complete.

This explains why main has a different type from main1:
main :: IO ()
main packages our impure code inside the IO newtype, whereas main1 contains the raw underlying stateful representation.

In fact, main corresponds exactly to the main function we wrote in our original code (which is why it has the exact same name):
-- The original program
main :: IO ()
main = return ()
The next function appears to run our code:
main2 =
  \ (eta_X7 :: State# RealWorld) ->
    runMainIO1
      @ ()
      (main1
       )
      eta_X7
With a little tidying up, this becomes:
main2 = \s -> runMainIO1 main1 s
But wait, it runs main1 and not main! I'm guessing that the main token exists solely to establish the correspondence to the user-written code, but perhaps ghc doesn't actually care about it and instead uses the raw main1 directly.

Now, to be honest, I don't actually know what runMainIO1 does, but according to Edward Yang, it initializes some things like interrupt handlers before running the program.

Finally, we reach the true entry point of our program:
:main :: IO ()
:main =
  main2 -- There is a hidden cast here
This passes our program to the Haskell runtime to be executed. That's it! We're done.

7 comments:

  1. A few comments:
    - haskell-mode for emacs has a minor mode for viewing core files. It requires that core files have hcr extension. There's syntax highlighting and code cleanup feature enabled with M-x ghc-core-clean-buffer
    - CORE pragma is useful for placing annotations in the generated core
    - Edward Yang has a nice post on Core, STG and Cmm: http://blog.ezyang.com/2011/04/tracing-the-compilation-of-hello-factorial/

    Anyway, do you know any good resources for learning core? There is a section of GHC user's manual but besides being a bit outdated it isn't suitable for learning everything from scratch.

    BTW. It would be nice if you enabled different forms of

    ReplyDelete
    Replies
    1. Probably the best starting point for learning core is this Stack Overflow link:

      http://stackoverflow.com/questions/6121146/reading-ghc-core

      This has great links and information in both the question and answer.

      Delete
    2. Thanks! I already read about half of material listed there, but still I only understand the basics.

      Delete
    3. Well, if you have any particular questions I can make sure to address them in my subsequent posts.

      Delete
    4. I guess I'm a bit confused about duplicate functions in my core files. E.g. I get two functions like this:

      csr_$scsr =
      \ eta_B1 ->
      csr =
      \ @ r_a2rS $dSource_a2rT eta_B1 ->

      And both seem to do the same thing, with details slightly different. I suspect that the former one is a specialized version of the latter, because at the very end of core file I see something like this:

      "SPEC csl [U]" [ALWAYS]
      forall $dSource_X2HL. csl $dSource_X2HL = csl_$scsl

      Delete
    5. That sounds right. You can prove it by using -fspec-constr-count=0 (I think) and see if the specialized version goes away.

      Delete
  2. This comment has been removed by the author.

    ReplyDelete