Haskell for all: A GHC plugin for OpenTelemetry build metrics

A GHC plugin for OpenTelemetry build metrics

This post is about a new OpenTelemetry plugin for GHC that I’ve been building for work that we’re open sourcing because I think it might be broadly useful to others. If all you want to do is use the plugin then you can find it on Hackage, which includes more detailed usage instructions. This post will focus more on the motivation and background behind the plugin’s development.

Motivation

The context behind this work was that we use Honeycomb at work for collecting metrics related to production and our team¹ has begun to apply those same metrics to our builds. In particular, we wanted to collect detailed (module-level) build metrics so that we could begin to hunt down and fix expensive modules within our codebase. For context: our codebase currently has almost 7000 modules, so these expensive modules can easily fly under the radar.

When we enable the plugin and export the results to Honeycomb we can begin to see which modules are the most expensive to build:

… and none of the modules are individually very expensive to build (the worst offender is only about 5 seconds), so they’d easily get lost within a sea of thousands of other modules.

However, these sorts of insights have already proven useful. For example:

one expensive modules was completely unused in our codebase

The above list brought it to our attention so that we could delete it.
other expensive modules were representative examples of larger issues to fix

For example, one expensive module consisted of 2000 invocations of an internal function which is expensive to type-check and fixing this function will improve compile speeds across our codebase and not just that module.
other expensive modules are indicative of architectural anti-patterns

Frequently “horizontally-organized” modules top the chart, and I view them as anti-patterns for a few reasons (see: my post on Module organization guidelines). These modules are not expensive per se (the code inside them has to be compiled somewhere), but they tend to be build chokepoints because they have a large number of dependencies and reverse dependencies. Highlighting expensive modules has a tendency to highlight these sorts of build chokepoints as a side bonus.

In principle you can also browse a given build’s trace interactively, like this:

However, for our codebase Honeycomb chokes on our giant build traces and we can only produce visualizations like the above image if we filter down the spans to a randomly sampled subset of modules. Honeycomb doesn’t do a good job of handling traces with a few thousand spans or more.

Workarounds

This plugin was surprisingly difficult for me to implement because GHC’s Plugin interface is so constrained.

For example, the hs-opentelemetry-sdk package asks you to finalize any TracerProvider that you acquire, but there’s no good way (that I know of²) to run finalization logic at the end of a ghc build using the Plugin interface. The purpose of this finalization logic is to flush metrics that haven’t yet been exported.

So what I did was to hack around this by detecting all modules that are root modules of the build graph and flushing metrics after each of those root modules is built (since one of them will be the last module built). I tried a bunch of other alternative approaches (like installing a phase hook), but this was the only approach I was able to get to work.

And the OpenTelemetry plugin is full of workarounds like this. We have vetted internally that the plugin works for normal builds, ghcid and haskell-language-server, but generally I expect there to be some trailing bugs that we’ll have to fix as more people use it due to these various unsafe implementation details.

In fact, one limitation of the plugin is that the top-level span has a duration of 0 (instead of reporting the duration of the build). This is related to the same issue of the Plugin interface apparently not having a good way to run code exactly once after the build completes (even using hacks). If somebody knows of a way to do this that I missed I’d definitely welcome the tip!

Conclusion

What we do know from internal usage is that:

the plugin definitely scales to very large codebases (thousands of modules)

… although honeycomb doesn’t scale to thousands of spans, but that’s not our fault.
the plugin’s overhead is negligible (so it’s safe to always enable)
the plugin works with cabal commands, ghcid, and haskell-language-server

So it should be fine for most use cases, but please report any issues that you run into.

1 comment:

Jeffrey YoungOctober 8, 2024 at 5:32 AM
In the past I've done `cabal build | ts -i '[%.s]'` for this. Which shows the elapsed time for each line of output, here's an old sample of the output:

```
$ cabal build | ts -i '[%.s]'
[1.164258] Resolving dependencies...
[0.555592] Build profile: -w ghc-8.10.4 -O1
[0.000041] In order, the following will be built (use -v for more details):
[0.000008] - Only-0.1 (lib) (requires build)
[0.000008] - assoc-1.0.2 (lib) (requires build)
[0.000010] - base-compat-0.11.2 (lib) (requires build)
[0.000010] - basement-0.0.11 (lib) (requires build)
[0.000019] - data-fix-0.3.0 (lib) (requires build)
[0.000019] - dlist-1.0 (lib) (requires build)
[0.000019] - text-short-0.1.3 (lib) (requires build)
[0.000019] - time-compat-1.9.5 (lib) (requires build)
[0.000017] - uuid-types-1.0.3 (lib) (requires build)
[0.000018] - vsmt-0.0.1 (lib) (configuration changed)
[0.000049] - these-1.1.1.1 (lib) (requires build)
[0.000023] - base-compat-batteries-0.11.2 (lib) (requires build)
[0.000022] - gauge-0.2.5 (lib) (requires build)
[0.000025] - cassava-0.5.2.0 (lib) (requires build)
[0.000028] - strict-0.4.0.1 (lib) (requires build)
[0.000030] - vsmt-0.0.1 (bench:busybox) (first run)
[0.000028] - aeson-1.5.4.1 (lib) (requires build)
...
[12.699889] Installing cassava-0.5.2.0 (lib)
[0.321192] Completed cassava-0.5.2.0 (lib)
[6.319523] Installing basement-0.0.11 (lib)
[0.362916] Completed basement-0.0.11 (lib)
[0.018331] Starting gauge-0.2.5 (lib)
...
```
The most significant digit is seconds. After that one can just tee a log, and sort or construct a csv via awk. Its hacky, but its a one-liner to do this locally without all the complexity that comes with plugins.