Friday, September 8, 2023

GHC plugin for HLint


At work I was recently experimenting with running hlint (the widely used Haskell linting program) as a GHC plugin. One reason I was interested in this is that we have a large (6000+ module) Haskell codebase at work, and I wanted to see whether a plugin would make it cheaper to run hlint on our codebase. Ultimately that did not work out, but I had built something that we could open source, so I polished it up and released it in case other people find it useful. You can find the plugin (named hlint-plugin) on Hackage and on GitHub.

This post explains the background and motivation behind this work, and why such a plugin might be useful to other Haskell users.

Introduction to hlint

If you’ve never heard of hlint before, it’s a Haskell source code linting tool that is pretty widely used in the Haskell ecosystem. For example, if you run hlint on the following Haskell file:

main :: IO ()
main = (mempty)

… then you’ll get the following hlint error message:

Main.hs:2:8-15: Warning: Redundant bracket
Found:
  (mempty)
Perhaps:
  mempty
  
1 hint

… telling the user to remove the parentheses[1] from around the mempty.
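The kind of check behind that warning can be sketched on a toy syntax tree. This is not hlint's implementation (hlint works on GHC's full syntax tree and has many more rules); the `Expr` type and the `redundantBrackets` function are made up for illustration, and the check only flags parentheses around a bare variable, where they can never change the meaning.

```haskell
-- Toy illustration of a "redundant bracket" lint (not hlint's actual code).
module Main (main) where

-- A tiny expression language with explicit parentheses, standing in for
-- GHC's much richer expression syntax tree.
data Expr
  = Var String
  | App Expr Expr
  | Paren Expr
  deriving (Eq, Show)

-- Render an expression back to source text.
render :: Expr -> String
render (Var name) = name
render (App f x) = render f <> " " <> render x
render (Paren e) = "(" <> render e <> ")"

-- Report every pair of parentheses wrapping a bare variable,
-- as (found, suggested replacement).
redundantBrackets :: Expr -> [(String, String)]
redundantBrackets expr = case expr of
  Var _ -> []
  App f x -> redundantBrackets f <> redundantBrackets x
  Paren inner@(Var _) -> [(render expr, render inner)]
  Paren inner -> redundantBrackets inner

main :: IO ()
main = mapM_ report (redundantBrackets (Paren (Var "mempty")))
  where
    report (found, perhaps) =
      putStrLn ("Found: " <> found <> "\nPerhaps: " <> perhaps)
```

Running it on the `(mempty)` expression from the example reports `(mempty)` with the suggestion `mempty`, while parentheses around an application such as `f (g x)` are left alone.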

Integrating hlint

However, hlint is not integrated into the compiler, meaning that you have to run it separately from compilation for it to catch anything. There are a few ways to fix this, though:

  • Create a script that builds your program and then runs hlint

    This is the simplest possible thing that one can do, but it works and some people do this. It’s the “low-tech” solution.

  • Use haskell-language-server or some other IDE plugin that auto-runs hlint

    This is a bit nicer for developers because now they can get rapid feedback (in their editor) as they are authoring the code. For example, haskell-language-server supports an hlint plugin[2] for this purpose.

  • A GHC plugin (what this post is about)

    If you turn hlint into a GHC plugin, then ALL GHC-based Haskell tools automatically incorporate hlint suggestions. For example, ghcid would automatically include hlint suggestions in its output, something that doesn’t work with other approaches to integrate hlint. Similarly, all cabal commands (including cabal build and cabal repl) and all stack commands benefit from a GHC plugin.
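To make the plugin approach concrete, here is a minimal sketch of a GHC plugin that hooks the parse stage, assuming GHC 9.x and a dependency on the `ghc` package. It is not hlint-plugin's code: the module name `HelloPlugin` is made up, and where a real linter would hand the parsed syntax tree to hlint, this sketch only prints the module name.

```haskell
-- Minimal parse-stage GHC plugin (illustrative sketch, GHC 9.x).
module HelloPlugin (plugin) where

import Control.Monad.IO.Class (liftIO)
import GHC (moduleNameString, ms_mod_name)
import GHC.Plugins (Plugin (..), defaultPlugin, purePlugin)

plugin :: Plugin
plugin =
  defaultPlugin
    { -- Called for each module GHC parses, with the parsed syntax tree.
      -- The tree's type differs across GHC versions (HsParsedModule before
      -- 9.4, ParsedResult from 9.4 on), so this sketch never names it and
      -- just passes it through unchanged.
      parsedResultAction = \_options modSummary parsed -> do
        -- A linting plugin would inspect the parsed AST here and emit
        -- warnings; this sketch only logs which module was parsed.
        liftIO (putStrLn ("Parsed module: " <> moduleNameString (ms_mod_name modSummary)))
        pure parsed
      -- The plugin does not alter compilation output, so declaring it pure
      -- avoids forcing recompilation merely because the plugin is enabled.
    , pluginRecompile = purePlugin
    }
```

Loaded with GHC's `-fplugin=HelloPlugin` flag, any tool that drives GHC (cabal build, cabal repl, ghcid) would print one line per module it recompiles, which is the property that makes plugin-based integration tool-agnostic.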

Alternatives

I’m not the first person who had this idea of turning hlint into a GHC plugin. The first attempt to do this was hlint-source-plugin, but that was a pretty low-tech solution; it basically ran hlint as an executable on the Haskell source file being processed even though the GHC plugin already has access to the parsed syntax tree.

The second attempt was the splint package. This GHC plugin was really well done (it’s basically exactly how I envisioned this was supposed to work) and the corresponding announcement post does a great job of motivating why hlint benefits from being run as a GHC plugin.

However, the splint package was recently abandoned, and the last version of GHC it supports is GHC 9.2. Since we use GHC 9.6 at work, I decided to revive it by creating the hlint-plugin package, which is essentially the successor to splint.

Improvements

hlint-plugin is not too different from what splint did, but the main improvements that hlint-plugin brings are:

  • Support for newer versions of GHC

    splint supports GHC versions 8.10, 9.0, and 9.2 whereas hlint-plugin supports GHC versions 9.0, 9.2, 9.4, and 9.6.

  • Known-good cabal/stack/nix builds for the plugin

    … see the next section for more details.

  • A test suite to verify that the plugin works

    hlint-plugin’s CI actually checks that the plugin works for all supported versions of GHC.

  • A simpler work-around to GHC issue #18261

    Basically, I independently stumbled upon the same problem that splint encountered, but worked around it in a simpler way. I won't go into detail here other than to point out that you can compare how splint works around this bug with how hlint-plugin does.

Also, while stress-testing hlint-plugin on our internal codebase I discovered an hlint bug that affected some of our modules and fixed it, so the fix will be in the next release of hlint.

Tricky build stuff

Unfortunately, both splint and hlint-plugin are tricky to install correctly. Why? Because hlint (and ghc-lib-parser-ex) use the ghc-lib and ghc-lib-parser packages by default instead of the ghc API. This is actually a pain in the ass because a GHC plugin needs to be created using the ghc API (i.e. it needs to be a value of type ghc:GHC.Plugins.Plugin). Like, you can use hlint to create a ghc-lib:GHC.Plugins.Plugin and everything will type-check and build, but then when you try to actually run the plugin it will fail.

There is a way to get hlint and ghc-lib-parser-ex to use the ghc API, though! However, you have to build them with non-default cabal configure flags. Specifically, you have to configure hlint with the -f-ghc-lib option and configure ghc-lib-parser-ex with the -fno-ghc-lib option.

To ease things for users I provided a cabal.project file and a flake.nix file[4] with working builds for hlint-plugin that set all the correct configuration options.

Performance

I mentioned in the introduction that I was hoping for some performance improvements from switching to a plugin but those improvements didn’t materialize. I’ll talk a bit about what I thought would work and why it didn’t pan out for us (even though it still might help for you).

So there are up to three ways that hlint could potentially be faster as a GHC plugin:

  • Not having to re-lint modules that haven’t changed

    This is nice (especially when your codebase has 6000+ modules like ours). When you turn hlint into a GHC plugin you only run it whenever GHC recompiles a module and you don’t have to run hlint over your entire codebase after every change.

    However, this was actually not a significant benefit to our company because we already have some scripts which take care of only running hlint on the modules that have changed (according to git). However, it’s still a “nice to have” because it’s architecturally simpler (no need to write that clever script if GHC can take care of detecting changes for us).

  • Not having to parse the Haskell code twice

    This is likely a minor performance improvement since parsing is (in my experience) typically not the bottleneck for compiling Haskell code.

  • Running hlint while GHC is compiling modules

    What I mean by this is that if hlint is a GHC plugin then it can begin running while the GHC build is ongoing! In large builds (like ours) there are often a large number of cores that go unused and the hlint plugin could potentially exploit those idle cores to do work before the build is done.

    However, in practice this benefit did not pan out and our build didn't really get faster when we enabled hlint-plugin. The time it took to build our codebase with the plugin was essentially the same amount of time as running hlint in a separate step.
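For comparison, the "only lint what changed" script mentioned in the first bullet might look roughly like the following. This is a hypothetical sketch, not our internal script: the base branch `origin/main` is an assumption, and it uses the `process` package that ships with GHC.

```haskell
-- Hypothetical "lint only changed modules" script (illustrative sketch).
module Main (main) where

import Data.List (isSuffixOf)
import System.Process (callProcess, readProcess)

-- Keep only Haskell source files from `git diff --name-only` output.
haskellFiles :: String -> [FilePath]
haskellFiles = filter (".hs" `isSuffixOf`) . lines

main :: IO ()
main = do
  -- Assumption: changes are measured against the "origin/main" branch.
  -- --diff-filter=d excludes deleted files, which hlint could not read.
  changed <- readProcess "git" ["diff", "--name-only", "--diff-filter=d", "origin/main"] ""
  case haskellFiles changed of
    [] -> putStrLn "No changed Haskell modules to lint."
    -- callProcess throws if hlint exits non-zero (as it does when it
    -- reports hints), so the script fails when a changed module has hints.
    files -> callProcess "hlint" files
```

A GHC plugin makes this bookkeeping unnecessary, because GHC's own recompilation check already decides which modules get reprocessed.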

Future directions

The hlint-source-plugin repository notes that if hlint were implemented as a GHC plugin (which it now is) then it would fix some of the hacks that hlint has to use:

Currently this plugin simply hooks into the parse stage and calls HLint with a file path. This means HLint will re-parse all source code. The next logical step is to use the actual parse tree, as given to us by GHC, and HLint that. This means that HLint can lose the special logic to run CPP, along with the hacky handling of fixity resolution (we get that done correctly by GHC’s renaming phase).

… because of this I sort of feel that hlint really should be a GHC plugin. It’s understandable why hlint was not initially implemented in this way (since I believe the GHC plugin system didn’t exist back then), but now it sort of feels like a GHC plugin is a much more natural way of integrating hlint.
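The point about fixity resolution in the quote is easier to see with a toy example. Until fixity declarations are known, an operator chain can only be grouped by a default rule (the Haskell Report's default for undeclared operators is `infixl 9`), and grouping it correctly requires a fixity table, which GHC has by the time it renames a module. The sketch below is illustrative only, not GHC's or hlint's code: `resolve` regroups a flat chain with precedence climbing.

```haskell
-- Toy illustration of fixity resolution (not GHC's or hlint's code).
module Main (main) where

import Data.Maybe (fromMaybe)

data Assoc = LeftAssoc | RightAssoc
  deriving (Eq, Show)

-- Integer literals and binary operator applications only.
data Expr = Lit Int | Op String Expr Expr
  deriving (Eq, Show)

-- Operator name -> (precedence, associativity).
type Fixities = [(String, (Int, Assoc))]

-- The fixities the Prelude declares for these three operators.
haskellFixities :: Fixities
haskellFixities = [("+", (6, LeftAssoc)), ("*", (7, LeftAssoc)), ("^", (8, RightAssoc))]

-- Undeclared operators default to infixl 9, as in the Haskell Report.
fixityOf :: Fixities -> String -> (Int, Assoc)
fixityOf table op = fromMaybe (9, LeftAssoc) (lookup op table)

-- Group a flat chain "e0 op1 e1 op2 e2 ..." using precedence climbing.
resolve :: Fixities -> Expr -> [(String, Expr)] -> Expr
resolve table first rest = fst (go 0 first rest)
  where
    go :: Int -> Expr -> [(String, Expr)] -> (Expr, [(String, Expr)])
    go minPrec lhs ((op, rhs) : more)
      | prec >= minPrec =
          -- A left-associative operator only lets tighter operators claim its
          -- right operand; a right-associative one also allows equal ones.
          let nextMin = if assoc == RightAssoc then prec else prec + 1
              (rhs', more') = go nextMin rhs more
           in go minPrec (Op op lhs rhs') more'
      where
        (prec, assoc) = fixityOf table op
    go _ lhs more = (lhs, more)

-- Print with explicit parentheses so the grouping is visible.
render :: Expr -> String
render (Lit n) = show n
render (Op op l r) = "(" <> render l <> " " <> op <> " " <> render r <> ")"

main :: IO ()
main = do
  -- The chain 1 + 2 * 3, as seen before any fixities are known.
  let chain = [("+", Lit 2), ("*", Lit 3)]
  putStrLn (render (resolve [] (Lit 1) chain)) -- ((1 + 2) * 3)
  putStrLn (render (resolve haskellFixities (Lit 1) chain)) -- (1 + (2 * 3))
```

With no fixity information the chain groups as `(1 + 2) * 3`, which is wrong; with the Prelude's fixities it groups as `1 + (2 * 3)`. A tool that parses source on its own has to recover this information somehow, whereas a plugin running after GHC's renamer gets it for free.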


  1. I refuse to call parentheses “brackets”.↩︎

  2. Note that this is a plugin for haskell-language-server, which is a different type of plugin than a GHC plugin. A haskell-language-server plugin only works with haskell-language-server whereas a GHC plugin works with anything that uses GHC. The two types of plugins are also installed and set up in different ways.↩︎


  4. I tried to create a working stack.yaml and failed to get it working, but I’d accept a pull request adding a working stack build if someone else has better luck than I did.↩︎

3 comments:

  1. Just a quick correction: the HLS plugin doesn't call the `hlint` executable, it calls into the `hlint` library directly and passes the parsed module. So it doesn't need `hlint` on the path and it doesn't parse the module twice.

    Of course, it's also not useful for non-interactive use!

    1. Thanks for pointing that out! I deleted the incorrect statement from the post.

  2. Very interesting, thank you! Too bad the expected speed-ups didn't materialize, but if hlint becomes a real GHC plugin in the future, maybe it will.
