The context for this post is that at work I recently implemented Nix ecosystem support for “incrementally” building Haskell packages. By “incrementally” I mean that these Nix builds only need to build what changed since the last full build of the package so that the package doesn’t need to be built from scratch every time.
The pull requests implementing this feature have not yet been approved or merged at the time of this writing, but I figured that I would explain the motivation, design, results, and limitations of this work to hopefully persuade people that this work should be merged.
If you’re not interested in the design then you can skip straight to the Demo section below.
Background
I work on Mercury’s Backend Development User Experience team and we support developers contributing to a large Haskell monolith consisting of 3000+ modules. That may seem like a lot but the vast majority of these modules are small and the whole codebase takes ~14 minutes to compile in CI if we disable optimizations (although we still build with optimizations enabled for deployment).
In my experience, that’s pretty good for a Haskell project of this size, thanks not only to the work of our team but also other teams who also contribute to improving the development experience. In fact, the pioneering work for this “incremental builds” feature actually originated from two engineers outside our team.
First, Harry Garrood improved GHC’s change detection algorithm so that GHC would use the hash of the file to detect changes instead of using the timestamp. In this post he explains how you can make use of this to implement incremental builds for traditional CI services (e.g. GitHub actions) where each build reuses the intermediate build products from the prior build instead of building from scratch.
That alone would not be enough for us to use this at work since we use Nix where this sort of build impurity doesn’t fly. However, Harry and Jade Lovelace prototyped using this feature in Nixpkgs so that Nix builds of Haskell packages could also reuse intermediate build products from prior builds to save work. You can find their prototype here.
The basic idea behind the prototype Nixpkgs integration is that you split a Haskell package build into two separate builds:
A “full build” that builds the Haskell package from scratch
This full build exports its intermediate build products (i.e. the
dist
directory) which can then be reused by:An “incremental build” that only builds what changed since the full build
This incremental build imports the intermediate build products from the corresponding full build so that it doesn’t have to build the package from scratch.
So you might wonder: if that was already implemented then what work still remained for me to do?
Problem
The main issue with the initial Nixpkgs integration is that it does not provide any support for selecting which Git revision to use as the basis for the full build. The existing solutions require some out-of-band process to automatically select and lock the appropriate git revision to use for the older (full) build.
Non-solution #0: Rolling rebuilds
The first non-solution is for each revision to always reuse the build products from the previous revision. This doesn’t work well with Nix because it would create an increasingly-long chain of dependent derivations; in order to build the most recent revision you’d have to build all preceding revisions.
The dilemma here is that Nix is forcing us to confront something that other build tools gloss over: if you’re always reusing build products from the last build then you can’t accurately reproduce the most recent build from scratch without reproducing all prior builds. You’ve essentially “contaminated” the current build with all prior builds by doing things in this way.
So what we really want is something more like this:
Periodically do a full build from scratch and then make each incremental build relative to the last full rebuild.
That’s much more compatible with Nix because then we only need to do two builds of our project if we rebuild things from scratch, instead of one build for every revision in our project’s history.
There’s also another issue with rolling rebuilds when you’re not using Nix, which is that most naïve attempts to do this don’t ensure that the starting build products came from the parent commit. You can end up with contamination of build products across branches if you’re not careful, which further complicates reproducibility.
Non-solution #1: Lockfile
Okay, so suppose you periodically do a full build of the project from scratch and then each incremental build is relative to the last full build. You would need to do a full rebuild frequently enough so that the incremental builds stay quick. If you wait too long in between full rebuilds then the project will evolve to the point where the incremental builds can no longer reuse most of the build products from the last full build and in the extreme case the incremental builds degenerate into full builds if they can’t reuse any old build products.
For example, at our work we currently do a full build of our large package once a day, so we need some way to update the full build to point to the last revision from the preceding day.
One
existing approach to solving this involved using Nix flakes to
manage the git revision for the older build. The idea is that you
periodically run nix flake update
to update the revision
used for the full build and you might even automate this process by
having some recurring cron
job generate a pull request or
commit to bump this revision on the main development branch. You don’t
have to use flakes for this purpose, but flakes are probably the most
ergonomic solution along these lines.
However, there are a few issues with this approach:
It only works well for short-lived pull requests
In other words, if you update the revision used for the full build once a day then typically only pull requests that are less than a day old will benefit from incremental builds.
Specifically, what we’d really like is “branch-local” incremental builds. In other words if a longer-lived development branch were to deposit a few commits a day we’d like there to be a full rebuild once a day on that branch so that incremental builds against the tip of that development branch remain snappy.
It pollutes the
git
historyIf you bump the lockfile, say, once per day then that’s one junk commit that you’ve added to your
git
history every day.It’s difficult to open source any useful automation around this
If the solution requires out-of-band machinery (e.g. some recurring
cron
job) to bump the lockfile you can’t provide a great user experience for open source projects. It only really works well for proprietary projects that can tolerate that complexity.
That last point was the most important one for me. Generally, when I design something (even something intended for internal, proprietary use) I try to design it in such a way that it works well in an open source context, too. In my experience, doing things in this way tends to improve the design, quality, and user experience of software that I build.
In particular, I wanted a solution where all the automation could be implemented entirely within the Nix language. However, this is not possible in Nix’s present form!
Non-solution #2: Rollback derivation
So what I really wanted was a Nix function (which I will call
“truncate
”) that would take any git
repository
and roll it back in time to the last commit before some repeating time
boundary (where the time boundary might be, say, an hour, or day, or
week). For simplicity, let’s just say that the desired time interval is
one day so I want to roll back the repository to the last revision from
the day before.
If I had such a truncate
function then it would be easy
to automatically select which revision to use for the full build. I
would:
extract the source
git
repository from the current Haskell package buildtruncate
thatgit
repository to the last revision from the day beforeUse that “truncated” revision as the source for the full build
Use that full build as the input to the current (incremental) build
Then if I built multiple revisions for the same day they would all share the same full build since they would all get “truncated” to the same revision from the previous day.
However, there isn’t a great way to implement this
truncate
function in Nix. To see why, consider the
following (wrong) solution:
extract the source
git
repository from the current Haskell package buildLet’s call the derivation for this
git
repository “src
”create a new Nix derivation (“
src2
”) that rolls backsrc
In other words, this would be a trivial Nix derivation that begins from
src
and runs something like:$ git checkout $(git rev-list -1 --before '1 day ago' HEAD)
… and stores that as the result
Use
src2
as the input to the full build
Do you see the problem with that approach?
The above wrong solution doesn’t allow multiple incremental builds
from the same day to share the same full build from the prior day. This
is because src2
depends on src
and since each
incremental build has a different src
repository then each
also have a different src2
derivation and therefore a
different full build. That in turn defeats the purpose of incremental
builds if we have to do a new full rebuild for each incremental
build.
For this to work we would need a way to roll back a git
repository to an older revision that less sensitive to the current
revision.
Non-solution #3: Plain
fetchGit
The builtins.fetchGit
utility almost does what
we want! This primitive function lets you fetch a git
repository at evaluation time, like this:
nix-repl> builtins.fetchGit { url = ~/proj/turtle; revision = "837f52d2101368bc075d382774460a717904d2ab"; }
{ lastModified = 1655501878; lastModifiedDate = "20220617213758"; narHash = "sha256-Ic4N2gzm0hYsPCynkzETJv7lpAWO1KM+FO+r3ov60y0="; outPath = "/nix/store/ygznanxv6rmbxw5gkgk7axfxazhsa93z-source"; rev = "837f52d2101368bc075d382774460a717904d2ab"; revCount = 566; shortRev = "837f52d"; submodules = false; }
The above result is the same no matter what revision I currently have
checked out at ~/proj/turtle
because Nix’s
fetchGit
function produces a content-addressed derivation.
In other words, if two invocations of fetchGit
generate the
same final repository state then they share the same
outPath
. This is exactly the behavior we want: we need the
source repository for the full build to be content-addressed so that
multiple incremental builds can share the same full build.
However, the problem is that I don’t exactly know which
revision I want. What I really want to be able to say is “get
me the last revision from the day before this other revision”.
fetchGit
does not expose any way to do something like
that.
That brings us to the actual solution:
Solution
The solution I went with was the following two pull requests:
Add optional
date
argument tobuiltins.fetchGit
This amends
builtins.fetchGit
to allow a date specification, which can either be a relative date (e.g.1 day ago
) or an absolute date (e.g.2020-01-01T00:00:00
or a Unix timestamp like1671388622
). Basically, this argument accepts anythinggit
accepts as a date specification (which is a lot sincegit
is pretty flexible in this regard).The cool thing about this change is that it doesn’t compromise the purity of
builtins.fetchGit
. If a givenfetchGit
specification was pure then adding a date specification preserves that purity.Add
haskell.lib.incremental
utilityThis pull request actually does two separate things:
This polishes and upstreams the prototype support for incremental builds
In other words, this upstreams Harry and Jade’s work to split a Haskell build into two builds: a full build and incremental build
This uses the
fetchGit
patch to automate the full build selectionThere’s a new
pkgs.haskell.lib.incremental
utility which usesbuiltins.fetchGit
to automatically update the full build for you and it has all the desired behaviors (including branch-local incrementalism).
I could have split this into two separate pull request (and I still might) but for internal testing purposes it was easier to do everything on one branch. I’m waiting for a decision on the other pull request before deciding whether or not to split up this branch.
Demo
I’ll use my turtle
package as the running example for
the demo. If you clone the gabriella/incremental
branch of
my turtle
repository:
$ git clone --branch gabriella/incremental \
https://github.com/Gabriella439/turtle.git$ cd turtle
… you’ll find the following default.nix
file making use
of the Nixpkgs support for incremental Haskell builds:
{ interval ? 24 * 60 * 60 }:
let
nixpkgs = builtins.fetchTarball {
url = "https://github.com/MercuryTechnologies/nixpkgs/archive/696e0820b03e8ea7ad6a9ba21a00a79c91efc580.tar.gz";
sha256 = "1k3swii3absl154154lmk6zjw11vzzqx8skaiw1250armgfyv9v8";
};
# We need GHC 9.4 or newer for this feature to work
compiler ="ghc94";
overlay = self: super: {
haskell = super.haskell // {
packages = super.haskell.packages // {
"${compiler}" =
super.haskell.packages."${compiler}".override (old: {
overrides =
self.lib.fold
self.lib.composeExtensions
(old.overrides or (_: _: { }))
[ (self.haskell.lib.packageSourceOverrides {
turtle = ./.;
})
(hself: hsuper: {
turtle-incremental =
self.haskell.lib.compose.incremental
{ inherit interval;
makePreviousBuild =
truncate: (import (truncate ./.) { }).turtle;
}
hsuper.turtle;
})
];
});
};
};
};
pkgs = import nixpkgs { config = { }; overlays = [ overlay ]; };
in
{ inherit (pkgs.haskell.packages."${compiler}")
turtle
turtle-incremental
;
}
However, that alone is not enough to make use of incremental builds. If you attempt to build that (at the time of this writing) you’ll get an error message like this:
$ nix build --file ./default.nix turtle-incremental
error: evaluation aborted with the following error message:
'pkgs.haskell.lib.incremental requires Nix version 2.12.0pre20221128_32c182b or
newer'
(use '--show-trace' to show detailed location information)
The Nixpkgs support for incremental builds depends on a matching change to the Nix interpreter, so you actually have to run:
$ nix run github:Gabriella439/nix/gabriella/fetchGit -- \
--file ./default.nix turtle-incremental build
… or if you don’t yet have flakes enabled, then use this pedantically complete command:
$ nix --option extra-experimental-features 'nix-command flakes' \
-- \
run github:Gabriella439/nix/gabriella/fetchGit --file ./default.nix turtle-incremental build
… and that will definitely work.
Once the build is complete you can inspect the logs and you should
see something like the following buildPhase
:
$ nix log ./result
…
@nix { "action": "setPhase", "phase": "buildPhase" }
building
Preprocessing library for turtle-1.6.1..
Building library for turtle-1.6.1..
Preprocessing test suite 'regression-broken-pipe' for turtle-1.6.1..
Building test suite 'regression-broken-pipe' for turtle-1.6.1..
[2 of 2] Linking dist/build/regression-broken-pipe/regression-broken-pipe [Libr>
Preprocessing test suite 'regression-masking-exception' for turtle-1.6.1..
Building test suite 'regression-masking-exception' for turtle-1.6.1..
[2 of 2] Linking dist/build/regression-masking-exception/regression-masking-exc>
Preprocessing test suite 'tests' for turtle-1.6.1..
Building test suite 'tests' for turtle-1.6.1..
[2 of 2] Linking dist/build/tests/tests [Library changed]
Preprocessing test suite 'system-filepath-tests' for turtle-1.6.1..
Building test suite 'system-filepath-tests' for turtle-1.6.1..
[2 of 2] Linking dist/build/system-filepath-tests/system-filepath-tests [Librar>
Preprocessing test suite 'cptree' for turtle-1.6.1..
Building test suite 'cptree' for turtle-1.6.1..
[2 of 2] Linking dist/build/cptree/cptree [Library changed]
…
This is shows that the incremental builds are indeed working. We still have to re-link some executables (for reasons that are still not clear to me), but none of the Haskell modules needed to be rebuilt since nothing has changed (yet) since the last rebuild.
Now let’s test that by making a small whitespace change to one of the
Turtle
modules:
$ echo >> src/Turtle/Prelude.hs
Then if we rebuild the package we’ll see the following build phase:
$ nix --option extra-experimental-features 'nix-command flakes' \
-- \
run github:Gabriella439/nix/gabriella/fetchGit --file ./default.nix --print-build-logs build
…
turtle> building
turtle> Preprocessing library for turtle-1.6.1..
turtle> Building library for turtle-1.6.1..
turtle> [ 7 of 10] Compiling Turtle.Prelude ( src/Turtle/Prelude.hs, dist/build/Turtle/Prelude.o, dist/build/Turtle/Prelude.dyn_o ) [Source file changed]
turtle> src/Turtle/Prelude.hs:319:1: warning: [-Wunused-imports]
turtle> The import of ‘Data.Monoid’ is redundant
turtle> except perhaps to import instances from ‘Data.Monoid’
turtle> To import instances alone, use: import Data.Monoid()
turtle> |
turtle> 319 | import Data.Monoid ((<>))
turtle> | ^^^^^^^^^^^^^^^^^^^^^^^^^
turtle> Preprocessing test suite 'regression-broken-pipe' for turtle-1.6.1..
turtle> Building test suite 'regression-broken-pipe' for turtle-1.6.1..
turtle> [2 of 2] Linking dist/build/regression-broken-pipe/regression-broken-pipe [Library changed]
turtle> Preprocessing test suite 'regression-masking-exception' for turtle-1.6.1..
turtle> Building test suite 'regression-masking-exception' for turtle-1.6.1..
turtle> [2 of 2] Linking dist/build/regression-masking-exception/regression-masking-exception [Library changed]
turtle> Preprocessing test suite 'tests' for turtle-1.6.1..
turtle> Building test suite 'tests' for turtle-1.6.1..
turtle> [2 of 2] Linking dist/build/tests/tests [Library changed]
turtle> Preprocessing test suite 'system-filepath-tests' for turtle-1.6.1..
turtle> Building test suite 'system-filepath-tests' for turtle-1.6.1..
turtle> [2 of 2] Linking dist/build/system-filepath-tests/system-filepath-tests [Library changed]
turtle> Preprocessing test suite 'cptree' for turtle-1.6.1..
turtle> Building test suite 'cptree' for turtle-1.6.1..
turtle> [2 of 2] Linking dist/build/cptree/cptree [Library changed]
…
Our package only built the “diff” (the Turtle.Prelude
module we just changed)!
Benchmarks
For the turtle
package the speed-up is not a huge deal
because the package doesn’t take long time to compile, but the benefit
for our main project at work is dramatic!
As I mentioned in the introduction, our work project normally takes
~14 minutes to build and after this change builds can be as fast as ~3.5
minutes. In fact, they could even be faster except for the presence of a
Paths_*
module that is rebuilt each time and triggers a
large number of gratuitous downstream rebuilds (we’re working on fixing
that).
Limitations
There is one major issue with this work, which is that it does not work well with flakes.
Specifically, if you try to turn the above default.nix
into the equivalent flake the build will fail because Nix’s flake
mechanism will copy the project into the /nix/store
but
without the .git
history, so builtins.fetchGit
will fail to to fetch the current repository’s history necessary to
truncate the build to the previous day.
I believe this can be fixed with a change to flakes to support
something like a ?shallow=false
or
?allRefs=true
addendum to git
URLs, but I have
not implemented that, yet.
Nice!
ReplyDeleteHave you considered using an import-from-derivation derivation to calculate the desired commit (via `git rev-list -1 --before '1 day ago' HEAD`), and then use that with fetchGit?
Essentially your Non-Solution #2, but using import-from-derivation to break the unwanted dependency on the dirty `.git` directoy?
Yes. I tried that and left out that solution, but there were a couple of reasons I went the "patch fetchGit route":
Delete- You would have to disable the sandbox for the solution based on import-from-derivation
- fetchGit has a few niceties that I didn't want to reinvent (like the git-aware caching logic and better support for private repositories)
Where is the Paths_* work happening? And what's the approach to fixing it that you have in mind? (It also seems to be an issue with ghc-nix.)
ReplyDeleteIt ended up being a red herring for us. Even after fixing our code to not depend on a `Paths_*` module the problem still persisted. We're still narrowing down the problem to a minimal reproduction, but all we know is that at least one necessary component of the problem is having test modules that use either the `TemplateHaskell` or `TemplateHaskellQuotes` language extensions, because those will get rebuilt even if there are no changes to the library stanza. However, there seem to be other hidden requirements to trigger this issue because I can't reproduce it on smaller packages like `turtle`
DeleteInteresting. I've been experiencing (with ghc-nix) is full recompilation of the test suite on any change. I initially thought it was a preprocessor issue (hspec-discover), though removing that alone didn't fix things. It didn't occur to me that it might be QQ/TH. It's a bit strange that we wouldn't see this when just using plain cabal though.
DeleteI came back to this today, and think I found the reason. Cabal calls GHC twice on the test suite, as described starting here: https://github.com/haskell/cabal/issues/1177#issuecomment-22097657 (all the way to the end). For me at least, it seems this is the problem, since each call is isolated from the other, and so forces an entire rebuild.
Delete