Friday, December 30, 2022

Nixpkgs support for Linux builders running on macOS

macos-builder

I recently upstreamed a derivation for a Linux builder into Nixpkgs that’s easy to deploy on macOS. The significance of this work is that you can now run the following command on macOS:

$ nix run nixpkgs#darwin.builder

… and that will launch a Linux builder that you can delegate builds to. For full details, read the corresponding section of the Nixpkgs manual.

In this post, I wanted to provide some of the background and motivation for this work to help contextualize it.

Background - NixOS qemu VMs on MacOS

I wasn’t originally trying to create a Linux builder for macOS when I began this project. I was actually working on making it as easy as possible to experiment interactively with (non-builder) NixOS qemu VMs on macOS.

While searching for prior art related to this I stumbled across the following Nixpkgs issue requesting exactly this same feature: Allowing NixOS VM’s to be run on macOS.

Even better, by the time I discovered that issue several people had already done most of the work, culminating in the following repository demonstrating how all of the features were supposed to fit together: YorikSar/nixos-vm-on-macos.

In fact, the flake for that repository also came with a binary cache, so if you just ran:

$ nix run github:YorikSar/nixos-vm-on-macos

… then you could run the sample NixOS VM from that repository on macOS without requiring access to an Linux builder because it would download all the Linux build products from the matching cache. Pretty neat!

However, this still didn’t completely satisfy my use case for reasons already noted by someone else: it doesn’t work well if you want to run a NixOS VM that differs even slightly from the included sample VM. Any difference requires Linux build products to be rebuilt which requires access to a Linux builder because those build products will not be cached ahead of time.

Background - linuxkit-nix

The need for a Linux builder wasn’t a showstopper for me because there was already prior art for bootstrapping a Linux builder on macOS, which was the linuxkit-nix project. So what I could have done was:

  • Launch a (non-NixOS) linuxkit VM on macOS for use as a Linux builder
  • Use the linuxkit builder to build the desired NixOS qemu VM
  • Run that NixOS qemu VM on macOS

However, I was curious if I could use a NixOS VM for the first step, too! In other words:

  • Launch a cached NixOS qemu VM on macOS for use as a Linux builder
  • Use the qemu builder to build the desired (non-builder) NixOS qemu VM
  • Run that NixOS qemu VM on macOS

The only difference between the two approaches is the first step: instead of using linuxkit to create the Linux builder we use qemu to create a NixOS builder. This works because the qemu builder’s NixOS configuration doesn’t need to change so hydra.nixos.org can build and cache the NixOS qemu builder ahead of time.

There were a few reasons I took interest in this approach:

  • linuxkit-nix appears to not work on aarch64-darwin (i.e. Apple Silicon)

    This seems like it is potentially fixable, but I wasn’t yet ready to volunteer to do that work.

  • It’s easier to customize a NixOS builder

    linuxkit-nix doesn’t use NixOS for the builder and instead creates a bespoke builder for this purpose. This means that you can’t use the NixOS module system to more easily customize the behavior of the builder.

  • The qemu-based solution is simpler than linuxkit-nix

    I think the easiest way to explain this is for me to link to the macos-builder.nix NixOS module, which has the entirety of the code that I contributed, which is significantly simpler than linuxkit-nix.

    The main reason that the qemu-based solution is simpler than linuxkit-nix is because it is reusing more infrastructure that has already been upstreamed into Nixpkgs (most notably, NixOS and qemu VMs).

  • linuxkit-nix appears to be unmaintained

    There was a nascent attempt to upstream linuxkit-nix into Nixpkgs, but that stalled because it seems like linuxkit-nix appears to have been abandoned a couple of years ago.

    I could have restored that effort, but personally I was fine with using the simpler qemu-based approach. I haven’t given up on the idea of reviving linuxkit-nix, but it’s not on my immediate roadmap.

There is one notable downside to using qemu over linuxkit, which is that qemu is supposed to be slower than linuxkit

Note: I have not actually verified this claim since I can’t run linuxkit-nix on my M1 Mac, but this is purportedly the reason that the authors of linuxkit-nix did not opt to use qemu for their approach according to this PR description.

qemu performance hasn’t been an issue for me (yet), but that could change, especially if I try to make use of this at work, where performance could potentially matter more.

Motivation

As I mentioned above, the long-term goal for all of this is to run NixOS VMs on macOS. There are two main reasons I’m interested in this:

  • I’m working on a NixOS book

    … and I wanted macOS users to be able to test-drive example NixOS configurations on their local machine without requiring them to own and operate a separate Linux machine.

  • I’m interested in running NixOS tests on macOS

    … primarily for work-related reasons. At work developers have to install postgres on their development machines for integration testing, and it would be much nicer if we could restructure our integration tests as NixOS tests (which run inside of qemu VMs instead of running on the host).

    However, at the time of this writing this would still require additional work which is in progress on this draft pull request.

Monday, December 19, 2022

Nixpkgs support for incremental Haskell builds

incremental

The context for this post is that at work I recently implemented Nix ecosystem support for “incrementally” building Haskell packages. By “incrementally” I mean that these Nix builds only need to build what changed since the last full build of the package so that the package doesn’t need to be built from scratch every time.

The pull requests implementing this feature have not yet been approved or merged at the time of this writing, but I figured that I would explain the motivation, design, results, and limitations of this work to hopefully persuade people that this work should be merged.

If you’re not interested in the design then you can skip straight to the Demo section below.

Background

I work on Mercury’s Backend Development User Experience team and we support developers contributing to a large Haskell monolith consisting of 3000+ modules. That may seem like a lot but the vast majority of these modules are small and the whole codebase takes ~14 minutes to compile in CI if we disable optimizations (although we still build with optimizations enabled for deployment).

In my experience, that’s pretty good for a Haskell project of this size, thanks not only to the work of our team but also other teams who also contribute to improving the development experience. In fact, the pioneering work for this “incremental builds” feature actually originated from two engineers outside our team.

First, Harry Garrood improved GHC’s change detection algorithm so that GHC would use the hash of the file to detect changes instead of using the timestamp. In this post he explains how you can make use of this to implement incremental builds for traditional CI services (e.g. GitHub actions) where each build reuses the intermediate build products from the prior build instead of building from scratch.

That alone would not be enough for us to use this at work since we use Nix where this sort of build impurity doesn’t fly. However, Harry and Jade Lovelace prototyped using this feature in Nixpkgs so that Nix builds of Haskell packages could also reuse intermediate build products from prior builds to save work. You can find their prototype here.

The basic idea behind the prototype Nixpkgs integration is that you split a Haskell package build into two separate builds:

  • A “full build” that builds the Haskell package from scratch

    This full build exports its intermediate build products (i.e. the dist directory) which can then be reused by:

  • An “incremental build” that only builds what changed since the full build

    This incremental build imports the intermediate build products from the corresponding full build so that it doesn’t have to build the package from scratch.

So you might wonder: if that was already implemented then what work still remained for me to do?

Problem

The main issue with the initial Nixpkgs integration is that it does not provide any support for selecting which Git revision to use as the basis for the full build. The existing solutions require some out-of-band process to automatically select and lock the appropriate git revision to use for the older (full) build.

Non-solution #0: Rolling rebuilds

The first non-solution is for each revision to always reuse the build products from the previous revision. This doesn’t work well with Nix because it would create an increasingly-long chain of dependent derivations; in order to build the most recent revision you’d have to build all preceding revisions.

The dilemma here is that Nix is forcing us to confront something that other build tools gloss over: if you’re always reusing build products from the last build then you can’t accurately reproduce the most recent build from scratch without reproducing all prior builds. You’ve essentially “contaminated” the current build with all prior builds by doing things in this way.

So what we really want is something more like this:

Periodically do a full build from scratch and then make each incremental build relative to the last full rebuild.

That’s much more compatible with Nix because then we only need to do two builds of our project if we rebuild things from scratch, instead of one build for every revision in our project’s history.

There’s also another issue with rolling rebuilds when you’re not using Nix, which is that most naïve attempts to do this don’t ensure that the starting build products came from the parent commit. You can end up with contamination of build products across branches if you’re not careful, which further complicates reproducibility.

Non-solution #1: Lockfile

Okay, so suppose you periodically do a full build of the project from scratch and then each incremental build is relative to the last full build. You would need to do a full rebuild frequently enough so that the incremental builds stay quick. If you wait too long in between full rebuilds then the project will evolve to the point where the incremental builds can no longer reuse most of the build products from the last full build and in the extreme case the incremental builds degenerate into full builds if they can’t reuse any old build products.

For example, at our work we currently do a full build of our large package once a day, so we need some way to update the full build to point to the last revision from the preceding day.

One existing approach to solving this involved using Nix flakes to manage the git revision for the older build. The idea is that you periodically run nix flake update to update the revision used for the full build and you might even automate this process by having some recurring cron job generate a pull request or commit to bump this revision on the main development branch. You don’t have to use flakes for this purpose, but flakes are probably the most ergonomic solution along these lines.

However, there are a few issues with this approach:

  • It only works well for short-lived pull requests

    In other words, if you update the revision used for the full build once a day then typically only pull requests that are less than a day old will benefit from incremental builds.

    Specifically, what we’d really like is “branch-local” incremental builds. In other words if a longer-lived development branch were to deposit a few commits a day we’d like there to be a full rebuild once a day on that branch so that incremental builds against the tip of that development branch remain snappy.

  • It pollutes the git history

    If you bump the lockfile, say, once per day then that’s one junk commit that you’ve added to your git history every day.

  • It’s difficult to open source any useful automation around this

    If the solution requires out-of-band machinery (e.g. some recurring cron job) to bump the lockfile you can’t provide a great user experience for open source projects. It only really works well for proprietary projects that can tolerate that complexity.

That last point was the most important one for me. Generally, when I design something (even something intended for internal, proprietary use) I try to design it in such a way that it works well in an open source context, too. In my experience, doing things in this way tends to improve the design, quality, and user experience of software that I build.

In particular, I wanted a solution where all the automation could be implemented entirely within the Nix language. However, this is not possible in Nix’s present form!

Non-solution #2: Rollback derivation

So what I really wanted was a Nix function (which I will call “truncate”) that would take any git repository and roll it back in time to the last commit before some repeating time boundary (where the time boundary might be, say, an hour, or day, or week). For simplicity, let’s just say that the desired time interval is one day so I want to roll back the repository to the last revision from the day before.

If I had such a truncate function then it would be easy to automatically select which revision to use for the full build. I would:

  • extract the source git repository from the current Haskell package build

  • truncate that git repository to the last revision from the day before

  • Use that “truncated” revision as the source for the full build

  • Use that full build as the input to the current (incremental) build

Then if I built multiple revisions for the same day they would all share the same full build since they would all get “truncated” to the same revision from the previous day.

However, there isn’t a great way to implement this truncate function in Nix. To see why, consider the following (wrong) solution:

  • extract the source git repository from the current Haskell package build

    Let’s call the derivation for this git repository “src

  • create a new Nix derivation (“src2”) that rolls back src

    In other words, this would be a trivial Nix derivation that begins from src and runs something like:

    $ git checkout $(git rev-list -1 --before '1 day ago' HEAD)

    … and stores that as the result

  • Use src2 as the input to the full build

Do you see the problem with that approach?

The above wrong solution doesn’t allow multiple incremental builds from the same day to share the same full build from the prior day. This is because src2 depends on src and since each incremental build has a different src repository then each also have a different src2 derivation and therefore a different full build. That in turn defeats the purpose of incremental builds if we have to do a new full rebuild for each incremental build.

For this to work we would need a way to roll back a git repository to an older revision that less sensitive to the current revision.

Non-solution #3: Plain fetchGit

The builtins.fetchGit utility almost does what we want! This primitive function lets you fetch a git repository at evaluation time, like this:

nix-repl> builtins.fetchGit { url = ~/proj/turtle; revision = "837f52d2101368bc075d382774460a717904d2ab"; }
{ lastModified = 1655501878; lastModifiedDate = "20220617213758"; narHash = "sha256-Ic4N2gzm0hYsPCynkzETJv7lpAWO1KM+FO+r3ov60y0="; outPath = "/nix/store/ygznanxv6rmbxw5gkgk7axfxazhsa93z-source"; rev = "837f52d2101368bc075d382774460a717904d2ab"; revCount = 566; shortRev = "837f52d"; submodules = false; }

The above result is the same no matter what revision I currently have checked out at ~/proj/turtle because Nix’s fetchGit function produces a content-addressed derivation. In other words, if two invocations of fetchGit generate the same final repository state then they share the same outPath. This is exactly the behavior we want: we need the source repository for the full build to be content-addressed so that multiple incremental builds can share the same full build.

However, the problem is that I don’t exactly know which revision I want. What I really want to be able to say is “get me the last revision from the day before this other revision”. fetchGit does not expose any way to do something like that.

That brings us to the actual solution:

Solution

The solution I went with was the following two pull requests:

  • Add optional date argument to builtins.fetchGit

    This amends builtins.fetchGit to allow a date specification, which can either be a relative date (e.g. 1 day ago) or an absolute date (e.g. 2020-01-01T00:00:00 or a Unix timestamp like 1671388622). Basically, this argument accepts anything git accepts as a date specification (which is a lot since git is pretty flexible in this regard).

    The cool thing about this change is that it doesn’t compromise the purity of builtins.fetchGit. If a given fetchGit specification was pure then adding a date specification preserves that purity.

  • Add haskell.lib.incremental utility

    This pull request actually does two separate things:

    • This polishes and upstreams the prototype support for incremental builds

      In other words, this upstreams Harry and Jade’s work to split a Haskell build into two builds: a full build and incremental build

    • This uses the fetchGit patch to automate the full build selection

      There’s a new pkgs.haskell.lib.incremental utility which uses builtins.fetchGit to automatically update the full build for you and it has all the desired behaviors (including branch-local incrementalism).

    I could have split this into two separate pull request (and I still might) but for internal testing purposes it was easier to do everything on one branch. I’m waiting for a decision on the other pull request before deciding whether or not to split up this branch.

Demo

I’ll use my turtle package as the running example for the demo. If you clone the gabriella/incremental branch of my turtle repository:

$ git clone --branch gabriella/incremental \
    https://github.com/Gabriella439/turtle.git
$ cd turtle

… you’ll find the following default.nix file making use of the Nixpkgs support for incremental Haskell builds:

{ interval ? 24 * 60 * 60 }:

let
  nixpkgs = builtins.fetchTarball {
    url    = "https://github.com/MercuryTechnologies/nixpkgs/archive/696e0820b03e8ea7ad6a9ba21a00a79c91efc580.tar.gz";
    sha256 = "1k3swii3absl154154lmk6zjw11vzzqx8skaiw1250armgfyv9v8";
  };

  # We need GHC 9.4 or newer for this feature to work
  compiler ="ghc94";

  overlay = self: super: {
    haskell = super.haskell // {
      packages = super.haskell.packages // {
        "${compiler}" =
          super.haskell.packages."${compiler}".override (old: {
            overrides =
              self.lib.fold
                self.lib.composeExtensions
                (old.overrides or (_: _: { }))
                [ (self.haskell.lib.packageSourceOverrides {
                    turtle = ./.;
                  })

                  (hself: hsuper: {
                    turtle-incremental =
                      self.haskell.lib.compose.incremental
                        { inherit interval;

                          makePreviousBuild =
                            truncate: (import (truncate ./.) { }).turtle;
                        }
                        hsuper.turtle;
                  })
                ];
          });
      };
    };
  };

  pkgs = import nixpkgs { config = { }; overlays = [ overlay ]; };

in
  { inherit (pkgs.haskell.packages."${compiler}")
      turtle
      turtle-incremental
    ;
  }

However, that alone is not enough to make use of incremental builds. If you attempt to build that (at the time of this writing) you’ll get an error message like this:

$ nix build --file ./default.nix turtle-incremental
error: evaluation aborted with the following error message:
'pkgs.haskell.lib.incremental requires Nix version 2.12.0pre20221128_32c182b or
newer'
(use '--show-trace' to show detailed location information)

The Nixpkgs support for incremental builds depends on a matching change to the Nix interpreter, so you actually have to run:

$ nix run github:Gabriella439/nix/gabriella/fetchGit -- \
    build --file ./default.nix turtle-incremental

… or if you don’t yet have flakes enabled, then use this pedantically complete command:

$ nix --option extra-experimental-features 'nix-command flakes' \
    run github:Gabriella439/nix/gabriella/fetchGit -- \
    build --file ./default.nix turtle-incremental

… and that will definitely work.

Once the build is complete you can inspect the logs and you should see something like the following buildPhase:

$ nix log ./result
…
@nix { "action": "setPhase", "phase": "buildPhase" }
building
Preprocessing library for turtle-1.6.1..
Building library for turtle-1.6.1..
Preprocessing test suite 'regression-broken-pipe' for turtle-1.6.1..
Building test suite 'regression-broken-pipe' for turtle-1.6.1..
[2 of 2] Linking dist/build/regression-broken-pipe/regression-broken-pipe [Libr>
Preprocessing test suite 'regression-masking-exception' for turtle-1.6.1..
Building test suite 'regression-masking-exception' for turtle-1.6.1..
[2 of 2] Linking dist/build/regression-masking-exception/regression-masking-exc>
Preprocessing test suite 'tests' for turtle-1.6.1..
Building test suite 'tests' for turtle-1.6.1..
[2 of 2] Linking dist/build/tests/tests [Library changed]
Preprocessing test suite 'system-filepath-tests' for turtle-1.6.1..
Building test suite 'system-filepath-tests' for turtle-1.6.1..
[2 of 2] Linking dist/build/system-filepath-tests/system-filepath-tests [Librar>
Preprocessing test suite 'cptree' for turtle-1.6.1..
Building test suite 'cptree' for turtle-1.6.1..
[2 of 2] Linking dist/build/cptree/cptree [Library changed]
…

This is shows that the incremental builds are indeed working. We still have to re-link some executables (for reasons that are still not clear to me), but none of the Haskell modules needed to be rebuilt since nothing has changed (yet) since the last rebuild.

Now let’s test that by making a small whitespace change to one of the Turtle modules:

$ echo >> src/Turtle/Prelude.hs 

Then if we rebuild the package we’ll see the following build phase:

$ nix --option extra-experimental-features 'nix-command flakes' \
    run github:Gabriella439/nix/gabriella/fetchGit -- \
    build --file ./default.nix --print-build-logs
…
turtle> building
turtle> Preprocessing library for turtle-1.6.1..
turtle> Building library for turtle-1.6.1..
turtle> [ 7 of 10] Compiling Turtle.Prelude   ( src/Turtle/Prelude.hs, dist/build/Turtle/Prelude.o, dist/build/Turtle/Prelude.dyn_o ) [Source file changed]
turtle> src/Turtle/Prelude.hs:319:1: warning: [-Wunused-imports]
turtle>     The import of ‘Data.Monoid’ is redundant
turtle>       except perhaps to import instances from ‘Data.Monoid’
turtle>     To import instances alone, use: import Data.Monoid()
turtle>     |
turtle> 319 | import Data.Monoid ((<>))
turtle>     | ^^^^^^^^^^^^^^^^^^^^^^^^^
turtle> Preprocessing test suite 'regression-broken-pipe' for turtle-1.6.1..
turtle> Building test suite 'regression-broken-pipe' for turtle-1.6.1..
turtle> [2 of 2] Linking dist/build/regression-broken-pipe/regression-broken-pipe [Library changed]
turtle> Preprocessing test suite 'regression-masking-exception' for turtle-1.6.1..
turtle> Building test suite 'regression-masking-exception' for turtle-1.6.1..
turtle> [2 of 2] Linking dist/build/regression-masking-exception/regression-masking-exception [Library changed]
turtle> Preprocessing test suite 'tests' for turtle-1.6.1..
turtle> Building test suite 'tests' for turtle-1.6.1..
turtle> [2 of 2] Linking dist/build/tests/tests [Library changed]
turtle> Preprocessing test suite 'system-filepath-tests' for turtle-1.6.1..
turtle> Building test suite 'system-filepath-tests' for turtle-1.6.1..
turtle> [2 of 2] Linking dist/build/system-filepath-tests/system-filepath-tests [Library changed]
turtle> Preprocessing test suite 'cptree' for turtle-1.6.1..
turtle> Building test suite 'cptree' for turtle-1.6.1..
turtle> [2 of 2] Linking dist/build/cptree/cptree [Library changed]
…

Our package only built the “diff” (the Turtle.Prelude module we just changed)!

Benchmarks

For the turtle package the speed-up is not a huge deal because the package doesn’t take long time to compile, but the benefit for our main project at work is dramatic!

As I mentioned in the introduction, our work project normally takes ~14 minutes to build and after this change builds can be as fast as ~3.5 minutes. In fact, they could even be faster except for the presence of a Paths_* module that is rebuilt each time and triggers a large number of gratuitous downstream rebuilds (we’re working on fixing that).

Limitations

There is one major issue with this work, which is that it does not work well with flakes.

Specifically, if you try to turn the above default.nix into the equivalent flake the build will fail because Nix’s flake mechanism will copy the project into the /nix/store but without the .git history, so builtins.fetchGit will fail to to fetch the current repository’s history necessary to truncate the build to the previous day.

I believe this can be fixed with a change to flakes to support something like a ?shallow=false or ?allRefs=true addendum to git URLs, but I have not implemented that, yet.