Monday, March 6, 2023

The "open source native" principle for software design

This post summarizes a software design principle I call the “open source native” principle which I’ve invoked a few times as a technical lead. I wanted to write this down so that I could easily reference this post in the future.

The “open source native” principle is simple to state:

Design proprietary software as if you intended to open source that software, regardless of whether you will open source that software

I call this the “open source native” principle because you design your software as if it were a “native” member of the open source ecosystem. In other words, your software is spiritually “born” open source, aspirationally written from the beginning to be a good open source citizen, even if you never actually end up open sourcing that software.

You can’t always adhere to this principle, but I still use this as a general design guideline.


It’s hard to give a detailed example of this principle since most of the examples I’d like to use are … well … proprietary and wouldn’t make sense outside of their respective organizations. However, I’ll try to outline a hypothetical example (inspired by a true story) that hopefully enough people can relate to.

Suppose that your organization provides a product with a domain-specific programming language for customizing their product’s behavior. Furthermore, suppose that you’re asked to design and implement a package manager for this programming language.

There are multiple data stores you could use for storing packages, but to simplify this example suppose there are only two options:

  • Store packages in a product-specific database

    Perhaps your product already uses a database for other reasons, so you figure that you can reuse that existing database for storing packages. That way you don’t need to set up any new infrastructure to get going since the database team will handle that for you. Plus you get the full power of a relational database, so now you have powerful tools for querying and/or modifying packages.

  • Store packages in git

    You might instead store your packages as flat files inside of a git repository.

These represent two extremes of the spectrum and in reality there might be other options in between (like a standalone sqlite database), but this is a contrived example.

According to the “open source native” principle, you’d prefer to store packages in git because git is a foundational building block of the open source ecosystem that is already battle-tested for this purpose. You’d be sacrificing some features (you’d no longer have access to the full power of a relational database), but your package manager would now be more “open-source native”.

You might wonder: why would one deliberately constrain themselves like that? What’s the benefit of designing things in this way if they might never be open sourced?


There are several reasons I espouse this design principle:

  • better testability

    If you design your component so that it’s easy to use outside of the context of your product then it’s also easier to test in isolation. This means that you don’t need to rely on heavyweight integration tests or end-to-end tests to verify that your component works correctly.

    For example, a package manager based on git is easier to test than a package manager based on a database because a git repository is easier to set up.

  • faster release cadence

    If your component can be tested in isolation then you don’t even need to share continuous integration (CI) with the rest of your organization. Your component can have its own CI and release on whatever frequency is appropriate for that component instead of coupling its release cadence to the rest of your product.

    That in turn typically means that you can release earlier and more often, which is a virtue in its own right.

    Continuing the package manager example, you wouldn’t need to couple releases of your package manager to the release cadence of the rest of your product, so you’d be able to push out improvements or fixes more quickly.

  • simpler documentation

    It’s much easier to write a tutorial for software that delivers value in isolation since there’s less supporting infrastructure necessary to follow along with the tutorial.

  • well-chosen interfaces

    You have to carefully think through the correct logical boundaries for your software when you design for a broader audience of users. It’s also easier to enforce stronger boundaries and narrower scope for the same reasons.

    For example, our hypothetical package manager is less likely to have package metadata polluted with product-specific details if it is designed to operate independently of the product.

  • improved stability

    Open source software doesn’t just target a broader audience, but also targets a broader time horizon. An open source mindset promotes thinking beyond the needs of this financial quarter.

  • you can open source your component! (duh)

    Needless to say, if you design your component to be open-source native, it’s also easier to open source. Hooray! 🎉
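
To make the testability point concrete, here is a minimal sketch of what a test fixture for a git-backed package store could look like. The layout (`packages/<name>/<version>`) and the helper names (`init_store`, `publish_package`, `list_packages`) are hypothetical, made up for this example and not taken from any real package manager. The only point is that the entire "backend" is a temporary directory plus a few git commands, with no database to provision.

```shell
set -eu

# Hypothetical helpers: the "package store" is just a git repository of flat files.

# Create an empty package store at the path given as $1.
init_store() {
  git init --quiet "$1"
}

# Publish one package: write packages/<name>/<version> and commit it.
# Arguments: $1 = store path, $2 = name, $3 = version, $4 = file contents.
publish_package() {
  mkdir -p "$1/packages/$2"
  printf '%s\n' "$4" > "$1/packages/$2/$3"
  git -C "$1" add "packages/$2/$3"
  git -C "$1" -c user.name=test -c user.email=test@example.invalid \
    commit --quiet -m "Publish $2 $3"
}

# List every published "<name>/<version>" pair in the store at $1, sorted.
list_packages() {
  git -C "$1" ls-tree -r --name-only HEAD packages | sed 's|^packages/||' | sort
}

# A complete test fixture: a throwaway directory is all the infrastructure we need.
store="$(mktemp -d)"
init_store "$store"
publish_package "$store" hello 1.0.0 'print "hello"'
publish_package "$store" hello 1.1.0 'print "hello, world"'
list_packages "$store"
```

A database-backed equivalent would need a running database server (or a mock of one) before the first assertion could run.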


You can think of this design principle as being similar to the rule of least power, where you’re making your software less powerful (by adding the additional constraint that it can be open sourced), but in turn improving ease of comprehension, maintainability, and distribution.

Also, if you have any examples along these lines that you care to share, feel free to drop them in the comments.

Monday, January 30, 2023

terraform-nixos-ng: Modern terraform support for NixOS

Recently I’ve been working on writing a “NixOS in Production” book and one of the chapters I’m writing is on deploying NixOS using terraform. However, one of the issues I ran across was the poor NixOS support for terraform. I’ve already gone through the post explaining how to use the terraform-nixos project but I ran into several issues trying to follow those instructions (which I’ll explain below). That plus the fact that terraform-nixos seems to be unmaintained pushed me over the edge to rewrite the project to simplify and improve upon it.

So this post is announcing my terraform-nixos-ng project:

… which is a rewrite of terraform-nixos. I’ll use this post to compare and contrast the two projects. If you’re only interested in trying out the terraform-nixos-ng project then go straight to the README.

Using nixos-rebuild

One of the first things I noticed when kicking the tires on terraform-nixos was that it was essentially reinventing what the nixos-rebuild tool already does. In fact, I was so surprised by this that I wrote a standalone post explaining how to use nixos-rebuild as a deployment tool:

Simplifying that code using nixos-rebuild fixed lots of tiny papercuts I had with terraform-nixos, like:

  • The deploy failing if you don’t have a new enough version of bash installed

  • The inability to turn off the use of the --use-substitutes flag

    That flag causes issues if you want to deploy to a machine that disables outbound connections.

  • The dearth of useful options (compared to nixos-rebuild)

    … including the inability to fully customize ssh options

  • The poor interop with flakes

    For example, terraform-nixos doesn’t respect the standard nixosConfigurations flake output hierarchy.

    Also, terraform-nixos doesn’t use flakes natively (it uses flake-compat), which breaks handling of the config.nix.binary{Caches,CachePublicKeys} flakes settings. The Nix UX for flakes is supposed to ask the user to consent to those settings (because they are potentially insecure to auto-enable for a flake), but their workaround breaks that UX by automatically enabling those settings without the user’s consent.

I wanted to upstream this rewrite to use nixos-rebuild into terraform-nixos, but I gave up on that idea when I saw that no pull request since 2021 had been merged, including conservative pull requests like this one to just use the script included within the repository to update the list of available AMIs.

That brings me to the next improvement, which is:

Auto-generating available AMIs

The terraform-nixos repository requires the AMI list to be manually updated. The way you do this is to periodically run a script to fetch the available AMIs from Nixpkgs and then create a PR to vendor those changes. However, this shouldn’t be necessary because we could easily program terraform to generate the list of AMIs on the fly.

This is what the terraform-nixos-ng project does, where the ami module creates a data source that runs an equivalent script to fetch the AMIs at provisioning time.

In the course of rewriting the AMI module, I made another small improvement, which was:

Support for aarch64 AMIs

Another gripe I had with terraform-nixos is that its AMI module doesn’t support aarch64-linux NixOS AMIs even though these AMIs exist and Nixpkgs supports them. That was a small and easy fix, too.

Functionality regressions

terraform-nixos-ng is not a strict improvement over terraform-nixos, though. Specifically, the most notable feature omissions are:

  • Support for non-flake workflows

    terraform-nixos-ng requires the use of flakes and doesn’t provide support for non-flake-based workflows. I’m very much on team “Nix flakes are good and shouldn’t be treated as experimental any longer” so I made an opinionated choice to require users to use flakes rather than support their absence.

    This choice also isn’t purely aesthetic: the use of flakes improves interop with nixos-rebuild, where flakes are the most ergonomic way for nixos-rebuild to select from one of many deployments.

  • Support for secrets management

    I felt that this should be handled by something like sops-nix rather than rolling yet another secrets management system that was idiosyncratic to this deploy tool. In general, I wanted these terraform modules to be as lightweight as possible by making more idiomatic use of the modern NixOS ecosystem.

  • Support for Google Compute Engine images

    terraform-nixos supports GCE images and the only reason I didn’t add the same support is because I’ve never used Google Compute Engine so I didn’t have enough context to do a good rewrite, nor did I have the inclination to set up a GCE account just to test the rewrite. However, I’d accept a pull request adding this support from someone interested in this feature.


There’s one last improvement over the terraform-nixos project, which is that I don’t leave projects in an abandoned state. Anybody who has contributed to my open source projects knows that I’m generous about handing out the commit bit and I’m also good about relinquishing control if I don’t have time to maintain the project myself.

However, I don’t expect this to be a difficult project to maintain anyway because I designed terraform-nixos-ng to outsource the work to existing tools as much as possible instead of reinventing the wheel. This is why the implementation of terraform-nixos-ng is significantly smaller than terraform-nixos.

Monday, January 23, 2023

Announcing nixos-rebuild: a "new" deployment tool for NixOS

The title of this post is tongue-in-cheek; nixos-rebuild is a tool that has been around for a long time and there’s nothing new about it. However, I believe that not enough people know how capable this tool is for building and deploying remote NixOS systems. In other words, nixos-rebuild is actually a decent alternative to tools like morph or colmena.

Part of the reason why nixos-rebuild flies under the radar is because it’s more commonly used for upgrading the current NixOS system, rather than deploying a remote NixOS system. However, it’s actually fairly capable of managing another NixOS system.

In fact, your local system (that initiates the deploy) doesn’t have to be a NixOS system or even a Linux system. An even lesser known fact is that you can initiate deploys from macOS using nixos-rebuild. In other words, nixos-rebuild is a cross-platform deploy tool!

The trick

I’ll give a concrete example. Suppose that I have the following NixOS configuration (for a blank EC2 machine) saved in configuration.nix:

{ modulesPath, ... }:

{ imports = [ "${modulesPath}/virtualisation/amazon-image.nix" ];

  system.stateVersion = "22.11";
}

… which I’ve wrapped in the following flake (since I like Nix flakes):

{ inputs.nixpkgs.url = "github:NixOS/nixpkgs/22.11";

  outputs = { nixpkgs, ... }: {
    nixosConfigurations.default = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";

      modules = [ ./configuration.nix ];
    };
  };
}

Further suppose that I have an x86_64-linux machine on EC2 accessible via ssh. I can deploy that configuration to the remote machine like this:

$ nix shell nixpkgs#nixos-rebuild
$ nixos-rebuild switch --fast --flake .#default \
    --target-host \

… and that will build and deploy the remote machine even if your current machine is a completely different platform (e.g. macOS).

Why this works

The --fast flag is the first adjustment that makes the above command work on systems other than NixOS. Without that flag nixos-rebuild will attempt to build itself for the target platform and run that new executable with the same arguments, which will fail if the target platform differs from your current platform.

The --build-host flag is also necessary if the source and target platform don’t match. This instructs nixos-rebuild to build on the target machine so that the deploy is insensitive to your current machine’s platform.

The final thing that makes this work is that Nixpkgs makes the nixos-rebuild script available on all platforms, despite the script living underneath the pkgs/os-specific/linux directory in Nixpkgs.
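
Putting those flags together, a deploy that builds on the remote machine could look like the following sketch. It only prints the command instead of running it, and both the `deploy_command` helper and the `root@example.invalid` address are placeholders invented for illustration; substitute the ssh address of your own machine.

```shell
# Print (rather than run) the deploy command.
# Arguments: $1 = flake URI (e.g. .#default), $2 = ssh address of the machine.
# Passing the same address to --target-host and --build-host makes the remote
# machine build its own system, so the local platform does not matter.
deploy_command() {
  printf 'nixos-rebuild switch --fast --flake %s --target-host %s --build-host %s\n' \
    "$1" "$2" "$2"
}

# Hypothetical placeholder address.
deploy_command '.#default' 'root@example.invalid'
```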


There’s a reason why I suggest using flakes alongside nixos-rebuild: with flakes you can specify multiple NixOS machines within the same file (just like we can with other NixOS deployment tools). That means that we can do something like this:

{ inputs.nixpkgs.url = "github:NixOS/nixpkgs/22.11";

  outputs = { nixpkgs, ... }: {
    nixosConfigurations = {
      machine1 = nixpkgs.lib.nixosSystem { … };

      machine2 = nixpkgs.lib.nixosSystem { … };
    };
  };
}


… and then we can select which system to build with the desired flake URI (e.g. .#machine1 or .#machine2 in the above example).

Moreover, by virtue of using flakes we can obtain our NixOS configuration from somewhere other than the current working directory. For example, you can specify a flake URI like github:${OWNER}/${REPO}#${ATTRIBUTE} to deploy a NixOS configuration hosted on GitHub without having to locally clone the repository. Pretty neat!


I’m not the first person to suggest this trick. In fact, while researching prior art I stumbled across this comment from Luke Clifton proposing the same idea of using nixos-rebuild as a deploy tool. However, other than that stray comment I couldn’t find any other mentions of this so I figured it was worth formalizing this trick in a blog post that people could more easily share.

This post supersedes a prior post of mine where I explained how to deploy a NixOS system using more low-level idioms (e.g. nix build, nix copy). Now that nixos-rebuild supports both flakes and remote systems there’s no real reason to do it the low-level way.

Edit: An earlier version of this post suggested using _NIXOS_REBUILD_REEXEC=1 to prevent nixos-rebuild from building itself for the target platform, but then Naïm Favier pointed out that you can use the --fast flag instead, which has the same effect.

Friday, December 30, 2022

Nixpkgs support for Linux builders running on macOS


I recently upstreamed a derivation for a Linux builder into Nixpkgs that’s easy to deploy on macOS. The significance of this work is that you can now run the following command on macOS:

$ nix run nixpkgs#darwin.builder

… and that will launch a Linux builder that you can delegate builds to. For full details, read the corresponding section of the Nixpkgs manual.

In this post, I wanted to provide some of the background and motivation for this work to help contextualize it.

Background - NixOS qemu VMs on macOS

I wasn’t originally trying to create a Linux builder for macOS when I began this project. I was actually working on making it as easy as possible to experiment interactively with (non-builder) NixOS qemu VMs on macOS.

While searching for prior art related to this I stumbled across the following Nixpkgs issue requesting exactly this same feature: Allowing NixOS VM’s to be run on macOS.

Even better, by the time I discovered that issue several people had already done most of the work, culminating in the following repository demonstrating how all of the features were supposed to fit together: YorikSar/nixos-vm-on-macos.

In fact, the flake for that repository also came with a binary cache, so if you just ran:

$ nix run github:YorikSar/nixos-vm-on-macos

… then you could run the sample NixOS VM from that repository on macOS without requiring access to a Linux builder because it would download all the Linux build products from the matching cache. Pretty neat!

However, this still didn’t completely satisfy my use case for reasons already noted by someone else: it doesn’t work well if you want to run a NixOS VM that differs even slightly from the included sample VM. Any difference requires Linux build products to be rebuilt which requires access to a Linux builder because those build products will not be cached ahead of time.

Background - linuxkit-nix

The need for a Linux builder wasn’t a showstopper for me because there was already prior art for bootstrapping a Linux builder on macOS, which was the linuxkit-nix project. So what I could have done was:

  • Launch a (non-NixOS) linuxkit VM on macOS for use as a Linux builder
  • Use the linuxkit builder to build the desired NixOS qemu VM
  • Run that NixOS qemu VM on macOS

However, I was curious if I could use a NixOS VM for the first step, too! In other words:

  • Launch a cached NixOS qemu VM on macOS for use as a Linux builder
  • Use the qemu builder to build the desired (non-builder) NixOS qemu VM
  • Run that NixOS qemu VM on macOS

The only difference between the two approaches is the first step: instead of using linuxkit to create the Linux builder we use qemu to create a NixOS builder. This works because the qemu builder’s NixOS configuration doesn’t need to change, so we can build and cache the NixOS qemu builder ahead of time.

There were a few reasons I took interest in this approach:

  • linuxkit-nix appears to not work on aarch64-darwin (i.e. Apple Silicon)

    This seems like it is potentially fixable, but I wasn’t yet ready to volunteer to do that work.

  • It’s easier to customize a NixOS builder

    linuxkit-nix doesn’t use NixOS for the builder and instead creates a bespoke builder for this purpose. This means that you can’t use the NixOS module system to more easily customize the behavior of the builder.

  • The qemu-based solution is simpler than linuxkit-nix

    I think the easiest way to explain this is for me to link to the macos-builder.nix NixOS module, which has the entirety of the code that I contributed, which is significantly simpler than linuxkit-nix.

    The main reason that the qemu-based solution is simpler than linuxkit-nix is because it is reusing more infrastructure that has already been upstreamed into Nixpkgs (most notably, NixOS and qemu VMs).

  • linuxkit-nix appears to be unmaintained

    There was a nascent attempt to upstream linuxkit-nix into Nixpkgs, but that stalled because linuxkit-nix appears to have been abandoned a couple of years ago.

    I could have restored that effort, but personally I was fine with using the simpler qemu-based approach. I haven’t given up on the idea of reviving linuxkit-nix, but it’s not on my immediate roadmap.

There is one notable downside to using qemu over linuxkit, which is that qemu is supposed to be slower than linuxkit.

Note: I have not actually verified this claim since I can’t run linuxkit-nix on my M1 Mac, but this is purportedly the reason that the authors of linuxkit-nix did not opt to use qemu for their approach according to this PR description.

qemu performance hasn’t been an issue for me (yet), but that could change, especially if I try to make use of this at work, where performance could potentially matter more.


As I mentioned above, the long-term goal for all of this is to run NixOS VMs on macOS. There are two main reasons I’m interested in this:

  • I’m working on a NixOS book

    … and I wanted macOS users to be able to test-drive example NixOS configurations on their local machine without requiring them to own and operate a separate Linux machine.

  • I’m interested in running NixOS tests on macOS

    … primarily for work-related reasons. At work developers have to install postgres on their development machines for integration testing, and it would be much nicer if we could restructure our integration tests as NixOS tests (which run inside of qemu VMs instead of running on the host).

    However, at the time of this writing this would still require additional work which is in progress on this draft pull request.

Monday, December 19, 2022

Nixpkgs support for incremental Haskell builds


The context for this post is that at work I recently implemented Nix ecosystem support for “incrementally” building Haskell packages. By “incrementally” I mean that these Nix builds only need to build what changed since the last full build of the package so that the package doesn’t need to be built from scratch every time.

The pull requests implementing this feature have not yet been approved or merged at the time of this writing, but I figured that I would explain the motivation, design, results, and limitations of this work to hopefully persuade people that this work should be merged.

If you’re not interested in the design then you can skip straight to the Demo section below.


I work on Mercury’s Backend Development User Experience team and we support developers contributing to a large Haskell monolith consisting of 3000+ modules. That may seem like a lot but the vast majority of these modules are small and the whole codebase takes ~14 minutes to compile in CI if we disable optimizations (although we still build with optimizations enabled for deployment).

In my experience, that’s pretty good for a Haskell project of this size, thanks not only to the work of our team but also to other teams who contribute to improving the development experience. In fact, the pioneering work for this “incremental builds” feature actually originated from two engineers outside our team.

First, Harry Garrood improved GHC’s change detection algorithm so that GHC would use the hash of the file to detect changes instead of using the timestamp. In this post he explains how you can make use of this to implement incremental builds for traditional CI services (e.g. GitHub actions) where each build reuses the intermediate build products from the prior build instead of building from scratch.

That alone would not be enough for us to use this at work since we use Nix where this sort of build impurity doesn’t fly. However, Harry and Jade Lovelace prototyped using this feature in Nixpkgs so that Nix builds of Haskell packages could also reuse intermediate build products from prior builds to save work. You can find their prototype here.

The basic idea behind the prototype Nixpkgs integration is that you split a Haskell package build into two separate builds:

  • A “full build” that builds the Haskell package from scratch

    This full build exports its intermediate build products (i.e. the dist directory) which can then be reused by:

  • An “incremental build” that only builds what changed since the full build

    This incremental build imports the intermediate build products from the corresponding full build so that it doesn’t have to build the package from scratch.

So you might wonder: if that was already implemented then what work still remained for me to do?


The main issue with the initial Nixpkgs integration is that it does not provide any support for selecting which Git revision to use as the basis for the full build. The existing solutions require some out-of-band process to automatically select and lock the appropriate git revision to use for the older (full) build.

Non-solution #0: Rolling rebuilds

The first non-solution is for each revision to always reuse the build products from the previous revision. This doesn’t work well with Nix because it would create an increasingly-long chain of dependent derivations; in order to build the most recent revision you’d have to build all preceding revisions.

The dilemma here is that Nix is forcing us to confront something that other build tools gloss over: if you’re always reusing build products from the last build then you can’t accurately reproduce the most recent build from scratch without reproducing all prior builds. You’ve essentially “contaminated” the current build with all prior builds by doing things in this way.

So what we really want is something more like this:

Periodically do a full build from scratch and then make each incremental build relative to the last full rebuild.

That’s much more compatible with Nix because then we only need to do two builds of our project if we rebuild things from scratch, instead of one build for every revision in our project’s history.

There’s also another issue with rolling rebuilds when you’re not using Nix, which is that most naïve attempts to do this don’t ensure that the starting build products came from the parent commit. You can end up with contamination of build products across branches if you’re not careful, which further complicates reproducibility.

Non-solution #1: Lockfile

Okay, so suppose you periodically do a full build of the project from scratch and then each incremental build is relative to the last full build. You would need to do a full rebuild frequently enough so that the incremental builds stay quick. If you wait too long in between full rebuilds then the project will evolve to the point where the incremental builds can no longer reuse most of the build products from the last full build and in the extreme case the incremental builds degenerate into full builds if they can’t reuse any old build products.

For example, at our work we currently do a full build of our large package once a day, so we need some way to update the full build to point to the last revision from the preceding day.

One existing approach to solving this involved using Nix flakes to manage the git revision for the older build. The idea is that you periodically run nix flake update to update the revision used for the full build and you might even automate this process by having some recurring cron job generate a pull request or commit to bump this revision on the main development branch. You don’t have to use flakes for this purpose, but flakes are probably the most ergonomic solution along these lines.

However, there are a few issues with this approach:

  • It only works well for short-lived pull requests

    In other words, if you update the revision used for the full build once a day then typically only pull requests that are less than a day old will benefit from incremental builds.

    Specifically, what we’d really like is “branch-local” incremental builds. In other words if a longer-lived development branch were to deposit a few commits a day we’d like there to be a full rebuild once a day on that branch so that incremental builds against the tip of that development branch remain snappy.

  • It pollutes the git history

    If you bump the lockfile, say, once per day then that’s one junk commit that you’ve added to your git history every day.

  • It’s difficult to open source any useful automation around this

    If the solution requires out-of-band machinery (e.g. some recurring cron job) to bump the lockfile you can’t provide a great user experience for open source projects. It only really works well for proprietary projects that can tolerate that complexity.

That last point was the most important one for me. Generally, when I design something (even something intended for internal, proprietary use) I try to design it in such a way that it works well in an open source context, too. In my experience, doing things in this way tends to improve the design, quality, and user experience of software that I build.

In particular, I wanted a solution where all the automation could be implemented entirely within the Nix language. However, this is not possible in Nix’s present form!

Non-solution #2: Rollback derivation

So what I really wanted was a Nix function (which I will call “truncate”) that would take any git repository and roll it back in time to the last commit before some repeating time boundary (where the time boundary might be, say, an hour, or day, or week). For simplicity, let’s just say that the desired time interval is one day so I want to roll back the repository to the last revision from the day before.

If I had such a truncate function then it would be easy to automatically select which revision to use for the full build. I would:

  • extract the source git repository from the current Haskell package build

  • truncate that git repository to the last revision from the day before

  • Use that “truncated” revision as the source for the full build

  • Use that full build as the input to the current (incremental) build

Then if I built multiple revisions for the same day they would all share the same full build since they would all get “truncated” to the same revision from the previous day.
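
Outside of Nix, the behavior I want from truncate is easy to sketch with plain git. The following is only an illustration under some assumptions: the helper names (`truncate_revision`, `commit_at`) are made up, the day boundary is taken to be midnight UTC, and the repository is a throwaway one with fixed commit timestamps so that the result is reproducible.

```shell
set -eu
export TZ=UTC  # interpret day boundaries in UTC for this sketch

# Hypothetical helper: print the last revision committed before the start
# (00:00 UTC) of the day on which revision $2 of repository $1 was committed.
truncate_revision() {
  day="$(git -C "$1" log -1 --format=%cd --date=format-local:%Y-%m-%d "$2")"
  git -C "$1" rev-list -1 --before="$day 00:00:00 +0000" "$2"
}

# Demo helper: create an empty commit in repository $1 with timestamp $2
# and message $3.
commit_at() {
  GIT_AUTHOR_DATE="$2" GIT_COMMITTER_DATE="$2" \
    git -C "$1" -c user.name=demo -c user.email=demo@example.invalid \
    commit --quiet --allow-empty -m "$3"
}

repo="$(mktemp -d)"
git init --quiet "$repo"

# Two commits on the first day ...
commit_at "$repo" '2023-01-01 10:00:00 +0000' 'first'
commit_at "$repo" '2023-01-01 18:00:00 +0000' 'second'
yesterday="$(git -C "$repo" rev-parse HEAD)"

# ... and two commits on the following day.
commit_at "$repo" '2023-01-02 09:00:00 +0000' 'third'
morning="$(git -C "$repo" rev-parse HEAD)"
commit_at "$repo" '2023-01-02 15:00:00 +0000' 'fourth'
afternoon="$(git -C "$repo" rev-parse HEAD)"

# Both of the second day's revisions truncate to the same base revision.
base_morning="$(truncate_revision "$repo" "$morning")"
base_afternoon="$(truncate_revision "$repo" "$afternoon")"
echo "morning   -> $base_morning"
echo "afternoon -> $base_afternoon"
```

Both `base_morning` and `base_afternoon` resolve to the last commit from the previous day, which is exactly the sharing property that the full build needs.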

However, there isn’t a great way to implement this truncate function in Nix. To see why, consider the following (wrong) solution:

  • extract the source git repository from the current Haskell package build

    Let’s call the derivation for this git repository “src”

  • create a new Nix derivation (“src2”) that rolls back src

    In other words, this would be a trivial Nix derivation that begins from src and runs something like:

    $ git checkout $(git rev-list -1 --before '1 day ago' HEAD)

    … and stores that as the result

  • Use src2 as the input to the full build

Do you see the problem with that approach?

The above wrong solution doesn’t allow multiple incremental builds from the same day to share the same full build from the prior day. This is because src2 depends on src, and since each incremental build has a different src repository each also has a different src2 derivation and therefore a different full build. That in turn defeats the purpose of incremental builds if we have to do a new full rebuild for each incremental build.

For this to work we would need a way to roll back a git repository to an older revision that is less sensitive to the current revision.
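
To see why src2 can’t be shared, here is a toy model of the problem. It is not how Nix computes store paths; it only assumes that a derivation’s identity is a hash of its inputs, and it borrows `git hash-object` as a convenient hash function. The helper names and the revision names (`rev-morning`, `rev-afternoon`, `rev-yesterday`) are all made up for illustration.

```shell
set -eu

# Toy model: something whose identity is a hash of everything that goes
# into it (not of what comes out of it).
input_addressed() {
  printf '%s\n' "$@" | git hash-object --stdin
}

# Toy model: something whose identity depends only on the final result.
result_addressed() {
  printf '%s\n' "$1" | git hash-object --stdin
}

# Two revisions from the same day give two different "src" identities.
src_morning="$(input_addressed fetch rev-morning)"
src_afternoon="$(input_addressed fetch rev-afternoon)"

# src2 is built *from* src, so the identity of src leaks into src2 even
# though both rollbacks would end up at the same older revision.
src2_morning="$(input_addressed rollback "$src_morning")"
src2_afternoon="$(input_addressed rollback "$src_afternoon")"

# What we want instead: an identity that only depends on where we ended up.
wanted_morning="$(result_addressed rev-yesterday)"
wanted_afternoon="$(result_addressed rev-yesterday)"

if [ "$src2_morning" != "$src2_afternoon" ]; then
  echo "src2 differs per revision, so each revision gets its own full build"
fi
if [ "$wanted_morning" = "$wanted_afternoon" ]; then
  echo "a result-addressed source would be shared by both revisions"
fi
```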

Non-solution #3: Plain fetchGit

The builtins.fetchGit utility almost does what we want! This primitive function lets you fetch a git repository at evaluation time, like this:

nix-repl> builtins.fetchGit { url = ~/proj/turtle; rev = "837f52d2101368bc075d382774460a717904d2ab"; }
{ lastModified = 1655501878; lastModifiedDate = "20220617213758"; narHash = "sha256-Ic4N2gzm0hYsPCynkzETJv7lpAWO1KM+FO+r3ov60y0="; outPath = "/nix/store/ygznanxv6rmbxw5gkgk7axfxazhsa93z-source"; rev = "837f52d2101368bc075d382774460a717904d2ab"; revCount = 566; shortRev = "837f52d"; submodules = false; }

The above result is the same no matter what revision I currently have checked out at ~/proj/turtle because Nix’s fetchGit function produces a content-addressed derivation. In other words, if two invocations of fetchGit generate the same final repository state then they share the same outPath. This is exactly the behavior we want: we need the source repository for the full build to be content-addressed so that multiple incremental builds can share the same full build.

However, the problem is that I don’t exactly know which revision I want. What I really want to be able to say is “get me the last revision from the day before this other revision”. fetchGit does not expose any way to do something like that.

That brings us to the actual solution:


The solution I went with was the following two pull requests:

  • Add optional date argument to builtins.fetchGit

    This amends builtins.fetchGit to allow a date specification, which can either be a relative date (e.g. 1 day ago) or an absolute date (e.g. 2020-01-01T00:00:00 or a Unix timestamp like 1671388622). Basically, this argument accepts anything git accepts as a date specification (which is a lot since git is pretty flexible in this regard).

    The cool thing about this change is that it doesn’t compromise the purity of builtins.fetchGit. If a given fetchGit specification was pure then adding a date specification preserves that purity.

  • Add haskell.lib.incremental utility

    This pull request actually does two separate things:

    • This polishes and upstreams the prototype support for incremental builds

      In other words, this upstreams Harry and Jade’s work to split a Haskell build into two builds: a full build and an incremental build

    • This uses the fetchGit patch to automate the full build selection

      There’s a new pkgs.haskell.lib.incremental utility which uses builtins.fetchGit to automatically update the full build for you and it has all the desired behaviors (including branch-local incrementalism).

    I could have split this into two separate pull requests (and I still might) but for internal testing purposes it was easier to do everything on one branch. I’m waiting for a decision on the other pull request before deciding whether or not to split up this branch.
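The date specifications mentioned for the first pull request are the ones git itself understands, so they can be tried out with plain git. The following sketch assumes git is installed and does not exercise the patched builtins.fetchGit at all. The throwaway repository, the two fixed commit times, and the commit_at helper are hypothetical values chosen for the illustration, and TZ is pinned to UTC so that the absolute date is unambiguous:

```shell
#!/usr/bin/env bash
set -eu
export TZ=UTC

# Throwaway repository used only for this illustration
repo="$(mktemp -d)"
git -C "$repo" init --quiet
git -C "$repo" config user.name  "Example"
git -C "$repo" config user.email "example@example.com"

# commit_at <seconds-since-epoch> <message>: create an empty commit whose
# author and committer dates are both set to the given time
commit_at() {
  GIT_AUTHOR_DATE="$1 +0000" GIT_COMMITTER_DATE="$1 +0000" \
    git -C "$repo" commit --quiet --allow-empty -m "$2"
}

commit_at 1671300000 "earlier"   # 2022-12-17 18:00:00 UTC
earlier="$(git -C "$repo" rev-parse HEAD)"

commit_at 1671400000 "later"     # 2022-12-18 21:46:40 UTC
later="$(git -C "$repo" rev-parse HEAD)"

# The same cutoff written as a Unix timestamp and as an absolute date
by_timestamp="$(git -C "$repo" rev-list -1 --before=1671388622 HEAD)"
by_iso="$(git -C "$repo" rev-list -1 --before='2022-12-18T18:37:02' HEAD)"

# A relative cutoff; both commits are far older than one day, so this
# selects the newest commit
by_relative="$(git -C "$repo" rev-list -1 --before='1 day ago' HEAD)"

echo "$by_timestamp"
echo "$by_iso"
echo "$by_relative"
```

The first two queries select the earlier commit because the cutoff falls between the two commit times.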


I’ll use my turtle package as the running example for the demo. If you clone the gabriella/incremental branch of my turtle repository:

$ git clone --branch gabriella/incremental \
$ cd turtle

… you’ll find the following default.nix file making use of the Nixpkgs support for incremental Haskell builds:

{ interval ? 24 * 60 * 60 }:

  nixpkgs = builtins.fetchTarball {
    url    = "";
    sha256 = "1k3swii3absl154154lmk6zjw11vzzqx8skaiw1250armgfyv9v8";

  # We need GHC 9.4 or newer for this feature to work
  compiler ="ghc94";

  overlay = self: super: {
    haskell = super.haskell // {
      packages = super.haskell.packages // {
        "${compiler}" =
          super.haskell.packages."${compiler}".override (old: {
            overrides =
                (old.overrides or (_: _: { }))
                [ (self.haskell.lib.packageSourceOverrides {
                    turtle = ./.;

                  (hself: hsuper: {
                    turtle-incremental =
                        { inherit interval;

                          makePreviousBuild =
                            truncate: (import (truncate ./.) { }).turtle;

  pkgs = import nixpkgs { config = { }; overlays = [ overlay ]; };

  { inherit (pkgs.haskell.packages."${compiler}")

However, that alone is not enough to make use of incremental builds. If you attempt to build that (at the time of this writing) you’ll get an error message like this:

$ nix build --file ./default.nix turtle-incremental
error: evaluation aborted with the following error message:
'pkgs.haskell.lib.incremental requires Nix version 2.12.0pre20221128_32c182b or
(use '--show-trace' to show detailed location information)

The Nixpkgs support for incremental builds depends on a matching change to the Nix interpreter, so you actually have to run:

$ nix run github:Gabriella439/nix/gabriella/fetchGit -- \
    build --file ./default.nix turtle-incremental

… or if you don’t yet have flakes enabled, then use this pedantically complete command:

$ nix --option extra-experimental-features 'nix-command flakes' \
    run github:Gabriella439/nix/gabriella/fetchGit -- \
    build --file ./default.nix turtle-incremental

… and that will definitely work.

Once the build is complete you can inspect the logs and you should see something like the following buildPhase:

$ nix log ./result
@nix { "action": "setPhase", "phase": "buildPhase" }
Preprocessing library for turtle-1.6.1..
Building library for turtle-1.6.1..
Preprocessing test suite 'regression-broken-pipe' for turtle-1.6.1..
Building test suite 'regression-broken-pipe' for turtle-1.6.1..
[2 of 2] Linking dist/build/regression-broken-pipe/regression-broken-pipe [Libr>
Preprocessing test suite 'regression-masking-exception' for turtle-1.6.1..
Building test suite 'regression-masking-exception' for turtle-1.6.1..
[2 of 2] Linking dist/build/regression-masking-exception/regression-masking-exc>
Preprocessing test suite 'tests' for turtle-1.6.1..
Building test suite 'tests' for turtle-1.6.1..
[2 of 2] Linking dist/build/tests/tests [Library changed]
Preprocessing test suite 'system-filepath-tests' for turtle-1.6.1..
Building test suite 'system-filepath-tests' for turtle-1.6.1..
[2 of 2] Linking dist/build/system-filepath-tests/system-filepath-tests [Librar>
Preprocessing test suite 'cptree' for turtle-1.6.1..
Building test suite 'cptree' for turtle-1.6.1..
[2 of 2] Linking dist/build/cptree/cptree [Library changed]

This shows that the incremental builds are indeed working. We still have to re-link some executables (for reasons that are still not clear to me), but none of the Haskell modules needed to be rebuilt since nothing has changed (yet) since the last rebuild.

Now let’s test that by making a small whitespace change to one of the Turtle modules:

$ echo >> src/Turtle/Prelude.hs 

Then if we rebuild the package we’ll see the following build phase:

$ nix --option extra-experimental-features 'nix-command flakes' \
    run github:Gabriella439/nix/gabriella/fetchGit -- \
    build --file ./default.nix --print-build-logs
turtle> building
turtle> Preprocessing library for turtle-1.6.1..
turtle> Building library for turtle-1.6.1..
turtle> [ 7 of 10] Compiling Turtle.Prelude   ( src/Turtle/Prelude.hs, dist/build/Turtle/Prelude.o, dist/build/Turtle/Prelude.dyn_o ) [Source file changed]
turtle> src/Turtle/Prelude.hs:319:1: warning: [-Wunused-imports]
turtle>     The import of ‘Data.Monoid’ is redundant
turtle>       except perhaps to import instances from ‘Data.Monoid’
turtle>     To import instances alone, use: import Data.Monoid()
turtle>     |
turtle> 319 | import Data.Monoid ((<>))
turtle>     | ^^^^^^^^^^^^^^^^^^^^^^^^^
turtle> Preprocessing test suite 'regression-broken-pipe' for turtle-1.6.1..
turtle> Building test suite 'regression-broken-pipe' for turtle-1.6.1..
turtle> [2 of 2] Linking dist/build/regression-broken-pipe/regression-broken-pipe [Library changed]
turtle> Preprocessing test suite 'regression-masking-exception' for turtle-1.6.1..
turtle> Building test suite 'regression-masking-exception' for turtle-1.6.1..
turtle> [2 of 2] Linking dist/build/regression-masking-exception/regression-masking-exception [Library changed]
turtle> Preprocessing test suite 'tests' for turtle-1.6.1..
turtle> Building test suite 'tests' for turtle-1.6.1..
turtle> [2 of 2] Linking dist/build/tests/tests [Library changed]
turtle> Preprocessing test suite 'system-filepath-tests' for turtle-1.6.1..
turtle> Building test suite 'system-filepath-tests' for turtle-1.6.1..
turtle> [2 of 2] Linking dist/build/system-filepath-tests/system-filepath-tests [Library changed]
turtle> Preprocessing test suite 'cptree' for turtle-1.6.1..
turtle> Building test suite 'cptree' for turtle-1.6.1..
turtle> [2 of 2] Linking dist/build/cptree/cptree [Library changed]

Our package only built the “diff” (the Turtle.Prelude module we just changed)!


For the turtle package the speed-up is not a huge deal because the package doesn’t take a long time to compile, but the benefit for our main project at work is dramatic!

As I mentioned in the introduction, our work project normally takes ~14 minutes to build and after this change builds can be as fast as ~3.5 minutes. In fact, they could even be faster except for the presence of a Paths_* module that is rebuilt each time and triggers a large number of gratuitous downstream rebuilds (we’re working on fixing that).


There is one major issue with this work, which is that it does not work well with flakes.

Specifically, if you try to turn the above default.nix into the equivalent flake the build will fail because Nix’s flake mechanism will copy the project into the /nix/store without the .git history, so builtins.fetchGit will fail to fetch the history of the current repository that it needs to truncate the build to the previous day.

I believe this can be fixed with a change to flakes to support something like a ?shallow=false or ?allRefs=true addendum to git URLs, but I have not implemented that, yet.

Monday, October 24, 2022

How to correctly cache build-time dependencies using Nix


Professional Nix users often create a shared cache of Nix build products so that they can reuse build products created by continuous integration (CI). For example, CI might build Nix products for each main development branch of their project or even for every pull request and it would be nice if those build products could be shared with all developers via a cache.

However, uploading build products to a cache is a little non-trivial if you don’t already know the “best” solution, which is the subject of this post.

The solution described in this post is:

  • Simple

    It only takes a few lines of Bash code because we use the Nix command-line interface idiomatically

  • Efficient

    It is very cheap to compute which build products to upload and requires no additional builds nor an exorbitant amount of disk space

  • Accurate

    It uploads the build products that most people would intuitively want to upload

Note: Throughout this post I will be using the newer Nix command-line interface and flakes, which requires either adding this line to your nix.conf file:

extra-experimental-features = nix-command flakes

… and restarting your Nix daemon (if you have a multi-user Nix installation), or alternatively adding these flags to the beginning of all nix commands throughout this post:

$ nix --option extra-experimental-features 'nix-command flakes'

Wrong solution #0

As a running example, suppose that our CI builds a top-level build product using a command like this:

$ nix build .#example

The naïve way to upload that to the cache would be:

$ nix store sign --key-file "${KEY_FILE}" --recursive .#example

$ nix copy --to s3:// .#example

Note: You will need to generate a KEY_FILE using the nix-store --generate-binary-cache-key command if you haven’t already. For more details, see the following documentation from the manual:

Operation --generate-binary-cache-key
       nix-store --generate-binary-cache-key key-name secret-key-file public-key-file

       This command generates an Ed25519 key pair that can be used to create
       a signed binary cache. It takes three mandatory parameters:

       1.     A key name that is used to look up keys on the client when it
              verifies signatures. It can be anything, but it’s suggested to
              use the host name of your cache with a suffix denoting the
              number of the key (to be incremented every time you need to
              revoke a key).

       2.     The file name where the secret key is to be stored.

       3.     The file name where the public key is to be stored.

That seems like a perfectly reasonable thing to do, right? However, the problem with that is that it is incomplete, meaning that the cache would still be missing several useful build products that developers would expect to be there.

Specifically, the above command only copies the “run-time” dependencies of our build product whereas most developers expect the cache to also include “build-time” dependencies, and I’ll explain the distinction between the two.

Run-time vs. Build-time

Many paths in the /nix/store are not “valid” in isolation. They typically depend on other paths within the /nix/store.

For example, suppose that I build the GNU hello package, like this:

$ nix build nixpkgs#hello

I can query all of the other paths within the /nix/store that the hello package transitively depends on at run-time using this command:

$ nix-store --query --requisites ./result

… or I can print the same information in tree form like this:

$ nix-store --query --tree ./result

On my macOS machine, it has two run-time dependencies (other than itself) within the /nix/store: libobjc and apple-framework-CoreFoundation-11.0.

Note: there might be other run-time dependencies, because I believe Nixpkgs support for macOS requires some impure system dependencies, but I’m not an expert on this so I could be wrong.

These are called “run-time” dependencies because we cannot run our hello executable without them.

Nix prevents us from getting into situations where a /nix/store path is missing its run-time dependencies. For example, if I were to nix copy the hello build product to any cache, then Nix would perform the following steps, in order:

  • Copy libobjc to the cache

    … since that has no dependencies

  • Copy apple-framework-CoreFoundation to the cache

    … since its libobjc dependency is now satisfied within the cache

  • Copy hello to the cache

    … since its apple-framework-CoreFoundation dependency is now satisfied within the cache
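The ordering in the steps above is a topological sort of the dependency graph. As a rough illustration only (this is not how Nix computes the order internally), the standard tsort utility produces the same sequence when given the two dependency edges described in this example:

```shell
#!/usr/bin/env bash
set -eu

# Each input line is "<dependency> <dependent>"; tsort prints every name only
# after all of the names that it depends on
copy_order="$(
  tsort <<'EOF'
libobjc apple-framework-CoreFoundation
apple-framework-CoreFoundation hello
EOF
)"

# Prints libobjc, then apple-framework-CoreFoundation, then hello
echo "$copy_order"
```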

However, Nix also has a separate notion of “build-time” dependencies, which are dependencies that we need in order to build the hello package.

Note: The reason we’re interested in build-time dependencies for our project is that we want developers to be able to rebuild the project if they make any changes to the source code. If we were to only cache the run-time dependencies of our project that wouldn’t cache the development environment that developers need.

In order to query these dependencies I need to first get the “derivation” (.drv file) for hello:

$ DERIVATION="$(nix path-info --derivation nixpkgs#hello)"

$ declare -p DERIVATION
typeset DERIVATION=/nix/store/4a78f0s4p5h2sbcrrzayl5xas2i7zq1m-hello-2.12.1.drv

You can think of a derivation file as a build recipe that contains instructions for how to build the corresponding build product (the hello package in this case).

I can query the direct dependencies of that derivation using this command:

$ nix-store --query --references "${DERIVATION}"

Many of these dependencies are themselves derivations (.drv files), meaning that they represent other packages that Nix might have to build or fetch from a cache.

Note: the .drv files are actually not the build-time dependencies, but rather the instructions for building them. You can convert any .drv file to the matching product it is supposed to build using the same nix build command, like this:

$ nix build /nix/store/labgzlb16svs1z7z9a6f49b5zi8hb11s-bash-5.1-p16.drv

Does that mean that these build-time dependencies are on our machine if we built nixpkgs#hello? Not necessarily. In fact, in all likelihood the nixpkgs#hello build was cached, meaning that nix build nixpkgs#hello only downloaded hello and its run-time dependencies and no build-time dependencies were required nor installed by Nix.

However, I could in principle force Nix to build the hello package instead of downloading it from a cache, like this:

$ nix build nixpkgs#hello --rebuild

… and that would download the direct build-time dependencies of the hello package in order to rebuild the package.

Wrong solution #1

By this point you might suppose that you have enough information to come up with a better set of /nix/store paths to cache. Your solution might look like this:

  • Get the derivation for the top-level build product

  • Get the direct build-time dependencies of that derivation

  • Build the top-level build product and its direct build-time dependencies

  • Cache the top-level build product and its direct build-time dependencies

In other words, something like this Bash code:

$ DERIVATION="$(nix path-info --derivation "${BUILD}")"

$ DEPENDENCIES=($(nix-store --query --references "${DERIVATION}"))

$ nix build "${BUILD}" "${DEPENDENCIES[@]}"

$ nix store sign --key-file "${KEY_FILE}" --recursive "${BUILD}" "${DEPENDENCIES[@]}"

$ nix copy --to "${CACHE}" "${BUILD}" "${DEPENDENCIES[@]}"

This is better, but still not good enough!

The problem with this solution is that it only works well if your dependencies never change and you only modify your top-level project. If you upgrade or patch any of your direct build-time dependencies then you need to have their build-time dependencies cached so that you can quickly rebuild them.

In fact, going two layers deep is still not enough; in practice you can’t easily anticipate in advance how deep in the build-time dependency tree you might need to patch or upgrade things. For example, you might need to patch or upgrade your compiler, which is really deep in your build-time dependency tree.
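The gap between direct and transitive dependencies can be sketched with a toy graph in Bash (version 4 or newer, for associative arrays). The graph and every name in it are hypothetical; the two computations are only analogous in spirit to what the --references and --requisites queries report for a derivation:

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical build-time dependency graph: each key depends directly on the
# space-separated names in its value
declare -A deps=(
  [project]="library compiler"
  [library]="compiler"
  [compiler]="bootstrap-compiler"
  [bootstrap-compiler]=""
)

# Direct dependencies only
direct="${deps[project]}"

# Transitive dependencies, found by repeatedly visiting dependencies that we
# have not seen yet (a breadth-first traversal)
declare -A seen=()
queue=(project)
while (( ${#queue[@]} > 0 )); do
  node="${queue[0]}"
  queue=("${queue[@]:1}")
  for dep in ${deps[$node]}; do
    if [[ -z "${seen[$dep]:-}" ]]; then
      seen[$dep]=1
      queue+=("$dep")
    fi
  done
done
transitive="$(printf '%s\n' "${!seen[@]}" | sort | paste -sd ' ' -)"

echo "direct:     ${direct}"
echo "transitive: ${transitive}"
```

In this toy graph, patching compiler requires bootstrap-compiler, which only shows up in the transitive set.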

Wrong solution #2

Okay, so maybe we can try to build and cache all of our build-time dependencies?

Wrong again. There are way too many of them. You can query them by replacing --references with --requisites and you’ll get a giant list of results, even for “small” packages. For example:

$ DERIVATION=$(nix path-info --derivation nixpkgs#hello)

$ nix-store --query --requisites "${DERIVATION}"
 🌺 500+ derivations later 🌺 …

The above command not only lists the build-time dependencies for the hello package, but also their transitive build-time dependencies. In other words, these are all the derivations needed to build the hello package “from scratch” in the absence of any cache products. We can see the complete tree of build-time dependencies like this:

$ nix-store --query --tree "${DERIVATION}"
   │   ├───/nix/store/3glray2y14jpk1h6i599py7jdn3j2vns-mkdir.drv
   │   ├───/nix/store/50ql5q0raqkcydmpi6wqvnhs9hpdgg5f-cpio.drv
   │   ├───/nix/store/81xahsrhpn9mbaslgi5sz7gsqra747d4-unpack-bootstrap-tools->
   │   ├───/nix/store/>
   │   ├───/nix/store/gxzl4vmccqj89yh7kz62frkxzgdpkxmp-sh.drv
   │   └───/nix/store/pjbpvdy0gais8nc4sj3kwpniq8mgkb42-bzip2.drv
   │   ├───/nix/store/7kcayxwk8khycxw1agmcyfm9vpsqpw4s-bootstrap-tools.drv [..>
   │   ├───/nix/store/nbxwxwqwcr9rrmxb6gb532f18102815x-bootstrap-stage0-stdenv>
   │   │   ├───/nix/store/
   │   │   ├───/nix/store/
   │   │   ├───/nix/store/7kcayxwk8khycxw1agmcyfm9vpsqpw4s-bootstrap-tools.drv>
   │   │   ├───/nix/store/
   │   │   ├───/nix/store/cickvswrvann041nqxb0rxilc46svw1n-prune-libtool-files>
   │   │   ├───/nix/store/
   │   │   ├───/nix/store/
   │   │   ├───/nix/store/
   │   │   ├───/nix/store/
   │   │   ├───/nix/store/
   │   │   ├───/nix/store/kxw6q8v6isaqjm702d71n2421cxamq68-make-symlinks-relat>
   │   │   ├───/nix/store/m54bmrhj6fqz8nds5zcj97w9s9bckc9v-compress-man-pages.>
   │   │   ├───/nix/store/ngg1cv31c8c7bcm2n8ww4g06nq7s4zhm-set-source-date-epo>
   │   │   └───/nix/store/wlwcf1nw2b21m4gghj70hbg1v7x53ld8-reproducible-builds>
   │   ├───/nix/store/i65va14cylqc74y80ksgnrsaixk39mmh-mirrors-list.drv
   │   │   ├───/nix/store/7kcayxwk8khycxw1agmcyfm9vpsqpw4s-bootstrap-tools.drv>
   │   │   ├───/nix/store/nbxwxwqwcr9rrmxb6gb532f18102815x-bootstrap-stage0-st>
   │   │   └───/nix/store/
   │   └───/nix/store/

If we were to build and cache all of these build-time dependencies then our local /nix/store and cache would explode in size. Also, we do not need to do this because there is a better solution …

Correct solution

The solution that provides the best value is to cache all transitive build-time dependencies that are present within the current /nix/store after building the top-level build product. In other words, don’t bother to predict which build-time dependencies we need; instead, empirically infer which ones to cache based on which ones Nix installed and used along the way.

This is not only more accurate, but it’s also more efficient: we don’t need to build or download anything new because we’re only caching things we already locally installed.
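The idea of only caching what is already present can be pictured with a toy simulation that does not involve Nix at all. In the sketch below a temporary directory stands in for the /nix/store, and the four path names are hypothetical stand-ins for the outputs of the derivations in a build-time closure:

```shell
#!/usr/bin/env bash
set -euo pipefail

# A throwaway directory standing in for /nix/store
store="$(mktemp -d)"

# Pretend these are the outputs of every derivation in the build-time closure
closure_outputs=(
  "$store/hello"
  "$store/stdenv"
  "$store/gcc"
  "$store/bootstrap-tools"
)

# Pretend the build only installed two of them; the others were never needed
# locally
touch "$store/hello" "$store/stdenv"

# Keep only the outputs that actually exist locally
to_upload=()
for path in "${closure_outputs[@]}"; do
  if [[ -e "$path" ]]; then
    to_upload+=("$path")
  fi
done

# Prints the two paths that exist: hello and stdenv
printf '%s\n' "${to_upload[@]}"
```

Nothing is built or downloaded in order to decide what to upload; the selection is a filter over paths that are already there.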

As a matter of fact, the nix-store command already supports this use case quite well. If you consult the documentation for the --requisites flag, you’ll find this gem:

       • --requisites; -R
         Prints out the closure of the store path paths.

         This query has one option:

         • --include-outputs Also include the existing output paths of store
           derivations, and their closures.

         This query can be used to implement various kinds of deployment. A
         source deployment is obtained by distributing the closure of a store
         derivation. A binary deployment is obtained by distributing the closure
         of an output path. A cache deployment (combined source/binary
         deployment, including binaries of build-time-only dependencies) is
         obtained by distributing the closure of a store derivation and
         specifying the option --include-outputs.

We’re specifically interested in a “cache deployment”, so we’re going to do exactly what the documentation says and use the --include-outputs flag in conjunction with the --requisites flag. In other words, the --include-outputs flag was expressly created for this use case!

So here is the simplest, but least robust, version of the script for computing the set of build-time dependencies to cache, as a Bash array:

$ # Continue reading before using this code; there's a more robust version later

$ # Optional: Perform the build if you haven't already
$ nix build "${BUILD}"

$ DERIVATION="$(nix path-info --derivation "${BUILD}")"

$ DEPENDENCIES=($(nix-store --query --requisites --include-outputs "${DERIVATION}"))

$ nix store sign --key-file "${KEY_FILE}" --recursive "${DEPENDENCIES[@]}"

$ nix copy --to "${CACHE}" "${DEPENDENCIES[@]}"

The above code is simple and clear enough to illustrate the idea, but we’re going to make a few adjustments to make this code more robust.

Specifically, we’re going to:

  • Change the code to support an array of build targets

    i.e. BUILDS instead of BUILD

  • Use mapfile instead of ($(…)) to create intermediate arrays

    See: SC2207

  • Use xargs to handle command line length limits

… which gives us:

$ # Optional: Perform the build if you haven't already
$ echo "${BUILDS[@]}" | xargs nix build

$ mapfile -t DERIVATIONS < <(echo "${BUILDS[@]}" | xargs nix path-info --derivation)

$ mapfile -t DEPENDENCIES < <(echo "${DERIVATIONS[@]}" | xargs nix-store --query --requisites --include-outputs)

$ echo "${DEPENDENCIES[@]}" | xargs nix store sign --key-file "${KEY_FILE}" --recursive

$ echo "${DEPENDENCIES[@]}" | xargs nix copy --to "${CACHE}"

… where you:

  • replace BUILDS with a Bash array containing what you want to build

    e.g. .#example or nixpkgs#hello

  • replace CACHE with whatever store you use as your cache

    e.g. s3://

  • replace KEY_FILE with the path to your cache signing key
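The mapfile and xargs adjustments can also be seen in isolation, without Nix. The sketch below uses a hypothetical list_paths function that prints one path per line, one of which contains a space. Nix store paths do not normally contain whitespace, so this is a deliberately contrived input that makes the difference visible:

```shell
#!/usr/bin/env bash
set -euo pipefail

# A hypothetical command that prints one path per line; the second path
# contains a space
list_paths() {
  printf '%s\n' '/tmp/example/a' '/tmp/example/b c'
}

# Word splitting breaks the second path into two array elements (SC2207)
# shellcheck disable=SC2207
split=($(list_paths))

# mapfile stores each line as exactly one array element
mapfile -t lines < <(list_paths)

echo "${#split[@]}"   # prints 3
echo "${#lines[@]}"   # prints 2

# xargs spreads its input across several invocations when needed; -n 2 forces
# small batches here so that the effect is visible
batches=$(( $(printf '%s\n' one two three four five | xargs -n 2 echo | wc -l) ))

echo "${batches}"     # prints 3
```

Without -n, xargs picks the batch size from the system's command line length limit, which is why it guards against overly long argument lists.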


The final version of the upload script above (the one using mapfile and xargs) is the pedantically robust way to do this in Bash if you want to be super paranoid. It might not work in other shells, but hopefully this post was sufficiently clear that you can adapt the script to your needs.

If I made any mistakes in the above post, let me know and I can fix them.