Monday, November 27, 2017

Compare Nix derivations using nix-diff

I'm announcing a small nix-diff utility I wrote for comparing Nix derivations. This post will walk through two use cases for how you might use this utility.

Background

This section provides some required background for understanding this post if you're new to Nix.

There are three stages to a Nix build:

  • Nix source code (i.e. *.nix files)
    • This corresponds to a source distribution in a typical package manager
  • Nix derivations (i.e. /nix/store/*.drv files)
    • This is the stage that caching works at
  • Nix build products (i.e. /nix/store/* files that are not derivations)
    • This corresponds to a binary distribution in a typical package manager

You can convert between these stages using the following command-line tools:

  • nix-instantiate converts Nix source code to Nix derivations
    • i.e. *.nix → /nix/store/*.drv
  • nix-store --realise converts Nix derivations to Nix build products
    • i.e. /nix/store/*.drv → /nix/store/*
  • nix-build is a convenience utility which combines the two preceding steps to go straight from source code to build products
    • i.e. *.nix → /nix/store/*

Nix supports caching binary build products so if you try to build the same derivation twice then the second build will reuse the result of the first build (i.e. a "cache hit"). If the derivation changes in any way, you get a "cache miss" and you need to build the derivation.

Carefully note that caching works at the level of Nix derivations and not at the level of Nix source code. For example, the following two Nix files differ at the source code level:

$ cat example0.nix 
let
  pkgs = import <nixpkgs> { };

in
  pkgs.hello

$ cat example1.nix 
(import <nixpkgs> { }).hello

... but they produce the exact same derivation file:

$ nix-instantiate example0.nix 
/nix/store/ajypjz54a8rn1qxsnhyr8m87w6hd7ghp-hello-2.10.drv

$ nix-instantiate example1.nix 
/nix/store/ajypjz54a8rn1qxsnhyr8m87w6hd7ghp-hello-2.10.drv

... which means that if you try to build both example0.nix and example1.nix the build will only occur once since they share the same derivation.

You can think of the derivation file as a language-independent description of how to build something:

$ fold /nix/store/ajypjz54a8rn1qxsnhyr8m87w6hd7ghp-hello-2.10.drv 
Derive([("out","/nix/store/1ijp0xy3s6idns5047lxky6nlj4lrcap-hello-2.10","","")],
[("/nix/store/3ma3q2qf60gvsqs4w0k2krcyikr1pvhf-bash-4.4-p12.drv",["out"]),("/nix
/store/8g65wh5ng9dc68mz07wlznzg4f2zqhlh-stdenv-darwin.drv",["out"]),("/nix/store
/gqwk1j05s2zfykfj4y9k15gs4zl0lynr-hello-2.10.tar.gz.drv",["out"])],["/nix/store/
9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh","/nix/store/z347hsajryw593h
802ggb63lbr3gpv2b-standard-sandbox.sb"],"x86_64-darwin","/nix/store/axikcsz4wh2q
pi5zmlfsmm4jx8wm8s1g-bash-4.4-p12/bin/bash",["-e","/nix/store/9krlzvny65gdc8s7kp
b6lkx8cd02c25b-default-builder.sh"],[("__impureHostDeps","/System/Library/Framew
orks/CoreFoundation.framework/CoreFoundation /dev/zero /dev/random /dev/urandom 
/bin/sh"),("__propagatedImpureHostDeps",""),("__propagatedSandboxProfile",""),("
__sandboxProfile","(allow file-read* (literal \"/usr/lib/libncurses.5.4.dylib\")
)\n(import \"/nix/store/z347hsajryw593h802ggb63lbr3gpv2b-standard-sandbox.sb\")\
n"),("buildInputs",""),("builder","/nix/store/axikcsz4wh2qpi5zmlfsmm4jx8wm8s1g-b
ash-4.4-p12/bin/bash"),("configureFlags",""),("doCheck","1"),("name","hello-2.10
"),("nativeBuildInputs",""),("out","/nix/store/1ijp0xy3s6idns5047lxky6nlj4lrcap-
hello-2.10"),("propagatedBuildInputs",""),("propagatedNativeBuildInputs",""),("s
rc","/nix/store/3x7dwzq014bblazs7kq20p9hyzz0qh8g-hello-2.10.tar.gz"),("stdenv","
/nix/store/dl508ngmyfglplp338np4lnx98prwsbd-stdenv-darwin"),("system","x86_64-da
rwin")])

These *.drv files use the ATerm file format and are Nix-independent. Conceptually, Nix is just a domain-specific language for generating these ATerm files. That means, for example, that you could replace Nix with any front-end language or tool that can generate these ATerm files. In fact, this is how Guix works, by replacing Nix with Guile Scheme as the front-end language.

Understanding how Nix derivations work is fundamental to understanding the Nix ecosystem. nix-diff is one tool that aids this learning process as the following sections will illustrate.

Cache misses

nix-diff is a tool that I wish I had back when Awake Security first adopted Nix. We frequently ran into cache misses when using Nix because of subtle differences in Nix derivations in different development environments.

We can understand why we got cache misses by referring back to the three stages of a Nix build:

  • Nix source code (i.e. *.nix files)
  • Nix derivations (i.e. /nix/store/*.drv files)
  • Nix build products (i.e. /nix/store/* files that are not derivations)

For production we prefer to distribute Nix build products (i.e. binary distributions), but internally for development we distribute Nix source code. We prefer Nix code internally because this gives developers complete control over all of their transitive dependencies. For example, a developer can easily patch the systemd executable used on the virtual machine that runs their integration tests.

However, this flexibility comes at a price: if you don't know what you are doing you can easily accidentally change the derivation. This is because Nix and Nixpkgs are customizable to a fault and they have all sorts of "impure" defaults that change depending on the development environment. If you trip over one of these pitfalls you end up with a cache miss, which is a poor user experience.

The most common pitfalls we ran into early on in our Nix adoption were:

  • Not pinning nixpkgs
    • Note: We publicly shared our recipe for pinning nixpkgs here
  • Not pinning the system field for a derivation
    • This field defaults to the impure builtins.currentSystem in many cases
  • Impure surprises in nixpkgs

Let's motivate this with a real example. Suppose that I have the following derivation to build the Glasgow Haskell compiler (ghc):

$ cat example0.nix
let
  pkgs = import <nixpkgs> { };

in
  pkgs.ghc

This Nix expression is "impure" because the expression depends on the ambient nixpkgs channel that the user has installed. Compare this to the following expression which pins nixpkgs to a specific revision protected by a hash:

$ cat example1.nix
let
  # https://nixos.wiki/wiki/How_to_fetch_Nixpkgs_with_an_empty_NIX_PATH
  fetchNixpkgs = import ./fetchNixpkgs.nix;

  nixpkgs = fetchNixpkgs {
     rev = "76d649b59484607901f0c1b8f737d8376a904019";
     sha256 = "01c2f4mj4ahir0sxk9kxbymg2pki1pc9a3y6r9x6ridry75fzb8h";
  };

  pkgs = import nixpkgs { };

in
  pkgs.ghc

Let's instantiate the two expressions to compute their derivations:

$ nix-instantiate example0.nix 
/nix/store/9shbgc70h32f99nasdd6f8fd7cf9c645-ghc-8.0.2.drv
$ nix-instantiate example1.nix 
/nix/store/fx0xn9djgvvw3h5jdmwybg0ga5qk844d-ghc-8.0.2.drv

Note that you may get a different result for the first derivation depending on what version of the nixpkgs channel you have installed.

Visually comparing the two derivation files is tedious and time-consuming:

$ fold /nix/store/9shbgc70h32f99nasdd6f8fd7cf9c645-ghc-8.0.2.drv | head
Derive([("doc","/nix/store/x3hcyy01kb980yiirjjb3svzrdb0pqdy-ghc-8.0.2-doc","",""
),("man","/nix/store/l1ws9nypjg4xh8jj47dapx71cmgfb97a-ghc-8.0.2-man","",""),("ou
t","/nix/store/76b5ryd9wsc0iimlfz6f4n8kgawf8cli-ghc-8.0.2","","")],[("/nix/store
/1ncnhkd9r4k3wmlwbymccfhlqp3bk2cp-python2.7-Sphinx-1.6.5.drv",["out"]),("/nix/st
ore/2zdlq3dj3mk91ccya7k9z6d5i7lag912-clang-wrapper-4.0.1.drv",["out"]),("/nix/st
ore/3ma3q2qf60gvsqs4w0k2krcyikr1pvhf-bash-4.4-p12.drv",["out"]),("/nix/store/5mp
3qjkbzvmi4yvin1dbfdr1bkzgq9dl-perl-5.24.3.drv",["out"]),("/nix/store/8g65wh5ng9d
c68mz07wlznzg4f2zqhlh-stdenv-darwin.drv",["out"]),("/nix/store/9z3ykw788f50yhi4f
nn3s1ldyyg5s99x-ncurses-5.9.drv",["dev","out"]),("/nix/store/hw59y7rf8w28s123b51
ac57kbd0azjvh-coreutils-8.28.drv",["out"]),("/nix/store/km0zhgg5ykpnwnrczinggxs5

$ fold /nix/store/fx0xn9djgvvw3h5jdmwybg0ga5qk844d-ghc-8.0.2.drv | head
Derive([("doc","/nix/store/qlg3a9923hbcb1vhhaka90c33vrfgbrv-ghc-8.0.2-doc","",""
),("out","/nix/store/69spfrh96hc6y3hcb7w4i0l6s25pslkd-ghc-8.0.2","","")],[("/nix
/store/0ci2jv8sygw63hyl48ac6caw7fn3jrd7-ncurses-5.9.drv",["dev","out"]),("/nix/s
tore/1ksvs625n8lwjhjxld446gn9ql23v5k8-bash-4.4-p5.drv",["out"]),("/nix/store/dqj
rkys7d0c2z4ggny27a0vzpbzvz8y2-ghc-8.0.2-src.tar.xz.drv",["out"]),("/nix/store/dw
srl4iqnc3ij79h2xfn8fl3xnnk2zrg-gmp-6.1.1.drv",["dev","out"]),("/nix/store/gk2ng3
j3ixx6diq5s4xmysj670k62lly-perl-5.22.3.drv",["out"]),("/nix/store/i00ja8b4y0yv9b
aj7qd0caj6az0c8phj-ghc-7.10.3.drv",["out"]),("/nix/store/k82idwsbgby27nkjrwr9bhq
64c95irgf-coreutils-8.26.drv",["out"]),("/nix/store/nmkqpzlahvmpsnn0s5knc6wspy6b
305l-stdenv-darwin.drv",["out"]),("/nix/store/qv0cpl2g4bk5nn5l2hx5fyc2dw6xdjc9-c

If we use nix-diff, then we can pull out the differences immediately:

$ nix-diff /nix/store/fx0xn9djgvvw3h5jdmwybg0ga5qk844d-ghc-8.0.2.drv /nix/store/9shbgc70h32f99nasdd6f8fd7cf9c645-ghc-8.0.2.drv 
- /nix/store/fx0xn9djgvvw3h5jdmwybg0ga5qk844d-ghc-8.0.2.drv:{out}
+ /nix/store/9shbgc70h32f99nasdd6f8fd7cf9c645-ghc-8.0.2.drv:{out}The set of outputs do not match:
    + {man}The builders do not match
    - /nix/store/hsk82g493i7r496ghs0y61m6yvknxcml-bash-4.4-p5/bin/bash
    + /nix/store/axikcsz4wh2qpi5zmlfsmm4jx8wm8s1g-bash-4.4-p12/bin/bash
• The set of input names do not match:
    - bash-4.4-p5
    - clang-wrapper-3.7.1
    - coreutils-8.26
    - gmp-6.1.1
    - perl-5.22.3
    - python2.7-Sphinx-1.5.2
    + bash-4.4-p12
    + clang-wrapper-4.0.1
    + coreutils-8.28
    + gmp-6.1.2
    + perl-5.24.3
    + python2.7-Sphinx-1.6.5

Now we can see at a glance that the versions of several dependencies changed and GHC has split out its man pages into a new man output for better granularity of the build graph.

Note that these are not the only differences between the two derivations. However, all of the other differences are downstream of the above differences. For example, the two derivations have different out paths, but we expect them to differ for any two derivations that are not identical so there's no point including that in the diff. nix-diff makes an effort to highlight the root cause of the difference.

Understanding differences

Nix is more than just a package manager. You can use Nix to build and deploy an entire machine, which is how NixOS (the Nix operating system) works. The machine configuration is a Nix expression that you can instantiate and build like any other Nix expression.

This means that we can also use nix-diff to compare two machine configurations and understand how they differ. For example, when we change our production systems at Awake Security we sometimes run the change through nix-diff during code review to ensure that reviewers understand every change being made to the system.

We can illustrate this with a small example comparing two NixOS system specifications. The first system specification is a mostly blank system:

$ cat example0.nix
let
  nixos = import <nixpkgs/nixos> {
    system = "x86_64-linux";

    configuration = {
      boot.loader.grub.devices = [ "/dev/sda" ];

      fileSystems."/" = {
        device = "/dev/sda";
      };
    };
  };

in
  nixos.system

... and the second specification enables Kafka on the system:

$ cat example1.nix
let
  nixos = import <nixpkgs/nixos> {
    system = "x86_64-linux";

    configuration = {
      boot.loader.grub.devices = [ "/dev/sda" ];

      fileSystems."/" = {
        device = "/dev/sda";
      };

      services.apache-kafka.enable = true;
    };
  };

in
  nixos.system

We can differentiate the two derivations in one step like this:

$ nix-diff $(nix-instantiate example0.nix) $(nix-instantiate example1.nix)
- /nix/store/6z9nr5pzs4j1v9mld517dmlcz61zy78z-nixos-system-nixos-18.03pre119245.
5cfd049a03.drv:{out}
+ /nix/store/k05ibijg0kknvwrgfyb7dxwjrs8qrlbj-nixos-system-nixos-18.03pre119245.
5cfd049a03.drv:{out}The input named `etc` differs
  - /nix/store/05c0v10pla0v8rfl44rs744m6wr729jy-etc.drv:{out}
  + /nix/store/8waqvzjg7bazzfzr49m89q299kz972wv-etc.drv:{out}The input named `dbus-1` differs
    - /nix/store/a16j2snzz25dhh96jriv3p6cgkc0vhxr-dbus-1.drv:{out}
    + /nix/store/mliabzdkqaayya67xiwfhwkg4gs9k0cg-dbus-1.drv:{out}The input named `system-path` differs
      - /nix/store/jcf6q7na01j8k9xcmqxykl62k4x6zwiv-system-path.drv:{out}
      + /nix/store/kh4kgsms24d02bxlrxb062pgsbs3riws-system-path.drv:{out}The set of input names do not match:
          + apache-kafka-2.12-0.10.2.0
  • The input named `system-path` differs
    • These two derivations have already been compared
  • The input named `system-units` differs
    - /nix/store/yqnqdajd4664rvycrnwxwaj0mxp7602c-system-units.drv:{out}
    + /nix/store/2p5c4arwqphdz5wsvz6dbrgv0vhgf5qh-system-units.drv:{out}The set of input names do not match:
        + unit-apache-kafka.service
  • The input named `user-units` differs
    - /nix/store/x34dqw5y34dq6fj5brj2b5qf0nvglql9-user-units.drv:{out}
    + /nix/store/4iplnk260q2dpr8b8ajrjkrn44yk06aq-user-units.drv:{out}The input named `unit-dbus.service` differs
      - /nix/store/fd6j972zn1hfvqslxc8c64xxaf1wg475-unit-dbus.service.drv:{out}
      + /nix/store/s7rpgwbald9qx8rwlw4v276wj2x3ld8r-unit-dbus.service.drv:{out}The input named `dbus-1` differs
        • These two derivations have already been compared
• The input named `system-path` differs
  • These two derivations have already been compared
• The input named `users-groups.json` differs
  - /nix/store/x6c7pqx40wfdzwf96jfi1l0hzxjgypri-users-groups.json.drv:{out}
  + /nix/store/gk5yyjw579hgyxgwbrh1kzb3hbdbzgbq-users-groups.json.drv:{out}The environments do not match:
      text=''
          {"groups":[{"gid":55,"members":[],"name":"adm"},{"gid":17,"members":[]
,"name":"audio"},{"gid":24,"members":[],"name":"cdrom"},{"gid":27,"members":[],"
name":"dialout"},{"gid":6,"members":[],"name":"disk"},{"gid":18,"members":[],"na
me":"floppy"},{"gid":174,"members":[],"name":"input"},{"gid":96,"members":[],"na
me":"keys"},{"gid":2,"members":[],"name":"kmem"},{"gid":20,"members":[],"name":"
lp"},{"gid":4,"members":[],"name":"messagebus"},{"gid":30000,"members":["nixbld1
","nixbld10","nixbld11","nixbld12","nixbld13","nixbld14","nixbld15","nixbld16","
nixbld17","nixbld18","nixbld19","nixbld2","nixbld20","nixbld21","nixbld22","nixb
ld23","nixbld24","nixbld25","nixbld26","nixbld27","nixbld28","nixbld29","nixbld3
","nixbld30","nixbld31","nixbld32","nixbld4","nixbld5","nixbld6","nixbld7","nixb
ld8","nixbld9"],"name":"nixbld"},{"gid":65534,"members":[],"name":"nogroup"},{"g
id":0,"members":[],"name":"root"},{"gid":62,"members":[],"name":"systemd-journal
"},{"gid":110,"members":[],"name":"systemd-journal-gateway"},{"gid":152,"members
":[],"name":"systemd-network"},{"gid":153,"members":[],"name":"systemd-resolve"}
,{"gid":154,"members":[],"name":"systemd-timesync"},{"gid":25,"members":[],"name
":"tape"},{"gid":3,"members":[],"name":"tty"},{"gid":100,"members":[],"name":"us
ers"},{"gid":29,"members":[],"name":"utmp"},{"gid":19,"members":[],"name":"uucp"
},{"gid":26,"members":[],"name":"video"},{"gid":1,"members":[],"name":"wheel"}],
"mutableUsers":true,"users":[{"createHome":false,"description":"→Apache Kafka 
daemon user","group":"nogroup","hashedPassword":null,"home":"/tmp/kafka-logs","i
nitialHashedPassword":null,"initialPassword":null,"isSystemUser":false,"name":"a
pache-kafka","password":null,"passwordFile":null,"shell":"/run/current-system/sw
/bin/nologin","uid":169},{"createHome":false,"description":"→D-Bus system mess
...

However, this doesn't do the diff justice because the output is actually colorized, like this:

From the diff we can see that:

  • This change adds Kafka executables to the system PATH
  • This change adds a new apache-kafka systemd service
  • This change adds a new apache-kafka user to the system

Note how nix-diff does more than diffing the two root derivations. If the two derivations differ on a shared input then nix-diff will descend into that input and diff that and repeat the process until the root cause of the change is found. This works because Nix's dependency graph is complete and reachable from the root derivation.

Conclusion

You can find the nix-diff utility on Hackage or GitHub if you would like to use this in your own development workflow. Hopefully nix-diff will help you better understand how Nix works under the hood and also help you pin Nix derivations more robustly.

3 comments:

  1. Could you give a link about this ATerm format? For some reason Google doesn't seem to show anything meaningful to me.

    ReplyDelete
  2. I would guess he's talking about https://homepages.cwi.nl/~daybuild/daily-books/technology/aterm-guide/aterm-guide.html

    ReplyDelete
  3. Here's the impure Nixpkgs version logic link at the appropriate commit for the time of posting: https://github.com/NixOS/nixpkgs/blob/78af59354cba5fd84efc571086fcaab746b734c1/lib/trivial.nix#L62

    ReplyDelete