Thursday, November 19, 2020

How to use NixOS for lightweight integration tests

How to use NixOS for lightweight integration tests

If you use Nix in some capacity then you should check out the NixOS integration test system, which provides an easy way to test services that run inside one or more QEMU virtual machines.

NixOS tests are (in my opinion) way ahead of other integration test systems, but the only way to properly illustrate their value is to walk through a real-world example to highlight their salient features.

The motivating example

This post will motivate NixOS tests by using them to detect an error in the official postgrest tutorial.

You can skim the above tutorial to get a sense of the steps involved, but I’ll also summarize them here:

  • Run postgres in a docker container

  • Download postgrest

  • Set up the database by running these commands:

    create table api.todos (
      id serial primary key,
      done boolean not null default false,
      task text not null,
      due timestamptz
    );
    
    insert into api.todos (task) values
      ('finish tutorial 0'), ('pat self on back');
    
    create role web_anon nologin;
    
    grant usage on schema api to web_anon;
    grant select on api.todos to web_anon;
    
    create role authenticator noinherit login password 'mysecretpassword';
    grant web_anon to authenticator;
  • Save the following configuration to tutorial.conf:

    db-uri = "postgres://authenticator:mysecretpassword@localhost:5433/postgres"
    db-schema = "api"
    db-anon-role = "web_anon"
  • Run ./postgrest tutorial.conf

  • Check that it’s working using:

    $ curl http://localhost:3000/todos

    … which should return:

    [
      {
        "id": 1,
        "done": false,
        "task": "finish tutorial 0",
        "due": null
      },
      {
        "id": 2,
        "done": false,
        "task": "pat self on back",
        "due": null
      }
    ]

These are quite a few manual steps, and if I were a postgrest maintainer then it would be a pain to check that they still work for every new software release. In practice, most maintainers write and check a tutorial once and then never check again unless users report errors. This is a shame, because one of the most important functions of a tutorial is to inspire confidence:

Make sure that your tutorial works

One of your jobs as a tutor is to inspire the beginner’s confidence: in the software, in the tutorial, in the tutor and, of course, in their own ability to achieve what’s being asked of them.

There are many things that contribute to this. A friendly tone helps, as does consistent use of language, and a logical progression through the material. But the single most important thing is that what you ask the beginner to do must work. The learner needs to see that the actions you ask them to take have the effect you say they will have.

If the learner’s actions produce an error or unexpected results, your tutorial has failed - even if it’s not your fault. When your students are there with you, you can rescue them; if they’re reading your documentation on their own you can’t - so you have to prevent that from happening in advance. This is without doubt easier said than done.

Fortunately, we can codify the manual steps from the tutorial into a NixOS configuration for a virtual machine, which is a declarative specification of our system’s desired state:

# ./postgrest-tutorial.nix

let
  # For extra determinism
  nixpkgs =
    builtins.fetchTarball {
      url = "https://github.com/NixOS/nixpkgs/archive/58f9c4c7d3a42c912362ca68577162e38ea8edfb.tar.gz";

      sha256 = "1517dy07jf4zhzknqbgm617lgjxsn7a6k1vgq61c67f6h55qs5ij";
    };

  # Single source of truth for all tutorial constants
  database = "postgres";
  schema   = "api";
  table    = "todos";
  username = "authenticator";
  password = "mysecretpassword";
  webRole  = "web_anon";

  nixos =
    import "${nixpkgs}/nixos" {
      system = "x86_64-linux";

      configuration = { config, pkgs, ... }: {
        # Open the default port for `postgrest` in the firewall
        networking.firewall.allowedTCPPorts = [ 3000 ];

        services.postgresql = {
          enable = true;

          initialScript = pkgs.writeText "initialScript.sql" ''
            create schema ${schema};

            create table ${schema}.${table} (
              id serial primary key,
              done boolean not null default false,
              task text not null,
              due timestamptz
            );

            insert into ${schema}.${table} (task) values
              ('finish tutorial 0'), ('pat self on back');

            create role ${webRole} nologin;

            grant usage on schema ${schema} to ${webRole};
            grant select on ${schema}.${table} to ${webRole};

            create role ${username} noinherit login password '${password}';
            grant ${webRole} to ${username};
          '';
        };

        users = {
          mutableUsers = false;

          users = {
            # For ease of debugging the VM as the `root` user
            root.password = "";

            # Create a system user that matches the database user so that we
            # can use peer authentication.  The tutorial defines a password,
            # but it's not necessary.
            "${username}".isSystemUser = true;
          };
        };

        systemd.services.postgrest = {
          wantedBy = [ "multi-user.target" ];

          after = [ "postgresql.service" ];

          script =
            let
              configuration = pkgs.writeText "tutorial.conf" ''
                db-uri = "postgres://${username}:${password}@localhost:${toString config.services.postgresql.port}/${database}"
                db-schema = "${schema}"
                db-anon-role = "${username}"
              '';

            in
              ''
                ${pkgs.haskellPackages.postgrest}/bin/postgrest ${configuration}
              '';

          serviceConfig.User = username;
        };

        # Uncomment the next line for running QEMU on a non-graphical system
        # virtualisation.graphics = false;
      };
    };

in
  nixos.vm

We can then build and run this tutorial virtual machine by running the following commands:

$ nix build --file ./postgrest-tutorial.nix

$ QEMU_NET_OPTS='hostfwd=tcp::3000-:3000' result/bin/run-nixos-vm

That spins up a VM and prompts us to log in when the VM is ready:

<<< Welcome to NixOS 20.09pre-git (x86_64) - ttyS0 >>>

Run 'nixos-help' for the NixOS manual.

nixos login: 

However, before we log in, we can test if postgrest is working using the same curl command from the tutorial:

$ curl http://localhost:3000/todos
{"hint":null,"details":null,"code":"42501","message":"permission denied for schema api"}

Wait, what? We were supposed to get:

[
  {
    "id": 1,
    "done": false,
    "task": "finish tutorial 0",
    "due": null
  },
  {
    "id": 2,
    "done": false,
    "task": "pat self on back",
    "due": null
  }
]

… but apparently something is wrong with the database’s permissions.

Fortunately, we can log into the VM as the root user with an empty password to test the database permissions. Once we log into the system we can further log into the database as the authenticator user:

<<< Welcome to NixOS 20.09pre-git (x86_64) - ttyS0 >>>

Run 'nixos-help' for the NixOS manual.

nixos login: root<Enter>
Password: <Enter>

[root@nixos:~]# sudo --user authenticator psql postgres
psql (11.9)
Type "help" for help.

postgres=> 

Now we can test to see if the authenticator user is able to access the api.todos table:

postgres=> SELECT * FROM api.todos;
ERROR:  permission denied for schema api
LINE 1: SELECT * FROM api.todos;

Good: we can reproduce the problem, but what might be the cause?

As it turns out, the tutorial instructions appear to not configure the authenticator role correctly. Specifically, the noinherit in the following commands is the reason we can’t directly access the schema api:

create role authenticator noinherit login password 'mysecretpassword';
grant web_anon to authenticator;

The noinherit setting prevents the authenticator user from automatically assuming all permissions associated with the web_anon user. Instead, the authenticator user has to explicitly use the SET ROLE command to assume such permissions, and we can verify that at the database prompt:

postgres=> SET ROLE web_anon;
SET
postgres=> SELECT * FROM api.todos;
 id | done |       task        | due 
----+------+-------------------+-----
  1 | f    | finish tutorial 0 | 
  2 | f    | pat self on back  | 
(2 rows)

Mystery solved! We can test our hypothesis by changing that noinherit to inherit:

create role authenticator inherit login password 'mysecretpassword';
grant web_anon to authenticator;

… then we can restart the VM to check that things now work by:

  • typing CTRL-a c and entering quit

  • running the following commands:

    $ rm nixos.qcow2  # Remove the old VM's disk image so we start fresh
    $ nix build --file ./postgrest-tutorial.nix  # The rest is the same as before
    $ QEMU_NET_OPTS='hostfwd=tcp::3000-:3000' result/bin/run-nixos-vm

… and now the curl example from the tutorial works:

$ curl http://localhost:3000/todos
[{"id":1,"done":false,"task":"finish tutorial 0","due":null}, 
 {"id":2,"done":false,"task":"pat self on back","due":null}]

But wait, there’s more!

Automated testing

We don’t have to manually setup/teardown VMs and run curl commands. We can automate the entire process from end-to-end by using NixOS’s support for automated integration tests.

If we follow the instructions from the NixOS manual, then the automated integration test looks like this:

# ./postgrest-tutorial.nix

let
  # For extra determinism
  nixpkgs =
    builtins.fetchTarball {
      url = "https://github.com/NixOS/nixpkgs/archive/58f9c4c7d3a42c912362ca68577162e38ea8edfb.tar.gz";

      sha256 = "1517dy07jf4zhzknqbgm617lgjxsn7a6k1vgq61c67f6h55qs5ij";
    };

  # Single source of truth for all tutorial constants
  database      = "postgres";
  schema        = "api";
  table         = "todos";
  username      = "authenticator";
  password      = "mysecretpassword";
  webRole       = "web_anon";
  postgrestPort = 3000;

in
  import "${nixpkgs}/nixos/tests/make-test-python.nix" ({ pkgs, ...}: {
    system = "x86_64-linux";

    nodes = {
      server = { config, pkgs, ... }: {
        # Open the default port for `postgrest` in the firewall
        networking.firewall.allowedTCPPorts = [ postgrestPort ];

        services.postgresql = {
          enable = true;

          initialScript = pkgs.writeText "initialScript.sql" ''
            create schema ${schema};

            create table ${schema}.${table} (
              id serial primary key,
              done boolean not null default false,
              task text not null,
              due timestamptz
            );

            insert into ${schema}.${table} (task) values
              ('finish tutorial 0'), ('pat self on back');

            create role ${webRole} nologin;

            grant usage on schema ${schema} to ${webRole};
            grant select on ${schema}.${table} to ${webRole};

            create role ${username} inherit login password '${password}';
            grant ${webRole} to ${username};
          '';
        };

        users = {
          mutableUsers = false;

          users = {
            # For ease of debugging the VM as the `root` user
            root.password = "";

            # Create a system user that matches the database user so that we
            # can use peer authentication.  The tutorial defines a password,
            # but it's not necessary.
            "${username}".isSystemUser = true;
          };
        };

        systemd.services.postgrest = {
          wantedBy = [ "multi-user.target" ];

          after = [ "postgresql.service" ];

          script =
            let
              configuration = pkgs.writeText "tutorial.conf" ''
                db-uri = "postgres://${username}:${password}@localhost:${toString config.services.postgresql.port}/${database}"
                db-schema = "${schema}"
                db-anon-role = "${username}"
              '';

            in
              ''
                ${pkgs.haskellPackages.postgrest}/bin/postgrest ${configuration}
              '';

          serviceConfig.User = username;
        };

        # Uncomment the next line for running QEMU on a non-graphical system
        # virtualisation.graphics = false;
      };

      client = { };
    };

    testScript =
      ''
      import json
      import sys

      start_all()

      server.wait_for_open_port(${toString postgrestPort})

      expected = [
          {"id": 1, "done": False, "task": "finish tutorial 0", "due": None},
          {"id": 2, "done": False, "task": "pat self on back", "due": None},
      ]

      actual = json.loads(
          client.succeed(
              "${pkgs.curl}/bin/curl http://server:${toString postgrestPort}/${table}"
          )
      )

      if expected != actual:
          sys.exit(1)
      '';
  })

… and you can run the test with the following command:

$ nix build --file ./postgrest-tutorial.nix

… which will silently succeed with a 0 exit code if the test passes, or fail with an error message otherwise.

The above example highlights a few neat aspects of the NixOS test framework:

  • You can test more than one VM at a time

    The above test creates two VMs:

    • One VM named server which hosts postgres + postgrest

    • One VM named client where we initiate our curl commands

    … so that we can verify that everything works even when curl is run from a separate machine. For example, this comes in handy for testing firewall rules.

  • You can write the test and orchestration logic in Python

    This means that we can use Python not only to run the curl subprocess, but to also compare the result against a golden JSON output.

Conclusion

This NixOS test framework is streets ahead of other integration test frameworks that I’ve worked with:

  • The test is deterministic

    The above example will continue to work a decade from now because all transitive dependencies are fully pinned by the NixOS specification.

  • The test is reproducible

    We don’t need to specify out-of-band instructions for how to obtain or install test dependencies. The only thing users globally install is Nix.

  • The test is compact

    The whole thing fits in a single 120-line file with generous whitespace and formatting (although you have the option of splitting into more files if you prefer)

  • The test is fully isolated

    The test does not mutate any shared resources or files and the test runs within an isolated network, so we can run multiple integration tests in parallel on the same machine for building a test matrix.

  • The test is fast

    You might think that a VM-based test is slow compare to a container-based one, but the entire test run, including VM setup and teardown, only takes about 10 seconds.

  • The test is written in a fully-featured language

    We can use Nix’s support for programming language features to reduce repetition. For example, this is why we can consolidate all test constants to be defined in one place so that there is a single source of truth for everything.

So if you’re already trying out Nix, I highly encourage you to give the NixOS integration test framework a try for the above reasons.

3 comments:

  1. Such a great post, thank you for sharing! Your presentation on Twitch.tv showing how to run a QEMU Kafka server was fantastic too.

    ReplyDelete
  2. Hey Gabriel, thanks a ton for this tutorial, very well written!

    Here are a few tips for anyone trying to follow the tutorial on MacOS:

    1. QEMU on MacOS does not support virtfs, so it is not possible to run NixOS tests natively: https://github.com/NixOS/nixpkgs/issues/64578#issuecomment-668102681
    2. The workaround is running them in a Docker container like so:

    $ docker run --name docker-qemu --detach --restart unless-stopped -it nixos/nix sh
    $ docker exec -it docker-qemu sh
    0c1e2bb993ce:/# echo "system-features = nixos-test benchmark big-parallel kvm" >> /etc/nix/nix.conf
    0c1e2bb993ce:/# nix-env -i vim coreutils curl
    0c1e2bb993ce:~# mkdir -p /root/tutorial
    0c1e2bb993ce:~# cd /root/tutorial/
    0c1e2bb993ce:~/tutorial# vim postgrest-tutorial.nix
    # uncomment `virtualisation.graphics = false;`
    0c1e2bb993ce:~/tutorial# nix build --file ./postgrest-tutorial.nix
    0c1e2bb993ce:~/tutorial# QEMU_NET_OPTS='hostfwd=tcp::3000-:3000' result/bin/run-nixos-vm

    Additionally, to make the tests pass, I needed to add "sleep(30)" after "server.wait_for_open_port".

    The major drawback for running Docker -> QEMU is that it uses software virtualization and the build is rather slow, around 100 instead of 10 seconds.

    ReplyDelete
  3. For completeness sake, I also tried VirtualBox -> QEMU on MacOS, expecting it to be faster than Docker, but it really isn't. It's even a bit slower.

    Here are the steps I took:

    1. Install VirtualBox 6.1.16 from https://www.virtualbox.org/wiki/Downloads.
    2. Import NixOS VirtualBox OVA image from https://nixos.org/download.html#nixos-virtualbox.
    4. Start the Image, open Konsole terminal and enable OpenSSH in /etc/nixos/configuration.nix.
    You need sudo, password is `demo`.
    5. `sudo nixos-rebuild switch`.
    6. Reboot the image and note down the image IP by running `ifconfig` in Konsole.
    7. Network -> Adapter 1 -> Advanced -> Port Forwarding -> Click on "+":
    Host Port: 1111, Guest Port: 22, leave the host IP and guest IP blank
    8. `ssh -p 2222 demo@localhost` (password is `demo`):

    ```
    [demo@nixos:~]$ nix-env -i vim
    [demo@nixos:~]$ vim ./postgrest-tutorial.nix # uncomment `virtualisation.graphics = false;`
    [demo@nixos:~]$ nix build --file ./postgrest-tutorial.nix

    ReplyDelete