Our team at Arista Networks
is happy to announce nix-serve-ng, a backwards-compatible Haskell
rewrite of nix-serve
(a service for hosting a /nix/store
as a binary cache). It provides better reliability and performance
than nix-serve (ranging from ≈ 1.5× to 32× faster). We wrote
nix-serve-ng to fix scaling bottlenecks in our cache and we expect other
large-scale deployments might be interested in this project, too.
This post will focus more on the background behind the development
process and comparisons to other Nix cache implementations. If you don’t
care about any of that then you can get started by following the
instructions in the repository’s README.
Background
Before we began this project there were at least two other open
source rewrites of nix-serve that we could have adopted
instead of nix-serve:
- eris - A Perl rewrite of nix-serve
  Note: the original nix-serve is implemented in Perl, and eris is also implemented in Perl using a different framework.
- harmonia - A Rust rewrite of nix-serve
The main reason we did not go with these two alternatives is that
they are not drop-in replacements for the original nix-serve. We could
have fixed that, but given how simple nix-serve is I figured that it
would be simpler to just create our own. nix-serve-ng only took a couple
of days for the initial version and maybe a week of follow-up fixes and
performance tuning.
We did not evaluate the performance or reliability of eris or harmonia
before embarking on our own nix-serve replacement. However, after
nix-serve-ng was done we learned that it was significantly
faster than the alternatives (see the Performance section below). Some of those
performance differences are probably fixable, especially for harmonia.
That said, we are very happy with the quality of our solution.
Backwards compatibility
One important design goal for this project is to be significantly
backwards compatible with nix-serve. We went to great
lengths to preserve compatibility, including:
- Naming the built executable nix-serve
  Yes, even though the project name is nix-serve-ng, the executable built by the project is named nix-serve.
- Preserving most of the original command-line options, including legacy options
  … even though some are unused.
In most cases you can literally replace pkgs.nix-serve
with pkgs.nix-serve-ng
and it will “just work”. You can
even continue to use the existing services.nix-serve
NixOS
options.
The biggest compatibility regression is that nix-serve-ng
cannot be built on MacOS. It is extremely
close to supporting MacOS save for this one bug in Haskell’s
hsc2hs tool: haskell/hsc2hs - #26. We left in all of the MacOS shims
so that if that bug is ever fixed then we can get MacOS support easily.
For more details on the exact differences compared to nix-serve, see the
Result / Backwards-compatibility section of the README.
Performance
nix-serve-ng is faster than all of the alternatives
according to both our formal benchmarks and also informal testing. The
“Benchmarks” section of our README has the complete breakdown but
the relevant part is this table:
Speedups (compared to nix-serve):
Benchmark | nix-serve | eris | harmonia | nix-serve-ng |
---|---|---|---|---|
Fetch present NAR info ×10 | 1.0 | 0.05 | 1.33 | 1.58 |
Fetch absent NAR info ×1 | 1.0 | 0.06 | 1.53 | 1.84 |
Fetch empty NAR ×10 | 1.0 | 0.67 | 0.59 | 31.80 |
Fetch 10 MB NAR ×10 | 1.0 | 0.64 | 0.60 | 3.35 |
… which I can summarize like this:
- nix-serve-ng is faster than all of the alternatives across all use cases
- eris is slower than the original nix-serve across all use cases
- harmonia is faster than the original nix-serve for NAR info lookups, but slower for fetching NARs
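The summary above can be checked mechanically against the table. The following is a small Python sketch (not part of the project; the dictionary simply transcribes the table, and the helper names `fastest` and `slowdown` are made up for illustration) that picks the fastest implementation per benchmark and inverts a speedup below 1.0 into a "times slower" factor:

```python
# Speedups relative to nix-serve, transcribed from the table above.
SPEEDUPS = {
    "Fetch present NAR info ×10": {"nix-serve": 1.0, "eris": 0.05, "harmonia": 1.33, "nix-serve-ng": 1.58},
    "Fetch absent NAR info ×1": {"nix-serve": 1.0, "eris": 0.06, "harmonia": 1.53, "nix-serve-ng": 1.84},
    "Fetch empty NAR ×10": {"nix-serve": 1.0, "eris": 0.67, "harmonia": 0.59, "nix-serve-ng": 31.80},
    "Fetch 10 MB NAR ×10": {"nix-serve": 1.0, "eris": 0.64, "harmonia": 0.60, "nix-serve-ng": 3.35},
}


def fastest(results):
    """Return the implementation with the highest speedup."""
    return max(results, key=results.get)


def slowdown(speedup):
    """Convert a speedup below 1.0 into a 'times slower' factor."""
    return 1.0 / speedup


for benchmark, results in SPEEDUPS.items():
    print(f"{benchmark}: fastest is {fastest(results)}")

# eris on present NAR info lookups: 1 / 0.05, i.e. about 20 times slower.
eris_factor = slowdown(SPEEDUPS["Fetch present NAR info ×10"]["eris"])
print(f"eris NAR info slowdown: about {round(eris_factor)}x")
```

A speedup of 0.05 means the request takes 1 / 0.05 times as long as it does with nix-serve, which is where a figure of roughly 20× slower comes from.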
These performance results were surprising for a few reasons:
- I was not expecting eris to be slower than the original nix-serve implementation
  … especially not NAR info lookups to be ≈ 20× slower. This is significant because NAR info lookups typically dominate a Nix cache’s performance. In my (informal) experience, the majority of a Nix cache’s time is spent addressing failed cache lookups.
- I was not expecting harmonia (the Rust rewrite) to be slower than the original nix-serve for fetching NARs
  This seems like something that should be fixable. harmonia will probably eventually match our performance because Rust has a high performance ceiling.
- I was not expecting a ≈ 30× speedup for nix-serve-ng fetching small NARs
  I had to triple-check that neither nix-serve-ng nor the benchmark was broken when I saw this speedup.
So I investigated these performance differences to help the authors of other implementations know what to be mindful of.
Performance insights
We didn’t get these kinds of speed-ups by being completely oblivious to performance. Here are the things that we paid special attention to in order to keep things efficient, ordered from lowest-hanging to highest-hanging fruit:
- Don’t read the secret key file on every NAR fetch
  This is a silly thing that the original nix-serve does that is the easiest thing to fix. eris and harmonia also fix this, so this optimization is not unique to our rewrite.
- We bind directly to the Nix C++ API for fetching NARs
  nix-serve, eris, and harmonia all shell out to a subprocess to fetch NARs, by invoking either nix dump-path or nix-store --dump to do the heavy lifting. In contrast, nix-serve-ng binds to the Nix C++ API for this purpose.
  This would definitely explain some of the performance difference when fetching NARs. Creating a subprocess has a fixed overhead regardless of the size of the NAR, which explains why we see the largest performance difference when fetching tiny NARs since the overhead of creating a subprocess would dominate the response time.
  This may also affect throughput for serving large NAR files, too, by adding unnecessary memory copies/buffering as part of streaming the subprocess output.
- We minimize memory copies when fetching NARs
  We go to great lengths to minimize the number of intermediate buffers and copies when streaming the contents of a NAR to a client. To do this, we exploit the fact that Haskell’s foreign function interface works in both directions: Haskell code can call C++ code but also C++ code can call Haskell code. This means that we can create a Nix C++ streaming sink from a Haskell callback function and this eliminates the need for intermediate buffers.
  This likely also improves the throughput for serving NAR files. Only nix-serve-ng performs this optimization (since nix-serve-ng is the only one that uses the C++ API for streaming NAR contents).
- Hand-write the API routing logic
  We hand-write all of the API routing logic to prioritize and optimize the hot path (fetching NAR info).
  For example, a really simple thing that the original nix-serve does inefficiently is to check if the path matches /nix-cache-info first, even though that is an extremely infrequently used path. In our API routing logic we move that check straight to the very end.
  These optimizations likely improve the performance of NAR info requests. As far as I can tell, only nix-serve-ng performs these optimizations.
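To illustrate the routing order described in the last item, here is a minimal Python sketch. It is not the project's Haskell code: the function name `route`, the returned labels, and the exact path shapes (`/<hash>.narinfo`, `/nar/...`) are assumptions made for the example. The only point is the ordering, with the most frequent request tested first and /nix-cache-info tested last:

```python
def route(path: str) -> str:
    """Classify a request path, testing the hottest route first."""
    # Hot path: NAR info lookups dominate traffic, so they are tested first.
    if path.endswith(".narinfo"):
        return "narinfo"
    # Next most common: NAR downloads (path prefix assumed for illustration).
    if path.startswith("/nar/"):
        return "nar"
    # Cold path: requested very rarely, so the check goes at the very end.
    if path == "/nix-cache-info":
        return "cache-info"
    return "not-found"


for example in ["/abc123.narinfo", "/nar/abc123.nar", "/nix-cache-info", "/other"]:
    print(example, "->", route(example))
```

With this ordering a NAR info request is classified after a single comparison, while the rarely requested /nix-cache-info path pays for the two checks ahead of it.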
I have not benchmarked the performance impact of each of these changes in isolation, though. These observations are purely based on my intuition.
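One rough way to get a feel for the fixed subprocess cost mentioned above is to time a child process that does nothing. The Python sketch below is not one of the README's benchmarks and says nothing about Nix itself. It spawns the Python interpreter as a stand-in child process (chosen only so that the example runs anywhere, and likely heavier to start than nix-store) and compares that with an in-process function call; the helper names are hypothetical:

```python
import subprocess
import sys
import time


def time_per_call(action, repetitions=20):
    """Average wall-clock seconds per invocation of `action`."""
    start = time.perf_counter()
    for _ in range(repetitions):
        action()
    return (time.perf_counter() - start) / repetitions


def spawn_noop():
    # Start a child process that exits immediately: all time is startup cost.
    subprocess.run([sys.executable, "-c", "pass"], check=True)


def in_process_noop():
    # The same "work" (nothing) without leaving the current process.
    pass


spawn_cost = time_per_call(spawn_noop)
call_cost = time_per_call(in_process_noop)
print(f"subprocess: {spawn_cost * 1e3:.2f} ms per call")
print(f"in-process: {call_cost * 1e6:.2f} us per call")
```

Because the spawn cost does not shrink with the payload, it weighs most heavily on the smallest responses, which is consistent with the empty-NAR benchmark showing the largest gap.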
Features
nix-serve-ng is not all upsides. In particular, nix-serve-ng
is missing features that some of the other
rewrites provide, such as:
- Greater configurability
- Improved authentication support
- Monitoring/diagnostics/status APIs
Our focus was entirely on scalability, so the primary reason to use
nix-serve-ng is if you prioritize performance and uptime.
Conclusion
We’ve been using nix-serve-ng long enough internally
that we feel confident endorsing its use outside our company. We run a
particularly large Nix deployment internally (which is why we needed
this in the first place), so we have stress tested nix-serve-ng
considerably under heavy and realistic usage patterns.
You can get started by following these instructions, and let us know if you run into any issues or difficulties.
Also, I want to thank Arista Networks for graciously sponsoring our team to work on and open source this project.