Our team at Arista Networks is happy to announce nix-serve-ng, a
backwards-compatible Haskell rewrite of nix-serve (a service for hosting
a /nix/store as a binary cache). It provides better reliability and
performance than nix-serve (ranging from ≈ 1.5× to 32× faster). We wrote
nix-serve-ng to fix scaling bottlenecks in our cache, and we expect other
large-scale deployments might be interested in this project, too.
This post will focus more on the background behind the development
process and comparisons to other Nix cache implementations. If you don’t
care about any of that, you can get started by following the
instructions in the
repository’s README.
Background
Before we began this project, there were at least two other open
source rewrites of nix-serve that we could have adopted
instead of the original nix-serve:
- eris: a Perl rewrite of nix-serve
  - Note: the original nix-serve is implemented in Perl, and eris is also implemented in Perl, using a different framework.
- harmonia: a Rust rewrite of nix-serve
The main reason we did not go with these two alternatives is that
they are not drop-in replacements for the original
nix-serve. We could have fixed that, but given how
simple nix-serve is, I figured it would be easier
to just create our own. The initial version of nix-serve-ng took only a
couple of days, followed by maybe a week of fixes and
performance tuning.
We did not evaluate the performance or reliability of
eris or harmonia before embarking on our own
nix-serve replacement. However, after
nix-serve-ng was done we learned that it was significantly
faster than the alternatives (see the Performance section below). Some of those
performance differences are probably fixable, especially for
harmonia. That said, we are very happy with the quality of
our solution.
Backwards compatibility
One important design goal for this project is to be largely
backwards compatible with nix-serve. We went to great
lengths to preserve compatibility, including:
- Naming the built executable nix-serve
  - Yes, even though the project name is nix-serve-ng, the executable built by the project is named nix-serve.
- Preserving most of the original command-line options, including legacy options
  - … even though some are unused.
In most cases you can literally replace pkgs.nix-serve
with pkgs.nix-serve-ng and it will “just work”. You can
even continue to use the existing services.nix-serve NixOS
options.
The biggest compatibility regression is that
nix-serve-ng cannot be built on macOS. It is extremely
close to supporting macOS, save for this one bug in Haskell’s
hsc2hs tool: haskell/hsc2hs#26. We left in all of the macOS shims
so that if that bug is ever fixed, we can get macOS support easily.
For more details on the exact differences compared to
nix-serve, see the Result /
Backwards-compatibility section of the README.
Performance
nix-serve-ng is faster than all of the alternatives
according to both our formal benchmarks and our informal testing. The
“Benchmarks” section of our README has the complete breakdown, but
the relevant part is this table:
Speedups relative to nix-serve (higher is faster; nix-serve = 1.0):
| Benchmark | nix-serve | eris | harmonia | nix-serve-ng | 
|---|---|---|---|---|
| Fetch present NAR info ×10 | 1.0 | 0.05 | 1.33 | 1.58 | 
| Fetch absent NAR info ×1 | 1.0 | 0.06 | 1.53 | 1.84 | 
| Fetch empty NAR ×10 | 1.0 | 0.67 | 0.59 | 31.80 | 
| Fetch 10 MB NAR ×10 | 1.0 | 0.64 | 0.60 | 3.35 | 
… which I can summarize like this:
- nix-serve-ng is faster than all of the alternatives across all use cases
- eris is slower than the original nix-serve across all use cases
- harmonia is faster than the original nix-serve for NAR info lookups, but slower for fetching NARs
These performance results were surprising for a few reasons:
- I was not expecting eris to be slower than the original nix-serve implementation
  - … especially not NAR info lookups to be ≈ 20× slower. This is significant because NAR info lookups typically dominate a Nix cache’s performance. In my (informal) experience, the majority of a Nix cache’s time is spent addressing failed cache lookups.
- I was not expecting harmonia (the Rust rewrite) to be slower than the original nix-serve for fetching NARs
  - This seems like something that should be fixable. harmonia will probably eventually match our performance because Rust has a high performance ceiling.
- I was not expecting a ≈ 30× speedup for nix-serve-ng fetching small NARs
  - I had to triple-check that neither nix-serve-ng nor the benchmark was broken when I saw this speedup.

So I investigated these performance differences to help other implementations know what to be mindful of.
Performance insights
We didn’t get these kinds of speed-ups by being completely oblivious to performance. Here are the things that we paid special attention to in order to keep things efficient, from lowest-hanging to highest-hanging fruit:
- Don’t read the secret key file on every NAR fetch
  - This is a silly thing that the original nix-serve does, and it is the easiest thing to fix. eris and harmonia also fix this, so this optimization is not unique to our rewrite.
- We bind directly to the Nix C++ API for fetching NARs
  - nix-serve, eris, and harmonia all shell out to a subprocess to fetch NARs, invoking either nix dump-path or nix-store --dump to do the heavy lifting. In contrast, nix-serve-ng binds to the Nix C++ API for this purpose.
  - This would definitely explain some of the performance difference when fetching NARs. Creating a subprocess has a fixed overhead regardless of the size of the NAR, which explains why we see the largest performance difference when fetching tiny NARs: the overhead of creating a subprocess dominates the response time.
  - This may also reduce throughput when serving large NAR files, by adding unnecessary memory copies and buffering while streaming the subprocess output.
- We minimize memory copies when fetching NARs
  - We go to great lengths to minimize the number of intermediate buffers and copies when streaming the contents of a NAR to a client. To do this, we exploit the fact that Haskell’s foreign function interface works in both directions: Haskell code can call C++ code, and C++ code can also call Haskell code. This means that we can create a Nix C++ streaming sink from a Haskell callback function, which eliminates the need for intermediate buffers.
  - This likely also improves the throughput for serving NAR files. Only nix-serve-ng performs this optimization (since nix-serve-ng is the only one that uses the C++ API for streaming NAR contents).
- Hand-write the API routing logic
  - We hand-write all of the API routing logic to prioritize and optimize the hot path (fetching NAR info).
  - For example, a really simple thing that the original nix-serve does inefficiently is to check whether the path matches /nix-cache-info first, even though that is an extremely infrequently used path. In our API routing logic we move that check to the very end.
  - These optimizations likely improve the performance of NAR info requests. As far as I can tell, only nix-serve-ng performs these optimizations.
I have not benchmarked the performance impact of each of these changes in isolation, though. These observations are purely based on my intuition.
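The first optimization is mostly about *when* the key is read. Here is a minimal sketch, assuming for illustration that the key is used to add a Sig: line to NAR info responses: read the key file once at startup and capture the key in the handler closure, so serving a request never touches the file again. SecretKey, loadSecretKey, signNarInfo, and makeNarInfoHandler are hypothetical names (not nix-serve-ng’s code), and the “signature” is a placeholder rather than real Ed25519 signing.

```haskell
import Control.Exception (evaluate)

-- | Hypothetical wrapper around the contents of a Nix secret key file
-- (conventionally "<key-name>:<base64-key>").
newtype SecretKey = SecretKey String

-- | Read the secret key file exactly once. Forcing the whole lazily-read
-- string reaches EOF, which also closes the underlying file handle.
loadSecretKey :: FilePath -> IO SecretKey
loadSecretKey path = do
  contents <- readFile path
  _ <- evaluate (length contents)
  return (SecretKey (takeWhile (/= '\n') contents))

-- | Placeholder signer: a real server would compute an Ed25519 signature
-- over the NAR info fingerprint; here we only record which key was used.
signNarInfo :: SecretKey -> String -> String
signNarInfo (SecretKey key) body =
  body ++ "\nSig: " ++ takeWhile (/= ':') key ++ ":<placeholder>"

-- | Build the handler once at startup. The loaded key is captured in the
-- returned closure, so handling a request does no file I/O at all.
makeNarInfoHandler :: FilePath -> IO (String -> String)
makeNarInfoHandler keyPath = do
  key <- loadSecretKey keyPath
  return (signNarInfo key)
```

You would call makeNarInfoHandler once in main and hand the resulting function to whatever HTTP layer serves requests.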
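To see why a fixed per-request cost matters most for tiny NARs, consider a deliberately simple cost model (an assumption for illustration, not a measurement): serving a NAR of size $s$ costs a fixed overhead $c$ plus $s / B$ at streaming throughput $B$. Writing “sub” for an implementation that spawns a subprocess and “api” for one that calls the Nix C++ API in-process, the speedup and its two limits are:

$$
\text{speedup}(s) = \frac{c_{\text{sub}} + s / B_{\text{sub}}}{c_{\text{api}} + s / B_{\text{api}}},
\qquad
\lim_{s \to 0} \text{speedup}(s) = \frac{c_{\text{sub}}}{c_{\text{api}}},
\qquad
\lim_{s \to \infty} \text{speedup}(s) = \frac{B_{\text{api}}}{B_{\text{sub}}}
$$

For tiny NARs the ratio of fixed overheads dominates, while for large NARs only the throughput ratio remains, which is consistent with the speedup shrinking from 31.80× for the empty NAR to 3.35× for the 10 MB NAR in the benchmark table. The model ignores everything else (HTTP handling, disk I/O, language runtime), so treat it as intuition rather than an explanation of the exact numbers.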
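The FFI trick behind the copy-minimizing optimization can be sketched in a self-contained way. GHC’s foreign import ccall "wrapper" turns a Haskell closure into a C function pointer that C or C++ code can call once per chunk, passing a pointer and a length. Because this sketch has no real C++ side, a foreign import ccall "dynamic" and a hypothetical fakeProducer stand in for the Nix streaming sink. SinkCallback, withStreamingSink, fakeProducer, and tallyBytes are illustrative names, not nix-serve-ng’s code or the Nix C++ API.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Control.Exception (bracket)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Word (Word8)
import Foreign.C.Types (CSize (..))
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (FunPtr, Ptr, freeHaskellFunPtr)
import Foreign.Storable (peekByteOff)

-- | A C-callable sink: called once per chunk with a pointer and a length.
type SinkCallback = Ptr Word8 -> CSize -> IO ()

-- | GHC generates a C function pointer that runs the Haskell closure when called.
foreign import ccall "wrapper"
  mkSinkCallback :: SinkCallback -> IO (FunPtr SinkCallback)

-- | Calls a C function pointer from Haskell. Only needed here because
-- fakeProducer plays the role of the C++ code in this self-contained sketch.
foreign import ccall "dynamic"
  callSinkCallback :: FunPtr SinkCallback -> SinkCallback

-- | Expose a Haskell chunk handler as a FunPtr for the duration of an action,
-- then free it (the FunPtr must not be called after it has been freed).
withStreamingSink
  :: (Ptr Word8 -> Int -> IO ()) -> (FunPtr SinkCallback -> IO a) -> IO a
withStreamingSink onChunk =
  bracket
    (mkSinkCallback (\ptr len -> onChunk ptr (fromIntegral len)))
    freeHaskellFunPtr

-- | Hypothetical stand-in for the C++ side: hands each chunk's memory to
-- the sink by pointer, so the sink reads it in place.
fakeProducer :: [[Word8]] -> FunPtr SinkCallback -> IO ()
fakeProducer chunks sink =
  mapM_
    (\chunk -> withArrayLen chunk (\n ptr -> callSinkCallback sink ptr (fromIntegral n)))
    chunks

-- | Example chunk handler: reads bytes directly from the caller's buffer and
-- records (bytes seen, sum of byte values) without building a copy.
tallyBytes :: IORef (Int, Int) -> Ptr Word8 -> Int -> IO ()
tallyBytes ref ptr len = do
  total <- go 0 0
  modifyIORef' ref (\(count, acc) -> (count + len, acc + total))
  where
    go :: Int -> Int -> IO Int
    go i acc
      | i >= len = return acc
      | otherwise = do
          byte <- peekByteOff ptr i :: IO Word8
          go (i + 1) (acc + fromIntegral byte)
```

In a real server, the chunk handler could instead write each chunk straight to the client, for example with System.IO.hPutBuf, which takes a pointer and a byte count. The bracket matters: the FunPtr has to stay valid for as long as the foreign side may call it, and must be freed afterwards to avoid leaking the generated stub.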
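The routing optimization boils down to ordering: test for the most frequent request first and the rarest last. A minimal sketch, assuming only three simplified route shapes ("/<hash>.narinfo", "/nar/<name>.nar", and "/nix-cache-info"; this is not an exhaustive list of the endpoints the real services expose). Route, stripSuffix, and route are hypothetical names.

```haskell
import Data.List (stripPrefix)

-- | Simplified route shapes, assumed for illustration.
data Route
  = NarInfo String -- hot path: NAR info lookups (hits and misses)
  | Nar String     -- NAR downloads
  | CacheInfo      -- rarely requested cache metadata
  | NotFound
  deriving (Eq, Show)

-- | Like 'stripPrefix', but for suffixes.
stripSuffix :: String -> String -> Maybe String
stripSuffix suffix s = reverse <$> stripPrefix (reverse suffix) (reverse s)

-- | Guards are tried top to bottom, so the most frequent request is matched
-- first and the rare "/nix-cache-info" comparison is deferred to the end.
route :: String -> Route
route path
  | Just rest <- stripPrefix "/" path
  , Just hash <- stripSuffix ".narinfo" rest
  , not (null hash)
  , '/' `notElem` hash
  = NarInfo hash
  | Just rest <- stripPrefix "/nar/" path
  , Just name <- stripSuffix ".nar" rest
  , not (null name)
  = Nar name
  | path == "/nix-cache-info" = CacheInfo
  | otherwise = NotFound
```

A production router would more likely match on the raw ByteString request path than on String, but the ordering idea carries over unchanged.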
Features
nix-serve-ng is not all upsides. In particular,
nix-serve-ng is missing features that some of the other
rewrites provide, such as:
- Greater configurability
- Improved authentication support
- Monitoring/diagnostics/status APIs
Our focus was entirely on scalability, so the primary reason to use
nix-serve-ng is if you prioritize performance and
uptime.
Conclusion
We’ve been using nix-serve-ng long enough internally
that we feel confident endorsing its use outside our company. We run a
particularly large Nix deployment internally (which is why we needed
this in the first place), so we have stress tested
nix-serve-ng considerably under heavy and realistic usage
patterns.
You can get started by following these instructions, and let us know if you run into any issues or difficulties.
Also, I want to thank Arista Networks for graciously sponsoring our team to work on and open source this project.