I’ve been banging my head for a while on the challenge of building a type checker for Nix. The purpose of this post is to summarize my thoughts on this subject so far since they might prove useful to other people grappling with the same problem. In this post I’ll assume familiarity with Nix idioms and the Nix ecosystem.
Stack traces are not enough
Nix has one key advantage: purity. This means that you can safely detect errors in code by just running the code (with some caveats I won’t go into) and if anything goes wrong you get a stack trace.
This approach to error detection still has some limitations, which I wrote about in a prior blog post:
… but Nix’s stack traces still set the bar for any type-checker, meaning that any proposed type checker needs to produce error messages which are clearer and more informative than the stack traces that Nix currently produces.
“What’s the problem with stack traces?”, you might ask.
The issue is that stack traces do not work well for understanding errors “in the large”. Interpreting a typical stack trace requires a fairly sophisticated mental model of your dependencies, so the more dependencies you have (and the more complicated they are) the more difficulty interpreting the stack trace.
I’ll illustrate this using the following example which is inspired by a real error I ran into at work:
# ./module0.nix
{ nixpkgs.overlays = [
(self: super: {
# This is a common idiom in Nixpkgs to wrap Haskell packages in
# `justStaticExecutables` at the top level
fast-tags =
self.haskell.lib.justStaticExecutables self.haskellPackages.fast-tags;
})
];
}
# ./module1.nix
{ nixpkgs.overlays = [
(self: super: {
# In a separate overlay, create a variation on the `fast-tags` build with
# tests disabled
fast-tags-no-tests = self.haskell.lib.dontCheck super.fast-tags;
})
];
}
# ./example.nix
let
nixos = import <nixpkgs/nixos> {
configuration.imports = [
./module1.nix
./module0.nix
];
};
in
nixos.pkgs.fast-tags-no-tests
The above ./example.nix
builds just fine:
$ nix build --file ./example.nix
$ ./result/bin/fast-tags --version
fast-tags, version 2.0.1
… but now suppose that I modify ./example.nix
to sort the import list:
# ./example.nix
let
nixos = import <nixpkgs/nixos> {
configuration.imports = [
./module0.nix
./module1.nix
];
};
in
nixos.pkgs.fast-tags-no-tests
Now it fails to build, with the following error:
error: attribute 'fast-tags' missing
at /private/tmp/test/module1.nix:5:55:
4| # tests disabled
5| fast-tags-no-tests = self.haskell.lib.dontCheck super.fast-tags;
| ^
6| })
(use '--show-trace' to show detailed location information)
Okay, so why is the fast-tags
attribute missing? Let me add --show-trace
like the error message suggests to see what is going wrong:
error: attribute 'fast-tags' missing
at /private/tmp/test/module1.nix:5:55:
4| # tests disabled
5| fast-tags-no-tests = self.haskell.lib.dontCheck super.fast-tags;
| ^
6| })
… while evaluating 'overrideCabal'
at /nix/store/dchfgixlmxwq0w495w7xc39d65dyqg42-nixpkgs-22.05pre352357.98bb5b77c8c/nixpkgs/pkgs/development/haskell-modules/lib/compose.nix:38:22:
37| */
38| overrideCabal = f: drv: (drv.override (args: args // {
| ^
39| mkDerivation = drv: (args.mkDerivation drv).override f;
… from call site
at /private/tmp/test/module1.nix:5:28:
4| # tests disabled
5| fast-tags-no-tests = self.haskell.lib.dontCheck super.fast-tags;
| ^
6| })
… while evaluating the attribute 'pkgs.fast-tags-no-tests'
at /private/tmp/test/module1.nix:5:7:
4| # tests disabled
5| fast-tags-no-tests = self.haskell.lib.dontCheck super.fast-tags;
| ^
6| })
… while evaluating the file './example.nix':
Hmmm 🤔. That still does really not explain what went wrong.
The actual reason for the type error is that the order of overlays matters, but there’s no way that you would know that from looking at the stack trace. You would have to first understand how NixOS modules work and how the Nixpkgs overlay system works in order to correctly pinpoint the problem and even then it would still be tricky. I know because I am an experienced Nix user myself yet I still stumbled on an error message like this, albeit in the context of a larger codebase.
Moreover, the above example is not a contrived example that I hand-picked to make Nix look bad. This is idiomatic Nix code that uses modern conventions from Nixpkgs and NixOS exactly as the maintainers intended them to be used. The only thing that’s not idiomatic is using the NixOS module system to build just one package instead of building an entire NixOS system, but that’s only because I minimized this example from real code that built a complete NixOS system.
Types to the rescue?
There’s a deeper issue, though, which is that even if Nix had a type system the equivalent type error would have been almost as uninformative!
To see why, let’s imagine what a hypothetical informative type error might have looked like.
A great type error (for a command-line type checker) would have been something like this:
The fast-tags attribute defined within this overlay:
./module0.nix:7:6:
↓
fast-tags =
self.haskell.lib.justStaticExecutables self.haskellPackages.fast-tags;
… is not in scope within this other overlay:
./module1.nix:7:6:
↓
fast-tags-no-tests = self.haskell.lib.dontCheck super.fast-tags;
… because the latter overlay is ordered before the former overlay.
Suggestion: Perhaps wrap the latter overlay in `lib.mkAfter`, like this:
{ lib, ... }:
{ nixpkgs.overlays = lib.mkAfter [
(self: super: {
# In a separate overlay, create a variation on the `fast-tags` build with
# tests disabled
fast-tags-no-tests = self.haskell.lib.dontCheck super.fast-tags;
})
];
}
Nobody expects a type-checker for Nix to be that smart, but it’s instructive to consider why not.
Obviously, there’s no way for the type-checker to know that the user’s intent was to refer to the fast-tags
attribute in some other overlay defined within some unrelated file. Or is there?
Well, what if we had written the following simpler example that doesn’t use overlays and instead uses ordinary let
definitions to build up the final desired package:
let
pkgs = import <nixpkgs> { };
fast-tags =
pkgs.haskell.lib.justStaticExecutables pkgs.haskellPackages.fast-tags;
fast-tags-no-tests =
pkgs.haskell.lib.dontCheck fast-tags;
in
fast-tags-no-tests
As an aside, let’s admire how much more clear that example is. However, there are good reasons why Nixpkgs discourages this approach in the large which I won’t get into.
Now let’s take that example and reorder the two let
bindings:
let
pkgs = import <nixpkgs> { };
fast-tags-no-tests =
pkgs.haskell.lib.dontCheck fast-tags;
fast-tags =
pkgs.haskell.lib.justStaticExecutables pkgs.haskellPackages.fast-tags;
in
fast-tags-no-tests
Well, actually that still works 😅 because Nix permits out-of-order let
bindings so long as they are in the same let
block, but we can force the ordering by nesting the latter let
binding:
let
pkgs = import <nixpkgs> { };
fast-tags-no-tests =
pkgs.haskell.lib.dontCheck fast-tags;
in
let
fast-tags =
pkgs.haskell.lib.justStaticExecutables pkgs.haskellPackages.fast-tags;
in
fast-tags-no-tests
… which gives the following error we expected:
error: undefined variable 'fast-tags'
at /private/tmp/test/example.nix:5:32:
4| fast-tags-no-tests =
5| pkgs.haskell.lib.dontCheck fast-tags;
| ^
6|
This type error is already easier for the user to interpret, not because of any type system, but rather because using ordinary let
bindings forced us to structure our code so that a human can more easily discern what’s wrong. Now the let
bindings are plainly out of order, lexically.
Moreover, now a type-checker can easily look ahead and detect that there was a fast-tags
identifier defined shortly downstream that the user might have intended to use, so it’s completely realistic to expect an error message like this:
error: undefined variable 'fast-tags'
at /private/tmp/test/example.nix:5:32:
4| fast-tags-no-tests =
5| pkgs.haskell.lib.dontCheck fast-tags;
| ^
6|
Hint: Did you mean to refer to this `fast-tags` identifier defined afterwards
within the same file?
9| fast-tags =
| ^
10| pkgs.haskell.lib.justStaticExecutables pkgs.haskellPackages.fast-tags;
By restructuring definitions as ordinary let
bindings we’ve greatly reduced the difficulty of producing a great error message. No longer does the type-checker require a deep understanding of Nixpkgs overlays or NixOS modules; the type-checker only needs to understand how Nix the language works (specifically let
bindings).
Equally important, the end user also no longer requires an understanding of Nixpkgs overlays or NixOS modules to understand the type error. Even our hypothetical “great” error message still required our user to have some literacy with Nixpkgs and NixOS in order to interpret the error (albeit, not as much as the original stack trace).
The actual problem
The real issue with Nix isn’t the lack of a type checker. The absence of a type-checker is problematic, but in my view this is a symptom of a larger issue.
The fundamental problem that plagues all type-checking attempts for Nix is that nobody actually uses Nix the language at any significant scale. Instead, the community has adopted two sub-languages embedded within Nix for programming “in the large”:
Nixpkgs overlays
This is an embedded language that simulates object-oriented programming with inheritance / late binding / dynamic scope (depending on how you think about it)
NixOS modules
This is an embedded language that roughly emulates Terraform
Carefully note that these are not language features built into Nix; rather they are embedded domain-specific languages implemented within Nix. Consequently, a type-checker for “Nix the language” is not necessarily equipped to type-check these two sub-languages.
What about row polymorphism?
Let’s focus on one of those two sub-languages, the Nixpkgs overlay system, to see if we can still salvage things somewhat. The reason I suggest this is because the Nixpkgs overlay system is the far simpler sub-language of the two and the implementation of overlays is simple and tiny.
Technically, all you need to type-check the Nixpkgs overlays system is type system support for anonymous records and Nix’s //
operator. Modern type systems can support both of those features through the use of row polymorphism. This section assumes familiarity with row polymorphism, and you can skip this section if you aren’t familiar (it’s not essential), but if you would like to learn more, you can read the following resources:
Most people who are familiar with row polymorphism will wonder: can we use a type system with support for row polymorphism to type-check Nixpkgs overlays?
The answer is: sort of. We run into the same problems as stack traces: we catch errors, but the error messages and inferred types are less informative than we would hope.
To illustrate this, consider the default entrypoint to Nixpkgs, which is a function that takes an overlays
argument containing a list of overlays, like this:
let
= self: super: {
overlay0 = super.cowsay.overrideAttrs (old: {
cowsay = false;
doCheck
});
};
= self: super: {
overlay1 = super.hello.overrideAttrs (old: {
hello = "${self.cowsay}/bin/cowsay 'Installation complete'";
postInstall
});
};
= import <nixpkgs> { overlays = [ overlay0 overlay1 ]; };
pkgs
in
.hello pkgs
Now, ask yourself the following questions:
What would be the type of an overlay?
What would be the type of a list of overlays?
… such as
[ overlay0 overlay1 ]
in the above exampleWhat type should the
<nixpkgs>
function expect for itsoverlays
argument?How do the answers to the above questions affect inferred types and error messages?
I found these questions surprisingly difficult to answer! Again, the above example is not a contrived example I chose as a type-checking puzzle. This is the recommended way to use Nixpkgs.
I will go ahead and share what I think are the “right” answers that will get us closest to a type-checker for Nixpkgs with the fewest intrusive changes:
What would be the type of an overlay?
There would be no such thing as an “overlay type”. Each overlay would have a distinct type, but in every case an overlay would be a function from strongly-typed and row-polymorphic record inputs to a strongly-typed record output.
For example, the types of the above
overlay0
andoverlay1
functions would be something like this (using a pseudo-notation I made up inspired by Fall-from-Grace because no existing language has great syntax for this):overlay0 : forall (a b c : Fields) (o : Type) . { a } -> { cowsay : { overrideAttrs : (o -> { doCheck : Bool }) -> { c, doCheck : Bool } , c } , b } -> { cowsay : { c, doCheck : Bool } } overlay1 : forall (a b c : Fields) (o : Type) . { a } -> { hello : { overrideAttrs : (o -> { postInstall : Text }) -> { c, postInstall : Text } , c } , b } -> { hello : { c, postInstall : Text } }
Don’t sweat it if you don’t follow that notation I just made up. The purpose is just to informally illustrate that we can assign row-polymorphic types to
overlay0
andoverlay1
.What would be the type of a list of overlays?
There would be no such thing as a list of overlays, for the same reason that there would be no “overlay type”.
To be totally pedantic, you could in theory model a list of heterogeneously-typed overlays using a type-indexed list, but I don’t believe that’s the right path forward for Nixpkgs.
What type should the
<nixpkgs>
function expect for itsoverlays
argument?The
<nixpkgs>
entrypoint function would no longer take a list of overlays (because there would be no such thing as a list of overlays). It would take a single overlay as input and you would have to precompose overlays usinglib.composeExtensions
if you wanted to supply more than one overlay as input to<nixpkgs>
.For example, if you were to compose
overlay0
andoverlay1
, you would end up with a new composite overlay of the following type:lib.composeExtensions overlay0 overlay1 : forall (a b c d : Fields) (o : Type) . { a } -> { cowsay : { overrideAttrs : (o -> { doCheck : Bool }) -> { c, doCheck : Bool } , c } hello : { overrideAttrs : (o -> { postInstall : Text }) -> { d, postInstall : Text } , d } , b } -> { cowsay : { c, doCheck : Bool }, hello : { d, postInstall : Text } }
And the
<nixpkgs>
entrypoint function type would be something like:import <nixpkgs> : forall (a : Fields) { overlay : { cowsay : …, hello : …, …, a } # Enormous type goes here -> { cowsay : …, hello : …, … } # Similarly enormous type -> { a } , … # Other <nixpkgs> arguments we'll ignore for now } -> { cowsay : …, hello : …, …, a }
How do the answers to the above questions affect inferred types and error messages?
The inferred types would be enormous (especially for the
<nixpkgs>
entrypoint function). They might even be infinitely large without some form of information hiding, because these derivations are infinitely deep records that refer to themselves:$ nix repl nix-repl> pkgs = import <nixpkgs> { } nix-repl> pkgs.hello.out.out.out.out.out.… # … ad infinitum
The error messages would be slightly better than current stack traces but would still have great difficulty explaining to the user what actually went wrong because the type-checker is still not operating on the same abstraction level as the overlay system. The main improvement over stack traces is that the user would now be able to annotate expressions with types to help narrow down the cause of type errors.
I would like to dwell a little bit on the incredibly large inferred type for the <nixpkgs>
entrypoint, because this is essentially the same problem as the one highlighted in the previous section.
Implementing the Nixpkgs overlay system as an embedded language within Nix means that implementation details (like scope) are exposed to the user in a way that they wouldn’t have been exposed to if they were built-in language features.
For example, this type:
import <nixpkgs>
: forall (a : Fields)
{ overlay :
{ cowsay : …, hello : …, …, a } # Enormous type goes here
-> { cowsay : …, hello : …, … } # Similarly enormous type
-> { a }
, … # Other <nixpkgs> arguments we'll ignore for now
}
-> { cowsay : …, hello : …, …, a }
… is essentially saying “our input overlay is a value that has all of these packages in scope (such as cowsay
, hello
, and every other package from Nixpkgs), and then adds some new packages to the scope.
That’s cool and all that we can reify this information in the language and the type, but I want to point out how awkward this is compared to good old-fashioned lexical scope natively supported by the Nix language. Let’s revisit the same example using plain-old let
bindings:
let
pkgs = import <nixpkgs> { };
cowsay = pkgs.cowsay.overrideAttrs (old: {
doCheck = false;
});
hello = pkgs.hello.overrideAttrs (old: {
postInstall = "${cowsay}/bin/cowsay 'Installation complete'";
});
in
hello
If I were to ask the type-checker what the type of cowsay
is, I would hope that the type-checker would respond with something like:
cowsay
is aDerivation
… and not:
cowsay
is an expression that begins from a context containing a value namedpkgs
of this enormous type and then ends with a new context that is also enormous, but now extended with an additional value namedcowsay
of typeDerivation
.
You see the problem here? By implementing this overlays sub-language within Nix we’re needlessly polluting the user experience with internal implementation details of that sub-language (such as scope and context). This is why a type-checker for Nix is not adequately equipped to explain to the user what is happening within the Nixpkgs overlay system because it’s type-checking things at the wrong abstraction level.
Conclusion
So what’s the path forward for Nix? I’m not entirely sure, but here is what I think are the possible outcomes, in roughly ascending order of difficulty:
Solution A: Don’t implement a type system for Nix
Instead, try to improve the quality of stack traces and dynamic type errors.
For example, this is essentially the approach taken by the recent work to improve the user experience for stack traces:
Solution B: Only type-check Nix “in the small”
This seems to be the approach that Nickel is taking and this also seems to be the same way that people are using dhall-to-nix. Users settle for type-checking smaller fragments of Nix (like individual packages or NixOS modules), but any issues that arise across package/module boundaries are still not effectively addressed.
Solution C: Type-check Nixpkgs overlays using a type system supporting row polymorphism
Some users might be willing to settle for really complicated row-polymorphic types as long as they can still statically detect packaging issues in the large.
I haven’t yet thought through how to type-check the NixOS module system as an embedded language, so I’m leaving that out for now, but it might be possible to also type-check that using row polymorphism 🤷🏻♀️.
Solution D: Implement the two sub-languages in an external language
In other words, implement the Nixpkgs overlay system and NixOS module system in a separate language that is not Nix so that overlays and modules are supported by the language along with a type system that natively understands these features. Then you could compile this external language to ordinary Nix code that is compatible with the existing Nixpkgs overlay system or NixOS module system.
Solution E: Like Solution D, but upstream these features into the Nix language
You could add Nix language support for overlays/modules, instead of implementing them as embedded domain-specific languages.
Solution F: Like Solution D, but without Nix as an intermediate language
Basically, design a new front-end programming language for the Nix store that isn’t Nix. This language would natively understand overlays and NixOS modules and would generate derivations directly without going through Nix as an intermediate language.
Solution G: Don’t use overlays or NixOS modules at all
In other words, rethink the ecosystem from the ground up to use plain old Nix the language. Obviously, this is a massive amount of work to reinvent all of Nixpkgs and NixOS and it’s not clear that people would even want this.
My best guess is that “Solution C” or “Solution D” are the two most promising approaches that strike the right balance between how difficult they are to implement and actually addressing what users want in a type-checker.
No comments:
Post a Comment