Monday, May 9, 2022

The golden rule of software distributions


This is a short post documenting a pattern I learned as a user and maintainer of software distributions. I wanted to share this pattern because the lesson was non-obvious to me in my early days as a software engineer.

I call this pattern the “golden rule of software distributions”, which I’ll define the verbose way followed by the concise way.

The verbose version of the golden rule of software distributions is:

If a package manager only permits installing or depending on one version of each package, then a software distribution for that package manager should bless one version of each package. The blessed version of each package must be compatible with the blessed version of every other package.

The concise version of the golden rule of software distributions is:

A locally coherent package manager requires a globally coherent software distribution.

… where:

  • “locally coherent” means that you can only install or depend on one version of each package for a given system or project

  • “globally coherent” means each package has a unique blessed version compatible with every other package’s blessed version

Note that any sufficiently large software distribution will not perfectly adhere to this golden rule. You should view this rule as an ideal that a software distribution aspires to approximate as closely as possible, even though there will necessarily be cases where it cannot.

Motivation

I’ll introduce the term “build plan” to explain the motivation behind the golden rule:

A build plan for a package A specifies a version for each dependency of A such that A successfully builds against those dependencies.

To motivate the golden rule, let’s examine what happens when you have a locally coherent package manager but a globally incoherent software distribution:

  • Package users need to do a combinatorial search of their dependencies

    … in order to find a successful build plan. Specifically, they may need to test multiple major versions of their direct and indirect dependencies to find a permutation that successfully builds.

  • Compatible sets of packages become increasingly unlikely at scale

    The likelihood of finding a build plan rapidly diminishes as your dependency tree grows. Beyond a certain number of dependencies a build plan might not even exist, even if every dependency is maintained.

  • Package authors need to support multiple major versions of every dependency

    … in order to maximize the likelihood that downstream packages can find a successful build plan. Maintaining this backwards compatibility greatly increases their maintenance burden.

  • Package authors must test against multiple major versions of each dependency

    … in order to shield their users from build failures due to unexpected build plans. This means a large number of CI runs for every proposed change to the package, which slows down their development velocity.

  • Responsibility for fixing incompatibilities becomes diffuse

    Sometimes you need to depend on two packages (A and B) which transitively depend on incompatible versions of another package (C). Neither package A nor package B can be held responsible for fixing the problem unless there is a blessed version of package C.

These issues lead to a lot of wasted work, which scales exponentially with the number of dependencies. Consequently, software ecosystems that ignore the golden rule run into difficulty scaling their dependency trees, which people work around in the following ways:

  • Culturally discouraging dependencies

  • Vendoring dependencies within their projects

  • Gravitating towards large and monolithic dependencies / frameworks

The fundamental problem

The golden rule is necessary because build plans do not compose for a locally coherent package manager. In other words, if you have a working build plan for package A and another working build plan for package B, you cannot necessarily combine those two build plans to generate a working build plan for a package that depends on both A and B. In particular, you definitely cannot combine the two build plans if A and B depend on incompatible versions of another package C.

However, build plans can also fail to compose for more subtle reasons. For example, you can depend on multiple packages whose build plans are all pairwise-compatible, but there still might not exist a build plan for the complete set of packages.
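To make this concrete, here's a small sketch (with hypothetical package names and version numbers of my own invention) of three packages whose constraints on a shared dependency C are pairwise-compatible even though no build plan satisfies all three at once:

```haskell
import Data.List (intersect)

-- Hypothetical version constraints on a shared dependency C,
-- expressed as the sets of acceptable major versions of C:
aWants, bWants, dWants :: [Int]
aWants = [1, 2]  -- A builds against C-1 or C-2
bWants = [2, 3]  -- B builds against C-2 or C-3
dWants = [1, 3]  -- D builds against C-1 or C-3

-- Every pair of packages shares a workable version of C …
pairwise :: [[Int]]
pairwise =
  [ aWants `intersect` bWants  -- [2]
  , bWants `intersect` dWants  -- [3]
  , aWants `intersect` dWants  -- [1]
  ]

-- … but no single version of C works for all three at once:
joint :: [Int]
joint = foldl1 intersect [aWants, bWants, dWants]  -- []
```

In other words, even checking every pair of dependencies for compatibility is not enough to guarantee that a complete build plan exists.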

The good news is that you can trivially “weaken” a build plan, meaning that if you find a build plan that includes both packages A and B then you can downgrade that to a working build plan for just package A or just package B.

Consequently, the globally optimal thing to do is to find a working build plan that combines as many packages as possible, because then any subset of that build plan is still a working build plan. That ensures that any work spent fixing this larger build plan is not wasted and benefits everybody. Contrast that with work spent fixing the build for a single package (e.g. creating a lockfile), which does not benefit any other package (not even downstream packages, a.k.a. reverse dependencies).

Common sense?

Some people might view the golden rule of software distributions as common sense that doesn’t warrant a blog post, but my experiences with the Haskell ecosystem indicate otherwise. That’s because I began using Haskell seriously around 2011, four years before Stackage was first released.

Before Stackage, I ran into all of the problems described in the previous section because there was no blessed set of Haskell packages. In particular, the worst problem (for me) was the inability to find a working build plan for my projects.

This issue went on for years; basically everyone in the Haskell ecosystem (myself included) unthinkingly cargo-culted this as the way things were supposed to work. When things went wrong we blamed Cabal (e.g. “Cabal hell”) for our problems when the root of the problem had little to do with Cabal.

Stackage fixed all of that when Michael Snoyman essentially introduced the Haskell ecosystem to the golden rule of software distributions. Stackage works by publishing a set of blessed package versions for all of the packages vetted by Stackage, and these packages are guaranteed to all build together. Periodically, Stackage publishes an updated set of blessed package versions.

After getting used to this, I quickly converted to this way of doing things, which seemed blindingly obvious in retrospect. Also, my professional career shifted towards DevOps, including managing upgrades and software distributions, and I discovered that this was a fairly common practice for most large software distributions.

Why this rule is not intuitive

Actually, this insight is not as obvious as people might think. A person with only a superficial understanding of how software ecosystems work might suspect that the larger a software ecosystem grows, the more incoherent things get. In practice, though, you encounter the opposite phenomenon: the larger a software ecosystem gets, the more coherent it becomes (by necessity).

In fact, I still see people argue against the global coherence of software ecosystems, which indicates to me that this isn’t universally received wisdom. Sometimes they argue against global coherence directly (they believe coherence imposes an undue burden on maintainers or users) or they argue against global coherence indirectly (by positing incoherence as a foundation of a larger architectural pattern). Either way, I strongly oppose global incoherence, both for the theoretical reasons outlined in this post and also based on my practical experience managing dependencies in the pre-Stackage days of the Haskell ecosystem.

Indeed, many of the people arguing against globally coherent software ecosystems are actually unwitting beneficiaries of global coherence. There is a massive amount of invisible work that goes on behind the scenes for every software distribution to create a globally coherent package set that benefits everybody (not just the users of those software distributions). For example, all software users benefit from the work that goes into maintaining the Debian, Arch, Nixpkgs, and Brew software distributions even if they don’t specifically use those software distributions or their associated package managers.

Conclusion

This whole post has one giant caveat, which is that all of the arguments assume that the package manager is locally coherent, which is not always the case! In fact, there’s a post that proves that local coherence can be undesirable because it (essentially) makes dependency resolution NP-complete. For more details, see:

I personally have mixed views on whether local coherence is good or bad. Right now I’m slightly on team “local coherence is good”, but my opinions on that are not fully formed yet.

That said, most package managers tend to require or at least benefit from local coherence so in practice most software distributions also require global coherence. For example, Haskell’s build tooling basically requires global coherence (with some caveats I won’t go into), so global coherence is a good thing for the Haskell ecosystem.

Tuesday, May 3, 2022

Why does Haskell's take function accept insufficient elements?


This post is a long-form response to a question on Reddit, which asked:

I just started learning haskell, and the take function confuses me.

e.g take 10 [1,2,3,4,5] will yield [1,2,3,4,5]

How does it not generate an error ?

… and I have enough to say on this subject that I thought it would warrant a blog post rather than just a comment reply.

The easiest way to answer this question is to walk through all the possible alternative implementations that can fail when not given enough elements.
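For reference, a definition that behaves like the Prelude's take (modulo strictness details) simply stops as soon as either the count or the list runs out, so it never needs to fail:

```haskell
-- Behaves like Prelude.take: stops when either the count reaches
-- zero or the list is exhausted, and never raises an error.
take' :: Int -> [a] -> [a]
take' n _ | n <= 0 = []
take' _ []         = []
take' n (x : xs)   = x : take' (n - 1) xs
```

Every alternative below changes some part of this behavior in exchange for reporting when the input list was too short.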

Solution 0: Output a Maybe

The first thing we could try would be to wrap the result in a Maybe, like this:

safeTake :: Int -> [a] -> Maybe [a]
safeTake 0      _   = Just []
safeTake n      []  = Nothing
safeTake n (x : xs) = fmap (x :) (safeTake (n - 1) xs)

>>> safeTake 3 [0..]
Just [0,1,2]

>>> safeTake 3 []
Nothing

The main deficiency with this approach is that it is insufficiently lazy. The result will not produce a single element of the output list until safeTake finishes consuming the required number of elements from the input list.

We can see the difference with the following examples:

>>> oops = 1 : 2 : error "Too short"

>>> take 1 (take 3 oops)
[1]

>>> safeTake 1 =<< safeTake 3 oops
*** Exception: Too short

Solution 1: Fail with error

Another approach would be to create a partial function that fails with an error if we run out of elements, like this:

partialTake :: Int -> [a] -> [a]
partialTake 0      _   = []
partialTake n (x : xs) = x : partialTake (n - 1) xs

>>> partialTake 3 [0..]
[0,1,2]

>>> partialTake 3 []
*** Exception: Test.hs:(7,1)-(8,51): Non-exhaustive patterns in function partialTake

>>> partialTake 1 (partialTake 3 oops)
[1]

Partial functions like these are undesirable, though, so we won’t go with that solution.

Solution 2: Use a custom list-like type

Okay, but what if we could store a value at the end of the list indicating whether or not the take succeeded? One way we could do that would be to define an auxiliary type similar to a list, like this:

{-# LANGUAGE DeriveFoldable #-}

data ListAnd r a = Cons a (ListAnd r a) | Nil r deriving (Foldable, Show)

… where now the empty (Nil) constructor can store an auxiliary value. We can then use this auxiliary value to indicate to the user whether or not the function succeeded:

data Result = Sufficient | Insufficient deriving (Show)

takeAnd :: Int -> [a] -> ListAnd Result a
takeAnd 0      _   = Nil Sufficient
takeAnd n      []  = Nil Insufficient
takeAnd n (x : xs) = Cons x (takeAnd (n - 1) xs)

>>> takeAnd 3 [0..]
Cons 0 (Cons 1 (Cons 2 (Nil Sufficient)))

>>> takeAnd 3 []
Nil Insufficient

Also, the ListAnd type derives Foldable, so we can recover the old behavior by converting the ListAnd Result a type into [a] using toList:

>>> import Data.Foldable (toList)

>>> toList (takeAnd 3 [0..])
[0,1,2]

>>> toList (takeAnd 3 [])
[]

>>> toList (takeAnd 1 (toList (takeAnd 3 oops)))
[1]

This is the first total function that has the desired laziness characteristics, but the downside is that the take function now has a much weirder type. Can we solve this only using existing types from base?

Solution 3: Return a pair

Well, what if we were to change the type of take to return a pair containing an ordinary list alongside a Result, like this:

takeWithResult :: Int -> [a] -> ([a], Result)
takeWithResult 0      _   = (    [], Sufficient  )
takeWithResult n      []  = (    [], Insufficient)
takeWithResult n (x : xs) = (x : ys, result      )
  where
    (ys, result) = takeWithResult (n - 1) xs

>>> takeWithResult 3 [0..]
([0,1,2],Sufficient)

>>> takeWithResult 3 []
([],Insufficient)

Now we don’t need to add this weird ListAnd type to base, and we can recover the old behavior by post-processing the output using fst:

>>> fst (takeWithResult 3 [0..])
[0,1,2]
>>> fst (takeWithResult 3 [])
[]

… and this also has the right laziness characteristics:

>>> fst (takeWithResult 1 (fst (takeWithResult 3 oops)))
[1]

… and we can replace Result with a Bool if we want a solution that depends solely on types from base:

takeWithResult :: Int -> [a] -> ([a], Bool)
takeWithResult 0      _   = ([], True)
takeWithResult n      []  = ([], False)
takeWithResult n (x : xs) = (x : ys, result)
  where
    (ys, result) = takeWithResult (n - 1) xs

However, even this solution is not completely satisfactory. There’s nothing that forces the user to check the Bool value before accessing the list, so this is not as safe as, say, the safeTake function, which returns a Maybe. The Bool included in the result is more of an informational value than a safeguard.

Conclusion

So the long-winded answer to the original question is that there are several alternative ways we could implement take that can fail if the input list is too small, but in my view each of them has its own limitations.

This is why I think Haskell’s current take function is probably the least worst of the alternatives, even if it’s not the safest possible implementation.

Sunday, May 1, 2022

Introductory resources to type theory for language implementers


This post briefly tours resources that helped introduce me to type theory, because I’m frequently asked by others for resources on this subject (even though I never had a formal education in type theory). Specifically, these resources will focus more on how to implement a type checker or type inference algorithm.

Also, my post will be biased against books, because I don’t tend to learn well from reading books. That said, I will mention a few books that I’ve heard others recommend, even if I can’t personally vouch for them.

What worked for me

The first and most important resource that I found useful was this one:

The reason is that the paper shows logical notation side-by-side with Haskell code. That juxtaposition served as a sort of “Rosetta stone” for me to understand the correspondence between logical judgments and code. The paper also introduces some type theory basics (and dependent types!).

Along similar lines, another helpful resource was:

… which, as the name suggests, walks through a Haskell program that type-checks Haskell code. This paper, along with the preceding one, helped bring type checkers “down to earth” for me by showing that there isn’t any magic or secret sauce to implementing a type checker.

After that, the next thing that helped me improve my understanding was learning about pure type systems. Specifically, this paper was a very clear introduction to pure type systems:

You can think of pure type systems as sort of a “framework” for specifying type systems or talking about them. For example, the simply typed lambda calculus, System F, System FΩ, and the calculus of constructions are some example type systems that you’ll hear the literature refer to, and they’re all special cases of this general framework. You can think of pure type systems as generalizing the lambda cube.
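As a rough illustration of that “framework” idea (a simplified sketch of my own, not taken from the paper), a pure type system is determined by its sorts, axioms, and rules, and the familiar systems fall out as instances:

```haskell
-- The two sorts of the lambda cube
data Sort = Star | Box deriving (Eq, Show)

-- A pure type system is specified by its axioms and rules
data PTS = PTS
  { axioms :: [(Sort, Sort)]       -- which sort each sort inhabits
  , rules  :: [(Sort, Sort, Sort)] -- which function spaces are permitted
  }

-- The simply typed lambda calculus: only term-level functions
stlc :: PTS
stlc = PTS [(Star, Box)] [(Star, Star, Star)]

-- System F adds polymorphism (terms abstracting over types)
systemF :: PTS
systemF = PTS [(Star, Box)] [(Star, Star, Star), (Box, Star, Star)]

-- The calculus of constructions enables all four corners of the lambda cube
coc :: PTS
coc =
  PTS [(Star, Box)]
      [ (Star, Star, Star), (Box, Star, Star)
      , (Box, Box, Box), (Star, Box, Box)
      ]
```

Each additional rule enables one more axis of the lambda cube, which is what makes pure type systems such a compact way to compare these systems.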

However, none of the above resources introduce how to implement a type system with “good” type inference. To elaborate on that, many simple type systems can infer the types of program outputs from program inputs, but cannot work “in reverse” and infer the types of inputs from outputs. Hindley Milner type inference is one example of a “good” type inference algorithm that can work in reverse.
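As a tiny example of what “in reverse” means, GHC’s Hindley-Milner-style inference will deduce an argument’s type purely from how the function body uses it (the apply name here is just for illustration):

```haskell
-- No type signature needed: because g is applied to True,
-- GHC infers apply :: (Bool -> t) -> t
apply g = g True
```

The type of the input g is inferred from its use at an output position, which is exactly the direction of inference that simpler type checkers cannot handle.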

However, I never learned Hindley Milner type inference all that well, because I skipped straight to bidirectional type checking, which is described in this paper:

I prefer bidirectional type checking because (in my limited experience) it’s easier to extend the bidirectional type checking algorithm with new language features (or, at least, easier than extending Hindley Milner with the same language features).

The other reason I’m a fan of bidirectional type checking is that many cutting edge advances in research slot in well to a bidirectional type checker or even explicitly present their research using the framework of bidirectional type checking.
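To give a flavor of the algorithm, here’s a minimal bidirectional checker of my own for the simply typed lambda calculus extended with booleans (far simpler than the systems in the paper). The key structure is two mutually recursive functions, one that infers types and one that checks against an expected type:

```haskell
data Type = TBool | TFun Type Type deriving (Eq, Show)

data Term
  = Var String
  | Lam String Term  -- unannotated lambdas are checked, not inferred
  | App Term Term
  | Lit Bool
  | Ann Term Type    -- an annotation switches from checking to inference
  deriving Show

type Context = [(String, Type)]

-- Inference ("synthesis"): compute a type from a term
infer :: Context -> Term -> Maybe Type
infer ctx (Var x)   = lookup x ctx
infer _   (Lit _)   = Just TBool
infer ctx (Ann e t) = do check ctx e t; Just t
infer ctx (App f a) = do
  TFun dom cod <- infer ctx f  -- fails if f is not a function
  check ctx a dom
  Just cod
infer _ (Lam _ _) = Nothing    -- can't guess the argument type

-- Checking: verify a term against an expected type
check :: Context -> Term -> Type -> Maybe ()
check ctx (Lam x body) (TFun dom cod) =
  check ((x, dom) : ctx) body cod    -- the expected type flows inward
check ctx e expected = do
  actual <- infer ctx e
  if actual == expected then Just () else Nothing
```

Extending the language usually means adding one clause to infer or check (or both), which is the modularity that makes this style easy to grow.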

Books

Like I mentioned, I didn’t really learn that much from books, but here are some books that I see others commonly recommend, even if I can’t personally vouch for them:

Example code

I also created a tutorial implementation of a functional programming language that summarizes everything I know about programming language theory so far, which is my Fall-from-Grace project:

This project is a clean reference implementation of how to implement an interpreted language using (what I believe are) best practices in the Haskell ecosystem.

I also have a longer post explaining the motivation behind the above project:

Conclusion

Note that these are not the only resources that I learned from. This post only summarizes the seminal resources that greatly enhanced my understanding of all other resources.

Feel free to leave a comment if you have any other resources that you’d like to suggest that you feel were helpful in this regard.