Monday, May 9, 2022

The golden rule of software distributions

This is a short post documenting a pattern I learned as a user and maintainer of software distributions. I wanted to share this pattern because the lesson was non-obvious to me in my early days as a software engineer.

I call this pattern the “golden rule of software distributions”, which I’ll define the verbose way followed by the concise way.

The verbose version of the golden rule of software distributions is:

If a package manager only permits installing or depending on one version of each package, then a software distribution for that package manager should bless one version of each package. The blessed version for each package must be compatible with the blessed version of every other package.

The concise version of the golden rule of software distributions is:

A locally coherent package manager requires a globally coherent software distribution.

… where:

  • “locally coherent” means that you can only install or depend on one version of each package for a given system or project

  • “globally coherent” means each package has a unique blessed version compatible with every other package’s blessed version

Note that any sufficiently large software distribution will not perfectly adhere to this golden rule. You should view this rule as an ideal that a software distribution aspires to approximate as closely as possible, but there will necessarily be cases where it cannot.

Motivation

I’ll introduce the term “build plan” to explain the motivation behind the golden rule:

A build plan for a package A specifies a version for each dependency of A such that A successfully builds against those dependencies.
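To make this concrete, here is a minimal Haskell sketch of what a build plan looks like under local coherence. The types and names (PackageName, Version, BuildPlan) are my own illustration, not any real package manager’s API:

    import Data.Map (Map)
    import qualified Data.Map as Map

    -- Hypothetical types for illustration only
    type PackageName = String
    type Version     = String

    -- Local coherence means a build plan selects exactly one
    -- version for each package in the dependency tree.
    type BuildPlan = Map PackageName Version

    -- A working build plan for some hypothetical package A
    planA :: BuildPlan
    planA = Map.fromList [("text", "1.2.5.0"), ("containers", "0.6.5.1")]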

To motivate the golden rule, let’s examine what happens when you have a locally coherent package manager but a globally incoherent software distribution:

  • Package users need to do a combinatorial search of their dependencies

    … in order to find a successful build plan. Specifically, they may need to test multiple major versions of their direct and indirect dependencies to find a permutation that successfully builds (see the sketch after this list).

  • Compatible sets of packages become increasingly unlikely at scale

    The likelihood of finding a build plan rapidly diminishes as your dependency tree grows. Beyond a certain number of dependencies, a build plan might not even exist, even if every dependency is maintained.

  • Package authors need to support multiple major versions of every dependency

    … in order to maximize the likelihood that downstream packages can find a successful build plan. Maintaining this backwards compatibility greatly increases their maintenance burden.

  • Package authors must test against multiple major versions of each dependency

    … in order to shield their users from build failures due to unexpected build plans. This means a large number of CI runs for every proposed change to the package, which slows down their development velocity.

  • Responsibility for fixing incompatibilities becomes diffuse

    Sometimes you need to depend on two packages (A and B) which transitively depend on incompatible versions of another package (C). Neither package A nor package B can be held responsible for fixing the problem unless there is a blessed version of package C.
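The first problem is easy to quantify if we continue the illustrative BuildPlan sketch from above: the candidate plans a user has to search form the cartesian product of every dependency’s available versions.

    -- The candidate plans are the cartesian product of each
    -- dependency's available versions, so the search space grows
    -- exponentially with the number of dependencies.
    candidatePlans :: Map PackageName [Version] -> [BuildPlan]
    candidatePlans = sequenceA

Even a modest project with ten dependencies and three viable major versions each already faces 3^10 = 59049 candidate plans.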

These issues lead to a lot of wasted work, which scales exponentially with the number of dependencies. Consequently, software ecosystems that ignore the golden rule have difficulty scaling their dependency trees, and people work around this in the following ways:

  • Culturally discouraging dependencies

  • Vendoring dependencies within their projects

  • Gravitating towards large and monolithic dependencies / frameworks

The fundamental problem

The golden rule is necessary because build plans do not compose for a locally coherent package manager. In other words, if you have a working build plan for package A and another working build plan for package B, you cannot necessarily combine those two build plans to generate a working build plan for a package that depends on both A and B. In particular, you definitely cannot combine the two build plans if A and B depend on incompatible versions of another package C.
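Continuing the illustrative sketch from earlier, composing two build plans only succeeds when they agree on the version of every package they share:

    -- Combining two plans succeeds only if they select the same
    -- version for every package they have in common; otherwise no
    -- locally coherent plan satisfies both.
    combine :: BuildPlan -> BuildPlan -> Maybe BuildPlan
    combine x y
        | and (Map.intersectionWith (==) x y) = Just (Map.union x y)
        | otherwise                           = Nothing

For example, combine returns Nothing for two plans that pin package C to version 1.0 and version 2.0, respectively.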

However, build plans can also fail to compose for more subtle reasons. For example, you can depend on multiple packages whose build plans are all pairwise-compatible, but there still might not exist a build plan for the complete set of packages.

The good news is that you can trivially “weaken” a build plan, meaning that if you find a build plan that includes both packages A and B then you can downgrade that to a working build plan for just package A or just package B.
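In the same sketch, weakening is just restricting a plan to the packages you still need, which can never introduce a new conflict:

    import Data.Set (Set)
    import qualified Data.Set as Set

    -- Any restriction of a working plan is still a working plan,
    -- because dropping entries cannot create a version conflict.
    weaken :: Set PackageName -> BuildPlan -> BuildPlan
    weaken keep plan = Map.restrictKeys plan keep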

Consequently, the globally optimal thing to do is to find a working build plan that combines as many packages as possible, because then any subset of that build plan is still a working build plan. That ensures that any work spent fixing this larger build plan is not wasted and benefits everybody. Contrast that with work spent fixing the build for a single package (e.g. creating a lockfile), which does not benefit any other package (not even downstream packages, a.k.a. reverse dependencies).

Common sense?

Some people might view the golden rule of software distributions as common sense that doesn’t warrant a blog post, but my experiences with the Haskell ecosystem indicate otherwise. That’s because I began using Haskell seriously around 2011, four years before Stackage was first released.

Before Stackage, I ran into all of the problems described in the previous section because there was no blessed set of Haskell packages. In particular, the worst problem (for me) was the inability to find a working build plan for my projects.

This issue went on for years; basically everyone in the Haskell ecosystem (myself included) unthinkingly cargo-culted this as the way things were supposed to work. When things went wrong, we blamed Cabal (e.g. “Cabal hell”) for our problems, even though the root of the problem had little to do with Cabal.

Stackage fixed all of that when Michael Snoyman essentially introduced the Haskell ecosystem to the golden rule of software distributions. Stackage publishes a blessed version of every package it vets, and those blessed versions are guaranteed to all build together. Periodically, Stackage publishes an updated set of blessed package versions.
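For example, a project opts into one of these blessed package sets by pinning a single Stackage snapshot in its stack.yaml (the snapshot name below is only an example; any published snapshot works the same way):

    # stack.yaml: pin the entire project to one globally coherent
    # package set published by Stackage
    resolver: lts-18.28
    packages:
      - .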

I quickly converted to this way of doing things, which seemed blindingly obvious in retrospect. My professional career also shifted towards DevOps, including managing upgrades and software distributions, and I discovered that this was a fairly common practice for most large software distributions.

Why this rule is not intuitive

Actually, this insight is not as obvious as people might think. A person with a superficial understanding of how software ecosystems work might suspect that the larger a software ecosystem grows, the more incoherent it gets. In practice, however, you encounter the opposite phenomenon: the larger a software ecosystem gets, the more coherent it becomes (by necessity).

In fact, I still see people argue against the global coherence of software ecosystems, which indicates to me that this isn’t universally received wisdom. Sometimes they argue against global coherence directly (believing that coherence imposes an undue burden on maintainers or users) and sometimes indirectly (by positing incoherence as the foundation of a larger architectural pattern). Either way, I strongly oppose global incoherence, both for the theoretical reasons outlined in this post and based on my practical experience managing dependencies in the pre-Stackage days of the Haskell ecosystem.

Indeed, many of the people arguing against globally coherent software ecosystems are unwitting beneficiaries of global coherence. A massive amount of invisible work goes on behind the scenes for every software distribution to create a globally coherent package set that benefits everybody (not just the users of those software distributions). For example, all software users benefit from the work that goes into maintaining the Debian, Arch, Nixpkgs, and Brew software distributions even if they don’t specifically use those distributions or their associated package managers.

Conclusion

This whole post has one giant caveat, which is that all of the arguments assume that the package manager is locally coherent, which is not always the case! In fact, there’s a post that proves that local coherence can be undesirable because it (essentially) makes dependency resolution NP-complete. For more details, see:

I personally have mixed views on whether local coherence is good or bad. Right now I lean slightly towards team “local coherence is good”, but my opinions on that are not fully formed yet.

That said, most package managers tend to require or at least benefit from local coherence, so in practice most software distributions also require global coherence. For example, Haskell’s build tooling basically requires global coherence (with some caveats I won’t go into), so global coherence is a good thing for the Haskell ecosystem.
