Monday, February 10, 2020

Dhall Survey Results (2019-2020)

The results are in for this year’s Dhall survey, which you can find here:

You might also want to compare to the summary of last year’s survey:

Note that I will not include all write-in responses, but you can consult the above links if you want to read them all. I will only highlight responses that are representative of clear themes in survey feedback. I am also omitting praise from this summary (which I greatly appreciate, though!), because I would like to draw attention to what needs improvement.

Adoption

This year a greater number of survey respondents reported using Dhall at work:

Which option best describes your usage of Dhall:

  • 9 (14.3%) - Never used Dhall
  • 13 (20.6%) - Briefly tried Dhall
  • 11 (17.5%) - Use Dhall (but only for personal projects)
  • 11 (17.5%) - Use Dhall at work (but only me)
  • 19 (30.2%) - Use Dhall at work along with coworkers

… compared to last year:

  • 7 (11.7%) - Never used it
  • 22 (36.7%) - Briefly tried it out
  • 11 (18.3%) - Use it for my personal projects
  • 19 (31.7%) - Use it at work
  • 1 ( 1.7%) - Write in: Trying to convince work people to use it

… even though fewer people completed the survey this year (down from 73 responses to 64).

The most likely reason for the smaller number of responses was the greater length of the survey. This will also likely influence the distribution of responses since people more interested in Dhall will be more motivated to complete this year’s longer survey. Next year I will probably trim the survey down again in order to gather a larger sample size.

Even taking that into account, the number of respondents using Dhall at work grew in absolute terms.

This year’s survey added a new category to distinguish whether people using Dhall at work were doing so alone or alongside their coworkers. I was pleased to see that Dhall users do not appear to have difficulty persuading their coworkers to use Dhall.

Reasons to adopt

What do you use Dhall for?

CI / CD / Ops (especially Kubernetes) continue to be the most prominent use cases for Dhall:

Writing environment variables for configurations of containerized application

deployment configs and secrets management

Higher-level common configuration from which tool configuration is derived.

project configuration (CI, K8s, etc) & templating

Generating ECS task definitions

SRE/DevOps related configuration

Kubernetes mostly + some glue config with AWS

Kubernetes config

Mostly for kubernetes cluster configuration.

Configuration of my custom build system

dhall-kubernetes

Concourse Pipelines

publish interfaces for Ansible roles to make their usage easier through Dhall based config

Simple configuration for Haskell app, prototype kubernetes cluster config

Generate the yaml passed to –values for helm. Some gitlab ci configuration.

Application server configuration, database replacement

Setting up Docker Compose files for parts of our product to be used for automatic testing.

Configuration of application in kubernetes

Kubernetes management and shared config (application and infrastructure)

Generating yaml and build files

CI setup (generate ansible yml files)

Built bitbucket pipeline typing. Also internal configuration for a pdf-filler application.

Ansible config

For configuration and build utility (combinded with Shake)

Package definitions for a Linux ports tree builder I was building

Customer-specific configuration files

My favorite response was:

Dhall is the configuration format for my company’s product

However, there were still a few responses that didn’t fall into one of the DevOps use cases:

Defining a schema for metadata

Test data generation

Configuration of a personal chat bot, dhall-cabal,

Game data, templating LaTeX, canonical source of truth (upstream of JSON) for Elm apps.

dzen-dhall

Configuration for personal projects and generating DBC files at work

One surprising result was that only one person wrote in Spago (PureScript’s package manager) when describing how they use Dhall:

PureScript through spago

… even though 7 survey respondents reported using Spago in a subsequent section! In fact, 5 of them wrote in completely different use cases for Dhall. This suggests a potential issue with the survey design: people might have chosen not to write in answers that were already covered by subsequent questions.

Other responses simply noted that Dhall is a programmable configuration language:

Adding a small amount of automation to config generation

Move non-turing-complete data out of config/programs, declarative programs that get interpreted as data

Configuration

Configuration

…configuration

configuration

Configuration generation

Generate yaml configuration files

configuration of programs written in haskell

The pitch

One of the things we do periodically is refine our “pitch”, based on feedback from users (including, but not limited to, these surveys). This section covers reasons to use Dhall in users’ own words when they answered:

Why do you use Dhall?

One of the most interesting answers to me was from the same respondent who said that Dhall was the configuration format for their company’s product. In their own words:

Because YAML is horrible. Because having multiple configuration files for a single product is horrible and dhall is the best “big config file” format.

The above pitch must be pretty compelling for a company to embrace Dhall to that extent. In fact, the latter half of the pitch is fairly similar to the one currently used by dhall-lang.org, focusing on Dhall’s suitability for configuration “in the large”, although we’ve recently de-emphasized replacing YAML specifically. Several commercial users provided input into that branding (and one of them could have been the same person who left the above feedback).

Being a “big config file” format can imply many different virtues, but in my experience users usually mean the following programming language features:

  • Types (to reduce errors at scale)
  • Functions (to avoid repeating one’s self)
  • Imports (to have a single source of truth that one can import)
  • Integrations (to unify disparate configuration formats)

… and several responses were related to these “scale”-related reasons to adopt:

Consistent typing across deployment configs and environments

strong types, functions and remote (decentralised) importing

Type safety, avoid writing yaml by hand

It allows us to provide a function via environment variable

Higher-level common configuration from which tool configuration is derived.

It replaced a lot of redundant JSON configs; adding new services is a lot quicker now, and cross-cutting changes less painful

To avoid copy&paste and the maintanance problems caused by it

Type safety, safe imports, no YAML indentation madness.

To help reduce code duplication and avoid errors in the values and keys of configuration filez. Mainly used it so far to help setup a more maintainable set of Kinect clusters and users for myself, which also made it easier to add.

It has a strong type system

Type-safety, the ability to move a lot of logic/formatting (e.g. in templating) into Dhall proper, the ability to represent ADTs/union types.

Abstraction!

Because i can abstract boilerplate

It provides a sane language for configuring systems (with types and abstraction)

Type checking, configuration typo safety, factorising yaml

Abstract useful config + defaults

Doesn’t have the weird gotchas of yaml. Possible to make DRY config files.

The main reason has been to decrease duplication.

type safety

I love the idea of strongly typed config, and also the programming aspect means there is less chance for error since you can reuse a component instead of copy/paste. I have been pushing for dhall at work as there have been very severe incidents at work from bad config files

Strong type safety

I like the ability to evaluate expressions, value substitution

Strong types and non-repetitive abstractions. Ease of publishing modules

our CI setup is now big and has lots of commonalities between projects. dhall helps us avoid preventable mistakes thanks to type checks and allows us to share common config with functions

it’s typed and avoids repetition

Not repeating common parts of configuration (like Kubernetes YAML hell)
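
To make those four features concrete, here is a minimal Dhall sketch (the Service schema and field names are hypothetical): a shared type catches mistakes, a function removes the repetition, and in a real project both would live in their own file that every configuration imports as a single source of truth.

```dhall
-- A hypothetical shared schema; in practice this would live in its own
-- file (e.g. ./Service.dhall) and be imported wherever it is needed
let Service
    : Type
    = { name : Text, replicas : Natural, port : Natural }

-- A function that abstracts away the repetitive parts
let makeService
    : Text -> Service
    = \(name : Text) -> { name = name, replicas = 2, port = 8080 }

in  [ makeService "api", makeService "worker", makeService "frontend" ]
```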

Normally a general-purpose programming language can also do these things, and several respondents noted why they preferred Dhall over a general-purpose language:

It’s non-Turing-complete, but allows imports and static typing

It’s convenient and provides a good power-to-weight ratio

Only non-turing-complete config language that is based on a lambda calculus (and is modern/not a hobby project)

Because it won’t crash at runtime (totality)

It is by far the most well-wrought typed configuration language

I want to use something with proper type theory behind it and dhall is actually almost useful for some problems we have at work

It’s a nice config language

It’s quick, compact and type-safe

simple but effektive system f, i like that

Others explained the motivation in terms of the current solution they were struggling with and trying to replace (commonly YAML):

To make sense of this pile of configuration mess

Verifying and reading through our combo of custom scripts, chef, and other solutions was a nightmare

so I don’t need to create my own configuration language and other existing ones suck more (json, yaml, that weird python thing…)

Because YAML and database were not the choices

For all the benefits over YAML

better helm-templates

Documentation

This year we asked people what needed better documentation to see if there were any blind spots in our current coverage:

What needs better documentation?

Like last year, people most commonly requested documentation on best practices and design patterns:

Recursive things. (This has been discussed on Slack somewhat.)

Packaging

I’d love a manual on best practices and a contributing guide to the Haskell implementation

Cookbooks

The prelude/standard library, or rather, surfacing documentation centrally in browsable form

How to create defaults, for records with optional values. How to design types with backward compatibility in mind.

Nested record updating, though the blog post shows some improvement here with the default types :: syntax.

How to make use of dhall at the peripheries of a large project.

How to deal with generated files (!), e.g. CI yaml config; how to include dhall projects into nix configs (dhallToNix); best practices for pre-caching imports

Imports section could be a bit more extensive like import prelude, import private repos (GitHub, BitBucket), multi imports. currently the information is scattered around various pages.

Perhaps patterns. For example, we have { a :: ty1, b:: Maybe ty2} and want users to be able to write { a = val } without b = None or default \ { a = val }

(this is probably because I didn’t google it properly), how to properly deal with freezing and updating hashes

Best practices and real-world examples. I’d love to use something like Dhall for managing configuration at work, but it’s very hard to tell if it will handle my usecases well, and it’s hard to dive in and try it out because I don’t know if I’m doing it right

We have made progress on that front in two forms:

  • The Dhall Configuration Language Manual

    This manual was created as a series of how-to guides for common tasks and idioms related to the language. Feedback from these surveys helps inform what topics I choose for each chapter.

  • Some design patterns became language features

    The most notable example this year is standardizing support for the record completion operator for better handling of defaults

In fact, several of this year’s improvements (and some currently in progress) were directly inspired by my work on the book. Any time I describe a workflow that seems too painful I make changes to the language, tooling, or ecosystem to smooth things over.

Besides the book, the thing most likely to improve over the coming year is packaging, documenting, and discovering new Dhall packages. For example, I just created a Google Summer of Code project proposal for a student to work on a documentation generator for Dhall packages:

The second most common request was to improve documentation for the core language features:

Still not sure how the merge keyword works.

i somehow have trouble finding the doc page about record operations and iirc the record projection thing where you take the a subset of a record’s field is not included in the page listing records operations

Importing of other files

The introduction to FP. Lots of devs work in Go, Ruby, etc and need help thinking about polymorphism with sum types. Also more clarification on the let...in syntax in early guides and an explanation on why you need :let in the repl.

I tend to have a hard time finding comprehensive info on syntactic features and end up hearing about them on guthub issues

Common errors like “-/+” syntax, “not a function”

the type system

This is understandable because the closest thing Dhall has to a complete resource on this is the Haskell implementation’s tutorial:

One of my short-term goals is to translate this into a language-independent tutorial.

There is an existing language-independent tutorial for translating Dhall to JSON:

… but that doesn’t cover all of the language features like the Haskell tutorial does.

Language bindings

Which language bindings do you currently use?

  • 37 (84.1%) - Haskell
  • 6 (13.6%) - Bash
  • 5 (11.4%) - Nix
  • 3 ( 6.8%) - Ruby
  • 2 ( 4.5%) - Rust
  • 1 ( 2.3%) - Golang
  • 1 ( 2.3%) - Swift
  • 0 ( 0.0%) - Clojure
  • 0 ( 0.0%) - Eta
  • 0 ( 0.0%) - Java (via Eta)

The number of Haskell users is not surprising given that the Haskell implementation also powers the shared command-line tools, like dhall/dhall-to-{json,yaml}/dhall-lsp-server. Many Dhall users do not use the Haskell language itself; instead they use the Haskell implementation to generate JSON or YAML from Dhall while waiting for a language binding for their preferred language.

I think the one response for Swift might have been a mistaken answer intended for the next section. As far as I know there are currently no Dhall bindings for Swift (not even ones in progress).

One of the interesting take-aways from the above question is that the JVM is one area where Dhall has existing language bindings (Clojure, Eta, and Java via Eta) that no respondents reported using. I get the impression that most JVM users are waiting for Java/Scala bindings.

Desired language bindings

Which language bindings would you like to see get more attention?

  • 17 (39.5%) - Python
  • 12 (27.9%) - Scala
  • 11 (25.6%) - PureScript
  • 9 (20.9%) - JavaScript
  • 7 (16.3%) - Go
  • 7 (16.3%) - Java
  • 3 ( 7.0%) - C++
  • 3 ( 7.0%) - Elm
  • 2 ( 4.7%) - C#
  • 2 ( 4.7%) - Kotlin
  • 2 ( 4.7%) - Rust
  • 1 ( 2.3%) - Swift
  • 1 ( 2.3%) - TypeScript
  • 1 ( 2.3%) - PHP
  • 1 ( 2.3%) - Perl
  • 1 ( 2.3%) - C
  • 1 ( 2.3%) - A C/Rust library so all the other langs can bind to

Python is an interesting response because at one point there was progress on a Python binding to Dhall, but that stalled out.

The demand for Python makes sense because Python is used heavily in Dhall’s primary use case (CI / CD / Ops), alongside Go. Go was requested as well, although perhaps not as often because the Go binding to Dhall is far closer to completion.

In fact, the Go binding just announced the first release candidate for version 1.0.0:

Note that survey respondents preferred bindings in functional languages over their more widely used imperative counterparts. For example, there was greater demand for Scala compared to Java and greater demand for PureScript compared to JavaScript. This might owe to Dhall’s functional programming heritage.

A few survey respondents appear not to be aware that there is a complete Rust binding to Dhall now available. This is understandable, though, given that the Rust binding was only officially announced recently.

Integrations

Which of the following integrations do you use?

  • 26 (63.4%) - JSON (via dhall-to-json)
  • 25 (61.0%) - YAML (via dhall-to-yaml)
  • 10 (24.4%) - Kubernetes (via dhall-to-kubernetes)
  • 7 (17.1%) - JSON (via Prelude.JSON.render)
  • 7 (17.1%) - purescript-packages (via spago)
  • 5 (12.2%) - YAML (via Prelude.JSON.renderYAML)
  • 3 ( 7.3%) - Cabal (via dhall-to-cabal)
  • 1 ( 2.4%) - Write-in: Nix (via dhall-to-nix)
  • 0 ( 0.0%) - XML (via dhall-to-xml)
  • 0 ( 0.0%) - XML (via Prelude.XML.render)
  • 0 ( 0.0%) - TOML (via JSON)

The thing I take away from the above numbers is that a large number of people would still benefit from language bindings and they currently work around the absence of a language binding by generating JSON/YAML.

Desired integrations

Which integrations would you like to see get more attention?

  • 22 (68.8%) - Terraform
  • 11 (34.4%) - Docker Compose
  • 9 (28.1%) - HCL
  • 8 (25.0%) - Prometheus
  • 4 (12.5%) - Packer
  • 2 ( 6.3%) - INI
  • 2 ( 6.3%) - Concourse
  • 2 ( 6.3%) - Write-in: Ansible
  • 1 ( 3.1%) - GoCD
  • 1 ( 3.1%) - Write-in: Grafana
  • 1 ( 3.1%) - Write-in: Nix, Nixops
  • 1 ( 3.1%) - Write-in: Dockerfile
  • 1 ( 3.1%) - Write-in: Google Cloud Builder
  • 1 ( 3.1%) - Write-in: Travis
  • 1 ( 3.1%) - Write-in: Vault
  • 1 ( 3.1%) - Write-in: GitHub Actions
  • 1 ( 3.1%) - Write-in: Drone CI
  • 1 ( 3.1%) - Write-in: CloudFormation
  • 1 ( 3.1%) - Write-in: Jenkins
  • 1 ( 3.1%) - Write-in: TOML
  • 1 ( 3.1%) - Write-in: Bitbucket pipelines

Terraform was far and away the most requested integration. One of the interesting challenges for this potential integration is figuring out the right way to integrate Dhall with Terraform, because Terraform already has its own programming features (like a DSL for defining function-like modules).

Dhall packages

Which of the following Dhall packages do you use?

  • 32 (100.0%) - Prelude
  • 4 ( 12.5%) - dhall-packages (Dhall monorepo)
  • 1 ( 3.1%) - hpack-dhall (hpack bindings)
  • 1 ( 3.1%) - github-actions-dhall (GitHub Actions bindings)
  • 1 ( 3.1%) - dhall-terraform (Terraform bindings)
  • 1 ( 3.1%) - dhall-semver (Semantic versions)
  • 1 ( 3.1%) - dhall-concourse (Concourse bindings)
  • 1 ( 3.1%) - dhall-bhat (Haskell type classes in Dhall)
  • 1 ( 3.1%) - dada (Recursion schemes)
  • 0 ( 0.0%) - dho (CircleCI bindings)
  • 0 ( 0.0%) - dhallql (Query language)
  • 0 ( 0.0%) - dhallia (Dhall as an IDL)
  • 0 ( 0.0%) - cpkg (C package manager)
  • 0 ( 0.0%) - caterwaul (Category theory)

Unsurprisingly, most people use the Prelude. The thing that did catch my eye was how many respondents used dhall-packages (mainly because I hadn’t realized how fast it had grown since the last time I checked it out).

dhall-packages appears to have the potential to grow into the Dhall analog of Helm, meaning a repository containing useful types and predefined recipes for deploying Kubernetes services. I can see this repository easily giving Dhall a competitive edge in the Kubernetes space since I know quite a few people are looking for a Helm alternative without the headaches associated with templating YAML.

ASCII vs Unicode

dhall format tries to be as opinionated as possible, but currently permits one way to customize behavior: the tool can either emit Unicode symbols (e.g. λ, →, ∀) or ASCII symbols (e.g. \, ->, forall).

I asked several questions about ASCII versus Unicode to see if there was a possibility of standardizing on one or the other. Unfortunately, people were split in this regard. The only thing they agreed upon was that they preferred not to input Unicode symbols directly:

Do you prefer to input ASCII or Unicode symbols in your editor (before formatting the code)?

  • 58 (93.5%) - ASCII
  • 4 ( 6.5%) - Unicode

… but on the other two questions people split pretty evenly, with a slight preference for Unicode:

How do you format your code?

  • 24 (42.9%) - Unicode
  • 22 (39.3%) - ASCII
  • 6 (10.7%) - I don’t format my code
  • 4 ( 7.1%) - Other

Do you prefer to read Dhall code that uses ASCII or Unicode symbols?

  • 28 (50.9%) - Unicode
  • 25 (45.5%) - ASCII
  • 2 ( 3.6%) - Other

What’s interesting is that you get clearer preferences when you slice the data by how much people use Dhall.

For example, people who have never used Dhall or only briefly tried it prefer ASCII by at least a 3-to-1 margin:

How do you format your code?

  • 10 (66.7%) - ASCII
  • 3 (20.0%) - I don’t format my code
  • 2 (13.3%) - Unicode

Do you prefer to read Dhall code that uses ASCII or Unicode symbols?

  • 12 (75.0%) - ASCII
  • 4 (25.0%) - Unicode

… whereas other categories (e.g. personal projects or work) prefer Unicode by roughly a 2-to-1 margin:

How do you format your code?

  • 22 (55.0%) - Unicode
  • 11 (27.5%) - ASCII
  • 4 (10.0%) - Other
  • 3 ( 7.5%) - I don’t format my code

Do you prefer to read Dhall code that uses ASCII or Unicode symbols?

  • 23 (60.5%) - Unicode
  • 13 (34.2%) - ASCII
  • 2 ( 5.3%) - Other

There are several possible ways to interpret that evidence:

  • Perhaps Dhall could expand its potential audience by defaulting to ASCII when formatting

  • Perhaps people prefer the Unicode syntax the more they use Dhall

  • Perhaps there is a “founder effect” since originally dhall format only supported Unicode

In any case, I don’t plan on making any changes to dhall format immediately, but I will use this data to inform future formatting discussions on Discourse.

One person also added Unicode-related feedback in the “Other feedback” section:

I feel strongly that unicode symbols are not worth supporting indefinitely, as they typically can’t be typed and add mental overhead when reading. The symbols themselves also have a mathematical bent, which can be intimidating for those not well versed in math / logic programming.

Growth

Would anything encourage you to use Dhall more often?

Language bindings:

I wish there was a good way to do nix through dhall, but I don’t have any good suggestions.

something like the dhall Haskell library for purescript

Js/purescript bindings in release state. …

Scala/JVM bindings (Eta is suboptimal & unmaintained) …

python bindings

Successfully getting my work on board with it (which means having a perfect golang integration)

Getting more bindings for things I use

better docs; bindings for the JVM languages

Bindings on other languages (hard to get people to contribute on a project using the Haskell impl)

Better Python support, so I can sell it to colleagues who use Python.

Better (documented) language bindings.

Packages/Integrations:

I’d love to use it in more places, eg to replace all of our terraform code or CI configuration; currently those integrations just aren’t there and I don’t have time to bridge the gap

More libraries / packages. I think dhall needs a richer ecosystem. In particular i’d love complete terraform bindings and more kubernetes packages (a la helm)

First class Kubernetes, Terraform, Vault, Ansible integration.

When I looked at Dhall most recently, it wasn’t obvious to me that an ecosystem of packages was springing up around it. Might want to add a link (or a more prominent one if there is already one and I missed it).

More packages (like dhall-kubernetes)

… Also, better Terraform/HCL bindings, Ansible integration, or integrations for common CI systems (Jenkins/jenkins-job-builder, CircleCI, GoCD, etc)

… more integrations (I would like to be able to configure EVERYTHING using Dhall ^^)

Tooling:

Better performance

better (especially more concise) Error messages

structural editor with auto-completion based on symbols in scope and with automatic let-floating

Speed and ergonomics of the Emacs integration. Currently it’s terribly slow to type check.

Language features:

Ability to import YAML without yaml-to-dhall

nested record updates

Usage unicode chars in imported filenames without quotes

… Also Text/Double operations and possibly comparisons. I understand and agree with the reasons this hasn’t been done, but finding a way to do this without compromising the goal of the language would add so much potential

Formatting improvements:

Vonderhaar-style dhall format. Long multi-line literals with interpolations become really illegible currently. More built-ins. Text becoming non-opaque. The ability to expose multiple things from a record (this is currently being discussed).

Ultimately it was the autoformatter/idomatic formatting of Dhall that turned me off. I wanted my package format to be clear and readable to people new to my project but the idomatic was very messy in my eyes. Here’s a comparison of Dhall and the TCL inspired pkg definition format I came up with: https://gist.github.com/wezm/dfdce829964c410e2c521aa3ca132ddd

Social proof:

Popularity

Popularity is the big one so I could get away with it more at work. …

Documentation:

… Better docs to educate team members with.

Better starter documentation. maybe also how to put in place Dhall build in CI?

Other unassorted responses:

Hard to describe in one line, but an easier way to get values in and out of larger dhall projects (github issue 676 being a symptom of this)

For work maybe a convincing reason to use it in place of yaml. For personal, making it easier to support backward compatiblity in types.

The different type hack for multiple resources in a single Kubernetes file made me drop it, I could’ve never recommended it to my coworkers over worse (as in, worse is better) templating solutions.

Issue #1521

Generation of Dhall types from Haskell types; Easier extension with own functions

A nice bidirectional typechecker for less type annotation burden

Contributions

One thing I checked is if there were any barriers to adoption that we were not aware of:

Would anything encourage you to contribute to the Dhall ecosystem more often?

There were not many common themes that stood out, so I’ll include the responses verbatim:

Maybe some links to resources on how to get started with programming theory (eg. info on type notation for the standard, parser, “compiler”, etc). Basically I feel I probably just need to learn more, but I’m not entirely sure what.

Nope, as a new contributor this year, the Dhall community has been an absolute delight to start contributing to.

better reporting for missing env vars (not one by one)

better personal usecases

No - I already want to do a lot more!

A contributing guide to the haskell implementation

Linked Haskell tutorials and perhaps partial (or early) implementations of dhall features

Perhaps simplified core language? I sometimes think that the language is large and difficult, especially around “safety guarantees” features, and it might make it harder to develop a new language binding.

Using it at work

I found it difficult to generate dhall-kubernetes bindings

The ecosystem seems to be very contributor-friendly, but I don’t have enough time at the moment.

Getting standards for repo layout, documentation, discovery

… and my favorite response was:

There’s a ‘good first issue’ label that’s attached to no issues.

Conclusion

The main thing I concluded from the survey feedback is that people want us to focus on ease of integration, especially language bindings, and especially Python language bindings.

A major change from last year’s feedback was the dramatic drop in requests for better documentation / examples / use cases. People appear to understand the motivation for the language and how to use Dhall to solve their problems, and now they care more about streamlining the integration process as much as possible.

If this is your first time hearing about the survey you can still complete the survey:

I receive e-mail notifications when people complete the survey so I will be aware of any feedback you provide this way. Also, completing the survey will let you browse the survey results more easily.

Before concluding this post, I would like to highlight that this year’s survey used approval voting to let people select their preferred language bindings. If you’re a person frustrated with a political system based on first-past-the-post voting, I encourage you to research approval voting as an alternative voting method with a higher power-to-weight ratio (simpler than ranked choice voting and produces better outcomes).

Friday, January 17, 2020

Why Dhall advertises the absence of Turing-completeness

Several people have asked why I make a big deal out of the Dhall configuration language being “total” (i.e. not Turing-complete) and this post will summarize the two main reasons:

  1. If Dhall is total, that implies that the language got several other things correct

  2. “Not Turing-complete” is a signaling mechanism that appeals to Dhall’s target audience

“Because of the Implication”

The absence of Turing completeness per se does not provide many safety guarantees. Many people have correctly noted that you can craft compact Dhall functions that can take longer than the age of the universe to evaluate. I even provide a convenient implementation of the Ackermann function in Dhall to make it as easy as possible for people to foil the interpreter:
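
For reference, here is a sketch of what such a function can look like (not necessarily the exact code I link to): Ackermann’s function expressed with Natural/fold instead of general recursion. The interpreter is guaranteed to finish evaluating it eventually, just not within the lifetime of the universe for inputs as small as ackermann 4 2.

```dhall
let ackermann
    : Natural -> Natural -> Natural
    = \(m : Natural) ->
        Natural/fold
          m
          (Natural -> Natural)
          ( \(ackPred : Natural -> Natural) ->
            \(n : Natural) ->
              -- A(m, n) = apply A(m - 1) to 1, a total of n + 1 times
              Natural/fold (n + 1) Natural ackPred 1
          )
          (\(n : Natural) -> n + 1)

in  ackermann 2 3  -- small arguments are fine; large ones are hopeless
```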

However, a total language like Dhall needs to get several other things right in order to guarantee that it is not Turing-complete. There are multiple ways you can eliminate Turing-completeness from a language, but nearly all of them improve the language in some way.

For example, the way Dhall eliminates Turing-completeness is:

  • Eliminating general recursion

    … which protects against common mistakes that introduce infinite loops

  • Having a strong type system

    … especially one with no escape hatches for reintroducing general recursion

  • Forbidding arbitrary side effects

    … which can also be another way to backdoor general recursion into a language

These three features are widely viewed as good things in their own right by people who care about language security, regardless of whether they are employed in service of eliminating Turing-completeness.

In other words, the absence of Turing-completeness functions as a convenient “umbrella” or “shorthand” for other safety features that LangSec advocates promote.

Shibboleth

According to Wikipedia a shibboleth is:

… a custom or tradition, usually a choice of phrasing or even a single word that distinguishes one group of people from another. Shibboleths have been used throughout history in many societies as passwords, simple ways of self-identification, signaling loyalty and affinity, maintaining traditional segregation, or protecting from real or perceived threats.

The phrase “not Turing-complete” is one such shibboleth. People who oppose the use of general-purpose programming languages for configuration files use this phrase as a signaling mechanism. This choice of words communicates to like-minded people that they share the same values as the Dhall community and agree on the right balance between configuration files being data vs. being programs.

If you follow online arguments about programmable configuration files, the discussion almost invariably follows this pattern:

  • Person A: “Configuration files should be inert so that they are easier to understand and manipulate”
  • Person B: “Software engineering practices like types and DRY can prevent config-induced production outages. Configs should be written in a general-purpose programming language.”
  • Person A: “But configuration files should not be Turing-complete!”

Usually, what “Person A” actually meant to say was something like:

  • configuration files should not permit arbitrary side effects
  • configuration files should not enable excessive indirection or obfuscation
  • configuration files should not crash or throw exceptions

… and none of those desires necessarily imply the absence of Turing-completeness!

However, for historical reasons all of the “Person A”s of the world rallied behind the absence of Turing-completeness as their banner. When I advertise that Dhall is not Turing-complete I’m signaling to them that they “belong here” with the rest of the Dhall community.

Conclusion

In my view, those two points make the strongest case for not being Turing complete. However, if you think I missed an important point just let me know.

Sunday, January 5, 2020

Dhall - Year in review (2019-2020)

The Dhall configuration language is now three years old and this post reviews progress in 2019 and the future direction of the language in 2020.

If you’re not familiar with Dhall, you might want to visit the official website for the language. This post assumes familiarity and interest in the language.

I would like to use this post to advertise a short survey you can take if you would like to provide feedback that informs the direction of the language:

Language bindings

Last year’s survey indicated that many respondents were keen on additional language bindings, which this section covers.

This year there is a new officially supported language binding! 🎉

  • dhall-rust - Rust bindings to Dhall by Nadrieril

    I’m excited about this binding, both because Rust is an awesome language and also because I believe this paves the way for C/C++ bindings (and transitively any language that can interop with C)

There is also one new language binding close to completion:

  • dhall-golang - Go bindings to Dhall by Philip Potter

    This binding is not yet official, but I’m mentioning it here in case interested parties might want to contribute. If you are interested in contributing then this thread is a good starting point.

    This is a binding that I believe would improve the user experience for one of Dhall’s largest audiences (Ops / CI / CD), since many tools in this domain (such as Kubernetes) are written in Go.

If there is a language binding that you would most like to see, the survey includes a question to let you advertise your wish list for language bindings.

As I mentioned last year, I have no plans to implement any new language bindings myself. However, there are always things I can do to improve the likelihood of new language bindings popping up:

  • Error messages

    I’ve noticed that a major barrier for a new implementation is adding quality error messages.

    One solution to this problem may be taking the error messages for the Haskell implementation and upstreaming them into shared templates that all implementations can reuse (and improve upon)

  • Reference implementation

    One idea I’ve floated a few times recently is having a simplified reference implementation in some programming language instead of using natural deduction as the notation for specifying language semantics. This might help ease the life of people who are not as familiar with programming language theory and its notation.

    For example, the current Haskell implementation is not suitable as a reference implementation because it operates under a lot of constraints that are not relevant to the standard (such as customization, formatting, and performance).

  • Simplify the standard

    This year we removed several stale features (such as old-style Optional literals and old-style union literals) in the interest of decreasing the cost for language binding maintainers.

    There is also one feature that in my eyes is “on the chopping block”, which is the language’s using keyword for custom headers. This feature is one of the more complex ones to implement correctly and doesn’t appear to be carrying its own weight. There also may be preferable alternatives to this feature that don’t require language support (such as .netrc files).

Integrations

There are other integrations that are not language bindings, which this section covers.

PureScript package sets

Dhall is now the officially supported way of specifying PureScript package sets:

… and there is a new PureScript build tool named spago that provides the command-line interface to using these package sets:

There are several contributors to both of these repositories, so I can’t acknowledge them all, but I would like to give special mention to Justin Woo and Fabrizio Ferrai for bootstrapping these projects.

This integration is the largest case I’m aware of where Dhall is not being used for its own sake but rather as a required configuration format for another tool.

YAML/JSON

Last year we added support for converting Dhall to JSON/YAML and this year antislava and Robbie McMichael also added support for converting JSON/YAML to Dhall. Specifically, there are two new json-to-dhall and yaml-to-dhall executables that you can use.

This addressed a common point of feedback from users that migrating existing YAML configuration files to Dhall was tedious and error-prone. Now the process can be automated.

This year we also added Prelude support for JSON and YAML. Specifically:

  • There is a new Prelude.JSON.Type that can model arbitrary schema-free JSON or YAML

  • There is a new Prelude.JSON.render utility that can render expressions of the above type as JSON or YAML Text that is guaranteed to be well-formed

Here is an example of how it works:
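
The following is a minimal sketch (the field names are arbitrary) that builds a schema-free JSON value and renders it as Text:

```dhall
let JSON = https://prelude.dhall-lang.org/JSON/package.dhall

in  JSON.render
      ( JSON.object
          ( toMap
              { name = JSON.string "dhall"
              , stable = JSON.bool True
              , version = JSON.natural 14
              }
          )
      )

{- The result is a Text value containing well-formed JSON, roughly:

   { "name": "dhall", "stable": true, "version": 14 }
-}
```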

In other words, there is now a “pure Dhall” implementation of JSON/YAML support, although it is not as featureful as the dhall-to-{json,yaml} executables.

Special thanks to Philipp Krüger for contributing Prelude.JSON.renderYAML!

On top of that, {yaml,json}-to-dhall and dhall-to-{yaml,json} both natively support the schema-free JSON type from the Prelude, which means that you can now incrementally migrate YAML/JSON configuration files. You can learn more about this from the following chapter in the Dhall Configuration Language Manual:

XML

Thanks to Stephen Weber Dhall now supports XML:

The above package provides dhall-to-xml and xml-to-dhall utilities for converting between Dhall and XML. This package also provides a Ruby API to the same functionality.

This fills one of the big omissions in supported configuration formats that we had last year, so I’m very thankful for this contribution.

Rails

The same Stephen Weber also contributed Rails support for Dhall:

… so that you can use Dhall as the configuration file format for a Rails app instead of YAML.

C package management

Vanessa McHale built a C package manager named cpkg, with an emphasis on cross compilation:

The current package set already supports a surprisingly large number of C packages!

I also find this project fascinating because I’ve seen a few people discuss what Nixpkgs (the Nix package repository) might look like if it were redone from the ground up in terms of Dhall. cpkg most closely resembles how I imagined it would be organized.

Language improvements

Last year some survey respondents were interested more in improvements to the language ergonomics rather than specific integrations, so this section covers new language enhancements.

Consuming packages

One thing we improved this year was the experience for people consuming Dhall packages.

Probably the biggest improvement was changing to “stable hashes”, where we stopped using the standard version as an input to semantic hashes. Users complained about each new version of the standard breaking their integrity checks, and now that is a thing of the past. This means that expressions authored for older versions of the language are now far more likely to work for newer language versions when protected by an integrity check.

Oliver Charles also contributed another large improvement by standardizing support for mixed records of types and terms. This means that package authors can now serve both types and terms from the same top-level package.dhall file instead of having to author separate types.dhall and terms.dhall files.

For example, the Prelude now serves both terms and types from a single package:
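
A small sketch of what that looks like (the particular bindings chosen are arbitrary):

```dhall
let Prelude = https://prelude.dhall-lang.org/package.dhall

in  { aTerm = Prelude.Natural.sum [ 2, 3, 5 ]
    , aType = Prelude.Map.Type Text Natural
    }
```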

The above example also illustrates how field names no longer need to be escaped if they conflict with reserved names (like Type). This improves the ergonomics of using the Prelude which had several field names that conflicted with built-in language types and previously had to be escaped with backticks.

Authoring packages

We also improved the experience for users authoring new packages. Dhall now has language support for tests so that package authors do not need to implement testing infrastructure out of band.

You can find several examples of this in the Prelude, such as the tests for the Prelude.Natural.greaterThan utility:
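
The tests look roughly like this (an ASCII-syntax sketch rather than the Prelude’s exact source):

```dhall
let greaterThan = https://prelude.dhall-lang.org/Natural/greaterThan

let example0 = assert : greaterThan 5 6 === False

let example1 = assert : greaterThan 6 5 === True

let property0 = \(n : Natural) -> assert : greaterThan n n === False

in  greaterThan
```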

The above example shows how you can not only write unit tests but in limited cases you can also write property tests. For example, the above property0 test verifies that greaterThan n n is False for all possible values of n.

Dependent types

Language support for tests is a subset of a larger change: dependent types. Dhall is now technically a dependently-typed language, meaning that you can take advantage of some basic features of dependent types, such as:

  • Type-level assertions (i.e. the tests we just covered)
  • Type-level literals (such as Natural and Text)

… but you cannot do more sophisticated things like length-indexed Lists.

toMap keyword

This year Mario Blažević added a new toMap keyword for converting Dhall records to homogeneous lists of key-value pairs (a.k.a. Maps):
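
A minimal sketch of the keyword in action:

```dhall
toMap { bar = 2, foo = 1 }
```

… which evaluates to the following homogeneous List of key-value pairs:

```dhall
[ { mapKey = "bar", mapValue = 2 }, { mapKey = "foo", mapValue = 1 } ]
```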

Dhall users frequently requested this feature for supporting JSON/YAML-based formats. These formats commonly use dictionaries with a variable set of fields, but this led to an impedance mismatch when interoperating with a typed language like Dhall because Dhall records are not homogeneous maps and the type of a Dhall record changes when you add or remove fields.

Normally the idiomatic way to model a homogeneous Map in Dhall would be a List of key-value pairs (since you can add or remove key-value pairs without changing the type of a List), but that’s less ergonomic than using a record. The toMap keyword gives users the best of both worlds: they can use Dhall’s record notation to ergonomically author values that they can convert to homogeneous Maps using toMap.

The :: operator for record completion

Several users complained about the language’s support for records with defaultable fields, so we added a new operator to make this more ergonomic.

This example illustrates how the operator works:
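
Here is a minimal sketch, using a hypothetical Person schema (a record with a Type field and a default field, which is the shape the operator expects):

```dhall
let Person =
      { Type = { name : Text, age : Natural }
      , default = { age = 0 }
      }

in  Person::{ name = "John" }

-- … which evaluates to { age = 0, name = "John" }
```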

In other words, given a “schema” record (such as Person) containing a record type and a record of default values, you can use that schema to instantiate a record, defaulting all fields that are not specified.

The operator is “syntactic sugar”. When you write:
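
(continuing the hypothetical Person sketch above)

```dhall
Person::{ name = "John" }
```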

… that “desugars” to:
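
```dhall
(Person.default // { name = "John" }) : Person.Type
```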

Also, dhall format recognizes this operator and formats large nested records authored with it compactly.

The easiest way to motivate this change is to compare the dhall-kubernetes simple deployment example before and after using this operator. Before, using the // operator and old formatting rules, the example looked like this:

Now the most recent iteration of the example looks like this:

New built-ins

One change with a high power-to-weight ratio was adding new built-ins. Without listing all of them, the key changes were:

  • Integers are no longer opaque. You can convert back and forth between Integer and Natural and therefore implement arbitrary arithmetic on Integers (see the sketch after this list).

  • Some new built-ins enabled new efficient Prelude utilities that would have been prohibitively slow otherwise

  • Some things that used to require external command-line tools can now be implemented entirely within the language (such as modeling and rendering JSON/YAML in “pure Dhall” as mentioned above)
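
Here is a small sketch of the first point, using two of the new built-ins (Integer/clamp, which truncates an Integer to a Natural, and Integer/negate, which flips the sign), alongside the pre-existing Natural/toInteger:

```dhall
let example0 = assert : Integer/clamp +3 === 3

let example1 = assert : Integer/clamp -3 === 0

let example2 = assert : Integer/negate -2 === +2

let example3 = assert : Natural/toInteger 7 === +7

in  {=}
```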

Note that Text is still opaque, although I predict that is the most likely thing that will change over the next year if we continue to add new built-ins.

Enums

Enums are now much more ergonomic in Dhall, as the following example illustrates:
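
Here is a minimal sketch (the enum itself is hypothetical):

```dhall
let DayOfWeek = < Monday | Tuesday | Wednesday | Thursday | Friday >

in  DayOfWeek.Monday
```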

More generally, union alternatives can now be empty, like this:
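
```dhall
-- the Empty alternative carries no payload; NonEmpty carries a Text
< Empty | NonEmpty : Text >
```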

… and enums are the special case where all alternatives are empty.

Before this change users would have to use an alternative type of {}, like this:
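
(re-using the hypothetical DayOfWeek enum from above)

```dhall
-- the old workaround: every alternative carried an empty record …
let DayOfWeek =
      < Monday : {} | Tuesday : {} | Wednesday : {} | Thursday : {} | Friday : {} >

-- … and every construction had to supply one
in  DayOfWeek.Monday {=}
```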

… which made things more verbose both for authors and consumers of Dhall packages.

Tooling improvements

This section covers improvements to the tooling that provide a more complete development experience.

Language server

Last year I stated that one of our goals was to create a Dhall language server for broader and better editor support, and I’m happy to announce that we accomplished that goal!

Credit goes to both PanAeon (who authored the initial implementation) and Folkmar Ramcke (who greatly expanded upon the initial implementation as part of a Google Summer of Code project).

You can read the final report at the end of Folkmar’s work here:

… and this GIF gives a sample of what the language server can do:

The language server was tested to work with VSCode but in principle should work with any editor that supports the language server protocol with a small amount of work. I’ve personally tested that the language server works fine with Vim/Neovim.

If you have any issues getting the language server working with your editor of choice just let us know as we plan to polish and document the setup process for a wide variety of editors.

Also, we continue to add new features to the language server based on user feedback. If you have cool ideas for how to make the editor experience more amazing please share them with us!

Pre-built executables for all platforms

Several users contributed continuous delivery support so that we could automatically generate pre-built executables for the shared command-line tools, including the dhall command and dhall-lsp-server (the language server).

That means that for each new release you can download pre-built executables for:

  • Windows
  • OS X
  • Linux

… from this page:

Docker support

You can also obtain docker containers for many of the command-line tools for ease of integration with your company’s container-based infrastructure:

Performance

The Haskell implementation (which powers the dhall tool and the language server) has undergone some dramatic performance improvements over the last year.

Most of these performance improvements were in response to the following two pressures on the language:

  • The language server requires a snappy feedback loop for productive editing

  • People are commonly using Dhall on very large program configurations (like dhall-kubernetes)

There is still room for improvement, but it is markedly better for all Dhall configuration files and orders of magnitude faster in many cases compared to a year ago.

Formatting improvements

The standard formatter is probably one of the things I get the most feedback about (mostly criticism 🙂), so I’ve spent some time this year on improving the formatter in the following ways:

  • let-related comments are now preserved

    … and I plan to expand support for preserving more comments

  • Expressions are now much more compact than before

    … such as in the dhall-kubernetes sample code above

Additionally, I recently started a discussion about potentially switching to ASCII as the default for formatting code, which you can follow here:

The outcome of that discussion was to add several new survey questions to assess whether people prefer to read and write Dhall code using ASCII or Unicode. So if you have strong opinions about this then please take the survey!

Dhall packages

Another significant component of the Dhall ecosystem is packages written within the language.

Dhall differentiates itself from other programmable file formats (e.g. Jsonnet) by having hundreds of open source packages built around the language that support a variety of tools and formats (especially in the Ops / CI / CD domain).

In fact, Dhall’s open source footprint grew large enough this year that GitHub now recognizes Dhall as a supported file format. This means that files with a .dhall extension now enjoy syntax highlighting on GitHub and show up in language statistics for projects.

Here are some example Dhall bindings to various formats added last year:

I’m highly grateful for every person who improves the ecosystem. In fact, I randomly stalk Dhall packages on GitHub to inform language design by seeing how people use Dhall “in the wild”.

Shared infrastructure

We made two main improvements to shared infrastructure for the Dhall community this year:

Documentation

The Dhall wiki has been moved to docs.dhall-lang.org thanks to work by Tristan de Cacqueray. This means that:

  • The documentation is now generated using Sphinx

  • The documentation is now much easier to contribute to as it is under version control here:

    dhall-lang/docs

So if you would like to improve the documentation you can now open a pull request to do so!

Discourse

We also have a new Discourse forum hosted at discourse.dhall-lang.org that you can use to discuss anything Dhall-related.

We’ve been using the forum so far for announcing projects / releases and also as a sounding board for ideas.

Funding

Last year I solicited ideas for funding improvements to the Dhall ecosystem and this year we followed through on three different funding mechanisms:

Google Summer of Code

The most successful funding source by far was Google’s Summer of Code grant that funded Folkmar Ramcke to develop the language server. I plan to try this again for the upcoming summer and I will also recommend that other Dhall projects and language bindings try this out, too. Besides providing a generous source of funding (thank you, Google 🙇‍♂️) this program is an excellent opportunity to bring in new contributors to the ecosystem.

Open Collective

Another thing we set up this year is an Open Collective for Dhall so that we can accept donations from companies and individuals. We started this only a few months ago and thanks to people’s generosity we’ve accumulated over $500 in donations.

I would like to give special thanks to our first backer:

… and our largest backer:

We plan to use these donations to fund projects that (A) benefit the entire Dhall community and (B) bring in new contributors, so your donations help promote a vibrant and growing developer community around the language.

Our first such project was to implement “pure Dhall” support for rendering YAML:

… and we have another proposal in progress to fund documenting the setup process for the language server for various editors:

If you would like to donate, you can do so here:

Book

I’m also working on the following book:

My plan is to make the book freely available using LeanPub but to give users the option to pay for the book, with all proceeds going to the above Open Collective for Dhall.

One of the strongest pieces of survey feedback we got was that users were willing to pay for Dhall-related merchandise (especially books) and they were highly eager for documentation regarding best practices for the language. This book intends to address both of those points of feedback.

I expect that at the current rate of progress the book will likely be done by the end of this year, but you can already begin reading the chapters that I’ve completed so far:

Future directions

Marketing

One of the things that’s slowly changing about the language is how we market ourselves. People following the language know that we’ve recently revamped the website:

… and changed our “slogan” to “Maintainable configuration files”.

One difference from last year is that we’re no longer trying to replace all uses for YAML. User feedback indicated that some uses of YAML were better served by TOML rather than Dhall. Specifically, small (~10 line) configuration files for simple command-line tools were cases where TOML was a better default choice than Dhall.

On the other hand, we find that users who deal with very large and fragmented program configurations tend to prefer Dhall due to the language’s support for features that promote maintainability and reduce total cost of ownership.

I continue to prioritize Ops / CI / CD use cases, but I no longer try to displace YAML for use cases where TOML would be a more appropriate choice.

Completing the Dhall manual

One of my personal goals is to complete the Dhall manual to help people confidently recommend Dhall to others by providing high-quality material to help onboard their coworkers. I expect this will help accelerate language adoption quite a bit.

Polish the language server

The language server is another area of development that I see as highly promising. Although we currently provide common features like type-on-hover and intelligent auto-completion I still think there is a lot of untapped potential here to really “wow” new users by showcasing Dhall’s strengths.

People currently have really low expectations for programmable file formats, so I view the quality of the language server implementation as being a way that Dhall can rapidly differentiate itself from competing programmable file formats. In particular, Dhall is one of the few typed configuration formats and quality editor support is one of the easiest ways to convey the importance of using a typed language.

Packaging for various distributions

One thing that the Dhall ecosystem would benefit from is packaging, not just for Linux distributions but other platforms as well. We made progress this year by adding support for Brew and Docker, but there are still important omissions for other platforms, such as:

  • Windows (e.g. Nuget)
  • Linux (e.g. Debian/Fedora/Arch)

This is one of the areas where I have the greatest difficulty making progress because each package repository tends to have a pretty high barrier to entry in my experience. If you are somebody who has experience with any of the above package repositories and could help me get started I would appreciate it!

Package discovery

I mentioned earlier that Dhall is growing quite a large open source package footprint, but these packages are not easy to discover.

One effort to address this is:

… which is working to create a “mono-repo” of Dhall packages to promote discoverability.

In addition to that, the language probably also needs a standard documentation generator. There have been a few nascent efforts along these lines this year, but at some point we need to take this idea “all the way”.

Library of Kubernetes utilities

Last year I mentioned that I would spend some time on a new Ops-related Dhall integration and I quickly gravitated towards improving the existing dhall-kubernetes integration. After familiarizing myself with Kubernetes I realized that this is a use case that is served well by Dhall since Kubernetes configurations are highly unmaintainable.

Programmable Kubernetes configuration files are a bit of a crowded field (a cottage industry, really), with a steady stream of new entrants (like Pulumi, Cue, and Tanka). That said, I’m fairly confident that with some attention Dhall can become the best-in-class solution in this space.

Conclusion

I would like to thank everybody who contributed last year, and I apologize if I forgot to acknowledge your contribution.

This post is not an exhaustive list of what happened over the last year. If you would like to learn more, the best places to start are:

Please don’t forget to take our yearly survey to provide feedback on the language or to inform the future direction:

In a month I will follow up with another post reviewing the survey feedback.

Thursday, December 12, 2019

Prefer to use fail for IO exceptions

This post briefly explains why I commonly suggest that people replace error with fail when raising IOExceptions.

The main difference between error and fail can be summarized by the following equations:
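
A sketch of the distinction, using Control.Exception.evaluate (which forces a thunk to weak head normal form without running the action):

```haskell
import Control.Exception (evaluate)

-- Merely forcing the thunk produced by `error` raises the exception …
evaluateError :: IO ()
evaluateError = do
    _ <- evaluate (error "oops" :: IO ())  -- throws an ErrorCall here
    return ()

-- … whereas forcing the thunk produced by `fail` is harmless; the
-- IOException is only raised if and when the action is actually run
evaluateFail :: IO ()
evaluateFail = do
    action <- evaluate (fail "oops" :: IO ())  -- fine, nothing happens
    action                                     -- throws an IOException here
```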

In other words, any attempt to evaluate an expression that is an error will raise the error. Evaluating an expression that is a fail does not raise the error or trigger any side effects.

Why does this matter? One of the nice properties of Haskell is that Haskell separates effect order from evaluation order. For example, evaluating a print statement is not the same thing as running it:
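
A small example of that separation:

```haskell
main :: IO ()
main = do
    let subroutine = print 1  -- evaluating this thunk prints nothing
    subroutine                -- "1" is printed here, when the action runs …
    subroutine                -- … and again here; running is a separate
                              -- step from evaluating
```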

This insensitivity to evaluation order makes Haskell code easier to maintain. Specifically, this insensitivity frees us from concerning ourselves with evaluation order in the same way garbage collection frees us from concerning ourselves with memory management.

Once we begin using evaluation-sensitive primitives such as error we necessarily need to program with greater caution than before. Now any time we manipulate a subroutine of type IO a we need to take care not to prematurely force the thunk storing that subroutine.

How likely are we to prematurely evaluate a subroutine? Truthfully, not very likely, but fortunately taking the extra precaution to use fail is not only theoretically safer, it is also one character shorter than using error.

Limitations

Note that this advice applies solely to the case of raising IOExceptions within an IO subroutine. fail is not necessarily safer than error in other cases, because fail is a method of the MonadFail typeclass and the typeclass does not guarantee in general that fail is safe.

fail happens to do the correct thing for IO:
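
Roughly speaking, the IO instance behaves as if it were defined like this (a sketch that is behaviorally equivalent to, but not literally, the definition in base):

    instance MonadFail IO where
        fail string = ioError (userError string)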

… but for other MonadFail instances fail could be a synonym for error and offer no additional protective value.

If you want to future-proof your code and ensure that you never use the wrong MonadFail instance, you can do one of two things (both sketched in the example after this list):

  • Enable the TypeApplications language extension and write fail @IO string
  • Use Control.Exception.throwIO (userError string) instead of fail
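
For example (a minimal sketch; the reportFailure function names are hypothetical):

    {-# LANGUAGE TypeApplications #-}

    import Control.Exception (throwIO)

    -- Option 1: pin the instance with a type application
    reportFailure1 :: String -> IO a
    reportFailure1 string = fail @IO string

    -- Option 2: bypass MonadFail entirely and throw the IOException directly
    reportFailure2 :: String -> IO a
    reportFailure2 string = throwIO (userError string)

    main :: IO ()
    main = reportFailure2 "example failure"   -- exits with: user error (example failure)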

However, even if you choose not to future-proof your code, fail is still no worse than error in this regard.

Sunday, June 16, 2019

The CAP theorem for software engineering

The CAP theorem says that distributed computing systems cannot simultaneously guarantee all three of:

  • Consistency - Every read receives the most recent write or an error

  • Availability - Every request receives a (non-error) response - without the guarantee that it contains the most recent write

  • Partition tolerance - The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

Source: CAP theorem - Wikipedia

Since we cannot guarantee all three, we must typically sacrifice at least one of those guarantees (i.e. sacrifice consistency, availability, or partition tolerance).

However, what if we were to squint and apply the CAP theorem to another distributed system: a team of software engineers working towards a common goal?

In particular:

  • What if our data store were a distributed version control system?

  • What if our “nodes” were software developers instead of machines?

If we view engineering through this lens, we can recognize many common software engineering tradeoffs as special cases of CAP theorem tradeoffs. In other words, many architectural patterns also require us to sacrifice at least one of consistency, availability, or partition tolerance among developers.

Before we get into examples, I’d like to review two points that come up in most discussions of the CAP theorem:

Partition tolerance

What does it mean to sacrifice partition tolerance? In the context of machines, this would require us to rule out the possibility of any sort of network failure.

Now replace machines with developers. We’d have to assume that people never miscommunicate, lose internet access, or fetch the wrong branch.

The possibility of partitions is what makes a system a distributed system, so we’re usually not interested in the option of sacrificing partition tolerance. That would be like a computing system with only one machine or a team with only one developer.

Instead, we’ll typically focus on sacrificing either availability or consistency.

Spectrums of tradeoffs

In most systems you’re always sacrificing all three of consistency, availability, and partition tolerance to some degree if you look closely enough. For example, even a healthy machine is not 100% available if you consider that even the fastest network request still has an irreducible delay of around a few hundred microseconds on today’s machines.

In practice, we ignore these vanishingly small inconsistencies or unavailabilities, but they still illustrate a general pattern: we can think of system health/availability/consistency as spectrums rather than boolean options.

For example, if we say we choose availability over consistency, we really mean that we keep our system’s unavailability vanishingly small and accept that the system is consistent most of the time, but not all of the time. Indeed, if our hardware and network were both fast and extremely reliable we could enjoy both high consistency and high availability, but when things fail we need to prioritize which of consistency or availability we sacrifice to accommodate that failure.

We can also choose to sacrifice a non-trivial amount of both availability and consistency. Sometimes exclusively prioritizing one or the other is not the right engineering choice!

With those caveats out of the way, let’s view some common software engineering tradeoffs through the lens of the CAP theorem.

Monorepo vs. Polyrepo

In revision control systems, a monorepo (syllabic abbreviation of monolithic repository) is a software development strategy where code for many projects is stored in the same repository

A “polyrepo” is the opposite software development strategy where each project gets a different source repository. In a monorepo, projects depend on each other by their relative paths. In a polyrepo, a project can depend on another project by referencing a specific release/revision/build of the dependency.

The tradeoff between a monorepo and a polyrepo is a tradeoff between consistency and availability. A monorepo prioritizes consistency over availability. Conversely, a polyrepo prioritizes availability over consistency.

To see why, let’s pretend that project A depends on project B and we wish to make a breaking change to project B that requires matching fixes to project A. Let’s also assume that we have some sort of continuous integration that ensures that the master branch of any repository must build and pass tests.

In a polyrepo, we can make the breaking change to the master branch of project B before we are prepared to make the matching fix to project A. The continuous integration that we run for project B’s repository does not check that other “downstream” projects that depend on B will continue to build if they incorporate the change. In this scenario, we’ve deferred the work of integrating the two projects together and left the system in a state where the master branch of project B is not compatible with the master branch of project A. (Note: the master branch of project A may still build and pass tests, but only because it depends on an older version of project B).

In a monorepo, we must bundle the breaking change to project B and the fix to project A in a single logical commit to the master branch of our monorepo. The continuous integration for the monorepo prevents us from leaving the master branch in a state where some of the projects don’t build. Fixing these package incompatibilities up-front will delay merging work into the master branch (i.e. sacrificing availability of our work product) but imposing this restriction ensures that the entire software engineering organization has a unified view of the codebase (i.e. preserving consistency).

To make the analogy precise, let’s revisit the original definitions of consistency, availability, and partition tolerance:

  • Consistency - Every read receives the most recent write or an error

  • Availability - Every request receives a (non-error) response - without the guarantee that it contains the most recent write

  • Partition tolerance - The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

… and change them to reflect the metaphor of a distributed team of developers collaborating via GitHub/GitLab:

  • Consistency - Every git pull receives the latest versions of all dependencies

  • Availability - Every pull request succeeds - without the guarantee that it contains the latest versions of all dependencies

  • Partition tolerance - git operations continue to work even if GitHub/GitLab is unavailable

Trunk based development vs. Long-lived branches

Our previous scenario assumed that each repository was using “trunk-based development”, defined as:

A source-control branching model, where developers collaborate on code in a single branch called “trunk” [and] resist any pressure to create other long-lived development branches by employing documented techniques.

Source: trunkbaseddevelopment.com

The opposite of trunk-based development is “long-lived branches” that are not the master branch (i.e. the “trunk” branch).

Here are some examples of long-lived branches you’ll commonly find in the wild:

  • A develop branch that is used as the base branch of pull requests. This develop branch is periodically merged into master (typically at release boundaries)

  • Release branches that are supported for months or years (i.e. long-term support releases)

  • Feature branches that people work on for an extended period of time before merging their work into master

The choice between trunk-based development and long-lived branches is a choice between consistency and availability. Trunk-based development prioritizes consistency over availability. Long-lived branches prioritize availability over consistency.

To see why, imagine merging a feature branch over a year old back into the master branch. You’ll likely run into a large number of merge conflicts because up until now you sacrificed consistency by basing your work on an old version of the master branch. However, perhaps you would have slowed down your iteration speed (i.e. sacrificing availability of your local work product) if you had to ensure that each of your commits built against the latest master.

You might notice that trunk-based development vs. long-lived branches closely parallels monorepo vs. polyrepo. Indeed, organizations that prefer monorepos also tend to prefer trunk-based development because both reflect the same preference for developers sharing a unified view of the codebase. Vice versa, organizations that prefer polyrepos also tend to prefer long-lived branches because both choices emerge from the same preference to prioritize availability of developers’ work product. These are not perfect correlations, though.

Continuous integration vs. Test team

Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early.

Source: thoughtworks.com - Continuous Integration

So far we’ve been assuming the use of continuous integration to ensure that master stays “green”, but not all organizations operate that way. Some don’t use continuous integration and instead rely on a test team to identify integration issues.

You can probably guess where this is going:

  • Continuous integration prioritizes consistency over availability
  • Use of a test team prioritizes availability over consistency

The more you rely on continuous integration, the more you need to catch (and fix) errors up front, since the error-detection process is automated. The more you rely on a test team, the more developers tend to defer detection of errors and bugs, leaving the system in a potentially buggy state for features not covered by automated tests.

Organizations that use a test team prioritize availability of developers’ work product, but at the expense of possibly deferring consistency between components system-wide. Vice-versa, organizations that rely on continuous integration prioritize consistency of the fully integrated system, albeit sometimes at the expense of the progress of certain components.

Spectrums

Remember that each one of these choices is really a spectrum. For example:

  • Many monorepos are partially polyrepos, too, if you count their third-party dependencies. The only true monorepo is one with no external dependencies.

  • The distinction between trunk-based development and long-lived branches is a matter of degree. There isn’t a bright line that separates a short-lived branch from a long-lived one.

  • Many organizations use a mix of continuous integration (to catch low-level issues) and a test team (to catch high-level issues). Also, every organization has an implicit test team: their customers, who will report bugs that even the best automation will miss.

Conclusion

This post is not an exhaustive list of software engineering tradeoffs that mirror the CAP theorem. I’ll wager that as you read this several other examples came to mind. Once you recognize the pattern you will begin to see this tension between consistency and availability everywhere (even outside of software engineering).

Hopefully this post can help provide a consistent language for talking about these choices so that people can frame these discussions in terms of their organization’s core preference for consistency vs availability. For example, maybe in the course of reading this you noticed that your organization prefers availability in some cases but consistency in others. Maybe that’s a mistake you need to correct or maybe it’s an inevitability since we can never truly have 100% availability or 100% consistency.

You might be interested in what happens if you take availability or consistency to their logical conclusion. For example, Kent Beck experiments with an extreme preference for consistency over availability in test && commit || revert. Or to put it more humorously:

On the other hand, if you prioritize availability over consistency at all costs you get … the open source ecosystem.

This is not the first post exploring the relationship between the CAP theorem and software development. For example, Jessica Kerr already explored this idea of treating teams as distributed systems in Tradeoffs in Coordination Among Teams.

Tuesday, May 14, 2019

Release early and often

This post summarizes the virtues of cutting frequent releases for software projects. You might find this post useful if you are trying to convince others to release more frequently (such as your company or an open source project you contribute to).

Easing migration

Frequent releases provide a smoother migration path for end-users of your software.

For example, suppose that your software is currently version “1.0” and you have two breaking changes (“A” and “B”) that you plan to make. Now consider the following two release strategies:

  • Release strategy #0 - More frequent releases

    * Version 1.0
        * Initial release
    
    * Version 2.0
        * BREAKING CHANGE: A
    
    * Version 3.0
        * BREAKING CHANGE: B
  • Release strategy #1 - Less frequent releases

    * Version 1.0
        * Initial release
    
    * Version 2.0
        * BREAKING CHANGE: A
        * BREAKING CHANGE: B

The first release strategy is better from the end-user’s point of view because they have the option to upgrade in two smaller steps. In other words, they can upgrade from version 1.0 to version 2.0 and then upgrade from version 2.0 to version 3.0 at a later date.

Under both release strategies users can elect to skip straight to the latest release if they are willing to pay down the full upgrade cost up front, but releasing more frequently provides users the option to pay down the upgrade cost in smaller installments. To make an analogy: walking up a staircase is easier than scaling a sheer cliff of the same height.

In particular, you want to avoid the catastrophic scenario where a large number of users refuse to upgrade because one release bundles too many breaking changes. The textbook example of this is the Python 2 to 3 upgrade, where a large fraction of the community refused to upgrade because too many breaking changes were bundled into a single release instead of being spread out over several releases.

Keeping trains running on time

You don’t need to delay a release to wait for a particular feature if you release frequently. Just postpone the change for the next release if it’s not ready. After all, if you release frequently then the next release is right around the corner.

Conversely, if you release infrequently, you will frequently run into the following vicious cycle:

  • Important feature X is close to completion but perhaps not quite ready to merge

    Perhaps the feature has insufficient tests or there are unresolved concerns during code review

  • A new release is about to be cut

    Should you wait to merge feature X? It might be a long time (3 months?) before the next release, even though X could be ready with just 1 more week of work.

  • You choose to delay the release to wait for important feature X

  • Now another important feature Y requests to also slip in before the release

    … further delaying the release

  • 3 months have passed and you still haven’t cut the release

    New features keep (justifiably) slipping in out of fear that they will have to otherwise wait for the next release

Eventually you do cut a release, but each iteration of this process decreases the release frequency and compounds the problem. The less frequently you release software the more incentive to slip in last-minute changes before the release cutoff, further delaying the release. Even worse, the longer you wait to cut each release the greater the pressure to compromise on quality to get the release out the door.

Sticking to a strict and frequent release schedule staves off this vicious cycle because then you can always safely postpone incomplete features to the next release.

Avoiding “crunch time”

Infrequent “big bang” releases create pressure for developers to work excessive hours in the lead-up to a release cutoff. This can happen even when developers are unpaid, such as on open source projects: the peer pressure of holding up the release for others can induce people to work unhealthy schedules they wouldn’t work otherwise.

I won’t claim that frequent release schedules will prevent paid developers from working late nights and weekends, but at least management can’t hide behind a looming release deadline to justify the overtime.

Accelerating the feedback loop

Releases are opportunities to correct course, because you don’t know how users will react to a feature until you put the feature into their hands. If you implement a feature and the next release is 3 months away, that’s 3 months where you don’t know if the feature is what the user actually needs.

Even worse: suppose that the first implementation of the feature does not do what the user wants: now you have to wait another 3 months to get the next iteration of the feature into their hands. That slow feedback loop is a recipe for a poorly-designed product.

Incentivizing automation

Fast release cycles force you to automate and accelerate release-related processes that you would otherwise do manually (i.e. continuous integration), including:

  • Testing
  • Publication of software artifacts
  • Collecting quality and health metrics

That automation in turn means that, in the long run, you spend more time developing features and less time on the mechanics of delivering them to end users.

Conclusion

Releasing more frequently isn’t free: as the previous section suggests, you need to invest in automation to be able to make frequent releases a reality.

However, I do hope that people reading this post will recognize when symptoms of infrequent releases creep up on them so that they can get ahead of them and make the case to others to invest in improving release frequency.