Wednesday, January 16, 2019

Dhall - Year in review (2018-2019)

dhall-2018

The Dhall configuration language is now two years old and this post will review progress in 2018 and the future direction of the language in 2019.

If you’re not familiar with Dhall, you might want to visit the official website for the language, which is the recommended starting point. This post assumes familiarity with the language.

Also, I want to use this post to advertise a short survey that you can take if you are interested in the language and would like to provide feedback:

Progress in 2018

This section will review the highlights of what we accomplished over the last year. These highlights are not exhaustive and I focus on improvements that might encourage people to revisit the language if they were on the fence a year ago.

If you’re already familiar with recent progress in the language and you are more interested in where the language is going then you can jump to the Future direction section.

New language bindings

Several contributors stepped up to the plate to begin three new actively-maintained language bindings to Dhall.

Of these three the Clojure bindings are the ones closest to completion:

The Clojure bindings are sufficiently close to completion that they currently get an official vote on proposed changes to the language standard, giving them an equal voice in the language evolution process.

This is a complete reimplementation of the language entirely in Clojure that allows you to marshal Dhall expressions, including Dhall functions, directly into Clojure:

The Clojure bindings pave the way for making the Dhall configuration language a first class citizen on the JVM.

Additionally, two other language bindings have gotten pretty far along:

These latter two language bindings haven’t announced yet as they are works in progress, but I still wanted to recognize their work so far.

Also, I want to mention that adding a conformance test suite (thanks to Fabrizio Ferrai) helped drive parallel implementations by providing implementors with a tangible measure of progress towards the goal of 100% standard coverage.

Haskell - Cabal

Thanks to the work of Oliver Charles you can generate .cabal files with Dhall by using dhall-to-cabal.

This part of the project’s README sold me on the motivation for doing so:

We can go beyond Cabal files. If Cabal is a domain specific language for building Haskell projects, what does a domain specific language for building Haskell web applications look like? Does the separate of library, executable, and test-suite make sense here? Maybe we’d rather:

… and have this take care of some other details.

When you think about this it makes perfect sense: Haskell programmers use Cabal/Hackage to package and distribute Haskell code, but then what do they use to package and distribute “Cabal code”? The answer is a language like Dhall that builds in its own code distribution mechanism instead of relying on a separate build tool. This closes the loop so that you don’t need to maintain a growing tower of build tools as your project expands.

I’m pretty sure Cabal was the first “heavy duty” configuration format tested with Dhall because this project prompted the first swell of feature requests related to interpreter performance and usability improvements for working with giant schemas.

Also, the dhall-to-cabal project includes the entire Cabal schema encoded as a Dhall type, which you can find here:

This comes in handy when you want a systematic listing of all Cabal configuration features. If you have the dhall interpreter installed you can also view the normal form of the schema in all its glory by running:

You can also migrate an existing project using cabal-to-dhall, a tool which converts a .cabal file to the equivalent .dhall file.

Eta

Javier Neira with the support of TypeLead added Dhall as a supported file format for configuring Eta packages by building on top of the dhall-to-cabal project.

That project has also produced work-in-progress Eta and Java bindings to Dhall bindings along the way by using Eta to compile the Haskell implementation of Dhall to the JVM. When those are complete you will have yet another option for using the Dhall configuration language on the JVM

If you are interested, you can follow the progress on those bindings via this GitHub issue:

Kubernetes

Dhall is commonly used for ops, and the first project to systematically integrate Dhall into a widely used ops tool is the dhall-kubernetes project, thanks to the work of Arian van Putten, Fabrizio Ferrai, and Thomas Scholtes.

I’ve never used Kubernetes, but everybody tells me that Kubernetes configurations are large, repetitive, and error-prone YAML files, which are the perfect use case for Dhall.

PureScript - Spago

The Spago project builds on top of psc-packages to assemble a PureScript package set to build using Dhall as the configuration format.

This tool takes advantage of Dhall’s import system so that the package set can be split over multiple files, which you can see in the top-level package set here:

… and users can easily import that and easily override packages locally without dealing with the headache of rebasing their local changes whenever the upstream package set changes.

Complete language standard

Last year I promised to upstream all features from the Haskell implementation into the language standard, since at the time a few import-related features were implementation-defined. This was a top priority because I didn’t want to treat other language bindings as second-class citizens.

This year we successfully standardized everything, meaning that there should no longer be any implementation-defined features. Additionally, all new functionality now begins with a change to the standard followed by a change to each implementation of the language, meaning that the Haskell implementation is no longer treated as a distinguished implementation.

Unsigned Natural literals

Earlier this year Greg Pfeil proposed to fix the obligatory + sign preceding Natural number literals, which bothered a lot of newcomers to the language. He proposed require a leading + for non-negative Integers instead of Natural numbers.

We knew this would be a highly breaking change, but we were all tired of the + signs which littered our code. The Natural type is much more natural to use (pun intended) than the Integer type, so why not optimize the syntax for Natural numbers?

So we made the change and now instead of writing an expression like:

… you instead write:

This change also improved code comprehension, because before this change an expression like this:

… could be misconstrued as adding f to various numbers, but after this change:

… the reader can more easily discern that f is being applied as a function to numeric arguments.

Type synonyms

Previously, users couldn’t create new types using let expressions and had to work around this limitation by using the import system to reuse types, like this:

Now users can define new types inline within the same file using an ordinary let expression:

Simpler Optional literals

Optional literals used to resemble lists, like this:

Now you can use Some and None instead, like this:

In particular, a present Optional literal no longer requires a type since Some can infer the type from the provided argument. This simplifies the common idiom of overriding an absent Optional value within a record of defaults:

The old list-like syntax is still supported but is deprecated and will be dropped. dhall lint will also automatically replace the old list-like Optional literals with their new Some/None equivalents.

Union constructors

Unions used to be one of the major pain points when using the language, due to having to specify all alternatives for a union literal, like this:

My first attempt to improve this introduced the constructors keyword which took a union type as an argument and returned a record of functions to create each constructor. This changed the above code example to:

However, this initial solution introduced two new problem:

  • A lot of constructors-related boilerplate at the beginning of Dhall files
  • Performance issues due to these large intermediate records of constructors

A follow-up change resolved both issues by overloading the . operator to also access constructor functions directly from a union type (as if it were already a record), like this:

This solved both the performance issue (by eliminating the need for an intermediate record of constructors) and eliminated the constructors keyword boilerplate. Also, this simplification plays nicely with the auto-complete support provided by the dhall repl since you can now auto-complete constructors using the . operator.

Import caching

In 2017 Dhall added support for semantic integrity checks, where you tag an import with a SHA256 hash of a standard binary encoding of an expression’s normal form. This integrity check protects against tampering by rejecting any expression with a different normal form, guaranteeing that the import would never change.

Several astute users pointed out that you could locally cache any import protected by such a check indefinitely. Even better, the SHA256 hash makes for a natural lookup key within that cache.

We standardized and implemented exactly that idea and now any import protected by an integrity check is permanently cached locally using the standard directory prescribed by the XDG Base Directory Specification.

For example, you can now import the entire Prelude as a package protected by an integrity check:

The first time you resolve the Prelude the import may take a bit (~7 seconds on my machine) to fetch the entire package, but the normal form is then stored locally in a 5 KB file:

… and then subsequent attempts to import the same Prelude resolve much more quickly (~80 ms on my machine).

This means that you can now cheaply import the entire Prelude in every file instead of separately importing each function that you use.

Alternative imports

The language now provides support for fallback imports using the ? operator if import resolution fails.

For example, you can use this feature to import an environment variable if present but gracefully fallback to another value if absent:

Or you can use the ? operator to provide alternative locations for obtaining an imported expression:

Multi-let expressions

You can now define multiple values within a single let expression instead of nesting let expressions. In other words, instead of this:

… you can now write this:

dhall lint will also automatically simplify any code using the old nested let style to use the new “multi-let” style.

Statically linked executables

The Haskell implementation of Dhall strives to be like the “Bash of typed functional programming”, but in order to do so the implementation needs to small, statically linked, and portable so that sysadmins don’t object to widely installing Dhall. In fact, if the executable satisfies those criteria then you don’t even need your sysadmin’s permission to try Dhall out within your own workspace.

Niklas Hambüchen made this possible through this through his general-purpose work on fully static Haskell executables built using Nix. Now Dhall’s continuous integration system produces small (< 3 MB) Linux executables that have no dependency footprint whatsoever.

Major performance improvements

The Haskell implementation of Dhall has made dramatic strides in performance improvements over the last year, motivated by projects with very large schemas, such as:

… as well as Formation’s internal use of Dhall which has led to them upstreaming many performance improvements to handle large Dhall programs.

Thanks to the work of Fintan Halpenny, Greg Pfeil, @quasicomputational, and others the Haskell implementation is between 1 to 3 orders of magnitude faster than it was a year ago, depending on the configuration file that you benchmark.

We’re also not done improving performance! We continue to improve as new projects continue to stretch the boundaries of what the language can do.

Type diffs

Large projects like these also led to usability improvements when working with gigantic types. The Haskell implementation now displays concise “type diffs” whenever you get a type mismatch so that you can quickly narrow down the problem no matter how much your configuration schema grows. This works no matter how deeply nested the error is.

For example, the following contrived example introduces four deeply nested errors in a gigantic schema (where the type is over 6000 lines long) and the error message still zeroes in on every error:

dhall repl

The Haskell implementation also added a REPL contributed by Oliver Charles that you can use to interactively interpret Dhall code, including sophisticated auto-completion support contributed by Basile Henry:

The REPL comes in handy when exploring large values or types, as illustrated by the dhall-nethack tutorial which uses the REPL:

dhall lint

The Haskell implementation also provides a useful dhall lint subcommand that you can use to not only format code but to also automatically improve the code in non-controversial ways.

For example, dhall lint will automatically remove unused let bindings and will simplify nested let expressions to instead take advantage of the newest multi-let feature.

dhall resolve --dot

Basile Henry also contributed support for visualizing the dependency tree of a Dhall expression like this:

The following tweet illustrates how to use this feature along with example output:

dhall freeze

Thanks to Tobias Pflug you can also automatically take advantage of Dhall’s semantic integrity checks using the dhall freeze subcommand. This command fetches all imports within a Dhall expression and then automatically tags all of them with semantic integrity checks.

For example:

dhall-lang.org

A while ago, Neuman Vong advised me that if you want your open source project to take off, you need a logo, a website, and a live demo in the browser.

So I took that advice to heart and now Dhall has all three! You can try out the language live in your browser by visiting:

This allows people to “try before they buy” and the site links to several other useful resources, such as the …

Dhall wiki

The Dhall wiki contains several useful educational resources for learning the language. The organization of the wiki closely follows the guidelines from this handy post on writing documentation:

The main thing that is missing is to migrate the Haskell tutorial into a language-agnostic tutorial.

Twitter account

You can also now follow the official Twitter account for the language:

This account regularly posts news and tips about the language and ecosystem that you can use to stay abreast of recent progress.

Switch from IPFS to GitHub

Early on in the language history we used IPFS to distribute the Dhall Prelude, but due to reliability issues we’ve switched to using GitHub for hosting Dhall code.

There’s even a convenient link you can use to browse the Prelude:

Future direction

Phew! That was a lot to recap and I’m grateful to all the contributors who made that possible. Now we can review where the language is going.

First, I’m no longer benevolent dictator-for-life of the language. Each new reimplementation of the language gets a vote on the language standard and now that the Clojure implementation of Dhall is essentially complete they get an equal say on the evolution of the language. Similarly, once the PureScript bindings and Python bindings are close to complete they will also get a vote on the language standard, too.

However, I can still use this post to outline my opinion of where the language should go.

Crossing the Chasm

A colleague introduced me to the book Crossing the Chasm, which heavily influenced my approach to designing and marketing the language. The book was originally written for startups trying to gain mainstream adoption, but the book also strongly resonated with my experience doing open source evangelism (first for Haskell, and now Dhall).

The book explains that you need to first build a best-in-class solution for a narrowly-defined market. This in turn requires that you think carefully about what market you are trying to address and strategically allocate your limited resources to address that market.

So what “market” should Dhall try to address?

YAML

One of the clearest signals I’ve gotten from users is that Dhall is “the YAML killer”, for the following reasons:

  • Dhall solves many of the problems that pervade enterprise YAML configuration, including excessive repetition and templating errors

  • Dhall still provides many of the good parts of YAML, such as multi-line strings and comments, except with a sane standard

  • Dhall can be converted to YAML using a tiny statically linked executable, which provides a smooth migration path for “brownfield” deployments

Does that mean that Dhall is clearly the best-in-class solution for people currently using YAML?

Not quite. The key thing Dhall is missing for feature parity with YAML is a wide array of native language bindings for interpreting Dhall configuration files. Many people would prefer to use Dhall without having to invoke an external executable to convert their Dhall configuration file to YAML.

This is one of the reasons I’ve slowed down the rate of evolution of the standard so that many of the new language bindings have an opportunity to implement the full standard. Also, I expect that once more language bindings have votes on the standard evolution that will further stabilize the language since new features proposals will have a higher bar to clear.

That’s not to say that we will freeze the language, but instead we will focus on strategically spending our “complexity budget” on features that help displace YAML. If we spend our complexity budget on unrelated features then we will increase the difficulty of porting Dhall to new languages without addressing the initial use case that will help Dhall gain mainstream traction.

JSON integration

One of YAML’s features is that all JSON is also valid YAML, by definition. In fact, some people use YAML just for the fact that it supports both JSON and comments.

This suggests that Dhall, like YAML, should also natively support JSON in some way. Dhall’s issue tracker contains a few issues along these lines and the one I would most like to see completed this year is adding support for importing JSON files as Dhall expressions:

Editor support

Another thing Dhall is missing compared to YAML is widespread editor support. This is why another one of my goals for this year is to create a Dhall language server so that any editor that supports the language server protocol (basically all of them) would get Dhall support for free.

Ops

We can actually narrow down Dhall’s “market” further if we really want to be selective about what we work on. Dhall has also grown in popularity for simplifying ops-related configurations, providing several features that ops engineers care about:

  • Strong normalization

    Ops commonly suffers from the dilemma that too much repetition is error prone, but too much abstraction is also error prone if readers of the code can’t effectively audit what is going on. One of Dhall’s unique features is that all code is strongly normalizing, meaning that every expression can be reduced to an abstraction-free normal form. This is made possible by the fact that Dhall is not Turing-complete (another feature favored by Ops).

  • Absolute type safety

    Ops engineers care about reliability since they maintain the software that the rest of their company relies on and any outages can have devastating effects on both the product and the productivity of other engineers.

    This is one of the reason for the Ops flight from Turing-complete languages to inert configuration files like JSON/YAML because Turing-complete languages give you the tools to shoot yourself in the foot. However, Dhall strikes a balance between being programmable while still not being Turing-complete and having a type system with no escape hatches, so you’re incapable of shooting yourself in the foot.

  • Built-in support for importing code

    Another reason that Ops people hate programmable configuration files is that the programming language they pick typically comes with an external build tool for the language that adds one more layer to the tower of build tools that they have to maintain. Now they’ve just replaced one problem (a repetitive configuration file for their infrastructure) which a new problem (a repetitive configuration file for the build tool for the programming language they used to reduce the original repetition).

    Dhall solves this problem well by providing built-in language support for importing other code (similar to Bash and Nix, both also heavily used for Ops use cases). This means that Dhall provides a solid foundaton for their tower of automation because they don’t need to introduce another tool to support a growing Dhall codebase.

  • Dhall displaces YAML well

    YAML configuration files are incredibly common in Ops and “infrastructure as code”. Example tools that use a YAML configuration are:

    • Kubernetes
    • Docker Compose
    • Concourse
    • Ansible
    • Travis

    YAML is so common that Ops engineers sometimes half-jokingly refer to themselves as “YAML engineers”.

    As already mentioned above, Dhall provides a sane alternative to YAML.

We’ve already seen one Dhall integration for an Ops tool emerge last year with the dhall-kubernetes project and this year I hope we continue along those lines and add at least one more Ops-related integration.

I think the next promising integration is the dhall-terraform project which is still a work in progress that would benefit from contributions.

Funding

Finally, I would like to experiment with various ways to fund open source work on Dhall now that the language has a growing userbase. In particular, I’d like to fund:

  • additional language bindings
  • better editor support
  • adding CI support for statically linked Windows and OS X binaries
  • packaging Dhall for various software distributions (i.e. .rpm/.deb)

… and I’d like to provide some way to reward the work of people who contribute beyond just acknowledging their work in posts like this one.

That’s why one of the survey questions for this year asks for suggestions on what would be the most appropriate (non-proprietary) funding model for that sort of work.

Conclusion

Hopefully that gives people a sense of where I think the language is going. If you have any thoughts on the direction of the language this would be a good time to take the survey:

Like last year, I will follow-up a month from now with another post reviewing the feedback from the survey.