Monday, December 5, 2016

Dhall - A non-Turing-complete configuration language

I'm releasing a new configuration language named Dhall with Haskell bindings. Even if you don't use Haskell you might still find this language interesting.

This language started out as an experiment to answer common objections to programmable configuration files. Almost all of these objections are, at their root, criticisms of Turing-completeness.

For example, people commonly object that configuration files should be easy to read, but they descend into unreadable spaghetti if you make them programmable. However, Dhall doesn't have this problem because Dhall is a strongly normalizing language, which means that we can reduce every expression to a standard normal form in a finite amount of time by just evaluating everything.

For example, consider this deliberately obfuscated configuration file:

$ cat config
    let zombieNames =
            [ "Rachel", "Gary", "Liz" ] : List Text

in  let isAZombie =
            \(name : Text) -> { name = name, occupation = "Zombie" }

in  let map =
            https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/List/map

in  let tag =
            map Text { name : Text, occupation : Text }

in  let zombies =
            tag isAZombie zombieNames

in  let policeNames =
            [ "Leon", "Claire" ] : List Text

in  let worksForPolice =
            \(name : Text) -> { name = name, occupation = "Police officer" }

in  let policeOfficers =
            tag worksForPolice policeNames

in  let concat =
            https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/List/concat

in  let characters =
            concat
            { name : Text, occupation : Text }
            (   [   zombies
                ,   policeOfficers
                ]   : List (List { name : Text, occupation : Text })
            )

in  {   protagonist =
            List/head { name : Text, occupation : Text } policeOfficers
    ,   numberOfCharacters =
            List/length { name : Text, occupation : Text } characters
    }

We can use the dhall compiler to cut through the indirection and reduce the above configuration file to the following fully evaluated normal form:

$ stack install dhall
$ dhall < config
{ numberOfCharacters : Natural, protagonist : Optional { name : Text, occupation : Text } }

{ numberOfCharacters = +5, protagonist = [{ name = "Leon", occupation = "Police officer" }] : Optional { name : Text, occupation : Text } }

The first line is the inferred type of the file, which we can format as:

{ numberOfCharacters : Natural
, protagonist        : Optional { name : Text, occupation : Text }
}

This says that our configuration file is a record with two fields:

  • a field named numberOfCharacters that stores a Natural number (i.e. a non-negative number)
  • a field named protagonist that stores an Optional record with a name and occupation

From this type alone, we know that no matter how complex our configuration file gets the program will always evaluate to a simple record. This type places an upper bound on the complexity of the program's normal form.

The second line is the actual normal form of our configuration file:

{ numberOfCharacters =
    +5
, protagonist =
    [ { name = "Leon", occupation = "Police officer" }
    ] : Optional { name : Text, occupation : Text }
}

In other words, our compiler cut through all the noise and gave us an abstraction-free representation of our configuration.

Total programming

You can also evaluate configuration files written in other languages, too, but Dhall differentiates itself from other languages by offering several stronger guarantees about evaluation:

  • Dhall is not Turing complete because evaluation always halts

    You can never write a configuration file that accidentally hangs or loops indefinitely when evaluated

    Note that you can still write a configuration file that takes longer than the age of the universe to compute, but you are much less likely to do so by accident

  • Dhall is safe, meaning that functions must be defined for all inputs and can never crash, panic, or throw exceptions

  • Dhall is sandboxed, meaning that the only permitted side effect is retrieving other Dhall expressions by their filesystem path or URL

    There are examples of this in the above program where Dhall retrieves two functions from the Prelude by their URL

  • Dhall's type system has no escape hatches

    This means that we can make hard guarantees about an expression purely from the expression's type

  • Dhall can normalize functions

    For example, Dhall's Prelude provides a replicate function which builds a list by creating N copies of an element. Check out how we can normalize this replicate function before the function is even saturated:

    $ dhall
    let replicate = https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/List/replicate
    in  replicate +10
    <Ctrl-D>
    ∀(a : Type) → ∀(x : a) → List a
    
    λ(a : Type) → λ(x : a) → [x, x, x, x, x, x, x, x, x, x] : List a

    The compiler knows that no matter what element we provide for the final argument to replicate the result must be 10 copies of that element in a list

Types

Dhall is also a typed language, so every configuration file can be checked ahead of time against an expected schema. The schema can even live in a separate file, like this:

$ cat schema
{ numberOfCharacters : Natural
, protagonist        : Optional { name : Text, occupation : Text }
}

... and then checking our configuration against a schema is as simple as giving the configuration file a type annotation:

$ dhall
./config : ./schema
<Ctrl-D>
{ numberOfCharacters : Natural, protagonist : Optional { name : Text, occupation : Text } }

{ numberOfCharacters = +5, protagonist = [{ name = "Leon", occupation = "Police officer" }] : Optional { name : Text, occupation : Text } }

If the compiler doesn't complain then that means that the configuration file type checks against our schema.

Haskell bindings

Dhall configuration files can be marshalled into Haskell data types. For example, the following Haskell program:

{-# LANGUAGE DeriveAnyClass    #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}

import Dhall

data Summary = Summary
    { numberOfCharacters :: Natural
    , protagonist :: Maybe Person
    } deriving (Generic, Interpret, Show)

data Person = Person
    { name :: Text
    , occupation :: Text
    } deriving (Generic, Interpret, Show)

main :: IO ()
main = do
    x <- input auto "./config"
    print (x :: Summary)

... will marshal our config file into Haskell and print the corresponding Haskell representation of our configuration file:

$ stack runghc example.hs
Summary {numberOfCharacters = 5, protagonist = Just (Person {name = "Leon", occupation = "Police officer"})}

The Haskell program automatically checks that the configuration file's schema automatically matches the data structures that we marshal into. The entire pipeline is type safe from end to end.

Imports

Dhall expressions can reference other expression, either by their filesystem paths or URLs. Anything can be imported, such as fields of records:

$ cat record
{ foo = 1.0
, bar = ./bar
}

$ cat bar
[1, 2, 3] : List Integer

$ dhall < record
{ bar = [1, 2, 3] : List Integer, foo = 1.0 }

... or types:

$ cat function
\(f : ./type ) -> f False

$ cat type
Bool -> Integer

$ dhall < function
∀(f : Bool → Integer) → Integer

λ(f : Bool → Integer) → f False

... or functions:

$ cat switch
\(b : Bool) -> if b then 2 else 3

$ dhall <<< "./function ./switch"
Integer

3

You can also import URLs, too. The Dhall Prelude is hosted using IPFS (a distributed and immutable filesystem), and you can browse the Prelude here:

https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/

Anything from the Prelude can be used by just pasting the URL into your program:

$ dhall
https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/Natural/sum
([+2, +3, +5] : List Natural)
<Ctrl-D>
Natural

+10

... although usually you want to assign the URL to a shorter name for readability:

let sum = https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/Natural/sum 
in  sum ([+2, +3, +5] : List Natural)

You're not limited to IPFS for hosting Dhall expressions. Any pastebin, web server, or Github repository that can serve raw UTF8 text can host a Dhall expression for others to use.

Error messages

Dhall outputs helpful error messages when things go wrong. For example, suppose that we change our type file to something that's not a type:

$ echo "1" > type
$ dhall <<< "./function ./switch"
Use "dhall --explain" for detailed errors

↳ ./function 

f : 1

Error: Not a function

f False

function:1:19

By default Dhall gives a concise summary of what broke. The error message begins with a trail of breadcrumbs pointing to which file in your import graph is broken:

↳ ./function 

In this case, the error is located in the ./function file that we imported.

Then the next part of the error message is a context that prints the types of all values that are in scope:

f : 1

... which says that only value named f is in scope and f has type 1 (Uh oh!)

The next part is a brief summary of what went wrong:

Error: Not a function

... which says that we are using something that's not a function

The compiler then prints the code fragment so we can see at a glance what is wrong with our code before we even open the file:

f False

The above fragment is wrong because f is not a function, but we tried to apply f to an argument.

Finally, the compiler prints out the file, column, and line number so that we can jump to the broken code fragment and fix the problem:

function:1:19

This says that the problem is located in the file named function at row 1 and column 19.

Detailed error messages

But wait, there's more! You might have noticed this line at the beginning of the error message:

Use "dhall --explain" for detailed errors

Let's add the --explain flag to see what happens:

$ dhall --explain <<< "./function ./switch"

↳ ./function 

f : 1

Error: Not a function

Explanation: Expressions separated by whitespace denote function application,
like this:


    ┌─────┐
    │ f x │  This denotes the function ❰f❱ applied to an argument named ❰x❱ 
    └─────┘


A function is a term that has type ❰a → b❱ for some ❰a❱ or ❰b❱.  For example,
the following expressions are all functions because they have a function type:


                        The function's input type is ❰Bool❱
                        ⇩
    ┌───────────────────────────────┐
    │ λ(x : Bool) → x : Bool → Bool │  User-defined anonymous function
    └───────────────────────────────┘
                               ⇧
                               The function's output type is ❰Bool❱


                     The function's input type is ❰Natural❱
                     ⇩
    ┌───────────────────────────────┐
    │ Natural/even : Natural → Bool │  Built-in function
    └───────────────────────────────┘
                               ⇧
                               The function's output type is ❰Bool❱


                        The function's input kind is ❰Type❱
                        ⇩
    ┌───────────────────────────────┐
    │ λ(a : Type) → a : Type → Type │  Type-level functions are still functions
    └───────────────────────────────┘
                               ⇧
                               The function's output kind is ❰Type❱


             The function's input kind is ❰Type❱
             ⇩
    ┌────────────────────┐
    │ List : Type → Type │  Built-in type-level function
    └────────────────────┘
                    ⇧
                    The function's output kind is ❰Type❱


                        Function's input has kind ❰Type❱
                        ⇩
    ┌─────────────────────────────────────────────────┐
    │ List/head : ∀(a : Type) → (List a → Optional a) │  A function can return
    └─────────────────────────────────────────────────┘  another function
                                ⇧
                                Function's output has type ❰List a → Optional a❱


                       The function's input type is ❰List Text❱
                       ⇩
    ┌────────────────────────────────────────────┐
    │ List/head Text : List Text → Optional Text │  A function applied to an
    └────────────────────────────────────────────┘  argument can be a function
                                   ⇧
                                   The function's output type is ❰Optional Text❱


An expression is not a function if the expression's type is not of the form
❰a → b❱.  For example, these are not functions:


    ┌─────────────┐
    │ 1 : Integer │  ❰1❱ is not a function because ❰Integer❱ is not the type of
    └─────────────┘  a function


    ┌────────────────────────┐
    │ Natural/even +2 : Bool │  ❰Natural/even +2❱ is not a function because
    └────────────────────────┘  ❰Bool❱ is not the type of a function


    ┌──────────────────┐
    │ List Text : Type │  ❰List Text❱ is not a function because ❰Type❱ is not
    └──────────────────┘  the type of a function


You tried to use the following expression as a function:

↳ f

... but this expression's type is:

↳ 1

... which is not a function type

────────────────────────────────────────────────────────────────────────────────

f False

function:1:19

We get a brief language tutorial explaining the error message in excruciating detail. These mini-tutorials target beginners who are still learning the language and want to better understand what error messages mean.

Every type error has a detailed explanation like this and these error messages add up to ~2000 lines of text, which is ~25% of the compiler's code base.

Tutorial

The compiler also comes with an extended tutorial, which you can find here:

This tutorial is also ~2000 lines long or ~25% of the code base. That means that half the project is just the tutorial and error messages and that's not even including comments.

Design goals

Programming languages are all about design tradeoffs and the Dhall language uses the following guiding principles (in order of descending priority) that help navigate those tradeoffs:

  • Polish

    The language should delight users. Error messages should be fantastic, execution should be snappy, documentation should be excellent, and everything should "just work".

  • Simplicity

    When in doubt, cut it out. Every configuration language needs bindings to multiple programming languages, and the more complex the configuration language the more difficult to create new bindings. Let the host language that you bind to compensate for any missing features from Dhall.

  • Beginner-friendliness

    Dhall needs to be a language that anybody can learn in a day and debug with little to no assistance from others. Otherwise people can't recommend Dhall to their team with confidence.

  • Robustness

    A configuration language needs to be rock solid. The last thing a person wants to debug is their configuration file. The language should never hang or crash. Ever.

  • Consistency

    There should only be one way to do something. Users should be able to instantly discern whether or not something is possible within the Dhall language or not.

The dhall configuration language is also designed to negate many of the common objections to programmable configuration files, such as:

"Config files shouldn't be Turing complete"

Dhall is not Turing-complete. Evaluation always terminates, no exceptions

"Configuration languages become unreadable due to abstraction and indirection"

Every Dhall configuration file can be reduced to a normal form which eliminates all abstraction and indirection

"Users will go crazy with syntax and user-defined constructs"

Dhall is a very minimal programming language. For example: you cannot even compare strings for equality (yes, really). The language also forbids many other common operations in order to force users to keep things simple.

Conclusion

You should read the tutorial if you would like to learn more about the language or use Dhall to configure your own projects:

You can also contribute, file issues, or ask questions by visiting the project repository on Github:

And the Haskell library is hosted on Hackage here:

If you would like to contribute, you can try porting Dhall to bind to languages other than Haskell, so that Dhall configuration files can be used across multiple languages. I keep the compiler simple (less than ~4000 lines of code if you don't count error messages) so that people can port the language more easily.

Also, for people who are wondering, the language is named after a Dustman from the game Planescape: Torment who belongs to a faction obsessed with death (termination).

4 comments:

  1. So how do I use it in Scala :-) Are you going to release a Scala binding for us Scala folks

    ReplyDelete
    Replies
    1. Definitely! I'm just waiting a bit for the language to stabilize after the initial wave of feature requests before porting to other languages.

      Delete
  2. A tiny correction:

    Dhall is not Turing complete, so evaluation always halts

    is, of course, not literally true, and is probably the converse of what you meant; totality is much stronger than non-Turing completeness (as evidenced by the OISC whose one instruction is `loop`).

    ReplyDelete
    Replies
    1. Fixed by changing it to "Dhall is not Turing complete *because* evaluation always halts", so that the implication is in the correct direction

      Delete