Monday, December 5, 2016

Dhall - A non-Turing-complete configuration language

I'm releasing a new configuration language named Dhall with Haskell bindings. Even if you don't use Haskell you might still find this language interesting.

This language started out as an experiment to answer common objections to programmable configuration files. Almost all of these objections are, at their root, criticisms of Turing-completeness.

For example, people commonly object that configuration files should be easy to read, but they descend into unreadable spaghetti if you make them programmable. However, Dhall doesn't have this problem because Dhall is a strongly normalizing language, which means that we can reduce every expression to a standard normal form in a finite amount of time by just evaluating everything.

For example, consider this deliberately obfuscated configuration file:

$ cat config
    let zombieNames =
            [ "Rachel", "Gary", "Liz" ] : List Text

in  let isAZombie =
            \(name : Text) -> { name = name, occupation = "Zombie" }

in  let map =
            https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/List/map

in  let tag =
            map Text { name : Text, occupation : Text }

in  let zombies =
            tag isAZombie zombieNames

in  let policeNames =
            [ "Leon", "Claire" ] : List Text

in  let worksForPolice =
            \(name : Text) -> { name = name, occupation = "Police officer" }

in  let policeOfficers =
            tag worksForPolice policeNames

in  let concat =
            https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/List/concat

in  let characters =
            concat
            { name : Text, occupation : Text }
            (   [   zombies
                ,   policeOfficers
                ]   : List (List { name : Text, occupation : Text })
            )

in  {   protagonist =
            List/head { name : Text, occupation : Text } policeOfficers
    ,   numberOfCharacters =
            List/length { name : Text, occupation : Text } characters
    }

We can use the dhall compiler to cut through the indirection and reduce the above configuration file to the following fully evaluated normal form:

$ stack install dhall
$ dhall < config
{ numberOfCharacters : Natural, protagonist : Optional { name : Text, occupation : Text } }

{ numberOfCharacters = +5, protagonist = [{ name = "Leon", occupation = "Police officer" }] : Optional { name : Text, occupation : Text } }

The first line is the inferred type of the file, which we can format as:

{ numberOfCharacters : Natural
, protagonist        : Optional { name : Text, occupation : Text }
}

This says that our configuration file is a record with two fields:

  • a field named numberOfCharacters that stores a Natural number (i.e. a non-negative number)
  • a field named protagonist that stores an Optional record with a name and occupation

From this type alone, we know that no matter how complex our configuration file gets the program will always evaluate to a simple record. This type places an upper bound on the complexity of the program's normal form.

The second line is the actual normal form of our configuration file:

{ numberOfCharacters =
    +5
, protagonist =
    [ { name = "Leon", occupation = "Police officer" }
    ] : Optional { name : Text, occupation : Text }
}

In other words, our compiler cut through all the noise and gave us an abstraction-free representation of our configuration.

Total programming

You can evaluate configuration files written in other languages, too, but Dhall differentiates itself from other languages by offering several stronger guarantees about evaluation:

  • Dhall is not Turing complete because evaluation always halts

    You can never write a configuration file that accidentally hangs or loops indefinitely when evaluated

    Note that you can still write a configuration file that takes longer than the age of the universe to compute, but you are much less likely to do so by accident

  • Dhall is safe, meaning that functions must be defined for all inputs and can never crash, panic, or throw exceptions

  • Dhall is sandboxed, meaning that the only permitted side effect is retrieving other Dhall expressions by their filesystem path or URL

    There are examples of this in the above program where Dhall retrieves two functions from the Prelude by their URL

  • Dhall's type system has no escape hatches

    This means that we can make hard guarantees about an expression purely from the expression's type

  • Dhall can normalize functions

    For example, Dhall's Prelude provides a replicate function which builds a list by creating N copies of an element. Check out how we can normalize this replicate function before the function is even saturated:

    $ dhall
    let replicate = https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/List/replicate
    in  replicate +10
    <Ctrl-D>
    ∀(a : Type) → ∀(x : a) → List a
    
    λ(a : Type) → λ(x : a) → [x, x, x, x, x, x, x, x, x, x] : List a

    The compiler knows that no matter what element we provide for the final argument to replicate, the result must be 10 copies of that element in a list

Types

Dhall is also a typed language, so every configuration file can be checked ahead of time against an expected schema. The schema can even live in a separate file, like this:

$ cat schema
{ numberOfCharacters : Natural
, protagonist        : Optional { name : Text, occupation : Text }
}

... and then checking our configuration against a schema is as simple as giving the configuration file a type annotation:

$ dhall
./config : ./schema
<Ctrl-D>
{ numberOfCharacters : Natural, protagonist : Optional { name : Text, occupation : Text } }

{ numberOfCharacters = +5, protagonist = [{ name = "Leon", occupation = "Police officer" }] : Optional { name : Text, occupation : Text } }

If the compiler doesn't complain then that means that the configuration file type checks against our schema.

Haskell bindings

Dhall configuration files can be marshalled into Haskell data types. For example, the following Haskell program:

{-# LANGUAGE DeriveAnyClass    #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}

import Dhall

data Summary = Summary
    { numberOfCharacters :: Natural
    , protagonist :: Maybe Person
    } deriving (Generic, Interpret, Show)

data Person = Person
    { name :: Text
    , occupation :: Text
    } deriving (Generic, Interpret, Show)

main :: IO ()
main = do
    x <- input auto "./config"
    print (x :: Summary)

... will marshal our config file into Haskell and print the corresponding Haskell representation of our configuration file:

$ stack runghc example.hs
Summary {numberOfCharacters = 5, protagonist = Just (Person {name = "Leon", occupation = "Police officer"})}

The Haskell program automatically checks that the configuration file's schema matches the data structures that we marshal into. The entire pipeline is type safe from end to end.

Imports

Dhall expressions can reference other expressions, either by filesystem path or URL. Anything can be imported, such as fields of records:

$ cat record
{ foo = 1.0
, bar = ./bar
}

$ cat bar
[1, 2, 3] : List Integer

$ dhall < record
{ bar = [1, 2, 3] : List Integer, foo = 1.0 }

... or types:

$ cat function
\(f : ./type ) -> f False

$ cat type
Bool -> Integer

$ dhall < function
∀(f : Bool → Integer) → Integer

λ(f : Bool → Integer) → f False

... or functions:

$ cat switch
\(b : Bool) -> if b then 2 else 3

$ dhall <<< "./function ./switch"
Integer

3

You can import URLs, too. The Dhall Prelude is hosted using IPFS (a distributed and immutable filesystem), and you can browse the Prelude here:

https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/

Anything from the Prelude can be used by just pasting the URL into your program:

$ dhall
https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/Natural/sum
([+2, +3, +5] : List Natural)
<Ctrl-D>
Natural

+10

... although usually you want to assign the URL to a shorter name for readability:

let sum = https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/Natural/sum 
in  sum ([+2, +3, +5] : List Natural)

You're not limited to IPFS for hosting Dhall expressions. Any pastebin, web server, or Github repository that can serve raw UTF8 text can host a Dhall expression for others to use.

Error messages

Dhall outputs helpful error messages when things go wrong. For example, suppose that we change our type file to something that's not a type:

$ echo "1" > type
$ dhall <<< "./function ./switch"
Use "dhall --explain" for detailed errors

↳ ./function 

f : 1

Error: Not a function

f False

function:1:19

By default Dhall gives a concise summary of what broke. The error message begins with a trail of breadcrumbs pointing to which file in your import graph is broken:

↳ ./function 

In this case, the error is located in the ./function file that we imported.

Then the next part of the error message is a context that prints the types of all values that are in scope:

f : 1

... which says that the only value in scope is named f and that f has type 1 (Uh oh!)

The next part is a brief summary of what went wrong:

Error: Not a function

... which says that we are using something that's not a function

The compiler then prints the code fragment so we can see at a glance what is wrong with our code before we even open the file:

f False

The above fragment is wrong because f is not a function, but we tried to apply f to an argument.

Finally, the compiler prints out the file, column, and line number so that we can jump to the broken code fragment and fix the problem:

function:1:19

This says that the problem is located in the file named function at row 1 and column 19.

Detailed error messages

But wait, there's more! You might have noticed this line at the beginning of the error message:

Use "dhall --explain" for detailed errors

Let's add the --explain flag to see what happens:

$ dhall --explain <<< "./function ./switch"

↳ ./function 

f : 1

Error: Not a function

Explanation: Expressions separated by whitespace denote function application,
like this:


    ┌─────┐
    │ f x │  This denotes the function ❰f❱ applied to an argument named ❰x❱ 
    └─────┘


A function is a term that has type ❰a → b❱ for some ❰a❱ or ❰b❱.  For example,
the following expressions are all functions because they have a function type:


                        The function's input type is ❰Bool❱
                        ⇩
    ┌───────────────────────────────┐
    │ λ(x : Bool) → x : Bool → Bool │  User-defined anonymous function
    └───────────────────────────────┘
                               ⇧
                               The function's output type is ❰Bool❱


                     The function's input type is ❰Natural❱
                     ⇩
    ┌───────────────────────────────┐
    │ Natural/even : Natural → Bool │  Built-in function
    └───────────────────────────────┘
                               ⇧
                               The function's output type is ❰Bool❱


                        The function's input kind is ❰Type❱
                        ⇩
    ┌───────────────────────────────┐
    │ λ(a : Type) → a : Type → Type │  Type-level functions are still functions
    └───────────────────────────────┘
                               ⇧
                               The function's output kind is ❰Type❱


             The function's input kind is ❰Type❱
             ⇩
    ┌────────────────────┐
    │ List : Type → Type │  Built-in type-level function
    └────────────────────┘
                    ⇧
                    The function's output kind is ❰Type❱


                        Function's input has kind ❰Type❱
                        ⇩
    ┌─────────────────────────────────────────────────┐
    │ List/head : ∀(a : Type) → (List a → Optional a) │  A function can return
    └─────────────────────────────────────────────────┘  another function
                                ⇧
                                Function's output has type ❰List a → Optional a❱


                       The function's input type is ❰List Text❱
                       ⇩
    ┌────────────────────────────────────────────┐
    │ List/head Text : List Text → Optional Text │  A function applied to an
    └────────────────────────────────────────────┘  argument can be a function
                                   ⇧
                                   The function's output type is ❰Optional Text❱


An expression is not a function if the expression's type is not of the form
❰a → b❱.  For example, these are not functions:


    ┌─────────────┐
    │ 1 : Integer │  ❰1❱ is not a function because ❰Integer❱ is not the type of
    └─────────────┘  a function


    ┌────────────────────────┐
    │ Natural/even +2 : Bool │  ❰Natural/even +2❱ is not a function because
    └────────────────────────┘  ❰Bool❱ is not the type of a function


    ┌──────────────────┐
    │ List Text : Type │  ❰List Text❱ is not a function because ❰Type❱ is not
    └──────────────────┘  the type of a function


You tried to use the following expression as a function:

↳ f

... but this expression's type is:

↳ 1

... which is not a function type

────────────────────────────────────────────────────────────────────────────────

f False

function:1:19

We get a brief language tutorial explaining the error message in excruciating detail. These mini-tutorials target beginners who are still learning the language and want to better understand what error messages mean.

Every type error has a detailed explanation like this and these error messages add up to ~2000 lines of text, which is ~25% of the compiler's code base.

Tutorial

The compiler also comes with an extended tutorial, which you can find here:

This tutorial is also ~2000 lines long or ~25% of the code base. That means that half the project is just the tutorial and error messages and that's not even including comments.

Design goals

Programming languages are all about design tradeoffs and the Dhall language uses the following guiding principles (in order of descending priority) that help navigate those tradeoffs:

  • Polish

    The language should delight users. Error messages should be fantastic, execution should be snappy, documentation should be excellent, and everything should "just work".

  • Simplicity

    When in doubt, cut it out. Every configuration language needs bindings to multiple programming languages, and the more complex the configuration language, the more difficult it is to create new bindings. Let the host language that you bind to compensate for any features missing from Dhall.

  • Beginner-friendliness

    Dhall needs to be a language that anybody can learn in a day and debug with little to no assistance from others. Otherwise people can't recommend Dhall to their team with confidence.

  • Robustness

    A configuration language needs to be rock solid. The last thing a person wants to debug is their configuration file. The language should never hang or crash. Ever.

  • Consistency

    There should only be one way to do something. Users should be able to instantly discern whether or not something is possible within the Dhall language.

The Dhall configuration language is also designed to negate many of the common objections to programmable configuration files, such as:

"Config files shouldn't be Turing complete"

Dhall is not Turing-complete. Evaluation always terminates, no exceptions

"Configuration languages become unreadable due to abstraction and indirection"

Every Dhall configuration file can be reduced to a normal form which eliminates all abstraction and indirection

"Users will go crazy with syntax and user-defined constructs"

Dhall is a very minimal programming language. For example: you cannot even compare strings for equality (yes, really). The language also forbids many other common operations in order to force users to keep things simple.

Conclusion

You should read the tutorial if you would like to learn more about the language or use Dhall to configure your own projects:

You can also contribute, file issues, or ask questions by visiting the project repository on Github:

And the Haskell library is hosted on Hackage here:

If you would like to contribute, you can try porting Dhall to bind to languages other than Haskell, so that Dhall configuration files can be used across multiple languages. I keep the compiler simple (less than ~4000 lines of code if you don't count error messages) so that people can port the language more easily.

Also, for people who are wondering, the language is named after a Dustman from the game Planescape: Torment who belongs to a faction obsessed with death (termination).

Thursday, October 27, 2016

Electoral vote distributions are Monoids

I'm a political junkie and I spend way too much time following the polling results on FiveThirtyEight's election forecast.

A couple of days ago I was surprised that FiveThirtyEight gave Trump a 13.7% chance of winning, which seemed too high to be consistent with the state-by-state breakdowns. After reading their methodology I learned that this was due to them not assuming that state outcomes were independent. In other words, if one swing state breaks for Trump this might increase the likelihood that other swing states also break for Trump.

However, I still wanted to work through the exercise and ask: what would Hillary's chance of winning be if each state's outcome were truly independent of the others? Let's write a program to find out!

Raw data

A couple of days ago (2016-10-24) I collected the state-by-state data from FiveThirtyEight's website (by hand) and recorded:

  • the name of the state
  • the chance that Hillary Clinton would win the state
  • the number of electoral college votes for that state

I recorded this data as a list of 3-tuples:

probabilities :: [(String, Double, Int)]
probabilities =
    [ ("Alabama"             , 0.003,  9)
    , ("Alaska"              , 0.300,  3)
    , ("Arizona"             , 0.529, 11)
    , ("Arkansas"            , 0.012,  6)
    , ("California"          , 0.999, 55)
    , ("Colorado"            , 0.889,  9)
    , ("Connecticut"         , 0.977,  7)
    , ("Delaware"            , 0.948,  3)
    , ("District of Columbia", 0.999,  3)
    , ("Florida"             , 0.731, 29)
    , ("Georgia"             , 0.259, 16)
    , ("Hawaii"              , 0.996,  4)
    , ("Idaho"               , 0.026,  4)
    , ("Illinois"            , 0.994, 20)
    , ("Indiana"             , 0.121, 11)
    , ("Iowa"                , 0.491,  6)
    , ("Kansas"              , 0.089,  6)
    , ("Kentucky"            , 0.042,  8)
    , ("Louisiana"           , 0.009,  8)
    , ("Maine"               , 0.852,  2)
    , ("Maine - 1"           , 0.944,  1)
    , ("Maine - 2"           , 0.517,  1)
    , ("Maryland"            , 0.999, 10)
    , ("Massachussetts"      , 0.998, 11)
    , ("Michigan"            , 0.929, 16)
    , ("Minnesota"           , 0.886, 10)
    , ("Mississippi"         , 0.034,  6)
    , ("Missouri"            , 0.168, 10)
    , ("Montana"             , 0.119,  3)
    , ("Nebraska"            , 0.040,  2)
    , ("Nebraska - 1"        , 0.154,  1)
    , ("Nebraska - 2"        , 0.513,  1)
    , ("Nebraska - 3"        , 0.014,  1)
    , ("Nevada"              , 0.703,  6)
    , ("New Hampshire"       , 0.868,  4)
    , ("New Jersey"          , 0.981, 14)
    , ("New Mexico"          , 0.941,  5)
    , ("New York"            , 0.998, 29)
    , ("North Carolina"      , 0.689, 15)
    , ("North Dakota"        , 0.070,  3)
    , ("Oklahoma"            , 0.002,  7)
    , ("Ohio"                , 0.563, 18)
    , ("Oregon"              , 0.957,  7)
    , ("Pennsylvania"        , 0.880, 20)
    , ("Rhode Island"        , 0.974,  4)
    , ("South Carolina"      , 0.086,  9)
    , ("South Dakota"        , 0.117,  3)
    , ("Tennessee"           , 0.025, 11)
    , ("Texas"               , 0.166, 38)
    , ("Utah"                , 0.067,  6)
    , ("Vermont"             , 0.984,  3)
    , ("Virginia"            , 0.943, 13)
    , ("Washington"          , 0.975, 12)
    , ("West Virginia"       , 0.006,  5)
    , ("Wisconsin"           , 0.896, 10)
    , ("Wyoming"             , 0.021,  3)
    ]

Note that some states (like Maine) apportion electoral votes in a weird way:

probabilities :: [(String, Double, Int)]
probabilities =
    ...
    , ("Maine"               , 0.852,  2)
    , ("Maine - 1"           , 0.944,  1)
    , ("Maine - 2"           , 0.517,  1)
    ...
    ]

Maine apportions two of its electoral votes based on a state-wide vote (i.e. "Maine" in the above list) and then two further electoral votes are apportioned based on two districts (i.e. "Maine - 1" and "Maine - 2"). FiveThirtyEight computes the probabilities for each subset of electoral votes, so we just record them separately.

Combinatorial explosion

So how might we compute Hillary's chances of winning, assuming the independence of each state's outcome?

One naïve approach would be to loop through all possible electoral outcomes and compute the probability and electoral vote count for each outcome. Unfortunately, that's not very efficient since the number of possible outcomes doubles with each additional entry in the list:

>>> 2 ^ length probabilities
72057594037927936

... or approximately 7.2 * 10^16 outcomes. Even if I only spent a single CPU cycle to compute each outcome (which is unrealistic) on a 2.5 GHz processor that would take almost a year to compute them all. The election is only a couple of weeks away so I don't have that kind of time or computing power!
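We can double-check that estimate in ghci. At one outcome per clock cycle on a 2.5 GHz processor, enumerating every outcome would take:

>>> (2 ^ length probabilities) / (2.5e9 * 86400 * 365)  -- years of compute
0.9139...

... or a bit over 91% of a year.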

Distributions

Fortunately, we can do much better than that! We can efficiently solve this using a simple "divide-and-conquer" approach where we subdivide the large problem into smaller problems until the solution is trivial.

The central data structure we'll use is a probability distribution which we'll represent as a Vector of Doubles:

import Data.Vector.Unboxed (Vector)

newtype Distribution = Distribution (Vector Double)
    deriving (Show)

This Vector will always have 539 elements, one element per possible final electoral vote count that Hillary might get. Each element is a Double representing the probability of that corresponding electoral vote count. We will maintain an invariant that all the probabilities (i.e. elements of the Vector) must sum to 1.

For example, if the Distribution were:

[1, 0, 0, 0, 0 ... ]

... that would represent a 100% chance of Hillary getting 0 electoral votes and a 0% chance of any other outcome. Similarly, if the Distribution were:

[0, 0.5, 0, 0.5, 0, 0, 0 ... ]

... then that would represent a 50% chance of Hillary getting 1 electoral vote and a 50% chance of Hillary getting 3 electoral votes.
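If we ever want to check this invariant programmatically, a small helper suffices (a sketch; it uses the qualified Data.Vector.Unboxed import introduced below):

isValid :: Distribution -> Bool
isValid (Distribution xs) =
    Data.Vector.Unboxed.length xs == 539
 && abs (Data.Vector.Unboxed.sum xs - 1) < 1e-9  -- allow for floating point error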

In order to simplify the problem we need to subdivide the problem into smaller problems. For example, if I want to compute the final electoral vote probability distribution for all 50 states perhaps we can break that down into two smaller problems:

  • Split the 50 states into two sub-groups of 25 states each
  • Compute an electoral vote probability distribution for each sub-group of 25 states
  • Combine probability distributions for each sub-group into the final distribution

In order to do that, I need to define a function that combines two smaller distributions into a larger distribution:

import qualified Data.Vector.Unboxed

combine :: Distribution -> Distribution -> Distribution
combine (Distribution xs) (Distribution ys) = Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j =
            Data.Vector.Unboxed.unsafeIndex xs j
          * Data.Vector.Unboxed.unsafeIndex ys (i - j)

The combine function takes two input distributions named xs and ys and generates a new distribution named zs. To compute the probability of getting i electoral votes in our composite distribution, we just add up all the different ways we can get i electoral votes from the two sub-distributions.

For example, to compute the probability of getting 4 electoral votes for the entire group, we add up the probabilities for the following 5 outcomes:

  • We get 0 votes from our 1st group and 4 votes from our 2nd group
  • We get 1 votes from our 1st group and 3 votes from our 2nd group
  • We get 2 votes from our 1st group and 2 votes from our 2nd group
  • We get 3 votes from our 1st group and 1 votes from our 2nd group
  • We get 4 votes from our 1st group and 0 votes from our 2nd group

The probabilityOfEachOutcome function computes the probability of each one of these outcomes and then the totalProbability function sums them all up to compute the total probability of getting i electoral votes.
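In other words, combine computes a discrete convolution of the two distributions. Treating each distribution as a function from electoral vote count to probability, the definition amounts to:

zs i = sum [ xs j * ys (i - j) | j <- [0 .. i] ]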

We can also define an empty distribution representing the probability distribution of electoral votes given zero states:

empty :: Distribution
empty = Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0 = 1
    makeElement _ = 0

This distribution says that given zero states you have a 100% chance of getting zero electoral college votes and 0% chance of any other outcome. This empty distribution will come in handy later on.

Divide and conquer

There's no limit to how many times we can subdivide the problem. In the extreme case we can sub-divide the problem down to individual states (or districts for weird states like Maine and Nebraska):

  • subdivide our problem into 56 sub-groups (one group per state or district)
  • compute the probability distribution for each sub-group, which is trivial
  • combine all the probability distributions to retrieve the final result

In fact, this extreme solution is surprisingly efficient!

All we're missing is a function that converts each entry in our original probabilities list into a Distribution:

toDistribution :: (String, Double, Int) -> Distribution
toDistribution (_, probability, votes) =
    Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0              = 1 - probability
    makeElement i | i == votes = probability
    makeElement _              = 0

This says that the probability distribution for a single state has just two possible outcomes. If Hillary Clinton has probability x of winning a state worth n votes, then:

  • Hillary Clinton has probability x of winning n votes for this state
  • Hillary Clinton has probability 1 - x of winning 0 votes for this state
  • Hillary Clinton has 0% probability of any other outcome for this state

Let's test this out on a couple of states:

>>> toDistribution ("Alaska"      , 0.300, 3)
Distribution [0.7,0.0,0.0,0.3,0.0,0.0,...
>>> toDistribution ("North Dakota", 0.070, 3)
Distribution [0.9299999999999999,0.0,0.0,7.0e-2,0.0...

This says that:

  • Alaska has a 30% chance of giving Clinton 3 votes and 70% chance of 0 votes
  • North Dakota has a 7% chance of giving Clinton 3 votes and a 93% chance of 0 votes

We can also verify that combine works correctly by combining the electoral vote distributions of both states. We expect the new distribution to be:

  • 2.1% chance of 6 votes (the probability of winning both states)
  • 65.1% chance of 0 votes (the probability of losing both states)
  • 32.8% chance of 3 votes (the probability of winning just one of the two states)

... and this is in fact what we get:

>>> let alaska      = toDistribution ("Alaska"      , 0.300, 3)
>>> let northDakota = toDistribution ("North Dakota", 0.070, 3)
>>> combine alaska northDakota
Distribution [0.6509999999999999,0.0,0.0,0.32799999999999996,0.0,0.0,2.1e-2,0.0,...

Final result

To compute the total probability of winning, we just transform each element of the list to the corresponding distribution:

distributions :: [Distribution]
distributions = map toDistribution probabilities

... then we reduce the list to a single value by repeatedly applying the combine function, falling back on the empty distribution if the entire list is empty:

import qualified Data.List

distribution :: Distribution
distribution = Data.List.foldl' combine empty distributions

... and if we want to get Clinton's chances of winning, we just add up the probabilities for all outcomes greater than or equal to 270 electoral college votes:

chanceOfClintonVictory :: Double
chanceOfClintonVictory =
    Data.Vector.Unboxed.sum (Data.Vector.Unboxed.drop 270 xs)
  where
    Distribution xs = distribution

main :: IO ()
main = print chanceOfClintonVictory

If we compile and run this program we get the final result:

$ stack --resolver=lts-7.4 build vector
$ stack --resolver=lts-7.4 ghc -- -O2 result.hs
$ ./result
0.9929417642334847

In other words, Clinton has a 99.3% chance of winning if each state's outcome is independent of every other outcome. This is significantly higher than the probability estimated by FiveThirtyEight at that time: 86.3%.

These results differ for the same reason I noted above: FiveThirtyEight assumes that state outcomes are not necessarily independent and that a Trump win in one state could correlate with Trump wins in other states. This possibility of correlated victories favors the person who is behind in the race.

As a sanity check, we can also verify that the final probability distribution has probabilities that add up to approximately 1:

>>> let Distribution xs = distribution
>>> Data.Vector.Unboxed.sum xs
0.9999999999999994

Exercise: Expand on this program to plot the probability distribution

Efficiency

Our program is also efficient, running in 30 milliseconds:

$ bench ./result
benchmarking ./result
time                 30.33 ms   (29.42 ms .. 31.16 ms)
                     0.998 R²   (0.997 R² .. 1.000 R²)
mean                 29.43 ms   (29.13 ms .. 29.81 ms)
std dev              710.6 μs   (506.7 μs .. 992.6 μs)

This is a significant improvement over a year's worth of running time.

We could even speed this up further using parallelism. Thanks to our divide and conquer approach we can subdivide this problem among up to 56 CPUs (one per state or district) to accelerate the solution. However, after a certain point the overhead of splitting up the work might outweigh the benefits of parallelism.
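Here is a minimal sketch of what that parallel reduction might look like, using parMap from the parallel package's Control.Parallel.Strategies module (an illustration, not part of the original program):

import Control.Parallel.Strategies (parMap, rseq)

-- Combine adjacent pairs in parallel, then recurse on the half-sized list.
-- Associativity makes the regrouping safe, and `empty` is the identity for
-- `combine`, so padding an odd element is harmless.  Because `Distribution`
-- is a newtype around an unboxed `Vector`, forcing each result to WHNF with
-- `rseq` materializes the whole array inside the spark
parCombineAll :: [Distribution] -> Distribution
parCombineAll []  = empty
parCombineAll [d] = d
parCombineAll ds  = parCombineAll (parMap rseq (uncurry combine) (pairs ds))
  where
    pairs (x:y:rest) = (x, y) : pairs rest
    pairs [x]        = [(x, empty)]
    pairs []         = []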

Monoids

People more familiar with Haskell will recognize that this solution fits cleanly into a standard Haskell interface known as the Monoid type class. In fact, many divide-and-conquer solutions tend to be Monoids of some sort.

The Monoid typeclass is defined as:

class Monoid m where
    mappend :: m -> m -> m

    mempty :: m

-- An infix operator that is a synonym for `mappend`
(<>) :: Monoid m => m -> m -> m
x <> y = mappend x y

... and the Monoid class has three rules that every implementation must obey, which are known as the "Monoid laws".

The first rule is that mappend (or the equivalent (<>) operator) must be associative:

x <> (y <> z) = (x <> y) <> z

The second and third rules are that mempty must be the "identity" of mappend, meaning that mempty does nothing when combined with other values:

mempty <> x = x

x <> mempty = x

A simple example of a Monoid is integers under addition, which we can implement like this:

instance Monoid Integer where
    mappend = (+)
    mempty = 0

... and this implementation satisfies the Monoid laws thanks to the laws of addition:

(x + y) + z = x + (y + z)

0 + x = x

x + 0 = x

However, Distributions are Monoids, too! Our combine and empty definitions both have the correct types to implement the mappend and mempty methods of the Monoid typeclass, respectively:

instance Monoid Distribution where
    mappend = combine

    mempty = empty

Both mappend and mempty for Distributions satisfy the Monoid laws:

  • mappend is associative (Proof omitted)
  • mempty is the identity of mappend

We can prove the identity law using the following rules for how Vectors behave:

-- These rules assume that all vectors involved have 539 elements

-- If you generate a vector by just indexing into another vector, you just get back
-- the other vector
Data.Vector.Unboxed.generate 539 (Data.Vector.Unboxed.unsafeIndex xs) = xs

-- If you index into a vector generated by a function, that's equivalent to calling
-- that function
Data.Vector.Unboxed.unsafeIndex (Data.Vector.Unboxed.generate 539 f) i = f i

Equipped with those rules, we can then prove that mappend (Distribution xs) mempty = Distribution xs:

mappend (Distribution xs) mempty

-- mappend = combine
= combine (Distribution xs) mempty

-- Definition of `mempty`
= combine (Distribution xs) (Distribution ys)
  where
    ys = Data.Vector.Unboxed.generate 539 makeElement
      where
        makeElement 0 = 1
        makeElement _ = 0

-- Definition of `combine`
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j =
            Data.Vector.Unboxed.unsafeIndex xs j
          * Data.Vector.Unboxed.unsafeIndex ys (i - j)

    ys = Data.Vector.Unboxed.generate 539 makeElement
      where
        makeElement 0 = 1
        makeElement _ = 0

-- Data.Vector.Unboxed.unsafeIndex (Data.Vector.Unboxed.generate 539 f) i = f i
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j =
            Data.Vector.Unboxed.unsafeIndex xs j
          * makeElement (i - j)

        makeElement 0 = 1
        makeElement _ = 0

-- Case analysis on `j`
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j
            | j == i =
                    Data.Vector.Unboxed.unsafeIndex xs j
                  * 1  -- makeElement (i - j) = makeElement 0 = 1
            | otherwise =
                    Data.Vector.Unboxed.unsafeIndex xs j
                  * 0  -- makeElement (i - j) = 0

-- x * 1 = x
-- y * 0 = 0
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j
            | j == i    = Data.Vector.Unboxed.unsafeIndex xs j
            | otherwise = 0

-- Informally: "Sum of a vector with one non-zero element is just that element"
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i = Data.Vector.Unboxed.unsafeIndex xs i

-- Data.Vector.Unboxed.generate 539 (Data.Vector.Unboxed.unsafeIndex xs) = xs
= Distribution xs

Exercise: Prove the associativity law for combine

Conclusion

I hope people find this an interesting example of how you can apply mathematical design principles (in this case: Monoids) in service of simplifying and speeding up programming problems.

If you would like to test this program out yourself the complete program is provided below:

import Data.Vector.Unboxed (Vector)

import qualified Data.List
import qualified Data.Vector.Unboxed

probabilities :: [(String, Double, Int)]
probabilities =
    [ ("Alabama"             , 0.003,  9)
    , ("Alaska"              , 0.300,  3)
    , ("Arizona"             , 0.529, 11)
    , ("Arkansas"            , 0.012,  6)
    , ("California"          , 0.999, 55)
    , ("Colorado"            , 0.889,  9)
    , ("Connecticut"         , 0.977,  7)
    , ("Delaware"            , 0.948,  3)
    , ("District of Columbia", 0.999,  3)
    , ("Florida"             , 0.731, 29)
    , ("Georgia"             , 0.259, 16)
    , ("Hawaii"              , 0.996,  4)
    , ("Idaho"               , 0.026,  4)
    , ("Illinois"            , 0.994, 20)
    , ("Indiana"             , 0.121, 11)
    , ("Iowa"                , 0.491,  6)
    , ("Kansas"              , 0.089,  6)
    , ("Kentucky"            , 0.042,  8)
    , ("Louisiana"           , 0.009,  8)
    , ("Maine"               , 0.852,  2)
    , ("Maine - 1"           , 0.944,  1)
    , ("Maine - 2"           , 0.517,  1)
    , ("Maryland"            , 0.999, 10)
    , ("Massachussetts"      , 0.998, 11)
    , ("Michigan"            , 0.929, 16)
    , ("Minnesota"           , 0.886, 10)
    , ("Mississippi"         , 0.034,  6)
    , ("Missouri"            , 0.168, 10)
    , ("Montana"             , 0.119,  3)
    , ("Nebraska"            , 0.040,  2)
    , ("Nebraska - 1"        , 0.154,  1)
    , ("Nebraska - 2"        , 0.513,  1)
    , ("Nebraska - 3"        , 0.014,  1)
    , ("Nevada"              , 0.703,  6)
    , ("New Hampshire"       , 0.868,  4)
    , ("New Jersey"          , 0.981, 14)
    , ("New Mexico"          , 0.941,  5)
    , ("New York"            , 0.998, 29)
    , ("North Carolina"      , 0.689, 15)
    , ("North Dakota"        , 0.070,  3)
    , ("Oklahoma"            , 0.002,  7)
    , ("Ohio"                , 0.563, 18)
    , ("Oregon"              , 0.957,  7)
    , ("Pennsylvania"        , 0.880, 20)
    , ("Rhode Island"        , 0.974,  4)
    , ("South Carolina"      , 0.086,  9)
    , ("South Dakota"        , 0.117,  3)
    , ("Tennessee"           , 0.025, 11)
    , ("Texas"               , 0.166, 38)
    , ("Utah"                , 0.067,  6)
    , ("Vermont"             , 0.984,  3)
    , ("Virginia"            , 0.943, 13)
    , ("Washington"          , 0.975, 12)
    , ("West Virginia"       , 0.006,  5)
    , ("Wisconsin"           , 0.896, 10)
    , ("Wyoming"             , 0.021,  3)
    ]

newtype Distribution = Distribution { getDistribution :: Vector Double }
    deriving (Show)

combine :: Distribution -> Distribution -> Distribution
combine (Distribution xs) (Distribution ys) = Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j =
            Data.Vector.Unboxed.unsafeIndex xs j
          * Data.Vector.Unboxed.unsafeIndex ys (i - j)

empty :: Distribution
empty = Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0 = 1
    makeElement _ = 0

instance Monoid Distribution where
    mappend = combine

    mempty = empty

toDistribution :: (String, Double, Int) -> Distribution
toDistribution (_, probability, votes) =
    Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0              = 1 - probability
    makeElement i | i == votes = probability
    makeElement _              = 0

distributions :: [Distribution]
distributions = map toDistribution probabilities

distribution :: Distribution
distribution = mconcat distributions

chanceOfClintonVictory :: Double
chanceOfClintonVictory =
    Data.Vector.Unboxed.sum (Data.Vector.Unboxed.drop 270 xs)
  where
    Distribution xs = distribution

main :: IO ()
main = print chanceOfClintonVictory

Monday, July 4, 2016

Auto-generate service API endpoints from records

Haskell has pretty cool support for code generation from data type definitions using GHC generics. So I thought: "why not generate a service from a data type?".

The basic idea is pretty simple. Given a data type definition like this:

data Command
    = Create { filepath :: FilePath, contents :: String }
    | Delete { filepath :: FilePath }

... we'll auto-generate two API endpoints:

  • /create?filepath=:string&contents=:string
  • /delete?filepath=:string

Each endpoint accepts query parameters matching the fields for their respective constructors:

$ curl 'localhost:8080/create?filepath=test.txt&contents=ABC'
"File created"
$ cat test.txt
ABC
$ curl 'localhost:8080/delete?filepath=test.txt'
"File deleted"
$ cat test.txt
cat: test.txt: No such file or directory

The complete code to build the server looks like this:

{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric  #-}

import Server.Generic
import System.Directory (removeFile)

data Command
    = Create { filepath :: FilePath, contents :: String }
    | Delete { filepath :: FilePath }
    deriving (Generic, ParseRecord)

handler :: Command -> IO String
handler (Create file text) = do
    writeFile file text
    return "File created"
handler (Delete file) = do
    removeFile file
    return "File deleted"

main :: IO ()
main = serveJSON 8080 handler

You can test it yourself by running:

$ stack build server-generic
$ stack runghc AboveExample.hs

... and then in a separate terminal you can hit each endpoint with curl as illustrated above.

GHC Generics

The Haskell magic is in this one line of code:

    deriving (Generic, ParseRecord)

This auto-generates code that tells the server how to marshal the route and query parameters into our Command data type. All we have to do is supply a handler that pattern matches on the incoming Command to decide what to do:

handler :: Command -> IO String
handler (Create file text) = do
    writeFile file text
    return "File created"
handler (Delete file) = do
    removeFile file
    return "File deleted"

You can read that as saying:

  • "If a client hits the /create endpoint, create the specified file"
  • "If a client hits the /delete endpoint, delete the specified file"

As an exercise, you can try modifying the handler to respond with the name of the file that was created or deleted.
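One way to solve that exercise might look like this (a sketch):

handler :: Command -> IO String
handler (Create file text) = do
    writeFile file text
    return ("Created " ++ file)  -- echo back the file name
handler (Delete file) = do
    removeFile file
    return ("Deleted " ++ file)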

However, you're not limited to query parameters. You can also parse data from path tokens by just omitting field labels from the data type:

{-# LANGUAGE DeriveGeneric #-}

import Server.Generic

data Command
    = Add      Double Double
    | Multiply Double Double
    deriving (Generic)

instance ParseRecord Command

handler :: Command -> IO Double
handler (Add      x y) = return (x + y)
handler (Multiply x y) = return (x * y)

main :: IO ()
main = serveJSON 8080 handler

If you run the above code, you get a server that has two endpoints:

  • /add/:double/:double
  • /multiply/:double/:double

You can run the server and test that they work:

$ curl 'localhost:8080/add/2/3'
5
$ curl 'localhost:8080/multiply/2/3'
6

This library also intelligently handles optional and repeated fields in data type definitions. For example, suppose we serve this data type:

{-# LANGUAGE DeriveGeneric  #-}
{-# LANGUAGE DeriveAnyClass #-}

import Server.Generic

data Example = Example
    { list     :: [Int]
    , optional :: Maybe   Int
    , first    :: First   Int
    , last     :: Last    Int
    } deriving (Generic, ParseRecord, ToJSON)

handler :: Example -> IO Example
handler = return  -- Serve decoded value back to client as JSON

main :: IO ()
main = serveJSON 8080 handler

... then the server will echo back the decoded type as JSON, correctly handling absent or repeated fields:

$ curl 'localhost:8080/example'
{"list":[],"first":null,"last":null,"optional":null}
$ curl 'localhost:8080/example?optional=1&list=1&list=2&first=1&first=2&last=1&last=2'
{"list":[1,2],"first":1,"last":2,"optional":1}

Also, these repeated and optional annotations work for path components, too, in case you were wondering:

{-# LANGUAGE DeriveGeneric  #-}
{-# LANGUAGE DeriveAnyClass #-}

import Server.Generic

data Example = Example [Int] (Maybe Text)
    deriving (Generic, ParseRecord, ToJSON)

handler :: Example -> IO Example
handler = return

main :: IO ()
main = serveJSON 8080 handler

The above server does "the right thing" and doesn't need to be told where the Ints end and the Text begins:

$ curl 'localhost:8080/example'
[[],null]
$ curl 'localhost:8080/example/1/2/foo'
[[1,2],"foo"]
$ curl 'localhost:8080/example/1/2/3'
[[1,2,3],null]
$ curl 'localhost:8080/example/foo'
[[],"foo"]

The server uses backtracking when parsing the route so the server knows when the Ints end and the Text begins.

Types

The whole thing is strongly typed, which means several things in the context of service programming.

For example, if you define a data type that expects an Int field, then by golly your handler will get an Int field or the server will automatically reject the request for you. You don't have to worry about checking that the field is present nor do you need to validate that the parameter decodes to an Int correctly. If you want the parameter to be optional then you need to make that explicit by marking the field as type Maybe Int.
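For example, a hypothetical endpoint might look like this (a sketch; the Lookup type and its fields are invented for illustration):

{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric  #-}

import Server.Generic

-- Hypothetical endpoint: /lookup?userId=:int&limit=:int
data Lookup = Lookup
    { userId :: Int        -- required: a request with a missing or malformed
                           -- userId is rejected before the handler runs
    , limit  :: Maybe Int  -- optional: an absent parameter decodes to Nothing
    } deriving (Generic, ParseRecord)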

You also don't have to handle fields that belong to other endpoints. Each endpoint only gets exactly the fields it requested; no more, no less. If a given endpoint gets the wrong set of path tokens or query parameters then the server rejects the request for you.

This is also strongly typed in the sense that more logic is pushed into the type and less logic goes in the handler. If you want just the first or last occurrence of a query parameter, you just annotate the type with First or Last, respectively. The more logic you push into declarative type-level programming the more you distill your handler to focus on business logic.
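For reference, those annotations mirror the behavior of the First and Last wrappers from Data.Monoid, which keep only one occurrence when values are combined:

>>> import Data.Monoid
>>> getFirst (First (Just 1) <> First (Just 2))
Just 1
>>> getLast (Last (Just 1) <> Last (Just 2))
Just 2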

Caveats

I wrote this library to provide a quick and easy way to spin up Haskell web services, but the library could still use some improvement. I'm not really a web developer so I only kind of know what I'm doing and could use help from people more knowledgeable than me.

The most notable deficiency is that the library does not take care to serve proper HTTP status codes for different types of errors. Every failed request returns a 404 status code.

Also, if the route is malformed, the client gets back a completely unhelpful "404 Not Found" message that doesn't indicate how to fix the error.

Another blatant deficiency is that the server completely ignores the request method. I wasn't sure how to design this to work within the framework of data type generic programming.

If you have ideas about how to improve things I would greatly welcome any contributions.

Conclusions

People familiar with Haskell will recognize that this library resembles the servant library in some respects. The high-level difference is that this is a subset of servant that is much simpler but also significantly less featureful. For example, servant can also generate client-side bindings and Swagger resource declarations and servant also permits a much greater degree of customization.

This library focuses primarily on simple quick-and-dirty services; for anything more polished and production-ready you will probably want to try other Haskell service libraries. I just wanted to make it as easy as possible for people to get started with back-end development in Haskell and also show off how cool and powerful GHC generics can be.

If you would like to learn more about this library you can read the tutorial or if you would like to use the library you can obtain the code from Hackage or Github.

Sunday, July 3, 2016

list-transformer - A beginner-friendly ListT

Currently, Hackage has four implementations of "ListT-done-right" that I'm aware of:

  • LogicT
  • pipes (which provides a ListT type)
  • list-t
  • List

However, I felt that all of these libraries were more complex than they needed to be so I tried to distill them down to the simplest library possible. I want to encourage more people to use ListT so I'm releasing the beginner-friendly list-transformer library.

There are a few design choices I made to improve the new user experience for this library:

First, the ListT data type is not abstract and users are encouraged to use constructors to both assemble and pattern match on ListT values. This helps them build an intuition for how ListT works under the hood since the type is small and not difficult to use directly:

newtype ListT m a = ListT { next :: m (Step m a) }

data Step m a = Cons a (ListT m a) | Nil
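For example, you can assemble a small ListT by hand using nothing but those constructors (a sketch):

-- the pure list [1, 2], spelled out constructor by constructor
oneTwo :: Monad m => ListT m Int
oneTwo = ListT (return (Cons 1 (ListT (return (Cons 2 (ListT (return Nil)))))))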

Second, the API is tiny in order to steer users towards leaning on Haskell's "collections API" as much as possible. Specifically, I try to direct people towards these type classes:

  • Functor/Applicative/Monad
  • Alternative/MonadPlus
  • MonadTrans/MonadIO

Right now there are only three functions in the API that are not class methods:

  • runListT
  • fold
  • foldM

Everything else is a method for one of the standard type classes and I avoid introducing new type classes.
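For example, a tiny program in this style needs only the standard class methods plus runListT (a sketch, assuming the library's List.Transformer module):

import Control.Applicative ((<|>))
import Control.Monad.IO.Class (liftIO)
import List.Transformer (ListT, runListT)

-- print each element as it is produced, one at a time
main :: IO ()
main = runListT (do
    n <- pure 1 <|> pure 2 <|> pure 3
    liftIO (print (n :: Int)))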

Third, the API does not provide convenient helper functions for fully materializing the list. In other words, there is no utility function of this type:

toList :: ListT m a -> m [a]

A determined user can still force the list either indirectly via the fold function or by manually recursing over the ListT type. The goal is not to forbid this behavior, but rather to gently encourage people to preserve streaming. The API promotes the intuition that you're supposed to transform and consume results one-by-one instead of in large batches.
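For instance, the manual recursion might look like this (a sketch):

-- materialize the whole list by recursing over the constructors,
-- at the cost of giving up streaming
toList :: Monad m => ListT m a -> m [a]
toList (ListT m) = do
    step <- m
    case step of
        Nil       -> return []
        Cons x xs -> fmap (x :) (toList xs)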

Fourth, the library comes with a long inline tutorial which is longer than the actual code. I think the tutorial could still use some improvement so if you would like to contribute to improve the tutorial please do!

Conclusion

You can find the list-transformer library either on Hackage or on Github.

Hopefully this encourages more people to give ListT a try and provides a stepping stone for understanding more complex streaming abstractions.

Saturday, May 21, 2016

A command-line benchmark tool

I wrote a small program named bench that lets you benchmark other programs from the command line. Think of this as a much nicer alternative to the time command.

The best way to illustrate how this works is to show a few example uses of the program:

$ bench 'ls /usr/bin | wc -l'  # You can benchmark shell pipelines
benchmarking ls /usr/bin | wc -l
time                 6.756 ms   (6.409 ms .. 7.059 ms)
                     0.988 R²   (0.980 R² .. 0.995 R²)
mean                 7.590 ms   (7.173 ms .. 8.526 ms)
std dev              1.685 ms   (859.0 μs .. 2.582 ms)
variance introduced by outliers: 88% (severely inflated)

$ bench 'sleep 1'  # You have to quote multiple tokens
benchmarking sleep 1
time                 1.003 s    (1.003 s .. 1.003 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.003 s    (1.003 s .. 1.003 s)
std dev              65.86 μs   (0.0 s .. 68.91 μs)
variance introduced by outliers: 19% (moderately inflated)

$ bench true  # The benchmark overhead is below 1 ms
benchmarking true
time                 383.9 μs   (368.6 μs .. 403.4 μs)
                     0.982 R²   (0.971 R² .. 0.991 R²)
mean                 401.1 μs   (386.9 μs .. 418.9 μs)
std dev              54.39 μs   (41.70 μs .. 67.62 μs)
variance introduced by outliers: 87% (severely inflated)

This utility just provides a command-line API for Haskell's criterion benchmarking library. The bench tool wraps any shell command you provide in a subprocess and benchmarks that subprocess through repeated runs using criterion. The number of runs varies from 10 to 10,000 depending on how expensive the command is.
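The essence of the implementation is small; roughly speaking, it boils down to something like this (a simplified sketch, not bench's actual source):

import Criterion.Main (bench, defaultMain, nfIO)
import System.Process (callCommand)

-- benchmark one shell command by repeatedly re-running it in a subprocess
main :: IO ()
main = defaultMain
    [ bench "ls /usr/bin | wc -l" (nfIO (callCommand "ls /usr/bin | wc -l")) ]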

This tool also threads through the same command-line options that criterion accepts for benchmark suites. You can see the full set of options using the --help flag:

$ bench --help
Command-line tool to benchmark other programs

Usage: bench COMMAND ([-I|--ci CI] [-G|--no-gc] [-L|--time-limit SECS]
             [--resamples COUNT] [--regress RESP:PRED..] [--raw FILE]
             [-o|--output FILE] [--csv FILE] [--junit FILE]
             [-v|--verbosity LEVEL] [-t|--template FILE] [-m|--match MATCH]
             [NAME...] | [-n|--iters ITERS] [-m|--match MATCH] [NAME...] |
             [-l|--list] | [--version])

Available options:
  -h,--help                Show this help text
  COMMAND                  The command line to benchmark
  -I,--ci CI               Confidence interval
  -G,--no-gc               Do not collect garbage between iterations
  -L,--time-limit SECS     Time limit to run a benchmark
  --resamples COUNT        Number of bootstrap resamples to perform
  --regress RESP:PRED..    Regressions to perform
  --raw FILE               File to write raw data to
  -o,--output FILE         File to write report to
  --csv FILE               File to write CSV summary to
  --junit FILE             File to write JUnit summary to
  -v,--verbosity LEVEL     Verbosity level
  -t,--template FILE       Template to use for report
  -m,--match MATCH         How to match benchmark names ("prefix" or "glob")
  -n,--iters ITERS         Run benchmarks, don't analyse
  -m,--match MATCH         How to match benchmark names ("prefix" or "glob")
  -l,--list                List benchmarks
  --version                Show version info

The --output option is really useful: it outputs an HTML page with a chart showing the distribution of run times. For example, the following command:

$ bench 'ls /usr/bin | wc -l' --output example.html
benchmarking ls /usr/bin | wc -l
time                 6.716 ms   (6.645 ms .. 6.807 ms)
                     0.999 R²   (0.999 R² .. 0.999 R²)
mean                 7.005 ms   (6.897 ms .. 7.251 ms)
std dev              462.0 μs   (199.3 μs .. 809.2 μs)
variance introduced by outliers: 37% (moderately inflated)

... also produces a chart showing the distribution of run times, which you can view by opening example.html.

You can also increase the time limit using the --time-limit option, which will in turn increase the number of runs for better statistics. For example, criterion warned me that I had too many outliers for my benchmarks, so I can increase the time limit for the above benchmark to 30 seconds:

$ bench 'ls /usr/bin | wc -l' --time-limit 30 --output example.html
benchmarking ls /usr/bin | wc -l
time                 6.937 ms   (6.898 ms .. 7.002 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 6.905 ms   (6.878 ms .. 6.935 ms)
std dev              114.9 μs   (86.59 μs .. 156.1 μs)

... which dials up the number of runs to the ~4000 range, reduces the number of outliers, and brings down the standard deviation by a factor of four.

Keep in mind that there are a few limitations to this tool:

  • this tool cannot accurately benchmark code that requires a warm up phase (such as JVM programs that depend on JIT compilation for performance)
  • this tool cannot measure performance below about half a millisecond due to the overhead of launching a subprocess and bash interpreter

Despite those limitations, I find that this tool comes in handy in a few scenarios:

  • Preliminary benchmarking in the prototyping phase of program development
  • Benchmarking program pipelines written in multiple languages

You can install this tool by following the instructions on the Github repo:

Or if you have the Haskell stack tool installed you can just run:

$ stack update
$ stack install bench

Sunday, April 24, 2016

Data is Code

The title of this post is a play on the Lisp aphorism: "Code is Data". In the Lisp world everything is data; code is just another data structure that you can manipulate and transform.

However, you can also go to the exact opposite extreme: "Data is Code"! You can make everything into code and implement data structures in terms of code.

You might wonder what that even means: how can you write any code if you don't have any primitive data structures to operate on? Fascinatingly, Alonzo Church discovered a long time ago that if you have the ability to define functions you have a complete programming language. "Church encoding" is the technique named after his insight that you could transform data structures into functions.

This post is partly a Church encoding tutorial and partly an announcement for my newly released annah compiler, which implements the Church encoding of data types. Many of the examples in this post are valid annah code that you can play with. Also, to be totally pedantic, annah implements Boehm-Berarducci encoding, which you can think of as the typed version of Church encoding.

This post assumes that you have basic familiarity with lambda expressions. If you do not, you can read the first chapter (freely available) of Haskell Programming from First Principles, which does an excellent job of teaching lambda calculus.

If you would like to follow along with these examples, you can download and install annah by following these steps:

  • Install the stack tool
  • Create the following stack.yaml file

    $ cat > stack.yaml
    resolver: lts-5.13
    packages: []
    extra-deps:
    - annah-1.0.0
    - morte-1.6.0
    <Ctrl-D>
  • Run stack setup
  • Run stack install annah
  • Add the installed executable to your $PATH

Lambda calculus

In the untyped lambda calculus, you only have lambda expressions at your disposal and nothing else. For example, here is how you encode the identity function:

λx → x

That's a function that takes one argument and returns the same argument as its result.

We call this "abstraction" when we introduce a variable using the Greek lambda symbol and we call the variable that we introduce a "bound variable". We can then use that "bound variable" anywhere within the "body" of the lambda expression.

+-- Abstraction
|
|+-- Bound variable
||
vv
λx → x
     ^
     |
     +-- Body of lambda expression

Any expression that begins with a lambda is an anonymous function which we can apply to another expression. For example, we can apply the identity function to itself like this:

(λx → x) (λy → y)

-- β-reduction
= λy → y

We call this "application" when we supply an argument to an anonymous function.

We can define a function of multiple arguments by nested "abstractions":

λx → λy → x

The above code is an anonymous function that returns an anonymous function. For example, if you apply the outermost anonymous function to a value, you get a new function:

(λx → λy → x) 1

-- β-reduce
λy → 1

... and if you apply the lambda expression to two values, you return the first value:

(λx → λy → x) 1 2

-- β-reduce
(λy → 1) 2

-- β-reduce
1

So our lambda expression behaves like a function of two arguments, even though it's really a function of one argument that returns a new function of one argument. We call this "currying" when we simulate functions of multiple arguments using functions of one argument. We will use this trick because we will be programming in a lambda calculus that only supports functions of one argument.
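
For what it's worth, this is also exactly how Haskell functions of "multiple arguments" work. Here is a small illustration of currying in Haskell (the names pick and partial are mine, not from any library):

-- A function that appears to take two arguments ...
pick :: a -> b -> a
pick x y = x

-- ... is really a function of one argument that returns another
-- function, so we can supply the arguments one at a time:
partial :: b -> Integer
partial = pick 1

Applying partial to any second argument returns 1, just like applying pick to both arguments at once.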

Typed lambda calculus

In the typed lambda calculus you have to specify the types of all function arguments, so you have to write something like this:

λ(x : a) → x

... where a is the type of the bound variable named x.

However, the above function is still not valid because we haven't specified what the type a is. In theory, we could specify a type like Int:

λ(x : Int) → x

... but the premise of this post was that we could program without relying on any built-in data types so Int is out of the question for this experiment.

Fortunately, some typed variations of lambda calculus (most notably: "System F") let you introduce the type named a as yet another function argument:

λ(a : *) → λ(x : a) → x

This is called "type abstraction". Here the * is the "type of types" and is a universal constant that is always in scope, so we can always introduce new types as function arguments this way.

The above function is the "polymorphic identity function", meaning that this is the typed version of the identity function that still preserves the ability to operate on any type.

If we had built-in types like Int we could apply our polymorphic function to the type just like any other argument, giving back an identity function for a specific type:

(λ(a : *) → λ(x : a) → x) Int

-- β-reduction
λ(x : Int) → x

This is called "type application" or (more commonly) "specialization". A "polymorphic" function is a function that takes a type as a function argument and we "specialize" a polymorphic function by applying the function to a specific type argument.

However, we are forgoing built-in types like Int, so what other types do we have at our disposal?

Well, every lambda expression has a corresponding type. For example, the type of our polymorphic identity function is:

∀(a : *) → ∀(x : a) → a

You can read the type as saying:

  • this is a function of two arguments, one argument per "forall" (∀) symbol
  • the first argument is named a and a is a type
  • the second argument is named x and the type of x is a
  • the result of the function must be a value of type a

This type uniquely determines the function's implementation. To be totally pedantic, there is exactly one implementation up to extensional equality of functions. Since this function has to work for any possible type a there is only one way to implement the function. We must return x as the result, since x is the only value available of type a.
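
The same reasoning carries over to Haskell, where parametricity guarantees that the following type has exactly one total implementation (a brief aside; the RankNTypes extension just makes the ∀ explicit):

{-# LANGUAGE RankNTypes #-}

-- Any other implementation would have to conjure a value of an
-- arbitrary type `a` out of thin air, which is impossible
identity :: forall a. a -> a
identity x = x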

Passing around types as values and function arguments might seem a bit strange to most programmers since most languages either:

  • do not use types at all

    Example: Javascript

    // The polymorphic identity function in Javascript
    function id(x) {
        return x
    }
    
    // Example use of the function
    id(true)
  • do use types, but they hide type abstraction and type application from the programmer through the use of "type inference"

    Example: Haskell

    -- The polymorphic identity function in Haskell
    id x = x
    
    -- Example use of the function
    id True
  • they use a different syntax for type abstraction/application versus ordinary abstraction and application

    Example: Scala

    // The polymorphic identity function in Scala
    def id[A](x : A): A = x
    
    // Example use of the function
    // Note: Scala lets you omit the `[Boolean]` here thanks
    //       to type inference but I'm making the type
    //       application explicit just to illustrate that
    //       the syntax is different from normal function
    //       application
    id[Boolean](true)
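
Incidentally, newer versions of GHC (8.0 and later) blur the line between the last two categories: the TypeApplications extension lets you write the type application explicitly, much like the Scala example above. This is just an aside and nothing in this post depends on it:

{-# LANGUAGE TypeApplications #-}

-- `id @Bool` explicitly specializes the polymorphic identity
-- function to `Bool` before applying it
example :: Bool
example = id @Bool True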

For the purpose of this post we will program with explicit type abstraction and type application so that there is no magic or hidden machinery.

So, for example, suppose that we wanted to apply the typed, polymorphic identity function to itself. The untyped version was this:

(λx → x) (λy → y)

... and the typed version is this:

(λ(a : *) → λ(x : a) → x)
    (∀(b : *) → ∀(y : b) → b)
    (λ(b : *) → λ(y : b) → y)

-- β-reduction
= (λ(x : ∀(b : *) → ∀(y : b) → b) → x)
      (λ(b : *) → λ(y : b) → y)

-- β-reduction
= (λ(b : *) → λ(y : b) → y)

So we can still apply the identity function to itself, but it's much more verbose. Languages with type inference automate this sort of tedious work for you while still giving you the safety guarantees of types. For example, in Haskell you would just write:

(\x -> x) (\y -> y)

... and the compiler would figure out all the type abstractions and type applications for you.

Exercise: Haskell provides a const function defined like this:

const :: a -> b -> a
const x y = x

Translate the const function to a typed and polymorphic lambda expression in System F (i.e. using explicit type abstractions)

Boolean values

Lambda expressions are the "code", so now we need to create "data" from "code".

One of the simplest pieces of data is a boolean value, which we can encode using typed lambda expressions. For example, here is how you implement the value True:

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True

Note that the names have no significance at all. I could have equally well written the expression as:

λ(a : *) → λ(x : a) → λ(y : a) → x

... which is "α-equivalent" to the previous version (i.e. equivalent up to renaming of variables).

We will save the above expression to a file named ./True in our current directory. We'll see why shortly.

We can either save the expression using Unicode characters:

$ cat > ./True
λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True
<Ctrl-D>

... or using ASCII, replacing each lambda (i.e. λ) with a backslash (i.e. \) and replacing each arrow (i.e. →) with an ASCII arrow (i.e. ->)

$ cat > ./True
\(Bool : *) -> \(True : Bool) -> \(False : Bool) -> True
<Ctrl-D>

... whichever you prefer. For the rest of this tutorial I will use Unicode since it's easier to read.

Similarly, we can encode False by just changing our lambda expression to return the third argument named False instead of the second argument named True. We'll name this file ./False:

$ cat > ./False
λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False
<Ctrl-D>

What's the type of a boolean value? Well, both the ./True and ./False files have the same type, which we shall call ./Bool:

$ cat > ./Bool
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

... and if you are following along with ASCII you can replace each forall symbol (i.e. ∀) with the word forall:

$ cat > ./Bool
forall (Bool : *) -> forall (True : Bool) -> forall (False : Bool) -> Bool

We are saving these terms and types to files because we can use the annah compiler to work with any lambda expression or type saved as a file. For example, I can use the annah compiler to verify that the file ./True has type ./Bool:

$ annah
-- Read this as: "./True has type ./Bool"
./True : ./Bool
<Ctrl-D>
./True
$ echo $?
0

If the expression type-checks then annah will just compile the expression to lambda calculus (by removing the unnecessary type annotation in this case) and return a zero exit code. However, if the expression does not type-check:

$ annah
./True : ./True
annah: 
Expression: ∀(x : λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True) → λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True

Error: Invalid input type

Type: λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True

$ echo $?
1

... then annah will throw an exception and return a non-zero exit code. In this case annah complains that the ./True on the right-hand side of the type annotation is not a valid type.

The last thing we need is a function that can consume values of type ./Bool, like an ./if function:

$ cat > ./if
λ(x : ./Bool ) → x
--          ^
--          |
--          +-- Note the space.  Filenames must end with a space

The definition of ./if is blindingly simple: ./if is just the identity function on ./Bools!

To see why this works, let's see what the type of ./if is. We can ask for the type of any expression by feeding the expression to the morte compiler via standard input:

$ morte < ./if
∀(x : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool) →
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(x : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool) → x

morte is a lambda calculus compiler installed alongside annah, and annah is a higher-level interface to the morte language. By default, the morte compiler will:

  • resolve all file references (transitively, if necessary)
  • type-check the expression
  • optimize the expression
  • write the expression's type to standard error as the first line of output
  • write the optimized expression to standard output as the last line of output

In this case we only cared about the type, so we could have equally well just asked the morte compiler to resolve and infer the type of the expression:

$ morte resolve < ./if | morte typecheck
∀(x : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool) →
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

The above type is the same thing as:

∀(x : ./Bool ) → ./Bool

If you don't believe me you can prove this to yourself by asking morte to resolve the type:

$ echo "∀(x : ./Bool ) → ./Bool" | morte resolve
∀(x : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool) →
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

However, the type will make the most sense if you only expand out the second ./Bool in the type but leave the first ./Bool alone:

./Bool → ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

You can read this type as saying that the ./if function takes four arguments:

  • the first argument is the ./Bool that we want to branch on (i.e. ./True or ./False)
  • the second argument is the result type of our ./if expression
  • the third argument is the result we return if the ./Bool evaluates to ./True (i.e. the "then" branch)
  • the fourth argument is the result we return if the ./Bool evaluates to ./False (i.e. the "else" branch)

For example, this Haskell code:

if True
then False
else True

... would translate to this Annah code:

$ annah
./if ./True
    ./Bool   -- The type of the result
    ./False  -- The `then` branch
    ./True   -- The `else` branch
<Ctrl-D>
./if  ./True  ./Bool  ./False  ./True

annah does not evaluate the expression. annah only translates the expression into Morte code (and the expression is already valid Morte code) and type-checks the expression. If you want to evaluate the expression you need to run the expression through the morte compiler, too:

$ morte
./if ./True
    ./Bool   -- The type of the result
    ./False  -- The `then` branch
    ./True   -- The `else` branch
<Ctrl-D>∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False

morte deduces that the expression has type ./Bool and the expression evaluates to ./False.

morte evaluates the expression by resolving all references and repeatedly applying β-reduction. This is what happens under the hood:

./if
    ./True
    ./Bool
    ./False
    ./True

-- Resolve the `./if` reference
= (λ(x : ./Bool ) → x)
    ./True
    ./Bool
    ./False
    ./True

-- β-reduce
= ./True
    ./Bool
    ./False
    ./True

-- Resolve the `./True` reference
= (λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True)
    ./Bool
    ./False
    ./True

-- β-reduce
= (λ(True : ./Bool ) → λ(False : ./Bool ) → True)
    ./False
    ./True

-- β-reduce
= (λ(False : ./Bool ) → ./False )
    ./True

-- β-reduce
= ./False

-- Resolve the `./False` reference
= λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False

The above sequence of steps is a white lie: the true order of steps is actually different, but equivalent.

The ./if function was not even necessary because every value of type ./Bool is already a "pre-formed if expression". That's why ./if is just the identity function on ./Bools. You can delete the ./if from the above example and the code will still work.

Now let's define the not function and save the function to a file:

$ annah > ./not
λ(b : ./Bool ) →
    ./if b
        ./Bool
        ./False  -- If `b` is `./True`  then return `./False`
        ./True   -- If `b` is `./False` then return `./True`
<Ctrl-D>

We can now use this file like an ordinary function:

$ morte
./not ./False
<Ctrl-D>∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True
$ morte
./not ./True
<Ctrl-D>∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False

Notice how ./not ./False returns ./True and ./not ./True returns ./False.

Similarly, we can define an and function and an or function:

$ annah > and
λ(x : ./Bool ) → λ(y : ./Bool ) →
    ./if x
        ./Bool
        y        -- If `x` is `./True`  then return `y`
        ./False  -- If `x` is `./False` then return `./False`
<Ctrl-D>
$ annah > or
λ(x : ./Bool ) → λ(y : ./Bool ) →
    ./if x
        ./Bool
        ./True   -- If `x` is `./True`  then return `./True`
        y        -- If `x` is `./False` then return `y`
<Ctrl-D>

... and use them:

$ morte
./and ./True ./False
<Ctrl-D>∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False
$ morte
./or ./True ./False
<Ctrl-D>∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True

We started with nothing but lambda expressions, but still managed to implement:

  • a ./Bool type
  • a ./True value of type ./Bool
  • a ./False value of type ./Bool
  • ./if, ./not, ./and, and ./or functions

... and we can do real computation with them! In other words, we've modeled boolean data types entirely as code.
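
If you find it easier to experiment in Haskell, here is one possible transcription of the same encoding (a sketch, not annah's implementation; the forall in the type plays the role of the explicit type abstraction, and all of the names are mine):

{-# LANGUAGE RankNTypes #-}

-- A Church-encoded boolean is its own "pre-formed if expression"
type ChurchBool = forall bool. bool -> bool -> bool

true, false :: ChurchBool
true  t f = t
false t f = f

not_ :: ChurchBool -> ChurchBool
not_ b = \t f -> b f t          -- swap the two branches

and_, or_ :: ChurchBool -> ChurchBool -> ChurchBool
and_ x y = \t f -> x (y t f) f  -- if x then y else false
or_  x y = \t f -> x t (y t f)  -- if x then true else y

-- Convert to a built-in Bool, just for display purposes
toBool :: ChurchBool -> Bool
toBool b = b True False

For example, toBool (and_ true (not_ false)) evaluates to True.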

Exercise: Implement an xor function

Natural numbers

You might wonder what other data types you can implement in terms of lambda calculus. Fortunately, you don't have to wonder because the annah compiler will actually compile data type definitions to lambda expressions for you.

For example, suppose we want to define a natural number type encoded using Peano numerals. We can write:

$ annah types
type Nat
data Succ (pred : Nat)
data Zero
fold foldNat
<Ctrl-D>

You can read the above datatype specification as saying:

  • Define a type named Nat ...
  • ... with a constructor named Succ with one field named pred of type Nat ...
  • ... with another constructor named Zero with no fields
  • ... and a fold named foldNat

annah then translates the datatype specification into the following files and directories:

+-- ./Nat.annah -- `annah` implementation of `Nat`
|
`-- ./Nat
    |
    +-- @  -- `morte` implementation of `Nat`
    |      --
    |      -- If you import the `./Nat` directory this file is
    |      -- imported instead
    |
    +-- Zero.annah     -- `annah` implementation of `Zero`
    |
    +-- Zero           -- `morte` implementation of `Zero`
    |
    +-- Succ.annah     -- `annah` implementation of `Succ`
    |
    +-- Succ           -- `morte` implementation of `Succ`
    |
    +-- foldNat.annah  -- `annah` implementation of `foldNat`
    |
    `-- foldNat        -- `morte` implementation of `foldNat`

Let's see how the Nat type is implemented:

∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

All Boehm-Berarducci-encoded datatypes are encoded as substitution functions, including ./Nat. Any value of ./Nat is a function that takes three arguments that we will substitute into our natural number expression:

  • The first argument replaces every occurrence of the Nat type
  • The second argument replaces every occurrence of the Succ constructor
  • The third argument replaces every occurrence of the Zero constructor

This will make more sense if we walk through a specific example. First, we will build the number 3 using the ./Nat/Succ and ./Nat/Zero constructors:

$ morte
./Nat/Succ (./Nat/Succ (./Nat/Succ ./Nat/Zero ))
<Ctrl-D>∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ Zero))

Now suppose that we want to compute whether or not our natural number is even. The only catch is that we must limit ourselves to substitution when computing even. We have to figure out something that we can substitute in place of the Succ constructors and something that we can substitute in place of the Zero constructors that will then evaluate to ./True if the natural number is even and ./False otherwise.

One substitution that works is the following:

  • Replace every Zero with ./True (because Zero is even)
  • Replace every Succ with ./not (because Succ alternates between even and odd)

So in other words, if we began with this:

./Nat/Succ (./Nat/Succ (./Nat/Succ ./Nat/Zero ))

... and we substitute ./Nat/Succ with ./not and ./Nat/Zero with ./True:

./not (./not (./not ./True ))

... then the expression will reduce to ./False.

Let's prove this by saving the above number to a file named ./three:

$ morte > ./three
./Nat/Succ (./Nat/Succ (./Nat/Succ ./Nat/Zero ))
$ cat three
λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ Zero))

The first thing we need to do is to replace the Nat with ./Bool:

./three ./Bool

-- Resolve `./three`
= (λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
      Succ (Succ (Succ Zero))
  ) ./Bool

-- β-reduce
= λ(Succ : ∀(pred : ./Bool ) → ./Bool ) → λ(Zero : ./Bool ) →
      Succ (Succ (Succ Zero))

Now the next two arguments have exactly the right type for us to substitute in ./not and ./True. The argument named Succ is now a function of type ∀(pred : ./Bool ) → ./Bool, which is the same type as ./not. The argument named Zero is now a value of type ./Bool, which is the same type as ./True. This means that we can proceed with the next two arguments:

./three ./Bool ./not ./True

-- Resolve `./three`
= (λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
      Succ (Succ (Succ Zero))
  ) ./Bool ./not ./True

-- β-reduce
= (λ(Succ : ∀(pred : ./Bool ) → ./Bool ) → λ(Zero : ./Bool ) →
      Succ (Succ (Succ Zero))
  ) ./not ./True

-- β-reduce
= (λ(Zero : ./Bool ) → ./not (./not (./not Zero))) ./True

-- β-reduce
= ./not (./not (./not ./True ))

The result is exactly what we would have gotten if we took our original expression:

./Nat/Succ (./Nat/Succ (./Nat/Succ ./Nat/Zero ))

... and replaced ./Nat/Succ with ./not and replaced ./Nat/Zero with ./True.

Let's verify that this works by running the code through the morte compiler:

$ morte
./three ./Bool ./not ./True
<Ctrl-D>∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False

morte computes that the number ./three is not even, returning ./False.

We can even go a step further and save an ./even function to a file:

$ annah > even
\(n : ./Nat ) →
    n ./Bool
        ./not   -- Replace each `./Nat/Succ` with `./not`
        ./True  -- Replace each `./Nat/Zero` with `./True`

... and use our newly-formed ./even function:

$ morte
./even ./three
<Ctrl-D>∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False
$ morte
./even ./Nat/Zero
<Ctrl-D>∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True

The annah compiler actually provides direct support for natural number literals, so you can also just write:

$ annah | morte
./even 100
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True

What about addition? How do we add two numbers using only substitution?

Well, one way we can add two numbers, m and n, is to substitute each ./Nat/Succ in m with ./Nat/Succ (i.e. keep them the same) and substitute the ./Nat/Zero with n. In other words:

$ annah > plus
λ(m : ./Nat ) → λ(n : ./Nat ) →
    m ./Nat         -- The result will still be a `./Nat`
        ./Nat/Succ  -- Replace each `./Nat/Succ` with `./Nat/Succ`
        n           -- Replace each `./Nat/Zero` with `n`

Let's verify that this works:

$ annah | morte
./plus 2 2
∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ (Succ Zero)))

We get back a Church-encoded 4!

What happened under the hood was the following substitutions:

./plus 2 2

-- Resolve `./plus`
= (λ(m : ./Nat ) → λ(n : ./Nat ) → m ./Nat ./Nat/Succ n) 2 2

-- β-reduce
= (λ(n : ./Nat ) → 2 ./Nat ./Nat/Succ n) 2

-- β-reduce
= 2 ./Nat ./Nat/Succ 2

-- Definition of 2
= (./Nat/Succ (./Nat/Succ ./Nat/Zero )) ./Nat ./Nat/Succ 2

-- Resolve and β-reduce the definition of 2 (multiple steps)
= (λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
      Succ (Succ Zero)
  ) ./Nat ./Nat/Succ 2

-- β-reduce
= (λ(Succ : ∀(pred : ./Nat ) → ./Nat ) → λ(Zero : ./Nat ) →
      Succ (Succ Zero)
  ) ./Nat/Succ 2

-- β-reduce
= (λ(Zero : ./Nat ) → ./Nat/Succ (./Nat/Succ Zero)) 2

-- β-reduce
= ./Nat/Succ (./Nat/Succ 2)

-- Definition of 2
= ./Nat/Succ (./Nat/Succ (./Nat/Succ (./Nat/Succ ./Nat/Zero )))

-- Resolve and β-reduce (multiple steps)
= λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
      Succ (Succ (Succ (Succ Zero)))

So we can encode natural numbers in lambda calculus, albeit very inefficiently! There are some tricks that we can use to greatly speed up both the time complexity and constant factors, but it will never be competitive with machine arithmetic. This is more of a proof of concept that you can model arithmetic purely in code.
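
For reference, the same trick transcribed into Haskell looks like this (a sketch with names of my own choosing, reusing the RankNTypes extension as before):

{-# LANGUAGE RankNTypes #-}

-- A Church-encoded natural number is its own fold
type ChurchNat = forall nat. (nat -> nat) -> nat -> nat

zero :: ChurchNat
zero s z = z

suc :: ChurchNat -> ChurchNat
suc n = \s z -> s (n s z)

three :: ChurchNat
three = suc (suc (suc zero))

-- Substitute `not` for `Succ` and `True` for `Zero`; the built-in
-- Bool keeps the demonstration simple
isEven :: ChurchNat -> Bool
isEven n = n not True

-- Substitute `n` for the `Zero` inside of `m`
plus :: ChurchNat -> ChurchNat -> ChurchNat
plus m n = \s z -> m s (n s z)

-- Convert to a built-in Integer, just for display purposes
toInt :: ChurchNat -> Integer
toInt n = n (+ 1) 0

Here isEven three returns False, matching ./even ./three, and toInt (plus three three) returns 6.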

Exercise: Implement a function which multiplies two natural numbers

Data types

annah also lets you define "temporary" data types that scope over a given expression. In fact, that's how Nat was implemented. You can look at the corresponding *.annah files to see how each type and term is defined in annah before conversion to morte code.

For example, here is how the Nat type is defined in annah:

$ cat Nat.annah
type Nat
data Succ (pred : Nat)
data Zero
fold foldNat
in   Nat

The first four lines are identical to what we wrote when we invoked the annah types command from the command line. We can use the exact same data type specification to create a scoped expression that can reference the type and data constructors we specified.

When we run this expression through annah we get back the Nat type:

$ annah < Nat.annah
∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

You can use these scoped datatype declarations to quickly check how various datatypes are encoded without polluting your current working directory. For example, I can ask annah how the type Maybe is encoded in lambda calculus:

$ annah
λ(a : *) → type Maybe
    data Just (x : a)
    data Nothing
    -- You can also leave out this `fold` if you don't use it
    fold foldMaybe
    in   Maybe
<Ctrl-D>
λ(a : *) → ∀(Maybe : *) → ∀(Just : ∀(x : a) → Maybe) →
∀(Nothing : Maybe) → Maybe

A Maybe value is just another substitution function. You provide one branch that you substitute for Just and another branch that you substitute for Nothing. For example, the Just constructor always substitutes in the first branch and ignores the Nothing branch that you supply:

$ annah
λ(a : *) → type Maybe
    data Just (x : a)
    data Nothing
    in   Just
<Ctrl-D>
λ(a : *) → λ(x : a) → λ(Maybe : *) → λ(Just : ∀(x : a) → Maybe)
→ λ(Nothing : Maybe) → Just x

Vice versa, the Nothing constructor substitutes in the Nothing branch that you supply and ignores the Just branch:

$ annah
λ(a : *) → type Maybe
    data Just (x : a)
    data Nothing
    in   Nothing
<Ctrl-D>
λ(a : *) → λ(Maybe : *) → λ(Just : ∀(x : a) → Maybe) → λ(Nothing : Maybe) → Nothing

Notice how we've implemented Maybe and Just purely using functions. This implies that any language with functions can encode Maybe, like Python!

Let's translate the above definition of Just and Nothing to the equivalent Python code. The only difference is that we delete the type abstractions because they are not necessary in Python:

def just(x):
  def f(just, nothing):
    return just(x)
  return f

def nothing():
  def f(just, nothing):
    return nothing
  return f

We can similarly translate Haskell-style pattern matching like this:

example :: Maybe Int -> IO ()
example m = case m of
    Just n  -> print n
    Nothing -> return ()

... into this Python code:

def example(m):
  def just(n):    # This is what we substitute in place of `Just`
    print(n)
  def nothing():  # This is what we substitute in place of `Nothing`
    return
  m(just, nothing)

... and verify that our pattern matching function works:

>>> example(nothing())
>>> example(just(1))
1

Neat! This means that any algebraic data type can be embedded into any language with functions, which is basically every language!

Warning: your colleagues may get angry at you if you do this! Consider using a language with built-in support for algebraic data types instead of trying to twist your language into something it's not.

Let expressions

You can translate let expressions to lambda calculus, too. For example, instead of saving something to a file we can just use a let expression to temporarily define something within a program.

For example, we could write:

$ annah | morte
let x : ./Nat = ./plus 1 2
in  ./plus x x
∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ (Succ (Succ (Succ Zero)))))

... but that doesn't really tell us anything about how annah desugars let because we only see the final evaluated result. We can ask annah to desugar without performing any other transformations using the annah desugar command:

$ annah desugar
let x : ./Nat = ./plus 1 2
in  ./plus x x
<Ctrl-D>
(λ(x : ./Nat ) → ./plus  x x) (./plus  (λ(Nat : *) → λ(Succ :
∀(pred : Nat) → Nat) → λ(Zero : Nat) → Succ Zero) (λ(Nat : *) →
λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) → Succ (Succ
Zero)))

... which makes more sense if we clean up the result through the use of numeric literals:

(λ(x : ./Nat ) → ./plus x x) (./plus 1 2)

Every time we write an expression of the form:

let x : t = y
in  e

... we decode that to lambda calculus as:

(λ(x : t) → e) y

We can decode function definitions, too. For example, you can write:

$ annah | morte
let increment (x : ./Nat ) : ./Nat = ./plus x 1
in  increment 3
<Ctrl-D>∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ (Succ Zero)))

... and the intermediate desugared form also encodes the function definition as a lambda expression:

$ annah desugar
let increment (x : ./Nat ) : ./Nat = ./plus x 1
in  increment 3
<Ctrl-D>
(λ(increment : ∀(x : ./Nat ) → ./Nat ) → increment (λ(Nat : *)
→ λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) → Succ (Succ
(Succ Zero)))) (λ(x : ./Nat ) → ./plus  x (λ(Nat : *) → λ(Succ
: ∀(pred : Nat) → Nat) → λ(Zero : Nat) → Succ Zero))

... which you can clean up as this expression:

(λ(increment : ∀(x : ./Nat ) → ./Nat ) → increment 3)
    (λ(x : ./Nat ) → ./plus x 1)

We can combine let expressions with data type expressions, too. For example, here's our original not program, except without saving anything to files:

$ annah
type Bool
data True
data False
fold if
in

let not (b : Bool) : Bool = if b Bool False True
in

not False
<Ctrl-D>
λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True

Lists

annah also provides syntactic support for lists. For example:

$ annah
[nil ./Bool , ./True , ./False , ./True ]
<Ctrl-D>
λ(List : *) → λ(Cons : ∀(head : ./Bool ) → ∀(tail : List) →
List) → λ(Nil : List) → Cons ./True  (Cons ./False  (Cons
./True  Nil))

Just like all the other data types, a list is defined in terms of what you use to substitute each Cons and Nil constructor. I can replace each Cons with ./and and the Nil with ./True like this:

$ annah | morte
[nil ./Bool , ./True , ./False , ./True ] ./Bool ./and ./True
<Ctrl-D>∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False

Conceptually, this follows the reduction sequence below:

(   λ(List : *)
→   λ(Cons : ∀(head : ./Bool ) → ∀(tail : List) → List)
→   λ(Nil : List)
→   Cons ./True  (Cons ./False  (Cons ./True  Nil))
)   ./Bool
    ./and
    ./True

-- β-reduction
= (   λ(Cons : ∀(head : ./Bool ) → ∀(tail : ./Bool ) → ./Bool )
  →   λ(Nil : ./Bool )
  →   Cons ./True  (Cons ./False  (Cons ./True  Nil))
  )   ./and
      ./True

-- β-reduction
= (   λ(Nil : ./Bool )
  →   ./and ./True  (./and ./False  (./and ./True  Nil))
  )   ./True

-- β-reduction
= ./and ./True (./and ./False (./and ./True ./True))

Similarly, we can sum a list by replacing each Cons with ./plus and replacing each Nil with 0:

$ annah | morte
[nil ./Nat , 1, 2, 3, 4] ./Nat ./plus 0
<Ctrl-D>∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
Zero)))))))))

This behaves as if we had written:

./plus 1 (./plus 2 (./plus 3 (./plus 4 0)))
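
The corresponding Haskell sketch is short (again, ChurchList and the other names are mine):

{-# LANGUAGE RankNTypes #-}

-- A Church-encoded list is its own right fold
type ChurchList a = forall list. (a -> list -> list) -> list -> list

nil :: ChurchList a
nil c n = n

cons :: a -> ChurchList a -> ChurchList a
cons x xs = \c n -> c x (xs c n)

-- Summing a list = substituting `(+)` for each `Cons` and `0` for `Nil`
sumList :: ChurchList Integer -> Integer
sumList xs = xs (+) 0

For example, sumList (cons 1 (cons 2 (cons 3 (cons 4 nil)))) evaluates to 10.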

Prelude

annah also comes with a Prelude to show some more sophisticated examples of what you can encode in pure lambda calculus. You can find version 1.0 of the Prelude here:

http://sigil.place/prelude/annah/1.0/

You can use these expressions directly within your code just by referencing their URL. For example, the remote Bool expression is located here:

http://sigil.place/prelude/annah/1.0/Bool/@

... and the remote True expression is located here:

http://sigil.place/prelude/annah/1.0/Bool/True

... so we can check if the remote True's type matches the remote Bool by writing this:

$ annah
http://sigil.place/prelude/annah/1.0/Bool/True : http://sigil.place/prelude/annah/1.0/Bool
<Ctrl-D>
http://sigil.place/prelude/annah/1.0/Bool/True 
$ echo $?
0

Similarly, we can build a natural number (very verbosely) using remote Succ and Zero:

$ annah | morte
http://sigil.place/prelude/annah/1.0/Nat/Succ
(   http://sigil.place/prelude/annah/1.0/Nat/Succ
    (   http://sigil.place/prelude/annah/1.0/Nat/Succ
        http://sigil.place/prelude/annah/1.0/Nat/Zero
    )
)
<Ctrl-D>∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ Zero))

However, we can also locally clone the Prelude using wget if we wish to refer to local file paths instead of remote paths:

$ wget -np -r --cut-dirs=3 http://sigil.place/prelude/annah/1.0/
$ cd sigil.place
$ ls
(->)            Defer.annah    List.annah    Path         Sum0.annah
(->).annah      Eq             Maybe         Path.annah   Sum1
Bool            Eq.annah       Maybe.annah   Prod0        Sum1.annah
Bool.annah      Functor        Monad         Prod0.annah  Sum2
Category        Functor.annah  Monad.annah   Prod1        Sum2.annah
Category.annah  index.html     Monoid        Prod1.annah
Cmd             IO             Monoid.annah  Prod2
Cmd.annah       IO.annah       Nat           Prod2.annah
Defer           List           Nat.annah     Sum0

Now we can use these expressions using their more succinct local paths:

$ annah | morte
./Nat/sum (./List/(++) ./Nat [nil ./Nat , 1, 2] [nil ./Nat , 3, 4])
<Ctrl-D>∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
Zero)))))))))

Also, every expression has a corresponding *.annah file that documents the expression's type using a let expression. For example, we can see the type of the ./List/(++) function by studying the ./List/(++).annah file:

$ cat './List/(++).annah'
let (++) (a : *) (as1 : ../List a) (as2 : ../List a) : ../List a =
        \(List : *)
    ->  \(Cons : a -> List -> List)
    ->  \(Nil : List)
    ->  as1 List Cons (as2 List Cons Nil)
in (++)

The top line tells us that (++) is a function that takes three arguments:

  • An argument named a for the type of list elements you want to combine
  • An argument named as1 for the left list you want to combine
  • An argument named as2 for the right list you want to combine

... and the function returns a list of the same type as the two input lists.
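
Transcribed into the Haskell sketch from the Lists section above, the definition is a nearly literal translation (these names are still mine):

-- Fold `as1`, but replace its `Nil` with the fold of `as2`
append :: ChurchList a -> ChurchList a -> ChurchList a
append as1 as2 = \c n -> as1 c (as2 c n)

This mirrors the body as1 List Cons (as2 List Cons Nil) above, minus the explicit type abstraction.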

Beyond

Exactly how far can you take lambda calculus? Well, here's a program written in annah that reads two natural numbers, adds them, and writes out their sum:

$ annah | morte
./IO/Monad ./Prod0 (do ./IO {
    x : ./Nat   <- ./IO/get ;
    y : ./Nat   <- ./IO/get ;
    _ : ./Prod0 <- ./IO/put (./Nat/(+) x y);
})
<Ctrl-D>∀(IO : *) → ∀(Get_ : ((∀(Nat : *) → ∀(Succ : ∀(pred : Nat) →
Nat) → ∀(Zero : Nat) → Nat) → IO) → IO) → ∀(Put_ : (∀(Nat : *)
→ ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat) → IO →
IO) → ∀(Pure_ : (∀(Prod0 : *) → ∀(Make : Prod0) → Prod0) → IO)
→ IO

λ(IO : *) → λ(Get_ : ((∀(Nat : *) → ∀(Succ : ∀(pred : Nat) →
Nat) → ∀(Zero : Nat) → Nat) → IO) → IO) → λ(Put_ : (∀(Nat : *)
→ ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat) → IO →
IO) → λ(Pure_ : (∀(Prod0 : *) → ∀(Make : Prod0) → Prod0) → IO)
→ Get_ (λ(r : ∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) →
∀(Zero : Nat) → Nat) → Get_ (λ(r : ∀(Nat : *) → ∀(Succ :
∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat) → Put_ (λ(Nat : *)
→ λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) → r@1 Nat Succ
(r Nat Succ Zero)) (Pure_ (λ(Prod0 : *) → λ(Make : Prod0) →
Make))))

This does not run the program, but it creates a syntax tree representing all program instructions and the flow of information.
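
To make the "syntax tree" point concrete, here is a rough Haskell analogue that uses an ordinary algebraic data type instead of the Church encoding (all names are mine and this is not annah's implementation):

-- A program is pure data: a tree of instructions
data Program = Get (Integer -> Program)   -- like `Get_`
             | Put Integer Program        -- like `Put_`
             | Done                       -- like `Pure_`

-- Read two numbers, then write out their sum
example :: Program
example = Get (\x -> Get (\y -> Put (x + y) Done))

-- Constructing `example` runs nothing; a separate interpreter
-- assigns the tree a meaning:
run :: Program -> IO ()
run (Get k)   = readLn >>= \n -> run (k n)
run (Put n k) = print n >> run k
run Done      = return ()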

annah supports do notation so you can do things like write list comprehensions in annah:

$ annah | morte
./List/sum (./List/Monad ./Nat (do ./List {
    x : ./Nat <- [nil ./Nat , 1, 2, 3];
    y : ./Nat <- [nil ./Nat , 4, 5, 6];
    _ : ./Nat <- ./List/pure ./Nat (./Nat/(+) x y);
}))
<Ctrl-D>∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
(Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
(Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
(Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
(Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
(Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
(Succ (Succ (Succ Zero)))))))))))))))))))))))))))))))))))))))))
)))))))))))))))))))))

The above Annah program is equivalent to the following Haskell program:

sum (do
    x <- [1, 2, 3]
    y <- [4, 5, 6]
    return (x + y) )

... yet it is implemented 100% in a minimal and total lambda calculus without any built-in support for data types.

This tutorial doesn't cover how do notation works, but you can learn this and more by reading the Annah tutorial which is bundled with the Hackage package:

Conclusions

A lot of people underestimate how much you can do in a total lambda calculus. I don't recommend pure lambda calculus as a general programming language, but I could see a lambda calculus enriched with high-efficiency primitives to be a realistic starting point for simple functional languages that are easy to port and distribute.

One of the projects I'm working towards in the long run is a "JSON for code" and annah is one step along the way towards that goal. annah will likely not be that language, but I still factored out annah as a separate and reusable project along the way so that others could fork and experiment with annah when experimenting with their own language design.

Also, as far as I can tell annah is the only project in the wild that actually implements the Boehm-Berarducci encoding outlined in this paper:

... so if you prefer to learn the encoding algorithm by studying actual code you can use the annah source code as a reference real-world implementation.

You can find the Annah project on Hackage or Github:

... and you can find the Annah prelude hosted online:

The Annah tutorial goes into the Annah language and compiler in much more detail than this post, so if you would like to learn more I highly recommend reading it; the tutorial walks through the compiler, desugaring, and the Prelude step by step:

Also, for those who are curious, both the annah and morte compilers are named after characters from the game Planescape: Torment.