In this post I will introduce a proof-of-concept implementation for distributing typed code over the internet where the unit of compilation is individual expressions.
The core language
To motivate this post, consider this Haskell code:
data Bool = True | False
and :: Bool -> Bool -> Bool
and b1 b2 = if b1 then b2 else False
or :: Bool -> Bool -> Bool
or b1 b2 = if b1 then True else b2
data Even = Zero | SuccE Odd
data Odd = SuccO Even
four :: Even
four = SuccE (SuccO (SuccE (SuccO Zero)))
doubleEven :: Even -> Even
doubleEven (SuccE o) = SuccE (SuccO (doubleOdd o)
doubleEven Zero = Zero
doubleOdd :: Odd -> Even
doubleOdd (SuccO e) = SuccE (SuccO (doubleEven e)
I will encode each one of the above types, terms, and constructors as separate, closed, non-recursive expressions in the calculus of constructions. You can think of the calculus of constructions as a typed assembly language for functional programs which we will use to distribute program fragments over the internet. You can learn more about the calculus of constructions and other pure type systems by reading this clear paper by Simon Peyton Jones: "Henk: a typed intermediate language".
For example, here is how you encode the True
constructor in the calculus of constructions:
λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True
Note that the entire expression is the True
constructor, not just the right-hand side:
This is the True constructor
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True
^^^^
Not this
I just chose the variable names so that you can tell at a glance what constructor you are looking at from the right-hand side of the expression.
Similarly, here is how you encode the type Bool
in the calculus of constructions:
This is the `Bool` type
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool
^^^^
Not this
Again, the entire expression is the Bool
type, but I chose the variable names so that you can tell which type you are looking at from the right-hand side.
You can learn more about the full set of rules for translating data types to System F (a subset of the calculus of constructions) by reading the paper: "Automatic synthesis of typed Λ-programs on term algebras". Also, I will soon release a compiler named annah
that automates this translation algorithm, and I used this compiler to translate the above Haskell code to the equivalent expressions in the calculus of constructions.
Distribution
We can distribute these expressions by hosting each expression as text source code on the internet. For example, I encoded all of the above types, terms and constructors in the calculus of constructions and hosted them using a static web server. You can browse these expressions by visiting sigil.place/post/0/.
Click on one of the expressions in the directory listing to see how they are encoded in the calculus of constructions. For example, if you click the link to four
you will find an ordinary text file whose contents look like this (formatted for clarity):
λ(Even : *) -- This entire
→ λ(Odd : *) -- expression is
→ λ(Zero : Even) -- the number `four`
→ λ(SuccE : ∀(pred : Odd) → Even) --
→ λ(SuccO : ∀(pred : Even) → Odd) --
→ SuccE (SuccO (SuccE (SuccO Zero))) -- Not just this last line
Each one of these expressions gets a unique URL, and we can embed any expression in our code by referencing the appropriate URL.
Remote imports
We can use the morte
compiler to download, parse, and super-optimize programs written in the calculus of constructions. The morte compiler reads in a program from standard input, outputs the program's type to standard error, then super-optimizes the program and outputs the optimized program to standard output.
For example, we can compute and True False
at compile time by just replacing and
, True
, and False
by their appropriate URLs:
$ cabal install 'morte >= 1.2'
$ morte
#http://sigil.place/post/0/and
#http://sigil.place/post/0/True
#http://sigil.place/post/0/False
When we hit <Ctrl-D>
to signal end of standard input then morte
will compile the program:
$ morte
#http://sigil.place/post/0/and
#http://sigil.place/post/0/True
#http://sigil.place/post/0/False
<Ctrl-D>
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool
λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False
The program's type is Bool
and morte
optimizes away the program to False
at compile time. Both the type (Bool
) and the value (False
) are encoded in the calculus of constructions.
Here we are using morte
as a compile-time calculator, mainly because Morte does not yet compile to a backend language. When I release a backend language I will go into more detail about how to embed expressions to evaluate at runtime instead of compile time.
Local imports
We can shorten this example further because morte
also lets you import expressions from local files using the same hashtag syntax. For example, we can create local files that wrap remote URLs like this:
$ echo "#http://sigil.place/post/0/Bool" > Bool
$ echo "#http://sigil.place/post/0/True" > True
$ echo "#http://sigil.place/post/0/False" > False
$ echo "#http://sigil.place/post/0/or" > or
We can then use these local files as convenient short-hand aliases for remote imports:
$ morte
#or #True #False
<Ctrl-D>
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool
λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True
We can use imports anywhere in our program, even as types! For example, in the calculus of constructions you encode if
as the identity function on #Bool
s:
λ(b : #Bool ) → b # Note: Imports must end with whitespace
We can then save our implementation of if
to a file named if
, except using the ASCII symbols \
and ->
instead of λ
and →
:
$ echo "\(b : #Bool ) -> b" > if
Now we can define our own and
function in terms of if
. Remember that the Haskell definition of and
is:
and b1 b2 = if b1 then b2 else False
Our definition won't be much different:
$ echo "\(b1 : #Bool ) -> \(b2 : #Bool ) -> #if b1 #Bool b2 #False" > and
Let's confirm that our new and
function works:
$ echo "#and #True #False" | morte
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool
λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False
We can also ask morte
to resolve all imports for our and
function and optimize the result:
$ morte < and
∀(b1 : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
→ ∀(b2 : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
→ ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool
λ(b1 : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
→ λ(b2 : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
→ b1 (∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
b2
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False)
We can then compare our version with the and
expression hosted online, which is identical:
$ curl sigil.place/post/0/and
λ(b1 : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
→ λ(b2 : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
→ b1 (∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
b2
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False)
Reduction
When we write an expression like this:
#or #True #False
The compiler resolves all imports transitively until all that is left is an expression in the calculus of constructions, like this one:
-- or
( λ(b1 : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
→ b1 (∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True)
)
-- True
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True )
-- False
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False)
Then the compiler reduces this expression using β-reduction and ε-reduction. We can safely reduce these expressions at compile time because these reductions always terminate in the calculus of constructions, which is a total and non-recursive language.
For example, the above expression reduces to:
( λ(b1 : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
→ b1 (∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True)
)
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True )
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False)
-- β-reduce
= (λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True )
(∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True)
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False)
-- β-reduce
= ( λ(True : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
→ λ(False : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
→ True
)
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True)
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False)
-- β-reduce
= ( λ(False : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
→ λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True
)
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False)
-- β-reduce
= λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True
Linking
The and
we defined is "dynamically linked", meaning that the file we saved has not yet resolved all imports:
$ cat and
\(b1 : #Bool ) -> \(b2 : #Bool ) -> #if b1 #Bool b2 #False
The morte
compiler will resolve these imports every time we import this expression within a program. To be precise, each import is resolved once per program and then cached and reused for subsequent duplicate imports. That means that the compiler only imports #Bool
once for the above program and not three times. Also, we can transparently cache these expressions just like any other web resource by providing the appropriate Cache-Control
HTTP header. For example, my static web server sets max-age
to a day so that expressions can be cached for up to one day.
If our imported expressions change then our program will reflect those changes, which may or may not be desirable. For the above program dynamic linking is undesirable because if we change the file #False
to point to sigil.place/post/0/True then we would break the behavior of the and
function.
Alternatively, we can "statically link" the and
function by resolving all imports using the morte
compiler. For example, I statically linked my remote and
expression because the behavior should never change:
$ curl sigil.place/post/0/and
λ(b1 : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
→ λ(b2 : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
→ b1 (∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
b2
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False)
In other scenarios you might want to dynamically link expressions if you want to automatically pull in upgrades from trusted upstream sources. This is the same rationale behind service-oriented architectures which optimize for transparent system-wide updates, except that instead of updating a service we update an expression.
Partial application
We can store partially applied functions in files, too. For example, we could store and True
in a statically linked file named example
using morte
:
$ echo "#and #True" | morte > example
∀(b2 : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool)
→ ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool
The type still goes to standard error, but the partially applied function goes to the example
file. We can use the partially applied function just by referencing our new file:
$ morte
#example #False -- Same as: #and #True #False
<Ctrl-D>
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool
λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False
We can even view example
and see that it's still just an ordinary text source file:
$ cat example
λ(b2 : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool) → b2
We can also see that morte
was clever and optimized #and #True
to the identity function on #Bool
s.
If we wanted to share our example
code with our friends, we'd just host the file using any static web server. I like to use Haskell's warp
server (from the wai-app-static
package) for static hosting, but even something like python -m SimpleHTTPServer
would work just as well:
$ cabal install wai-app-static
$ warp
Serving directory /tmp/code on port 3000 with ["index.html","index.htm"] index files.
Then we could provide our friends with a URL pointing to the example
file and they could embed our code within their program by pasting in our URL.
Types
The calculus of constructions is typed, so if you make a mistake, you'll know immediately:
$ morte
#True #True
<Ctrl-D>
Expression:
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True)
(λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True)
Error: Function applied to argument of the wrong type
Expected type: *
Argument type: ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool
Types are what differentiates morte
from a curl into sh. You can use the type system to whitelist the set of permissible values to import.
For example, in this code, there are only two values of #x
that will type-check (up to α-conversion):
(λ(b : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool) → b) #x
Therefore, we can safely import a remote value knowing that the type-checker will reject attempts to inject arbitrary code.
When building a program with effects, we can similarly refine the set of permissible actions using the types. I introduced one such example in my previous post on morte
, where the recursive.mt
program restricts the effects to reading and printing lines of text and nothing else. You could then import a remote expression of type:
∀(String : *)
→ ∀(U : *)
→ ∀(Unit : U)
→ ∀(IO : *)
→ ∀(GetLine : String → IO → IO)
→ ∀(PutStrLn : (String → IO) → IO)
→ ∀(Return : U → IO)
→ IO
... which is the type of an effect syntax tree built from GetLine
/PutStrLn
/Return
constructors. The type-checker will then enforce that the imported syntax tree cannot contain any other constructors and therefore cannot be interpreted to produce any other effects.
Recursive data types
You can encode recursive data types and functions in the calculus of constructions. This is all the more amazing when you realize that the calculus of constructions does not permit recursion! Also, morte
's import system forbids recursion as well. If you try to recurse using imports you will get an error:
$ echo "#foo" > bar
$ echo "#bar" > foo
$ morte < foo
morte:
⤷ #bar
⤷ #foo
Cyclic import: #bar
Joe Armstrong once proposed that the core language for an internet of code would require built-in support for recursion (via letrec
or something similar), but that's actually not true! The paper "Automatic synthesis of typed Λ-programs on term algebras" spells out how to encode recursive data types in the non-recursive System F language. What's amazing is that the algorithm works even for mutually recursive data types like Even
and Odd
from our original Haskell example.
You don't have to take my word for it! You can verify for yourself that the Even
and Odd
types and the Zero
/SuccE
/SuccO
constructors that I hosted online are not recursive:
- sigil.place/post/0/Even
- sigil.place/post/0/Odd
- sigil.place/post/0/Zero
- sigil.place/post/0/SuccE
- sigil.place/post/0/SuccO
Let's create local aliases for the constructors so we can built our own Even
or Odd
values:
$ echo "#http://sigil.place/post/0/Zero" > Zero
$ echo "#http://sigil.place/post/0/SuccE" > SuccE
$ echo "#http://sigil.place/post/0/SuccO" > SuccO
We can then assemble the number four
using these constructors:
$ morte
#SuccE (#SuccO (#SuccE (#SuccO #Zero )))
<Ctrl-D>
∀(Even : *)
→ ∀(Odd : *)
→ ∀(Zero : Even)
→ ∀(SuccE : ∀(pred : Odd) → Even)
→ ∀(SuccO : ∀(pred : Even) → Odd)
→ Even
λ(Even : *)
→ λ(Odd : *)
→ λ(Zero : Even)
→ λ(SuccE : ∀(pred : Odd) → Even)
→ λ(SuccO : ∀(pred : Even) → Odd)
→ SuccE (SuccO (SuccE (SuccO Zero)))
The result is identical to the four
that I hosted:
$ curl sigil.place/post/0/four
λ(Even : *)
→ λ(Odd : *)
→ λ(Zero : Even)
→ λ(SuccE : ∀(pred : Odd) → Even)
→ λ(SuccO : ∀(pred : Even) → Odd)
→ SuccE (SuccO (SuccE (SuccO Zero)))
We can even encode functions over mutually recursive types like doubleEven
and doubleOdd
. You can verify that the ones I wrote are not recursive:
... and then we can test that they work by doubling the number four
:
$ morte
#http://sigil.place/post/0/doubleEven
#http://sigil.place/post/0/four
<Ctrl-D>
∀(Even : *)
→ ∀(Odd : *)
→ ∀(Zero : Even)
→ ∀(SuccE : ∀(pred : Odd) → Even)
→ ∀(SuccO : ∀(pred : Even) → Odd)
→ Even
λ(Even : *)
→ λ(Odd : *)
→ λ(Zero : Even)
→ λ(SuccE : ∀(pred : Odd) → Even)
→ λ(SuccO : ∀(pred : Even) → Odd)
→ SuccE (SuccO (SuccE (SuccO (SuccE (SuccO (SuccE (SuccO Zero)))))))
We get back the Even
number eight encoded in the calculus of constructions.
Stack traces
morte
will provide a "stack trace" if there is a type error or parse error:
$ echo "\(a : *) ->" > foo # This is a malformed program
$ echo "#foo" > bar
$ echo "#bar" > baz
$ echo "#baz" > qux
$ morte < qux
morte:
⤷ #qux
⤷ #baz
⤷ #bar
⤷ #foo
Line: 2
Column: 1
Parsing: EOF
Error: Parsing failed
You can learn more about how morte
's import system works by reading the newly add "Imports" section of the morte
tutorial.
Comparison to other software architectures
morte
's code distribution system most closely resembles the distribution model of Javascript, meaning that code can be downloaded from any URL and is compiled or interpreted client-side. The most important difference between the two is the granularity of imports and the import mechanism.
In morte
the unit of distribution is individual types, terms, and constructors and you can inject a remote expression anywhere in the syntax tree by referencing its URL. This is why we can do crazy things like use a URL for a type:
λ(b : #http://sigil.place/post/0/Bool ) → ...
The second difference is that morte
is designed from the ground up to be typed and highly optimizable (analogous to asm.js
, a restricted subset of Javascript designed for ease of optimization).
The third difference is that morte
lets you precisely delimit what remote code can do using the type system, unlike Javascript.
Future directions
This is just one piece of the puzzle in a long-term project of mine to build a typed and distributed intermediate language that we can use to share code across language boundaries. I want to give people the freedom to program in the language of their choice while still interoperating freely with other languages. In other words, I'm trying to build a pandoc
for programming languages.
However, this project is still not really usable, even in anger. There are several missing features to go, some of which will be provided by my upcoming annah
library:
Requirement #1: There needs to be a way to convert between restricted subsets of existing programming languages and the calculus of constructions
annah
currently provides logic to encode medium-level language abstractions to and from the calculus of constructions. In fact, that's how I converted the Haskell example at the beginning of this post into the calculus of constructions. For example, I used annah
to derive how to encode the SuccE
constructor in the calculus of constructions:
$ annah compile
type Even
data Zero
data SuccE (pred : Odd)
type Odd
data SuccO (pred : Even)
in SuccE
<Ctrl-D>
... and annah
correctly deduced the type and value in the calculus of constructions:
∀(pred : ∀(Even : *)
→ ∀(Odd : *)
→ ∀(Zero : Even)
→ ∀(SuccE : ∀(pred : Odd)
→ Even )
→ ∀(SuccO : ∀(pred : Even) → Odd) → Odd)
→ ∀(Even : *)
→ ∀(Odd : *)
→ ∀(Zero : Even)
→ ∀(SuccE : ∀(pred : Odd) → Even)
→ ∀(SuccO : ∀(pred : Even) → Odd)
→ Even
λ(pred : ∀(Even : *)
→ ∀(Odd : *)
→ ∀(Zero : Even)
→ ∀(SuccE : ∀(pred : Odd) → Even)
→ ∀(SuccO : ∀(pred : Even) → Odd)
→ Odd )
→ λ(Even : *)
→ λ(Odd : *)
→ λ(Zero : Even)
→ λ(SuccE : ∀(pred : Odd) → Even)
→ λ(SuccO : ∀(pred : Even) → Odd)
→ SuccE (pred Even Odd Zero SuccE SuccO)
Among other things, annah
automates the algorithm from "Automatic synthesis of typed Λ-programs on term algebras", which is known as "Böhm-Berarducci encoding".
Requirement #2: There must be an efficient way to transmit bytes, text, and numbers alongside code.
My plan is to transmit this information out-of-band as a separate file rather than embedding the data directly within the code and annah
will provide a systematic convention for distributing data and referencing that data within source code.
Requirement #3: There needs to be a standard library of types, data structures, functions, and side effects that all target languages must support.
In other words, there needs to be some sort of thrift
for code so that languages can maximize code sharing.
Requirement #4: There must be better tooling for mass installation and hosting of expressions.
For example, I'd like to be able to alias all imports within a remote directory to local files with a single command.
Requirement #5: I need to figure out a way to mesh type inference with an expression-level distribution system.
As far as I can tell this is still an open research problem and this is most likely going to be the greatest obstacle to making this usable in practice.
Resources
If you would like to learn more about Morte or contribute, then check out the following resources: