Friday, March 27, 2015

Algebraic side effects

Haskell differentiates itself from most other functional languages by letting you reason mathematically about programs with side effects. This post begins with a pure Haskell example that obeys algebraic equations and then generalizes the example to impure code that still obeys the same equations.

Algebra

In school, you probably learned algebraic rules like this one:

f * (xs + ys) = (f * xs) + (f * ys)

Now let's make the following substitutions:

  • Replace the mathematical multiplication with Haskell's map function
  • Replace the mathematical addition with Haskell's ++ operator

These two substitutions produce the following Haskell equation:

map f (xs ++ ys) = (map f xs) ++ (map f ys)

In other words, if you concatenate the list xs with the list ys and then map a function named f over the combined list, the result is indistinguishable from mapping f over each list individually and then concatenating them.

Let's test this equation out using the Haskell REPL:

>>> map (+ 1) ([2, 3] ++ [4, 5])
[3,4,5,6]
>>> (map (+ 1) [2, 3]) ++ (map (+ 1) [4, 5])
[3,4,5,6]

Evaluation order

However, the above equation does not hold in most other languages. These other languages use function evaluation to trigger side effects, and therefore if you change the order of evaluation you change the order of side effects.

Let's use Scala to illustrate this. Given the following definitions:

>>> def xs() = { print("!"); Seq(1, 2) }
>>> def ys() = { print("?"); Seq(3, 4) }
>>> def f(x : Int) = { print("*"); x + 1 }

.. the order of side effects differs depending on whether I concatenate or map first:

>>> (xs() ++ ys()).map(f)
!?****res0: Seq[Int] = List(2, 3, 4, 5)
>>> (xs().map(f)) ++ (ys().map(f))
!**?**res1: Seq[Int] = List(2, 3, 4, 5)

One line #1, the two lists are evaluated first, printing "!" and "?", followed by evaluating the function f on all four elements, printing "*" four times. On line #2, we call f on each element of xs before beginning to evaluate ys. Since evaluation order matters in Scala we get two different programs which print the punctuation characters in different order.

The solution

Haskell, on the other hand, strictly separates evaluation order from side effect order using a two-phase system. In the first phase you evaluate your program to build an abstract syntax tree of side effects. In the second phase the Haskell runtime executes the tree by interpreting it for its side effects. This phase distinction ensures that algebraic laws continue to behave even in the presence of side effects.

To illustrate this, we'll generalize our original Haskell code to interleave side effects with list elements and show that it still obeys the same algebraic properties as before. The only difference from before is that we will:

  • Generalize pure lists to their impure analog, ListT
  • Generalize functions to impure functions that wrap side effects with lift
  • Generalize (++) (list concatenation) to (<|>) (ListT concatenation)
  • Generalize map to (=<<), which streams elements through an impure function

This means that our new equation becomes:

-- f  *  (xs  +  ys) = (f  *  xs)  +  (f  * ys)
   f =<< (xs <|> ys) = (f =<< xs) <|> (f
=<< ys)

You can read this as saying: if we concatenate xs and ys and then stream their values through the impure function f, the behavior is indistinguishable from streaming each individual list through f first and then concatenating them.

Let's test this equation out with some sample definitions for xs, ys, and f that mirror their Scala analogs:

>>> import Control.Applicative
>>> import Pipes
>>> let xs = do { lift (putChar '!'); return 1
<|> return 2 }
>>> let ys = do { lift (putChar '?'); return 3
<|> return 4 }
>>> let f x = do { lift (putChar '*'); return (x + 1) }
>>> runListT (f =<< (xs <|> ys)) -- Note:
`runListT` discards the result
!**?**>>> runListT ((f =<< xs) <|> (f =<< ys))
!**?**>>>

The resulting punctuation order is identical. Many people mistakenly believe that Haskell's mathematical elegance breaks down when confronted with side effects, but nothing could be further from the truth.

Conclusion

Haskell preserves algebraic equations even in the presence of side effects, which simplifies reasoning about impure code. Haskell separates evaluation order from side effect order so that you spend less time reasoning about evaluation order and more time reasoning about your program logic.

Thursday, January 29, 2015

Use Haskell for shell scripting

Right now dynamic languages are popular in the scripting world, to the dismay of people who prefer statically typed languages for ease of maintenance.

Fortunately, Haskell is an excellent candidate for statically typed scripting for a few reasons:

  • Haskell has lightweight syntax and very little boilerplate
  • Haskell has global type inference, so all type annotations are optional
  • You can type-check and interpret Haskell scripts very rapidly
  • Haskell's function application syntax greatly resembles Bash

However, Haskell has had a poor "out-of-the-box" experience for a while, mainly due to:

  • Poor default types in the Prelude (specifically String and FilePath)
  • Useful scripting utilities being spread over a large number of libraries
  • Insufficient polish or attention to user experience (in my subjective opinion)

To solve this, I'm releasing the turtle library, which provides a slick and comprehensive interface for writing shell-like scripts in Haskell. I've also written a beginner-friendly tutorial targeted at people who don't know any Haskell.

Overview

turtle is a reimplementation of the Unix command line environment in Haskell. The best way to explain this is to show what a simple "turtle script" looks like:

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import Turtle

main = do
    cd "/tmp"
    mkdir "test"
    output "test/foo" "Hello, world!"  -- Write "Hello, world!" to "test/foo"
    stdout (input "test/foo")          -- Stream "test/foo" to stdout
    rm "test/foo"
    rmdir "test"
    sleep 1
    die "Urk!"

If you make the above file executable, you can then run the program directly as a script:

$ chmod u+x example.hs
$ ./example.hs
Hello, world!
example.hs: user error (Urk!)

The turtle library renames a lot of existing Haskell utilities to match their Unix counterparts and places them under one import. This lets you reuse your shell scripting knowledge to get up and going quickly.

Shell compatibility

You can easily invoke an external process or shell command using proc or shell:

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import Turtle

main = do
    mkdir "test"
    output "test/file.txt" "Hello!"
    proc "tar" ["czf", "test.tar.gz", "test"] empty

    -- or: shell "tar czf test.tar.gz test" empty

Even people unfamiliar with Haskell will probably understand what the above program does.

Portability

"turtle scripts" run on Windows, OS X and Linux. You can either compile scripts as native executables or interpret the scripts if you have the Haskell compiler installed.

Streaming

You can build or consume streaming sources. For example, here's how you print all descendants of the /usr/lib directory in constant memory:

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import Turtle

main = view (lstree "/usr/lib")

... and here's how you count the number of descendants:

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import qualified Control.Foldl as Fold
import Turtle

main = do
    n <- fold (lstree "/usr/lib") Fold.length
    print n

... and here's how you count the number of lines in all descendant files:

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import qualified Control.Foldl as Fold
import Turtle

descendantLines = do
    file <- lstree "/usr/lib"
    True <- liftIO (testfile file)
    input file

main = do
    n <- fold descendantLines Fold.length
    print n

Exception Safety

turtle ensures that all acquired resources are safely released in the face of exceptions. For example, if you acquire a temporary directory or file, turtle will ensure that it's safely deleted afterwards:

example = do
    dir <- using (mktempdir "/tmp" "test")
    liftIO (die "The temporary directory will still be deleted!")

However, exception safety comes at a price. turtle forces you to consume all streams in their entirety so you can't lazily consume just the initial portion of a stream. This was a tradeoff I chose to keep the API as simple as possible.

Patterns

turtle supports Patterns, which are like improved regular expressions. Use Patterns as lightweight parsers to extract typed values from unstructured text:

$ ghci
>>> :set -XOverloadedStrings
>>> import Turtle
>>> data Pet = Cat | Dog deriving (Show)
>>> let pet = ("cat" *> return Cat) <|> ("dog" *> return Dog) :: Pattern Pet
>>> match pet "dog"
>>> [Dog]
>>> match (pet `sepBy` ",") "cat,dog,cat"
[[Cat,Dog,Cat]]

You can also use Patterns as arguments to commands like sed, grep, find and they do the right thing:

>>> stdout (grep (prefix "c") "cat")             -- grep '^c'
cat
>>> stdout (grep (has ("c" <|> "d")) "dog")      -- grep 'cat\|dog'
dog
>>> stdout (sed (digit *> return "!") "ABC123")  -- sed 's/[[:digit:]]/!/g'
ABC!!!

Unlike many Haskell parsers, Patterns are fully backtracking, no exceptions.

Formatting

turtle supports typed printf-style string formatting:

>>> format ("I take "%d%" "%s%" arguments") 2 "typed"
"I take 2 typed arguments"

turtle even infers the number and types of arguments from the format string:

>>> :type format ("I take "%d%" "%s%" arguments")
format ("I take "%d%" "%s%" arguments") :: Text -> Int -> Text

This uses a simplified version of the Format type from the formatting library. Credit to Chris Done for the great idea.

The reason I didn't reuse the formatting library was that I spent a lot of effort keeping the types as simple as possible to improve error messages and inferred types.

Learn more

turtle doesn't try to ambitiously reinvent shell scripting. Instead, turtle just strives to be a "better Bash". Embedding shell scripts in Haskell gives you the the benefits of easy refactoring and basic sanity checking for your scripts.

You can find the turtle library on Hackage or Github. Also, turtle provides an extensive beginner-friendly tutorial targeted at people who don't know any Haskell at all.

Saturday, January 10, 2015

total-1.0.0: Exhaustive pattern matching using traversals, prisms, and lenses

The lens library provides Prisms, which are a powerful way to decouple a type's interface from its internal representation. For example, consider the Either type from the Haskell Prelude:

data Either a b = Left a | Right b

Instead of exposing the Left and Right constructors, you could instead expose the following two Prisms:

_Left  :: Prism (Either a b) (Either a' b ) a a'

_Right :: Prism (Either a b) (Either a  b') b b'

Everything you can do with a constructor, you can do with the equivalent Prism. For example, if you want to build a value using a Prism, you use review:

review _Left  :: a -> Either a b
review _Right :: b -> Either a b

You can also kind-of pattern match on the Prism using preview, returning a Just if the match succeeds or Nothing if the match fails:

preview _Left  :: Either a b -> Maybe a
preview _Right :: Either a b -> Maybe b

However, preview is not quite as good as real honest-to-god pattern matching, because if you wish to handle every branch you can't prove in the types that your pattern match was exhaustive:

nonTypeSafe :: Either Char Int -> String
nonTypeSafe e =
    case preview _Left e of
        Just c  -> replicate 3  c
        Nothing -> case preview _Right e of
            Just n  -> replicate n '!'
            Nothing -> error "Impossible!"  -- Unsatisfying

However, this isn't a flaw with Prisms, but rather a limitation of preview. Prisms actually preserve enough information in the types to support exhaustive pattern matching! The trick for this was described a year ago by Tom Ellis, so I decided to finally implement his idea as the total library.

Example

Here's an example of a total pattern matching using the total library:

import Lens.Family.Total
import Lens.Family.Stock

total :: Either Char Int -> String       -- Same as:
total = _case                            -- total = \case
    & on _Left  (\c -> replicate 3  c )  --     Left  c -> replicate 3 c
    & on _Right (\n -> replicate n '!')  --     Right n -> replicate n '!'

The style resembles LambdaCase. If you want to actually supply a value in the same expression, you can also write:

e & (_case
    & on _Left  ...
    & on _Right ... )

There's nothing unsafe going on under the hood. on is just 3 lines of code:

on p f g x = case p Left x of
    Left  l -> f l
    Right r -> g r

(&) is just an operator for post-fix function application:

x & f = f x

... and _case is just a synonym for a type class method that checks if a type is uninhabited.

class Empty a where
    impossible :: a -> x

_case = impossible

The rest of the library is just code to auto-derive Empty instances using Haskell generics. The entire library is about 40 lines of code.

Generics

Here's an example of exhaustive pattern matching on a user-defined data type:

{-# LANGUAGE DeriveGeneric   #-}
{-# LANGUAGE TemplateHaskell #-}

import Control.Lens.TH (makePrisms)
import GHC.Generics (Generic)
import Lens.Family.Total

data Example a b c = C1 a | C2 b | C3 c deriving (Generic)

makePrisms ''Example

instance (Empty a, Empty b, Empty c) => Empty (Example a b c)

example :: Example String Char Int -> String  -- Same as:
example = _case                               -- example = \case
    & on _C1 (\s -> s              )          --     C1 s -> s
    & on _C2 (\c -> replicate 3  c )          --     C2 c -> replicate 3  c
    & on _C3 (\n -> replicate n '!')          --     C3 n -> replicate n '!'

This uses two features to remove most of the boiler-plate:

  • deriving (Generic) auto-generates the code for the Empty instance
  • makePrisms uses Template Haskell to auto-generate Prisms for our type.

For those who prefer the lens-family-* suite of libraries over lens, I opened up an issue against lens-family-th so that Lens.Family.TH.makeTraversals can be used in place of makePrisms for exhaustive pattern matching.

When we write this instance in our original code:

instance (Empty a, Empty b, Empty c) => Empty (Example a b c)

... that uses Haskell generics to auto-generate the implementation of the impossible function:

instance (Empty a, Empty b, Empty c) => Empty (Example a b c)
    impossible (C1 a) = impossible a
    impossible (C2 b) = impossible b
    impossible (C3 c) = impossible c

That says that the type Example a b c is uninhabited if and only if the types a, b, and c are themselves uninhabited.

When you write makePrisms ''Example, that creates the following three prisms:

_C1 :: Prism (Example a b c) (Example a' b  c ) a a'
_C2 :: Prism (Example a b c) (Example a  b' c ) b b'
_C3 :: Prism (Example a b c) (Example a  b  c') c c'

These Prisms are useful in their own right. You can export them in lieu of your real constructors and they are more powerful in several respects.

Backwards compatibility

There is one important benefit to exporting Prisms and hiding constructors: if you export Prisms you can change your data type's internal representation without changing your API.

For example, let's say that I change my mind and implement Example as two nested Eithers:

type Example a b c = Either a (Either b c)

Had I exposed my constructors, this would be a breaking change for my users. However, if I expose Prisms, then I can preserve the old API by just changing the Prism definition. Instead of auto-generating them using Template Haskell, I can just write:

import Lens.Family.Stock (_Left, _Right)

_C1 = _Left
_C2 = _Right . _Left
_C3 = _Right . _Right

Done! That was easy and the user of my Prisms are completely unaffected by the changes to the internals.

Under the hood

The trick that makes total work is that every branch of the pattern match subtly alters the type. The value that you match on starts out as:

Example String Char Int

... and every time you pattern match against a branch, the on function Voids one of the type parameters. The pattern matching flows from bottom to top:

example = _case
    & on _C1 (\s -> s              )  -- Step 3: Example Void   Void Void
    & on _C2 (\c -> replicate 3  c )  -- Step 2: Example String Void Void
    & on _C3 (\n -> replicate n '!')  -- Step 1: Example String Char Void
                                      -- Step 0: Example String Char Int

Finally, _case just checks that the final type (Example Void Void Void in this case) is uninhabited. All it does is check if the given type is an instance of the Empty type class, which only provides instances for uninhabited types. In this case Example Void Void Void is definitely uninhabited because Void (from Data.Void) is itself uninhabited:

instance Empty Void where
    impossible = absurd

Lenses

What's neat is that this solution works not only for Prisms and Traversals, but for Lenses, too! Check this out:

example :: (Int, Char) -> String     -- Same as:
example = _case                      -- example = \case
    & on _1 (\n -> replicate n '1')  --     (n, _) -> replicate n '1'

... and the identity lens works (which is just id) works exactly the way you'd expect:

example :: (Int, Char) -> String        -- Same as:
example = _case                         -- example = \case
    & on id (\(n, c) -> replicate n c)  --     (n, c) -> replicate n c

I didn't intend that when I wrote the library: it just fell naturally out of the types.

Conclusion

The total library now allows library authors to hide their constructors and export a backwards compatible Prism API all while still preserving exhaustive pattern matching.

Note that I do not recommend switching to a lens-only API for all libraries indiscriminately. Use your judgement when deciding whether perfect backwards compatibility is worth the overhead of a lens-only API, both for the library maintainers and your users.

I put the code under the Lens.Family.Total module, not necessarily because I prefer the Lens.Family.* hierarchy, but rather because I would like to see Edward Kmett add these utilities to lens, leaving my library primarily for users of the complementary lens-family-* ecosystem.

You can find the total library on Hackage or on GitHub.

Saturday, December 6, 2014

A very general API for relational joins

Maps and tuples are useful data types for modeling relational operations. For example, suppose we have the following table, indexed by the Id column:

| Id | First Name | Last Name |
|----|------------|-----------|
|  0 | Gabriel    | Gonzalez  |
|  1 | Oscar      | Boykin    |
|  2 | Edgar      | Codd      |

We can model that as a Map where the key is the Id column and the value is a tuple of FirstName and LastName:

import Data.Map (Map) -- from the `containers` library
import qualified Data.Map as Map

type Id        = Int
type FirstName = String
type LastName  = String

names :: Map Id (FirstName, LastName)
names = Map.fromList
    [ (0, ("Gabriel", "Gonzalez"))
    , (1, ("Oscar"  , "Boykin"  ))
    , (2, ("Edgar"  , "Codd"    ))
    ]

Now suppose we have another table containing Twitter handles, also indexed by Id:

| Id | Twitter Handle |
|----|----------------|
| 0  |  GabrielG439   |
| 1  |  posco         |
| 3  |  avibryant     |

We can also encode that as a Map:

type TwitterHandle = String

handles :: Map Id TwitterHandle
handles = Map.fromList
    [ (0, "GabrielG439")
    , (1, "posco"      )
    , (3, "avibryant"  )
    ]

One of the nice properties of Maps is that you can join them together:

import Control.Applicative

-- I'm using `join2` to avoid confusion with `Control.Monad.join`
join2 :: Map k v1 -> Map k v2 -> Map k (v1, v2)
join2 = ... -- Implementation elided for now

What if we could generalize join2 to work on types other than Map. Perhaps we could use the Applicative interface to implement join2:

join2 :: Applicative f => f a -> f b -> f (a, b)
join2 = liftA2 (,)

However, the Map type cannot implement Applicative in its current form. The reason why is that there is no sensible definition for pure:

pure :: v -> Map k v
pure = ???

This would require a Map that was defined for every key, which we cannot encode. Or can we?

Tables

Well, who says we need to use the Map type from containers? What if I were to encode my Map this way:

import Prelude hiding (lookup)

-- | A map encoded as a lookup function
newtype Table k v = Table { lookup :: k -> Maybe v }

-- | Encode a traditional map as a lookup function
from :: Ord k => Map k v -> Table k v
from m = Table (\k -> Map.lookup k m)

This new type of Map only permits a single operation: lookup, but because we constrain our API to this single operation we can now implement the full Applicative interface:

instance Functor (Table k) where
    fmap f (Table g) = Table (\k -> fmap f (g k))
           -- Same as: Table (fmap (fmap f) g)

instance Applicative (Table k) where
    pure v            = Table (\k -> Just v)
            -- Same as: Table (pure (pure v))

    Table f <*> Table x = Table (\k -> f k <*> x k)
              -- Same as: Table (liftA2 (<*>) f x)

We can promote conventional Maps to this new Table type using the above from function:

names' :: Table Id (FirstName, LastName)
names' = from names

handles' :: Table Id TwitterHandle
handles' = from handles

... and now the more general Applicative join2 will work on these two tables:

>>> let table = join2 names' handles'
>>> :type table
table :: Table Id ((FirstName, LastName), TwitterHandle)
>>> lookup table 0
Just (("Gabriel","Gonzalez"),"GabrielG439")
>>> lookup table 1
Just (("Oscar","Boykin"),"posco")
>>> lookup table 2
Nothing

However, in its present form we can't dump the table's contents because we don't know which keys are present in the table. Let's fix that by adding an additional field to the Table type listing the keys. We will treat functions as being defined for all keys:

import Data.Set (Set)
import qualified Data.Set as Set 

data Keys k = All | Some (Set k)

instance Ord k => Num (Keys k) where
    fromInteger 0         = Some Set.empty
    fromInteger n | n > 0 = All

    All     + _       = All
    _       + All     = All
    Some s1 + Some s2 = Some (Set.union s1 s2)

    All     * ks      = ks
    ks      * All     = ks
    Some s1 * Some s2 = Some (Set.intersection s1 s2)

-- | A map encoded as a lookup function
data Table k v = Table
    { keys   :: Keys k
    , lookup :: k -> Maybe v
    }

-- | Encode a traditional map as a lookup function
from :: Ord k => Map k v -> Table k v
from m = Table
    { keys   = Some (Set.fromList (Map.keys m))
    , lookup = \k -> Map.lookup k m
    }

Even after extending Table with keys we can still implement the Applicative interface:

instance Functor (Table k) where
    fmap f (Table ks g) = Table ks (fmap (fmap f) g)

instance Ord k => Applicative (Table k) where
    pure v =
        Table 1 (pure (pure v))

    Table s1 f <*> Table s2 x =
        Table (s1 * s2) (liftA2 (<*>) f x)

... and now we can add a Show instance, too!

instance (Show k, Show v) => Show (Table k v) where
    show (Table ks f) = case ks of
        All    -> "<function>"
        Some s -> unlines (do
            k <- Set.toList s
            let Just v = f k
            return (show (k, v)) )

Let's give it a test drive:

>>> names'
(0,("Gabriel","Gonzalez"))
(1,("Oscar","Boykin"))
(2,("Edgar","Codd"))

>>> handles'
(0,"GabrielG439")
(1,"posco")
(3,"avibryant")

>>> join2 names' handles'
(0,(("Gabriel","Gonzalez"),"GabrielG439"))
(1,(("Oscar","Boykin"),"posco"))

So far, so good!

Alternative

However, we need to support more than just inner joins. We'd also like to support left, right, and outer joins, too.

Conceptually, a left join is one in which values from the right table may be optionally present. One way we could implement this would be to define a function that converts a finite map to a function defined on all keys. This function will return Nothing for keys not present in the original finite map and Just for keys that were present:

optional :: Table k v -> Table k (Maybe v)
optional (Table ks f) =
    Table All (\k -> fmap Just (f k) <|> pure Nothing)

Then we could define leftJoin in terms of join2 and optional:

leftJoin :: Table k a -> Table k b -> Table k (a, Maybe b)
leftJoin t1 t2 = join2 t1 (optional t2)

However, if we try to compile the above code, the compiler will give us a really interesting error message:

Ambiguous occurrence ‘optional’
It could refer to either ‘Main.optional’,
                      or ‘Control.Applicative.optional’

Apparently, Control.Applicative has an optional function, too. Let's pause to check out the type of this mysterious function:

optional :: Alternative f => f v -> f (Maybe v)

Wow! That type signature is suprisingly similar to the one we wrote. In fact, if Table k implemented the Alternative interface, the types would match.

Alternative is a type class (also provided by Control.Applicative) that greatly resembles the Monoid type class:

class Applicative f => Alternative f where
    empty :: f a

    (<|>) :: f a -> f a -> f a

... and the core Alternative laws are identical to the Monoid laws:

x <|> empty = x

empty <|> x = x

(x <|> y) <|> z = x <|> (y <|> z)

Let's dig even deeper and see how optional uses this Alternative type class:

optional v = fmap Just v <|> pure Nothing

Even the implementation is eerily similar! This strongly suggests that we should find a way to make our Table type implement Alternative. Perhaps something like this would work:

instance Ord k => Alternative (Table k) where
    empty =
        Table 0 (pure empty)

    Table ks1 f1 <|> Table ks2 f2 =
        Table (ks1 + ks2) (liftA2 (<|>) f1 f2)

Compare this to our Applicative instance (duplicated here):

instance Ord k => Applicative (Table k) where
    pure v =
        Table 1 (pure (pure v))

    Table s1 f <*> Table s2 x =
        Table (s1 * s2) (liftA2 (<*>) f x)

The Alternative instance mirrors our Applicative instance except that we:

  • replace pure v with empty
  • replace (<*>) with (<|>)
  • replace 1 with 0
  • replace (*) with (+)

... and what's really neat is that now the optional function from Control.Applicative behaves just like the original optional function we wrote. (Exercise: Use equational reasoning to verify this)

Derived joins

Armed with this Alternative instance, we can now implement leftJoin in terms of the optional provided by Control.Applicative:

leftJoin t1 t2 = join2 t1 (optional t2)

Sure enough, it works:

>>> leftJoin names' handles'
(0,(("Gabriel","Gonzalez"),Just "GabrielG439"))
(1,(("Oscar","Boykin"),Just "posco"))
(2,(("Edgar","Codd"),Nothing)

Let's check out the type that the compiler infers for leftJoin:

>>> :type leftJoin
leftJoin :: Alternative f => f a -> f b -> f (a, Maybe b)

Notice how there's no longer anything Table-specific about leftJoin. It works for anything that implements the Alternative interface. I could leftJoin two Maybes if I really wanted to:

>>> leftJoin (Just 1) (Just 2)
Just (1,Just 2)
>>> leftJoin (Just 1) Nothing
Just (1,Nothing)
>>> leftJoin Nothing (Just 1)
Nothing

... or two lists:

>>> leftJoin [1, 2] [3, 4]
[(1,Just 3),(1,Just 4),(1,Nothing),(2,Just 3),(2,Just 4),(2,
Nothing)]

In fact, I don't even really need specialized leftJoin or rightJoin functions. optional is sufficiently light-weight that I could inline a right join on the fly:

>>> join2 (optional names') handles'
(0,(Just ("Gabriel","Gonzalez"),"GabrielG439"))
(1,(Just ("Oscar","Boykin"),"posco"))
(3,(Nothing,"avibryant"))

What happens if I try to do an "outer join"?

>>> -- DISCLAIMER: Technically not the same as an SQL outer join
>>> let o = join2 (optional names') (optional handles')
>>> o
<function>

The above "outer join" is defined for all keys (because both sides are optional), so we get back a function! While we can't list the Table (because it's conceptually infinite), we can still perform lookups on it:

>>> lookup o 0
Just (Just ("Gabriel","Gonzalez"),Just "GabrielG439")
>>> lookup o 2
Just (Just ("Edgar","Codd"),Nothing)
>>> lookup o 3
Just (Nothing,Just "avibryant")
>>> lookup o 4
Just (Nothing, Nothing)

... and if we were to join our "infinite" table against a finite table, we get back a finite table (Exercise: Try it! Define a new finite table to join against o and see what happens)

What's nice about optional is that we can easily left-join or right-join in multiple tables at a time. If I had four tables of types:

t1 :: Table k a
t2 :: Table k b
t3 :: Table k c
t4 :: Table k d

... I could left join t2, t3, and t4 into t1 by just writing:

liftA4 (,,,) t1 (optional t2) (optional t3) (optional t4)
    :: Table k (a, Maybe b, Maybe c, Maybe d)

Now that I think about it, I don't even really need to provide join2/join3/join4/join5 since they are not much shorter than using the liftA family of functions in Control.Applicative:

-- Exercise: What would `join1` be?
join2 = liftA2 (,)
join3 = liftA3 (,,)
join4 = liftA4 (,,,)
join5 = liftA5 (,,,,)

In other words, I can implement almost any imaginable join just by using liftA{n} and some permutation of optionals. I don't even know what I'd call this join:

liftA5 (,,,,) t1 (optional t2) t3 (optional t4) t5

... but the beauty is that I don't have to give a name for it. I can easily write anonymous joins on the fly using the Control.Applicative module. Moreover, the above code will work for anything that implements the Alternative interface.

Conclusion

Control.Applicative provides a very general API for relational joins: the Alternative type class (which includes Applicative, since Applicative is a super-class of Alternative). Perhaps Control.Applicative could be improved slightly by providing the join{n} family of functions listed above, but it's still highly usable in its present state.

Note that this trick only works for relational abstractions embedded within Haskell. This API can be generalized for external relational data stores (i.e. Postgres), which I will cover in a subsequent post.

Sunday, November 23, 2014

How to build library-agnostic streaming sources

The Haskell ecosystem has numerous libraries for effectful stream programming, including, but not limited to:

  • List
  • conduit
  • enumerator
  • io-streams
  • iterIO
  • iteratee
  • list-t
  • logict
  • machines
  • pipes

Unfortunately, this poses a problem for library writers. Which streaming library should they pick when providing effectful streaming components?

Sometimes the correct answer is: none of the above! We can often build streaming and effectful generators using only the base and transformers libraries!

The trick is to build polymorphic generators using only the MonadPlus and MonadTrans type classes. These generators can then be consumed by any library that provides an implementation of ListT that implements MonadPlus and MonadTrans.

MonadPlus

I like to think of MonadPlus as the "list building" type class. This is because you can assemble a list using return, mzero, and mplus:

>>> import Control.Monad (MonadPlus(..))
>>> mzero :: [Int]
[]
>>> return 1 :: [Int]
[1]
>>> return 1 `mplus` return 2 :: [Int]  -- [1] ++ [2]
[1, 2]

In other words, mzero is analogous to [], mplus is analogous to (++), and return builds a singleton list.

However, many things other than lists implement MonadPlus, including every implementation of ListT in the wild. Therefore, if we build collections using MonadPlus operations these collections will type-check as ListT streams as well.

Let's provide the following helper function to convert a list to this more general MonadPlus type:

select :: MonadPlus m => [a] -> m a
select  []    = mzero
select (x:xs) = return x `mplus` select xs

-- or: select = foldr (\x m -> return x `mplus` m) mzero

Note that this select function has some nice mathematical properties:

select (xs `mplus` ys) = (select xs) `mplus` (select ys)
select  mzero          = mzero

-- This assumes the distributivity law for `MonadPlus`
select . (f >=> g) = (select . f) >=> (select . g)
select . return    = return

MonadTrans

Using select and lift (from MonadTrans), we can build list comprehensions that interleave effects, like this:

example :: (MonadTrans t, MonadPlus (t IO)) => t IO ()
example = do
    x <- select [1, 2, 3]
    lift (putStrLn ("x = " ++ show x))
    y <- select [4, 5, 6]
    lift (putStrLn ("y = " ++ show y))

You can read the above code as saying:

  • Let x range from 1 to 3
  • Print x every time it selects a new value
  • For each value of x, let y range from 4 to 6
  • Print y every time it selects a new value

Notice how example doesn't depend on any particular streaming library, so we can run it with diverse ListT implementations, all of which implement MonadPlus and MonadTrans:

>>> Pipes.runListT example  -- This requires `pipes-4.1.4`
x = 1
y = 4
y = 5
y = 6
x = 2
y = 4
y = 5
y = 6
x = 3
y = 4
y = 5
y = 6
>>> _ <- Control.Monad.Logic.observeAllT example
<Exact same output>
>>> _ <- ListT.toList example
<Exact same output>

However, we can use this trick for more than just list comprehensions. We can build arbitrary lazy and effectful streams this way!

Generators

Here's an example of a generator that lazily emits lines read from standard input:

import Control.Monad (MonadPlus(..))
import Control.Monad.Trans.Class (MonadTrans(..))
import System.IO (isEOF)

stdinLn :: (MonadTrans t, MonadPlus (t IO)) => t IO String
stdinLn = do
    eof <- lift isEOF
    if eof
        then mzero
        else lift getLine `mplus` stdinLn

You can read the above code as saying:

  • Check if we are at the end of the file
  • If we're done, then return an empty list
  • If we're not done, prepend a getLine onto a recursive call to stdinLn

We can prove this works by writing a program that forwards these lines to standard output:

echo :: (MonadTrans t, MonadPlus (t IO)) => t IO ()
echo = do
    str <- stdinLn
    lift (putStrLn str)

Now we can run echo using any of our favorite ListT implementations and it will do the right thing, streaming lines lazily from standard input to standard output in constant space:

>>> Pipes.runListT echo
Test<Enter>
Test
ABC<Enter>
ABC
42<Enter>
42
<Ctrl-D>
>>>

The exception is the transformers library, whose ListT implementation does not stream or run in constant space.

Combinators

We can also implement lazy variations on Control.Monad combinators using this interface.

For example, we can implement a lazy variation on replicateM using just select and replicate:

replicateM' :: MonadPlus m => Int -> m a -> m a
replicateM' n m = do
    m' <- select (replicate n m)
    m'

-- or: replicateM' n = join . select . replicate n

We can use this lazy replicateM' to lazily echo 10 lines from standard input to standard output:

example :: (MonadTrans t, MonadPlus (t IO)) => t IO ()
example = do
    str <- replicateM' 10 (lift getLine)
    lift (putStrLn str)

We can also implement a lazy mapM and forM, too, except now their implementations are so trivial that they don't even deserve their own functions:

mapM' :: Monad m => (a -> m b) -> m a -> m b
mapM' = (=<<)

forM' :: MOnad m => m a -> (a -> m b) -> m a
forM' = (>>=)

example :: (MonadTrans t, MonadPlus (t IO)) => t IO ()
example = mapM' (lift . print) (replicateM' 10 (lift getLine))

Similarly, a lazy sequence just becomes join.

Conclusion

The following streaming libraries already provide their own implementation of ListT compatible with the above trick:

  • List
  • list-t
  • LogicT
  • pipes

The other streaming libraries do not currently provide a ListT implementation, but there is no reason why they couldn't! For example, conduit could implement an internal ListT type of its own and use that as an intermediate type for converting the above abstraction to the public Source type:

convert :: Monad m
        => Data.Conduit.Internal.ListT m a -> Source m a

This polymorphic API obviously does not capture all possible streaming patterns. For example, you can not implement something like take using this API. However, I suspect that a significant number of streaming components can be written using this dependency-free interface.

Edit: Twan van Laarhoven pointed out that you can sometimes use MonadIO instead of MonadTrans, which produces nicer constraints

Sunday, October 26, 2014

How to desugar Haskell code

Haskell's core language is very small, and most Haskell code desugars to either:

  • lambdas / function application,
  • algebraic data types / case expressions,
  • recursive let bindings,
  • type classes and specialization, or:
  • Foreign function calls

Once you understand those concepts you have a foundation for understanding everything else within the language. As a result, the language feels very small and consistent.

I'll illustrate how many higher-level features desugar to the same set of lower-level primitives.

if

if is equivalent to a case statement:

if b then e1 else e2

-- ... is equivalent to:
case b of
    True  -> e1
    False -> e2

This works because Bools are defined within the language:

data Bool = False | True

Multi-argument lambdas

Lambdas of multiple arguments are equivalent to nested lambdas of single arguments:

\x y z -> e

-- ... is equivalent to:
\x -> \y -> \z -> e

Functions

Functions are equivalent to lambdas:

f x y z = e

-- ... is equivalent to:
f = \x y z -> e

-- ... which in turn desugars to:
f = \x -> \y -> \z -> e

As a result, all functions of multiple arguments are really just nested functions of one argument each. This trick is known as "currying".

Infix functions

You can write functions of at least two arguments in infix form using backticks:

x `f` y

-- ... desugars to:
f x y

Operators

Operators are just infix functions of two arguments that don't need backticks. You can write them in prefix form by surrounding them with parentheses:

x + y

-- ... desugars to:
(+) x y

The compiler distinguishes operators from functions by reserving a special set of punctuation characters exclusively for operators.

Operator parameters

The parentheses trick for operators works in other contexts, too. You can bind parameters using operator-like names if you surround them with parentheses:

let f (%) x y = x % y
in  f (*) 1 2

-- ... desugars to:
(\(%) x y -> x % y) (*) 1 2

-- ... reduces to:
1 * 2

Operator sections

You can partially apply operators to just one argument using a section:

(1 +)

-- desugars to:
\x -> 1 + x

This works the other way, too:

(+ 1)

-- desugars to:
\x -> x + 1

This also works with infix functions surrounded by backticks:

(`f` 1)

-- desugars to:
\x -> x `f` 1

-- desugars to:
\x -> f x 1

Pattern matching

Pattern matching on constructors desugars to case statements:

f (Left  l) = eL
f (Right r) = eR

-- ... desugars to:
f x = case x of
    Left  l -> eL
    Right r -> eR

Pattern matching on numeric or string literals desugars to equality tests:

f 0 = e0
f _ = e1

-- ... desugars to:
f x = if x == 0 then e0 else e1

-- ... desugars to:
f x = case x == 0 of
    True  -> e0
    False -> e1

Non-recursive let / where

Non-recursive lets are equivalent to lambdas:

let x = y in z

-- ... is equivalent to:
(\x -> z) y

Same thing for where, which is identical in purpose to let:

z where x = y

-- ... is equivalent to:
(\x -> z) y
Actually, that's not quite true, because of let generalization, but it's close to the truth.

Recursive let / where cannot be desugared like this and should be treated as a primitive.

Top-level functions

Multiple top-level functions can be thought of as one big recursive let binding:

f0 x0 = e0

f1 x1 = e1

main = e2


-- ... is equivalent to:
main = let f0 x0 = e0
           f1 x1 = e1
       in  e2

In practice, Haskell does not desugar them like this, but it's a useful mental model.

Imports

Importing modules just adds more top-level functions. Importing modules has no side effects (unlike some languages), unless you use Template Haskell.

Type-classes

Type classes desugar to records of functions under the hood where the compiler implicitly threads the records throughout the code for you.

class Monoid m where
    mappend :: m -> m -> m
    mempty :: m

instance Monoid Int where
    mappend = (+)
    mempty  = 0

f :: Monoid m => m -> m
f x = mappend x x


-- ... desugars to:
data Monoid m = Monoid
    { mappend :: m -> m -> m
    , mempty  :: m
    }

intMonoid :: Monoid Int
intMonoid = Monoid
    { mappend = (+)
    , mempty  = 0
    }

f :: Monoid m -> m -> m
f (Monoid p z) x = p x x

... and specializing a function to a particular type class just supplies the function with the appropriate record:

g :: Int -> Int
g = f

-- ... desugars to:
g = f intMonoid

Two-line do notation

A two-line do block desugars to the infix (>>=) operator:

do x <- m
   e

-- ... desugars to:
m >>= (\x ->
e )

One-line do notation

For a one-line do block, you can just remove the do:

main = do putStrLn "Hello, world!"

-- ... desugars to:
main = putStrLn "Hello, world!"

Multi-line do notation

do notation of more than two lines is equivalent to multiple nested dos:

do x <- mx
   y <- my
   z

-- ... is equivalent to:
do x <- mx
   do y <- my
      z

-- ... desugars to:
mx >>= (\x ->
my >>= (\y ->
z ))

let in do notation

Non-recursive let in a do block desugars to a lambda:

do let x = y
   z

-- ... desugars to:
(\x -> z) y

ghci

The ghci interactive REPL is analogous to one big do block (with lots and lots of caveats):

$ ghci
>>> str <- getLine
>>> let str' = str ++ "!"
>>> putStrLn str'

-- ... is equivalent to the following Haskell program:
main = do
    str <- getLine
    let str' = str ++ "!"
    putStrLn str'

-- ... desugars to:
main = do
    str <- getLine
    do let str' = str ++ "!"
       putStrLn str'

-- ... desugars to:
main =
    getLine >>= (\str ->
    do let str' = str ++ "!"
       putStrLn str' )

-- ... desugars to:
main =
    getLine >>= (\str ->
    (\str' -> putStrLn str') (str ++ "!") )

-- ... reduces to:
main =
    getLine >>= (\str ->
    putStrLn (str ++ "!") )

List comprehensions

List comprehensions are equivalent to do notation:

[ (x, y) | x <- mx, y <- my ]

-- ... is equivalent to:

do x <- mx
   y <- my
   return (x, y)

-- ... desugars to:
mx >>= (\x -> my >>= \y -> return (x, y))

-- ... specialization to lists:
concatMap (\x -> concatMap (\y -> [(x, y)]) my) mx

The real desugared code is actually more efficient, but still equivalent.

The MonadComprehensions language extension generalizes list comprehension syntax to work with any Monad. For example, you can write an IO comprehension:

>>> :set -XMonadComprehensions
>>> [ (str1, str2) | str1 <- getLine, str2 <- getLine ]
Line1<Enter>
Line2<Enter>
("Line1", "Line2")

Numeric literals

Integer literals are polymorphic by default and desugar to a call to fromIntegral on a concrete Integer:

1 :: Num a => a

-- desugars to:
fromInteger (1 :: Integer)

Floating point literals behave the same way, except they desugar to fromRational:

1.2 :: Fractional a => a

-- desugars to:
fromRational (1.2 :: Rational)

IO

You can think of IO and all foreign function calls as analogous to building up a syntax tree describing all planned side effects:

main = do
    str <- getLine
    putStrLn str
    return 1


-- ... is analogous to:
data IO r
    = PutStrLn String (IO r)
    | GetLine (String -> IO r)
    | Return r

instance Monad IO where
    (PutStrLn str io) >>= f = PutStrLn str (io >>= f)
    (GetLine k      ) >>= f = GetLine (\str -> k str >>= f)
    Return r          >>= f = f r

main = do
    str <- getLine
    putStrLn str
    return 1

-- ... desugars and reduces to:
main =
    GetLine (\str ->
    PutStrLn str (
    Return 1 ))

This mental model is actually very different from how IO is implemented under the hood, but it works well for building an initial intuition for IO.

For example, one intuition you can immediately draw from this mental model is that order of evaluation in Haskell has no effect on order of IO effects, since evaluating the syntax tree does not actually interpret or run the tree. The only way you can actually animate the syntax tree is to define it to be equal to main.

Conclusion

I haven't covered everything, but hopefully that gives some idea of how Haskell translates higher level abstractions into lower-level abstractions. By keeping the core language small, Haskell can ensure that language features play nicely with each other.

Note that Haskell also has a rich ecosystem of experimental extensions, too. Many of these are experimental because each new extension must be vetted to understand how it interacts with existing language features.

Friday, September 12, 2014

Morte: an intermediate language for super-optimizing functional programs

The Haskell language provides the following guarantee (with caveats): if two programs are equal according to equational reasoning then they will behave the same. On the other hand, Haskell does not guarantee that equal programs will generate identical performance. Consequently, Haskell library writers must employ rewrite rules to ensure that their abstractions do not interfere with performance.

Now suppose there were a hypothetical language with a stronger guarantee: if two programs are equal then they generate identical executables. Such a language would be immune to abstraction: no matter how many layers of indirection you might add the binary size and runtime performance would be unaffected.

Here I will introduce such an intermediate language named Morte that obeys this stronger guarantee. I have not yet implemented a back-end code generator for Morte, but I wanted to pause to share what I have completed so far because Morte uses several tricks from computer science that I believe deserve more attention.

Morte is nothing more than a bare-bones implementation of the calculus of constructions, which is a specific type of lambda calculus. The only novelty is how I intend to use this lambda calculus: as a super-optimizer.

Normalization

The typed lambda calculus possesses a useful property: every term in the lambda calculus has a unique normal form if you beta-reduce everything. If you're new to lambda calculus, normalizing an expression equates to indiscriminately inlining every function call.

What if we built a programming language whose intermediate language was lambda calculus? What if optimization was just normalization of lambda terms (i.e. indiscriminate inlining)? If so, then we would could abstract freely, knowing that while compile times might increase, our final executable would never change.

Recursion

Normally you would not want to inline everything because infinitely recursive functions would become infinitely large expressions. Fortunately, we can often translate recursive code to non-recursive code!

I'll demonstrate this trick first in Haskell and then in Morte. Let's begin from the following recursive List type along with a recursive map function over lists:

import Prelude hiding (map, foldr)

data List a = Cons a (List a) | Nil

example :: List Int
example = Cons 1 (Cons 2 (Cons 3 Nil))

map :: (a -> b) -> List a -> List b
map f  Nil       = Nil
map f (Cons a l) = Cons (f a) (map f l)

-- Argument order intentionally switched
foldr :: List a -> (a -> x -> x) -> x -> x
foldr  Nil       c n = n
foldr (Cons a l) c n = c a (foldr l c n)

result :: Int
result = foldr (map (+1) example) (+) 0

-- result = 9

Now imagine that we disable all recursion in Haskell: no more recursive types and no more recursive functions. Now we must reject the above program because:

  • the List data type definition recursively refers to itself

  • the map and foldr functions recursively refer to themselves

Can we still encode lists in a non-recursive dialect of Haskell?

Yes, we can!

-- This is a valid Haskell program

{-# LANGUAGE RankNTypes #-}

import Prelude hiding (map, foldr)

type List a = forall x . (a -> x -> x) -> x -> x

example :: List Int
example = \cons nil -> cons 1 (cons 2 (cons 3 nil))

map :: (a -> b) -> List a -> List b
map f l = \cons nil -> l (\a x -> cons (f a) x) nil

foldr :: List a -> (a -> x -> x) -> x -> x
foldr l = l

result :: Int
result = foldr (map (+ 1) example) (+) 0

-- result = 9

Carefully note that:

  • List is no longer defined recursively in terms of itself

  • map and foldr are no longer defined recursively in terms of themselves

Yet, we somehow managed to build a list, map a function over the list, and fold the list, all without ever using recursion! We do this by encoding the list as a fold, which is why foldr became the identity function.

This trick works for more than just lists. You can take any recursive data type and mechanically transform the type into a fold and transform functions on the type into functions on folds. If you want to learn more about this trick, the specific name for it is "Boehm-Berarducci encoding". If you are curious, this in turn is equivalent to an even more general concept from category theory known as "F-algebras", which let you encode inductive things in a non-inductive way.

Non-recursive code greatly simplifies equational reasoning. For example, we can easily prove that we can optimize map id l to l:

map id l

-- Inline: map f l = \cons nil -> l (\a x -> cons (f a) x) nil
= \cons nil -> l (\a x -> cons (id a) x) nil

-- Inline: id x = x
= \cons nil -> l (\a x -> cons a x) nil

-- Eta-reduce
= \cons nil -> l cons nil

-- Eta-reduce
= l

Note that we did not need to use induction to prove this optimization because map is no longer recursive. The optimization became downright trivial, so trivial that we can automate it!

Morte optimizes programs using this same simple scheme:

  • Beta-reduce everything (equivalent to inlining)
  • Eta-reduce everything

To illustrate this, I will desugar our high-level Haskell code to the calculus of constructions. This desugaring process is currently manual (and tedious), but I plan to automate this, too, by providing a front-end high-level language similar to Haskell that compiles to Morte:

-- mapid.mt

(    \(List : * -> *)
->   \(  map
     :   forall (a : *)
     ->  forall (b : *)
     -> (a -> b) -> List a -> List b
     )
->   \(id : forall (a : *) -> a -> a)

    ->   \(a : *) -> map a a (id a)
)

-- List
(\(a : *) -> forall (x : *) -> (a -> x -> x) -> x -> x)

-- map
(   \(a : *)
->  \(b : *)
->  \(f : a -> b)
->  \(l : forall (x : *) -> (a -> x -> x) -> x -> x)
->  \(x : *)
->  \(Cons : b -> x -> x)
->  \(Nil: x)
->  l x (\(va : a) -> \(vx : x) -> Cons (f va) vx) Nil
)

-- id
(\(a : *) -> \(va : a) -> va)

This line of code is the "business end" of the program:

\(a : *) -> map a a (id a)

The extra 'a' business is because in any polymorphic lambda calculus you explicitly accept polymorphic types as arguments and specialize functions by applying them to types. Higher-level functional languages like Haskell or ML use type inference to automatically infer and supply type arguments when possible.

We can compile this program using the morte executable, which accepts a Morte program on stdin, outputs the program's type stderr, and outputs the optimized program on stdout:

$ morte < id.mt
∀(a : *) → (∀(x : *) → (a → x → x) → x → x) → ∀(x : *) → (a 
→ x → x) → x → x

λ(a : *) → λ(l : ∀(x : *) → (a → x → x) → x → x) → l

The first line is the type, which is a desugared form of:

forall a . List a -> List a

The second line is the program, which is the identity function on lists. Morte optimized away the map completely, the same way we did by hand.

Morte also optimized away the rest of the code, too. Dead-code elimination is just an emergent property of Morte's simple optimization scheme.

Equality

We could double-check our answer by asking Morte to optimize the identity function on lists:

-- idlist.mt

(    \(List : * -> *)
->   \(id   : forall (a : *) -> a -> a)

    ->   \(a : *) -> id (List a)
)

-- List
(\(a : *) -> forall (x : *) -> (a -> x -> x) -> x -> x)

-- id
(\(a : *) -> \(va : a) -> va)

Sure enough, Morte outputs an alpha-equivalent result (meaning the same up to variable renaming):

$ ~/.cabal/bin/morte < idlist.mt
∀(a : *) → (∀(x : *) → (a → x → x) → x → x) → ∀(x : *) → (a 
→ x → x) → x → x

λ(a : *) → λ(va : ∀(x : *) → (a → x → x) → x → x) → va

We can even use the morte library to mechanically check if two Morte expressions are alpha-, beta-, and eta- equivalent. We can parse our two Morte files into Morte's Expr type and then use the Eq instance for Expr to test for equivalence:

$ ghci
Prelude> import qualified Data.Text.Lazy.IO as Text
Prelude Text> txt1 <- Text.readFile "mapid.mt"
Prelude Text> txt2 <- Text.readFile "idlist.mt"
Prelude Text> import Morte.Parser (exprFromText)
Prelude Text Morte.Parser> let e1 = exprFromText txt1
Prelude Text Morte.Parser> let e2 = exprFromText txt2
Prelude Text Morte.Parser> import Control.Applicative (liftA2)
Prelude Text Morte.Parser Control.Applicative> liftA2 (==) e1 e2
Right True
$ -- `Right` means both expressions parsed successfully
$ -- `True` means they are alpha-, beta-, and eta-equivalent

We can use this to mechanically verify that two Morte programs optimize to the same result.

Compile-time computation

Morte can compute as much (or as little) at compile as you want. The more information you encode directly within lambda calculus, the more compile-time computation Morte will perform for you. For example, if we translate our Haskell List code entirely to lambda calculus, then Morte will statically compute the result at compile time.

-- nine.mt

(   \(Nat : *)
->  \(zero : Nat)
->  \(one : Nat)
->  \((+) : Nat -> Nat -> Nat)
->  \((*) : Nat -> Nat -> Nat)
->  \(List : * -> *)
->  \(Cons : forall (a : *) -> a -> List a -> List a)
->  \(Nil  : forall (a : *)                -> List a)
->  \(  map
    :   forall (a : *) -> forall (b : *)
    ->  (a -> b) -> List a -> List b
    )
->  \(  foldr
    :   forall (a : *)
    ->  List a
    ->  forall (r : *)
    ->  (a -> r -> r) -> r -> r
    )
->  (    \(two   : Nat)
    ->   \(three : Nat)
    ->   (    \(example : List Nat)

         ->   foldr Nat (map Nat Nat ((+) one) example) Nat (+) zero
         )

         -- example
         (Cons Nat one (Cons Nat two (Cons Nat three (Nil Nat))))
    )

    -- two
    ((+) one one)

    -- three
    ((+) one ((+) one one))
)

-- Nat
(   forall (a : *)
->  (a -> a)
->  a
->  a
)

-- zero
(   \(a : *)
->  \(Succ : a -> a)
->  \(Zero : a)
->  Zero
)

-- one
(   \(a : *)
->  \(Succ : a -> a)
->  \(Zero : a)
->  Succ Zero
)

-- (+)
(   \(m : forall (a : *) -> (a -> a) -> a -> a)
->  \(n : forall (a : *) -> (a -> a) -> a -> a)
->  \(a : *)
->  \(Succ : a -> a)
->  \(Zero : a)
->  m a Succ (n a Succ Zero)
)

-- (*)
(   \(m : forall (a : *) -> (a -> a) -> a -> a)
->  \(n : forall (a : *) -> (a -> a) -> a -> a)
->  \(a : *)
->  \(Succ : a -> a)
->  \(Zero : a)
->  m a (n a Succ) Zero
)

-- List
(   \(a : *)
->  forall (x : *)
->  (a -> x -> x)  -- Cons
->  x              -- Nil
->  x
)

-- Cons
(   \(a : *)
->  \(va  : a)
->  \(vas : forall (x : *) -> (a -> x -> x) -> x -> x)
->  \(x : *)
->  \(Cons : a -> x -> x)
->  \(Nil  : x)
->  Cons va (vas x Cons Nil)
)

-- Nil
(   \(a : *)
->  \(x : *)
->  \(Cons : a -> x -> x)
->  \(Nil  : x)
->  Nil
)

-- map
(   \(a : *)
->  \(b : *)
->  \(f : a -> b)
->  \(l : forall (x : *) -> (a -> x -> x) -> x -> x)
->  \(x : *)
->  \(Cons : b -> x -> x)
->  \(Nil: x)
->  l x (\(va : a) -> \(vx : x) -> Cons (f va) vx) Nil
)

-- foldr
(   \(a : *)
->  \(vas : forall (x : *) -> (a -> x -> x) -> x -> x)
->  vas
)

The relevant line is:

foldr Nat (map Nat Nat ((+) one) example) Nat (+) zero

If you remove the type-applications to Nat, this parallels our original Haskell example. We can then evaluate this expression at compile time:

$ morte < nine.mt
∀(a : *) → (a → a) → a → a

λ(a : *) → λ(Succ : a → a) → λ(Zero : a) → Succ (Succ (Succ 
(Succ (Succ (Succ (Succ (Succ (Succ Zero))))))))

Morte reduces our program to a church-encoded nine.

Run-time computation

Morte does not force you to compute everything using lambda calculus at compile time. Suppose that we wanted to use machine arithmetic at run-time instead. We can do this by parametrizing our program on:

  • the Int type,
  • operations on Ints, and
  • any integer literals we use

We accept these "foreign imports" as ordinary arguments to our program:

-- foreign.mt

-- Foreign imports
    \(Int : *)                      -- Foreign type
->  \((+) : Int -> Int -> Int)      -- Foreign function
->  \((*) : Int -> Int -> Int)      -- Foreign function
->  \(lit@0 : Int)  -- Literal "1"  -- Foreign data
->  \(lit@1 : Int)  -- Literal "2"  -- Foreign data
->  \(lit@2 : Int)  -- Literal "3"  -- Foreign data
->  \(lit@3 : Int)  -- Literal "1"  -- Foreign data
->  \(lit@4 : Int)  -- Literal "0"  -- Foreign data

-- The rest is compile-time lambda calculus
->  (   \(List : * -> *)
    ->  \(Cons : forall (a : *) -> a -> List a -> List a)
    ->  \(Nil  : forall (a : *)                -> List a)
    ->  \(  map
        :   forall (a : *)
        ->  forall (b : *)
        ->  (a -> b) -> List a -> List b
        )
    ->  \(  foldr
        :   forall (a : *)
        ->  List a
        ->  forall (r : *)
        ->  (a -> r -> r) -> r -> r
        )
        ->   (    \(example : List Int)

             ->   foldr Int (map Int Int ((+) lit@3) example) Int (+) lit@4
             )

             -- example
             (Cons Int lit@0 (Cons Int lit@1 (Cons Int lit@2 (Nil Int))))
    )

    -- List
    (   \(a : *)
    ->  forall (x : *)
    ->  (a -> x -> x)  -- Cons
    ->  x              -- Nil
    ->  x
    )

    -- Cons
    (   \(a : *)
    ->  \(va  : a)
    ->  \(vas : forall (x : *) -> (a -> x -> x) -> x -> x)
    ->  \(x : *)
    ->  \(Cons : a -> x -> x)
    ->  \(Nil  : x)
    ->  Cons va (vas x Cons Nil)
    )

    -- Nil
    (   \(a : *)
    ->  \(x : *)
    ->  \(Cons : a -> x -> x)
    ->  \(Nil  : x)
    ->  Nil
    )

    -- map
    (   \(a : *)
    ->  \(b : *)
    ->  \(f : a -> b)
    ->  \(l : forall (x : *) -> (a -> x -> x) -> x -> x)
    ->  \(x : *)
    ->  \(Cons : b -> x -> x)
    ->  \(Nil: x)
    ->  l x (\(va : a) -> \(vx : x) -> Cons (f va) vx) Nil
    )

    -- foldr
    (   \(a : *)
    ->  \(vas : forall (x : *) -> (a -> x -> x) -> x -> x)
    ->  vas
    )

We can use Morte to optimize the above program and Morte will reduce the program to nothing but foreign types, operations, and values:

$ morte < foreign.mt
∀(Int : *) → (Int → Int → Int) → (Int → Int → Int) → Int →
Int → Int → Int → Int → Int

λ(Int : *) → λ((+) : Int → Int → Int) → λ((*) : Int → Int → 
Int) → λ(lit : Int) → λ(lit@1 : Int) → λ(lit@2 : Int) → 
λ(lit@3 : Int) → λ(lit@4 : Int) → (+) ((+) lit@3 lit) ((+) 
((+) lit@3 lit@1) ((+) ((+) lit@3 lit@2) lit@4))

If you study that closely, Morte adds lit@3 (the "1" literal) to each literal of the list and then adds them up. We can then pass this foreign syntax tree to our machine arithmetic backend to transform those foreign operations to efficient operations.

Morte lets you choose how much information you want to encode within lambda calculus. The more information you encode in lambda calculus the more Morte can optimize your program, but the slower your compile times will get, so it's a tradeoff.

Corecursion

Corecursion is the dual of recursion. Where recursion works on finite data types, corecursion works on potentially infinite data types. An example would be the following infinite Stream in Haskell:

data Stream a = Cons a (Stream a)

numbers :: Stream Int
numbers = go 0
  where
    go n = Cons n (go (n + 1))

-- numbers = Cons 0 (Cons 1 (Cons 2 (...

map :: (a -> b) -> Stream a -> Stream b
map f (Cons a l) = Cons (f a) (map f l)

example :: Stream Int
example = map (+ 1) numbers

-- example = Cons 1 (Cons 2 (Cons 3 (...

Again, pretend that we disable any function from referencing itself so that the above code becomes invalid. This time we cannot reuse the same trick from previous sections because we cannot encode numbers as a fold without referencing itself. Try this if you don't believe me.

However, we can still encode corecursive things in a non-corecursive way. This time, we encode our Stream type as an unfold instead of a fold:

-- This is also valid Haskell code

{-# LANGUAGE ExistentialQuantification #-}

data Stream a = forall s . MkStream
    { seed :: s
    , step :: s -> (a, s)
    }

numbers :: Stream Int
numbers = MkStream 0 (\n -> (n, n + 1))

map :: (a -> b) -> Stream a -> Stream b
map f (MkStream s0 k) = MkStream s0 k'
  where
    k' s = (f a, s')
      where (a, s') = k s 

In other words, we store an initial seed of some type s and a step function of type s -> (a, s) that emits one element of our Stream. The type of our seed s can be anything and in our numbers example, the type of the internal state is Int. Another stream could use a completely different internal state of type (), like this:

-- ones = Cons 1 ones

ones :: Stream Int
ones = MkStream () (\_ -> (1, ()))

The general name for this trick is an "F-coalgebra" encoding of a corecursive type.

Once we encode our infinite stream non-recursively, we can safely optimize the stream by inlining and eta reduction:

map id l

-- l = MkStream s0 k
= map id (MkStream s0 k)

-- Inline definition of `map`
= MkStream s0 k'
  where
    k' = \s -> (id a, s')
      where
        (a, s') = k s

-- Inline definition of `id`
= MkStream s0 k'
  where
    k' = \s -> (a, s')
      where
        (a, s') = k s

-- Inline: (a, s') = k s
= MkStream s0 k'
  where
    k' = \s -> k s

-- Eta reduce
= MkStream s0 k'
  where
    k' = k

-- Inline: k' = k
= MkStream s0 k

-- l = MkStream s0 k
= l

Now let's encode Stream and map in Morte and compile the following four expressions:

map id

id

map f . map g

map (f . g)

Save the following Morte file to stream.mt and then uncomment the expression you want to test:

(   \(id : forall (a : *) -> a -> a)
->  \(  (.)
    :   forall (a : *)
    ->  forall (b : *)
    ->  forall (c : *)
    ->  (b -> c)
    ->  (a -> b)
    ->  (a -> c)
    )
->  \(Pair : * -> * -> *)
->  \(P : forall (a : *) -> forall (b : *) -> a -> b -> Pair a b)
->  \(  first
    :   forall (a : *)
    ->  forall (b : *)
    ->  forall (c : *)
    ->  (a -> b)
    ->  Pair a c
    ->  Pair b c
    )

->  (   \(Stream : * -> *)
    ->  \(  map
        :   forall (a : *)
        ->  forall (b : *)
        ->  (a -> b)
        ->  Stream a
        ->  Stream b
        )

        -- example@1 = example@2
    ->  (   \(example@1 : forall (a : *) -> Stream a -> Stream a)
        ->  \(example@2 : forall (a : *) -> Stream a -> Stream a)

        -- example@3 = example@4
        ->  \(  example@3
            :   forall (a : *)
            ->  forall (b : *)
            ->  forall (c : *)
            ->  (b -> c)
            ->  (a -> b)
            ->  Stream a
            ->  Stream c
            )

        ->  \(  example@4
            :   forall (a : *)
            ->  forall (b : *)
            ->  forall (c : *)
            ->  (b -> c)
            ->  (a -> b)
            ->  Stream a
            ->  Stream c
            )

        -- Uncomment the example you want to test
        ->  example@1
--      ->  example@2
--      ->  example@3
--      ->  example@4
        )

        -- example@1
        (\(a : *) -> map a a (id a))

        -- example@2
        (\(a : *) -> id (Stream a))

        -- example@3
        (   \(a : *)
        ->  \(b : *)
        ->  \(c : *)
        ->  \(f : b -> c)
        ->  \(g : a -> b)
        ->  map a c ((.) a b c f g)
        )

        --  example@4
        (   \(a : *)
        ->  \(b : *)
        ->  \(c : *)
        ->  \(f : b -> c)
        ->  \(g : a -> b)
        ->  (.) (Stream a) (Stream b) (Stream c) (map b c f) (map a b g)
        )
    )

    -- Stream
    (   \(a : *)
    ->  forall (x : *)
    ->  (forall (s : *) -> s -> (s -> Pair a s) -> x)
    ->  x
    )

    -- map
    (   \(a : *)
    ->  \(b : *)
    ->  \(f : a -> b)
    ->  \(  st
        :   forall (x : *)
        -> (forall (s : *) -> s -> (s -> Pair a s) -> x)
        -> x
        )
    ->  \(x : *)
    ->  \(S : forall (s : *) -> s -> (s -> Pair b s) -> x)
    ->  st
        x
        (   \(s : *)
        ->  \(seed : s)
        ->  \(step : s -> Pair a s)
        ->  S
            s
            seed
            (\(seed@1 : s) -> first a b s f (step seed@1))
        )
    )
)

-- id
(\(a : *) -> \(va : a) -> va)

-- (.)
(   \(a : *)
->  \(b : *)
->  \(c : *)
->  \(f : b -> c)
->  \(g : a -> b)
->  \(va : a)
->  f (g va)
)

-- Pair
(\(a : *) -> \(b : *) -> forall (x : *) -> (a -> b -> x) -> x)

-- P
(   \(a : *)
->  \(b : *)
->  \(va : a)
->  \(vb : b)
->  \(x : *)
->  \(P : a -> b -> x)
->  P va vb
)

-- first
(   \(a : *)
->  \(b : *)
->  \(c : *)
->  \(f : a -> b)
->  \(p : forall (x : *) -> (a -> c -> x) -> x)
->  \(x : *)
->  \(Pair : b -> c -> x)
->  p x (\(va : a) -> \(vc : c) -> Pair (f va) vc)
)

Both example@1 and example@2 will generate alpha-equivalent code:

$ morte < example1.mt
∀(a : *) → (∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) → (a → 
s → x) → x) → x) → x) → ∀(x : *) → (∀(s : *) → s → (s → ∀(x
 : *) → (a → s → x) → x) → x) → x

λ(a : *) → λ(st : ∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) →
 (a → s → x) → x) → x) → x) → st

$ morte < example2.mt
∀(a : *) → (∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) → (a → 
s → x) → x) → x) → x) → ∀(x : *) → (∀(s : *) → s → (s → ∀(x
 : *) → (a → s → x) → x) → x) → x

λ(a : *) → λ(va : ∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) →
 (a → s → x) → x) → x) → x) → va

Similarly, example@3 and example@4 will generate alpha-equivalent code:

$ morte < example3.mt
∀(a : *) → ∀(b : *) → ∀(c : *) → (b → c) → (a → b) → (∀(x : 
*) → (∀(s : *) → s → (s → ∀(x : *) → (a → s → x) → x) → x) →
 x) → ∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) → (c → s → x)
 → x) → x) → x

λ(a : *) → λ(b : *) → λ(c : *) → λ(f : b → c) → λ(g : a → b)
 → λ(st : ∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) → (a → s 
→ x) → x) → x) → x) → λ(x : *) → λ(S : ∀(s : *) → s → (s → ∀
(x : *) → (c → s → x) → x) → x) → st x (λ(s : *) → λ(seed : 
s) → λ(step : s → ∀(x : *) → (a → s → x) → x) → S s seed (λ(
seed@1 : s) → λ(x : *) → λ(Pair : c → s → x) → step seed@1 x
 (λ(va : a) → Pair (f (g va)))))

$ morte < example4.mt
∀(a : *) → ∀(b : *) → ∀(c : *) → (b → c) → (a → b) → (∀(x : 
*) → (∀(s : *) → s → (s → ∀(x : *) → (a → s → x) → x) → x) →
 x) → ∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) → (c → s → x)
 → x) → x) → x

λ(a : *) → λ(b : *) → λ(c : *) → λ(f : b → c) → λ(g : a → b)
 → λ(va : ∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) → (a → s 
→ x) → x) → x) → x) → λ(x : *) → λ(S : ∀(s : *) → s → (s → ∀
(x : *) → (c → s → x) → x) → x) → va x (λ(s : *) → λ(seed : 
s) → λ(step : s → ∀(x : *) → (a → s → x) → x) → S s seed (λ(
seed@1 : s) → λ(x : *) → λ(Pair : c → s → x) → step seed@1 x
 (λ(va : a) → Pair (f (g va))))

We inadvertently proved stream fusion for free, but we're still not done, yet! Everything we learn about recursive and corecursive sequences can be applied to model recursive and corecursive effects!

Effects

I will conclude this post by showing how to model both recursive and corecursive programs that have side effects. The recursive program will echo ninety-nine lines from stdin to stdout. The equivalent Haskell program is in the comment header:

-- recursive.mt

-- The Haskell code we will translate to Morte:
--
--     import Prelude hiding (
--         (+), (*), IO, putStrLn, getLine, (>>=), (>>), return )
-- 
--     -- Simple prelude
--
--     data Nat = Succ Nat | Zero
--
--     zero :: Nat
--     zero = Zero
--
--     one :: Nat
--     one = Succ Zero
--
--     (+) :: Nat -> Nat -> Nat
--     Zero   + n = n
--     Succ m + n = m + Succ n
--
--     (*) :: Nat -> Nat -> Nat
--     Zero   * n = Zero
--     Succ m * n = n + (m * n)
--
--     foldNat :: Nat -> (a -> a) -> a -> a
--     foldNat  Zero    f x = x
--     foldNat (Succ m) f x = f (foldNat m f x)
--
--     data IO r
--         = PutStrLn String (IO r)
--         | GetLine (String -> IO r)
--         | Return r
--
--     putStrLn :: String -> IO U
--     putStrLn str = PutStrLn str (Return Unit)
--
--     getLine :: IO String
--     getLine = GetLine Return
--
--     return :: a -> IO a
--     return = Return
--
--     (>>=) :: IO a -> (a -> IO b) -> IO b
--     PutStrLn str io >>= f = PutStrLn str (io >>= f)
--     GetLine k       >>= f = GetLine (\str -> k str >>= f)
--     Return r        >>= f = f r
--
--     -- Derived functions
--
--     (>>) :: IO U -> IO U -> IO U
--     m >> n = m >>= \_ -> n
--
--     two :: Nat
--     two = one + one
--
--     three :: Nat
--     three = one + one + one
--
--     four :: Nat
--     four = one + one + one + one
--
--     five :: Nat
--     five = one + one + one + one + one
--
--     six :: Nat
--     six = one + one + one + one + one + one
--
--     seven :: Nat
--     seven = one + one + one + one + one + one + one
--
--     eight :: Nat
--     eight = one + one + one + one + one + one + one + one
--
--     nine :: Nat
--     nine = one + one + one + one + one + one + one + one + one
--
--     ten :: Nat
--     ten = one + one + one + one + one + one + one + one + one + one
--
--     replicateM_ :: Nat -> IO U -> IO U
--     replicateM_ n io = foldNat n (io >>) (return Unit)
--
--     ninetynine :: Nat
--     ninetynine = nine * ten + nine
--
--     main_ :: IO U
--     main_ = getLine >>= putStrLn

-- "Free" variables
(   \(String : *   )
->  \(U : *)
->  \(Unit : U)

    -- Simple prelude
->  (   \(Nat : *)
    ->  \(zero : Nat)
    ->  \(one : Nat)
    ->  \((+) : Nat -> Nat -> Nat)
    ->  \((*) : Nat -> Nat -> Nat)
    ->  \(foldNat : Nat -> forall (a : *) -> (a -> a) -> a -> a)
    ->  \(IO : * -> *)
    ->  \(return : forall (a : *) -> a -> IO a)
    ->  \((>>=)
        :   forall (a : *)
        ->  forall (b : *)
        ->  IO a
        ->  (a -> IO b)
        ->  IO b
        )
    ->  \(putStrLn : String -> IO U)
    ->  \(getLine : IO String)

        -- Derived functions
    ->  (   \((>>) : IO U -> IO U -> IO U)
        ->  \(two   : Nat)
        ->  \(three : Nat)
        ->  \(four  : Nat)
        ->  \(five  : Nat)
        ->  \(six   : Nat)
        ->  \(seven : Nat)
        ->  \(eight : Nat)
        ->  \(nine  : Nat)
        ->  \(ten   : Nat)
        ->  (   \(replicateM_ : Nat -> IO U -> IO U)
            ->  \(ninetynine : Nat)

            ->  replicateM_ ninetynine ((>>=) String U getLine putStrLn)
            )

            -- replicateM_
            (   \(n : Nat)
            ->  \(io : IO U)
            ->  foldNat n (IO U) ((>>) io) (return U Unit)
            )

            -- ninetynine
            ((+) ((*) nine ten) nine)
        )

        -- (>>)
        (   \(m : IO U)
        ->  \(n : IO U)
        ->  (>>=) U U m (\(_ : U) -> n)
        )

        -- two
        ((+) one one)

        -- three
        ((+) one ((+) one one))

        -- four
        ((+) one ((+) one ((+) one one)))

        -- five
        ((+) one ((+) one ((+) one ((+) one one))))

        -- six
        ((+) one ((+) one ((+) one ((+) one ((+) one one)))))

        -- seven
        ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one one))))))

        -- eight
        ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one one)))))))
        -- nine
        ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one one))))))))

        -- ten
        ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one one)))))))))
    )

    -- Nat
    (   forall (a : *)
    ->  (a -> a)
    ->  a
    ->  a
    )

    -- zero
    (   \(a : *)
    ->  \(Succ : a -> a)
    ->  \(Zero : a)
    ->  Zero
    )

    -- one
    (   \(a : *)
    ->  \(Succ : a -> a)
    ->  \(Zero : a)
    ->  Succ Zero
    )

    -- (+)
    (   \(m : forall (a : *) -> (a -> a) -> a -> a)
    ->  \(n : forall (a : *) -> (a -> a) -> a -> a)
    ->  \(a : *)
    ->  \(Succ : a -> a)
    ->  \(Zero : a)
    ->  m a Succ (n a Succ Zero)
    )

    -- (*)
    (   \(m : forall (a : *) -> (a -> a) -> a -> a)
    ->  \(n : forall (a : *) -> (a -> a) -> a -> a)
    ->  \(a : *)
    ->  \(Succ : a -> a)
    ->  \(Zero : a)
    ->  m a (n a Succ) Zero
    )

    -- foldNat
    (   \(n : forall (a : *) -> (a -> a) -> a -> a)
    ->  n
    )

    -- IO
    (   \(r : *)
    ->  forall (x : *)
    ->  (String -> x -> x)
    ->  ((String -> x) -> x)
    ->  (r -> x)
    ->  x
    )

    -- return
    (   \(a : *)
    ->  \(va : a)
    ->  \(x : *)
    ->  \(PutStrLn : String -> x -> x)
    ->  \(GetLine : (String -> x) -> x)
    ->  \(Return : a -> x)
    ->  Return va
    )

    -- (>>=)
    (   \(a : *)
    ->  \(b : *)
    ->  \(m : forall (x : *)
        ->  (String -> x -> x)
        ->  ((String -> x) -> x)
        ->  (a -> x)
        ->  x
        )
    ->  \(f : a
        ->  forall (x : *)
        -> (String -> x -> x)
        -> ((String -> x) -> x)
        -> (b -> x)
        -> x
        )
    ->  \(x : *)
    ->  \(PutStrLn : String -> x -> x)
    ->  \(GetLine : (String -> x) -> x)
    ->  \(Return : b -> x)
    ->  m x PutStrLn GetLine (\(va : a) -> f va x PutStrLn GetLine Return)
    )

    -- putStrLn
    (   \(str : String)
    ->  \(x : *)
    ->  \(PutStrLn : String -> x -> x  )
    ->  \(GetLine  : (String -> x) -> x)
    ->  \(Return   : U -> x)
    ->  PutStrLn str (Return Unit)
    )

    -- getLine
    (   \(x : *)
    ->  \(PutStrLn : String -> x -> x  )
    ->  \(GetLine  : (String -> x) -> x)
    ->  \(Return   : String -> x)
    -> GetLine Return
    )
)

This program will compile to a completely unrolled read-write loop, as most recursive programs will:

$ morte < recursive.mt
∀(String : *) → ∀(U : *) → U → ∀(x : *) → (String → x → x) →
 ((String → x) → x) → (U → x) → x

λ(String : *) → λ(U : *) → λ(Unit : U) → λ(x : *) → λ(PutStr
Ln : String → x → x) → λ(GetLine : (String → x) → x) → λ(Ret
urn : U → x) → GetLine (λ(va : String) → PutStrLn va (GetLin
e (λ(va@1 : String) → PutStrLn va@1 (GetLine (λ(va@2 : Strin
g) → PutStrLn va@2 (GetLine (λ(va@3 : String) → PutStrLn ...
 <snip>
... GetLine (λ(va@92 : String) → PutStrLn va@92 (GetLine (λ(
va@93 : String) → PutStrLn va@93 (GetLine (λ(va@94 : String)
 → PutStrLn va@94 (GetLine (λ(va@95 : String) → PutStrLn va@
95 (GetLine (λ(va@96 : String) → PutStrLn va@96 (GetLine (λ(
va@97 : String) → PutStrLn va@97 (GetLine (λ(va@98 : String)
 → PutStrLn va@98 (Return Unit))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))

In contrast, if we encode the effects corecursively we can express a program that echoes indefinitely from stdin to stdout:

-- corecursive.mt

-- data IOF r s
--     = PutStrLn String s
--     | GetLine (String -> s)
--     | Return r
--
-- data IO r = forall s . MkIO s (s -> IOF r s)
--
-- main = MkIO
--     Nothing
--     (maybe (\str -> PutStrLn str Nothing) (GetLine Just))

(   \(String : *)
->  (   \(Maybe : * -> *)
    ->  \(Just : forall (a : *) -> a -> Maybe a)
    ->  \(Nothing : forall (a : *) -> Maybe a)
    ->  \(  maybe
        :   forall (a : *)
        ->  Maybe a
        ->  forall (x : *)
        ->  (a -> x)
        ->  x
        ->  x
        )
    ->  \(IOF : * -> * -> *)
    ->  \(  PutStrLn
        :   forall (r : *)
        ->  forall (s : *)
        ->  String
        ->  s
        ->  IOF r s
        )
    ->  \(  GetLine
        :   forall (r : *)
        ->  forall (s : *)
        ->  (String -> s)
        ->  IOF r s
        )
    ->  \(  Return
        :   forall (r : *)
        ->  forall (s : *)
        ->  r
        ->  IOF r s
        )
    ->  (   \(IO : * -> *)
        ->  \(  MkIO
            :   forall (r : *)
            ->  forall (s : *)
            ->  s
            ->  (s -> IOF r s)
            ->  IO r
            )
        ->  (   \(main : forall (r : *) -> IO r)
            ->  main
            )

            -- main
            (   \(r : *)
            ->  MkIO
                r
                (Maybe String)
                (Nothing String)
                (   \(m : Maybe String)
                ->  maybe
                        String
                        m
                        (IOF r (Maybe String))
                        (\(str : String) ->
                            PutStrLn
                                r
                                (Maybe String)
                                str
                                (Nothing String)
                        )
                        (GetLine r (Maybe String) (Just String))
                )
            )
        )

        -- IO
        (   \(r : *)
        ->  forall (x : *)
        ->  (forall (s : *) -> s -> (s -> IOF r s) -> x)
        ->  x
        )

        -- MkIO
        (   \(r : *)
        ->  \(s : *)
        ->  \(seed : s)
        ->  \(step : s -> IOF r s)
        ->  \(x : *)
        ->  \(k : forall (s : *) -> s -> (s -> IOF r s) -> x)
        ->  k s seed step
        )
    )

    -- Maybe
    (\(a : *) -> forall (x : *) -> (a -> x) -> x -> x)

    -- Just
    (   \(a : *)
    ->  \(va : a)
    ->  \(x : *)
    ->  \(Just : a -> x)
    ->  \(Nothing : x)
    ->  Just va
    )

    -- Nothing
    (   \(a : *)
    ->  \(x : *)
    ->  \(Just : a -> x)
    ->  \(Nothing : x)
    ->  Nothing
    )

    -- maybe
    (   \(a : *)
    ->  \(m : forall (x : *) ->  (a -> x) ->  x-> x)
    ->  m
    )

    -- IOF
    (   \(r : *)
    ->  \(s : *)
    ->  forall (x : *)
    ->  (String -> s -> x)
    ->  ((String -> s) -> x)
    ->  (r -> x)
    ->  x
    )

    -- PutStrLn
    (   \(r : *)
    ->  \(s : *)
    ->  \(str : String)
    ->  \(vs : s)
    ->  \(x : *)
    ->  \(PutStrLn : String -> s -> x)
    ->  \(GetLine : (String -> s) -> x)
    ->  \(Return : r -> x)
    ->  PutStrLn str vs
    )

    -- GetLine
    (   \(r : *)
    ->  \(s : *)
    ->  \(k : String -> s)
    ->  \(x : *)
    ->  \(PutStrLn : String -> s -> x)
    ->  \(GetLine : (String -> s) -> x)
    ->  \(Return : r -> x)
    ->  GetLine k
    )

    -- Return
    (   \(r : *)
    ->  \(s : *)
    ->  \(vr : r)
    ->  \(x : *)
    ->  \(PutStrLn : String -> s -> x)
    ->  \(GetLine : (String -> s) -> x)
    ->  \(Return : r -> x)
    ->  Return vr
    )

)

This compiles to a state machine that we can unfold one step at a time:

$ morte < corecursive.mt
∀(String : *) → ∀(r : *) → ∀(x : *) → (∀(s : *) → s → (s → ∀
(x : *) → (String → s → x) → ((String → s) → x) → (r → x) → 
x) → x) → x

λ(String : *) → λ(r : *) → λ(x : *) → λ(k : ∀(s : *) → s → (
s → ∀(x : *) → (String → s → x) → ((String → s) → x) → (r → 
x) → x) → x) → k (∀(x : *) → (String → x) → x → x) (λ(x : *)
 → λ(Just : String → x) → λ(Nothing : x) → Nothing) (λ(m : ∀
(x : *) → (String → x) → x → x) → m (∀(x : *) → (String → (∀
(x : *) → (String → x) → x → x) → x) → ((String → ∀(x : *) →
 (String → x) → x → x) → x) → (r → x) → x) (λ(str : String) 
→ λ(x : *) → λ(PutStrLn : String → (∀(x : *) → (String → x) 
→ x → x) → x) → λ(GetLine : (String → ∀(x : *) → (String → x
) → x → x) → x) → λ(Return : r → x) → PutStrLn str (λ(x : *)
 → λ(Just : String → x) → λ(Nothing : x) → Nothing)) (λ(x : 
*) → λ(PutStrLn : String → (∀(x : *) → (String → x) → x → x)
 → x) → λ(GetLine : (String → ∀(x : *) → (String → x) → x → 
x) → x) → λ(Return : r → x) → GetLine (λ(va : String) → λ(x 
: *) → λ(Just : String → x) → λ(Nothing : x) → Just va))

I don't expect you to understand that output other than to know that we can translate the output to any backend that provides functions, and primitive read/write operations.

Conclusion

If you would like to use Morte, you can find the library on both Github and Hackage. I also provide a Morte tutorial that you can use to learn more about the library.

Morte is dependently typed in theory, but in practice I have not exercised this feature so I don't understand the implications of this. If this turns out to be a mistake then I will downgrade Morte to System Fw, which has higher-kinds and polymorphism, but no dependent types.

Additionally, Morte might be usable to transmit code in a secure and typed way in distributed environment or to share code between diverse functional language by providing a common intermediate language. However, both of those scenarios require additional work, such as establishing a shared set of foreign primitives and creating Morte encoders/decoders for each target language.

Also, there are additional optimizations which Morte might implement in the future. For example, Morte could use free theorems (equalities you deduce from the types) to simplify some code fragments even further, but Morte currently does not do this.

My next goals are:

  • Add a back-end to compile Morte to LLVM
  • Add a front-end to desugar a medium-level Haskell-like language to Morte

Once those steps are complete then Morte will be a usable intermediate language for writing super-optimizable programs.

Also, if you're wondering, the name Morte is a tribute to a talking skull from the game Planescape: Torment, since the Morte library is a "bare-bones" calculus of constructions.

Literature

If this topic interests you more, you may find the following links helpful, in roughly increasing order of difficulty: