Monday, May 9, 2022

The golden rule of software distributions

golden-software-distro

This is a short post documenting a pattern I learned as a user and maintainer of software distributions. I wanted to share this pattern because the lesson was non-obvious to me in my early days as a software engineer.

I call this pattern the “golden rule of software distributions”, which I’ll define the verbose way followed by the concise way.

The verbose version of the golden rule of software distributions is:

If a package manager only permits installing or depending on one version of each of package then a software distribution for that package manager should bless one version of each package. The blessed version for each package must be compatible with the blessed version of every other package.

The concise version of the golden rule of software distributions is:

A locally coherent package manager requires a globally coherent software distribution.

… where:

  • “locally coherent” means that you can only install or depend on one version of each package for a given system or project

  • “globally coherent” means each package has a unique blessed version compatible with every other package’s blessed version

Note that any sufficiently large software distribution will not perfectly adhere to this golden rule. You should view this rule as an ideal that a software distribution aspires to approximate as closely as possible, but there will necessarily be cases where they cannot.

Motivation

I’ll introduce the term “build plan” to explain the motivation behind the golden rule:

A build plan for a package A specifies a version for each dependency of A such that A successfully builds against those dependencies.

To motivate the golden rule, let’s examine what happens when you have a locally coherent package manager but a globally incoherent software distribution:

  • Package users need to do a combinatorial search of their dependencies

    … in order to find a successful build plan. Specifically, they may need to test multiple major versions of their direct and indirect independencies to find a permutation that successfully builds.

  • Compatible sets of packages become increasingly unlikely at scale

    The likelihood of finding a build plan rapidly diminishes as your dependency tree grows. Beyond a certain number of dependencies a build plan might not even exist, even if every dependency is maintained.

  • Package authors need to support multiple major versions of every dependency

    … in order to maximize the likelihood that downstream packages can find a successful build plan. Maintaining this backwards compatibility greatly increases their maintenance burden.

  • Package authors must test against multiple major versions of each dependency

    … in order to shield their users from build failures due to unexpected build plans. This means a large number of CI runs for every proposed change to the package, which slows down their development velocity.

  • Responsibility for fixing incompatibilities becomes diffuse

    Sometimes you need to depend on two packages (A and B) which transitively depend on incompatible versions of another package (C). Neither package A nor package B can be held responsible for fixing the problem unless there is a blessed version of package C.

These issues lead to a lot of wasted work, which scales exponentially with the number of dependencies. Consequently, software ecosystems that ignore the golden rule run into difficulties scaling dependency trees which people will work around in the following ways:

  • Culturally discouraging dependencies

  • Vendoring dependencies within their projects

  • Gravitating towards large and monolithic dependencies / frameworks

The fundamental problem

The golden rule is necessary because build plans do not compose for a locally coherent package manager. In other words, if you have a working build plan for package A and another working build plan for package B, you cannot necessarily combine those two build plans to generate a working build plan for a package that depends on both A and B. In particular, you definitely cannot combine the two build plans if A and B depend on incompatible versions of another package C.

However, build plans can also fail to compose for more subtle reasons. For example, you can depend on multiple packages whose build plans are all pairwise-compatible, but there still might not exist a build plan for the complete set of packages.

The good news is that you can trivially “weaken” a build plan, meaning that if you find a build plan that includes both packages A and B then you can downgrade that to a working build plan for just package A or just package B.

Consequently, the globally optimal thing to do is to find a working build plan that combines as many packages as possible, because then any subset of that build plan is still a working build plan. That ensures that any work spent fixing this larger build plan is not wasted and benefits everybody. Contrast that with work spent fixing the build for a single package (e.g. creating a lockfile), which does not benefit any other package (not even downstream packages, a.k.a. reverse dependencies).

Common sense?

Some people might view the golden rule of software distributions as common sense that doesn’t warrant a blog post, but my experiences with the Haskell ecosystem indicate otherwise. That’s because I began using Haskell seriously around 2011, four years before Stackage was first released.

Before Stackage, I ran into all of the problems described in the previous section because there was no blessed set of Haskell packages. In particular, the worst problem (for me) was the inability to find a working build plan for my projects.

This issue went on for years; basically everyone in the Haskell ecosystem (myself included) unthinkingly cargo-culted this as the way things were supposed to work. When things went wrong we blamed Cabal (e.g. “Cabal hell”) for our problems when the root of the problem had little to do with Cabal.

Stackage fixed all of that when Michael Snoyman essentially introduced the Haskell ecosystem to the golden rule of software distributions. Stackage works by publishing a set of blessed package versions for all of the packages vetted by Stackage and these packages are guaranteed to all build together. Periodically, Stackage publishes an updated set of blessed package versions.

After getting used to this, I quickly converted to this way of doing things, which seemed blindingly obvious in retrospect. Also, my professional career arc shifted towards DevOps, including managing upgrades and software distributions and I discovered that this was a fairly common practice for most large software distributions.

Why this rule is not intuitive

Actually, this insight is not as obvious as people might think. In fact, a person with a superficial understanding of how software ecosystems work might suspect that the larger a software ecosystem grows the more incoherent things get. However, you actually encounter the opposite phenomenon in practice: the larger a software ecosystem gets the more coherent things get (by necessity).

In fact, I still see people argue against the global coherence of software ecosystems, which indicates to me that this isn’t universally received wisdom. Sometimes they argue against global coherence directly (they believe coherence imposes an undue burden on maintainers or users) or they argue against global coherence indirectly (by positing incoherence as a foundation of a larger architectural pattern). Either way, I strongly oppose global incoherence, both for the theoretical reasons outlined in this post and also based on my practical experience managing dependencies in the pre-Stackage days of the Haskell ecosystem.

Indeed, many of people arguing against globally coherent software ecosystems are actually unwitting beneficiaries of global coherence. There is a massive amount of invisible work that goes on behind the scenes for every software distribution to create a globally coherent package set that benefits everybody (not just the users of those software distributions). For example, all software users benefit from the work that goes into maintaining the Debian, Arch, Nixpkgs, and Brew software distributions even if they don’t specifically use those software distributions or their associated package managers.

Conclusion

This whole post has one giant caveat, which is that all of the arguments assume that the packager manager is locally coherent, which is not always the case! In fact, there’s a post that proves that local coherence can be undesirable because it (essentially) makes dependency resolution NP complete. For more details, see:

I personally have mixed views on whether local coherence is good or bad. Right now I’m slightly on team “local coherence is good”, but my opinions on that are not fully formed, yet.

That said, most package managers tend to require or at least benefit from local coherence so in practice most software distributions also require global coherence. For example, Haskell’s build tooling basically requires global coherence (with some caveats I won’t go into), so global coherence is a good thing for the Haskell ecosystem.

Tuesday, May 3, 2022

Why does Haskell's take function accept insufficient elements?

Why does Haskell's take function accept insufficient elements?

This post is a long-form response to a question on Reddit, which asked:

I just started learning haskell, and the take function confuses me.

e.g take 10 [1,2,3,4,5] will yield [1,2,3,4,5]

How does it not generate an error ?

… and I have enough to say on this subject that I thought it would warrant a blog post rather than just a comment reply.

The easiest way to answer this question is to walk through all the possible alternative implementations that can fail when not given enough elements.

Solution 0: Output a Maybe

The first thing we could try would be to wrap the result in a Maybe, like this:

safeTake :: Int -> [a] -> Maybe [a]
safeTake 0      _   = Just []
safeTake n      []  = Nothing
safeTake n (x : xs) = fmap (x :) (safeTake (n - 1) xs)
>>> safeTake 3 [0..]
Just [0,1,2]

>>> safeTake 3 []
Nothing

The main deficiency with this approach is that it is insufficiently lazy. The result will not produce a single element of the output list until safeTake finishes consuming the required number of elements from the input list.

We can see the difference with the following examples:

>>> oops = 1 : 2 : error "Too short"

>>> take 1 (take 3 oops)
[1]

>>> safeTake 1 =<< safeTake 3 oops
*** Exception: Too short

Solution 1: Fail with error

Another approach would be to create a partial function that fails with an error if we run out of elements, like this:

partialTake :: Int -> [a] -> [a]
partialTake 0      _   = []
partialTake n (x : xs) = x : partialTake (n - 1) xs
>>> partialTake 3 [0..]
[0,1,2]

>>> partialTake 3 []
*Main> partialTake 3 []
*** Exception: Test.hs:(7,1)-(8,51): Non-exhaustive patterns in function partialTake

>>> partialTake 1 (partialTake 3 oops)
[1]

Partial functions like these are undesirable, though, so we won’t go with that solution.

Solution 2: Use a custom list-like type

Okay, but what if we could store a value at the end of the list indicating whether or not the take succeeded. One way we could do that would be to define an auxiliary type similar to a list, like this:

{-# LANGUAGE DeriveFoldable #-}

data ListAnd r a = Cons a (ListAnd r a) | Nil r deriving (Foldable, Show)

… where now the empty (Nil) constructor can store an auxiliary value. We can then use this auxiliary value to indicate to the user whether or not the function succeeded or not:

data Result = Sufficient | Insufficient deriving (Show)

takeAnd :: Int -> [a] -> ListAnd Result a
takeAnd 0      _   = Nil Sufficient
takeAnd n      []  = Nil Insufficient
takeAnd n (x : xs) = Cons x (takeAnd (n - 1) xs)
>>> takeAnd 3 [0..]
Cons 0 (Cons 1 (Cons 2 (Nil Sufficient)))

>>> takeAnd 3 []
Nil Insufficient

Also, the ListAnd type derives Foldable, so we can recover the old behavior by converting the ListAnd Result a type into [a] using toList:

>>> import Data.Foldable (toList)

>>> toList (takeAnd 3 [0..])
[0,1,2]

>>> toList (takeAnd 3 [])
[]

>>> toList (takeAnd 1 (toList (takeAnd 3 oops)))
[1]

This is the first total function that has the desired laziness characteristics, but the downside is that the take function now has a much weirder type. Can we solve this only using existing types from base?

Solution 3: Return a pair

Well, what if we were to change the type of take to return an pair containing an ordinary list alongside a Result, like this:

takeWithResult :: Int -> [a] -> ([a], Result)
takeWithResult 0      _   = (    [], Sufficient  )
takeWithResult n      []  = (    [], Insufficient)
takeWithResult n (x : xs) = (x : ys, result      )
  where
    (ys, result) = takeWithResult (n - 1) xs
>>> takeWithResult 3 [0..]
([0,1,2],Sufficient)
>>> takeWithResult 3 []
([],Insufficient)

Now we don’t need to add this weird ListAnd type to base, and we can recover the old behavior by post-processing the output using fst:

>>> fst (takeWithResult 3 [0..])
[0,1,2]
fst (takeWithResult 3 [])
[]

… and this also has the right laziness characteristics:

>>> fst (takeWithResult 1 (fst (takeWithResult 3 oops)))
[1]

… and we can replace Result with a Bool if want a solution that depends solely on types from base:

takeWithResult :: Int -> [a] -> ([a], Bool)
takeWithResult 0      _   = ([], True)
takeWithResult n      []  = ([], False)
takeWithResult n (x : xs) = (x : ys, result)
  where
    (ys, result) = takeWithResult (n - 1) xs

However, even this solution is not completely satisfactory. There’s nothing that forces the user to check the Bool value before accessing the list, so this is not as safe as, say, the safeTake function which returns a Maybe. The Bool included in the result is more of an informational value rather than a safeguard.

Conclusion

So the long-winded answer to the original question is that there are several alternative ways we could implement take that can fail if the input list is too small, but in my view each of them has their own limitations.

This is why I think Haskell’s current take function is probably the least worst of the alternatives, even if it’s not the safest possible implementation.

Sunday, May 1, 2022

Introductory resources to type theory for language implementers

Introductory resources to type theory for language implementers

This post briefly tours resources that helped introduce me to type theory, because I’m frequently asked by others for resources on this subject (even though I never had a formal education in type theory). Specifically, these resources will focus more on how to implement a type checker or type inference algorithm.

Also, my post will be biased against books, because I don’t tend to learn well from reading books. That said, I will mention a few books that I’ve heard others recommend, even if I can’t personally vouch for them.

What worked for me

The first and most important resource that I found useful was this one:

The reason why is because that paper shows logical notation side-by-side with Haskell code. That juxtaposition served as a sort of “Rosetta stone” for me to understand the correspondence between logical judgments and code. The paper also introduces some type theory basics (and dependent types!).

Along similar lines, another helpful resource was:

… which, as the name suggests, walks through a Haskell program to type-check Haskell code. This paper along with the preceding one helped bring type checkers “down to earth” to me by showing how there wasn’t any magic or secret sauce to implementing a type checker.

After that, the next thing that helped me improve my understanding was learning about pure type systems. Specifically, this paper was a very clear introduction to pure type systems:

You can think of pure type systems as sort of a “framework” for specifying type systems or talking about them. For example, the simply typed lambda calculus, System F, System FΩ, and the calculus of constructions are some example type systems that you’ll hear the literature refer to, and they’re all special cases of this general framework. You can think of pure type systems as generalizing the lambda cube.

However, none of the above resources introduce how to implement a type system with “good” type inference. To elaborate on that, many simple type systems can infer the types of program outputs from program inputs, but cannot work “in reverse” and infer the types of inputs from outputs. Hindley Milner type inference is one example of a “good” type inference algorithm that can work in reverse.

However, I never learned Hindley Milner type inference all that well, because I skipped straight to bidirectional type checking, which is described in this paper:

I prefer bidirectional type checking because (in my limited experience) it’s easier to extend the bidirectional type checking algorithm with new language features (or, at least, easier than extending Hindley Milner with the same language features).

The other reason I’m a fan of bidirectional type checking is that many cutting edge advances in research slot in well to a bidirectional type checker or even explicitly present their research using the framework of bidirectional type checking.

Books

Like I mentioned, I didn’t really learn that much from books, but here are some books that I see others commonly recommend, even if I can’t personally vouch for them:

Example code

I also created a tutorial implementation of a functional programming language that summarizes everything I know about programming language theory so far, which is my Fall-from-Grace project:

This project is a clean reference implementation of how to implement an interpreted langauge using (what I believe are) best practices in the Haskell ecosystem.

I also have a longer post explaining the motivation behind the above project:

Conclusion

Note that these are not the only resources that I learned from. This post only summarizes the seminal resources that greatly enhanced my understanding of all other resources.

Feel free to leave a comment if you have any other resources that you’d like to suggest that you feel were helpful in this regard.

Tuesday, March 29, 2022

Modeling PlusCal in Haskell using Cartesian products of NFAs

Modeling PlusCal in Haskell using Cartesian products of NFAs

PlusCal is a formal specification language created to model concurrent systems, and recently I became obsessed with implementing PlusCal as an embedded domain-specific language (eDSL) in Haskell. In other words, I want to model PlusCal processes as Haskell subroutines and also implement the model checker in Haskell.

I’m not done implementing this PlusCal eDSL, but I’m writing this to share what I learned in the course of doing so. Specifically, what I learned was:

  • You can model individual PlusCal processes as non-deterministic finite-state automata (NFAs)

    I believe this is well-understood by users of PlusCal, but I’m mentioning this for people who are new to PlusCal.

  • You can model concurrent processes in PlusCal by computing the Cartesian product of their NFAs

    I’m not sure if this is widely understood or not.

  • Haskell’s Applicative abstraction can compute Cartesian products of NFAs

    This is the novel insight that this post contributes.

There are two reasons why I’m keen on modeling PlusCal using Cartesian products of NFAs:

  • We can use this trick to combine any number of PlusCal processes into a composite process

    The behavior of this composite process is indistinguishable from the original collection of processes.

  • This approach simplifies the implementation of the model checker

    The model checker now only needs to accept a single process as its input. To model check more than one process you compose them into a single process and model check the composite process.

Also, it’s a theoretically elegant solution, and that’s good enough for me.

Scope

Like I mentioned before, I have not fully implemented all of PlusCal, but here are the features I will explain how to implement in this post:

  • Concurrent processes

    Concurrent processes in PlusCal are essentially NFAs where PlusCal labels correspond to NFA states and state transitions represent atomic process changes.

    To be pedantic, a PlusCal process is an NFA where the transitions are not labeled (or, equivalently, there is only one input symbol).

  • Labels

    These are breakpoints within a PlusCal process where the process may be interrupted by another process. All other actions are uninterruptible, so anything that happens between two successive labels is atomic in PlusCal.

    Note that a label in PlusCal is not the same as a label in traditional NFA terminology. PlusCal labels correspond to NFA states.

  • The either keyword

    PlusCal lets a process non-deterministically branch using the either keyword with one nested subroutine per branch. The model checker explores and verify all possible branches.

  • The with keyword

    A process can non-deterministically choose from zero or more values. Just like with either the model checker explores and verifies all possible choices.

  • The await keyword

    A process can wait until a condition is satisfied before proceeding using the await keyword.

  • The while keyword

    A process can run in a loop until a condition is satisfied

  • The skip keyword

    A process can skip, which does nothing.

I am not (yet) explaining how to implement the following parts of PlusCal:

  • Global or process-local state

  • Temporal expressions

    Note that there is a Haskell package named spectacle for temporal logic in Haskell that a coworker of mine published, which I might be able to use for this purpose, but I haven’t attempted to incorporate that, yet.

  • The print keyword

  • The assert keyword

  • The goto keyword

  • The model checker

    We need to be able to model temporal expressions in order to implement the model checker and I haven’t yet incorporated temporal expressions into my implementation.

I believe the implementation I describe in this post can be extended with those missing features (with the exception of goto) and I’ve already privately implemented some of them. However, I omit them because they would interfere with the subject of this post.

Also, some PlusCal features we’ll get for free by virtue of embedding PlusCal in Haskell:

  • (Non-temporal) Expressions

    We get all of TLA+’s non-temporal functionality (e.g. scalars, functions, composite data structures) for free from the Haskell language and its standard library. Plus we also access new functionality (such as algebraic data types) via Haskell.

  • Procedures (including the call and return keyword)

    We won’t need to explicitly implement support for these keywords. We can already define and invoke procedures in Haskell using do notation, which we can overload to work with PlusCal processes.

  • Modules

    We can model PlusCal modules using a combination of Haskell modules and Haskell functions.

User experience

The best way to illustrate what I have in mind is through some sample working Haskell code.

First, I’d like to be able to translate a PlusCal process like this:

begin
A:
  skip;
B:
  either
    C:
      skip;
  or
    skip;
  end either;
D:
  skip;
end process

… into a Haskell process like this:

strings :: Coroutine String
strings = Begin "A" do
    yield "B"
    either
        [ yield "C"
        , skip
        ]
    yield "D"
    end

The type of our strings process is Coroutine String, meaning that it is a Coroutine whose labels are Strings. The reason we specify the type of the labels is because the Haskell eDSL permits labels of any type and these labels might not necessarily be strings.

For example, suppose PlusCal permitted integers for labels, like this:

begin
0:
  with x \in { 1, 2, 3 } do
    await (x % 2 = 0);
    x: (* Suppose we can even create dynamic labels from values in scope *)
      skip;
end process

Then, the equivalent Haskell process would be:

ints :: Coroutine Int
ints = Begin 0 do
    x <- with [ 1, 2, 3 ]
    await (even x)
    yield x
    end

I mentioned in the introduction that we will be able to combine processes into a composite process, which looks like this:

pairs :: Coroutine (String, Int)
pairs = do
    string <- strings
    int    <- ints
    return (string, int)

The Haskell code evaluates those three processes to canonical normal forms representing the evolution of labels for each process.

For example, the canonical normal form for the strings process is:

Begin "A" [ Yield "B" [ Yield "C" [ Yield "D" [] ] , Yield "D" [] ] ]

… representing the following NFA with these labels and transitions:

A → B → D
     ↘ ↗
      C

In other words, this normal form data structure uses lists to model non-determinism (one list element per valid transition) and nesting to model sequential transitions.

Similarly, the canonical normal form for the ints process is:

Begin 0 [ Yield 2 [] ]

… representing the following NFA with these labels and transitions:

0
↓
2

Finally, the canonical normal form for the composite pairs process is:

Begin ( "A" , 0 )
  [ Yield ( "B" , 0 )
      [ Yield ( "C" , 0 )
          [ Yield ( "D" , 0 ) [ Yield ( "D" , 2 ) [] ]
          , Yield ( "C" , 2 ) [ Yield ( "D" , 2 ) [] ]
          ]
      , Yield ( "D" , 0 ) [ Yield ( "D" , 2 ) [] ]
      , Yield ( "B" , 2 )
          [ Yield ( "C" , 2 ) [ Yield ( "D" , 2 ) [] ]
          , Yield ( "D" , 2 ) []
          ]
      ]
  , Yield ( "A" , 2 )
      [ Yield ( "B" , 2 )
          [ Yield ( "C" , 2 ) [ Yield ( "D" , 2 ) [] ]
          , Yield ( "D" , 2 ) []
          ]
      ]
  ]

… which is the Cartesian product of the two smaller NFAs:

(A,0) → (B,0)   →   (D,0)
             ↘     ↗
              (C,0)    
  ↓       ↓     ↓     ↓
              (C,2)
             ↗     ↘
(A,2) → (B,2)   →   (D,2)

… and if we convert the Haskell composite process to the equivalent PlusCal process we would get:

begin
  (A,0):
    skip;
  either
    (B,0):
      either
        (C,0):
          either
            (D,0):
              skip;
            (D,2):
              skip;
          or
            (C,2):
              skip;
            (D,2):
              skip;
          end either;
      or
        (D,0):
          skip;
        (D,2):
          skip;
      end either;
  or
    (A,2):
      skip;
    (B,2):
      skip;
    either
      (C,2):
        skip;
      (D,2):
        skip;
    or
      (D,2):
        skip;
    end either;
  end either;
end process

This composite process is indistinguishable from the two input processes, meaning that:

  • This composite process is interruptible whenever one of the two original input processes is interruptible

  • This composite process performs an atomic state transition whenever one of the two input processes performs a state transition

  • The current label of the composite process is a function of the current label of the two original input processes

The Process type

Now I’ll explain how to implement this subset of PlusCal as an eDSL in Haskell.

First, we begin from the following two mutually-recursive types which represent NFAs where there are zero more labeled states but the transitions are unlabeled:

newtype Process label result = Choice [Step label result]

data Step label result = Yield label (Process label result) | Done result

The Process type handles the non-deterministic transitions for our NFA: a Process stores zero or more valid state transitions (represented as a list of valid next Steps).

The Step type handles the labeled states for our NFA: a Step can either Yield or be Done:

  • A step that Yields includes the label for the current state and the remainder of the Process

  • A step that is Done includes a result

    This result is used to thread data bound by one step to subsequent steps. For example, this is how our previous example was able to pass the x bound by with to the subsequent yield command:

    ints = Begin 0 do
        x <- with [ 1, 2, 3 ]
    
        yield x
    

We’ll add support for rendering the data structures for debugging purposes:

{-# LANGUAGE DerivingStrategies         #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE UndecidableInstances       #-}

import GHC.Exts (IsList(..))

newtype Process label result = Choice [Step label result]
    deriving newtype (IsList, Show)

data Step label result = Yield label (Process label result) | Done result
    deriving stock (Show)

I make use of a small trick here where I derive IsList for the Process type This lets can omit the Choice constructor when creating or rendering values of type Process. That simplifies Haskell code like this:

Begin "A"
  (Choice
     [ Yield "B"
         (Choice
            [ Yield "C" (Choice [ Yield "D" (Choice []) ])
            , Yield "D" (Choice [])
            ])
     ])

… to this more ergonomic syntax if we enable the OverloadedLists extension:

Begin "A" [ Yield "B" [ Yield "C" [ Yield "D" [] ] , Yield "D" [] ] ]

do notation

However, we don’t expect users to author these data structures by hand. Instead, we can use Haskell’s support for overloading do notation so that users can author Process values using subroutine-like syntax. In order to do this we need to implement the Monad class for Process, like this:

{-# LANGUAGE BlockArguments #-}

import qualified Control.Monad as Monad

instance Monad (Process label) where
    Choice ps >>= f = Choice do
        p <- ps
        case p of
            Yield label rest -> do
                return (Yield label (rest >>= f))
            Done result -> do
                let Choice possibilities = f result
                possibilities

instance Applicative (Process label) where
    pure result = Choice (pure (Done result))

    (<*>) = Monad.ap

instance Functor (Process label) where
    fmap = Monad.liftM

… and we also need to provide the following utility function:

yield :: label -> Process label ()
yield label = Choice [Yield label (pure ())]

Equipped with that Monad instance we can now author code like this:

example :: Process Int ()
example = do
    yield 1
    yield 2
    yield 3

… and that superficially imperative code actually evaluates to a pure data structure:

[ Yield 1 [ Yield 2 [ Yield 3 [ Done () ] ] ] ]

The reason that works is because any type that implements Monad can be sequenced using do notation, because do notation is syntactic sugar for the (>>=) operator we defined in our Monad instance. To learn more, read:

We can also now implement skip, which is just a synonym for pure () (i.e. “do nothing”):

skip :: Process label ()
skip = pure ()

Unlike PlusCal, we don’t need to use skip most of the time. In particular, we don’t need to add a skip in between two labels if nothing happens. For example, suppose we try to translate this PlusCal code:

1:
  skip;
2:
  skip;

… to Haskell:

example :: Process Int ()
example = do
    yield 1
    skip
    yield 2
    skip

We don’t actually need those skips. The following code without skips is the exact same Process:

example :: Process Int ()
example = do
    yield 1
    yield 2

Both ways of writing the Process produce the same result, which is:

[ Yield 1 [ Yield 2 [ Done () ] ] ]

Finally, we can implement the while keyword as an ordinary Haskell function:

while :: Process label Bool -> Process label () -> Process label ()
while condition body = do
    bool <- condition
    Monad.when bool do
        body
        while condition body

In fact, there’s nothing really PlusCal-specific about this utility; this functionality already exists as Control.Monad.Loops.whileM_.

Alternation

So far we can only sequence commands, but we’d also like to be able to branch non-deterministically. Fortunately, Haskell has a standard API for doing that, too, which is the Alternative class. We can implement Alternative for our Process type like this:

import Control.Applicative (Alternative(..))

instance Alternative (Process label) where
    empty = Choice empty

    Choice stepsL <|> Choice stepsR = Choice (stepsL <|> stepsR)

We’ll also define end to be a synonym for empty:

end :: Process label a
end = empty

In other words, if you end a Process there are no more valid transitions.

When we implement this Alternative instance we can make use of several general-purpose utilities that work for any type that implements Alternative. One such utility is Data.Foldable.asum, which behaves exactly the same as PlusCal’s either keyword:

import Prelude hiding (either)

import qualified Data.Foldable as Foldable

either :: [Process label result] -> Process label result
either = Foldable.asum

We could also write the implementation of either by hand if we wanted to, which would be:

either (process : processes) = process <|> either processes
either            []         = empty

… which is exact same behavior as using asum.

In other words, given a list of Processes, we can try all of them in parallel using either. All that either does is concatenate the lists of valid Steps for each of the input Processes.

Control.Monad.guard is another utility we get for free by virtue of implementing Alternative and guard behaves in the exact same way as PlusCal’s await keyword:

await :: Bool -> Process label ()
await = Monad.guard

We could also write the implementation of await by hand if we wanted to, which would be:

await True  = skip
await False = end

… which is the same behavior as using guard.

Finally, we can implement with in terms of either, like this:

with :: [result] -> Process label result
with results = either (map pure results)

In other words, you can implement with as if it were an either statement with one branch per value that you want to bind. However, you have to promote each value to a trivial Process (using pure) in order to combine them using either.

Cartesian product of NFAs

In the introduction I noted that we can model multiple processes in PlusCal by computing the Cartesian product of their NFAs. This section explains that in more detail, first as prose and then followed by Haskell code.

Informally, the Cartesian product of zero or more “input” NFAs creates a composite “output” NFA where:

  • The set of possible states for the output NFA is the Cartesian product of the set of possible states for each input NFA

    In other words, if we have two input NFAs and the first NFA permits the following states:

    [ A, B, C ]

    … and the second NFA permits the following states:

    [ 0, 1, 2 ]

    … then the Cartesian product of those two sets of states is:

    [ (A,0), (B,0), (C,0), (A,1), (B,1), (C,1), (A,2), (B,2), (C,2) ]

    … which is our composite set of possible states.

  • The starting state for our output NFA is the Cartesian product of the starting states for each input NFA

    In other words, if our first NFA has a starting state of:

    A

    … and our second has a starting state of:

    0

    … then the Cartesian product of those two starting states is:

    (A,0)

    … which is our composite starting state.

  • The set of valid transitions for any output state is the union of the set of valid transitions for the input sub-states

    In other words, if the state A can transition to either state B or C:

    [ B, C ]

    … and the state 0 can transition to only state 2:

    [ 2 ]

    … then the state (A,0) can transition to any of these states:

    [ (B, 0), (C, 0), (A, 2) ]

    … and this is because:

    • If state A transitions to state B, then our composite state (A,0) transitions to state (B,0)

    • If state A transitions to state C, then our composite state (B,0) transitions to state (C,0)

    • If state 0 transitions to state 2, then our composite state (A,0) transitions to state (A,2)

Applicative as Cartesian product

Haskell’s standard library defines the following Applicative class:

class Applicative f where
    pure :: a -> f a

    (<*>) :: f (a -> b) -> f a -> f b

…and you can think of Haskell’s Applicative class as (essentially) an interface for arbitrary N-ary Cartesian products, meaning that any type that implements an Applicative instance gets the following family of functions for free:

-- The 0-ary Cartesian product
join0 :: Applicative f => f ()
join0 = pure ()

-- The unary Cartesian product
join1 :: Applicative f => f a -> f a
join1 as = pure id <*> as

-- The binary Cartesian product
join2 :: Applicative f => f a -> f b -> f (a, b)
join2 as bs = pure (,) <*> as <*> bs

-- The trinary Cartesian product
join3 :: Applicative f => f a -> f b -> f c -> f (a, b, c)
join3 as bs cs = pure (,,) <*> as <*> bs <*> cs

… and so on. I deliberately implemented some of those functions in a weird way to illustrate the overall pattern.

This means that if we implement Applicative for our NFA type then we can use that interface to create arbitrary N-ary Cartesian products of NFAs.

The Coroutine type

The Process type does implement an Applicative instance, but this is a different (more boring) Cartesian product and not the one we’re interested in. In fact, the Process type cannot implement the instance we’re interested in (the Cartesian product of NFAs), because our Process type is not a complete NFA: our Process type is missing a starting state.

This is what the following Coroutine type fixes by extending our Process type with an extra field for the starting state:

import Data.Void (Void)

data Coroutine label = Begin label (Process label Void)
    deriving stock (Show)

We also constrain the Process inside of a Coroutine to return Void (the impossible/uninhabited type). Any Process that ends with no valid transitions will satisfy this type, such as a process that concludes with an end or empty:

example :: Coroutine Int
example = Begin 0 do
    yield 1
    yield 2
    end

Once we add in the starting state we can implement Applicative for our Coroutine type, which is essentially the same thing as implementing the Cartesian product of NFAs:

instance Applicative Coroutine where
    -- The empty (0-ary) Cartesian product has only a single valid state, which
    -- is also the starting state, and no possible transitions.
    pure label = Begin label empty

    -- The (sort of) binary Cartesian product …
    (<*>)
        -- … of a first NFA …
        f@(Begin label0F (Choice fs))
        -- … and a second NFA …
        x@(Begin label0X (Choice xs)) =
        Begin
            -- … has a starting state which is (sort of) the product of the
            -- first and second starting states
            (label0F label0X)
            -- … and the set of valid transitions is the union of valid
            -- transitions for the first and second NFAs
            (Choice (fmap adaptF fs <|> fmap adaptX xs))
          where
            -- If the first NFA transitions, then we don't disturb the state
            -- of the second NFA
            adaptF (Done result) = Done result
            adaptF (Yield labelF restF) = Yield labelFX restFX
              where
                Begin labelFX restFX = Begin labelF restF <*> x

            -- If the second NFA transitions, then we don't disturb the state
            -- of the first NFA
            adaptX (Done result) = Done result
            adaptX (Yield labelX restX) = Yield labelFX restFX
              where
                Begin labelFX restFX = f <*> Begin labelX restX

I use “sort of” in the comments to indicate that (<*>) is not actually the binary Cartesian product, but it’s spiritually similar enough.

Moreover, we can reassure ourselves that this Applicative instance is well-behaved because this instance satisfies the Applicative laws. See Appendix D for the proof of all four laws for the above instance.

ApplicativeDo

Haskell is not the only language that defines an Applicative abstraction. For example, the Cats package in the Scala ecosystem also defines an Applicative abstraction, too.

However, the Haskell ecosystem has one edge over other languages (including Scala), which is language support for types that only implement Applicative and not Monad.

Specifically, Haskell has an ApplicativeDo extension, which we can use to combine values for any Applicative type using do notation. In fact, this is why the original example using pairs worked, because of that extension:

{-# LANGUAGE ApplicativeDo #-}

pairs :: Coroutine (String, Int)
pairs = do
    string <- strings
    int    <- ints
    return (string, int)

When we enable that extension the Haskell compiler desugars that do notation to something like this:

pairs = f <$> strings <*> ints
  where
    f string int = (string, int)

… and in my opinion the version using do notation is more readable and ergonomic.

Normally do notation only works on types that implement Monad, but when we enable the ApplicativeDo extension then a subset of do notation works for types that implement Applicative (which is superset of the types that implement Monad).

Our Coroutine type is one such type that benefits from this ApplicativeDo extension. The Coroutine type can implement Applicative, but not Monad, so the only way we can use do notation for Coroutines is if we enable ApplicativeDo.

The ApplicativeDo extension also plays very nicely with Haskell’s support for NamedFieldPuns or RecordWildCards. For example, instead of packing the labels into a tuple we could instead slurp them into a record as fields of the same name:

{-# LANGUAGE RecordWildCards #-}

data Status = Status
    { string :: String
    , int    :: Int
    }

pairs :: Coroutine Status
pairs = do
    string <- strings
    int    <- ints
    return Status{..}

… and this scales really well to a large number of Coroutines.

Conclusion

I’ve included the complete implementation from this post in Appendix A if you want to test this code out on your own. Once I’m done with the complete embedding of PlusCal in Haskell I’ll publish something a bit more polished.

If you enjoyed this post, you’ll probably also enjoy this paper:

… which was the original paper that introduced the Applicative abstraction.

I also left some “bonus commentary” in Appendix B and Appendix C for a few digressions that didn’t quite make the cut for the main body of this post.

Appendix A - Complete implementation

The following module only depends on the pretty-show package (and not even that if you delete the main subroutine).

{-# LANGUAGE ApplicativeDo              #-}
{-# LANGUAGE BlockArguments             #-}
{-# LANGUAGE DerivingStrategies         #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE UndecidableInstances       #-}

import GHC.Exts (IsList(..))
import Control.Applicative (Alternative(..), liftA2)
import Data.Void (Void)
import Prelude hiding (either)

import qualified Control.Applicative as Applicative
import qualified Control.Monad as Monad
import qualified Data.Foldable as Foldable
import qualified Text.Show.Pretty as Pretty

newtype Process label result = Choice [Step label result]
    deriving newtype (IsList, Show)

data Step label result = Yield label (Process label result) | Done result
    deriving stock (Show)

instance Functor (Process label) where
    fmap = Monad.liftM

instance Applicative (Process label) where
    pure result = Choice (pure (Done result))

    (<*>) = Monad.ap

instance Monad (Process label) where
    Choice ps >>= f = Choice do
        p <- ps
        case p of
            Yield label rest -> do
                return (Yield label (rest >>= f))
            Done result -> do
                let Choice possibilities = f result
                possibilities

instance Semigroup result => Semigroup (Process label result) where
    (<>) = liftA2 (<>)

instance Monoid result => Monoid (Process label result) where
    mempty = pure mempty

instance Alternative (Process label) where
    empty = Choice empty

    Choice stepsL <|> Choice stepsR = Choice (stepsL <|> stepsR)

data Coroutine label = Begin label (Process label Void)
    deriving stock (Show)

instance Functor Coroutine where
    fmap = Applicative.liftA

instance Applicative Coroutine where
    pure label = Begin label empty

    (<*>)
        f@(Begin label0F (Choice fs))
        x@(Begin label0X (Choice xs)) =
        Begin (label0F label0X) (Choice (fmap adaptF fs <|> fmap adaptX xs))
          where
            adaptF (Done result) = Done result
            adaptF (Yield labelF restF) = Yield labelFX restFX
              where
                Begin labelFX restFX = Begin labelF restF <*> x

            adaptX (Done result) = Done result
            adaptX (Yield labelX restX) = Yield labelFX restFX
              where
                Begin labelFX restFX = f <*> Begin labelX restX

instance Semigroup label => Semigroup (Coroutine label) where
    (<>) = liftA2 (<>)

instance Monoid label => Monoid (Coroutine label) where
    mempty = pure mempty

yield :: label -> Process label ()
yield label = Choice [Yield label (pure ())]

either :: [Process label result] -> Process label result
either = Foldable.asum

await :: Bool -> Process label ()
await = Monad.guard

with :: [result] -> Process label result
with results = either (map pure results)

end :: Process label a
end = empty

skip :: Process label ()
skip = mempty

strings :: Coroutine String
strings = Begin "A" do
    yield "B"
    either
        [ yield "C"
        , skip
        ]
    yield "D"
    end

ints :: Coroutine Int
ints = Begin 0 do
    x <- with [ 1, 2, 3 ]
    await (even x)
    yield x
    end

pairs :: Coroutine (String, Int)
pairs = do
    string <- strings
    int    <- ints
    return (string, int)

main :: IO ()
main = do
    Pretty.pPrint strings
    Pretty.pPrint ints
    Pretty.pPrint pairs

If you run that it will print:

Begin
  "A" [ Yield "B" [ Yield "C" [ Yield "D" [] ] , Yield "D" [] ] ]
Begin 0 [ Yield 2 [] ]
Begin
  ( "A" , 0 )
  [ Yield
      ( "B" , 0 )
      [ Yield
          ( "C" , 0 )
          [ Yield ( "D" , 0 ) [ Yield ( "D" , 2 ) [] ]
          , Yield ( "C" , 2 ) [ Yield ( "D" , 2 ) [] ]
          ]
      , Yield ( "D" , 0 ) [ Yield ( "D" , 2 ) [] ]
      , Yield
          ( "B" , 2 )
          [ Yield ( "C" , 2 ) [ Yield ( "D" , 2 ) [] ]
          , Yield ( "D" , 2 ) []
          ]
      ]
  , Yield
      ( "A" , 2 )
      [ Yield
          ( "B" , 2 )
          [ Yield ( "C" , 2 ) [ Yield ( "D" , 2 ) [] ]
          , Yield ( "D" , 2 ) []
          ]
      ]
  ]

Appendix B - Free Monads

The implementation for Process is actually a special case of a free monad transformer, except that I’ve hand-written the types and instances so that the types are easier to inspect. However, if we really wanted to code golf all of this we could have replaced all of that code with just these three lines:

import Control.Monad.Trans.Free (FreeT, liftF)

type Process label = FreeT ((,) label) []

yield label = liftF (label, ())

… and that would have behaved the exact same (including the same Monad and Alternative instances). You can read that as essentially saying: “A Process is a subroutine that alternates between emitting a label (i.e. (,) label) and branching non-deterministically (i.e. [])”.

Appendix C - Monoid / Semigroup

If we want to be extra clever, we can implement Semigroup and Monoid instances for Process as suggested in this post:

… which we would do like this:

import Control.Applicative (liftA2)

instance Semigroup result => Semigroup (Process label result) where
    (<>) = liftA2 (<>)
    
instance Monoid result => Monoid (Process label result) where
    mempty = pure mempty

… and then we can simplify skip a tiny bit further to:

skip = mempty

Appendix D - Proof of the Applicative laws

The first Applicative law requires that:

pure id <*> v = v

Proof:

pure id <*> v

-- Definition of `pure`:
--
-- pure label = Begin label empty
= Begin id empty <*> v

-- Define:
--
-- v = Begin label0X (Choice xs)
= Begin id empty <*> Begin label0X (Choice xs)

-- Definition of `(<*>)`
= Begin (id label0X) (Choice (fmap adaptF empty <|> fmap adaptX xs))
  where
    adaptF (Done result) = Done result
    adaptF (Yield labelF restF) = Yield labelFX restFX
      where
        Begin labelFX restFX = Begin labelF restF <*> Begin label0X (Choice xs)

    adaptX (Done result) = Done result
    adaptX (Yield labelX restX) = Yield labelFX restFX
      where
        Begin labelFX restFX = Begin id empty <*> Begin labelX restX

-- fmap f empty = empty
= Begin (id label0X) (Choice (empty <|> fmap adaptX xs))
  where
    adaptX (Done result) = Done result
    adaptX (Yield labelX restX) = Yield labelFX restFX
      where
        Begin labelFX restFX = Begin id empty <*> Begin labelX restX

-- empty <|> xs = xs
= Begin (id label0X) (Choice (fmap adaptX xs))
  where
    adaptX (Done result) = Done result
    adaptX (Yield labelX restX) = Yield labelFX restFX
      where
        Begin labelFX restFX = Begin id empty <*> Begin labelX restX

-- Definition of `pure`, in reverse
--
-- pure label = Begin label empty
= Begin (id label0X) (Choice (fmap adaptX xs))
  where
    adaptX (Done result) = Done result
    adaptX (Yield labelX restX) = Yield labelFX restFX
      where
        Begin labelFX restFX = pure id <*> Begin labelX restX

-- Induction: pure id <*> v = v
= Begin (id label0X) (Choice (fmap adaptX xs))
  where
    adaptX (Done result) = Done result
    adaptX (Yield labelX restX) = Yield labelFX restFX
      where
        Begin labelFX restFX = Begin labelX restX

-- Simplify
= Begin (id label0X) (Choice (fmap adaptX xs))
  where
    adaptX (Done result) = Done result
    adaptX (Yield labelX restX) = Yield labelX restX

-- Simplify
= Begin (id label0X) (Choice (fmap adaptX xs))
  where
    adaptX = id

-- Functor identity law:
--
-- fmap id = id
= Begin (id label0X) (Choice (id xs))

-- Definition of `id`:
--
-- id x = x
= Begin label0X (Choice xs)

-- Definition of `v`, in reverse
--
-- v = Begin label0X (Choice xs)
= v

The second Applicative law requires that:

pure (.) <*> u <*> v <*> w = u <*> (v <*> w)

Proof:

pure (.) <*> u <*> v <*> w

-- Define:
--
-- u = Begin label0U (Choice us)
-- v = Begin label0V (Choice vs)
-- w = Begin label0W (Choice ws)
=     pure (.)
  <*> Begin label0U (Choice us)
  <*> Begin label0V (Choice vs)
  <*> Begin label0W (Choice ws)

-- Definition of `pure`:
=     Begin (.) empty
  <*> Begin label0U (Choice us)
  <*> Begin label0V (Choice vs)
  <*> Begin label0W (Choice ws)

-- Definition of `(<*>)`
=     Begin ((.) label0U) (Choice (fmap adaptF empty <|> fmap adaptU us))
  <*> Begin label0V (Choice vs)
  <*> Begin label0W (Choice ws)
  where
    adaptF (Done result) = Done result
    adaptF (Yield labelF restF) = Yield labelFX restFX
      where
        Begin labelFX restFX = Begin labelF restF <*> Begin label0U (Choice us)

    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelFU restFU
      where
        Begin labelFU restFU = Begin (.) empty <*> Begin labelU restU

-- fmap f empty = empty
=     Begin ((.) label0U) (Choice (fmap adaptU us))
  <*> Begin label0V (Choice vs)
  <*> Begin label0W (Choice ws)
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelFU restFU
      where
        Begin labelFU restFU = Begin (.) empty <*> Begin labelU restU

-- Definition of `(<*>)`
=     Begin (label0U . label0V)
          (Choice (fmap adaptFU (fmap adaptU us) <|> fmap adaptV vs))
  <*> Begin label0W (Choice ws)
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelFU restFU
      where
        Begin labelFU restFU = Begin (.) empty <*> Begin labelU restU

    adaptFU (Done result) = Done result
    adaptFU (Yield labelFU restFU) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
            Begin labelFU restFU <*> Begin label0V (Choice vs)

    adaptV (Done result) = Done result
    adaptV (Yield labelV restV) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin ((.) label0U) (Choice (fmap adaptU us))
            <*> Begin labelV restV

-- Definition of `(<*>)`, in reverse
=     Begin (label0U . label0V)
          (Choice (fmap adaptFU (fmap adaptU us) <|> fmap adaptV vs))
  <*> Begin label0W (Choice ws)
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelFU restFU
      where
        Begin labelFU restFU = Begin (.) empty <*> Begin labelU restU

    adaptFU (Done result) = Done result
    adaptFU (Yield labelFU restFU) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
            Begin labelFU restFU <*> Begin label0V (Choice vs)

    adaptV (Done result) = Done result
    adaptV (Yield labelV restV) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin (.) empty
            <*> Begin label0U (Choice us)
            <*> Begin labelV restV

-- fmap f (fmap g x) = fmap (f . g) x
=     Begin (label0U . label0V)
          (Choice (fmap (adaptFU . adaptU) us <|> fmap adaptV vs))
  <*> Begin label0W (Choice ws)
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelFU restFU
      where
        Begin labelFU restFU = Begin (.) empty <*> Begin labelU restU

    adaptFU (Done result) = Done result
    adaptFU (Yield labelFU restFU) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
            Begin labelFU restFU <*> Begin label0V (Choice vs)

    adaptV (Done result) = Done result
    adaptV (Yield labelV restV) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin (.) empty
            <*> Begin label0U (Choice us)
            <*> Begin labelV restV

-- Consolidate (adaptFU . adaptU)
=     Begin (label0U . label0V) (Choice (fmap adaptU us <|> fmap adaptV vs))
  <*> Begin label0W (Choice ws)
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin (.) empty
            <*> Begin labelU restU
            <*> Begin label0V (Choice vs)

    adaptV (Done result) = Done result
    adaptV (Yield labelV restV) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin (.) empty
            <*> Begin label0U (Choice us)
            <*> Begin labelV restV

-- Definition of `(<*>)`
=     Begin (label0U (label0V label0W))
          (Choice
              (   fmap adaptFUV (fmap adaptU us <|> fmap adaptV vs)
              <|> fmap adaptW ws
              )
          )
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin (.) empty
            <*> Begin labelU restU
            <*> Begin label0V (Choice vs)

    adaptV (Done result) = Done result
    adaptV (Yield labelV restV) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin (.) empty
            <*> Begin label0U (Choice us)
            <*> Begin labelV restV

    adaptFUV (Done result) = Done result
    adaptFUV (Yield labelFUV restFUV) = Yield labelFUVW restFUVW
      where
        Begin labelFUVW restFUVW =
                Begin labelFUV restFUV
            <*> Begin label0W (Choice ws)

    adaptW (Done result) = Done result
    adaptW (Yield labelW restW) = Yield labelFUVW restFUVW
      where
        Begin labelFUVW restFUVW =
                Begin (label0U . label0V)
                    (Choice (fmap adaptU us <|> fmap adaptV vs))
            <*> Begin labelW restW

-- Definition of `(<*>)`, in reverse
=     Begin (label0U (label0V label0W))
          (Choice
              (   fmap adaptFUV (fmap adaptU us <|> fmap adaptV vs)
              <|> fmap adaptW ws
              )
          )
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin (.) empty
            <*> Begin labelU restU
            <*> Begin label0V (Choice vs)

    adaptV (Done result) = Done result
    adaptV (Yield labelV restV) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin (.) empty
            <*> Begin label0U (Choice us)
            <*> Begin labelV restV

    adaptFUV (Done result) = Done result
    adaptFUV (Yield labelFUV restFUV) = Yield labelFUVW restFUVW
      where
        Begin labelFUVW restFUVW =
                Begin labelFUV restFUV
            <*> Begin label0W (Choice ws)

    adaptW (Done result) = Done result
    adaptW (Yield labelW restW) = Yield labelFUVW restFUVW
      where
        Begin labelFUVW restFUVW =
                Begin (.) empty
            <*> Begin label0U (Choice us)
            <*> Begin label0V (Choice vs)
            <*> Begin labelW restW

-- fmap f (x <|> y) = fmap f x <|> fmap f y
=     Begin (label0U (label0V label0W))
          (Choice
              (   fmap adaptFUV (fmap adaptU us)
              <|> fmap adaptFUV (fmap adaptV vs)
              <|> fmap adaptW ws
              )
          )
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin (.) empty
            <*> Begin labelU restU
            <*> Begin label0V (Choice vs)

    adaptV (Done result) = Done result
    adaptV (Yield labelV restV) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin (.) empty
            <*> Begin label0U (Choice us)
            <*> Begin labelV restV

    adaptFUV (Done result) = Done result
    adaptFUV (Yield labelFUV restFUV) = Yield labelFUVW restFUVW
      where
        Begin labelFUVW restFUVW =
                Begin labelFUV restFUV
            <*> Begin label0W (Choice ws)

    adaptW (Done result) = Done result
    adaptW (Yield labelW restW) = Yield labelFUVW restFUVW
      where
        Begin labelFUVW restFUVW =
                Begin (.) empty
            <*> Begin label0U (Choice us)
            <*> Begin label0V (Choice vs)
            <*> Begin labelW restW

-- fmap f (fmap g x) = fmap (f . g) x
=     Begin (label0U (label0V label0W))
          (Choice
              (   fmap (adaptFUV . adaptU) us
              <|> fmap (adaptFUV . adaptV) vs
              <|> fmap adaptW ws
              )
          )
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin (.) empty
            <*> Begin labelU restU
            <*> Begin label0V (Choice vs)

    adaptV (Done result) = Done result
    adaptV (Yield labelV restV) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin (.) empty
            <*> Begin label0U (Choice us)
            <*> Begin labelV restV

    adaptFUV (Done result) = Done result
    adaptFUV (Yield labelFUV restFUV) = Yield labelFUVW restFUVW
      where
        Begin labelFUVW restFUVW =
                Begin labelFUV restFUV
            <*> Begin label0W (Choice ws)

    adaptW (Done result) = Done result
    adaptW (Yield labelW restW) = Yield labelFUVW restFUVW
      where
        Begin labelFUVW restFUVW =
                Begin (.) empty
            <*> Begin label0U (Choice us)
            <*> Begin label0V (Choice vs)
            <*> Begin labelW restW

-- Consolidate (adaptFUV . adaptU) and (adaptFUV . adaptV)
=     Begin (label0U (label0V label0W))
          (Choice
              (   fmap adaptU us
              <|> fmap adaptV vs
              <|> fmap adaptW ws
              )
          )
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin (.) empty
            <*> Begin labelU restU
            <*> Begin label0V (Choice vs)
            <*> Begin label0W (Choice ws)

    adaptV (Done result) = Done result
    adaptV (Yield labelV restV) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin (.) empty
            <*> Begin label0U (Choice us)
            <*> Begin labelV restV
            <*> Begin label0W (Choice ws)

    adaptW (Done result) = Done result
    adaptW (Yield labelW restW) = Yield labelFUVW restFUVW
      where
        Begin labelFUVW restFUVW =
                Begin (.) empty
            <*> Begin label0U (Choice us)
            <*> Begin label0V (Choice vs)
            <*> Begin labelW restW

-- Definition of `pure`, in reverse
--
-- pure label = Begin label empty
=     Begin (label0U (label0V label0W))
          (Choice
              (   fmap adaptU us
              <|> fmap adaptV vs
              <|> fmap adaptW ws
              )
          )
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                pure (.)
            <*> Begin labelU restU
            <*> Begin label0V (Choice vs)
            <*> Begin label0W (Choice ws)

    adaptV (Done result) = Done result
    adaptV (Yield labelV restV) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                pure (.)
            <*> Begin label0U (Choice us)
            <*> Begin labelV restV
            <*> Begin label0W (Choice ws)

    adaptW (Done result) = Done result
    adaptW (Yield labelW restW) = Yield labelFUVW restFUVW
      where
        Begin labelFUVW restFUVW =
                pure (.)
            <*> Begin label0U (Choice us)
            <*> Begin label0V (Choice vs)
            <*> Begin labelW restW

-- Induction: pure (.) <*> u <*> v <*> w = u <*> (v <*> w)
=     Begin (label0U (label0V label0W))
          (Choice
              (   fmap adaptU us
              <|> fmap adaptV vs
              <|> fmap adaptW ws
              )
          )
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin labelU restU
            <*> (   Begin label0V (Choice vs)
                <*> Begin label0W (Choice ws)
                )

    adaptV (Done result) = Done result
    adaptV (Yield labelV restV) = Yield labelFUV restFUV
      where
        Begin labelFUV restFUV =
                Begin label0U (Choice us)
            <*> (   Begin labelV restV
                <*> Begin label0W (Choice ws)
                )

    adaptW (Done result) = Done result
    adaptW (Yield labelW restW) = Yield labelFUVW restFUVW
      where
        Begin labelFUVW restFUVW =
                Begin label0U (Choice us)
            <*> (   Begin label0V (Choice vs)
                <*> Begin labelW restW
                )

-- Unfactor up `adaptV` into `adaptVW . adaptV` and `adaptW` into
-- `adaptVW . adaptW`
= Begin (label0U (label0V label0W))
      (Choice
          (   fmap adaptU us
          <|> fmap adaptVW (fmap adaptV vs)
          <|> fmap adaptVW (adaptW ws)
          )
      )
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelUVW restUVW
      where
        Begin labelUVW restUVW =
                Begin labelU (Choice us)
            <*> (   Begin label0V (Choice vs)
                <*> Begin label0W (Choice ws)
                )

    adaptVW (Done result) = Done result
    adaptVW (Yield labelVW restVW) = Yield labelUVW restUVW
      where
        Begin labelUVW restUVW =
                Begin label0U (Choice us)
            <*> Begin labelVW restVW

    adaptV (Done result) = Done result
    adaptV (Yield labelV restV) = Yield labelVW restVW
      where
        Begin labelVW restVW = Begin labelV restV <*> Begin label0W (Choice ws)

    adaptW (Done result) = Done result
    adaptV (Yield labelW restW) = Yield labelVW restVW
      where
        Begin lableVW restVW = Begin label0V (Choice vs) <*> Begin labelW restW

-- fmap f (x <|> y) = fmap f x <|> fmap f y
= Begin (label0U (label0V label0W))
      (Choice
          (   fmap adaptU us
          <|> fmap adaptVW (fmap adaptV vs <|> fmap adaptW ws)
          )
      )
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelUVW restUVW
      where
        Begin labelUVW restUVW =
                Begin labelU (Choice us)
            <*> (   Begin label0V (Choice vs)
                <*> Begin label0W (Choice ws)
                )

    adaptVW (Done result) = Done result
    adaptVW (Yield labelVW restVW) = Yield labelUVW restUVW
      where
        Begin labelUVW restUVW =
                Begin label0U (Choice us)
            <*> Begin labelVW restVW

    adaptV (Done result) = Done result
    adaptV (Yield labelV restV) = Yield labelVW restVW
      where
        Begin labelVW restVW = Begin labelV restV <*> Begin label0W (Choice ws)

    adaptW (Done result) = Done result
    adaptV (Yield labelW restW) = Yield labelVW restVW
      where
        Begin lableVW restVW = Begin label0V (Choice vs) <*> Begin labelW restW

-- Definition of `(<*>)`
= Begin (label0U (label0V label0W))
      (Choice
          (   fmap adaptU us
          <|> fmap adaptVW (fmap adaptV vs <|> fmap adaptW ws)
          )
      )
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelUVW restUVW
      where
        Begin labelUVW restUVW =
                Begin labelU (Choice us)
            <*> Begin (label0V label0W)
                    (Choice (fmap adaptV vs <|> fmap adaptW ws))

    adaptVW (Done result) = Done result
    adaptVW (Yield labelVW restVW) = Yield labelUVW restUVW
      where
        Begin labelUVW restUVW =
                Begin label0U (Choice us)
            <*> Begin labelVW restVW

    adaptV (Done result) = Done result
    adaptV (Yield labelV restV) = Yield labelVW restVW
      where
        Begin labelVW restVW = Begin labelV restV <*> Begin label0W (Choice ws)

    adaptW (Done result) = Done result
    adaptV (Yield labelW restW) = Yield labelVW restVW
      where
        Begin lableVW restVW = Begin label0V (Choice vs) <*> Begin labelW restW

-- Definition of `(<*>)`, in reverse
=     Begin label0U (Choice us)
  <*> Begin (label0V label0W)
          (Choice (fmap adaptV vs <|> fmap adaptW ws))
  where
    adaptV (Done result) = Done result
    adaptV (Yield labelV restV) = Yield labelVW restVW
      where
        Begin labelVW restVW = Begin labelV restV <*> Begin label0W (Choice ws)

    adaptW (Done result) = Done result
    adaptV (Yield labelW restW) = Yield labelVW restVW
      where
        Begin lableVW restVW = Begin label0V (Choice vs) <*> Begin labelW restW

-- Definition of `(<*>)`, in reverse
=     Begin label0U (Choice us)
  <*> (   Begin label0V (Choice vs)
      <*> Begin label0W (Choice ws)
      )

-- Definition of `u`, `v`, `w`, in reverse:
--
-- u = Begin label0U (Choice us)
-- v = Begin label0V (Choice vs)
-- w = Begin label0W (Choice ws)
= u <*> (v <*> w)

The third Applicative law requires that:

pure f <*> pure x = pure (f x)

Proof:

pure f <*> pure x

-- Definition of `pure`
--
-- pure label = Begin label empty
= Begin f empty <*> Begin x empty

-- Definition of `(<*>)`
= Begin (f x) (fmap adaptF empty <|> fmap adaptX empty)
  where
    adaptF (Done resultF) = Done resultF
    adaptF (Yield labelF restF) = Yield labelFX restFX
      where
        Begin labelFX restFX = Begin labelF restF <*> Begin x empty

    adaptX (Done resultX) = Done resultX
    adaptX (Yield labelX restX) = Yield labelFX restFX
      where
        Begin labelFX restFX = Begin f empty <*> Begin labelX restX

-- fmap f empty = empty
= Begin (f x) (empty <|> empty)

-- empty <|> empty = empty
= Begin (f x) empty

-- Definition of `pure`, in reverse
--
-- pure label = Begin label empty
= pure (f x)

The fourth Applicative law requires that:

u <*> pure y = pure ($ y) <*> u

Proof:

u <*> pure y

-- Define:
--
-- u = Begin labelU0 (Choice us)
= Begin labelU0 (Choice us) <*> pure y

-- Definition of `pure`
--
-- pure label = Begin label empty
= Begin labelU0 (Choice us) <*> Begin y empty

-- Definition of `(<*>)`
= Begin (labelU0 y) (Choice (fmap adaptU us <|> fmap adaptY empty)
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelUY restUY
      where
        Yield labelUY restUY = Begin labelU restU <*> Begin y empty

    adaptY (Done result) = Done result
    adaptY (Yield labelY restY) = Yield labelUY restUY
      where
        Yield labelUY restUY = Begin labelU0 (Choice us) <*> Begin labelY restY

-- fmap f empty = empty
= Begin (labelU0 y) (Choice (fmap adaptU us))
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelUY restUY
      where
        Yield labelUY restUY = Begin labelU restU <*> Begin y empty

-- Definition of `pure`, in reverse
--
-- pure label = Begin label empty
= Begin (labelU0 y) (Choice (fmap adaptU us))
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelUY restUY
      where
        Yield labelUY restUY = Begin labelU restU <*> pure y

-- Induction: u <*> pure y = pure ($ y) <*> u
= Begin (labelU0 y) (Choice (fmap adaptU us))
  where
    adaptU (Done result) = Done result
    adaptU (Yield labelU restU) = Yield labelUY restUY
      where
        Yield labelUY restUY = pure ($ y) <*> Begin labelU restU

-- Definition of `(<*>)`, in reverse
= Begin ($ y) empty <*> Begin labelU0 (Choice us)

-- Definition of `pure`, in reverse
= pure ($ y) <*> Begin labelU0 (Choice us)

-- Definition of `u`, in reverse:
--
-- u = Begin labelU0 (Choice us)
= pure ($ y) <*> u