Tuesday, May 31, 2022

Generate web forms from pure functions

This is an announcement post for my “Grace browser” project, which you can find here:

This project is a web page which can dynamically convert a wide variety of functional programming expressions to their equivalent HTML. This conversion can even auto-generate interactive web forms from functions, which means that people without web programming knowledge can use the Grace browser to create and share simple and interactive web pages.

Demo

You can quickly get the gist of this project by visiting the following page:

The Grace browser begins with a code editor where you can input a purely functional expression, and the above link uses the following example code:

\input ->
    { "x or y" : input.x || input.y
    , "x and y": input.x && input.y
    }

The interpreter then infers the type of the input expression, which for the above example is:

{ "x": Bool, "y": Bool } -> { "x or y": Bool, "x and y": Bool }

… and you can read that type as saying:

This is a function whose input is a record and whose output is also a record. The input record has two boolean-valued fields named “x” and “y” and the output record has two boolean-valued fields named “x or y” and “x and y”.

Here is the novel bit: the interpreter then generates a web interface appropriate for that type. In this case, the equivalent web interface is a form with two inputs:

  • a checkbox labeled “x”
  • a checkbox labeled “y”

… and the form also produces two outputs:

  • a read-only checkbox labeled “x or y”
  • a read-only checkbox labeled “x and y”

The result looks like this:

[Screenshot: the generated form, with “x” and “y” input checkboxes and read-only “x or y” and “x and y” output checkboxes]

Moreover, the generated form is reactive: as you check or uncheck the two input checkboxes, the two output checkboxes immediately update in response. In particular:

  • the “x or y” box will be checked whenever either input box is checked
  • the “x and y” box will be checked whenever both input boxes are checked

[Screenshot: the same form after toggling the inputs, with the output checkboxes updated to match]

All of this runs entirely client-side, meaning that all of the computation happens in your browser. Specifically:

  • compiling the functional code to an interactive form is done client-side
  • updating form outputs in response to inputs is also done client-side

Intelligent forms

Here’s another example that might further pique your interest:

The above example is a pure function to increment each element of a list:

List/map (\x -> x + 1)

The List/map function is unsaturated, meaning that we’re missing the final argument: the actual list to transform. So the interpreter infers the following type for the function:

List Natural -> List Natural

… and you can read that type as saying:

This is a function whose input is a list of natural numbers and whose output is also a list of natural numbers.
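For instance, if you saturate the function with an example list (the input here is my own), the interpreter evaluates the expression outright instead of generating a form:

List/map (\x -> x + 1) [ 2, 3, 5 ]

… which evaluates to [ 3, 4, 6 ]. Leaving off the final argument is what turns this into an interactive form.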

So what kind of interactive form will that generate?

The form's input begins with a blue “+” button that you can click to add elements to the input list. Each time you click the button, the form creates a numeric input for the new list element alongside a red “-” button that you can click to delete the corresponding element. As you add or remove elements from the input list, the reactive form adds or removes the matching elements from the output list, too.

[Screenshot: the generated list form, with “+” and “-” buttons for adding and removing numeric inputs]

Moreover, each element in the input list is a numeric input. As you adjust each input element, the matching output element automatically updates to a number one greater than the input.

The interpreter also sets the permitted range of the numeric inputs based on the inferred type. Since the default numeric type is Natural (i.e. non-negative numbers), the numeric inputs will forbid negative values. However, if you were to add a type annotation to specify an Integer element type:

List/map (\x -> x + 1 : Integer)

… then the generated form will change to permit negative inputs and outputs because then the inferred type would be:

List Integer -> List Integer

Shareable forms

The “Grace browser” is based on the Fall-from-Grace functional programming language (or “Grace” for short), which I previously announced here:

… and one of the features of this language is the ability to import arbitrary expressions by URL, including functions!

That means that if you create a useful pure function for others to use, you can host your code anywhere that you can serve plain text (such as a GitHub repository or gist), and anybody can turn that function into the corresponding form.

For example, the following gist contains a pure Grace function to compute US federal income taxes for 2022:

… so if you paste the URL for the raw text of the gist into the Grace browser you’ll get a shareable form for computing your taxes:

[Screenshot: the form generated from the income tax function]

This provides a lightweight way to publish, share, and consume utility code.

Plain JSON data

The Grace browser works for plain data, too. In fact, Grace is a superset of JSON, so the Grace browser is also a JSON browser.
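That also means that any JSON value, such as this made-up record, is already a valid Grace expression that the Grace browser can render:

{ "name": "example", "stars": 5, "tags": [ "demo", "json" ] }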

For example, I can render the JSON served at https://api.github.com as HTML by pasting that URL into the Grace browser, which produces this result:

[Screenshot: the JSON served by api.github.com rendered as HTML]

If I’m only interested in one field of the JSON output, I can project out the field of interest like this:

( https://api.github.com/ ).starred_gists_url

[Screenshot: only the starred_gists_url field rendered as HTML]

In other words, you can use the Grace browser sort of like “jq, but with HTML output”.

Teaching tool

You can also use the Grace browser to teach functional programming concepts. For example, you can illustrate the difference between Haskell/Rust-style error handling and Go-style error handling by entering this input into the code editor:

{ "Error handling in Haskell and Rust":
    \input -> input : < Failure : Text | Success : Natural >
, "Error handling in Go":
    \input -> input : { "Failure" : Optional Text, "Success": Optional Natural }
}

… and the form will illustrate how:

  • In Haskell and Rust, the failure data and success data are mutually exclusive

    The generated form illustrates this by using mutually exclusive radio buttons for the input.

  • In Go, failure data and success data are not mutually exclusive

    The generated form illustrates this by using checkboxes to independently enable or disable the failure and success data.

[Screenshot: the generated forms contrasting the two error-handling styles]
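For Haskell readers, a rough translation of the two encodings into Haskell types (my own translation, not part of the example) would be:

import Data.Text (Text)
import Numeric.Natural (Natural)

-- Haskell/Rust style: a sum type, so failure and success are mutually exclusive
type HaskellOrRustStyle = Either Text Natural

-- Go style: a product of optional values, so failure and success can be
-- independently present or absent
type GoStyle = (Maybe Text, Maybe Natural)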

Semantic web

Grace is very limited in its current incarnation, meaning that the language only provides a small set of built-in functions and operators. That's because I originally created Grace to serve as a simple reference implementation of how to create a functional programming language, and I intended people to fork the language to add any additional features they needed.

However, if Grace were more featureful then you could imagine creating a “semantic web” of purely functional expressions that could reference each other by URL and be visualized using an intelligent client like the Grace browser. For example, you could have some pure data hosted at https://example.com/students.ffg:

[ { "Name": "John Doe"       , "Grade": 95 }
, { "Name": "Mary Jane"      , "Grade": 98 }
, { "Name": "Alice Templeton", "Grade": 90 }
]

… and then you could create a “view” into that data that adds up all the grades by hosting another expression at https://example.com/total.ffg which could contain:

let sum = https://raw.githubusercontent.com/Gabriella439/grace/main/prelude/natural/sum.ffg

in  sum (List/map (\student -> student."Grade") ./students.ffg)

… and whenever you update the data hosted at students.ffg the view at total.ffg automatically updates, too (for the example data above, total.ffg would evaluate to 283). You could then generate a web page for either view of the data using something like the Grace browser.

Conclusion

If this interests you, the website contains a tutorial that you can try out, which partially overlaps with the examples from this post:

Just click the “Try the tutorial” button to give it a whirl.

If you want a deeper dive into what the Grace language can do, then I recommend reading the original announcement post:

… or reading the README from the corresponding GitHub project:

Also, I set things up so that if you do fork the language you can easily generate your own “Grace browser” for your fork, which is just a bunch of static assets that you can host anywhere. No server-side computation necessary!

Monday, May 9, 2022

The golden rule of software distributions

This is a short post documenting a pattern I learned as a user and maintainer of software distributions. I wanted to share this pattern because the lesson was non-obvious to me in my early days as a software engineer.

I call this pattern the “golden rule of software distributions”, which I’ll define the verbose way followed by the concise way.

The verbose version of the golden rule of software distributions is:

If a package manager only permits installing or depending on one version of each package, then a software distribution for that package manager should bless one version of each package. The blessed version for each package must be compatible with the blessed version of every other package.

The concise version of the golden rule of software distributions is:

A locally coherent package manager requires a globally coherent software distribution.

… where:

  • “locally coherent” means that you can only install or depend on one version of each package for a given system or project

  • “globally coherent” means each package has a unique blessed version compatible with every other package’s blessed version

Note that any sufficiently large software distribution will not perfectly adhere to this golden rule. You should view this rule as an ideal that a software distribution aspires to approximate as closely as possible, but there will necessarily be cases where it cannot.

Motivation

I’ll introduce the term “build plan” to explain the motivation behind the golden rule:

A build plan for a package A specifies a version for each dependency of A such that A successfully builds against those dependencies.

To motivate the golden rule, let’s examine what happens when you have a locally coherent package manager but a globally incoherent software distribution:

  • Package users need to do a combinatorial search of their dependencies

    … in order to find a successful build plan. Specifically, they may need to test multiple major versions of their direct and indirect dependencies to find a permutation that successfully builds (with, say, ten dependencies and three candidate major versions each, that's already 3¹⁰ ≈ 59,000 permutations in the worst case).

  • Compatible sets of packages become increasingly unlikely at scale

    The likelihood of finding a build plan rapidly diminishes as your dependency tree grows. Beyond a certain number of dependencies a build plan might not even exist, even if every dependency is maintained.

  • Package authors need to support multiple major versions of every dependency

    … in order to maximize the likelihood that downstream packages can find a successful build plan. Maintaining this backwards compatibility greatly increases their maintenance burden.

  • Package authors must test against multiple major versions of each dependency

    … in order to shield their users from build failures due to unexpected build plans. This means a large number of CI runs for every proposed change to the package, which slows down their development velocity.

  • Responsibility for fixing incompatibilities becomes diffuse

    Sometimes you need to depend on two packages (A and B) which transitively depend on incompatible versions of another package (C). Neither package A nor package B can be held responsible for fixing the problem unless there is a blessed version of package C.

These issues lead to a lot of wasted work, and that work scales exponentially with the number of dependencies. Consequently, software ecosystems that ignore the golden rule run into difficulty scaling dependency trees, which people work around in the following ways:

  • Culturally discouraging dependencies

  • Vendoring dependencies within their projects

  • Gravitating towards large and monolithic dependencies / frameworks

The fundamental problem

The golden rule is necessary because build plans do not compose for a locally coherent package manager. In other words, if you have a working build plan for package A and another working build plan for package B, you cannot necessarily combine those two build plans to generate a working build plan for a package that depends on both A and B. In particular, you definitely cannot combine the two build plans if A and B depend on incompatible versions of another package C.
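For example, here is a hypothetical illustration using Cabal-style version bounds (the package names are made up):

-- A.cabal: package A only supports the 1.x series of C
build-depends: C >= 1.0 && < 2.0

-- B.cabal: package B only supports the 2.x series of C
build-depends: C >= 2.0 && < 3.0

No single version of C satisfies both bounds, so a locally coherent package manager cannot produce a build plan for anything that depends on both A and B, even though A and B each have working build plans on their own.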

However, build plans can also fail to compose for more subtle reasons. For example, you can depend on multiple packages whose build plans are all pairwise-compatible, but there still might not exist a build plan for the complete set of packages.

The good news is that you can trivially “weaken” a build plan, meaning that if you find a build plan that includes both packages A and B then you can downgrade that to a working build plan for just package A or just package B.

Consequently, the globally optimal thing to do is to find a working build plan that combines as many packages as possible, because then any subset of that build plan is still a working build plan. That ensures that any work spent fixing this larger build plan is not wasted and benefits everybody. Contrast that with work spent fixing the build for a single package (e.g. creating a lockfile), which does not benefit any other package (not even downstream packages, a.k.a. reverse dependencies).

Common sense?

Some people might view the golden rule of software distributions as common sense that doesn’t warrant a blog post, but my experiences with the Haskell ecosystem indicate otherwise. That’s because I began using Haskell seriously around 2011, four years before Stackage was first released.

Before Stackage, I ran into all of the problems described in the previous section because there was no blessed set of Haskell packages. In particular, the worst problem (for me) was the inability to find a working build plan for my projects.

This issue went on for years; basically everyone in the Haskell ecosystem (myself included) unthinkingly cargo-culted this as the way things were supposed to work. When things went wrong we blamed Cabal (e.g. “Cabal hell”) for our problems when the root of the problem had little to do with Cabal.

Stackage fixed all of that when Michael Snoyman essentially introduced the Haskell ecosystem to the golden rule of software distributions. Stackage works by publishing a set of blessed package versions for all of the packages it vets, and these packages are guaranteed to all build together. Periodically, Stackage publishes an updated set of blessed versions.

After getting used to this, I quickly converted to this way of doing things, which seemed blindingly obvious in retrospect. Also, my professional career arc shifted towards DevOps, including managing upgrades and software distributions, and I discovered that this was a fairly common practice for most large software distributions.

Why this rule is not intuitive

This insight is not as obvious as people might think. A person with a superficial understanding of how software ecosystems work might suspect that the larger a software ecosystem grows, the more incoherent things get. In practice, you encounter the opposite phenomenon: the larger a software ecosystem gets, the more coherent it must become.

In fact, I still see people argue against the global coherence of software ecosystems, which indicates to me that this isn’t universally received wisdom. Sometimes they argue against global coherence directly (they believe coherence imposes an undue burden on maintainers or users) or they argue against global coherence indirectly (by positing incoherence as a foundation of a larger architectural pattern). Either way, I strongly oppose global incoherence, both for the theoretical reasons outlined in this post and also based on my practical experience managing dependencies in the pre-Stackage days of the Haskell ecosystem.

Indeed, many of the people arguing against globally coherent software ecosystems are actually unwitting beneficiaries of global coherence. There is a massive amount of invisible work that goes on behind the scenes for every software distribution to create a globally coherent package set that benefits everybody (not just the users of those software distributions). For example, all software users benefit from the work that goes into maintaining the Debian, Arch, Nixpkgs, and Brew software distributions even if they don't specifically use those distributions or their associated package managers.

Conclusion

This whole post has one giant caveat, which is that all of the arguments assume that the package manager is locally coherent, which is not always the case! In fact, there's a post that proves that local coherence can be undesirable because it (essentially) makes dependency resolution NP-complete. For more details, see:

I personally have mixed views on whether local coherence is good or bad. Right now I'm slightly on team “local coherence is good”, but my opinions on that are not fully formed yet.

That said, most package managers tend to require or at least benefit from local coherence, so in practice most software distributions also require global coherence. For example, Haskell's build tooling basically requires global coherence (with some caveats I won't go into), so global coherence is a good thing for the Haskell ecosystem.

Tuesday, May 3, 2022

Why does Haskell's take function accept insufficient elements?

This post is a long-form response to a question on Reddit, which asked:

I just started learning haskell, and the take function confuses me.

e.g take 10 [1,2,3,4,5] will yield [1,2,3,4,5]

How does it not generate an error ?

… and I have enough to say on this subject that I thought it would warrant a blog post rather than just a comment reply.

The easiest way to answer this question is to walk through all the possible alternative implementations that can fail when not given enough elements.

Solution 0: Output a Maybe

The first thing we could try would be to wrap the result in a Maybe, like this:

safeTake :: Int -> [a] -> Maybe [a]
safeTake 0      _   = Just []
safeTake n      []  = Nothing
safeTake n (x : xs) = fmap (x :) (safeTake (n - 1) xs)

>>> safeTake 3 [0..]
Just [0,1,2]

>>> safeTake 3 []
Nothing

The main deficiency with this approach is that it is insufficiently lazy. The result will not produce a single element of the output list until safeTake finishes consuming the required number of elements from the input list.

We can see the difference with the following examples:

>>> oops = 1 : 2 : error "Too short"

>>> take 1 (take 3 oops)
[1]

>>> safeTake 1 =<< safeTake 3 oops
*** Exception: Too short

Solution 1: Fail with error

Another approach would be to create a partial function that fails with an error if we run out of elements, like this:

partialTake :: Int -> [a] -> [a]
partialTake 0      _   = []
partialTake n (x : xs) = x : partialTake (n - 1) xs

>>> partialTake 3 [0..]
[0,1,2]

>>> partialTake 3 []
*** Exception: Test.hs:(7,1)-(8,51): Non-exhaustive patterns in function partialTake

>>> partialTake 1 (partialTake 3 oops)
[1]

Partial functions like these are undesirable, though, so we won’t go with that solution.

Solution 2: Use a custom list-like type

Okay, but what if we could store a value at the end of the list indicating whether or not the take succeeded? One way we could do that would be to define an auxiliary type similar to a list, like this:

{-# LANGUAGE DeriveFoldable #-}

data ListAnd r a = Cons a (ListAnd r a) | Nil r deriving (Foldable, Show)

… where now the empty (Nil) constructor can store an auxiliary value. We can then use this auxiliary value to indicate to the user whether or not the function succeeded:

data Result = Sufficient | Insufficient deriving (Show)

takeAnd :: Int -> [a] -> ListAnd Result a
takeAnd 0      _   = Nil Sufficient
takeAnd n      []  = Nil Insufficient
takeAnd n (x : xs) = Cons x (takeAnd (n - 1) xs)

>>> takeAnd 3 [0..]
Cons 0 (Cons 1 (Cons 2 (Nil Sufficient)))

>>> takeAnd 3 []
Nil Insufficient

Also, the ListAnd type derives Foldable, so we can recover the old behavior by converting the ListAnd Result a type into [a] using toList:

>>> import Data.Foldable (toList)

>>> toList (takeAnd 3 [0..])
[0,1,2]

>>> toList (takeAnd 3 [])
[]

>>> toList (takeAnd 1 (toList (takeAnd 3 oops)))
[1]

This is the first total function that has the desired laziness characteristics, but the downside is that the take function now has a much weirder type. Can we solve this using only existing types from base?

Solution 3: Return a pair

Well, what if we were to change the type of take to return a pair containing an ordinary list alongside a Result, like this:

takeWithResult :: Int -> [a] -> ([a], Result)
takeWithResult 0      _   = (    [], Sufficient  )
takeWithResult n      []  = (    [], Insufficient)
takeWithResult n (x : xs) = (x : ys, result      )
  where
    (ys, result) = takeWithResult (n - 1) xs

>>> takeWithResult 3 [0..]
([0,1,2],Sufficient)

>>> takeWithResult 3 []
([],Insufficient)

Now we don’t need to add this weird ListAnd type to base, and we can recover the old behavior by post-processing the output using fst:

>>> fst (takeWithResult 3 [0..])
[0,1,2]

>>> fst (takeWithResult 3 [])
[]

… and this also has the right laziness characteristics:

>>> fst (takeWithResult 1 (fst (takeWithResult 3 oops)))
[1]

… and we can replace Result with a Bool if we want a solution that depends solely on types from base:

takeWithResult :: Int -> [a] -> ([a], Bool)
takeWithResult 0      _   = ([], True)
takeWithResult n      []  = ([], False)
takeWithResult n (x : xs) = (x : ys, result)
  where
    (ys, result) = takeWithResult (n - 1) xs

However, even this solution is not completely satisfactory. There's nothing that forces the user to check the Bool value before accessing the list, so this is not as safe as, say, the safeTake function, which returns a Maybe. The Bool included in the result is more of an informational value than a safeguard.
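For what it's worth, you could recover a Maybe-based interface from takeWithResult by inspecting the Bool, but doing so gives up the laziness we worked for, because the Bool is only determined once the entire traversal completes. Here is a sketch (the helper name is my own, and this is not being proposed for base):

-- Pattern matching on the Bool forces the full traversal, so this has
-- safeTake's strictness rather than take's laziness
toSafeTake :: Int -> [a] -> Maybe [a]
toSafeTake n xs =
    case takeWithResult n xs of
        (ys, True ) -> Just ys
        (_ , False) -> Nothing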

Conclusion

So the long-winded answer to the original question is that there are several alternative ways we could implement take that can fail if the input list is too small, but in my view each of them has its own limitations.

This is why I think Haskell's current take function is probably the least bad of the alternatives, even if it's not the safest possible implementation.

Sunday, May 1, 2022

Introductory resources to type theory for language implementers

This post briefly tours resources that helped introduce me to type theory, because I’m frequently asked by others for resources on this subject (even though I never had a formal education in type theory). Specifically, these resources will focus more on how to implement a type checker or type inference algorithm.

Also, my post will be biased against books, because I don’t tend to learn well from reading books. That said, I will mention a few books that I’ve heard others recommend, even if I can’t personally vouch for them.

What worked for me

The first and most important resource that I found useful was this one:

The reason is that the paper shows logical notation side-by-side with Haskell code. That juxtaposition served as a sort of “Rosetta stone” for me to understand the correspondence between logical judgments and code. The paper also introduces some type theory basics (and dependent types!).

Along similar lines, another helpful resource was:

… which, as the name suggests, walks through a Haskell program to type-check Haskell code. This paper, along with the preceding one, helped bring type checkers “down to earth” for me by showing that there isn't any magic or secret sauce to implementing a type checker.

After that, the next thing that helped me improve my understanding was learning about pure type systems. Specifically, this paper was a very clear introduction to pure type systems:

You can think of pure type systems as a sort of “framework” for specifying type systems or talking about them. For example, the simply typed lambda calculus, System F, System FΩ, and the calculus of constructions are some example type systems that you'll hear the literature refer to, and they're all special cases of this general framework. You can also think of pure type systems as generalizing the lambda cube.
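As a rough sketch (the names here are my own illustration, following the conventional presentation), a pure type system is specified by three pieces of data, which you could represent in Haskell like this:

-- A pure type system is specified by its sorts, its axioms relating sorts,
-- and its rules for forming function types
data PTS sort = PTS
    { sorts  :: [sort]                -- e.g. * and □ for the lambda cube
    , axioms :: [(sort, sort)]        -- axioms of the form s₁ : s₂
    , rules  :: [(sort, sort, sort)]  -- permitted (domain, codomain, result) sorts
    }

For instance, the simply typed lambda calculus corresponds to something like PTS { sorts = ["*", "□"], axioms = [("*", "□")], rules = [("*", "*", "*")] }, and richer systems like System F add more rules.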

However, none of the above resources explain how to implement a type system with “good” type inference. To elaborate on that: many simple type systems can infer the types of program outputs from program inputs, but cannot work “in reverse” and infer the types of inputs from outputs. Hindley Milner type inference is one example of a “good” type inference algorithm that can work in reverse (for example, it can infer that x must be a Bool in \x -> not x from the use of not).

However, I never learned Hindley Milner type inference all that well, because I skipped straight to bidirectional type checking, which is described in this paper:

I prefer bidirectional type checking because (in my limited experience) it’s easier to extend the bidirectional type checking algorithm with new language features (or, at least, easier than extending Hindley Milner with the same language features).

The other reason I'm a fan of bidirectional type checking is that many cutting-edge advances in research slot neatly into a bidirectional type checker, or even explicitly present their results using the framework of bidirectional type checking.
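To make the two directions concrete, here is a minimal sketch of a bidirectional type checker for a toy simply typed lambda calculus (this is my own illustration, not code from the paper):

data Type = TBool | TFun Type Type deriving (Eq, Show)

data Term
    = Var String        -- variable reference
    | Lam String Term   -- lambda abstraction
    | App Term Term     -- function application
    | Lit Bool          -- boolean literal
    | Ann Term Type     -- type annotation

type Context = [(String, Type)]

-- The "synthesis" direction: infer a type from the term itself
infer :: Context -> Term -> Maybe Type
infer ctx (Var x)    = lookup x ctx
infer _   (Lit _)    = Just TBool
infer ctx (Ann t ty) = if check ctx t ty then Just ty else Nothing
infer ctx (App f a)  =
    case infer ctx f of
        Just (TFun i o) | check ctx a i -> Just o
        _                               -> Nothing
infer _   (Lam _ _)  = Nothing  -- a bare lambda can only be checked

-- The "checking" direction: verify a term against an expected type
check :: Context -> Term -> Type -> Bool
check ctx (Lam x body) (TFun i o) = check ((x, i) : ctx) body o
check ctx term         expected   = infer ctx term == Just expected

The payoff is that lambdas never need argument annotations when checked against a known function type:

>>> infer [] (App (Ann (Lam "x" (Var "x")) (TFun TBool TBool)) (Lit True))
Just TBool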

Books

Like I mentioned, I didn’t really learn that much from books, but here are some books that I see others commonly recommend, even if I can’t personally vouch for them:

Example code

I also created a tutorial implementation of a functional programming language that summarizes everything I know about programming language theory so far, which is my Fall-from-Grace project:

This project is a clean reference implementation of how to implement an interpreted language using (what I believe are) best practices in the Haskell ecosystem.

I also have a longer post explaining the motivation behind the above project:

Conclusion

Note that these are not the only resources that I learned from. This post only summarizes the seminal resources that greatly enhanced my understanding of all other resources.

Feel free to leave a comment if you have any other resources that you’d like to suggest that you feel were helpful in this regard.