Wednesday, April 21, 2021

The end of history for programming


I spend quite a bit of time thinking about what the end of history for programming might look like. By the “end of history” I mean the point beyond which programming paradigms would not evolve significantly.

I care about programming’s “destiny” because I prefer to work on open source projects that bring us closer to that final programming paradigm. In my experience, this sort of work has greater longevity, higher impact, and helps move the whole field of programming forward.

So what would the end of history look like for programming? Is it:

  • … already here?

    Some people treat programming as a solved problem and view all new languages and paradigms as reskins of old languages or paradigms. From their point of view, all that remains is to refine our craft by slowly optimizing things, weeding out bugs, or addressing non-technical issues like people management or funding.

    I personally do not subscribe to this philosophy because I believe, at a very minimum, that functional programming paradigms will slowly displace object-oriented and imperative programming paradigms (although functional programming might not necessarily be the final programming paradigm).

  • … artificial intelligence?

    Maybe machines will translate our natural language instructions into code for us, relieving us of the burden of precisely communicating our intent. Or maybe some sufficiently smart AI-powered IDE will auto-complete most of our program for us.

    I don’t believe this either, and I think Dijkstra did an excellent job of dismantling this line of reasoning in his essay: On the foolishness of “natural language programming”.

Truthfully, I’m not certain what the correct answer is, but I’ll present my own guess at what the end of history for programming looks like.

Mathematical DSLs

My view is that the next logical step for programming is to split into two non-overlapping programming domains:

  • runtime building for …
  • … mathematical programming languages

Specifically, I expect programming languages to evolve to become more mathematical in nature, where the programs users author to communicate their intent resemble pure mathematical expressions.

For example, consider the following mathematical specifications of the boolean logical “and” operator and the function composition operator:

True  && x     = x
x     && True  = x
False && False = False

(f . g)(x) = f(g(x))

These mathematical specifications are also executable Haskell code (albeit with extra parentheses to resemble mainstream function syntax). Haskell is one example of a programming language where the code aspires to resemble pure mathematical expressions and definitions.
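To make this concrete, here is a small self-contained sketch (my own, added for illustration) in which the specifications above compile as ordinary Haskell, with the extra parentheses removed so they read idiomatically. Hiding the Prelude versions of these operators lets us define our own:

```haskell
import Prelude hiding ((&&), (.))

-- The boolean "and" specification, verbatim, as executable code:
(&&) :: Bool -> Bool -> Bool
True  && x     = x
x     && True  = x
False && False = False

-- The function composition specification, likewise:
(.) :: (b -> c) -> (a -> b) -> (a -> c)
(f . g) x = f (g x)
```

For example, `True && False` evaluates to `False` and `(negate . abs) (-3)` evaluates to `-3`, exactly as the mathematical definitions dictate.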

Any language presenting such an idealized mathematical interface requires introducing a significant amount of complexity “under the hood” since the real world is messy. This is where runtime building comes into play to gloss over such ugly details.

In other words, I’m predicting that the end of history for programming is to become an interface bridging pure mathematical expressions to the real world.

The past and present

Let me give a few examples of where this trend towards mathematical userland code is already playing out:

  • Memory management

    Manual memory management used to be a “userland” concern for most programming languages, but the general trend for new languages is automatic memory management (with the exception of Rust). Memory management used to be an explicit side effect that programmers had to care about, and pushing memory management down into the runtime (via garbage collection or other methods) has made programming languages more pure, bringing them one step closer to idealized mathematical expressions.

    Indeed, Rust is the exception that proves the rule, as Rust is widely viewed as better suited for building runtimes rather than being used for high-level specification of intent.

  • Functional programming (especially purely functional programming)

    Functional programming is a large step towards programming in a more mathematical style that prefers:

    • expressions over statements
    • pure functions over side effects
    • algebraic datatypes over objects
    • recursion over loops

    I cover this in more detail in Why I prefer functional programming.

    However, functional programming is not a free win. Supporting higher-order functions and closures efficiently isn’t easy (especially for compiled languages), which is why less sophisticated language implementations tend to be more imperative and less functional.

  • Evaluation order (especially laziness)

    I’ve always felt that “lazy evaluation” doesn’t do a good job of selling the benefits of Haskell’s evaluation model. I prefer to think of Haskell as having “automatic evaluation management”1. In other words, the programmer specifies the expressions as a graph of dependent computations and the runtime figures out the most efficient order in which to reduce the graph.

    This is yet another example of where we push something that used to be a userland concern (order of evaluation) into the runtime. Glossing over evaluation order frees us to specify things in a more mathematical style, because in mathematics the evaluation order is similarly irrelevant.
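As a minimal illustration of that evaluation model (my own example; the name firstSquares is invented), the following expression describes an infinite graph of computations, yet the runtime reduces only the part of the graph that the result actually demands:

```haskell
-- An "infinite" graph of computations: `[1 ..]` never terminates on
-- its own, but the runtime only reduces the three nodes that
-- `take 3` demands.
firstSquares :: [Int]
firstSquares = take 3 (map (^ 2) [1 ..])
```

Here `firstSquares` evaluates to `[1, 4, 9]`, even though a strict reading of the code would loop forever building the underlying list.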

The present and future

A common pattern emerges when we study the above trends:

  • Push a userland concern into a runtime concern, which:
  • … makes programs more closely resemble pure mathematical expressions, and:
  • … significantly increases the complexity of the runtime.

You might wonder: what are some other userland concerns that might eventually get pushed into runtime concerns in the near future? Some examples I can think of are:

  • Package management

    This one is so near in the future that it’s already happening (see: Nix and Dhall). Both languages provide built-in support for fetching code, rather than handling packages out-of-band using a standalone package management tool. This language-level support allows programs to embed external code as if it were a pure sub-expression, more closely approximating the mathematical ideal.

  • Error handling

    This one requires more explanation: I view type systems as the logical conclusion of pushing error handling into the “runtime” (actually, into the type-checker, not the runtime, to be pedantic). Dhall is an example of a language that takes this idea to the extreme: Dhall has no userland support for raising or catching errors because all errors are type errors2.

    Advances in dependently-typed and total functional programming languages get us closer to this ideal of pushing error handling into a runtime concern.

  • Logging

    I’m actually surprised that language support for pervasively logging everything hasn’t been implemented already (or maybe it has and I missed it). It seems like a pretty mundane thing for languages to implement, especially for application domains that are not performance-sensitive.

    Many languages already support profiling, and it seems like it wouldn’t be a big leap to turn profiling support into logging support if the user is willing to spend their performance budget on logging.

    Logging is one of those classic side effects that is an ugly detail that “ruins” code that would have otherwise been pure and mathematical.

  • Services

    Service-oriented architectures are another thing that tends to get in the way of writing pure side-effect-free code.

    I’m not exactly sure what a service-oriented language runtime would look like, but I don’t think current “serverless” solutions are what I have in mind. Something like AWS Lambda is still too low-level to promote code that is mathematical in nature. If any part of the programming process involves using a separate tool to deploy or manage the serverless code, then that is a significant departure from authoring pure mathematical expressions. There needs to be something like a “Nix or Dhall for serverless code”.
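To illustrate the error-handling point above with a Haskell sketch of my own (the name safeFirst is invented; NonEmpty comes from the base package), the type-checker can rule out a whole class of runtime errors by construction:

```haskell
import Data.List.NonEmpty (NonEmpty(..))
import qualified Data.List.NonEmpty as NonEmpty

-- Prelude.head can crash at runtime on an empty list, but
-- NonEmpty.head is total: the "empty list" failure mode is ruled out
-- by the type-checker before the program ever runs.
safeFirst :: NonEmpty a -> a
safeFirst = NonEmpty.head
```

For example, `safeFirst (1 :| [2, 3])` returns `1`, while applying `safeFirst` to an empty list is a type error rather than a runtime error.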
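On the logging point, GHC already gestures in this direction: Debug.Trace.trace lets otherwise-pure code emit log messages through the runtime without changing the function’s type. It is a debugging hack rather than a principled design, which is rather the point (the function addLogged below is my own invented example):

```haskell
import Debug.Trace (trace)

-- `trace` asks the runtime to print the message to stderr as a side
-- channel, while the function keeps the type of a pure function.
addLogged :: Int -> Int -> Int
addLogged x y = trace ("adding " ++ show x ++ " and " ++ show y) (x + y)
```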


You might criticize my prediction by claiming that it’s not falsifiable. It seems like I could explain away any new development in programming as falling into the runtime building or mathematical expression category.

That’s why I’d like to stress a key pillar of the prediction: runtimes and mathematical expressions will become more sharply divided over time. This is the actual substance of the prediction, and we can infer a few corollaries from it.

Currently, many mainstream programming paradigms and engineering organizations conflate the two responsibilities, so you end up with people authoring software projects that mix operational logic (runtime concerns) and “business logic” (mathematical intent).

What I predict will happen is that the field of engineering will begin to generate a sharp demand for people with experience in programming language theory or programming language engineering. These people will be responsible for building special-purpose languages and runtimes that abstract away as many operational concerns as possible to support pure mathematical domain-specific languages for their respective enterprise. These languages will in turn be used by a separate group of people whose aim is to translate human intent into mathematical expressions.

One consequence of this prediction is that you’ll begin to see a Cambrian explosion of programming languages in the near future. Naturally, as language engineers push more operational concerns into the runtime, they will need to more closely tailor the runtime to specific purposes or organizational needs rather than trying to author a general-purpose language. In other words, there will be a marked fragmentation of language runtimes (and type-checkers) as each new language adapts to its respective niche.

Despite runtime fragmentation, you will see the opposite trend in userland code: programs authored in these disparate languages will begin to more closely resemble one another as they become more mathematical in nature. In a sense, mathematical expressions will become the portable “lingua franca” of userland code, especially as non-mathematical concerns get pushed into each respective language’s runtime.

That is a prediction that is much easier to falsify.

Also, if you like this post then you will probably also enjoy the seminal paper: The Next 700 Programming Languages.

  1. Also, this interpretation is not that far from the truth as the Haskell standard only specifies a non-strict evaluation strategy. GHC is lazy, but Haskell the language standard does not require implementations to be lazy. For more details, see: Lazy vs. non-strict↩︎

  2. Okay, this is a bit of an over-simplification because Dhall has support for Optional values and you can model errors using unions in Dhall, but they are not commonly used in this way and idiomatically most Dhall code in the wild uses the type-checker to catch errors. Dhall is a total functional programming language and the language goes to significant lengths to discourage runtime errors, like forbidding comparing Text values for equality. Also, the language is technically dependently typed and supports testing arbitrary code at type-checking time to catch errors statically.↩︎

Thursday, April 8, 2021

How to replace Proxy with AllowAmbiguousTypes


This post is essentially a longer version of this Reddit post by Ashley Yakeley that provides more explanation and historical context.

Sometimes in Haskell you need to write a function that “dispatches” only on a type and not on a value of that type. Using the example from the above post, we might want to write a function that, given an input type, prints the name of that type.

Approach 1 - undefined

One naive approach would be to do this:

class TypeName a where
    typeName :: a -> String

instance TypeName Bool where
    typeName _ = "Bool"

instance TypeName Int where
    typeName _ = "Int"

… which we could use like this:

>>> typeName False
"Bool"
>>> typeName (0 :: Int)
"Int"

However, this approach does not work well because we must provide the typeName method with a concrete value of type a. Not only is this superfluous (we don’t care which value we supply) but in some cases we might not even be able to supply such a value.

For example, consider this instance:

import Data.Void (Void)

instance TypeName Void where
    typeName _ = "Void"

There is a perfectly valid name associated with this type, but we cannot retrieve the name without cheating because we cannot produce a (total) value of type Void. Instead, we have to use something like undefined:

>>> typeName (undefined :: Void)
"Void"

The base package uses this undefined-based approach. For example, Foreign.Storable provides a sizeOf function that works just like this:

class Storable a where
    -- | Computes the storage requirements (in bytes) of the argument. The value
    -- of the argument is not used.
    sizeOf :: a -> Int

… and to this day the idiomatic way to use sizeOf is to provide a fake value using undefined:

>>> sizeOf (undefined :: Bool)
4

This works because sizeOf never evaluates its argument, so the idiom is technically safe, albeit not very appealing, since it depends on undefined.

Approach 2A - Proxy

The next evolution of this approach was to use the Proxy type (now part of base in the Data.Proxy module). As the documentation notes:

Historically, Proxy :: Proxy a is a safer alternative to the undefined :: a idiom.

I’m not exactly sure what the name Proxy was originally meant to convey, but I believe the intention was that a term (the Proxy constructor) stands in as a “proxy” for a type (specifically, the type argument to the Proxy type constructor).

We can amend our original example to use the Proxy type like this:

import Data.Proxy (Proxy(..))
import Data.Void (Void)

class TypeName a where
    typeName :: Proxy a -> String

instance TypeName Bool where
    typeName _ = "Bool"

instance TypeName Int where
    typeName _ = "Int"

instance TypeName Void where
    typeName _ = "Void"

… and now we can safely get the name of a type without providing a specific value of that type. Instead we always provide a Proxy constructor and give it a type annotation which “stores” the type we wish to use:

>>> typeName (Proxy :: Proxy Bool)
"Bool"
>>> typeName (Proxy :: Proxy Int)
"Int"
>>> typeName (Proxy :: Proxy Void)
"Void"

We can simplify that a little bit by enabling the TypeApplications language extension, which permits us to write this:

>>> :set -XTypeApplications
>>> typeName (Proxy @Bool)
"Bool"
>>> typeName (Proxy @Int)
"Int"
>>> typeName (Proxy @Void)
"Void"

… or this if we prefer:

>>> typeName @Bool Proxy
"Bool"
>>> typeName @Int Proxy
"Int"
>>> typeName @Void Proxy
"Void"

Approach 2B - proxy

A minor variation on the previous approach is to use proxy (with a lowercase “p”) in the typeclass definition:

import Data.Void (Void)

class TypeName a where
    typeName :: proxy a -> String
    --          ↑

instance TypeName Bool where
    typeName _ = "Bool"

instance TypeName Int where
    typeName _ = "Int"

instance TypeName Void where
    typeName _ = "Void"

Everything else works the same, but now neither the author nor the consumer of the typeclass needs to depend on the Data.Proxy module specifically. For example, the consumer could use any other type constructor just fine:

>>> typeName ([] :: [Int])  -- Technically legal, but weird
"Int"

… or (more likely) the consumer could define their own Proxy type to use instead of the one from Data.Proxy, which would also work fine:

>>> data Proxy a = Proxy
>>> typeName (Proxy :: Proxy Int)
"Int"

This trick helped back when Proxy was not a part of the base package. Even now that Proxy is in base you still see this trick when people author typeclass instances because it’s easier and there’s no downside.

Both of these Proxy-based solutions are definitely better than using undefined, but they are both still a bit unsatisfying because we have to supply a Proxy argument to specify the desired type. The ideal user experience would require only the type as an input to our function.

Approach 3 - AllowAmbiguousTypes + TypeApplications

We previously noted that we could shorten the Proxy-based solution by using TypeApplications:

>>> typeName @Bool Proxy
"Bool"

Well, what if we could shorten things even further and just drop the Proxy, like this:

>>> typeName @Bool

Actually, we can! This brings us to a more recent approach (the one summarized in the linked Reddit post), which is to use AllowAmbiguousTypes + TypeApplications, like this:

{-# LANGUAGE AllowAmbiguousTypes #-}

import Data.Void (Void)

class TypeName a where
    typeName :: String

instance TypeName Bool where
    typeName = "Bool"

instance TypeName Int where
    typeName = "Int"

instance TypeName Void where
    typeName = "Void"

… which we can invoke like this:

>>> :set -XTypeApplications
>>> typeName @Bool
"Bool"
>>> typeName @Int
"Int"
>>> typeName @Void
"Void"

The use of TypeApplications is essential, since otherwise GHC would have no way to infer which typeclass instance we meant. Even a type annotation would not work:

>>> typeName :: String  -- Clearly, this type annotation is not very helpful

<interactive>:1:1: error:
    • Ambiguous type variable ‘a0’ arising from a use of ‘typeName’
      prevents the constraint ‘(TypeName a0)’ from being solved.
      Probable fix: use a type annotation to specify what ‘a0’ should be.
      These potential instances exist:
        instance [safe] TypeName Void -- Defined at Example.hs:14:10
        instance [safe] TypeName Bool -- Defined at Example.hs:8:10
        instance [safe] TypeName Int -- Defined at Example.hs:11:10
    • In the expression: typeName :: String
      In an equation for ‘it’: it = typeName :: String

Type applications work here because you can think of a polymorphic function as really having one extra function argument: the polymorphic type. I elaborate on this a bit in my post on Polymorphism for dummies, but the basic idea is that TypeApplications makes this extra function argument for the type explicit. This means that you can directly tell the compiler which type to use by “applying” the function to the right type instead of trying to indirectly persuade the compiler into using the right type with a type annotation.
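To make the “extra argument” intuition concrete, here is a small standalone sketch (my own; the name roundTrip is invented, and nothing here is specific to the TypeName example). The polymorphic functions read and show each receive their type as an explicit “argument”:

```haskell
{-# LANGUAGE TypeApplications #-}

-- `read` and `show` each receive their type directly via @Int here,
-- instead of being steered indirectly by a type annotation.
roundTrip :: String
roundTrip = show @Int (read @Int "42")
```

Here `roundTrip` evaluates to `"42"`: the `@Int` applications resolve both ambiguous type variables without any annotation on the surrounding expression.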


My personal preference is to use the last approach with AllowAmbiguousTypes and TypeApplications. Not only is it more concise, but it also appeals to my own coding aesthetic. Specifically, guiding compiler behavior using type annotations feels more like logic programming to me, whereas using explicit type abstractions and TypeApplications feels more like functional programming to me (and I tend to prefer functional programming over logic programming).

However, the Proxy-based approach requires no language extensions, so that approach might appeal to you if you prefer to use the simplest subset of the language possible.