Sunday, January 31, 2021

Dynamic type errors lack relevance

Proponents of statically typed languages commonly motivate types as a way to safely detect bugs ahead of time. For example, consider the following Python program that attempts to increment a number stored in counter.txt:

# ./increment.py

with open('counter.txt', 'r') as handle:
    x = handle.readline()

with open('counter.txt', 'w') as handle:
    handle.write(int(x) + 1)

This program contains a type error, but by the time we find out, it’s too late: our program will have already wiped the contents of counter.txt by opening the file as a writable handle:

$ echo -n '0' > ./counter.txt

$ cat counter.txt
0

$ python increment.py
Traceback (most recent call last):
  File "increment.py", line 5, in <module>
    handle.write(int(x) + 1)
TypeError: expected a string or other character buffer object

$ cat counter.txt  # The contents of the file were lost

Defenders of dynamically typed languages sometimes counter that these pitfalls do not matter when runtime failures are mostly harmless. If you want to find errors in your program, just run the program!

As an extreme example, Nix is a purely functional language with a dynamic type system, and you can safely interpret a Nix program ahead of time to detect errors since Nix evaluation has no side effects¹. Consequently, Nix proponents sometimes reason that these dynamic type errors are functionally indistinguishable from static type errors thanks to Nix’s purity.

However, dynamic types are not a substitute for static types, even in a purely functional language like Nix. To see why, consider the following Nix expression, which attempts to render a structured value as command-line options:

# ./options.nix

let
  pkgs = import <nixpkgs> { };

  enable = option: "${option}=true";

  disable = option: "${option}=false";

in
  pkgs.lib.cli.toGNUCommandLine { }
    { option = [
        "max-jobs=5"
        "cores=4"
        enable "fallback"
      ];
    }

The intention was to produce this result:

[ "--option" "max-jobs=5" "--option" "cores=4" "--option" "fallback=true" ]

… but we actually get a dynamic type error when we interpret the expression:

$ nix-instantiate --eval options.nix --strict
error: evaluation aborted with the following error message: 'generators.mkValueStringDefault: functions not supported: <λ>'

This error message is not very helpful, and it’s not due to a lack of effort, funding, or attention. This sort of poor user experience is inherent to any dynamic type system.

The fundamental issue is that in a dynamically typed language you cannot explain errors to the user in terms of the source code they wrote. In other words, dynamic type errors commonly fail to be relevant to the user.

For example, if Nix had a typical static type system, then the diagnostic might have looked something like this:

# ./options.nix

let
  pkgs = import <nixpkgs> { };

  enable = option: "${option}=true";

  disable = option: "${option}=false";

in
  pkgs.lib.cli.toGNUCommandLine { }
    { option = [
        "max-jobs=5"
        "cores=4"
        enable "fallback"
#       ~~~~~~
#       This element of the list is not a string
      ];
    }

This sort of diagnostic helps us more easily discern that we forgot to parenthesize (enable "fallback"), so the enable function and the string "fallback" are treated as two separate list elements.

In a dynamic type system, type errors can potentially be far removed from the code that the user wrote. From Nix’s point of view, the actual error is that somewhere in the middle of interpretation it is trying to apply the mkValueStringDefault utility function to the user’s enable function:

mkValueStringDefault enable

… but by that point the Nix interpreter is no longer “thinking” in terms of the original program the user wrote, so any interpreter diagnostics will have difficulty explaining the error in terms that the user can understand. For example:

  • In the middle of interpretation any offending subexpressions are abstract syntax trees, not source code

  • Some of these abstract syntax trees may be functions or closures that cannot be (easily) displayed to the user

    We see this above where the error message is unable to render the enable function so it falls back to displaying <λ>.

  • Intermediate evaluation results might not correspond to the source code at all

    For example, the user might not understand where mkValueStringDefault is originating from in the absence of a stack trace.

  • Even if we could trace subexpressions back to their original source code, the user still might not be able to work backwards from the dynamic type error to the real problem.

    In other words, even if we showed the user the call site for the mkValueStringDefault function, they still wouldn’t necessarily understand why enable is the function argument.

In fact, the example error message came out better than I expected. That’s because somebody took the time to add a custom error message to the mkValueStringDefault utility instead of falling back on the interpreter throwing a dynamic type error:

  mkValueStringDefault = {}: v: with builtins;
    let err = t: v: abort
          ("generators.mkValueStringDefault: " +
           "${t} not supported: ${toPretty {} v}");
    in

Had they not done so, the error message would have been even further disconnected from the user’s experience. This only reinforces that the relevance of error messages is inversely proportional to the extent to which we avail ourselves of the dynamic type system.

This is why I prefer to lean on static type systems as much as possible to detect errors, because they tend to do a better job of “explaining” what went wrong than dynamic type systems.

Note: The criticisms in this post also apply to exceptions in general (where you can view dynamic types as a special case of exceptions auto-generated by the interpreter). Exceptions also need to be supplemented by stack traces, logging, or debuggers in order to improve their relevance.


  1. Technically, Nix evaluation can trigger builds via “import from derivation”. However, with appropriate sandboxing even builds are mostly harmless. Either way, just assume for the purpose of discussion that Nix evaluation is safe. After all, any unsafety in evaluation only makes the case for static types even stronger.↩︎

Monday, January 4, 2021

The visitor pattern is essentially the same thing as Church encoding

This post explains how the visitor pattern is essentially the same thing as Church encoding (or Böhm-Berarducci encoding). It also explains how you can usefully employ the visitor pattern / Church encoding / Böhm-Berarducci encoding to expand your programming toolbox.

Background

Church encoding is named after Alonzo Church, who discovered that you could model any type of data structure in the untyped lambda calculus using only functions. The context for this was that he was trying to show that lambda calculus could be treated as a universal computational engine, even though the only features it supported were functions.

Note: Later on, Corrado Böhm and Alessandro Berarducci devised the equivalent solution in a typed lambda calculus (specifically, System F), so I’ll use “Church encoding” when talking about this trick in the context of an untyped language and “Böhm-Berarducci encoding” when talking about the same trick in the context of a typed language. If we’re not talking about any specific language then I’ll use “Church encoding”.

In particular, you can model the following types of data structures using language support for functions and nothing else:

  • records / structs (known as “product types” if you want to get fancy)

    The “product” of two types A and B is a type that stores both an A and a B (e.g. a record with two fields, where the first field has type A and the second has type B)

  • enums / tagged unions (known as “sum types”)

    The “sum” of two types A and B is a type that stores either an A or a B. (e.g. a tagged union where the first tag stores a value of type A and the second tag stores a value of type B)

  • recursive data structures

… and if you can precisely model product types, sum types, and recursion, then you can essentially model any data structure. I’m oversimplifying things, but that’s close enough to true for our purposes.
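
For example, here is a minimal sketch of the idea (my own illustration in Haskell, not code from this post) showing a product type and a sum type modeled using nothing but functions:

{-# LANGUAGE RankNTypes #-}

-- Illustrative sketch, not from the original post

-- Church-encoded product of `a` and `b`: to consume the pair you
-- supply a single function that receives both fields
type Pair a b = forall r . (a -> b -> r) -> r

makePair :: a -> b -> Pair a b
makePair a b = \f -> f a b

-- Church-encoded sum of `a` and `b`: to consume the sum you supply
-- one handler per case, and the value selects the matching handler
type Sum a b = forall r . (a -> r) -> (b -> r) -> r

left :: a -> Sum a b
left a = \onLeft _onRight -> onLeft a

right :: b -> Sum a b
right b = \_onLeft onRight -> onRight b

For instance, makePair 1 'x' (\a _ -> a) evaluates to 1: projecting a field or “pattern matching” on a case is just function application.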

Relevance

The reason we care about Church encoding is that not all programming languages natively support sum types or recursion (although most programming languages support product types in the form of records / structs).

However, most programming languages do support functions, so if we have functions then we can use them as a “backdoor” to introduce support for sum types or recursion into our language. This is the essence of the visitor pattern: using functions to Church-encode sum types or recursion into a language that does not natively support sum types or recursion.

To illustrate this, suppose that we begin from the following Haskell code:

data Shape
    = Circle{ x :: Double, y :: Double, r :: Double }
    | Rectangle{ x :: Double, y :: Double, w :: Double, h :: Double }

exampleCircle :: Shape
exampleCircle = Circle 2.0 1.4 4.5

exampleRectangle :: Shape
exampleRectangle = Rectangle 1.3 3.1 10.3 7.7

area :: Shape -> Double
area shape = case shape of
    Circle    x y r   -> pi * r ^ 2
    Rectangle x y w h -> w * h

main :: IO ()
main = do
    print (area exampleCircle)
    print (area exampleRectangle)

… but then we hypothetically disable Haskell’s support for algebraic data types. How would we amend our example to still work in such a restricted subset of the language?

We’d use Böhm-Berarducci encoding (the typed version of Church-encoding), and the solution would look like this:

{-# LANGUAGE RankNTypes #-}

-- | This plays the same role as the old `Shape` type
type Shape = forall shape
    .  (Double -> Double -> Double -> shape)
    -> (Double -> Double -> Double -> Double -> shape)
    -> shape

-- | This plays the same role as the old `Circle` constructor
_Circle :: Double -> Double -> Double -> Shape
_Circle x y r = \_Circle _Rectangle -> _Circle x y r

-- | This plays the same role as the old `Rectangle` constructor
_Rectangle :: Double -> Double -> Double -> Double -> Shape
_Rectangle x y w h = \_Circle _Rectangle -> _Rectangle x y w h

exampleCircle :: Shape
exampleCircle = _Circle 2.0 1.4 4.5

exampleRectangle :: Shape
exampleRectangle = _Rectangle 1.3 3.1 10.3 7.7 

area :: Shape -> Double
area shape = shape
    (\x y r   -> pi * r ^ 2)
    (\x y w h -> w * h)

main :: IO ()
main = do
    print (area exampleCircle)
    print (area exampleRectangle)

The key is the new representation of the Shape type, which is the type of a higher-order function. In fact, if we squint we might recognize that the Shape type synonym:

type Shape = forall shape
    .  (Double -> Double -> Double -> shape)
    -> (Double -> Double -> Double -> Double -> shape)
    -> shape

… looks an awful lot like a GADT-style definition for the Shape type:

{-# LANGUAGE GADTs #-}

data Shape where
    Circle    :: Double -> Double -> Double -> Shape
    Rectangle :: Double -> Double -> Double -> Double -> Shape

This is not a coincidence! Essentially, Böhm-Berarducci encoding models a type as a function that expects each “constructor” as a function argument that has the same type as that constructor. I put “constructor” in quotes since we never actually use a real constructor. Those function arguments are placeholders that will remain abstract until we attempt to “pattern match” on a value of type Shape.

In the area function we “pattern match” on Shape by supplying handlers instead of constructors. To make this explicit, let’s use equational reasoning to see what happens when we evaluate area exampleCircle:

area exampleCircle

-- Substitute the `area` function with its definition
= exampleCircle
    (\x y r   -> pi * r ^ 2)
    (\x y w h -> w * h)

-- Substitute `exampleCircle` with its definition
= _Circle 2.0 1.4 4.5
    (\x y r   -> pi * r ^ 2)
    (\x y w h -> w * h)

-- Substitute the `_Circle` function with its definition
= (\_Circle _Rectangle -> _Circle 2.0 1.4 4.5)
    (\x y r   -> pi * r ^ 2)
    (\x y w h -> w * h)

-- Evaluate the outer-most anonymous function
= (\x y r -> pi * r ^ 2) 2.0 1.4 4.5

-- Evaluate the anonymous function
= pi * 4.5 ^ 2

In other words, Church encoding / Böhm-Berarducci encoding both work by maintaining the fiction that eventually somebody will provide us the “real” constructors, right up until we actually need them. Then, when we “pattern match” on the value, we pull a last-minute bait-and-switch: each “handler” of the pattern match goes where the constructor would normally go, and everything works out so that we don’t need the constructor after all. Church encoding is sort of like the functional programming equivalent of “fake it until you make it”.

The same trick works for recursive data structures as well. For example, the way that we Böhm-Berarducci-encode this Haskell data structure:

data Tree = Node Int Tree Tree | Leaf

exampleTree :: Tree
exampleTree = Node 1 (Node 2 Leaf Leaf) (Node 3 Leaf Leaf)

preorder :: Tree -> [Int]
preorder tree = case tree of
    Node value left right -> value : preorder left ++ preorder right
    Leaf                  -> []

main :: IO ()
main = print (preorder exampleTree)

… is like this:

{-# LANGUAGE RankNTypes #-}

type Tree = forall tree
    .  (Int -> tree -> tree -> tree)  -- Node :: Int -> Tree -> Tree -> Tree
    -> tree                           -- Leaf :: Tree
    -> tree

_Node :: Int -> Tree -> Tree -> Tree
_Node value left right =
    \_Node _Leaf -> _Node value (left _Node _Leaf) (right _Node _Leaf)

_Leaf :: Tree
_Leaf = \_Node _Leaf -> _Leaf

exampleTree :: Tree
exampleTree = _Node 1 (_Node 2 _Leaf _Leaf) (_Node 3 _Leaf _Leaf)

preorder :: Tree -> [Int]
preorder tree = tree
    (\value left right -> value : left ++ right)
    []

main :: IO ()
main = print (preorder exampleTree)

This time the translation is not quite as mechanical as before, due to the introduction of recursion. In particular, two differences stand out.

First, the way we encode the _Node constructor is not as straightforward as before:

_Node :: Int -> Tree -> Tree -> Tree
_Node value left right =
    \_Node _Leaf -> _Node value (left _Node _Leaf) (right _Node _Leaf)

This is because we need to thread the _Node / _Leaf function arguments through to the node’s children.

Second, the way we consume the Tree is also different. Compare the original code:

preorder :: Tree -> [Int]
preorder tree = case tree of
    Node value left right -> value : preorder left ++ preorder right
    Leaf                  -> []

… to the Böhm-Berarducci-encoded version:

preorder :: Tree -> [Int]
preorder tree = tree
    (\value left right -> value : left ++ right)
    []

The latter version doesn’t require the preorder function to call itself recursively. The preorder function performs a task that is morally recursive but, strictly speaking, it is not recursive at all.

In fact, if we look closely at the Böhm-Berarducci-encoded solution, we see that we never use recursion anywhere within the code! There are no recursive datatypes and there are also no recursive functions, yet somehow we still managed to encode a recursive data type and recursive functions on that type. This is what I mean when I say that Church encoding / Böhm-Berarducci encoding let you encode recursion in a language that does not natively support recursion. Our code would work just fine in a recursion-free subset of Haskell!
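
To double-check this claim, we can repeat the equational reasoning exercise (my own worked example, in the same style as before) for a one-node tree and watch the recursion disappear:

preorder (_Node 1 _Leaf _Leaf)

-- Substitute the `preorder` function with its definition
= _Node 1 _Leaf _Leaf
    (\value left right -> value : left ++ right)
    []

-- Substitute the `_Node` function with its definition
= (\value left right -> value : left ++ right)
    1
    (_Leaf (\value left right -> value : left ++ right) [])
    (_Leaf (\value left right -> value : left ++ right) [])

-- Substitute each `_Leaf` with its definition: `_Leaf` returns its
-- second argument, which is `[]` here
= (\value left right -> value : left ++ right) 1 [] []

-- Evaluate the anonymous function
= 1 : [] ++ []

= [1]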

For example, Dhall is a real language that does not natively support recursion, yet Dhall uses this same trick to model recursive data types and recursive functions. There is a separate write-up on translating recursive code to Dhall that goes into more detail about the algorithm for Böhm-Berarducci-encoding Haskell types, so you might find it useful if the above examples were not sufficiently intuitive or clear.

Visitor pattern

The visitor pattern is a special case of Church encoding / Böhm-Berarducci encoding. I’m not going to provide a standalone explanation of the visitor pattern since the Wikipedia page on it already does that. This section will focus on explaining the correspondence between Church encoding / Böhm-Berarducci encoding and the visitor pattern.

The exact correspondence goes like this. Given:

  • a Church-encoded / Böhm-Berarducci-encoded type T

    e.g. Shape in the first example

  • … with constructors C₀, C₁, C₂, …

    e.g. Circle, Rectangle

  • … and values of type T named v₀, v₁, v₂, …

    e.g. exampleCircle, exampleRectangle

… then the correspondence (using terminology from the Wikipedia article) is:

  • The “element” class corresponds to the type T

    e.g. Shape

  • A “concrete element” (i.e. an object of the “element” class) corresponds to a constructor for the type T

    e.g. Circle, Rectangle

    The accept method of the element selects which handler from the visitor to use, in the same way that our Church-encoded constructors would select one handler (named after the matching constructor) out of all the handler functions supplied to them.

    _Circle :: Double -> Double -> Double -> Shape
    _Circle x y r = \_Circle _Rectangle -> _Circle x y r
    
    _Rectangle :: Double -> Double -> Double -> Double -> Shape
    _Rectangle x y w h = \_Circle _Rectangle -> _Rectangle x y w h

  • A “visitor” class corresponds to the type of a function that pattern matches on a value of type T

    Specifically, a “visitor” class is equivalent to the following Haskell type:

    T -> IO ()

    This is more restrictive than Böhm-Berarducci encoding, which permits pattern matches that return any type of value, like our area function, which returns a Double. In other words, Böhm-Berarducci encoding is not limited to just performing side effects when “visiting” constructors.

    (Edit: Travis Brown notes that the visitor pattern is not restricted to performing side effects. This might be an idiosyncrasy of how Wikipedia presents the design pattern.)

  • A “concrete visitor” (i.e. an object of the “visitor” class) corresponds to a function that “pattern matches” on a value of type T

    e.g. area

    … where each overloaded visit method of the visitor corresponds to a branch of our Church-encoded “pattern match”:

    area :: Shape -> Double
    area shape = shape
        (\x y r   -> pi * r ^ 2)
        (\x y w h -> w * h)

  • The “client” corresponds to a value of type T

    e.g. exampleCircle, exampleRectangle:

    exampleCircle :: Shape
    exampleCircle = _Circle 2.0 1.4 4.5
    
    exampleRectangle :: Shape
    exampleRectangle = _Rectangle 1.3 3.1 10.3 7.7 

    The Wikipedia explanation of the visitor pattern adds the wrinkle that the client can represent more than one such value. In my opinion, what the visitor pattern should say is that the client can be a recursive value which may have self-similar children (like our example Tree). This small change would improve the correspondence between the visitor pattern and Church-encoding.
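
To make the correspondence more tangible, here is a short sketch (my own, with hypothetical names) that transliterates the object-oriented presentation into Haskell, bundling the visit methods into a record that plays the role of the “visitor” class:

-- Illustrative sketch, not from the original post

-- The “visitor” class: one visit method per “concrete element”
data ShapeVisitor = ShapeVisitor
    { visitCircle    :: Double -> Double -> Double -> IO ()
    , visitRectangle :: Double -> Double -> Double -> Double -> IO ()
    }

-- The “element” class: anything with an `accept` method
type Shape = ShapeVisitor -> IO ()

-- “Concrete elements”: each constructor selects its matching visit
-- method, just like the Church-encoded constructors select a handler
circle :: Double -> Double -> Double -> Shape
circle x y r = \visitor -> visitCircle visitor x y r

rectangle :: Double -> Double -> Double -> Double -> Shape
rectangle x y w h = \visitor -> visitRectangle visitor x y w h

-- A “concrete visitor”, corresponding to one of our pattern matches
areaPrinter :: ShapeVisitor
areaPrinter = ShapeVisitor
    { visitCircle    = \_x _y r   -> print (pi * r ^ 2)
    , visitRectangle = \_x _y w h -> print (w * h)
    }

main :: IO ()
main = circle 2.0 1.4 4.5 areaPrinter

Currying away the record recovers exactly the Böhm-Berarducci encoding, except specialized to the result type IO (), which foreshadows the limitation discussed next.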

Limitations of Böhm-Berarducci encoding

Church encoding works in any untyped language, but Böhm-Berarducci encoding does not work in all typed languages!

Specifically, Böhm-Berarducci encoding only works in general for languages that support polymorphic types (a.k.a. generic programming). This is because the type of a Böhm-Berarducci-encoded value is a polymorphic type:

type Shape = forall shape
    .  (Double -> Double -> Double -> shape)
    -> (Double -> Double -> Double -> Double -> shape)
    -> shape

… but such a type cannot be represented in a language that lacks polymorphism. So, to work around this limitation, the visitor pattern commonly picks a specific result type, and since there isn’t a one-size-fits-all result type, implementations usually make the result a side effect, as if we had specialized the universally quantified type to IO ():

type Shape =
       (Double -> Double -> Double -> IO ())
    -> (Double -> Double -> Double -> Double -> IO ())
    -> IO ()
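
To spell out the cost of this specialization, here is a quick sketch (my own, with hypothetical names) showing that once the result type is fixed to IO (), consumers can only perform side effects:

-- Illustrative sketch, not from the original post
-- (the monomorphic `Shape` type is repeated from above)
type Shape =
       (Double -> Double -> Double -> IO ())
    -> (Double -> Double -> Double -> Double -> IO ())
    -> IO ()

_Circle :: Double -> Double -> Double -> Shape
_Circle x y r = \_Circle _Rectangle -> _Circle x y r

-- The best we can do is print the area as a side effect …
printArea :: Shape -> IO ()
printArea shape = shape
    (\_x _y r   -> print (pi * r ^ 2))
    (\_x _y w h -> print (w * h))

-- … whereas a pure `area :: Shape -> Double` can no longer be
-- written, because every handler is forced to return `IO ()`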

This is why Go has great difficulty modeling sum types accurately: Go does not support polymorphism (“generics”), so Böhm-Berarducci encoding does not work in general for introducing sum types in Go. This is also why people with programming language theory backgrounds make a bigger deal out of Go’s lack of generics than Go’s lack of sum types: if Go had generics, then people could work around the lack of sum types using a Böhm-Berarducci encoding.

Conclusions

Hopefully this gives you a better idea of what Church encoding and Böhm-Berarducci encoding are and how they relate to the visitor pattern.

In my opinion, Böhm-Berarducci encoding is a bigger deal in statically-typed languages because it provides a way to introduce sum types and recursion into a language in a type-safe way that makes invalid states unrepresentable. Conversely, Church encoding is not as big of a deal in dynamically-typed languages because a Church-encoded type is still vulnerable to runtime exceptions.