Saturday, February 6, 2021

Folds are constructor substitution

folds

I notice that functional programming beginners and experts understand the word “fold” to mean subtly different things, so I’d like to explain what experienced functional programmers usually mean when they use the term “fold”. This post assumes a passing familiarity with Haskell.

Overview

A “fold” is a function that replaces all constructors of a datatype with corresponding expressions. “fold”s are not limited to lists, linear sequences, or even containers; you can fold any inductively defined datatype.

To explain the more general notion of a “fold”, we’ll consider three representative data structures:

  • lists
  • Maybe values
  • binary trees

… and show how we can automatically derive the “one true fold” for each data structure by following the same general principle.

Lists

Many beginners understand the word “fold” to be a way to reduce some collection of values (e.g. a list) to a single value. For example, in Haskell you can add up the elements of a list like this:

sum :: [Int] -> Int
sum xs = foldr (+) 0 xs

… where sum reduces a sequence of Ints to a single Int by starting from an initial accumulator value of 0 and then “folding” each element of the list into the accumulator using (+).

Haskell’s standard library provides at least two fold functions named foldl and foldr, but only foldr is the “canonical” fold for a list. By “canonical” I mean that foldr is the only fold that works by substituting list constructors.

We can more easily see this if we define our own linked list type with explicitly named constructors:

data List a = Cons a (List a) | Nil

… where instead of writing a list as [ 1, 2, 3 ] we will write such a list as:

example :: List Int
example = Cons 1 (Cons 2 (Cons 3 Nil))

This is a very unergonomic representation for a list, but bear with me!

We can implement the “canonical” fold for the above List type as a function that takes two arguments:

  • The first argument (named cons) replaces all occurrences of the Cons constructor
  • The second argument (named nil) replaces all occurrences of the Nil constructor

The implementation of the canonical fold looks like this:

fold :: (a -> list -> list) -> list -> List a -> list
fold cons nil (Cons x xs) = cons x (fold cons nil xs)
fold cons nil  Nil        = nil

You might not immediately follow how that implementation works, so a more direct way to appreciate what fold does is to see how it behaves on some sample inputs:

-- The general case, step-by-step
fold cons nil (Cons x (Cons y (Cons z Nil)))
    = cons x (fold cons nil (Cons y (Cons z Nil)))
    = cons x (cons y (fold cons nil (Cons z Nil)))
    = cons x (cons y (cons z (fold cons nil Nil)))
    = cons x (cons y (cons z nil))

-- Add up the elements of the list, but skipping more steps this time
fold (+) 0 (Cons 1 (Cons 2 (Cons 3 Nil)))
  = (+) 1 ((+) 2 ((+) 3 0))
  = 1 + (2 + (3 + 0))
  = 6

-- Calculate the list length
fold (\_ n -> n + 1) 0 (Cons True (Cons False (Cons True Nil)))
  = (\_ n -> n + 1) True ((\_ n -> n + 1) False ((\_ n -> n + 1) True 0))
  = (\_ n -> n + 1) True ((\_ n -> n + 1) False 1)
  = (\_ n -> n + 1) True 2
  = 3

Notice that if we format the type of fold a bit we can see that the type of each argument to fold (sort of) matches the type of the corresponding constructor it replaces:

fold
    :: (a -> list -> list)  -- Cons :: a -> List a -> List a
    -> list                 -- Nil  :: List a
    -> List a
    -> list

In the above type, list is actually a type variable and we could have used any name for that type variable instead of list, such as b. In fact, if we were to replace list with b, we would get essentially the same type as foldr for Haskell lists:

-- Our `fold` type, replacing `list` with `b`
fold
    :: (a -> b -> b)
    -> b
    -> List a
    -> b

-- Now compare that type to the `foldr` type from the Prelude:
foldr
    :: (a -> b -> b)
    -> b
    -> [a]
    -> b

We commonly use folds to reduce a List to a single scalar value, but folds are actually much more general-purpose than that: they can also transform one data structure into another. For example, we can use the same fold function to convert our clumsy List type into the standard Haskell list type, like this:

fold (:) [] (Cons 1 (Cons 2 (Cons 3 Nil)))
    = (:) 1 ((:) 2 ((:) 3 []))
    = 1 : (2 : (3 : []))
    = [ 1, 2, 3 ]
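
If we wanted a named function for that conversion, it would be a one-liner (toList is a name of my own choosing):

toList :: List a -> [a]
toList = fold (:) []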

Maybe

Folds are not limited to recursive data types. For example, here is the canonical fold for Haskell’s Maybe type, which is not recursive:

data Maybe a = Nothing | Just a

fold :: maybe -> (a -> maybe) -> Maybe a -> maybe
fold nothing just  Nothing  = nothing
fold nothing just (Just x ) = just x

In fact, this function already exists in Haskell’s standard library by the name of maybe:

maybe :: b -> (a -> b) -> Maybe a -> b
maybe n _ Nothing  = n
maybe _ f (Just x) = f x
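
For example, here is maybe in action, reducing a Maybe Int to an Int:

maybe 0 (+ 1) (Just 2)
    = (+ 1) 2
    = 3

maybe 0 (+ 1) Nothing
    = 0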

Once you think of folds in terms of constructor substitution you can quickly spot these canonical folds for other types.
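
For example, the canonical fold for Haskell’s Either type replaces the Left and Right constructors with two handlers, and the standard library already provides it as the either function:

data Either a b = Left a | Right b

either :: (a -> c) -> (b -> c) -> Either a b -> c
either left _     (Left  a) = left  a
either _    right (Right b) = right b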

Binary trees

What about more complex data structures, like the following binary Tree type?

data Tree a = Node a (Tree a) (Tree a) | Leaf

This sort of fold is still straightforward to write, by applying the same principle of constructor substitution:

fold :: (a -> tree -> tree -> tree) -> tree -> Tree a -> tree
fold node leaf (Node x l r) = node x (fold node leaf l) (fold node leaf r)
fold node leaf  Leaf        = leaf

We only need to keep recursively descending over the Tree, replacing constructors as we go.

We can use this fold to reduce the Tree to a single value, like this:

-- Add up all the nodes in the tree
fold (\x l r -> x + l + r) 0 (Node 1 (Node 2 Leaf Leaf) (Node 3 Leaf Leaf))
    = (\x l r -> x + l + r) 1
        ((\x l r -> x + l + r) 2 0 0)
        ((\x l r -> x + l + r) 3 0 0)
    = (\x l r -> x + l + r) 1
        (2 + 0 + 0)
        (3 + 0 + 0)
    = (\x l r -> x + l + r) 1
        2
        3
    = 1 + 2 + 3
    = 6

… or we can use the same fold function to transform the Tree into another data structure, like a list:

-- List `Tree` elements in pre-order
fold (\x l r -> x : l ++ r) [] (Node 1 (Node 2 Leaf Leaf) (Node 3 Leaf Leaf))
    = (\x l r -> x : l ++ r) 1
        ((\x l r -> x : l ++ r) 2 [] [])
        ((\x l r -> x : l ++ r) 3 [] [])
    = (\x l r -> x : l ++ r) 1
        (2 : [] ++ [])
        (3 : [] ++ [])
    = (\x l r -> x : l ++ r) 1
        [2]
        [3]
    = 1 : [2] ++ [3]
    = [1, 2, 3]

… or we can even use the fold to mirror the tree, swapping left and right children:

fold (\x l r -> Node x r l) Leaf (Node 1 (Node 2 Leaf Leaf) (Node 3 Leaf Leaf))
    = (\x l r -> Node x r l) 1
        ((\x l r -> Node x r l) 2 Leaf Leaf)
        ((\x l r -> Node x r l) 3 Leaf Leaf)
    = (\x l r -> Node x r l) 1
        (Node 2 Leaf Leaf)
        (Node 3 Leaf Leaf)
    = Node 1 (Node 3 Leaf Leaf) (Node 2 Leaf Leaf)

Generality

At this point you might be wondering: “what can’t a fold do?”. The answer is: you can do essentially anything with a fold, although it might not necessarily be the most efficient solution to your problem. You can think of a fold as the most general-purpose interface for consuming a data structure because the fold interface is a “lossless” way to process a data structure.

To see why a fold is a “lossless” interface, let’s revisit the fold function for Trees and this time we will pass in the Node and Leaf constructors as the inputs to the fold. In other words, we will replace all occurrences of Node with Node and replace all occurrences of Leaf with Leaf:

fold Node Leaf (Node 1 (Node 2 Leaf Leaf) (Node 3 Leaf Leaf))
    = Node 1 (Node 2 Leaf Leaf) (Node 3 Leaf Leaf)
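
The same identity fold works for our earlier List type, substituting Cons for Cons and Nil for Nil:

fold Cons Nil (Cons 1 (Cons 2 (Cons 3 Nil)))
    = Cons 1 (Cons 2 (Cons 3 Nil))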

This gives us back the original data structure, demonstrating that a fold always gives us the option to recover the original, pristine input. This is what I mean when I say that a fold is a lossless interface.

Sunday, January 31, 2021

Dynamic type errors lack relevance

pure-exceptions

Proponents of statically typed languages commonly motivate types as a way to safely detect bugs ahead of time. For example, consider the following Python program that attempts to increment a number stored in counter.txt:

# ./increment.py

with open('counter.txt', 'r') as handle:
    x = handle.readline()

with open('counter.txt', 'w') as handle:
    handle.write(int(x) + 1)

This program contains a type error, but by the time we find out it’s too late: our program will have already wiped the contents of counter.txt by opening the file as a writable handle:

$ echo -n '0' > ./counter.txt

$ cat counter.txt
0

$ python increment.py
Traceback (most recent call last):
  File "increment.py", line 5, in <module>
    handle.write(int(x) + 1)
TypeError: expected a string or other character buffer object

$ cat counter.txt  # The contents of the file were lost

Defenders of dynamically typed languages sometimes counter that these pitfalls do not matter when runtime failures are mostly harmless. If you want to find errors in your program, just run the program!

As an extreme example, Nix is a purely functional language with a dynamic type system, and you can safely interpret a Nix program ahead of time to detect errors since Nix evaluation has no side effects¹. Consequently, Nix proponents sometimes reason that these dynamic type errors are functionally indistinguishable from static type errors thanks to Nix’s purity.

However, dynamic types are not a substitute for static types, even in a purely functional language like Nix. To see why, consider the following Nix expression, which attempts to render a structured value as command-line options:

# ./options.nix

let
  pkgs = import <nixpkgs> { };

  enable = option: "${option}=true";

  disable = option: "${option}=false";

in
  pkgs.lib.cli.toGNUCommandLine { }
    { option = [
        "max-jobs=5"
        "cores=4"
        enable "fallback"
      ];
    }

The intention was to produce this result:

[ "--option" "max-jobs=5" "--option" "cores=4" "--option" "fallback=true" ]

… but we actually get a dynamic type error when we interpret the expression:

$ nix-instantiate --eval options.nix --strict
error: evaluation aborted with the following error message: 'generators.mkValueStringDefault: functions not supported: <λ>'

This error message is not very helpful, and it’s not due to a lack of effort, funding, or attention. This sort of poor user experience is inherent to any dynamic type system.

The fundamental issue is that in a dynamically typed language you cannot explain errors to the user in terms of the source code they wrote. In other words, dynamic type errors commonly fail to be relevant to the user.

For example, if Nix had a typical static type system, then the diagnostic might have looked something like this:

# ./options.nix

let
  pkgs = import <nixpkgs> { };

  enable = option: "${option}=true";

  disable = option: "${option}=false";

in
  pkgs.lib.cli.toGNUCommandLine { }
    { option = [
        "max-jobs=5"
        "cores=4"
        enable "fallback"
#       ~~~~~~
#       This element of the list is not a string
      ];
    }

This sort of diagnostic helps us more easily discern that we forgot to parenthesize (enable "fallback"), so the unapplied enable function and the "fallback" string are each treated as separate list elements.
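
For contrast, here is a rough Haskell analogue of the same mistake (a sketch with names of my own choosing; the exact compiler message varies by GHC version). A static type checker rejects the unapplied function before the program ever runs, and the error points at the offending expression:

enable :: String -> String
enable option = option ++ "=true"

options :: [String]
options = [ "max-jobs=5", "cores=4", enable ]

-- error:
--     • Couldn't match expected type ‘String’
--                   with actual type ‘String -> String’
--     • Probable cause: ‘enable’ is applied to too few arguments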

In a dynamic type system, type errors can potentially be far removed from the code that the user wrote. From Nix’s point of view, the actual error is that somewhere in the middle of interpretation it is trying to apply the mkValueStringDefault utility function to the user’s enable function:

mkValueStringDefault enable

… but by that point the Nix interpreter is no longer “thinking” in terms of the original program the user wrote, so any interpreter diagnostics will have difficulty explaining the error in terms that the user can understand. For example:

  • In the middle of interpretation any offending subexpressions are abstract syntax trees, not source code

  • Some of these abstract syntax trees may be functions or closures that cannot be (easily) displayed to the user

    We see this above where the error message is unable to render the enable function so it falls back to displaying <λ>.

  • Intermediate evaluation results might not correspond to the source code at all

    For example, the user might not understand where mkValueStringDefault comes from in the absence of a stack trace.

  • Even if we could trace subexpressions to their original source code the user still might not be able to work backwards from the dynamic type error to the real problem.

    In other words, even if we showed the user the call site for the mkValueStringDefault function they still wouldn’t necessarily understand why enable is the function argument.

In fact, the example error message came out better than I expected. The reason why is because somebody took the time to add a custom error message to the mkValueStringDefault utility instead of falling back on the interpreter throwing a dynamic type error:

  mkValueStringDefault = {}: v: with builtins;
    let err = t: v: abort
          ("generators.mkValueStringDefault: " +
           "${t} not supported: ${toPretty {} v}");
    in

Had they not done so then the error message would have been even further disconnected from the user’s experience. This only reinforces that the relevance of error messages is inversely proportional to the extent to which we avail ourselves of the dynamic type system.

This is why I prefer to lean on static type systems as much as possible to detect errors, because they tend to do a better job of “explaining” what went wrong than dynamic type systems.

Note: The criticisms in this post also apply to exceptions in general (where you can view dynamic types as a special case of exceptions auto-generated by the interpreter). Exceptions also need to be supplemented by stack traces, logging, or debuggers in order to improve their relevance.


  1. Technically, Nix evaluation can trigger builds via “import from derivation”. However, with appropriate sandboxing even builds are mostly harmless. Either way, just assume for the purpose of discussion that Nix evaluation is safe. After all, any unsafety in evaluation only makes the case for static types even stronger.↩︎

Monday, January 4, 2021

The visitor pattern is essentially the same thing as Church encoding

church-encoding

This post explains how the visitor pattern is essentially the same thing as Church encoding (or Böhm-Berarducci encoding). This post also explains how you can usefully employ the visitor pattern / Church encoding / Böhm-Berarducci encoding to expand your programming toolbox.

Background

Church encoding is named after Alonzo Church, who discovered that you could model any type of data structure in the untyped lambda calculus using only functions. The context for this was that he was trying to show that lambda calculus could be treated as a universal computational engine, even though the only features it supported were functions.

Note: Later on, Corrado Böhm and Alessandro Berarducci devised the equivalent solution in a typed lambda calculus (specifically, System F), so I’ll use “Church encoding” when talking about this trick in the context of an untyped language and “Böhm-Berarducci encoding” when talking about the same trick in the context of a typed language. If we’re not talking about any specific language then I’ll use “Church encoding”.

In particular, you can model the following types of data structures using language support for functions and nothing else:

  • records / structs (known as “product types” if you want to get fancy)

    The “product” of two types A and B is a type that stores both an A and a B (e.g. a record with two fields, where the first field has type A and the second has type B)

  • enums / tagged unions (known as “sum types”)

    The “sum” of two types A and B is a type that stores either an A or a B. (e.g. a tagged union where the first tag stores a value of type A and the second tag stores a value of type B)

  • recursive data structures

… and if you can precisely model product types, sum types, and recursion, then you can essentially model any data structure. I’m oversimplifying things, but that’s close enough to true for our purposes.
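
To give a quick taste of the trick before relating it to the visitor pattern, here is a sketch of how a product type can be modeled using nothing but functions (Pair, makePair, first, and second are names of my own choosing):

{-# LANGUAGE RankNTypes #-}

-- A pair of an `a` and a `b`, represented as a function that hands both
-- components to whatever consumer we supply
type Pair a b = forall r . (a -> b -> r) -> r

makePair :: a -> b -> Pair a b
makePair a b = \f -> f a b

first :: Pair a b -> a
first pair = pair (\a _ -> a)

second :: Pair a b -> b
second pair = pair (\_ b -> b)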

Relevance

The reason we care about Church encoding is that not all programming languages natively support sum types or recursion (although most programming languages support product types in the form of records / structs).

However, most programming languages do support functions, so if we have functions then we can use them as a “backdoor” to introduce support for sum types or recursion into our language. This is the essence of the visitor pattern: using functions to Church-encode sum types or recursion into a language that does not natively support sum types or recursion.

To illustrate this, suppose that we begin from the following Haskell code:

data Shape
    = Circle{ x :: Double, y :: Double, r :: Double }
    | Rectangle{ x :: Double, y :: Double, w :: Double, h :: Double }

exampleCircle :: Shape
exampleCircle = Circle 2.0 1.4 4.5

exampleRectangle :: Shape
exampleRectangle = Rectangle 1.3 3.1 10.3 7.7

area :: Shape -> Double
area shape = case shape of
    Circle    x y r   -> pi * r ^ 2
    Rectangle x y w h -> w * h

main :: IO ()
main = do
    print (area exampleCircle)
    print (area exampleRectangle)

… but then we hypothetically disable Haskell’s support for algebraic data types. How would we amend our example to still work in such a restricted subset of the language?

We’d use Böhm-Berarducci encoding (the typed version of Church-encoding), and the solution would look like this:

{-# LANGUAGE RankNTypes #-}

-- | This plays the same role as the old `Shape` type
type Shape = forall shape
    .  (Double -> Double -> Double -> shape)
    -> (Double -> Double -> Double -> Double -> shape)
    -> shape

-- | This plays the same role as the old `Circle` constructor
_Circle :: Double -> Double -> Double -> Shape
_Circle x y r = \_Circle _Rectangle -> _Circle x y r

-- | This plays the same role as the old `Rectangle` constructor
_Rectangle :: Double -> Double -> Double -> Double -> Shape
_Rectangle x y w h = \_Circle _Rectangle -> _Rectangle x y w h

exampleCircle :: Shape
exampleCircle = _Circle 2.0 1.4 4.5

exampleRectangle :: Shape
exampleRectangle = _Rectangle 1.3 3.1 10.3 7.7 

area :: Shape -> Double
area shape = shape
    (\x y r   -> pi * r ^ 2)
    (\x y w h -> w * h)

main :: IO ()
main = do
    print (area exampleCircle)
    print (area exampleRectangle)

The key is the new representation of the Shape type, which is the type of a higher-order function. In fact, if we squint we might recognize that the Shape type synonym:

type Shape = forall shape
    .  (Double -> Double -> Double -> shape)
    -> (Double -> Double -> Double -> Double -> shape)
    -> shape

… looks an awful lot like a GADT-style definition for the Shape type:

{-# LANGUAGE GADTs #-}

data Shape where
    Circle    :: Double -> Double -> Double -> Shape
    Rectangle :: Double -> Double -> Double -> Double -> Shape

This is not a coincidence! Essentially, Böhm-Berarducci encoding models a type as a function that expects each “constructor” as a function argument that has the same type as that constructor. I put “constructor” in quotes since we never actually use a real constructor. Those function arguments are place-holders that will remain abstract until we attempt to “pattern match” on a value of type Shape.

In the area function we “pattern match” on Shape by supplying handlers instead of constructors. To make this explicit, let’s use equational reasoning to see what happens when we evaluate area exampleCircle:

area exampleCircle

-- Substitute the `area` function with its definition
= exampleCircle
    (\x y r   -> pi * r ^ 2)
    (\x y w h -> w * h)

-- Substitute `exampleCircle` with its definition
= _Circle 2.0 1.4 4.5
    (\x y r   -> pi * r ^ 2)
    (\x y w h -> w * h)

-- Substitute the `_Circle` function with its definition
= (\_Circle _Rectangle -> _Circle 2.0 1.4 4.5)
    (\x y r   -> pi * r ^ 2)
    (\x y w h -> w * h)

-- Evaluate the outer-most anonymous function
= (\x y r -> pi * r ^ 2) 2.0 1.4 4.5

-- Evaluate the anonymous function
= pi * 4.5 ^ 2

In other words, Church encoding / Böhm-Berarducci encoding both work by maintaining a fiction that eventually somebody will provide us the “real” constructors, right up until we actually need them. Then, when we “pattern match” on the value, we pull a last-minute bait-and-switch: we use each “handler” of the pattern match where the constructor would normally go, and everything works out so that we don’t need the constructor after all. Church encoding is sort of like the functional programming equivalent of “fake it until you make it”.

The same trick works for recursive data structures as well. For example, the way that we Böhm-Berarducci-encode this Haskell data structure:

data Tree = Node Int Tree Tree | Leaf

exampleTree :: Tree
exampleTree = Node 1 (Node 2 Leaf Leaf) (Node 3 Leaf Leaf)

preorder :: Tree -> [Int]
preorder tree = case tree of
    Node value left right -> value : preorder left ++ preorder right
    Leaf                  -> []

main :: IO ()
main = print (preorder exampleTree)

… is like this:

{-# LANGUAGE RankNTypes #-}

type Tree = forall tree
    .  (Int -> tree -> tree -> tree)  -- Node :: Int -> Tree -> Tree -> Tree
    -> tree                           -- Leaf :: Tree
    -> tree

_Node :: Int -> Tree -> Tree -> Tree
_Node value left right =
    \_Node _Leaf -> _Node value (left _Node _Leaf) (right _Node _Leaf)

_Leaf :: Tree
_Leaf = \_Node _Leaf -> _Leaf

exampleTree :: Tree
exampleTree = _Node 1 (_Node 2 _Leaf _Leaf) (_Node 3 _Leaf _Leaf)

preorder :: Tree -> [Int]
preorder tree = tree
    (\value left right -> value : left ++ right)
    []

main :: IO ()
main = print (preorder exampleTree)

This time the translation is not quite as mechanical as before, due to the introduction of recursion. In particular, two differences stand out.

First, the way we encode the _Node constructor is not as straightforward as we thought:

_Node :: Int -> Tree -> Tree -> Tree
_Node value left right =
    \_Node _Leaf -> _Node value (left _Node _Leaf) (right _Node _Leaf)

This is because we need to thread the _Node / _Leaf function arguments through to the node’s children.
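
To see this threading in action, we can use equational reasoning to evaluate preorder on a one-node tree:

preorder (_Node 1 _Leaf _Leaf)

-- Substitute `preorder` with its definition
= _Node 1 _Leaf _Leaf (\value left right -> value : left ++ right) []

-- Substitute `_Node` and `_Leaf` with their definitions
= (\value left right -> value : left ++ right) 1 [] []

-- Evaluate the anonymous function
= 1 : [] ++ []

= [ 1 ]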

Second, the way we consume the Tree is also different. Compare the original code:

preorder :: Tree -> [Int]
preorder tree = case tree of
    Node value left right -> value : preorder left ++ preorder right
    Leaf                  -> []

… to the Böhm-Berarducci-encoded version:

preorder :: Tree -> [Int]
preorder tree = tree
    (\value left right -> value : left ++ right)
    []

The latter version doesn’t require the preorder function to recursively call itself. The preorder function performs a task that is morally recursive but, strictly speaking, it is not recursive at all.

In fact, if we look at the Böhm-Berarducci-encoded solution closely we see that we never use recursion anywhere within the code! There are no recursive datatypes and there are also no recursive functions, yet somehow we still managed to encode a recursive data type and recursive functions on that type. This is what I mean when I say that Church encoding / Böhm-Berarducci encoding let you encode recursion in a language that does not natively support recursion. Our code would work just fine in a recursion-free subset of Haskell!
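
To drive the point home, here is one more recursion-free consumer of the encoded Tree (sumTree is a name of my own choosing):

sumTree :: Tree -> Int
sumTree tree = tree (\value left right -> value + left + right) 0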

For example, Dhall is a real language that does not natively support recursion, and Dhall uses this same trick to model recursive data types and recursive functions. The Dhall documentation on translating recursive code to Dhall goes into more detail about the algorithm for Böhm-Berarducci-encoding Haskell types, so you might find it useful if the above examples were not sufficiently intuitive or clear.

Visitor pattern

The visitor pattern is a special case of Church encoding / Böhm-Berarducci encoding. I’m not going to provide a standalone explanation of the visitor pattern since the Wikipedia page on the subject already does that. This section will focus on explaining the correspondence between Church encoding / Böhm-Berarducci encoding and the visitor pattern.

The exact correspondence goes like this. Given:

  • a Church-encoded / Böhm-Berarducci-encoded type T

    e.g. Shape in the first example

  • … with constructors C₀, C₁, C₂, …

    e.g. Circle, Rectangle

  • … and values of type T named v₀, v₁, v₂, …

    e.g. exampleCircle, exampleRectangle

… then the correspondence (using terminology from the Wikipedia article) is:

  • The “element” class corresponds to the type T

    e.g. Shape

  • A “concrete element” (i.e. an object of the “element” class) corresponds to a constructor for the type T

    e.g. Circle, Rectangle

    The accept method of the element selects which handler from the visitor to use, in the same way that our Church-encoded constructors would select one handler (named after the matching constructor) out of all the handler functions supplied to them.

    _Circle :: Double -> Double -> Double -> Shape
    _Circle x y r = \_Circle _Rectangle -> _Circle x y r
    
    _Rectangle :: Double -> Double -> Double -> Double -> Shape
    _Rectangle x y w h = \_Circle _Rectangle -> _Rectangle x y w h
  • A “visitor” class corresponds to the type of a function that pattern matches on a value of type T

    Specifically, a “visitor” class is equivalent to the following Haskell type:

    T -> IO ()

    This is more restrictive than Böhm-Berarducci encoding, which permits pattern matches that return any type of value, like our area function, which returns a Double. In other words, Böhm-Berarducci encoding is not limited to just performing side effects when “visiting” constructors. (A sketch of such a side-effecting “visitor” appears after this list.)

    (Edit: Travis Brown notes that the visitor pattern is not restricted to performing side effects. This might be an idiosyncrasy of how Wikipedia presents the design pattern)

  • A “concrete visitor” (i.e. an object of the “visitor” class) corresponds to a function that “pattern matches” on a value of type T

    e.g. area

    … where each overloaded visit method of the visitor corresponds to a branch of our Church-encoded “pattern match”:

    area :: Shape -> Double
    area shape = shape
        (\x y r   -> pi * r ^ 2)
        (\x y w h -> w * h)
  • The “client” corresponds to a value of type T

    e.g. exampleCircle, exampleRectangle:

    exampleCircle :: Shape
    exampleCircle = _Circle 2.0 1.4 4.5
    
    exampleRectangle :: Shape
    exampleRectangle = _Rectangle 1.3 3.1 10.3 7.7 

    The Wikipedia explanation of the visitor pattern adds the wrinkle that the client can represent more than one such value. In my opinion, what the visitor pattern should say is that the client can be a recursive value which may have self-similar children (like our example Tree). This small change would improve the correspondence between the visitor pattern and Church-encoding.
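
As promised above, here is a sketch of what such a side-effecting “visitor” looks like when we specialize the result type of our Shape encoding to IO () (printShape is a name of my own choosing):

printShape :: Shape -> IO ()
printShape shape = shape
    (\x y r   -> putStrLn ("Circle with radius " ++ show r))
    (\x y w h -> putStrLn ("Rectangle measuring " ++ show w ++ " by " ++ show h))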

Limitations of Böhm-Berarducci encoding

Church encoding works in any untyped language, but Böhm-Berarducci encoding does not work in all typed languages!

Specifically, Böhm-Berarducci encoding only works in general for languages that support polymorphic types (a.k.a. generic programming). This is because the type of a Böhm-Berarducci-encoded value is a polymorphic type:

type Shape = forall shape
    .  (Double -> Double -> Double -> shape)
    -> (Double -> Double -> Double -> Double -> shape)
    -> shape

… but such a type cannot be represented in a language that lacks polymorphism. So what the visitor pattern commonly does to work around this limitation is to pick a specific result type, and since there isn’t a one-size-fits-all type, they’ll usually make the result a side effect, as if we had specialized the universally quantified type to IO ():

type Shape
    =  (Double -> Double -> Double -> IO ())
    -> (Double -> Double -> Double -> Double -> IO ())
    -> IO ()

This is why Go has great difficulty modeling sum types accurately, because Go does not support polymorphism (“generics”) and therefore Böhm-Berarducci encoding does not work in general for introducing sum types in Go. This is also why people with programming language theory backgrounds make a bigger deal out of Go’s lack of generics than Go’s lack of sum types, because if Go had generics then people could work around the lack of sum types using a Böhm-Berarducci encoding.

Conclusions

Hopefully this gives you a better idea of what Church encoding and Böhm-Berarducci encoding are and how they relate to the visitor pattern.

In my opinion, Böhm-Berarducci encoding is a bigger deal in statically-typed languages because it provides a way to introduce sum types and recursion into a language in a type-safe way that makes invalid states unrepresentable. Conversely, Church encoding is not as big of a deal in dynamically-typed languages because a Church-encoded type is still vulnerable to runtime exceptions.

Friday, December 4, 2020

Recruiting for diversity is not lowering the bar

diversity

I believe there are multiple reasons why hiring processes should account for diversity, but I’d like to use this post to address a common counterargument that people raise in discussions about diversity.

The argument typically goes like this: “diversity is important, but we won’t lower the bar”. I believe that this line of reasoning is flawed in a few ways that I’d like to highlight.

The blub paradox

One common source of hiring bias is the inability to recognize or appreciate strengths greater than, or different from, one’s own. I would like to make an analogy to Paul Graham’s post on Beating the Averages:

As long as our hypothetical Blub programmer is looking down the power continuum, he knows he’s looking down. Languages less powerful than Blub are obviously less powerful, because they’re missing some feature he’s used to. But when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn’t realize he’s looking up. What he sees are merely weird languages. He probably considers them about equivalent in power to Blub, but with all this other hairy stuff thrown in as well. Blub is good enough for him, because he thinks in Blub.

The above bias that people apply when evaluating programming languages also applies when evaluating candidates! People naturally prefer to hire people who share similar strengths, because they recognize and appreciate the value of those strengths. However, when confronted with strengths different from their own they may not interview for those strengths or even recognize them as strengths at all. Quite the opposite: they may view the candidate as “weird” or “not a culture fit” for not cultivating the “right” strengths.

The notion of a “hiring bar” presumes that candidates can all be ordered on a line and those on one side of some cutoff should not be hired. This linear metaphor reinforces our biases around hiring candidates whose strengths align with our own: “Let’s take what I’m good at, and aim to hire somebody who is at least 10% better at that”.

Overcoming adversity

“Diversity hires” can also be much stronger than you appreciate, even when you evaluate them according to strengths that you are trained to recognize.

This is because underrepresented minorities often have to swim upstream against institutionalized discrimination and work harder just to reach the same accomplishments and milestones as their majority peers. A minority candidate can outperform your initial impression of them if you can remove these discriminatory barriers within your workplace.

Recruiting diverse candidates does not lower the bar

This post explains the concept well:

First, the idea that reaching a more diverse talent pool requires lowering the bar on quality reflects an insidious form of prejudice: somehow the pool of talent is thought of as a monolithic block, the top of which is predominantly white and male; it is only by going farther down from the top that more diverse candidates can be found. In reality, companies that struggle to attract diverse candidates are probably not making the effort to look in the right places.

Recruiting underrepresented minorities does not dilute the talent pool, unless you assume that you are already interviewing the best of the best (unlikely). In reality, you’re likely recruiting people who are easiest to find: those who already share the same professional networks and backgrounds.

This sort of bias minimizes recruiting costs, but at the expense of both diversity and the quality of hires. Explicitly recruiting for diversity challenges your recruiting process to expand beyond its comfort zone, increasing both your access to talent and the quality of your hires.

Thursday, November 19, 2020

How to use NixOS for lightweight integration tests

nixos-test

If you use Nix in some capacity then you should check out the NixOS integration test system, which provides an easy way to test services that run inside one or more QEMU virtual machines.

NixOS tests are (in my opinion) way ahead of other integration test systems, but the only way to properly illustrate their value is to walk through a real-world example to highlight their salient features.

The motivating example

This post will motivate NixOS tests by using them to detect an error in the official postgrest tutorial.

You can skim the above tutorial to get a sense of the steps involved, but I’ll also summarize them here:

  • Run postgres in a Docker container

  • Download postgrest

  • Set up the database by running these commands:

    create schema api;

    create table api.todos (
      id serial primary key,
      done boolean not null default false,
      task text not null,
      due timestamptz
    );
    
    insert into api.todos (task) values
      ('finish tutorial 0'), ('pat self on back');
    
    create role web_anon nologin;
    
    grant usage on schema api to web_anon;
    grant select on api.todos to web_anon;
    
    create role authenticator noinherit login password 'mysecretpassword';
    grant web_anon to authenticator;
  • Save the following configuration to tutorial.conf:

    db-uri = "postgres://authenticator:mysecretpassword@localhost:5433/postgres"
    db-schema = "api"
    db-anon-role = "web_anon"
  • Run ./postgrest tutorial.conf

  • Check that it’s working using:

    $ curl http://localhost:3000/todos

    … which should return:

    [
      {
        "id": 1,
        "done": false,
        "task": "finish tutorial 0",
        "due": null
      },
      {
        "id": 2,
        "done": false,
        "task": "pat self on back",
        "due": null
      }
    ]

These are quite a few manual steps, and if I were a postgrest maintainer then it would be a pain to check that they still work for every new software release. In practice, most maintainers write and check a tutorial once and then never check again unless users report errors. This is a shame, because one of the most important functions of a tutorial is to inspire confidence:

Make sure that your tutorial works

One of your jobs as a tutor is to inspire the beginner’s confidence: in the software, in the tutorial, in the tutor and, of course, in their own ability to achieve what’s being asked of them.

There are many things that contribute to this. A friendly tone helps, as does consistent use of language, and a logical progression through the material. But the single most important thing is that what you ask the beginner to do must work. The learner needs to see that the actions you ask them to take have the effect you say they will have.

If the learner’s actions produce an error or unexpected results, your tutorial has failed - even if it’s not your fault. When your students are there with you, you can rescue them; if they’re reading your documentation on their own you can’t - so you have to prevent that from happening in advance. This is without doubt easier said than done.

Fortunately, we can codify the manual steps from the tutorial into a NixOS configuration for a virtual machine, which is a declarative specification of our system’s desired state:

# ./postgrest-tutorial.nix

let
  # For extra determinism
  nixpkgs =
    builtins.fetchTarball {
      url = "https://github.com/NixOS/nixpkgs/archive/58f9c4c7d3a42c912362ca68577162e38ea8edfb.tar.gz";

      sha256 = "1517dy07jf4zhzknqbgm617lgjxsn7a6k1vgq61c67f6h55qs5ij";
    };

  # Single source of truth for all tutorial constants
  database = "postgres";
  schema   = "api";
  table    = "todos";
  username = "authenticator";
  password = "mysecretpassword";
  webRole  = "web_anon";

  nixos =
    import "${nixpkgs}/nixos" {
      system = "x86_64-linux";

      configuration = { config, pkgs, ... }: {
        # Open the default port for `postgrest` in the firewall
        networking.firewall.allowedTCPPorts = [ 3000 ];

        services.postgresql = {
          enable = true;

          initialScript = pkgs.writeText "initialScript.sql" ''
            create schema ${schema};

            create table ${schema}.${table} (
              id serial primary key,
              done boolean not null default false,
              task text not null,
              due timestamptz
            );

            insert into ${schema}.${table} (task) values
              ('finish tutorial 0'), ('pat self on back');

            create role ${webRole} nologin;

            grant usage on schema ${schema} to ${webRole};
            grant select on ${schema}.${table} to ${webRole};

            create role ${username} noinherit login password '${password}';
            grant ${webRole} to ${username};
          '';
        };

        users = {
          mutableUsers = false;

          users = {
            # For ease of debugging the VM as the `root` user
            root.password = "";

            # Create a system user that matches the database user so that we
            # can use peer authentication.  The tutorial defines a password,
            # but it's not necessary.
            "${username}".isSystemUser = true;
          };
        };

        systemd.services.postgrest = {
          wantedBy = [ "multi-user.target" ];

          after = [ "postgresql.service" ];

          script =
            let
              configuration = pkgs.writeText "tutorial.conf" ''
                db-uri = "postgres://${username}:${password}@localhost:${toString config.services.postgresql.port}/${database}"
                db-schema = "${schema}"
                db-anon-role = "${username}"
              '';

            in
              ''
                ${pkgs.haskellPackages.postgrest}/bin/postgrest ${configuration}
              '';

          serviceConfig.User = username;
        };

        # Uncomment the next line for running QEMU on a non-graphical system
        # virtualisation.graphics = false;
      };
    };

in
  nixos.vm

We can then build and run this tutorial virtual machine by running the following commands:

$ nix build --file ./postgrest-tutorial.nix

$ QEMU_NET_OPTS='hostfwd=tcp::3000-:3000' result/bin/run-nixos-vm

That spins up a VM and prompts us to log in when the VM is ready:

<<< Welcome to NixOS 20.09pre-git (x86_64) - ttyS0 >>>

Run 'nixos-help' for the NixOS manual.

nixos login: 

However, before we log in, we can test if postgrest is working using the same curl command from the tutorial:

$ curl http://localhost:3000/todos
{"hint":null,"details":null,"code":"42501","message":"permission denied for schema api"}

Wait, what? We were supposed to get:

[
  {
    "id": 1,
    "done": false,
    "task": "finish tutorial 0",
    "due": null
  },
  {
    "id": 2,
    "done": false,
    "task": "pat self on back",
    "due": null
  }
]

… but apparently something is wrong with the database’s permissions.

Fortunately, we can log into the VM as the root user with an empty password to test the database permissions. Once we log into the system we can further log into the database as the authenticator user:

<<< Welcome to NixOS 20.09pre-git (x86_64) - ttyS0 >>>

Run 'nixos-help' for the NixOS manual.

nixos login: root<Enter>
Password: <Enter>

[root@nixos:~]# sudo --user authenticator psql postgres
psql (11.9)
Type "help" for help.

postgres=> 

Now we can test to see if the authenticator user is able to access the api.todos table:

postgres=> SELECT * FROM api.todos;
ERROR:  permission denied for schema api
LINE 1: SELECT * FROM api.todos;

Good: we can reproduce the problem, but what might be the cause?

As it turns out, the tutorial instructions appear to not configure the authenticator role correctly. Specifically, the noinherit in the following commands is the reason we can’t directly access the schema api:

create role authenticator noinherit login password 'mysecretpassword';
grant web_anon to authenticator;

The noinherit setting prevents the authenticator user from automatically assuming all permissions associated with the web_anon user. Instead, the authenticator user has to explicitly use the SET ROLE command to assume such permissions, and we can verify that at the database prompt:

postgres=> SET ROLE web_anon;
SET
postgres=> SELECT * FROM api.todos;
 id | done |       task        | due 
----+------+-------------------+-----
  1 | f    | finish tutorial 0 | 
  2 | f    | pat self on back  | 
(2 rows)

Mystery solved! We can test our hypothesis by changing that noinherit to inherit:

create role authenticator inherit login password 'mysecretpassword';
grant web_anon to authenticator;

… then we can restart the VM to check that things now work by:

  • typing CTRL-a c and entering quit

  • running the following commands:

    $ rm nixos.qcow2  # Remove the old VM's disk image so we start fresh
    $ nix build --file ./postgrest-tutorial.nix  # The rest is the same as before
    $ QEMU_NET_OPTS='hostfwd=tcp::3000-:3000' result/bin/run-nixos-vm

… and now the curl example from the tutorial works:

$ curl http://localhost:3000/todos
[{"id":1,"done":false,"task":"finish tutorial 0","due":null}, 
 {"id":2,"done":false,"task":"pat self on back","due":null}]

But wait, there’s more!

Automated testing

We don’t have to manually set up and tear down VMs or run curl commands by hand. We can automate the entire process from end to end by using NixOS’s support for automated integration tests.

If we follow the instructions from the NixOS manual, then the automated integration test looks like this:

# ./postgrest-tutorial.nix

let
  # For extra determinism
  nixpkgs =
    builtins.fetchTarball {
      url = "https://github.com/NixOS/nixpkgs/archive/58f9c4c7d3a42c912362ca68577162e38ea8edfb.tar.gz";

      sha256 = "1517dy07jf4zhzknqbgm617lgjxsn7a6k1vgq61c67f6h55qs5ij";
    };

  # Single source of truth for all tutorial constants
  database      = "postgres";
  schema        = "api";
  table         = "todos";
  username      = "authenticator";
  password      = "mysecretpassword";
  webRole       = "web_anon";
  postgrestPort = 3000;

in
  import "${nixpkgs}/nixos/tests/make-test-python.nix" ({ pkgs, ...}: {
    system = "x86_64-linux";

    nodes = {
      server = { config, pkgs, ... }: {
        # Open the default port for `postgrest` in the firewall
        networking.firewall.allowedTCPPorts = [ postgrestPort ];

        services.postgresql = {
          enable = true;

          initialScript = pkgs.writeText "initialScript.sql" ''
            create schema ${schema};

            create table ${schema}.${table} (
              id serial primary key,
              done boolean not null default false,
              task text not null,
              due timestamptz
            );

            insert into ${schema}.${table} (task) values
              ('finish tutorial 0'), ('pat self on back');

            create role ${webRole} nologin;

            grant usage on schema ${schema} to ${webRole};
            grant select on ${schema}.${table} to ${webRole};

            create role ${username} inherit login password '${password}';
            grant ${webRole} to ${username};
          '';
        };

        users = {
          mutableUsers = false;

          users = {
            # For ease of debugging the VM as the `root` user
            root.password = "";

            # Create a system user that matches the database user so that we
            # can use peer authentication.  The tutorial defines a password,
            # but it's not necessary.
            "${username}".isSystemUser = true;
          };
        };

        systemd.services.postgrest = {
          wantedBy = [ "multi-user.target" ];

          after = [ "postgresql.service" ];

          script =
            let
              configuration = pkgs.writeText "tutorial.conf" ''
                db-uri = "postgres://${username}:${password}@localhost:${toString config.services.postgresql.port}/${database}"
                db-schema = "${schema}"
                db-anon-role = "${username}"
              '';

            in
              ''
                ${pkgs.haskellPackages.postgrest}/bin/postgrest ${configuration}
              '';

          serviceConfig.User = username;
        };

        # Uncomment the next line for running QEMU on a non-graphical system
        # virtualisation.graphics = false;
      };

      client = { };
    };

    testScript =
      ''
      import json
      import sys

      start_all()

      server.wait_for_open_port(${toString postgrestPort})

      expected = [
          {"id": 1, "done": False, "task": "finish tutorial 0", "due": None},
          {"id": 2, "done": False, "task": "pat self on back", "due": None},
      ]

      actual = json.loads(
          client.succeed(
              "${pkgs.curl}/bin/curl http://server:${toString postgrestPort}/${table}"
          )
      )

      if expected != actual:
          sys.exit(1)
      '';
  })

… and you can run the test with the following command:

$ nix build --file ./postgrest-tutorial.nix

… which will silently succeed with a 0 exit code if the test passes, or fail with an error message otherwise.

The above example highlights a few neat aspects of the NixOS test framework:

  • You can test more than one VM at a time

    The above test creates two VMs:

    • One VM named server which hosts postgres + postgrest

    • One VM named client where we initiate our curl commands

    … so that we can verify that everything works even when curl is run from a separate machine. For example, this comes in handy for testing firewall rules.

  • You can write the test and orchestration logic in Python

    This means that we can use Python not only to run the curl subprocess, but also to compare the result against a golden JSON output.

Conclusion

This NixOS test framework is streets ahead of other integration test frameworks that I’ve worked with:

  • The test is deterministic

    The above example will continue to work a decade from now because all transitive dependencies are fully pinned by the NixOS specification.

  • The test is reproducible

    We don’t need to specify out-of-band instructions for how to obtain or install test dependencies. The only thing users globally install is Nix.

  • The test is compact

    The whole thing fits in a single 120-line file with generous whitespace and formatting (although you have the option of splitting into more files if you prefer)

  • The test is fully isolated

    The test does not mutate any shared resources or files and the test runs within an isolated network, so we can run multiple integration tests in parallel on the same machine for building a test matrix.

  • The test is fast

    You might think that a VM-based test is slow compared to a container-based one, but the entire test run, including VM setup and teardown, only takes about 10 seconds.

  • The test is written in a fully-featured language

    We can use Nix’s support for programming language features to reduce repetition. For example, this is why we can consolidate all test constants to be defined in one place so that there is a single source of truth for everything.

So if you’re already trying out Nix, I highly encourage you to give the NixOS integration test framework a try for the above reasons.