Thursday, April 8, 2021

How to replace Proxy with AllowAmbiguousTypes

no-proxy

This post is essentially a longer version of this Reddit post by Ashley Yakeley, with more explanation and historical context.

Sometimes in Haskell you need to write a function that “dispatches” only on a type and not on a value of that type. Using the example from the above post, we might want to write a function that, given an input type, prints the name of that type.

Approach 1 - undefined

One naive approach would be to do this:

class TypeName a where
    typeName :: a -> String

instance TypeName Bool where
    typeName _ = "Bool"

instance TypeName Int where
    typeName _ = "Int"

… which we could use like this:

>>> typeName False
"Bool"
>>> typeName (0 :: Int)
"Int"

However, this approach does not work well because we must provide the typeName method with a concrete value of type a. Not only is this superfluous (we don’t care which value we supply) but in some cases we might not even be able to supply such a value.

For example, consider this instance:

import Data.Void (Void)

instance TypeName Void where
    typeName _ = "Void"

There is a perfectly valid name associated with this type, but we cannot retrieve the name without cheating because we cannot produce a (total) value of type Void. Instead, we have to use something like undefined:

>>> typeName (undefined :: Void)
"Void"

The base package uses this undefined-based approach. For example, Foreign.Storable provides a sizeOf function that works just like this:

class Storable a where
    -- | Computes the storage requirements (in bytes) of the argument. The value
    -- of the argument is not used.
    sizeOf :: a -> Int

… and to this day the idiomatic way to use sizeOf is to provide a fake value using undefined:

>>> sizeOf (undefined :: Bool)
4

This works because sizeOf never evaluates its argument, so depending on undefined here is technically safe, albeit not very appealing.
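As a minimal sketch of this idiom (assuming a platform where Int32 occupies 4 bytes, which the Int32 type guarantees):

```haskell
import Data.Int (Int32)
import Foreign.Storable (sizeOf)

-- sizeOf ignores its argument, so passing undefined is (technically) safe
int32Size :: Int
int32Size = sizeOf (undefined :: Int32)
```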

Approach 2A - Proxy

The next evolution of this approach was to use the Proxy type (now part of base in the Data.Proxy module). As the documentation notes:

Historically, Proxy :: Proxy a is a safer alternative to the undefined :: a idiom.

I’m not exactly sure what the name Proxy was originally meant to convey, but I believe the intention was that a term (the Proxy constructor) stands in as a “proxy” for a type (specifically, the type argument to the Proxy type constructor).

We can amend our original example to use the Proxy type like this:

import Data.Proxy (Proxy(..))
import Data.Void (Void)

class TypeName a where
    typeName :: Proxy a -> String

instance TypeName Bool where
    typeName _ = "Bool"

instance TypeName Int where
    typeName _ = "Int"

instance TypeName Void where
    typeName _ = "Void"

… and now we can safely get the name of a type without providing a specific value of that type. Instead we always provide a Proxy constructor and give it a type annotation which “stores” the type we wish to use:

>>> typeName (Proxy :: Proxy Bool)
"Bool"
>>> typeName (Proxy :: Proxy Int)
"Int"
>>> typeName (Proxy :: Proxy Void)
"Void"

We can simplify that a little bit by enabling the TypeApplications language extension, which permits us to write this:

>>> :set -XTypeApplications
>>> typeName (Proxy @Bool)
"Bool"
>>> typeName (Proxy @Int)
"Int"
>>> typeName (Proxy @Void)
"Void"

… or this if we prefer:

>>> typeName @Bool Proxy
"Bool"
>>> typeName @Int Proxy
"Int"
>>> typeName @Void Proxy
"Void"

Approach 2B - proxy

A minor variation on the previous approach is to use proxy (with a lowercase “p”) in the typeclass definition:

import Data.Void (Void)

class TypeName a where
    typeName :: proxy a -> String
    --          ↑

instance TypeName Bool where
    typeName _ = "Bool"

instance TypeName Int where
    typeName _ = "Int"

instance TypeName Void where
    typeName _ = "Void"

Everything else works the same, but now neither the author nor the consumer of the typeclass needs to depend on the Data.Proxy module specifically. For example, the consumer could use any other type constructor just fine:

>>> typeName ([] :: [Int])  -- Technically legal, but weird
"Int"

… or (more likely) the consumer could define their own Proxy type to use instead of the one from Data.Proxy, which would also work fine:

>>> data Proxy a = Proxy
>>> typeName (Proxy :: Proxy Int)
"Int"

This trick helped back when Proxy was not a part of the base package. Even now that Proxy is in base you still see this trick when people author typeclass instances because it’s easier and there’s no downside.
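As a small self-contained sketch of that flexibility (the elementType helper is hypothetical), the lowercase proxy signature also works smoothly when the proxy value is produced inside other polymorphic code:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))

class TypeName a where
    typeName :: proxy a -> String

instance TypeName Bool where
    typeName _ = "Bool"

instance TypeName Int where
    typeName _ = "Int"

-- Name a list's element type without inspecting any elements,
-- by manufacturing a Proxy for the scoped type variable `a`
elementType :: forall a. TypeName a => [a] -> String
elementType _ = typeName (Proxy :: Proxy a)
```

For example, elementType works even on an empty list, since it never touches a value of the element type.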

Both of these Proxy-based solutions are definitely better than using undefined, but they are both still a bit unsatisfying because we have to supply a Proxy argument to specify the desired type. The ideal user experience should only require the type and the type alone as an input to our function.

Approach 3 - AllowAmbiguousTypes + TypeApplications

We previously noted that we could shorten the Proxy-based solution by using TypeApplications:

>>> typeName @Bool Proxy
"Bool"

Well, what if we could shorten things even further and just drop the Proxy, like this:

>>> typeName @Bool

Actually, we can! This brings us to a more recent approach (the one summarized in the linked Reddit post), which is to use AllowAmbiguousTypes + TypeApplications, like this:

{-# LANGUAGE AllowAmbiguousTypes #-}

import Data.Void (Void)

class TypeName a where
    typeName :: String

instance TypeName Bool where
    typeName = "Bool"

instance TypeName Int where
    typeName = "Int"

instance TypeName Void where
    typeName = "Void"

… which we can invoke like this:

>>> :set -XTypeApplications
>>> typeName @Bool
"Bool"
>>> typeName @Int
"Int"
>>> typeName @Void
"Void"

The use of TypeApplications is essential, since otherwise GHC would have no way to infer which typeclass instance we meant. Even a type annotation would not work:

>>> typeName :: String  -- Clearly, this type annotation is not very helpful

<interactive>:1:1: error:
Ambiguous type variable ‘a0’ arising from a use of ‘typeName’
      prevents the constraint ‘(TypeName a0)’ from being solved.
      Probable fix: use a type annotation to specify what ‘a0’ should be.
      These potential instances exist:
        instance [safe] TypeName Void -- Defined at Example.hs:14:10
        instance [safe] TypeName Bool -- Defined at Example.hs:8:10
        instance [safe] TypeName Int -- Defined at Example.hs:11:10
In the expression: typeName :: String
      In an equation for ‘it’: it = typeName :: String

Type applications work here because you can think of a polymorphic function as really having one extra function argument: the polymorphic type. I elaborate on this a bit in my post on Polymorphism for dummies, but the basic idea is that TypeApplications makes this extra function argument for the type explicit. This means that you can directly tell the compiler which type to use by “applying” the function to the right type instead of trying to indirectly persuade the compiler into using the right type with a type annotation.
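The two styles also interoperate. As a small sketch (repeating the class from above; the typeNameOf adapter is a hypothetical name), the ambiguous-type version can still offer a Proxy-style interface to callers who prefer one:

```haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}

class TypeName a where
    typeName :: String

instance TypeName Bool where
    typeName = "Bool"

instance TypeName Int where
    typeName = "Int"

-- Hypothetical adapter: recover a proxy-based interface from the
-- argument-free class by forwarding the type with a type application
typeNameOf :: forall a proxy. TypeName a => proxy a -> String
typeNameOf _ = typeName @a
```

The reverse direction is not possible without a proxy-free method, which is one reason some library authors start from the AllowAmbiguousTypes formulation.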

Conclusion

My personal preference is to use the last approach with AllowAmbiguousTypes and TypeApplications. Not only is it more concise, but it also appeals to my own coding aesthetic. Specifically, guiding compiler behavior using type-annotations feels more like logic programming to me and using explicit type abstractions and TypeApplications feels more like functional programming to me (and I tend to prefer functional programming over logic programming).

However, the Proxy-based approach requires no language extensions, so that approach might appeal to you if you prefer to use the simplest subset of the language possible.

Saturday, February 6, 2021

Folds are constructor substitution

folds

I notice that functional programming beginners and experts understand the word “fold” to mean subtly different things, so I’d like to explain what experienced functional programmers usually mean when they use the term “fold”. This post assumes a passing familiarity with Haskell.

Overview

A “fold” is a function that replaces all constructors of a datatype with corresponding expressions. Folds are not limited to lists, linear sequences, or even containers; you can fold any inductively defined datatype.

To explain the more general notion of a “fold”, we’ll consider three representative data structures:

  • lists
  • Maybe values
  • binary trees

… and show how we can automatically derive the “one true fold” for each data structure by following the same general principle.

Lists

Many beginners understand the word “fold” to be a way to reduce some collection of values (e.g. a list) to a single value. For example, in Haskell you can add up the elements of a list like this:

sum :: [Int] -> Int
sum xs = foldr (+) 0 xs

… where sum reduces a sequence of Ints to a single Int by starting from an initial accumulator value of 0 and then “folding” each element of the list into the accumulator using (+).

Haskell’s standard library provides at least two fold functions named foldl and foldr, but only foldr is the “canonical” fold for a list. By “canonical” I mean that foldr is the only fold that works by substituting list constructors.

We can more easily see this if we define our own linked list type with explicitly named constructors:

data List a = Cons a (List a) | Nil

… where instead of writing a list as [ 1, 2, 3 ] we instead will write such a list as:

example :: List Int
example = Cons 1 (Cons 2 (Cons 3 Nil))

This is a very unergonomic representation for a list, but bear with me!

We can implement the “canonical” fold for the above List type as a function that takes two arguments:

  • The first argument (named cons) replaces all occurrences of the Cons constructor
  • The second argument (named nil) replaces all occurrences of the Nil constructor

The implementation of the canonical fold looks like this:

fold :: (a -> list -> list) -> list -> List a -> list
fold cons nil (Cons x xs) = cons x (fold cons nil xs)
fold cons nil  Nil        = nil

You might not necessarily follow how that implementation works, so a more direct way to appreciate how fold works is to see how the function behaves on some sample inputs:

-- The general case, step-by-step
fold cons nil (Cons x (Cons y (Cons z Nil)))
    = cons x (fold cons nil (Cons y (Cons z Nil)))
    = cons x (cons y (fold cons nil (Cons z Nil)))
    = cons x (cons y (cons z (fold cons nil Nil)))
    = cons x (cons y (cons z nil))

-- Add up the elements of the list, but skipping more steps this time
fold (+) 0 (Cons 1 (Cons 2 (Cons 3 Nil)))
  = (+) 1 ((+) 2 ((+) 3 0))
  = 1 + (2 + (3 + 0))
  = 6

-- Calculate the list length
fold (\_ n -> n + 1) 0 (Cons True (Cons False (Cons True Nil)))
  = (\_ n -> n + 1) True ((\_ n -> n + 1) False ((\_ n -> n + 1) True 0))
  = (\_ n -> n + 1) True ((\_ n -> n + 1) False 1)
  = (\_ n -> n + 1) True 2
  = 3

Notice that if we format the type of fold a bit we can see that the type of each argument to fold (sort of) matches the type of the corresponding constructor they replace:

fold
    :: (a -> list -> list)  -- Cons :: a -> List a -> List a
    -> list                 -- Nil  :: List a
    -> List a
    -> list

In the above type, list is actually a type variable and we could have used any name for that type variable instead of list, such as b. In fact, if we were to replace list with b, we would get essentially the same type as foldr for Haskell lists:

-- Our `fold` type, replacing `list` with `b`
fold
    :: (a -> b -> b)
    -> b
    -> List a
    -> b

-- Now compare that type to the `foldr` type from the Prelude:
foldr
    :: (a -> b -> b)
    -> b
    -> [a]
    -> b

We commonly use folds to reduce a List to a single scalar value, but folds are actually much more general-purpose than that and they can be used to transform one data structure into another data structure. For example, we can use the same fold function to convert our clumsy List type into the standard Haskell list type, like this:

fold (:) [] (Cons 1 (Cons 2 (Cons 3 Nil)))
    = (:) 1 ((:) 2 ((:) 3 []))
    = 1 : (2 : (3 : []))
    = [ 1, 2, 3 ]
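Packaged as a named function (repeating the List type and fold from above; toList is a hypothetical name), the conversion is just a partial application:

```haskell
data List a = Cons a (List a) | Nil

-- The canonical fold: substitute `cons` for Cons and `nil` for Nil
fold :: (a -> b -> b) -> b -> List a -> b
fold cons nil (Cons x xs) = cons x (fold cons nil xs)
fold cons nil Nil         = nil

-- Convert our List to the built-in list type by substituting
-- (:) for Cons and [] for Nil
toList :: List a -> [a]
toList = fold (:) []
```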

Maybe

Folds are not limited to recursive data types. For example, here is the canonical fold for Haskell’s Maybe type, which is not recursive:

data Maybe a = Nothing | Just a

fold :: maybe -> (a -> maybe) -> Maybe a -> maybe
fold nothing just  Nothing  = nothing
fold nothing just (Just x)  = just x

In fact, this function already exists in Haskell’s standard library by the name of maybe:

maybe :: b -> (a -> b) -> Maybe a -> b
maybe n _ Nothing  = n
maybe _ f (Just x) = f x

Once you think of folds in terms of constructor substitution you can quickly spot these canonical folds for other types.
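For instance, Data.Maybe’s fromMaybe is definable as such a fold (a sketch; fromMaybe' is a hypothetical name chosen to avoid clashing with the real function):

```haskell
-- Replace the Nothing constructor with a default value and the
-- Just constructor with the identity function
fromMaybe' :: a -> Maybe a -> a
fromMaybe' def = maybe def id
```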

Binary trees

What about more complex data structures, like the following binary Tree type?

data Tree a = Node a (Tree a) (Tree a) | Leaf

This sort of fold is still straightforward to write, by applying the same principle of constructor substitution:

fold :: (a -> tree -> tree -> tree) -> tree -> Tree a -> tree
fold node leaf (Node x l r) = node x (fold node leaf l) (fold node leaf r)
fold node leaf  Leaf        = leaf

We only need to keep recursively descending over the Tree, replacing constructors as we go.

We can use this fold to reduce the Tree to a single value, like this:

-- Add up all the nodes in the tree
fold (\x l r -> x + l + r) 0 (Node 1 (Node 2 Leaf Leaf) (Node 3 Leaf Leaf))
    = (\x l r -> x + l + r) 1
        ((\x l r -> x + l + r) 2 0 0)
        ((\x l r -> x + l + r) 3 0 0)
    = (\x l r -> x + l + r) 1
        (2 + 0 + 0)
        (3 + 0 + 0)
    = (\x l r -> x + l + r) 1
        2
        3
    = 1 + 2 + 3
    = 6

… or we can use the same fold function to transform the Tree into another data structure, like a list:

-- List `Tree` elements in pre-order
fold (\x l r -> x : l ++ r) [] (Node 1 (Node 2 Leaf Leaf) (Node 3 Leaf Leaf))
    = (\x l r -> x : l ++ r) 1
        ((\x l r -> x : l ++ r) 2 [] [])
        ((\x l r -> x : l ++ r) 3 [] [])
    = (\x l r -> x : l ++ r) 1
        (2 : [] ++ [])
        (3 : [] ++ [])
    = (\x l r -> x : l ++ r) 1
        [2]
        [3]
    = 1 : [2] ++ [3]
    = [1, 2, 3]

… or we can even use the fold to mirror the tree (swapping left and right children):

fold (\x l r -> Node x r l) Leaf (Node 1 (Node 2 Leaf Leaf) (Node 3 Leaf Leaf))
    = (\x l r -> Node x r l) 1
        ((\x l r -> Node x r l) 2 Leaf Leaf)
        ((\x l r -> Node x r l) 3 Leaf Leaf)
    = (\x l r -> Node x r l) 1
        (Node 2 Leaf Leaf)
        (Node 3 Leaf Leaf)
    = Node 1 (Node 3 Leaf Leaf) (Node 2 Leaf Leaf)

Generality

At this point you might be wondering: “what can’t a fold do?”. The answer is: you can do essentially anything with a fold, although it might not necessarily be the most efficient solution to your problem. You can think of a fold as the most general-purpose interface for consuming a data structure because the fold interface is a “lossless” way to process a data structure.

To see why a fold is a “lossless” interface, let’s revisit the fold function for Trees and this time we will pass in the Node and Leaf constructors as the inputs to the fold. In other words, we will replace all occurrences of Node with Node and replace all occurrences of Leaf with Leaf:

fold Node Leaf (Node 1 (Node 2 Leaf Leaf) (Node 3 Leaf Leaf))
    = Node 1 (Node 2 Leaf Leaf) (Node 3 Leaf Leaf)

This gives us back the original data structure, demonstrating how we always have the option for a fold to recover the original pristine input. This is what I mean when I say that a fold is a lossless interface.
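This round-trip property is easy to check in code. A sketch (repeating the Tree type and fold from above, with Eq derived so the comparison can run; rebuild is a hypothetical name):

```haskell
data Tree a = Node a (Tree a) (Tree a) | Leaf
    deriving (Eq, Show)

fold :: (a -> b -> b -> b) -> b -> Tree a -> b
fold node leaf (Node x l r) = node x (fold node leaf l) (fold node leaf r)
fold node leaf Leaf         = leaf

-- Folding with the original constructors reproduces the input exactly
rebuild :: Tree a -> Tree a
rebuild = fold Node Leaf
```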

Sunday, January 31, 2021

Dynamic type errors lack relevance

pure-exceptions

Proponents of statically typed languages commonly motivate types as a way to safely detect bugs ahead of time. For example, consider the following Python program that attempts to increment a number stored in counter.txt:

# ./increment.py

with open('counter.txt', 'r') as handle:
    x = handle.readline()

with open('counter.txt', 'w') as handle:
    handle.write(int(x) + 1)

This program contains a type error, but by the time we find out it’s too late: our program will have already wiped the contents of counter.txt by opening the file as a writable handle:

$ echo -n '0' > ./counter.txt

$ cat counter.txt
0

$ python increment.py
Traceback (most recent call last):
  File "increment.py", line 5, in <module>
    handle.write(int(x) + 1)
TypeError: expected a string or other character buffer object

$ cat counter.txt  # The contents of the file were lost

Defenders of dynamically typed languages sometimes counter that these pitfalls do not matter when runtime failures are mostly harmless. If you want to find errors in your program, just run the program!

As an extreme example, Nix is a purely functional language with a dynamic type system, and you can safely interpret a Nix program ahead of time to detect errors since Nix evaluation has no side effects¹. Consequently, Nix proponents sometimes reason that these dynamic type errors are functionally indistinguishable from static type errors thanks to Nix’s purity.

However, dynamic types are not a substitute for static types, even in a purely functional language like Nix. To see why, consider the following Nix expression, which attempts to render a structured value as command-line options:

# ./options.nix

let
  pkgs = import <nixpkgs> { };

  enable = option: "${option}=true";

  disable = option: "${option}=false";

in
  pkgs.lib.cli.toGNUCommandLine { }
    { option = [
        "max-jobs=5"
        "cores=4"
        enable "fallback"
      ];
    }

The intention was to produce this result:

[ "--option" "max-jobs=5" "--option" "cores=4" "--option" "fallback=true" ]

… but we actually get a dynamic type error when we interpret the expression:

$ nix-instantiate --eval options.nix --strict
error: evaluation aborted with the following error message: 'generators.mkValueStringDefault: functions not supported: <λ>'

This error message is not very helpful, and it’s not due to a lack of effort, funding, or attention. This sort of poor user experience is inherent to any dynamic type system.

The fundamental issue is that in a dynamically typed language you cannot explain errors to the user in terms of the source code they wrote. In other words, dynamic type errors commonly fail to be relevant to the user.

For example, if Nix had a typical static type system, then the diagnostic might have looked something like this:

# ./options.nix

let
  pkgs = import <nixpkgs> { };

  enable = option: "${option}=true";

  disable = option: "${option}=false";

in
  pkgs.lib.cli.toGNUCommandLine { }
    { option = [
        "max-jobs=5"
        "cores=4"
        enable "fallback"
#       ~~~~~~
#       This element of the list is not a string
      ];
    }

This sort of diagnostic helps us more easily discern that we forgot to parenthesize (enable "fallback"), so the enable function is treated as another list element.

In a dynamic type system, type errors can potentially be far removed from the code that the user wrote. From Nix’s point of view, the actual error is that somewhere in the middle of interpretation it is trying to apply a mkValueStringDefault utility function to the user’s enable function:

mkValueStringDefault enable

… but by that point the Nix interpreter is no longer “thinking” in terms of the original program the user wrote, so any interpreter diagnostics will have difficulty explaining the error in terms that the user can understand. For example:

  • In the middle of interpretation any offending subexpressions are abstract syntax trees, not source code

  • Some of these abstract syntax trees may be functions or closures that cannot be (easily) displayed to the user

    We see this above where the error message is unable to render the enable function so it falls back to displaying <λ>.

  • Intermediate evaluation results might not correspond to the source code at all

    For example, the user might not understand where mkValueStringDefault is originating from in the absence of a stack trace.

  • Even if we could trace subexpressions to their original source code, the user still might not be able to work backwards from the dynamic type error to the real problem.

    In other words, even if we showed the user the call site for the mkValueStringDefault function, they still wouldn’t necessarily understand why enable is the function argument.

In fact, the example error message came out better than I expected. The reason why is because somebody took the time to add a custom error message to the mkValueStringDefault utility instead of falling back on the interpreter throwing a dynamic type error:

  mkValueStringDefault = {}: v: with builtins;
    let err = t: v: abort
          ("generators.mkValueStringDefault: " +
           "${t} not supported: ${toPretty {} v}");
    in

Had they not done so then the error message would have been even further disconnected from the user’s experience. This only reinforces that the relevance of error messages is inversely proportional to the extent to which we avail ourselves of the dynamic type system.

This is why I prefer to lean on static type systems as much as possible to detect errors, because they tend to do a better job of “explaining” what went wrong than dynamic type systems.

Note: The criticisms in this post also apply to exceptions in general (where you can view dynamic types as a special case of exceptions auto-generated by the interpreter). Exceptions also need to be supplemented by stack traces, logging, or debuggers in order to improve their relevance.


  1. Technically, Nix evaluation can trigger builds via “import from derivation”. However, with appropriate sandboxing even builds are mostly harmless. Either way, just assume for the purpose of discussion that Nix evaluation is safe. After all, any unsafety in evaluation only makes the case for static types even stronger.

Monday, January 4, 2021

The visitor pattern is essentially the same thing as Church encoding

church-encoding

This post explains how the visitor pattern is essentially the same thing as Church encoding (or Böhm-Berarducci encoding). This post also explains how you can usefully employ the visitor pattern / Church encoding / Böhm-Berarducci encoding to expand your programming toolbox.

Background

Church encoding is named after Alonzo Church, who discovered that you could model any type of data structure in the untyped lambda calculus using only functions. The context for this was that he was trying to show that lambda calculus could be treated as a universal computational engine, even though the only features it supported were functions.

Note: Later on, Corrado Böhm and Alessandro Berarducci devised the equivalent solution in a typed lambda calculus (specifically, System F), so I’ll use “Church encoding” when talking about this trick in the context of an untyped language and “Böhm-Berarducci encoding” when talking about the same trick in the context of a typed language. If we’re not talking about any specific language then I’ll use “Church encoding”.

In particular, you can model the following types of data structures using language support for functions and nothing else:

  • records / structs (known as “product types” if you want to get fancy)

    The “product” of two types A and B is a type that stores both an A and a B (e.g. a record with two fields, where the first field has type A and the second has type B)

  • enums / tagged unions (known as “sum types”)

    The “sum” of two types A and B is a type that stores either an A or a B. (e.g. a tagged union where the first tag stores a value of type A and the second tag stores a value of type B)

  • recursive data structures

… and if you can precisely model product types, sum types, and recursion, then you can essentially model any data structure. I’m oversimplifying things, but that’s close enough to true for our purposes.

Relevance

The reason we care about Church-encoding is because not all programming languages natively support sum types or recursion (although most programming languages support product types in the form of records / structs).

However, most programming languages do support functions, so if we have functions then we can use them as a “backdoor” to introduce support for sum types or recursion into our language. This is the essence of the visitor pattern: using functions to Church-encode sum types or recursion into a language that does not natively support sum types or recursion.

To illustrate this, suppose that we begin from the following Haskell code:

data Shape
    = Circle{ x :: Double, y :: Double, r :: Double }
    | Rectangle{ x :: Double, y :: Double, w :: Double, h :: Double }

exampleCircle :: Shape
exampleCircle = Circle 2.0 1.4 4.5

exampleRectangle :: Shape
exampleRectangle = Rectangle 1.3 3.1 10.3 7.7

area :: Shape -> Double
area shape = case shape of
    Circle    x y r   -> pi * r ^ 2
    Rectangle x y w h -> w * h

main :: IO ()
main = do
    print (area exampleCircle)
    print (area exampleRectangle)

… but then we hypothetically disable Haskell’s support for algebraic data types. How would we amend our example to still work in such a restricted subset of the language?

We’d use Böhm-Berarducci encoding (the typed version of Church-encoding), and the solution would look like this:

{-# LANGUAGE RankNTypes #-}

-- | This plays the same role as the old `Shape` type
type Shape = forall shape
    .  (Double -> Double -> Double -> shape)
    -> (Double -> Double -> Double -> Double -> shape)
    -> shape

-- | This plays the same role as the old `Circle` constructor
_Circle :: Double -> Double -> Double -> Shape
_Circle x y r = \_Circle _Rectangle -> _Circle x y r

-- | This plays the same role as the old `Rectangle` constructor
_Rectangle :: Double -> Double -> Double -> Double -> Shape
_Rectangle x y w h = \_Circle _Rectangle -> _Rectangle x y w h

exampleCircle :: Shape
exampleCircle = _Circle 2.0 1.4 4.5

exampleRectangle :: Shape
exampleRectangle = _Rectangle 1.3 3.1 10.3 7.7 

area :: Shape -> Double
area shape = shape
    (\x y r   -> pi * r ^ 2)
    (\x y w h -> w * h)

main :: IO ()
main = do
    print (area exampleCircle)
    print (area exampleRectangle)

The key is the new representation of the Shape type, which is the type of a higher-order function. In fact, if we squint we might recognize that the Shape type synonym:

type Shape = forall shape
    .  (Double -> Double -> Double -> shape)
    -> (Double -> Double -> Double -> Double -> shape)
    -> shape

… looks an awful lot like a GADT-style definition for the Shape type:

{-# LANGUAGE GADTs #-}

data Shape where
    Circle    :: Double -> Double -> Double -> Shape
    Rectangle :: Double -> Double -> Double -> Double -> Shape

This is not a coincidence! Essentially, Böhm-Berarducci encoding models a type as a function that expects each “constructor” as a function argument that has the same type as that constructor. I put “constructor” in quotes since we never actually use a real constructor. Those function arguments are place-holders that will remain abstract until we attempt to “pattern match” on a value of type Shape.

In the area function we “pattern match” on Shape by supplying handlers instead of constructors. To make this explicit, let’s use equational reasoning to see what happens when we evaluate area exampleCircle:

area exampleCircle

-- Substitute the `area` function with its definition
= exampleCircle
    (\x y r   -> pi * r ^ 2)
    (\x y w h -> w * h)

-- Substitute `exampleCircle` with its definition
= _Circle 2.0 1.4 4.5
    (\x y r   -> pi * r ^ 2)
    (\x y w h -> w * h)

-- Substitute the `_Circle` function with its definition
= (\_Circle _Rectangle -> _Circle 2.0 1.4 4.5)
    (\x y r   -> pi * r ^ 2)
    (\x y w h -> w * h)

-- Evaluate the outer-most anonymous function
= (\x y r -> pi * r ^ 2) 2.0 1.4 4.5

-- Evaluate the anonymous function
= pi * 4.5 ^ 2

In other words, Church encoding / Böhm-Berarducci encoding both work by maintaining a fiction that eventually somebody will provide us the “real” constructors right up until we actually need them. Then when we “pattern match” on the value we pull a last-minute bait-and-switch and use each “handler” of the pattern match where the constructor would normally go and everything works out so that we don’t need the constructor after all. Church-encoding is sort of like the functional programming equivalent of “fake it until you make it”.

The same trick works for recursive data structures as well. For example, the way that we Böhm-Berarducci-encode this Haskell data structure:

data Tree = Node Int Tree Tree | Leaf

exampleTree :: Tree
exampleTree = Node 1 (Node 2 Leaf Leaf) (Node 3 Leaf Leaf)

preorder :: Tree -> [Int]
preorder tree = case tree of
    Node value left right -> value : preorder left ++ preorder right
    Leaf                  -> []

main :: IO ()
main = print (preorder exampleTree)

… is like this:

{-# LANGUAGE RankNTypes #-}

type Tree = forall tree
    .  (Int -> tree -> tree -> tree)  -- Node :: Int -> Tree -> Tree -> Tree
    -> tree                           -- Leaf :: Tree
    -> tree

_Node :: Int -> Tree -> Tree -> Tree
_Node value left right =
    \_Node _Leaf -> _Node value (left _Node _Leaf) (right _Node _Leaf)

_Leaf :: Tree
_Leaf = \_Node _Leaf -> _Leaf

exampleTree :: Tree
exampleTree = _Node 1 (_Node 2 _Leaf _Leaf) (_Node 3 _Leaf _Leaf)

preorder :: Tree -> [Int]
preorder tree = tree
    (\value left right -> value : left ++ right)
    []

main :: IO ()
main = print (preorder exampleTree)

This time the translation is not quite as mechanical as before, due to the introduction of recursion. In particular, two differences stand out.

First, the way we encode the _Node constructor is not as straightforward as we thought:

_Node :: Int -> Tree -> Tree -> Tree
_Node value left right =
    \_Node _Leaf -> _Node value (left _Node _Leaf) (right _Node _Leaf)

This is because we need to thread the _Node / _Leaf function arguments through to the node’s children.

Second, the way we consume the Tree is also different. Compare the original code:

preorder :: Tree -> [Int]
preorder tree = case tree of
    Node value left right -> value : preorder left ++ preorder right
    Leaf                  -> []

… to the Böhm-Berarducci-encoded version:

preorder :: Tree -> [Int]
preorder tree = tree
    (\value left right -> value : left ++ right)
    []

The latter version doesn’t require the preorder function to recursively call itself. The preorder function performs a task that is morally recursive, but it is, strictly speaking, not recursive at all.

In fact, if we look at the Böhm-Berarducci-encoded solution closely we see that we never use recursion anywhere within the code! There are no recursive datatypes and there are also no recursive functions, yet somehow we still managed to encode a recursive data type and recursive functions on that type. This is what I mean when I say that Church encoding / Böhm-Berarducci encoding let you encode recursion in a language that does not natively support recursion. Our code would work just fine in a recursion-free subset of Haskell!
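To make that concrete, here is a self-contained sketch of the encoded Tree consumed without any recursion anywhere (total is a hypothetical analogue of preorder that sums the node values instead of listing them):

```haskell
{-# LANGUAGE RankNTypes #-}

type Tree = forall tree
    .  (Int -> tree -> tree -> tree)
    -> tree
    -> tree

-- Non-recursive "constructors": thread the handlers to the children
_Node :: Int -> Tree -> Tree -> Tree
_Node value left right =
    \node leaf -> node value (left node leaf) (right node leaf)

_Leaf :: Tree
_Leaf = \_node leaf -> leaf

-- Non-recursive "pattern match": by the time the node handler runs,
-- its left and right arguments are already-folded Ints
total :: Tree -> Int
total tree = tree (\value left right -> value + left + right) 0
```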

For example, Dhall is a real example of a language that does not natively support recursion, and Dhall uses this same trick to model recursive data types and recursive functions. The post describing that translation goes into more detail about the algorithm for Böhm-Berarducci-encoding Haskell types, so you might find it useful if the above examples were not sufficiently intuitive or clear.

Visitor pattern

The visitor pattern is a special case of Church encoding / Böhm-Berarducci encoding. I’m not going to provide a standalone explanation of the visitor pattern since the linked Wikipedia page already does that. This section will focus on explaining the correspondence between Church encoding / Böhm-Berarducci encoding and the visitor pattern.

The exact correspondence goes like this. Given:

  • a Church-encoded / Böhm-Berarducci-encoded type T

    e.g. Shape in the first example

  • … with constructors C₀, C₁, C₂, …

    e.g. Circle, Rectangle

  • … and values of type T named v₀, v₁, v₂, …

    e.g. exampleCircle, exampleRectangle

… then the correspondence (using terminology from the Wikipedia article) is:

  • The “element” class corresponds to the type T

    e.g. Shape

  • A “concrete element” (i.e. an object of the “element” class) corresponds to a constructor for the type T

    e.g. Circle, Rectangle

    The accept method of the element selects which handler from the visitor to use, in the same way that our Church-encoded constructors would select one handler (named after the matching constructor) out of all the handler functions supplied to them.

    _Circle :: Double -> Double -> Double -> Shape
    _Circle x y r = \_Circle _Rectangle -> _Circle x y r
    
    _Rectangle :: Double -> Double -> Double -> Double -> Shape
    _Rectangle x y w h = \_Circle _Rectangle -> _Rectangle x y w h

  • A “visitor” class corresponds to the type of a function that pattern matches on a value of type T

    Specifically, a “visitor” class is equivalent to the following Haskell type:

    T -> IO ()

    This is more restrictive than Böhm-Berarducci encoding, which permits pattern matches that return any type of value, like our area function, which returns a Double. In other words, Böhm-Berarducci encoding is not limited to just performing side effects when “visiting” constructors.

    (Edit: Travis Brown notes that the visitor pattern is not restricted to performing side effects. This might be an idiosyncrasy of how Wikipedia presents the design pattern.)

  • A “concrete visitor” (i.e. an object of the “visitor” class) corresponds to a function that “pattern matches” on a value of type T

    e.g. area

    … where each overloaded visit method of the visitor corresponds to a branch of our Church-encoded “pattern match”:

    area :: Shape -> Double
    area shape = shape
        (\x y r   -> pi * r ^ 2)
        (\x y w h -> w * h)

  • The “client” corresponds to a value of type T

    e.g. exampleCircle, exampleRectangle:

    exampleCircle :: Shape
    exampleCircle = _Circle 2.0 1.4 4.5
    
    exampleRectangle :: Shape
    exampleRectangle = _Rectangle 1.3 3.1 10.3 7.7 

    The Wikipedia explanation of the visitor pattern adds the wrinkle that the client can represent more than one such value. In my opinion, what the visitor pattern should say is that the client can be a recursive value with self-similar children (like our example Tree). This small change would improve the correspondence between the visitor pattern and Church encoding.
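Putting the correspondence together in one self-contained program: the Shape type, constructors, and area function are the ones from earlier in the post, while the describe visitor is an addition of mine sketching the side-effecting style that the Wikipedia presentation uses:

```haskell
{-# LANGUAGE RankNTypes #-}

-- The “element” class: a Böhm-Berarducci-encoded Shape
type Shape = forall shape
    .  (Double -> Double -> Double -> shape)            -- the “visit Circle” handler
    -> (Double -> Double -> Double -> Double -> shape)  -- the “visit Rectangle” handler
    -> shape

-- “Concrete elements”: each constructor selects its own handler
_Circle :: Double -> Double -> Double -> Shape
_Circle x y r = \_Circle _Rectangle -> _Circle x y r

_Rectangle :: Double -> Double -> Double -> Double -> Shape
_Rectangle x y w h = \_Circle _Rectangle -> _Rectangle x y w h

-- A “concrete visitor” that returns a value (not just a side effect)
area :: Shape -> Double
area shape = shape
    (\x y r   -> pi * r ^ 2)
    (\x y w h -> w * h)

-- A side-effecting visitor, closer to the Wikipedia presentation
-- (this describe function is a hypothetical example, not from the post)
describe :: Shape -> IO ()
describe shape = shape
    (\x y r   -> putStrLn ("Circle with radius " ++ show r))
    (\x y w h -> putStrLn ("Rectangle " ++ show w ++ " by " ++ show h))

-- A “client”
exampleCircle :: Shape
exampleCircle = _Circle 2.0 1.4 4.5

main :: IO ()
main = do
    print (area exampleCircle)
    describe exampleCircle  -- prints: Circle with radius 4.5
```

Note that the same exampleCircle value can be handed to either visitor, because the universally quantified shape type lets each caller pick its own result type.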

Limitations of Böhm-Berarducci encoding

Church encoding works in any untyped language, but Böhm-Berarducci encoding does not work in all typed languages!

Specifically, Böhm-Berarducci encoding only works in general for languages that support polymorphic types (a.k.a. generic programming). This is because the type of a Böhm-Berarducci-encoded value is a polymorphic type:

type Shape = forall shape
    .  (Double -> Double -> Double -> shape)
    -> (Double -> Double -> Double -> Double -> shape)
    -> shape

… but such a type cannot be represented in a language that lacks polymorphism. So what the visitor pattern commonly does to work around this limitation is pick a specific result type, and since there isn’t a one-size-fits-all choice, the result is usually a side effect, as if we had specialized the universally quantified type to IO ():

type Shape =
       (Double -> Double -> Double -> IO ())
    -> (Double -> Double -> Double -> Double -> IO ())
    -> IO ()
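When the result type is fixed to IO () like this, the encoding no longer needs forall (or RankNTypes) at all, which is a minimal sketch of how the visitor pattern survives in languages without polymorphism (the render name here is hypothetical):

```haskell
-- No forall and no RankNTypes needed: the result type is hard-coded to IO ()
type Shape =
       (Double -> Double -> Double -> IO ())           -- handle Circle
    -> (Double -> Double -> Double -> Double -> IO ()) -- handle Rectangle
    -> IO ()

_Circle :: Double -> Double -> Double -> Shape
_Circle x y r = \_Circle _Rectangle -> _Circle x y r

-- The only kind of “pattern match” this Shape supports: a side effect
render :: Shape -> IO ()
render shape = shape
    (\x y r   -> putStrLn "Circle")
    (\x y w h -> putStrLn "Rectangle")

main :: IO ()
main = render (_Circle 2.0 1.4 4.5)  -- prints: Circle
```

The price of the specialization is that a function like area, which returns a Double, can no longer be expressed against this monomorphic Shape.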

This is why Go has great difficulty modeling sum types accurately: Go does not support polymorphism (“generics”), so Böhm-Berarducci encoding does not work in general for introducing sum types in Go. This is also why people with programming language theory backgrounds make a bigger deal out of Go’s lack of generics than its lack of sum types: if Go had generics, then people could work around the lack of sum types using a Böhm-Berarducci encoding.

Conclusions

Hopefully this gives you a better idea of what Church encoding and Böhm-Berarducci encoding are and how they relate to the visitor pattern.

In my opinion, Böhm-Berarducci encoding is a bigger deal in statically-typed languages because it provides a way to introduce sum types and recursion into a language in a type-safe way that makes invalid states unrepresentable. Conversely, Church encoding is not as big of a deal in dynamically-typed languages because a Church-encoded type is still vulnerable to runtime exceptions.