In this post I share a trick I use for dealing with bound variables in Dhall that I thought might be useful for other interpreted programming languages. I have no idea if this trick has been introduced before but if it has then just let me know and I’ll acknowledge any prior art here.
Edit: Todd Wilson points out that Mark-Oliver Stehr’s CINNI originally introduced this idea.
The brief explanation of the trick is: instead of choosing between a named or a nameless representation for bound variables you can get the best of both worlds by namespacing De Bruijn indices by variable names. This simplifies the implementation and in some cases improves the end user’s experience.
The rest of this post is a longer explanation of the above summary, starting with an explanation of the trick and followed by a review of the benefits of this approach.
Background
I’d like to first explain what I mean by “named” and “nameless” representations before I explain the trick.
A named representation of the lambda calculus syntax tree typically looks something like this:
data Syntax
= Variable String
| Lambda String Syntax
| Apply Syntax Syntax
For example, if the user wrote the following Haskell-like code:
-> \x -> f x \f
… then that would correspond to this syntax tree:
example :: Syntax
= Lambda "f" (Lambda "x" (Apply (Variable "f") (Variable "x"))) example
The named representation has the nice property that it preserves the original variable names … well, sort of. This representation definitely preserves the variable names when you initially parse the code into the syntax tree, but if you β-reduce an expression you can potentially run into problems.
For example, consider this expression:
-> (\y -> \x -> y) x \x
… which corresponds to this syntax tree:
Lambda "x" (Apply (Lambda "y" (Lambda "x" (Variable "y"))) (Variable "x"))
If you try to β-reduce (\y -> \x -> y) x
without renaming any variables then you get the following incorrect result:
-> \x -> x \x
This bug is known as “name capture” and capture-avoiding substitution requires renaming one of the variables named x
so that the inner x
does not shadow the outer x
. For example, we could fix the problem by renaming the outer x
to x1
like this:
-> \x -> x1 \x1
A nameless representation tries to work around these name capture issues by replacing the variable names with numeric indices (known as De Bruijn indices):
data Syntax
= Variable Int
| Lambda Syntax
| Apply Syntax Syntax
For example, code like this:
-> \x -> f x \f
… corresponds to this nameless representation:
example :: Syntax
= Lambda (Lambda (Apply (Variable 1) (Variable 0))) example
Carefully note that the Lambda
constructor now has no field for the bound variable name, so it’s as if the user had instead written:
-> \ -> @1 @0 \
… using @n
to represent the variable whose De Bruijn index is n
.
The numeric De Bruijn indices refer to bound variables. Specifically, the numeric index 0
refers to the “closest” or “innermost” variable bound by a lambda:
-- This 0 index …
-- ↓
-> \ -> @1 @0
\ -- ↑ … refers to the variable bound by this lambda
… and incrementing the index moves to the next outermost lambda:
-- This 1 index …
-- ↓
-> \ -> @1 @0
\ -- ↑ … refers to the variable bound by this lambda
De Bruijn indices avoid name collisions between bound variables, but they require you to do additional work if you wish to preserve the original variable names. There are several ways to do so, and I’ll present my preferred approach.
The trick - Part 1
We can get the best of both worlds by combining the named and nameless representations into a hybrid representation like this:
data Syntax
= Variable String Int
| Lambda String Syntax
| Apply Syntax Syntax
I call this representation “namespaced De Bruijn indices”.
This is almost the exact same as our named representation, except that we have now added an Int
field to the Variable
constructor. This Int
field is morally the same as the De Bruijn index in the nameless representation, except that this time the De Bruijn index is “namespaced” to a specific variable name.
The easiest way to explain this is with a few examples.
The following expression:
-> \y -> \x -> x@0 \x
… corresponds to this syntax tree:
Lambda "x" (Lambda "y" (Lambda "x" (Variable "x" 0)))
… and this curried function returns the third function argument:
-- This …
-- ↓
-> \y -> \x -> x@0
\x -- ↑ … refers to this bound variable
… because that is the innermost bound variable named x
.
Similarly, the following expression:
-> \y -> \x -> y@0 \x
… corresponds to this syntax tree:
Lambda "x" (Lambda "y" (Lambda "x" (Variable "y" 0)))
… which returns the second function argument:
-- This …
-- ↓
-> \y -> \x -> y@0
\x -- ↑ … refers to this bound variable
… because that is the innermost bound variable named y
.
Carefully note that our variable still has a De Bruijn index of 0, but we ignore the innermost bound variable named x
because we also pair our De Bruijn index with name of the variable we are referring to (y
) so we only count bound variables named y
when resolving the De Bruijn index.
Finally, the following expression:
-> \y -> \x -> x@1 \x
… corresponds to this syntax tree:
Lambda "x" (Lambda "y" (Lambda "x" (Variable "x" 1)))
… which returns the first function argument:
-- This …
-- ↓
-> \y -> \x -> x@1
\x -- ↑ … refers to this bound variable
The De Bruijn index is 1, which means that it refers to the second innermost (0-indexed) bound variable named x
.
Notice how this representation lets us refer to shadowed variables by their index. These De Bruijn indices are not an internal implementation detail, but are actually available to the user as part of the surface syntax of the language.
However, we want to avoid littering the code with these De Bruijn indices, which brings us to the second part of the trick.
The trick - Part 2
The next step is to add syntactic sugar to the language by allowing users to omit the index in the source code, which defaults the index to 0
. This means that an expression that never references shadowed variables never needs to specify a De Bruijn index.
For example, instead of writing this:
-> \y -> \x -> x@0 \x
… we can elide the index to simplify the code to:
-> \y -> \x -> x \x
… which will still parse as:
Lambda "x" (Lambda "y" (Lambda "x" (Variable "x" 0)))
Similarly, we can simplify this:
-> \y -> \x -> y@0 \x
… to this:
-> \y -> \x -> y \x
… which will still parse as:
Lambda "x" (Lambda "y" (Lambda "x" (Variable "y" 0)))
However, we cannot use this syntactic sugar to simplify the final example:
-> \y -> \x -> x@1 \x
… since the index is non-zero. Any code that references a shadowed variable still needs to use an explicit De Bruijn index to do so.
Vice versa, we also omit zero indices when pretty-printing code. When we pretty-print this syntax tree:
Lambda "x" (Lambda "y" (Lambda "x" (Variable "x" 0)))
… we don’t include the index:
-> \y -> \x -> x \x
This syntactic sugar ensures that most users do not need to be aware that indices exist at all when writing code. The user only encounters the indices in two scenarios:
The user wishes to explicitly reference a shadowed variable
For example, in the following expression:
-> \y -> \x -> x@1 \x
… the user might prefer to use the built-in language support for disambiguating variables of the same name rather than renaming one of the two variables named
x
.The indices appear in a β-reduced result
For example, this expression has no user-visible De Bruijn indices:
-> (\y -> \x -> y) x \x
… but if you β-reduce the expression (I’ll cover how in the Appendix) and pretty-print the β-reduced expression then the result will introduce a non-zero De Bruijn index to disambiguate the two variables named
x
:-> \x -> x@1 \x
In fact, the latter scenario is the reason I originally adopted this trick: I wanted to be able to display β-reduced functions to the end user while preserving the original variable names as much as possible.
Note that De Bruijn indices don’t appear when a β-reduced expression does not reference any shadowed variables. For example, if you β-reduce this expression:
-> f f) (\x -> x) (\f
… the result has no De Bruijn index (because the index is 0 and is therefore elided by the pretty-printer):
-> x \x
The trick - Part 3
One of the benefits of the traditional nameless representation using (non-namespaced) De Bruijn indices is that you get α-equivalence for free. Two nameless expressions are α-equivalent if they are syntactically identical. We can build upon this useful property to derive a compact algorithm for α-equivalence of “namespaced De Bruijn indices”.
The trick is to recognize that namespaced De Bruijn indices reduce to ordinary De Bruijn indices in the degenerate case when you rename all variables to the same name. I’ll call this renaming process “α-reduction”.
For example, if we α-reduce the following expression by renaming all of the: variables to _
:
-> \y -> \x -> x@1 \x
… then we get this result:
-> \_ -> \_ -> _@2 \_
See the Appendix for the α-reduction algorithm.
Equipped with α-reduction, then we can derive α-equivalence: two expressions are α-equivalent if their α-reduced forms are syntactically identical.
For example, this expression:
-> x \x
… and this expression:
-> y \y
… both α-reduce to:
-> _ \_
… so they are α-equivalent.
Benefits
There are a few benefits of using this trick that motivate me to use this in all of my interpreted languages:
This trick improves the readability of β-reduced functions
β-reduced functions preserve the original variable names and this trick doesn’t suffer from the rename-related name pollution that plagues other capture-avoiding substitution algorithms. In particular, β-reduced expressions only display De Bruijn indices when absolutely necessary (if they reference a shadowed variable) and they otherwise use the original pristine variable names.
This trick simplifies the internal implementation
You don’t need to maintain two separate syntax trees for a named and nameless representation. You can use the same syntax tree for both since any named syntax tree can be α-reduced to give the equivalent nameless syntax tree.
This trick enables userland support for referencing shadowed variables
I know some people think that referencing shadowed variable names is a misfeature. However, I personally feel that resolving name collisions by adding
'
or_
characters to the end of variable names is less principled than having language support for resolving name collisions using optional De Bruijn indices.(Not shown) This trick can sometimes improve type errors
To be precise, this trick improves the inferred types displayed in error messages when using explicit universal quantification.
Type variables also have to avoid name collisions, so if you use the same namespaced De Bruijn representation for your types then you avoid polluting your inferred types and error messages with junk type variables like
a14
.This post doesn’t cover the equivalent type-level trick, but you can refer to the Dhall standard if you need an example of a language that uses this trick.
Conclusion
I believe that namespaced De Bruijn indices are most appropriate for languages that are (A) strongly normalizing (like Dhall) and (B) interpreted, because such languages tend to support pretty-printing β-reduced functions.
I think this trick is also useful to a lesser extent for all interpreted languages, if only because the implementation is (in my opinion) simpler and more elegant than other algorithms for capture-avoiding substitution (See the Appendix below).
On the other hand, compiled languages will likely not benefit much from this trick since they typically have no need to preserve the original variable names and they also will use an intermediate representation that is very different from the concrete syntax tree.
Appendix - Implementation
This section provides Haskell code specifying how to α-reduce and β-reduce a syntax tree that uses namespaced De Bruijn indices.
This reference implementation is not the most efficient implementation, but it’s the simplest one which I use for pedagogical purposes. If you’re interested in efficiency then check out my Grace project, which mixes this trick with the more efficient normalization-by-evaluation algorithm.
I also don’t include code for the parser or pretty-printer, because the only interesting part is the syntactic sugar for handling variables with a De Bruijn index of 0
. Again, check out Grace if you want to refer to a more complete implementation.
-- | Syntax tree
data Syntax
= Variable String Int
| Lambda String Syntax
| Apply Syntax Syntax
deriving (Eq, Show)
{-| Increase the index of all bound variables matching the given variable name
This is modified from the Shifting definition in Pierce's \"Types and
Programming Languages\" by adding an additional argument for the namespace
to shift
-}
shift :: Int
-- ^ The amount to shift by
-> String
-- ^ The variable name to match (a.k.a. the namespace)
-> Int
-- ^ The minimum bound for which indices to shift
-> Syntax
-- ^ The expression to shift
-> Syntax
=
shift offset namespace minIndex syntax case syntax of
Variable name index -> Variable name index'
where
index'| name == namespace && minIndex <= index = index + offset
| otherwise = index
Lambda name body -> Lambda name body'
where
minIndex'| name == namespace = minIndex + 1
| otherwise = minIndex
= shift offset namespace minIndex' body
body'
Apply function argument -> Apply function' argument'
where
= shift offset namespace minIndex function
function'
= shift offset namespace minIndex argument
argument'
{-| Substitute the given variable name and index with an expression
This is modified from the Substitution definition in Pierce's \"Types and
Programming Languages\" by adding an additional argument for the variable
index
-}
substitute :: Syntax
-- ^ The expression to substitute into
-> String
-- ^ The name of the variable to replace
-> Int
-- ^ The index of the variable to replace
-> Syntax
-- ^ The expression to substitute in place of the given variable
-> Syntax
index replacement =
substitute expression name case expression of
Variable name' index'
| name == name' && index == index' -> replacement
| otherwise -> Variable name' index'
Lambda name' body -> Lambda name' body'
where
index'| name == name' = index + 1
| otherwise = index
= shift 1 name' 0 replacement
shiftedBody
= substitute body name index' shiftedBody
body'
Apply function argument -> Apply function' argument'
where
= substitute function name index replacement
function'
= substitute argument name index replacement
argument'
-- | β-reduce an expression
betaReduce :: Syntax -> Syntax
=
betaReduce syntax case syntax of
Variable name index -> Variable name index
Lambda name body -> Lambda name body'
where
= betaReduce body
body'
Apply function argument ->
case function' of
Lambda name body -> body'
where
= shift 1 name 0 argument
shiftedArgument
= substitute body name 0 shiftedArgument
substitutedBody
= shift (-1) name 0 substitutedBody
unshiftedBody
= betaReduce unshiftedBody
body'
-> Apply function' argument'
_ where
= betaReduce function
function'
= betaReduce argument
argument'
-- | α-reduce an expression
alphaReduce :: Syntax -> Syntax
=
alphaReduce syntax case syntax of
Variable name index -> Variable name index
Lambda name body -> Lambda "_" body'
where
= shift 1 "_" 0 body
shiftedBody
= substitute shiftedBody name 0 (Variable "_" 0)
substitutedBody
= shift (-1) name 0 substitutedBody
unshiftedBody
= alphaReduce unshiftedBody
body'
Apply function argument -> Apply function' argument'
where
= alphaReduce function
function'
= alphaReduce argument
argument'
-- | Returns `True` if the two input expressions are α-equivalent
alphaEquivalent :: Syntax -> Syntax -> Bool
= alphaReduce left == alphaReduce right alphaEquivalent left right
Appendix - History
I actually first introduced this feature in Morte, not Dhall. The idea originated from the discussion on this issue.
As for prior art, check out the work of Mark-Oliver Stehr and CINNI from the early 2000's.
ReplyDeleteThank you! I updated the post to acknowledge CINNI
DeleteThis is really nice, thanks!
ReplyDelete