Friday, August 13, 2021

Namespaced De Bruijn indices

Namespaced De Bruijn indices

In this post I share a trick I use for dealing with bound variables in Dhall that I thought might be useful for other interpreted programming languages. I have no idea if this trick has been introduced before but if it has then just let me know and I’ll acknowledge any prior art here.

Edit: Todd Wilson points out that Mark-Oliver Stehr’s CINNI originally introduced this idea.

The brief explanation of the trick is: instead of choosing between a named or a nameless representation for bound variables you can get the best of both worlds by namespacing De Bruijn indices by variable names. This simplifies the implementation and in some cases improves the end user’s experience.

The rest of this post is a longer explanation of the above summary, starting with an explanation of the trick and followed by a review of the benefits of this approach.

Background

I’d like to first explain what I mean by “named” and “nameless” representations before I explain the trick.

A named representation of the lambda calculus syntax tree typically looks something like this:

data Syntax
    = Variable String
    | Lambda String Syntax
    | Apply Syntax Syntax

For example, if the user wrote the following Haskell-like code:

\f -> \x -> f x

… then that would correspond to this syntax tree:

example :: Syntax
example = Lambda "f" (Lambda "x" (Apply (Variable "f") (Variable "x")))

The named representation has the nice property that it preserves the original variable names … well, sort of. This representation definitely preserves the variable names when you initially parse the code into the syntax tree, but if you β-reduce an expression you can potentially run into problems.

For example, consider this expression:

\x -> (\y -> \x -> y) x

… which corresponds to this syntax tree:

Lambda "x" (Apply (Lambda "y" (Lambda "x" (Variable "y"))) (Variable "x"))

If you try to β-reduce (\y -> \x -> y) x without renaming any variables then you get the following incorrect result:

\x -> \x -> x

This bug is known as “name capture” and capture-avoiding substitution requires renaming one of the variables named x so that the inner x does not shadow the outer x. For example, we could fix the problem by renaming the outer x to x1 like this:

\x1 -> \x -> x1

A nameless representation tries to work around these name capture issues by replacing the variable names with numeric indices (known as De Bruijn indices):

data Syntax
    = Variable Int
    | Lambda Syntax
    | Apply Syntax Syntax

For example, code like this:

\f -> \x -> f x

… corresponds to this nameless representation:

example :: Syntax
example = Lambda (Lambda (Apply (Variable 1) (Variable 0)))

Carefully note that the Lambda constructor now has no field for the bound variable name, so it’s as if the user had instead written:

\ -> \ -> @1 @0

… using @n to represent the variable whose De Bruijn index is n.

The numeric De Bruijn indices refer to bound variables. Specifically, the numeric index 0 refers to the “closest” or “innermost” variable bound by a lambda:

--                This 0 index …
--                ↓
    \ -> \ -> @1 @0
--       ↑ … refers to the variable bound by this lambda

… and incrementing the index moves to the next outermost lambda:

--             This 1 index …
--             ↓
    \ -> \ -> @1 @0
--  ↑ … refers to the variable bound by this lambda

De Bruijn indices avoid name collisions between bound variables, but they require you to do additional work if you wish to preserve the original variable names. There are several ways to do so, and I’ll present my preferred approach.

The trick - Part 1

We can get the best of both worlds by combining the named and nameless representations into a hybrid representation like this:

data Syntax
    = Variable String Int
    | Lambda String Syntax
    | Apply Syntax Syntax

I call this representation “namespaced De Bruijn indices”.

This is almost the exact same as our named representation, except that we have now added an Int field to the Variable constructor. This Int field is morally the same as the De Bruijn index in the nameless representation, except that this time the De Bruijn index is “namespaced” to a specific variable name.

The easiest way to explain this is with a few examples.

The following expression:

\x -> \y -> \x -> x@0

… corresponds to this syntax tree:

Lambda "x" (Lambda "y" (Lambda "x" (Variable "x" 0)))

… and this curried function returns the third function argument:

--                    This …
--                    ↓
    \x -> \y -> \x -> x@0
--               ↑ … refers to this bound variable

… because that is the innermost bound variable named x.

Similarly, the following expression:

\x -> \y -> \x -> y@0

… corresponds to this syntax tree:

Lambda "x" (Lambda "y" (Lambda "x" (Variable "y" 0)))

… which returns the second function argument:

--                    This …
--                    ↓
    \x -> \y -> \x -> y@0
--         ↑ … refers to this bound variable

… because that is the innermost bound variable named y.

Carefully note that our variable still has a De Bruijn index of 0, but we ignore the innermost bound variable named x because we also pair our De Bruijn index with name of the variable we are referring to (y) so we only count bound variables named y when resolving the De Bruijn index.

Finally, the following expression:

\x -> \y -> \x -> x@1

… corresponds to this syntax tree:

Lambda "x" (Lambda "y" (Lambda "x" (Variable "x" 1)))

… which returns the first function argument:

--                    This …
--                    ↓
    \x -> \y -> \x -> x@1
--   ↑ … refers to this bound variable

The De Bruijn index is 1, which means that it refers to the second innermost (0-indexed) bound variable named x.

Notice how this representation lets us refer to shadowed variables by their index. These De Bruijn indices are not an internal implementation detail, but are actually available to the user as part of the surface syntax of the language.

However, we want to avoid littering the code with these De Bruijn indices, which brings us to the second part of the trick.

The trick - Part 2

The next step is to add syntactic sugar to the language by allowing users to omit the index in the source code, which defaults the index to 0. This means that an expression that never references shadowed variables never needs to specify a De Bruijn index.

For example, instead of writing this:

\x -> \y -> \x -> x@0

… we can elide the index to simplify the code to:

\x -> \y -> \x -> x

… which will still parse as:

Lambda "x" (Lambda "y" (Lambda "x" (Variable "x" 0)))

Similarly, we can simplify this:

\x -> \y -> \x -> y@0

… to this:

\x -> \y -> \x -> y

… which will still parse as:

Lambda "x" (Lambda "y" (Lambda "x" (Variable "y" 0)))

However, we cannot use this syntactic sugar to simplify the final example:

\x -> \y -> \x -> x@1

… since the index is non-zero. Any code that references a shadowed variable still needs to use an explicit De Bruijn index to do so.

Vice versa, we also omit zero indices when pretty-printing code. When we pretty-print this syntax tree:

Lambda "x" (Lambda "y" (Lambda "x" (Variable "x" 0)))

… we don’t include the index:

\x -> \y -> \x -> x

This syntactic sugar ensures that most users do not need to be aware that indices exist at all when writing code. The user only encounters the indices in two scenarios:

  • The user wishes to explicitly reference a shadowed variable

    For example, in the following expression:

    \x -> \y -> \x -> x@1

    … the user might prefer to use the built-in language support for disambiguating variables of the same name rather than renaming one of the two variables named x.

  • The indices appear in a β-reduced result

    For example, this expression has no user-visible De Bruijn indices:

    \x -> (\y -> \x -> y) x

    … but if you β-reduce the expression (I’ll cover how in the Appendix) and pretty-print the β-reduced expression then the result will introduce a non-zero De Bruijn index to disambiguate the two variables named x:

    \x -> \x -> x@1

In fact, the latter scenario is the reason I originally adopted this trick: I wanted to be able to display β-reduced functions to the end user while preserving the original variable names as much as possible.

Note that De Bruijn indices don’t appear when a β-reduced expression does not reference any shadowed variables. For example, if you β-reduce this expression:

(\f -> f f) (\x -> x)

… the result has no De Bruijn index (because the index is 0 and is therefore elided by the pretty-printer):

\x -> x

The trick - Part 3

One of the benefits of the traditional nameless representation using (non-namespaced) De Bruijn indices is that you get α-equivalence for free. Two nameless expressions are α-equivalent if they are syntactically identical. We can build upon this useful property to derive a compact algorithm for α-equivalence of “namespaced De Bruijn indices”.

The trick is to recognize that namespaced De Bruijn indices reduce to ordinary De Bruijn indices in the degenerate case when you rename all variables to the same name. I’ll call this renaming process “α-reduction”.

For example, if we α-reduce the following expression by renaming all of the: variables to _:

\x -> \y -> \x -> x@1

… then we get this result:

\_ -> \_ -> \_ -> _@2

See the Appendix for the α-reduction algorithm.

Equipped with α-reduction, then we can derive α-equivalence: two expressions are α-equivalent if their α-reduced forms are syntactically identical.

For example, this expression:

\x -> x

… and this expression:

\y -> y

… both α-reduce to:

\_ -> _

… so they are α-equivalent.

Benefits

There are a few benefits of using this trick that motivate me to use this in all of my interpreted languages:

  • This trick improves the readability of β-reduced functions

    β-reduced functions preserve the original variable names and this trick doesn’t suffer from the rename-related name pollution that plagues other capture-avoiding substitution algorithms. In particular, β-reduced expressions only display De Bruijn indices when absolutely necessary (if they reference a shadowed variable) and they otherwise use the original pristine variable names.

  • This trick simplifies the internal implementation

    You don’t need to maintain two separate syntax trees for a named and nameless representation. You can use the same syntax tree for both since any named syntax tree can be α-reduced to give the equivalent nameless syntax tree.

  • This trick enables userland support for referencing shadowed variables

    I know some people think that referencing shadowed variable names is a misfeature. However, I personally feel that resolving name collisions by adding ' or _ characters to the end of variable names is less principled than having language support for resolving name collisions using optional De Bruijn indices.

  • (Not shown) This trick can sometimes improve type errors

    To be precise, this trick improves the inferred types displayed in error messages when using explicit universal quantification.

    Type variables also have to avoid name collisions, so if you use the same namespaced De Bruijn representation for your types then you avoid polluting your inferred types and error messages with junk type variables like a14.

    This post doesn’t cover the equivalent type-level trick, but you can refer to the Dhall standard if you need an example of a language that uses this trick.

Conclusion

I believe that namespaced De Bruijn indices are most appropriate for languages that are (A) strongly normalizing (like Dhall) and (B) interpreted, because such languages tend to support pretty-printing β-reduced functions.

I think this trick is also useful to a lesser extent for all interpreted languages, if only because the implementation is (in my opinion) simpler and more elegant than other algorithms for capture-avoiding substitution (See the Appendix below).

On the other hand, compiled languages will likely not benefit much from this trick since they typically have no need to preserve the original variable names and they also will use an intermediate representation that is very different from the concrete syntax tree.

Appendix - Implementation

This section provides Haskell code specifying how to α-reduce and β-reduce a syntax tree that uses namespaced De Bruijn indices.

This reference implementation is not the most efficient implementation, but it’s the simplest one which I use for pedagogical purposes. If you’re interested in efficiency then check out my Grace project, which mixes this trick with the more efficient normalization-by-evaluation algorithm.

I also don’t include code for the parser or pretty-printer, because the only interesting part is the syntactic sugar for handling variables with a De Bruijn index of 0. Again, check out Grace if you want to refer to a more complete implementation.

-- | Syntax tree
data Syntax
    = Variable String Int
    | Lambda String Syntax
    | Apply Syntax Syntax
    deriving (Eq, Show)

{-| Increase the index of all bound variables matching the given variable name

    This is modified from the Shifting definition in Pierce's \"Types and
    Programming Languages\" by adding an additional argument for the namespace
    to shift
-}
shift
    :: Int
    -- ^ The amount to shift by
    -> String
    -- ^ The variable name to match (a.k.a. the namespace)
    -> Int
    -- ^ The minimum bound for which indices to shift
    -> Syntax
    -- ^ The expression to shift
    -> Syntax
shift offset namespace minIndex syntax =
    case syntax of
        Variable name index -> Variable name index'
          where
            index'
                | name == namespace && minIndex <= index = index + offset
                | otherwise                              = index

        Lambda name body -> Lambda name body'
          where
            minIndex'
                | name == namespace = minIndex + 1
                | otherwise         = minIndex

            body' = shift offset namespace minIndex' body

        Apply function argument -> Apply function' argument'
          where
            function' = shift offset namespace minIndex function

            argument' = shift offset namespace minIndex argument

{-| Substitute the given variable name and index with an expression

    This is modified from the Substitution definition in Pierce's \"Types and
    Programming Languages\" by adding an additional argument for the variable
    index
-}
substitute
    :: Syntax
    -- ^ The expression to substitute into
    -> String
    -- ^ The name of the variable to replace
    -> Int
    -- ^ The index of the variable to replace
    -> Syntax
    -- ^ The expression to substitute in place of the given variable
    -> Syntax
substitute expression name index replacement =
    case expression of
        Variable name' index'
            | name == name' && index == index' -> replacement
            | otherwise                        -> Variable name' index'

        Lambda name' body -> Lambda name' body'
          where
            index'
                | name == name' = index + 1
                | otherwise     = index

            shiftedBody = shift 1 name' 0 replacement

            body' = substitute body name index' shiftedBody

        Apply function argument -> Apply function' argument'
          where
            function' = substitute function name index replacement

            argument' = substitute argument name index replacement

-- | β-reduce an expression
betaReduce :: Syntax -> Syntax
betaReduce syntax =
    case syntax of
        Variable name index -> Variable name index

        Lambda name body -> Lambda name body'
          where
            body' = betaReduce body

        Apply function argument ->
            case function' of
                Lambda name body -> body'
                  where
                    shiftedArgument = shift 1 name 0 argument

                    substitutedBody = substitute body name 0 shiftedArgument

                    unshiftedBody = shift (-1) name 0 substitutedBody

                    body' = betaReduce unshiftedBody

                _ -> Apply function' argument'
          where
            function' = betaReduce function

            argument' = betaReduce argument

-- | α-reduce an expression
alphaReduce :: Syntax -> Syntax
alphaReduce syntax =
    case syntax of
        Variable name index -> Variable name index

        Lambda name body -> Lambda "_" body'
          where
            shiftedBody = shift 1 "_" 0 body

            substitutedBody = substitute shiftedBody name 0 (Variable "_" 0)

            unshiftedBody = shift (-1) name 0 substitutedBody

            body' = alphaReduce unshiftedBody

        Apply function argument -> Apply function' argument'
          where
            function' = alphaReduce function

            argument' = alphaReduce argument

-- | Returns `True` if the two input expressions are α-equivalent
alphaEquivalent :: Syntax -> Syntax -> Bool
alphaEquivalent left right = alphaReduce left == alphaReduce right

Appendix - History

I actually first introduced this feature in Morte, not Dhall. The idea originated from the discussion on this issue.

3 comments:

  1. As for prior art, check out the work of Mark-Oliver Stehr and CINNI from the early 2000's.

    ReplyDelete