Tuesday, November 10, 2020

Pretty-print syntax trees with this one simple trick

prettyprint

I want to share a simple trick for pretty-printing syntax trees with the correct precedence that I’ve been using in my own interpreter projects. I believe this trick has been shared before, but I don’t know what the name of it is, so I wasn’t able to easily search for prior art. If somebody knows where this idea originated from then I can update this post to credit the original.

To illustrate the trick, I’d like to begin from the following Haskell type for a lambda calculus expression:

data Expression
    = Lambda Text Expression
    | Application Expression Expression
    | Variable Text

… and a matching grammar for parsing such an expression (using the same notation that the happy package uses):

Expression
  : '\\' label '->' Expression                { Lambda $2 $4      }
  | ApplicationExpression                     { $1                }

ApplicationExpression
  : ApplicationExpression VariableExpression  { Application $1 $2 }
  | VariableExpression                        { $1                }

VariableExpression
  : label                                     { Variable $1       }
  | '(' Expression ')'                        { $2                }

The trick

We can pretty-print that Expression type with the correct precedence without having to keep track of any precedence level. Instead, all we have to do is to write the pretty-printer to match the shape of the grammar, like this:

prettyExpression :: Expression -> Text
prettyExpression (Lambda x e) =
    "\\" <> x <> " -> " <> prettyExpression e
prettyExpression other =
    prettyApplicationExpression other

prettyApplicationExpression :: Expression -> Text
prettyApplicationExpression (Application f x) =
    prettyApplicationExpression f <> " " <> prettyVariableExpression x
prettyApplicationExpression other =
    prettyVariableExpression other

prettyVariableExpression :: Expression -> Text
prettyVariableExpression (Variable x) =
    x
prettyVariableExpression other =
    "(" <> prettyExpression other <> ")"

The pretty-printing logic closely follows the grammar

  • Create one pretty… function for each nonterminal symbol in the grammar

    For example, since we have a nonterminal symbol named ApplicationExpression, we create a matching pretty-printing function named prettyApplicationExpression.

  • Each pretty… function matches one pattern per alternative in the grammar

    In other words, if the production rule for ApplicationExpression has two alternatives, then the prettyApplicationExpression matches two patterns corresponding to each of the two alternatives, respectively.

  • Pretty-print non-terminal symbols using the matching pretty-printing function

    For example, we pretty-print a function’s argument using prettyVariableExpression since we used the VariableExpression nonterminal symbol to parse that argument.

  • Pretty-print terminal symbols in the obvious way

    … with any necessary whitespace

That’s the entire trick! If you follow those simple rules then the prettyprinter will automatically respect precedence, inserting parentheses in the right places and eliding parentheses when they are not necessary.

Conclusion

There is one major downside to this trick: if you add a new constructor to your syntax tree and you forget to update the pretty-printer then your pretty-printer will infinitely loop. This is pretty annoying as you might imagine.

The main upside to this trick is that pretty-printer logic is very simple to write, so mechanical that you could probably automate it (although I’m not sure if somebody has done so, yet).

3 comments:

  1. I remember this trick being in Pierce's types and programming language, but I'm not sure that's where it is first used. I've been bit by the looping pretty printer a few times, too — it really is annoying!

    ReplyDelete
  2. Arguably a BNF grammar encodes all the ways of generating strings in a given language. So, in a way it is unsurprising that you can construct a string of a language given the proof (i.e. the parse tree) -- and that the generator is fairly mechanically derivable from the BNF.

    ReplyDelete
  3. Most happy parsers use operator precedence declarations to simplify expression parsing, i.e., those that involve usual arithmetic operations. Is there a way to "extend" this trick to cover those cases as well? After all, I believe that's the major source of the need for parens in the first place.

    ReplyDelete