Sunday, January 31, 2021

Dynamic type errors lack relevance

Dynamic type errors lack relevance

Proponents of statically typed languages commonly motivate types as a way to safely detect bugs ahead of time. For example, consider the following Python program that attempts to increment a number stored in counter.txt:

# ./increment.py

with open('counter.txt', 'r') as handle:
    x = handle.readline()

with open('counter.txt', 'w') as handle:
    handle.write(int(x) + 1)

This program contains a type error, but by the time we find out it’s too late: our program will have already wiped the contents of counter.txt by opening the file as a writable handle:

$ echo -n '0' > ./counter.txt

$ cat counter.txt
0

$ python increment.py
Traceback (most recent call last):
  File "increment.py", line 5, in <module>
    handle.write(int(x) + 1)
TypeError: expected a string or other character buffer object

$ cat counter.txt  # The contents of the file were lost

Defenders of dynamically typed languages sometimes counter that these pitfalls do not matter when runtime failures are mostly harmless. If you want to find errors in your program, just run the program!

As an extreme example, Nix is a purely functional language with a dynamic type system, and you can safely interpret a Nix program ahead of time to detect errors since Nix evaluation has no side effects1. Consequently, Nix proponents sometimes reason that these dynamic type errors are functionally indistinguishable from static type errors thanks to Nix’s purity.

However, dynamic types are not a substitute for static types, even in a purely functional language like Nix. To see why, consider the following Nix expression, which attempts to render a structured value as command-line options:

# ./options.nix

let
  pkgs = import <nixpkgs> { };

  enable = option: "${option}=true";

  disable = option: "${option}=false";

in
  pkgs.lib.cli.toGNUCommandLine { }
    { option = [
        "max-jobs=5"
        "cores=4"
        enable "fallback"
      ];
    }

The intention was to produce this result:

[ "--option" "max-jobs=5" "--option" "cores=4" "--option" "fallback=true" ]

… but we actually get a dynamic type error when we interpret the expression:

$ nix-instantiate --eval options.nix --strict
error: evaluation aborted with the following error message: 'generators.mkValueStringDefault: functions not supported: <λ>'

This error message is not very helpful, and it’s not due to a lack of effort, funding, or attention. This sort of poor user experience is inherent to any dynamic type system.

The fundamental issue is that in a dynamically typed language you cannot explain errors to the user in terms of the source code they wrote. In other words, dynamic type errors commonly fail to be relevant to the user.

For example, if Nix had a typical static type system, then the diagnostic might have looked something like this:

# ./options.nix

let
  pkgs = import <nixpkgs> { };

  enable = option: "${option}=true";

  disable = option: "${option}=false";

in
  pkgs.lib.cli.toGNUCommandLine { }
    { option = [
        "max-jobs=5"
        "cores=4"
        enable "fallback"
#       ~~~~~~
#       This element of the list is not a string
      ];
    }

This sort of diagnostic helps us more easily discern that we forgot to parenthesize (enable "fallback"), so the enable function is treated as another list element.

In a dynamic type system, type errors can potentially be far removed from the code that the user wrote. From Nix’s point of view, the actual error is that somewhere in the middle of interpretation it is trying to apply a mkValueStringDefault utility function to the user’s exclaim function:

mkValueStringDefault enable

… but by that point the Nix interpreter is no longer “thinking” in terms of the original program the user wrote, so any interpreter diagnostics will have difficulty explaining the error in terms that the user can understand. For example:

  • In the middle of interpretation any offending subexpressions are abstract syntax trees, not source code

  • Some of these abstract syntax trees may be functions or closures that cannot be (easily) displayed to the user

    We see this above where the error message is unable to render the enable function so it falls back to displaying <λ>.

  • Intermediate evaluation results might not correspond to the source code at all

    For example, the user might not understand where mkValueStringDefault is originating from in the absence of a stack trace.

  • Even if we could trace subexpressions to their original source code the user still might not be able to work backwards from the dynamic type error to the real problem.

    In other words, even if we showed the user the call site for the mkValueStringDefault function they still wouldn’t necessarily understand why exclaim is the function argument.

In fact, the example error message came out better than I expected. The reason why is because somebody took the time to add a custom error message to the mkValueStringDefault utility instead of falling back on the interpreter throwing a dynamic type error:

  mkValueStringDefault = {}: v: with builtins;
    let err = t: v: abort
          ("generators.mkValueStringDefault: " +
           "${t} not supported: ${toPretty {} v}");
    in

Had they not done so then the error message would have been even further disconnected from the user’s experience. This only reinforces that the relevance of error messages is inversely proportional to the extent to which we avail ourselves of the dynamic type system.

This is why I prefer to lean on static type systems as much as possible to detect errors, because they tend to do a better job of “explaining” what went wrong than dynamic type systems.

Note: The criticisms in this post also apply to exceptions in general (where you can view dynamic types as a special case of exceptions auto-generated by the interpreter). Exceptions also need to be supplemented by stack traces, logging, or debuggers in order to improve their relevance.


  1. Technically, Nix evaluation can trigger builds via “import from derivation”. However, with appropriate sandboxing even builds are mostly harmless. Either way, just assume for the purpose of discussion that Nix evaluation is safe. After all, any unsafety in evaluation only makes the case for static types even stronger.↩︎

5 comments:

  1. There are lots of valid points here, but the problem I have with this argument is that I think it ignores the compensating factors that always need to be taken into account.

    When I compare my past experience of debugging runtime errors in Haskell programs vs Python programs, for instance, the comparison overwhelmingly favours Python. With Haskell, an error in usage of a SQLite database left me with something like:

    SQLite: error

    (From memory, this was years ago)

    I then had to spend a lot of energy either guessing what the error was, or adding additional code to catch specific errors and try to get more information. I just tried again using modern Haskell, and I now get a much more informative error message - but no stack trace. I have to guess where the error is coming from.

    When my Python apps crash, I get a long and detailed stack trace that shows me a huge amount of context, and makes it tens if not hundreds or thousands or times easier to debug - and I get this **by default**, with no extra work. The amount of relevant contextual information provided when things go wrong (at runtime) is one of the very things that makes Python more compelling to me.

    In addition, if I can reproduce the error, I can pop an Python REPL (`import IPython; IPython.embed()`) into the exact places I need, and explore the state of the program interactively to discover the problem.

    I think the fundamental problem here is comparing language features not languages. https://lukeplant.me.uk/blog/posts/you-cant-compare-language-features-only-languages/ . The reality is that there are usually compensating factors for weaknesses in a language.

    ReplyDelete
    Replies
    1. I believe your criticism of Haskell reinforces my point; specifically, the point in the last paragraph that runtime exceptions in general lack relevance. Stack traces help, but they still don't compare to the quality of error messages that static type errors produce.

      Delete
  2. This sounds like action at a distance, but maybe better called as error at a distance. Errors can travel far before finally blowing up, leaving the developer puzzled with nearly no context.

    ReplyDelete
  3. Totally agree with this post.

    I would argue that this static control should be extended to sync exceptions, these should be validated by the compiler as well.

    ReplyDelete
  4. Perhaps the main point here is that computer languages are poorly implemented and have insufficient tools to help the programmer. "Dynamic typing" means "checked at runtime", but nobody can stop you from performing static program analysis before executing a program in a "dynamically typed" language. In that sense, a compiler not only compiles but is simultaneously a static analysis tool that you use without having to be aware of it. Interpreters could do the same checks, but normally don't because they want to support a REPL which can't check lines you haven't typed in yet.

    Just take the Python example.
    One error is that "handle.write()" expects a string but receives an integer, which is something that static analysis can reveal before running the program.
    The other error that static analysis reveals is that "int(x)" expects a string that can be converted to an integer. This is interesting because the only way to ensure that is by having a function "handle.readint()" that prevents the user from providing something that can not be converted to an int. Otherwise you are still stuck with a runtime error. Ok, beeping when you type a non-digit also indicates a runtime error, of course, but it doesn't terminate the program.

    Static analysis, including type inference, can be applied to any programming language in which case the distinction between "dynamic type" and "static type" disappears. However, the ability to add user-defined types to a language is where a difference can be made, because having richer types means more powerful type inference. Likewise, a language that distinguishes between a character and a string allows for more comprehensive static analysis than a language that doesn't. Finally, if you want to have fun doing "mental static analysis", take a look at "awk" and start with something like length(123+456) and length("123"+"456") 

    ReplyDelete