Friday, December 4, 2020

Recruiting for diversity is not lowering the bar

diversity

I believe there are multiple reasons why hiring processes should account for diversity, but I’d like to use this post to address a common counterargument that people raise in discussions about diversity.

The argument typically goes like this: “diversity is important, but we won’t lower the bar”. I believe that this line of reasoning is flawed in a few ways that I’d like to highlight.

The blub paradox

One common source of hiring bias is the inability to recognize or appreciate strengths greater than or different from one’s own. I would like to make an analogy to Paul Graham’s post on Beating the Averages:

As long as our hypothetical Blub programmer is looking down the power continuum, he knows he’s looking down. Languages less powerful than Blub are obviously less powerful, because they’re missing some feature he’s used to. But when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn’t realize he’s looking up. What he sees are merely weird languages. He probably considers them about equivalent in power to Blub, but with all this other hairy stuff thrown in as well. Blub is good enough for him, because he thinks in Blub.

The above bias that people apply when evaluating programming languages also applies when evaluating candidates! People naturally prefer to hire people who share similar strengths, because they recognize and appreciate the value of those strengths. However, when confronted with strengths different from their own they may not interview for those strengths or even recognize them as strengths at all. Quite the opposite: they may view the candidate as “weird” or “not a culture fit” for not cultivating the “right” strengths.

The notion of a “hiring bar” presumes that candidates can all be ordered on a line and those on one side of some cutoff should not be hired. This linear metaphor reinforces our biases around hiring candidates whose strengths align with our own: “Let’s take what I’m good at, and aim to hire somebody who is at least 10% better at that”.

Overcoming adversity

“Diversity hires” can also be much stronger than you appreciate, even when you evaluate them according to strengths that you are trained to recognize.

This is because underrepresented minorities often have to swim upstream against institutionalized discrimination and work harder just to reach the same accomplishments and milestones as their majority peers. A minority candidate can outperform your initial impression of them if you can remove these discriminatory barriers within your workplace.

Recruiting diverse candidates does not lower the bar

This post explains the concept well:

First, the idea that reaching a more diverse talent pool requires lowering the bar on quality reflects an insidious form of prejudice: somehow the pool of talent is thought of as a monolithic block, the top of which is predominantly white and male; it is only by going farther down from the top that more diverse candidates can be found. In reality, companies that struggle to attract diverse candidates are probably not making the effort to look in the right places.

Recruiting underrepresented minorities does not dilute the talent pool, unless you assume that you are already interviewing the best of the best (unlikely). In reality, you’re likely recruiting people who are easiest to find: those who already share the same professional networks and backgrounds.

This sort of bias minimizes recruiting costs, but at the expense of both diversity and the quality of hires. Explicitly recruiting for diversity challenges your recruiting process to expand beyond its comfort zone, increasing access to talent and the quality of your hires.

Thursday, November 19, 2020

How to use NixOS for lightweight integration tests

If you use Nix in some capacity then you should check out the NixOS integration test system, which provides an easy way to test services that run inside one or more QEMU virtual machines.

NixOS tests are (in my opinion) way ahead of other integration test systems, but the only way to properly illustrate their value is to walk through a real-world example to highlight their salient features.

The motivating example

This post will motivate NixOS tests by using them to detect an error in the official postgrest tutorial.

You can skim the above tutorial to get a sense of the steps involved, but I’ll also summarize them here:

  • Run postgres in a docker container

  • Download postgrest

  • Set up the database by running these commands:

    create table api.todos (
      id serial primary key,
      done boolean not null default false,
      task text not null,
      due timestamptz
    );
    
    insert into api.todos (task) values
      ('finish tutorial 0'), ('pat self on back');
    
    create role web_anon nologin;
    
    grant usage on schema api to web_anon;
    grant select on api.todos to web_anon;
    
    create role authenticator noinherit login password 'mysecretpassword';
    grant web_anon to authenticator;
  • Save the following configuration to tutorial.conf:

    db-uri = "postgres://authenticator:mysecretpassword@localhost:5433/postgres"
    db-schema = "api"
    db-anon-role = "web_anon"
  • Run ./postgrest tutorial.conf

  • Check that it’s working using:

    $ curl http://localhost:3000/todos

    … which should return:

    [
      {
        "id": 1,
        "done": false,
        "task": "finish tutorial 0",
        "due": null
      },
      {
        "id": 2,
        "done": false,
        "task": "pat self on back",
        "due": null
      }
    ]

That’s quite a few manual steps, and if I were a postgrest maintainer it would be a pain to check that they still work for every new software release. In practice, most maintainers write and check a tutorial once and then never check again unless users report errors. This is a shame, because one of the most important functions of a tutorial is to inspire confidence:

Make sure that your tutorial works

One of your jobs as a tutor is to inspire the beginner’s confidence: in the software, in the tutorial, in the tutor and, of course, in their own ability to achieve what’s being asked of them.

There are many things that contribute to this. A friendly tone helps, as does consistent use of language, and a logical progression through the material. But the single most important thing is that what you ask the beginner to do must work. The learner needs to see that the actions you ask them to take have the effect you say they will have.

If the learner’s actions produce an error or unexpected results, your tutorial has failed - even if it’s not your fault. When your students are there with you, you can rescue them; if they’re reading your documentation on their own you can’t - so you have to prevent that from happening in advance. This is without doubt easier said than done.

Fortunately, we can codify the manual steps from the tutorial into a NixOS configuration for a virtual machine, which is a declarative specification of our system’s desired state:

# ./postgrest-tutorial.nix

let
  # For extra determinism
  nixpkgs =
    builtins.fetchTarball {
      url = "https://github.com/NixOS/nixpkgs/archive/58f9c4c7d3a42c912362ca68577162e38ea8edfb.tar.gz";

      sha256 = "1517dy07jf4zhzknqbgm617lgjxsn7a6k1vgq61c67f6h55qs5ij";
    };

  # Single source of truth for all tutorial constants
  database = "postgres";
  schema   = "api";
  table    = "todos";
  username = "authenticator";
  password = "mysecretpassword";
  webRole  = "web_anon";

  nixos =
    import "${nixpkgs}/nixos" {
      system = "x86_64-linux";

      configuration = { config, pkgs, ... }: {
        # Open the default port for `postgrest` in the firewall
        networking.firewall.allowedTCPPorts = [ 3000 ];

        services.postgresql = {
          enable = true;

          initialScript = pkgs.writeText "initialScript.sql" ''
            create schema ${schema};

            create table ${schema}.${table} (
              id serial primary key,
              done boolean not null default false,
              task text not null,
              due timestamptz
            );

            insert into ${schema}.${table} (task) values
              ('finish tutorial 0'), ('pat self on back');

            create role ${webRole} nologin;

            grant usage on schema ${schema} to ${webRole};
            grant select on ${schema}.${table} to ${webRole};

            create role ${username} noinherit login password '${password}';
            grant ${webRole} to ${username};
          '';
        };

        users = {
          mutableUsers = false;

          users = {
            # For ease of debugging the VM as the `root` user
            root.password = "";

            # Create a system user that matches the database user so that we
            # can use peer authentication.  The tutorial defines a password,
            # but it's not necessary.
            "${username}".isSystemUser = true;
          };
        };

        systemd.services.postgrest = {
          wantedBy = [ "multi-user.target" ];

          after = [ "postgresql.service" ];

          script =
            let
              configuration = pkgs.writeText "tutorial.conf" ''
                db-uri = "postgres://${username}:${password}@localhost:${toString config.services.postgresql.port}/${database}"
                db-schema = "${schema}"
                db-anon-role = "${username}"
              '';

            in
              ''
                ${pkgs.haskellPackages.postgrest}/bin/postgrest ${configuration}
              '';

          serviceConfig.User = username;
        };

        # Uncomment the next line for running QEMU on a non-graphical system
        # virtualisation.graphics = false;
      };
    };

in
  nixos.vm

We can then build and run this tutorial virtual machine by running the following commands:

$ nix build --file ./postgrest-tutorial.nix

$ QEMU_NET_OPTS='hostfwd=tcp::3000-:3000' result/bin/run-nixos-vm

That spins up a VM and prompts us to log in when the VM is ready:

<<< Welcome to NixOS 20.09pre-git (x86_64) - ttyS0 >>>

Run 'nixos-help' for the NixOS manual.

nixos login: 

However, before we log in, we can test if postgrest is working using the same curl command from the tutorial:

$ curl http://localhost:3000/todos
{"hint":null,"details":null,"code":"42501","message":"permission denied for schema api"}

Wait, what? We were supposed to get:

[
  {
    "id": 1,
    "done": false,
    "task": "finish tutorial 0",
    "due": null
  },
  {
    "id": 2,
    "done": false,
    "task": "pat self on back",
    "due": null
  }
]

… but apparently something is wrong with the database’s permissions.

Fortunately, we can log into the VM as the root user with an empty password to test the database permissions. Once we log into the system we can further log into the database as the authenticator user:

<<< Welcome to NixOS 20.09pre-git (x86_64) - ttyS0 >>>

Run 'nixos-help' for the NixOS manual.

nixos login: root<Enter>
Password: <Enter>

[root@nixos:~]# sudo --user authenticator psql postgres
psql (11.9)
Type "help" for help.

postgres=> 

Now we can test to see if the authenticator user is able to access the api.todos table:

postgres=> SELECT * FROM api.todos;
ERROR:  permission denied for schema api
LINE 1: SELECT * FROM api.todos;

Good: we can reproduce the problem, but what might be the cause?

As it turns out, the tutorial instructions appear not to configure the authenticator role correctly. Specifically, the noinherit in the following commands is the reason we can’t directly access the api schema:

create role authenticator noinherit login password 'mysecretpassword';
grant web_anon to authenticator;

The noinherit setting prevents the authenticator user from automatically assuming all permissions associated with the web_anon user. Instead, the authenticator user has to explicitly use the SET ROLE command to assume such permissions, and we can verify that at the database prompt:

postgres=> SET ROLE web_anon;
SET
postgres=> SELECT * FROM api.todos;
 id | done |       task        | due 
----+------+-------------------+-----
  1 | f    | finish tutorial 0 | 
  2 | f    | pat self on back  | 
(2 rows)

Mystery solved! We can test our hypothesis by changing that noinherit to inherit:

create role authenticator inherit login password 'mysecretpassword';
grant web_anon to authenticator;

… then we can restart the VM to check that things now work by:

  • typing CTRL-a c and entering quit

  • running the following commands:

    $ rm nixos.qcow2  # Remove the old VM's disk image so we start fresh
    $ nix build --file ./postgrest-tutorial.nix  # The rest is the same as before
    $ QEMU_NET_OPTS='hostfwd=tcp::3000-:3000' result/bin/run-nixos-vm

… and now the curl example from the tutorial works:

$ curl http://localhost:3000/todos
[{"id":1,"done":false,"task":"finish tutorial 0","due":null}, 
 {"id":2,"done":false,"task":"pat self on back","due":null}]

But wait, there’s more!

Automated testing

We don’t have to manually set up and tear down VMs or run curl commands by hand. We can automate the entire process end-to-end using NixOS’s support for automated integration tests.

If we follow the instructions from the NixOS manual, then the automated integration test looks like this:

# ./postgrest-tutorial.nix

let
  # For extra determinism
  nixpkgs =
    builtins.fetchTarball {
      url = "https://github.com/NixOS/nixpkgs/archive/58f9c4c7d3a42c912362ca68577162e38ea8edfb.tar.gz";

      sha256 = "1517dy07jf4zhzknqbgm617lgjxsn7a6k1vgq61c67f6h55qs5ij";
    };

  # Single source of truth for all tutorial constants
  database      = "postgres";
  schema        = "api";
  table         = "todos";
  username      = "authenticator";
  password      = "mysecretpassword";
  webRole       = "web_anon";
  postgrestPort = 3000;

in
  import "${nixpkgs}/nixos/tests/make-test-python.nix" ({ pkgs, ...}: {
    system = "x86_64-linux";

    nodes = {
      server = { config, pkgs, ... }: {
        # Open the default port for `postgrest` in the firewall
        networking.firewall.allowedTCPPorts = [ postgrestPort ];

        services.postgresql = {
          enable = true;

          initialScript = pkgs.writeText "initialScript.sql" ''
            create schema ${schema};

            create table ${schema}.${table} (
              id serial primary key,
              done boolean not null default false,
              task text not null,
              due timestamptz
            );

            insert into ${schema}.${table} (task) values
              ('finish tutorial 0'), ('pat self on back');

            create role ${webRole} nologin;

            grant usage on schema ${schema} to ${webRole};
            grant select on ${schema}.${table} to ${webRole};

            create role ${username} inherit login password '${password}';
            grant ${webRole} to ${username};
          '';
        };

        users = {
          mutableUsers = false;

          users = {
            # For ease of debugging the VM as the `root` user
            root.password = "";

            # Create a system user that matches the database user so that we
            # can use peer authentication.  The tutorial defines a password,
            # but it's not necessary.
            "${username}".isSystemUser = true;
          };
        };

        systemd.services.postgrest = {
          wantedBy = [ "multi-user.target" ];

          after = [ "postgresql.service" ];

          script =
            let
              configuration = pkgs.writeText "tutorial.conf" ''
                db-uri = "postgres://${username}:${password}@localhost:${toString config.services.postgresql.port}/${database}"
                db-schema = "${schema}"
                db-anon-role = "${username}"
              '';

            in
              ''
                ${pkgs.haskellPackages.postgrest}/bin/postgrest ${configuration}
              '';

          serviceConfig.User = username;
        };

        # Uncomment the next line for running QEMU on a non-graphical system
        # virtualisation.graphics = false;
      };

      client = { };
    };

    testScript =
      ''
      import json
      import sys

      start_all()

      server.wait_for_open_port(${toString postgrestPort})

      expected = [
          {"id": 1, "done": False, "task": "finish tutorial 0", "due": None},
          {"id": 2, "done": False, "task": "pat self on back", "due": None},
      ]

      actual = json.loads(
          client.succeed(
              "${pkgs.curl}/bin/curl http://server:${toString postgrestPort}/${table}"
          )
      )

      if expected != actual:
          sys.exit(1)
      '';
  })

… and you can run the test with the following command:

$ nix build --file ./postgrest-tutorial.nix

… which will silently succeed with a 0 exit code if the test passes, or fail with an error message otherwise.

The above example highlights a few neat aspects of the NixOS test framework:

  • You can test more than one VM at a time

    The above test creates two VMs:

    • One VM named server which hosts postgres + postgrest

    • One VM named client where we initiate our curl commands

    … so that we can verify that everything works even when curl is run from a separate machine. For example, this comes in handy for testing firewall rules.

  • You can write the test and orchestration logic in Python

This means that we can use Python not only to run the curl subprocess, but also to compare the result against a golden JSON output.

Conclusion

This NixOS test framework is streets ahead of other integration test frameworks that I’ve worked with:

  • The test is deterministic

    The above example will continue to work a decade from now because all transitive dependencies are fully pinned by the NixOS specification.

  • The test is reproducible

    We don’t need to specify out-of-band instructions for how to obtain or install test dependencies. The only thing users globally install is Nix.

  • The test is compact

The whole thing fits in a single 120-line file with generous whitespace and formatting (although you have the option of splitting it into more files if you prefer).

  • The test is fully isolated

    The test does not mutate any shared resources or files and the test runs within an isolated network, so we can run multiple integration tests in parallel on the same machine for building a test matrix.

  • The test is fast

You might think that a VM-based test is slow compared to a container-based one, but the entire test run, including VM setup and teardown, only takes about 10 seconds.

  • The test is written in a fully-featured language

We can use Nix’s programming language features to reduce repetition. For example, that is how we consolidated all of the test constants in one place so that there is a single source of truth for everything.

So if you’re already trying out Nix, I highly encourage you to give the NixOS integration test framework a try for the above reasons.

Tuesday, November 10, 2020

Pretty-print syntax trees with this one simple trick

prettyprint

I want to share a simple trick for pretty-printing syntax trees with the correct precedence that I’ve been using in my own interpreter projects. I believe this trick has been shared before, but I don’t know what the name of it is, so I wasn’t able to easily search for prior art. If somebody knows where this idea originated from then I can update this post to credit the original.

To illustrate the trick, I’d like to begin from the following Haskell type for a lambda calculus expression:

data Expression
    = Lambda Text Expression
    | Application Expression Expression
    | Variable Text

… and a matching grammar for parsing such an expression (using the same notation that the happy package uses):

Expression
  : '\\' label '->' Expression                { Lambda $2 $4      }
  | ApplicationExpression                     { $1                }

ApplicationExpression
  : ApplicationExpression VariableExpression  { Application $1 $2 }
  | VariableExpression                        { $1                }

VariableExpression
  : label                                     { Variable $1       }
  | '(' Expression ')'                        { $2                }

The trick

We can pretty-print that Expression type with the correct precedence without having to keep track of any precedence level. Instead, all we have to do is write the pretty-printer to match the shape of the grammar, like this:

prettyExpression :: Expression -> Text
prettyExpression (Lambda x e) =
    "\\" <> x <> " -> " <> prettyExpression e
prettyExpression other =
    prettyApplicationExpression other

prettyApplicationExpression :: Expression -> Text
prettyApplicationExpression (Application f x) =
    prettyApplicationExpression f <> " " <> prettyVariableExpression x
prettyApplicationExpression other =
    prettyVariableExpression other

prettyVariableExpression :: Expression -> Text
prettyVariableExpression (Variable x) =
    x
prettyVariableExpression other =
    "(" <> prettyExpression other <> ")"

The pretty-printing logic closely follows the grammar:

  • Create one pretty… function for each nonterminal symbol in the grammar

    For example, since we have a nonterminal symbol named ApplicationExpression, we create a matching pretty-printing function named prettyApplicationExpression.

  • Each pretty… function matches one pattern per alternative in the grammar

In other words, if the production rule for ApplicationExpression has two alternatives, then prettyApplicationExpression matches two patterns, one for each of those alternatives.

  • Pretty-print non-terminal symbols using the matching pretty-printing function

    For example, we pretty-print a function’s argument using prettyVariableExpression since we used the VariableExpression nonterminal symbol to parse that argument.

  • Pretty-print terminal symbols in the obvious way

    … with any necessary whitespace

That’s the entire trick! If you follow those simple rules then the pretty-printer will automatically respect precedence, inserting parentheses in the right places and eliding them when they are not necessary.
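
For example, here is a hypothetical GHCi session (assuming the OverloadedStrings extension is enabled so that the Text literals type-check; the example expressions are my own) showing parentheses appearing only where the grammar requires them:

>>> prettyExpression (Lambda "x" (Application (Variable "f") (Variable "x")))
"\\x -> f x"
>>> prettyExpression (Application (Application (Variable "f") (Variable "x")) (Variable "y"))
"f x y"
>>> prettyExpression (Application (Variable "f") (Application (Variable "g") (Variable "x")))
"f (g x)"
>>> prettyExpression (Application (Lambda "x" (Variable "x")) (Variable "y"))
"(\\x -> x) y"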

Conclusion

There is one major downside to this trick: if you add a new constructor to your syntax tree and you forget to update the pretty-printer then your pretty-printer will loop forever, because each fall-through case delegates to the next pretty… function until control cycles back to the top without ever matching the new constructor. This is pretty annoying, as you might imagine.

The main upside of this trick is that the pretty-printing logic is very simple to write, so mechanical that you could probably automate it (although I’m not sure if somebody has done so yet).

Friday, October 30, 2020

Why I prefer functional programming

functional

This post explains why I stick with functional programming, using a rationale that a non-functional programmer can relate to.

The reason is actually pretty simple: functional programming idioms are more enduring and portable than idioms from other programming paradigms (such as procedural or object-oriented programming). To explain why, I need to first define what I understand “functional programming” to mean (which is admittedly an imprecise and vague term).

I personally use the term “functional programming” to denote a style of programming where you restrict yourself as much as possible to the following language features:

  • Scalars, including:
    • Numbers
    • Strings
  • Algebraic data types, including:
    • Records
    • Tagged unions, including:
      • Bools
      • Optional values
      • Enums
    • Recursion, including:
      • Lists
  • First-class functions

Carefully note what’s absent from the list. We don’t mention:

  • Classes / Objects
  • Mutation
  • Structured programming idioms (e.g. for / while loops)

That’s not to say that those features are banned from my definition of functional programming. Think of the definition as more of a “tech radar” where the former set of features fall in the “Adopt” category and the latter set of features fall in the “Hold” category.

So what distinguishes the former “approved” features from the latter “discouraged” features? The approved language features are “timeless”. You’re always going to need numbers, lists, strings, functions, records, etc. They aren’t even specific to programming: they predate programming and originate from good old-fashioned math. If your language doesn’t support one or more of those features you will run into difficulties modeling some problem domains.

However, once you verse yourself in functional programming idioms you realize that you don’t actually need much else beyond those features:

  • Error handling? Use a tagged union (e.g. Either / Result)
  • Loops? Use recursion
  • Dependency injection? Use a higher-order function
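
For instance, here is a minimal Haskell sketch of all three substitutions (the function names are purely illustrative, not drawn from any particular library):

-- Error handling: a tagged union instead of an exception
safeDivide :: Int -> Int -> Either String Int
safeDivide _ 0 = Left "division by zero"
safeDivide x y = Right (x `div` y)

-- Loops: recursion instead of a `for` loop
total :: [Int] -> Int
total []       = 0
total (x : xs) = x + total xs

-- Dependency injection: accept the dependency as a function argument
greet :: (String -> IO ()) -> String -> IO ()
greet logger name = logger ("Hello, " ++ name)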

When you view things in that light you begin to view other programming idioms as window dressing that comes and goes; not fundamental to the discipline of software engineering.

Moreover, using “timeless” primitives fosters a programming style that is more portable than most. Most functional programming idioms can be ported to any language, with the notable exception of recursion and generalized tagged unions (which not all languages support). Even then, functional programmers learn how to translate recursion and tagged unions to equivalent idioms in other languages (e.g. loops and the visitor pattern, respectively). By contrast, if you try to port object-oriented idioms to a non-object-oriented language you’re going to have a bad time, and likewise for porting imperative idioms to a functional programming language.

This is why my tech radar marks functional programming as “Adopt” and marks other programming paradigms as “Hold”.

Monday, July 27, 2020

The golden rule of software quality

golden-rule

This post summarizes a rule of thumb that I commonly cite in software quality discussions, so that I can link to my own post to save time. I have taken to calling this the “golden rule of software quality” because the rule is succinct and generalizable.

The golden rule is:

Prefer to push fixes upstream instead of working around problems downstream

… and I’ll explain implications of this rule for a few software engineering tradeoffs (using examples from the Haskell community and ecosystem).

Disclaimer: The golden rule of software quality bears no relationship to the golden rule of treating others as you want to be treated.

Third-party dependencies

Most developers rely on third-party dependencies or tools for their projects, but the same developers rarely give thought to fixing or improving that same third-party code. Instead, they tend to succumb to the bystander effect, meaning that the more widely used a project is, the more each person assumes that some other developer will take care of any problems for them. Consequently, these same developers tend to work around problems in widely used tools.

For example, for the longest time Haskell did not support a “dot” syntax for accessing record fields, something that the community worked around downstream through a variety of packages (including lens) that approximate dot syntax within the language. This approach had some upsides (accessors were first class), but several downsides, such as poor type inference, poor error messages, and lack of editor support for field completions. Only recently did Neil Mitchell and Shayne Fletcher upstream this feature directly into the language via the RecordDotSyntax proposal, solving the problem at its root.

The golden rule of software quality implies that you should prefer to directly improve the tools and packages that you depend on (“push fixes upstream”) instead of hacking around the problem locally (“working around problems downstream”). These sorts of upstream improvements can be made directly to:

  • Your editor / IDE
  • Your command-line shell
  • Programming languages you use
  • Packages that you depend on

Note that this is not always possible (especially if upstream is hostile to outside contributions), but don’t give up before at least trying to do so.

Typed APIs

Function types can also follow this same precept. For example, there are two ways that one can assign a “safe” (total) type to the head function for obtaining the first value in a list.

The first approach pushes error handling downstream:

-- Return the first value wrapped in a `Just` if present, `Nothing` otherwise
head :: [a] -> Maybe a

… and the second approach pushes the requirements upstream:

-- Return the first value of a list, which never fails if the list is `NonEmpty`
head :: NonEmpty a -> a

The golden rule states that you should prefer the latter type signature for head (that requires a NonEmpty input) since this type pushes the fix upstream by not allowing the user to supply an empty list in the first place. More generally, if you take this rule to its logical conclusion you end up making illegal states unrepresentable.

Contrast this with the former type signature for head that works around a potentially empty list by returning a Maybe. This type promotes catching errors later in the process, which reduces quality since we don’t fail as quickly as we should. You can improve quality by failing fast at the true upstream root of the problem instead of debugging indirect downstream symptoms of the problem.
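
To make the upstream version concrete, here is a small sketch (the function names are hypothetical) showing how a caller discharges the NonEmpty requirement once, at the boundary, instead of handling a Maybe at every downstream call site:

import Data.List.NonEmpty (NonEmpty)
import qualified Data.List.NonEmpty as NonEmpty

-- Downstream code accepts a `NonEmpty` list, so it can take the first
-- element with no error handling: emptiness was ruled out upstream
summarize :: NonEmpty String -> String
summarize tasks = "Next up: " ++ NonEmpty.head tasks

-- The boundary validates exactly once, converting `[a]` to `NonEmpty a`
main :: IO ()
main = case NonEmpty.nonEmpty ["finish tutorial 0", "pat self on back"] of
    Nothing    -> putStrLn "no tasks"
    Just tasks -> putStrLn (summarize tasks)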

Social divisions

I’m a firm believer in Conway’s Law, which says:

Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure.

— Melvin E. Conway

… which I sometimes paraphrase as “social divisions lead to technical divisions”.

If social issues are upstream of technical issues, the golden rule implies that we should prefer fixing root causes (social friction) instead of attempting to mask social disagreements with technical solutions.

The classic example of this within the Haskell community is the cabal vs. stack divide, which originated out of divisions between FPComplete and Cabal contributors (Corrected based on feedback from the Haskell subreddit). The failure to resolve the upstream friction between the paid and open source contributors led to an attempt to work around the problem downstream by creating a parallel install tool. This in turn fragmented the Haskell community, leading to a poor and confusing experience for first-time users.

That’s not to imply that the divide in the community could have been resolved (maybe the differences between paid contributors and open source volunteers were irreconcilable), but the example still illustrates the marked impact on quality of failing to fix issues at the source.

Conclusion

Carefully note that the golden rule of software quality does not mandate that you have to fix problems upstream. The rule advises that you should prefer to upstream fixes, all other things equal. Sometimes other considerations can prevent one from doing so (such as limitations on time or money). However, when quality is paramount then you should strive to observe the rule!

Monday, July 13, 2020

Record constructors

records

This is a short post documenting various record-related idioms in the Haskell ecosystem. First-time package users can use this post to better understand record API idioms they encounter in the wild.

For package authors, I also include a brief recommendation near the end of the post explaining which idiom I personally prefer.

The example

I’ll use the following record type as the running example for this post:

module Example where

data Person = Person{ name :: String , admin :: Bool }

There are a few ways you can create a Person record if the package author exports the record constructors.

The simplest approach requires no extensions. You can initialize the value of every field in a single expression, like this:

example :: Person
example = Person{ name = "John Doe", admin = True }

Some record literals can get quite large, so the language provides two extensions which can help with record assembly.

First, you can use the NamedFieldPuns extension to author a record like this:

{-# LANGUAGE NamedFieldPuns #-}

example :: Person
example = Person{ name, admin }
  where
    name = "John Doe"

    admin = True

This works because the NamedFieldPuns extension translates Person{ name, admin } to Person{ name = name, admin = admin }.

The RecordWildCards extension goes a step further and allows you to initialize a record literal without naming all of the fields (again), like this:

{-# LANGUAGE RecordWildCards #-}

example :: Person
example = Person{..}
  where
    name = "John Doe"

    admin = True

Vice versa, you can destructure a record in a few ways. For example, you can access record fields using accessor functions:

render :: Person -> String
render person = name person ++ suffix
  where
    suffix = if admin person then " - Admin" else ""

… or you can pattern match on a record literal:

render :: Person -> String
render Person{ name = name, admin = admin } = name ++ suffix
  where
    suffix = if admin then " - Admin" else ""

… or you can use the NamedFieldPuns extension (which also works in reverse):

render :: Person -> String
render Person{ name, admin } = name ++ suffix
  where
    suffix = if admin then " - Admin" else ""

… or you can use the RecordWildCards extension (which also works in reverse):

render :: Person -> String
render Person{..} = name ++ suffix
  where
    suffix = if admin then " - Admin" else ""

Also, once the RecordDotSyntax extension is available you can use ordinary dot syntax to access record fields:

render :: Person -> String
render person = person.name ++ suffix
  where
    suffix = if person.admin then " - Admin" else ""

Opaque record types

Some Haskell packages elect not to export the record constructor. When they do so, they instead provide a function that initializes a record value with all required fields and defaults the remaining fields.

For example, suppose the name field were required for our Person type and the admin field were optional (defaulting to False). The API might look like this:

module Example (
      Person(name, admin)
    , makePerson
    ) where

data Person = Person{ name :: String, admin :: Bool }

makePerson :: String -> Person
makePerson name = Person{ name = name, admin = False }

Carefully note that the module exports the Person type and all of the fields, but not the Person constructor. So the only way that a user can create a Person record is to use the makePerson “smart constructor”. The typical idiom goes like this:

example :: Person
example = (makePerson "John Doe"){ admin = True }

In other words, the user is supposed to initialize required fields using the “smart constructor” and then set the remaining non-required fields using record syntax. This works because you can update a record type using exported fields even if the constructor is not exported.

The wai package is one of the more commonly used packages that observes this idiom. For example, the Request record is opaque but the accessors are still exported, so you can create a defaultRequest and then update that Request using record syntax:

example :: Request
example = defaultRequest{ requestMethod = "GET", isSecure = True }

… and you can still access fields using the exported accessor functions:

requestMethod example

This approach also works in conjunction with NamedFieldPuns for assembly (but not disassembly), so something like this is valid:

example :: Request
example = defaultRequest{ requestMethod, isSecure }
  where
    requestMethod = "GET"

    isSecure = True

However, this approach does not work with the RecordWildCards language extension, because record update syntax does not support the {..} wildcard.

Some other packages go a step further and, instead of exporting the accessors, export lenses for the record fields. For example, the amazonka-* family of packages does this, leading to record construction code like this:

example :: PutObject
example =
    putObject "my-example-bucket" "some-key" "some-body"
    & poContentLength .~ Just 9
    & poStorageClass  .~ ReducedRedundancy

… and you access fields using the lenses:

view poContentLength example

My recommendation

I believe that package authors should prefer to export record constructors instead of using smart constructors. Specifically, the smart constructor idiom requires too much specialized language knowledge to create a record, something that should be an introductory task for a functional programming language.

Package authors typically justify smart constructors to improve API stability since they permit adding new default-valued fields in a backwards compatible way. However, I personally do not weight such stability highly (both as a package author and a package user) because Haskell is a typed language and these changes are easy for reverse dependencies to accommodate with the aid of the type-checker.

I place a higher premium on improving the experience for new contributors so that Haskell projects can more easily take root within a polyglot engineering organization. Management tends to be less reluctant to accept Haskell projects within their organization if they feel that other teams can confidently contribute to the Haskell code.

Future directions

One long-term solution that could provide the best of both worlds is if the language had first-class support for default-valued fields. In other words, perhaps you could author a record type like this:

data Person = Person{ name :: String , admin :: Bool = False }

… and then you could safely omit default-valued fields when initializing a record. Of course, I haven’t fully thought through the implications of such a change.