Thursday, September 17, 2015

How to make your Haskell code more readable to non-Haskell programmers

This post summarizes a few tips that, in my anecdotal experience, make Haskell code more readable. Each guideline comes with a corresponding rationale.

Do not take this post to mean that all Haskell code should be written this way. These are guidelines for code that you wish to use as expository examples to introduce people to the language without scaring them with unfamiliar syntax.

Rule #1: Don't use ($)

This is probably the most controversial guideline, but I believe it is the recommendation with the highest impact on readability.

A typical example of this issue is something like the following code:

print $ even $ 4 * 2

... which is equivalent to this code:

print (even (4 * 2))

The biggest issue with the dollar sign is that most people will not recognize it as an operator! There is no precedent for using the dollar sign as an operator in any other language. Indeed, the vast majority of developers program in languages that do not support adding new operators at all, such as JavaScript, Java, C++, or Python, so you cannot reasonably expect them to immediately recognize that the dollar symbol is an operator.

This then leads people to believe that the dollar sign is some sort of built-in language syntax, which in turn convinces them that Haskell's syntax is needlessly complex and optimized for being terse over readable. This perception is compounded by the fact that the most significant use of the dollar symbol outside of Haskell is in Perl (a language notorious for being write-only).

Suppose that they do recognize that the symbol represents an operator. They still cannot guess at what the operator means. There is no obvious mental connection between a symbol used for currency and function application. There is also no prior art for this connection outside of the Haskell language.

Even if a newcomer is lucky enough to guess that the symbol represents function application, it's still ambiguous because they cannot tell if the symbol is left- or right-associative. Even people who do actually take the time to learn Haskell in more depth have difficulty understanding how ($) behaves and frequently confuse it with the composition operator, (.). If people earnestly learning the language have difficulty understanding ($), what chance do skeptics have?

By this point you've already lost many people who might have been potentially interested in the language, and for what? The dollar sign does not even shorten the expression.
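
To make the associativity point concrete, here is a minimal sketch. The names withDollar, withParens, and withCompose are made up for illustration, and I use show instead of print so that the results are plain values that can be compared:

```haskell
-- ($) is declared `infixr 0`, so `f $ g $ x` parses as `f $ (g $ x)`.
withDollar :: String
withDollar = show $ even $ 4 * 2

-- The same expression written with explicit parentheses.
withParens :: String
withParens = show (even (4 * 2))

-- (.) builds a new function first, and only then applies it to an argument.
withCompose :: String
withCompose = (show . even) (4 * 2)
```

All three definitions denote the same value, but only the parenthesized one can be read without knowing any Haskell-specific fixity rules.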

Rule #2: Use operators sparingly

Rule #1 is a special case of Rule #2.

My rough guideline for which operators to use is that associative operators are okay, and all other operators are not okay.

Okay:

  • (.)
  • (+) / (*)
  • (&&) / (||)
  • (++)

Not okay:

  • (<$>) / (<*>) - Use liftA{n} or ApplicativeDo in the future
  • (^.) / (^..) / (%~) / (.~) - Use view / toListOf / over / set instead

You don't have to agree with me on the specific operators to keep or reject. The important part is just using them more sparingly when teaching Haskell.

The issues with operators are very similar in principle to the issue with the dollar sign:

  • They are not recognizable as operators to some people, especially if they have no equivalent in other languages
  • Their meaning is not immediately obvious
  • Their precedence and fixity are not obvious, particularly for Haskell-specific operators

The main reason I slightly prefer associative operators is that their fixity does not matter and they usually have prior art outside the language as commonly used mathematical operators.
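
As a rough illustration of trading operators for named functions, the sketch below adds two Maybe values both ways. The names sumWithOperators and sumWithLiftA2 are hypothetical; liftA2 is the two-argument member of the liftA{n} family from Control.Applicative:

```haskell
import Control.Applicative (liftA2)

-- Operator style: the reader must already know (<$>), (<*>), and their fixities.
sumWithOperators :: Maybe Int
sumWithOperators = (+) <$> Just 1 <*> Just 2

-- Named-function style: liftA2 applies a two-argument function
-- to values wrapped in an Applicative, here Maybe.
sumWithLiftA2 :: Maybe Int
sumWithLiftA2 = liftA2 (+) (Just 1) (Just 2)
```

Both definitions evaluate to the same result. The second one at least gives a newcomer a name they can search for.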

Rule #3: Use do notation generously

Prefer do notation over (>>=) or fmap when available, even if it makes your code a few lines longer. People won't reject a language on the basis of verbosity (Java and Go are living proof of that), but they will reject languages on the basis of unfamiliar operators or functions.

This means that instead of writing this:

example = getLine >>= putStrLn . (++ "!")

You instead write something like this:

example = do
    str <- getLine
    putStrLn (str ++ "!")

If you really want a one-liner you can still use do notation, just by adding a semicolon:

example = do str <- getLine; putStrLn (str ++ "!")

do notation and semicolons are immediately recognizable to outsiders because they resemble subroutine syntax and in the most common case (IO) it is in fact subroutine syntax.

A corollary of this is to use the newly added ApplicativeDo extension, which was recently merged into the GHC mainline and will be available in the next GHC release. I believe that ApplicativeDo will be more readable to outsiders than the (<$>) and (<*>) operators.
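
Here is a small sketch of what that might look like, assuming a GHC version that ships the ApplicativeDo extension (GHC 8.0 or later). The function name addBoth is invented for this example:

```haskell
{-# LANGUAGE ApplicativeDo #-}

-- With ApplicativeDo enabled, this do block only needs an Applicative
-- constraint, because neither statement depends on an earlier result.
addBoth :: Applicative f => f Int -> f Int -> f Int
addBoth fx fy = do
    x <- fx
    y <- fy
    pure (x + y)
```

For example, addBoth (Just 1) (Just 2) plays the same role as (+) <$> Just 1 <*> Just 2, without any operators for the reader to decode.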

Rule #4: Don't use lenses

Don't get me wrong: I'm one of the biggest advocates for lenses and I think they firmly belong as a mainstream Haskell idiom. However, I don't feel they are appropriate for beginners.

The biggest issues are that:

  • It's difficult to explain to beginners how lenses work
  • They require Template Haskell or boilerplate lens definitions
  • They require separate names for function accessors and lenses, and one or the other is bound to look ugly as a result
  • They lead to poor inferred types and error messages, even when using the more monomorphic versions in lens-family-core

Lenses are wonderful, but there's no hurry to teach them. There are already plenty of uniquely amazing things about the Haskell language worth learning before even mentioning lenses.
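
For expository code, plain record syntax usually gets the job done. The following is only a sketch: the Person type and the greeting and birthday functions are hypothetical names made up for this example:

```haskell
-- A hypothetical record type, invented for this example.
data Person = Person
    { name :: String
    , age  :: Int
    } deriving (Show, Eq)

-- Field access uses the accessor function generated by the record declaration.
greeting :: Person -> String
greeting person = "Hello, " ++ name person

-- Record update syntax builds a copy of the record with one field changed.
birthday :: Person -> Person
birthday person = person { age = age person + 1 }
```

This needs no Template Haskell, no extra names, and no library beyond the Prelude.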

Rule #5: Use where and let generously

Resist the temptation to write one giant expression spanning multiple lines. Break it up into smaller sub-expressions each defined on their own line using either where or let.

This rule exists primarily to ease imperative programmers into functional programming. These programmers are accustomed to frequent visual "punctuation" in the form of statement boundaries when reading code. let and where visually simulate decomposing a larger program into smaller "statements" even if they are really sub-expressions and not statements.
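
As an illustration, here is the same computation written both ways. This is a sketch with made-up names (meanSquareDense and meanSquare), not code from a real project:

```haskell
-- One large expression: correct, but the reader has to untangle it.
meanSquareDense :: [Double] -> Double
meanSquareDense xs = sum (map (\x -> x * x) xs) / fromIntegral (length xs)

-- The same computation split into named sub-expressions using where.
meanSquare :: [Double] -> Double
meanSquare xs = total / count
  where
    squares  = map square xs
    total    = sum squares
    count    = fromIntegral (length xs)
    square x = x * x
```

Each line under where reads like a small "statement", even though the order of the definitions does not matter.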

Rule #6: Use point-free style sparingly

Every Haskell programmer goes through a phase where they try to see if they can eliminate all variable names. Spoiler alert: you always can, but this just makes the code terse and unreadable.

For example, I'll be damned if I know what this means without some careful thought and some pen and paper:

((==) <*>)

... but I can tell at a glance what this equivalent expression does:

\f x -> x == f x

This is a real example, by the way.

There's no hard and fast rule for where to draw the line, but when in doubt err on the side of being less point-free.
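
To see the two styles side by side as complete definitions, consider this sketch. The names isFixedPointTerse and isFixedPoint are my own invention for this example:

```haskell
-- Point-free: relies on the Applicative instance for functions,
-- where (g <*> h) x = g x (h x).
isFixedPointTerse :: Eq a => (a -> a) -> a -> Bool
isFixedPointTerse = ((==) <*>)

-- Pointed: the same function with its arguments named.
isFixedPoint :: Eq a => (a -> a) -> a -> Bool
isFixedPoint f x = x == f x
```

Both functions check whether x is unchanged by f, but only the second one says so.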

Conclusion

That's it! Those six simple rules go a very long way towards improving the readability of Haskell code to outsiders.

Haskell is actually a supremely readable language once you familiarize yourself with the prevailing idioms, thanks to:

  • purity
  • the opinionated functional paradigm
  • minimization of needless side effects and state

However, we should make an extra effort to make our code readable even to complete outsiders with absolutely no familiarity or experience with the language. The entry barrier is one of the most widely cited criticisms of the language and I believe that a simple and clean coding style can lower that barrier.

15 comments:

  1. Rule #1 might be the more controversial, but it is also the one I am most likely to follow beyond didactic examples. When I first saw Chris Done's style guide a few months ago, "don't use `($)`" sounded very unreasonable, but reading that started a gradual shift in my preferences. Now I mostly use `($)` where there is a large difference between the expressions on either side of it in terms of relevance (e.g. after boilerplate wrappers) or appearance (e.g. before do-blocks).

  2. The other thing that is probably worth mentioning is the choice of names for types and values (variables). Quite often the newcomer may see "f" both in the type declaration and as one of the parameters. Erik Meijer even joked: let's name our children "x, y and z".

  3. The one thing I have heard again and again regarding OCaml, F#, Clojure, and Elixir has been that one single operator (or family of operators) is much loved, and that is the reverse-application "pipe" operator "|>" (which GHC 7.10 has now in a fashion, but with precedence issues, as "&"). This is so much loved that, for example, entire blog posts and the main book on Elixir https://pragprog.com/book/elixir/programming-elixir promote "|>" as one of the main features of the language! Given that, I think it is clear there is a market for non-parenthesized application syntax, with the caveat just that "$" is foreign and also is right-to-left, which in other guises (such as function composition) is difficult for newcomers to stomach. Therefore, I have experimented with presenting code using "&" (and ">>>" when going point-free), which may look a little funny at first lexically but, coupled with good variable names and reading aloud left to right, seems to click pretty well.

    I think it is good to empirically examine what people do like about the readability of different languages. Clojure's "threading macros" are also much loved, to the point of obsession, judging by blog posts about them.

    Don't laugh, this stuff matters. Sensible use of imports, using records with named fields rather than tuples, choosing names of more than a single character, etc. Again and again people get the wrong idea about Haskell from some old practices in the name of sheer brevity. Haskell suffers from "there's more than one way to do it".

    1. In some ways the love of (|>) in those languages is due to the lack of viable alternatives. They basically _have_ to use it over composition for type inference reasons, so it gradually became ingrained in the culture.

    2. Let me respectfully disagree, as a beginner. It's not a simple matter of being ingrained in culture: the reason they love it is that it conveys a very convenient notion of passing data left to right through a series of transforms. All the other alternatives mess it up in subtle psychological ways, costing time to parse even to the trained eye. One of the latest experiments in this regard is Elm, which again shows that people love |> , and can easily transform almost any code to this style in no time. As it stands '&' is also a very bad choice of symbol for this.

    3. Also, it's similar to Unix pipes, which are familiar almost to anyone. In this way, I would agree that Unix pipes are somewhat ingrained in culture.

    4. I somehow forgot to mention the love for pipes in Elm: https://twitter.com/mariusbutuc/status/647152222267006976

    5. 100% agree Franklin. It's analogous to the value of do-notation. People should keep banging on about this.

  4. I don't necessarily agree with all points; Definitely with lens (IMHO you should actually understand lens if you want to use it; And there is no way a neophyte will even get the type signatures of lens).

    The thing about confusing (.) and ($) seems to me more like an artifact of how they're usually introduced; If you make a habit out of introducing them as function composition (which they're likely familiar with from school) and function application (which they're probably not familiar with as an operator, but it's easy enough to introduce) rather than as some kind of special form to save parentheses (which seems to be the impression people often get) you'd have much less of an issue.

  5. What would you think of a not-yet-existing "Haskell glasses" tool? It would expand expressions that are hard for a novice to read into verbose equivalents.

  6. This post reminds me a lot of 'How to get people to like vi/vim', which goes into the thought that new users need a gentle introduction in which the commands used would get in the way of power users. I particularly take umbrage at Perl being write-only. :) I used Perl in anger and for great good for a dozen years. My team wrote in a disciplined manner (we could have written Fortran but didn't -- Now to be lit up by Fortran programmers :) but when it was needed we could drop into syntax we knew acted correctly for a great return on the syntactic, and for those in the know, cognitive load.

    These are great rules for showing your (not-even newbie Haskell-er) friends Haskell. Even a toe in the water (or maybe a foot) into Haskell and I think these rules might start to hinder. Personally once I saw what $ did (and I did recognize it as an operator at that point so not gainsaying your premise for people that have never seen Haskell code) I thought it was about the best thing since sliced bread. Does anyone really like playing with parentheses while they code?

    Hmm. Re-reading I'm struck again by the different reception this post might get depending on where your audience is coming from. Mentoring new-hires I think they'll get to follow where I lead. On the other hand I'm gonna give the next presentation I give to a lay-audience a lot more thought.

    (Back and forth on the premise but many thanks for an article that's making me think about it!)

    1. Yeah, this post is targeted more at Haskell code that is intended to be shown to people who are not comfortable with Haskell yet, such as tutorial examples or small test projects being used to assess the language. For professional programming in Haskell I'm much less opinionated about coding style.

  7. Loved the post, thanks! One additional request I would like to see for all "beginner friendly" Haskell code is to do a simple "import qualified" or possibly an explicit import ("import Data.List (foo, bar)") so that readers can know where bound symbols come from. The default is to assume they are in Prelude.

  8. Great post, Gabriel! I recently started using Haskell seriously and I’d like to +1 Michael’s comment about preferring `import qualified …`. How do experienced Haskell programmers understand which package a function comes from? IDEs?

    I think Taylor Fausak sets a great example with this convention: https://github.com/tfausak

    For example: https://github.com/tfausak/factory
