Wednesday, May 5, 2021

The trick to avoid deeply-nested error-handling code

either-trick

This post documents a small trick that I use to avoid deeply-nested error-handling code. The trick is a common piece of Haskell folklore that many people either learn from others or figure out on their own, but I’m not sure what its official name is (so I had difficulty searching for prior art explaining it). However, I’ve taught this trick to others enough times that I think it merits a blog post of its own.

This post assumes some familiarity with Haskell’s Either type and do notation, but the Appendix at the end of the post will walk through all of the details using equational reasoning if you’re having trouble following along with how things work.

The motivating example

Let’s begin from the following contrived Either-based example that uses deeply nested error-handling:

{-# LANGUAGE NamedFieldPuns #-}

import Text.Read (readMaybe)

data Person = Person { age :: Int, alive :: Bool } deriving (Show)

example :: String -> String -> Either String Person
example ageString aliveString = do
    case readMaybe ageString of
        Nothing -> do
            Left "Invalid age string"

        Just age -> do
            if age < 0
                then do
                    Left "Negative age"

                else do
                    case readMaybe aliveString of
                        Nothing -> do
                            Left "Invalid alive string"

                        Just alive -> do
                            return Person{ age, alive }

… which we can use like this:

>>> example "24" "True"
Right (Person {age = 24, alive = True})

>>> example "24" "true"
Left "Invalid alive string"

>>> example "" "True"
Left "Invalid age string"

>>> example "-5" "True"
Left "Negative age"

Notice how the above coding style increases the nesting / indentation level every time we parse or validate the input in some way. We could conserve indentation by using 2-space indents or indenting only once for each level of nesting:

{-# LANGUAGE NamedFieldPuns #-}

import Text.Read (readMaybe)

data Person = Person { age :: Int, alive :: Bool } deriving (Show)

example :: String -> String -> Either String Person
example ageString aliveString = case readMaybe ageString of
  Nothing  -> Left "Invalid age string"
  Just age -> if age < 0
    then Left "Negative age"
    else case readMaybe aliveString of
      Nothing    -> Left "Invalid alive string"
      Just alive -> return Person{ age, alive }

However, I think most people writing code like that would prefer to keep the indentation level stable, no matter how many validations the code requires.

The trick

Fortunately, we can avoid nesting the code with the following change:

{-# LANGUAGE NamedFieldPuns #-}

import Text.Read (readMaybe)

data Person = Person { age :: Int, alive :: Bool } deriving (Show)

example :: String -> String -> Either String Person
example ageString aliveString = do
    age <- case readMaybe ageString of
        Nothing  -> Left "Invalid age string"
        Just age -> return age

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe aliveString of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

Here we make use of several useful properties:

  • return in Haskell does not short-circuit or exit from the surrounding code

    In fact, some Haskell programmers prefer to use pure (a synonym for return) to better convey the absence of short-circuiting behavior:

    {-# LANGUAGE NamedFieldPuns #-}
    
    import Text.Read (readMaybe)
    
    data Person = Person { age :: Int, alive :: Bool } deriving (Show)
    
    example :: String -> String -> Either String Person
    example ageString aliveString = do
        age <- case readMaybe ageString of
            Nothing  -> Left "Invalid age string"
            Just age -> pure age
    
        if age < 0
            then Left "Negative age"
            else pure ()
    
        alive <- case readMaybe aliveString of
            Nothing    -> Left "Invalid alive string"
            Just alive -> pure alive
    
        pure Person{ age, alive }
  • Left does short-circuit from the surrounding code

    In fact, we can define a synonym for Left named throw if we want to better convey the presence of short-circuiting behavior:

    {-# LANGUAGE NamedFieldPuns #-}
    
    import Text.Read (readMaybe)
    
    data Person = Person { age :: Int, alive :: Bool } deriving (Show)
    
    example :: String -> String -> Either String Person
    example ageString aliveString = do
        age <- case readMaybe ageString of
            Nothing  -> throw "Invalid age string"
            Just age -> pure age
    
        if age < 0
            then throw "Negative age"
            else pure ()
    
        alive <- case readMaybe aliveString of
            Nothing    -> throw "Invalid alive string"
            Just alive -> pure alive
    
        pure Person{ age, alive }
    
    throw :: String -> Either String a
    throw = Left
  • Left / throw can “return” any type of value

    If you look at the type of throw, the “return” type is a polymorphic type a because throw short-circuits (and therefore doesn’t need to return a real value of that type).

    This is why the type checker doesn’t complain when we do this:

    age <- case readMaybe ageString of
        Nothing  -> throw "Invalid age string"
        Just age -> pure age

    … because both branches of the case expression type-check as an expression that “return”s an Int. The left branch type-checks as a branch that returns an Int because it cheats and never needs to return anything and the right branch returns a bona-fide Int (the age).

    We can make this explicit by giving both branches of the case expression an explicit type annotation:

    age <- case readMaybe ageString of
        Nothing  -> throw "Invalid age string" :: Either String Int
        Just age -> pure age                   :: Either String Int

    Both branches of a case expression must always share the same type, and a Left / throw branch satisfies that requirement automatically, no matter what the other branch returns. Similarly, both branches of an if expression type-check when at least one branch is a Left / throw:

    if age < 0
        then throw "Negative age" :: Either String ()
        else pure ()              :: Either String ()
  • We can return a value from a case expression

    New Haskell programmers might not realize that case expressions can return a value (just like any other expression), which means that their result can be stored as a new value within the surrounding scope:

    age <- case readMaybe ageString of
        Nothing  -> throw "Invalid age string" :: Either String Int
        Just age -> pure age                   :: Either String Int

    The type-checker doesn’t mind that only the second branch returns a valid age because it knows that the outer age is unreachable if the first branch short-circuits. This understanding is not built into the compiler, but is rather an emergent property of how do notation works for Either. See the Appendix for a fully-worked equational reasoning example showing why this works.

Combinators

You can build upon this trick by defining helpful utility functions to simplify things further. For example, I sometimes like to define the following helper function:

orDie :: Maybe a -> String -> Either String a
Just a  `orDie` _      = return a
Nothing `orDie` string = Left string

{- Equivalent, more explicit, implementation:

maybe `orDie` string =
    case maybe of
        Nothing -> Left string
        Just a  -> return a
-}

… which you can use like this:

{-# LANGUAGE NamedFieldPuns #-}

import Text.Read (readMaybe)

data Person = Person { age :: Int, alive :: Bool } deriving (Show)

orDie :: Maybe a -> String -> Either String a
Just a  `orDie` _      = Right a
Nothing `orDie` string = Left string

example :: String -> String -> Either String Person
example ageString aliveString = do
    age <- readMaybe ageString `orDie` "Invalid age string"

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- readMaybe aliveString `orDie` "Invalid alive string"

    return Person{ age, alive }

… which is even more clear (in my opinion).
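
We can go one step further: the remaining if age < 0 then Left … else return () pattern is exactly what when from Control.Monad expresses, since when runs its action only if the condition holds and is return () otherwise. Here is a sketch combining orDie with when:

```haskell
{-# LANGUAGE NamedFieldPuns #-}

import Control.Monad (when)
import Text.Read (readMaybe)

data Person = Person { age :: Int, alive :: Bool } deriving (Eq, Show)

orDie :: Maybe a -> String -> Either String a
Just a  `orDie` _      = Right a
Nothing `orDie` string = Left string

example :: String -> String -> Either String Person
example ageString aliveString = do
    age <- readMaybe ageString `orDie` "Invalid age string"

    -- `when` reduces to `Left "Negative age"` if the condition holds,
    -- and to `return ()` otherwise, replacing the if/else above
    when (age < 0) (Left "Negative age")

    alive <- readMaybe aliveString `orDie` "Invalid alive string"

    return Person{ age, alive }
```

This works because when only requires an Applicative, and Either String is one.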

Conclusion

That is the entirety of the trick. You can return values from case expressions to avoid deeply-nesting your Either code, or you can define utility functions (such as orDie) which do essentially the same thing.

This trick applies equally well to any other Monad that supports some notion of short-circuiting on failure, such as ExceptT (from transformers / mtl). The only essential ingredient is some throw-like function that short-circuits and therefore has a polymorphic return type.
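
For example, here is a minimal sketch of the same function using ExceptT from the transformers package, where throwE plays the role of Left. The putStrLn is only there to illustrate that the base monad’s effects (here, IO) can now be interleaved between validations:

```haskell
{-# LANGUAGE NamedFieldPuns #-}

import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)
import Text.Read (readMaybe)

data Person = Person { age :: Int, alive :: Bool } deriving (Eq, Show)

-- Same shape as the `Either` version, except `throwE` short-circuits
example :: String -> String -> ExceptT String IO Person
example ageString aliveString = do
    age <- case readMaybe ageString of
        Nothing  -> throwE "Invalid age string"
        Just age -> return age

    lift (putStrLn "Parsed the age")  -- an interleaved IO effect

    if age < 0
        then throwE "Negative age"
        else return ()

    alive <- case readMaybe aliveString of
        Nothing    -> throwE "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }
```

Running runExceptT (example "24" "True") produces an IO (Either String Person), unwrapping the short-circuiting layer back into an ordinary Either.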

Appendix

You can build a better intuition for why this works using equational reasoning. We’ll begin from an example usage of our function and at each step of the process we will either desugar the code or substitute one subexpression with another equal subexpression.

In all of the examples, we’ll begin from this definition for the example function:

example :: String -> String -> Either String Person
example ageString aliveString = do
    age <- case readMaybe ageString of
        Nothing  -> Left "Invalid age string"
        Just age -> return age

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe aliveString of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

Let’s first begin with the example that fails the most quickly:

example "" "True"

-- Substitute `example` with its definition:
--
-- … replacing `ageString` with `""`
--
-- … replacing `aliveString` with `"True"`
do  age <- case readMaybe "" of
        Nothing  -> Left "Invalid age string"
        Just age -> return age

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

-- Definition of `readMaybe` (not shown):
--
-- (readMaybe "" :: Maybe Int) = Nothing
do  age <- case Nothing of
        Nothing  -> Left "Invalid age string"
        Just age -> return age

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

-- Simplify the first `case` expression
do  age <- Left "Invalid age string"

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

-- Desugar `do` notation one step
--
-- (do age <- m; n) = (m >>= \age -> n)
Left "Invalid age string" >>= \age ->
    do  if age < 0
            then Left "Negative age"
            else return ()

        alive <- case readMaybe "True" of
            Nothing    -> Left "Invalid alive string"
            Just alive -> return alive

        return Person{ age, alive }

-- Definition of `>>=` for `Either`
--
-- (Left x >>= _) = (Left x)
Left "Invalid age string"

… and we’re done! The key step is the last bit where we simplify (Left … >>= _) to (Left …), which is how Either short-circuits on failure. Because this simplification never touches the downstream code, everything type-checks just fine even though we never supply a valid age.

For completeness, let’s also walk through the example where everything succeeds:

example "24" "True"

-- Substitute `example` with its definition:
--
-- … replacing `ageString` with `"24"`
--
-- … replacing `aliveString` with `"True"`
do  age <- case readMaybe "24" of
        Nothing  -> Left "Invalid age string"
        Just age -> return age

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

-- Definition of `readMaybe` (not shown):
--
-- (readMaybe "24" :: Maybe Int) = (Just 24)
do  age <- case Just 24 of
        Nothing  -> Left "Invalid age string"
        Just age -> return age

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

-- Simplify the first `case` expression
do  age <- return 24

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

-- return = Right
do  age <- Right 24

    if age < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age, alive }

-- Desugar `do` notation one step
--
-- (do age <- m; n) = (m >>= \age -> n)
Right 24 >>= \age ->
    do  if age < 0
            then Left "Negative age"
            else return ()

        alive <- case readMaybe "True" of
            Nothing    -> Left "Invalid alive string"
            Just alive -> return alive

        return Person{ age, alive }

-- Definition of `>>=` for `Either`
--
-- (Right x >>= f) = (f x)
--
-- This means that we substitute `age` with `24`
do  if 24 < 0
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age = 24, alive }

-- (24 < 0) = False
do  if False
        then Left "Negative age"
        else return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age = 24, alive }

-- (if False then l else r) = r
do  return ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age = 24, alive }

-- return = Right
do  Right ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age = 24, alive }

-- (do m; n) = (do _ <- m; n)
do  _ <- Right ()

    alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age = 24, alive }

-- Desugar `do` notation one step
--
-- (do age <- m; n) = (m >>= \age -> n)
Right () >>= \_ ->
    do  alive <- case readMaybe "True" of
            Nothing    -> Left "Invalid alive string"
            Just alive -> return alive

        return Person{ age = 24, alive }

-- Definition of `>>=` for `Either`
--
-- (Right x >>= f) = (f x)
--
-- This means that we substitute `_` (unused) with `()`
do  alive <- case readMaybe "True" of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age = 24, alive }

-- Definition of `readMaybe` (not shown):
--
-- (readMaybe "True" :: Maybe Bool) = (Just True)
do  alive <- case Just True of
        Nothing    -> Left "Invalid alive string"
        Just alive -> return alive

    return Person{ age = 24, alive }

-- Simplify the `case` expression
do  alive <- return True

    return Person{ age = 24, alive }

-- return = Right
do  alive <- Right True

    return Person{ age = 24, alive }

-- Desugar `do` notation one step
--
-- (do age <- m; n) = (m >>= \age -> n)
Right True >>= \alive ->
    do  return Person{ age = 24, alive }

-- Definition of `>>=` for `Either`
--
-- (Right x >>= f) = (f x)
--
-- This means that we substitute `alive` with `True`
do  return Person{ age = 24, alive = True }

-- Desugar `do` notation
--
-- do m = m
return Person{ age = 24, alive = True }

-- return = Right
Right Person{ age = 24, alive = True }

As an exercise, you can try to interpolate between those two examples to reason through what happens when we evaluate the remaining two examples, which fail somewhere in between:

>>> example "24" "true"

>>> example "-5" "True"

Wednesday, April 21, 2021

The end of history for programming

history

I spend quite a bit of time thinking about what the end of history for programming might look like. By the “end of history” I mean the point beyond which programming paradigms would not evolve significantly.

I care about programming’s “destiny” because I prefer to work on open source projects that bring us closer to that final programming paradigm. In my experience, this sort of work has greater longevity, higher impact, and helps move the whole field of programming forward.

So what would the end of history look like for programming? Is it:

  • … already here?

    Some people treat programming as a solved problem and view all new languages and paradigms as reskins of old languages or paradigms. From their point of view all that remains is to refine our craft by slowly optimizing things, weeding out bugs or addressing non-technical issues like people management or funding.

    I personally do not subscribe to this philosophy because I believe, at a very minimum, that functional programming paradigms will slowly displace object-oriented and imperative programming paradigms (although functional programming might not necessarily be the final programming paradigm).

  • … artificial intelligence?

    Maybe machines will translate our natural language instructions into code for us, relieving us of the burden of precisely communicating our intent. Or maybe some sufficiently smart AI-powered IDE will auto-complete most of our program for us.

    I don’t believe this either, and I think Dijkstra did an excellent job of dismantling this line of reasoning in his essay: On the foolishness of “natural language programming”.

Truthfully, I’m not certain what the correct answer is, but I’ll present my own guess for what the end of history for programming looks like.

Mathematical DSLs

My view is that the next logical step for programming is to split into two non-overlapping programming domains:

  • runtime building for …
  • … mathematical programming languages

Specifically, I expect programming languages to evolve to become more mathematical in nature where the programs users author to communicate their intent resemble pure mathematical expressions.

For example, consider the following mathematical specifications of the boolean logical “and” operator and the function composition operator:

True  && x     = x
x     && True  = x
False && False = False

(f . g)(x) = f(g(x))

These mathematical specifications are also executable Haskell code (albeit with extra parentheses to resemble mainstream function syntax). Haskell is one example of a programming language where the code aspires to resemble pure mathematical expressions and definitions.

Any language presenting such an idealized mathematical interface requires introducing a significant amount of complexity “under the hood” since the real world is messy. This is where runtime building comes into play to gloss over such ugly details.

In other words, I’m predicting that the end of history for programming is to become an interface bridging pure mathematical expressions to the real world.

The past and present

Let me give a few examples of where this trend towards mathematical userland code is already playing out:

  • Memory management

    Manual memory management used to be a “userland” concern for most programming languages, but the general trend for new languages is automatic memory management (with the exception of Rust). Memory management used to be an explicit side effect that programmers had to care about, and pushing memory management down into the runtime (via garbage collection or other methods) has made programming languages more pure, allowing them to get one step closer to idealized mathematical expressions.

    Indeed, Rust is the exception that proves the rule, as Rust is widely viewed as better suited for building runtimes rather than being used for high-level specification of intent.

  • Functional programming (especially purely functional programming)

    Functional programming is a large step towards programming in a more mathematical style that prefers:

    • expressions over statements
    • pure functions over side effects
    • algebraic datatypes over objects
    • recursion over loops

    I cover this in more detail in Why I prefer functional programming.

    However, functional programming is not a free win. Supporting higher-order functions and closures efficiently isn’t easy (especially for compiled languages), which is why less sophisticated language implementations tend to be more imperative and less functional.

  • Evaluation order (especially laziness)

    I’ve always felt that “lazy evaluation” doesn’t do a good job of selling the benefits of Haskell’s evaluation model. I prefer to think of Haskell as having “automatic evaluation management”1. In other words, the programmer specifies the expressions as a graph of dependent computations and the runtime figures out the most efficient order in which to reduce the graph.

    This is yet another example of where we push something that used to be a userland concern (order of evaluation) into the runtime. Glossing over evaluation order frees us to specify things in a more mathematical style, because in mathematics the evaluation order is similarly irrelevant.
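
As a small illustration of this demand-driven view (my own example, not from the original discussion), the following definition is fine in Haskell even though squares is conceptually infinite, because the runtime only reduces the parts of the expression graph that are actually demanded:

```haskell
-- `squares` denotes an infinite list, but laziness means only the
-- demanded prefix is ever computed
squares :: [Integer]
squares = map (\n -> n * n) [1 ..]

-- Forces evaluation of only the first three elements: [1, 4, 9]
firstThree :: [Integer]
firstThree = take 3 squares
```

In an eagerly-evaluated language the programmer would have to sequence this computation explicitly; here the ordering is left to the runtime.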

The present and future

A common pattern emerges when we study the above trends:

  • Push a userland concern into a runtime concern, which:
  • … makes programs more closely resemble pure mathematical expressions, and:
  • … significantly increases the complexity of the runtime.

You might wonder: what are some other userland concerns that might eventually get pushed into runtime concerns in the near future? Some examples I can think of are:

  • Package management

    This one is so near in the future that it’s already happening (see: Nix and Dhall). Both languages provide built-in support for fetching code, rather than handling packages out-of-band using a standalone package management tool. This language-level support allows programs to embed external code as if it were a pure sub-expression, more closely approximating the mathematical ideal.

  • Error handling

    This one requires more explanation: I view type systems as the logical conclusion of pushing error handling into the “runtime” (actually, into the type-checker, not the runtime, to be pedantic). Dhall is an example of a language that takes this idea to the extreme: Dhall has no userland support for raising or catching errors because all errors are type errors2.

    Advances in dependently-typed and total functional programming languages get us closer to this ideal of pushing error handling into a runtime concern.

  • Logging

    I’m actually surprised that language support for pervasively logging everything hasn’t been done already (or maybe it has happened and I missed it). It seems like a pretty mundane thing languages could implement, especially for application domains that are not performance-sensitive.

    Many languages already support profiling, and it seems like it wouldn’t be a big leap to turn profiling support into logging support if the user is willing to spend their performance budget on logging.

    Logging is one of those classic side effects that is an ugly detail that “ruins” code that would have otherwise been pure and mathematical.

  • Services

    Service-oriented architectures are another thing that tends to get in the way of writing pure side-effect-free code.

    I’m not exactly sure what a service-oriented language runtime would look like, but I don’t think current “serverless” solutions are what I have in mind. Something like AWS Lambda is still too low-level to promote code that is mathematical in nature. Like, if any part of the programming process involves using a separate tool to deploy or manage the serverless code then that is a significant departure from authoring pure mathematical expressions. There needs to be something like a “Nix or Dhall for serverless code”.

Conclusion

You might criticize my prediction by claiming that it’s not falsifiable. It seems like I could explain away any new development in programming as falling into the runtime building or mathematical expression category.

That’s why I’d like to stress a key pillar of the prediction: runtimes and mathematical expressions will become more sharply divided over time. This is the actual substance of the prediction, and we can infer a few corollaries from it.

Currently, many mainstream programming paradigms and engineering organizations conflate the two responsibilities, so you end up with people authoring software projects that mix operational logic (runtime concerns) and “business logic” (mathematical intent).

What I predict will happen is that the field of engineering will begin to generate a sharp demand for people with experience in programming language theory or programming language engineering. These people will be responsible for building special-purpose languages and runtimes that abstract away as many operational concerns as possible to support pure mathematical domain-specific languages for their respective enterprise. These languages will in turn be used by a separate group of people whose aim is to translate human intent into mathematical expressions.

One consequence of this prediction is that you’ll begin to see a Cambrian explosion of programming languages in the near future. Naturally as language engineers push more operational concerns into the runtime they will need to more closely tailor the runtime to specific purposes or organizational needs rather than trying to author a general-purpose language. In other words, there will be a marked fragmentation of language runtimes (and type-checkers) as each new language adapts to their respective niche.

Despite runtime fragmentation, you will see the opposite trend in userland code: programs authored in these disparate languages will begin to more closely resemble one another as they become more mathematical in nature. In a sense, mathematical expressions will become the portable “lingua franca” of userland code, especially as non-mathematical concerns get pushed into each respective language’s runtime.

That is a prediction that is much easier to falsify.

Also, if you like this post then you will probably also enjoy the seminal paper: The Next 700 Programming Languages.


  1. Also, this interpretation is not that far from the truth as the Haskell standard only specifies a non-strict evaluation strategy. GHC is lazy, but Haskell the language standard does not require implementations to be lazy. For more details, see: Lazy vs. non-strict↩︎

  2. Okay, this is a bit of an over-simplification because Dhall has support for Optional values and you can model errors using unions in Dhall, but they are not commonly used in this way and idiomatically most Dhall code in the wild uses the type-checker to catch errors. Dhall is a total functional programming language and the language goes to significant lengths to discourage runtime errors, like forbidding comparing Text values for equality. Also, the language is technically dependently typed and supports testing arbitrary code at type-checking time to catch errors statically.↩︎

Thursday, April 8, 2021

How to replace Proxy with AllowAmbiguousTypes

no-proxy

This post is essentially a longer version of this Reddit post by Ashley Yakeley that provides more explanation and historical context.

Sometimes in Haskell you need to write a function that “dispatches” only on a type and not on a value of that type. Using the example from the above post, we might want to write a function that, given an input type, prints the name of that type.

Approach 1 - undefined

One naive approach would be to do this:

class TypeName a where
    typeName :: a -> String

instance TypeName Bool where
    typeName _ = "Bool"

instance TypeName Int where
    typeName _ = "Int"

… which we could use like this:

>>> typeName False
"Bool"
>>> typeName (0 :: Int)
"Int"

However, this approach does not work well because we must provide the typeName method with a concrete value of type a. Not only is this superfluous (we don’t care which value we supply) but in some cases we might not even be able to supply such a value.

For example, consider this instance:

import Data.Void (Void)

instance TypeName Void where
    typeName _ = "Void"

There is a perfectly valid name associated with this type, but we cannot retrieve the name without cheating because we cannot produce a (total) value of type Void. Instead, we have to use something like undefined:

>>> typeName (undefined :: Void)
"Void"

The base package uses this undefined-based approach. For example, Foreign.Storable provides a sizeOf function that works just like this:

class Storable a where
    -- | Computes the storage requirements (in bytes) of the argument. The value
    -- of the argument is not used.
    sizeOf :: a -> Int

… and to this day the idiomatic way to use sizeOf is to provide a fake value using undefined:

>>> sizeOf (undefined :: Bool)
4

This works because sizeOf never evaluates its argument. It’s technically safe, albeit not very appealing to depend on undefined.

Approach 2A - Proxy

The next evolution of this approach was to use the Proxy type (now part of base in the Data.Proxy module). As the documentation notes:

Historically, Proxy :: Proxy a is a safer alternative to the undefined :: a idiom.

I’m not exactly sure what the name Proxy was originally meant to convey, but I believe the intention was that a term (the Proxy constructor) stands in as a “proxy” for a type (specifically, the type argument to the Proxy type constructor).

We can amend our original example to use the Proxy type like this:

import Data.Proxy (Proxy(..))
import Data.Void (Void)

class TypeName a where
    typeName :: Proxy a -> String

instance TypeName Bool where
    typeName _ = "Bool"

instance TypeName Int where
    typeName _ = "Int"

instance TypeName Void where
    typeName _ = "Void"

… and now we can safely get the name of a type without providing a specific value of that type. Instead we always provide a Proxy constructor and give it a type annotation which “stores” the type we wish to use:

>>> typeName (Proxy :: Proxy Bool)
"Bool"
>>> typeName (Proxy :: Proxy Int)
"Int"
>>> typeName (Proxy :: Proxy Void)
"Void"

We can simplify that a little bit by enabling the TypeApplications language extension, which permits us to write this:

>>> :set -XTypeApplications
>>> typeName (Proxy @Bool)
"Bool"
>>> typeName (Proxy @Int)
"Int"
>>> typeName (Proxy @Void)
"Void"

… or this if we prefer:

>>> typeName @Bool Proxy
"Bool"
>>> typeName @Int Proxy
"Int"
>>> typeName @Void Proxy
"Void"

Approach 2B - proxy

A minor variation on the previous approach is to use proxy (with a lowercase “p”) in the typeclass definition:

import Data.Void (Void)

class TypeName a where
    typeName :: proxy a -> String
    --          ↑

instance TypeName Bool where
    typeName _ = "Bool"

instance TypeName Int where
    typeName _ = "Int"

instance TypeName Void where
    typeName _ = "Void"

Everything else works the same, but now neither the author nor the consumer of the typeclass needs to depend on the Data.Proxy module specifically. For example, the consumer could use any other type constructor just fine:

>>> typeName ([] :: [Int])  -- Technically legal, but weird
"Int"

… or (more likely) the consumer could define their own Proxy type to use instead of the one from Data.Proxy, which would also work fine:

>>> data Proxy a = Proxy
>>> typeName (Proxy :: Proxy Int)
"Int"

This trick helped back when Proxy was not a part of the base package. Even now that Proxy is in base you still see this trick when people author typeclass instances because it’s easier and there’s no downside.

Both of these Proxy-based solutions are definitely better than using undefined, but they are both still a bit unsatisfying because we have to supply a Proxy argument to specify the desired type. The ideal user experience should only require the type and the type alone as an input to our function.

Approach 3 - AllowAmbiguousTypes + TypeApplications

We previously noted that we could shorten the Proxy-based solution by using TypeApplications:

>>> typeName @Bool Proxy
"Bool"

Well, what if we could shorten things even further and just drop the Proxy, like this:

>>> typeName @Bool

Actually, we can! This brings us to a more recent approach (the one summarized in the linked Reddit post), which is to use AllowAmbiguousTypes + TypeApplications, like this:

{-# LANGUAGE AllowAmbiguousTypes #-}

import Data.Void (Void)

class TypeName a where
    typeName :: String

instance TypeName Bool where
    typeName = "Bool"

instance TypeName Int where
    typeName = "Int"

instance TypeName Void where
    typeName = "Void"

… which we can invoke like this:

>>> :set -XTypeApplications
>>> typeName @Bool
"Bool"
>>> typeName @Int
"Int"
>>> typeName @Void
"Void"

The use of TypeApplications is essential, since otherwise GHC would have no way to infer which typeclass instance we meant. Even a type annotation would not work:

>>> typeName :: String  -- Clearly, this type annotation is not very helpful

<interactive>:1:1: error:
    Ambiguous type variable ‘a0’ arising from a use of ‘typeName’
      prevents the constraint ‘(TypeName a0)’ from being solved.
      Probable fix: use a type annotation to specify what ‘a0’ should be.
      These potential instances exist:
        instance [safe] TypeName Void -- Defined at Example.hs:14:10
        instance [safe] TypeName Bool -- Defined at Example.hs:8:10
        instance [safe] TypeName Int -- Defined at Example.hs:11:10
    In the expression: typeName :: String
    In an equation for ‘it’: it = typeName :: String

Type applications work here because you can think of a polymorphic function as really having one extra function argument: the polymorphic type. I elaborate on this a bit in my post on Polymorphism for dummies, but the basic idea is that TypeApplications makes this extra function argument for the type explicit. This means that you can directly tell the compiler which type to use by “applying” the function to the right type instead of trying to indirectly persuade the compiler into using the right type with a type annotation.
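
A stylistic refinement of my own (not from the linked Reddit post): writing the forall explicitly, combined with ScopedTypeVariables, makes this “extra argument” visible in the signature and lets the function body pass the type along with another @-application. Here is a sketch with a hypothetical describe helper:

```haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}

class TypeName a where
    typeName :: String

instance TypeName Bool where
    typeName = "Bool"

instance TypeName Int where
    typeName = "Int"

-- The explicit `forall a.` plays the role of the extra "type argument",
-- and ScopedTypeVariables lets the body mention `a` via `typeName @a`
describe :: forall a. TypeName a => String
describe = "a value of type " ++ typeName @a
```

… so that describe @Int evaluates to "a value of type Int", with the type threaded through without any term-level argument.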

Conclusion

My personal preference is to use the last approach with AllowAmbiguousTypes and TypeApplications. Not only is it more concise, but it also appeals to my own coding aesthetic. Specifically, guiding compiler behavior using type-annotations feels more like logic programming to me and using explicit type abstractions and TypeApplications feels more like functional programming to me (and I tend to prefer functional programming over logic programming).

However, the Proxy-based approach requires no language extensions, so that approach might appeal to you if you prefer to use the simplest subset of the language possible.