Thursday, February 21, 2019

Dhall Survey Results (2018-2019)

The results from the latest Dhall survey are in, which you can view here:

… and I would like to thank everybody who took the time to participate in the survey!

Adoption

This year 61 people completed the survey (compared to 19 last year), so we have a greater sample size to inform the future direction of the language.

Here is the breakdown of how often people used Dhall:

  • 7 (11.7%) - Never used it
  • 22 (36.7%) - Briefly tried it out
  • 11 (18.3%) - Use it for my personal projects
  • 19 (31.7%) - Use it at work
  • 1 (1.7%) - Trying to convince work people to use it

I was pleasantly surprised that more people use Dhall at work than solely for personal projects. That suggests to me that people who enjoy using Dhall do not have difficulty getting permission from their manager or coworkers to also use it at work. I could be wrong, though, because there might be sampling bias or insufficient data to get accurate numbers. Those numbers also don’t necessarily imply that people have convinced their coworkers to use Dhall, so I plan to update the survey next year to ask about that, too.

Let me know if you think I’m wrong and you have difficulty getting permission to use Dhall at work. Providing a smooth adoption path has always been a high priority for me because I know from experience the difficulty of introducing new tools at work.

Reasons to adopt

Most people confirmed that they use Dhall for ops-related use cases:

Kubernetes only at the moment, but more to follow

To generate Kubernetes YAML configuration files

Kubernetes, Terraform and application configs

I’m still evaluating it. Currently I’m generating Prometheus (YAML) configuration from it.

As of now I am trying it to use it for generating concourse pipelines. I work at a very ops heavy company, I can see couple of our proprietary tools could also leverage dhall.

Kubernetes, custom config for Haskell projects

configuring GoCD yaml pipelines, https://github.com/tomzo/gocd-yaml-config-plugin

Generating config for Terraform, Packer and for our application configuration.

I’m trialing it for configuring various tools configured with yaml or json like docker-compose and some internal haskell tools.

Description language for a Terraform replacement tool.

… generate yaml files then read by ansible

We use it to validate we have all the required (and no excess) variables set for our deploy scripts via type checking. If we add a variable to one environment, but miss others, the builds abort before being shipped to the world. …

Configuration files for Elm, Packer …

Replace Nix code with something saner and generate configuration for Kubernetes.

Kubernetes, Kops, Concourse, Terraform, application config

The other specific use cases I noticed were:

  • configuring build tools
  • command-line interface
  • backend service configuration
  • wire format for transmitting code

This feedback is consistent with my understanding of how Dhall is used in the wild.

Each year I try to port a difficult configuration file format to Dhall to stress test the language design. Last year I ported the nethack configuration format to Dhall and this year I plan to port an ops-related configuration format with a weakly-typed schema to inform the language design process.

Document best practices

Survey respondents very commonly requested use cases, cookbooks, design patterns, real-world examples, and project structure guidelines. This was far and away the most consistent feedback from the survey:

More real world examples, like CLI application config files. Or web server-client communication examples.

More blog posts/tutorials on use cases

Resources on the next steps beyond learning syntax: how to structure Dhall code, how to organise your files, design patterns, etc.

Guides/pointers to regular things like newtypes, String equality, Homogeneous record/map constraints?

A set of documented best practices for doing common things

What would make me really happy is to see some guidelines, patterns or examples for how to support evolving schemas for clients you don’t control. …

Full list of possible uses with detailed examples and comparisons with similar tools.

Add several complete realistic examples (besides the existing snippets).

Learning curve through examples, starter apps, tutorials

… docs, examples

… basic usage patterns (newtypes, sums, comparisons)

A use-case.

I think Dhall should develop on making its way into common use cases boosting its clout in the industry.

End-user ergonomics and patterns. Pain points will come up rapidly if you start writing libraries to configure various popular tools. Those pain points should guide possible changes to the language, and the patterns developed need to be put front and center in a cookbook or something because there are many ways to tackle problems in Dhall and the best ways aren’t always obvious.

Widely used, typical use cases. Most people configure something “simple”, show me how Dhall can improve my workflow (validation against a “schema”, deduplication, sharing of common config between applications). The initial impression from the Dhall Github and website is that its very powerful but most configuration is dead simple, just very verbose…

Documentation.

Docs, …

examples how to approach a domain generically…

Last time I checked, the Dhall doc was very focused on the language itself, and the configuration part was kind of forgotten. Also, having the example with unicode syntax, while cool, make them hard/impossible to type along. I think having a few more docs about basic usecase to translate an existing json/yaml config into dhall would help adoption.

Nothing fancy with language feature, simply having a statically typed configuration, and optionally a type safe way to read it and map it to host language structure would be a very good start already.

Documentation should provide more pointers to idiomatic code. We based most of our current development off the dhall-kubernetes model which leads to dozens of boilerplate type imports at the beginning of files just to see how dhall-terraform makes all of their types available in a single expression ( https://github.com/blast-hardcheese/dhall-terraform/blob/master/Terraform/Providers/Datadog/Types.dhall ). Writing a cookbook would help with this.

This is also consistent with the highest rated funding mechanism: “Books and/or merchandise” (see below). I suspect that most people who selected that option were interested in the book.

This feedback surprised me because I was still overly focused on improving the documentation for lower-level language features. So thank you to everybody who pointed this out because this was a huge blind spot of mine.

So my plan is to initially focus on documenting the current state-of-the-art best practices as one of my highest priorities and keeping that document up to date as the language evolves. This sort of technical writing is also the kind of thing I enjoy doing anyway! :)

Language bindings and/or integrations

People also very commonly requested more language bindings and integrations with existing tools:

python library. having haskell is great, but we have a lot of python and C# as well and it would be great to use it from there too

Bindings for languages I have to work with (Java/Python).

Integration with NixOS

Tighter integration with Nix

… would like to use it for nix

Compilation to Java, Haskell, etc.

Scala integration, complete Kubernetes API in dhall-kubernetes

More DSLs a-la dhall-kubernetes. IMHO the assurance to have a well-formed (in the sense of API conformance) config, is a huge plus.

A complete Scala implementation, as we use almost only Scala at work

A JSON->dhall process would be a big help to import complex data sources. There is a dhall-terraform effort underway, but we have decently large number of terraform (HCL) files, and being able to convert HCL->json->dhall would mean I’m completely free of some really annoying restrictions HCL imposes on me.

Import from Jason/yaml

Python interpreter and syntax highlight in github

… more language/tool integration

Language Bindings …

More language bindings (e.g. Go and Python are two big Ops markets)

I think golang bindings would potentially help with lots of the things I care about in my day job: terraform, prometheus, kubernetes, cloud foundry…

other language integration

Bindings/libraries for other languages.

Language bindings, …

Getting more software to adopt dhall is a novel goal. But I see the modern ops world is full of golang software which would require golang bindings for dhall. It’s probably not a good language to write the bindings (or anything really), but it’s popularity might mean that for dhall’s success golang bindings are necessary.

Having more good language implementation, into industrial programming languages, and get rid of Yaml.

Bindings to increase adoption.

More languages…Ocaml maybe?

Cleanbindings https://wiki.clean.cs.ru.nl/

More integrations.

Using dhall from more languages would help adoption IMO. A statically-compiled lib with bindings in JS, python, java/scala would be good (and less work than implementing the language itself in all those languages). More projects like dhall-kubernetes are also a good way to drive adoption.

Being able to more quickly add it to existing code bases would be helpful. Going from JSON->dhall and then dhall being able to spit out nix/json, without needing a linter for the down stream languages makes life easier. …

I do plan to address the request to import Dhall values from JSON this year and this issue tracks the discussion and work on that:

On the other hand, I do not yet plan to personally contribute a new language binding, mainly because I want to distribute control of the language evolution process. Each reimplementation of the language gets a vote on proposed language changes and if I contribute more than one language binding then I get a disproportionate voice. Instead, I focus on making it as easy as possible for others to port the language by improving the documentation and automation surrounding the language standard.

I have no intention of becoming a benevolent dictator-for-life over the language. For example, the other voting member (Fabrizio Ferrai, who maintains the Clojure bindings to Dhall) plans to author their own interpretation of the survey feedback to parallel this post. Also, hopefully there will be two new voting members this year since the Python and PureScript language bindings are getting close to completion.

However, I can still help recruit donations for both new and existing language bindings. If you work on any language binding to Dhall and you would like to be paid for your work then set up a Patreon account or similar donation mechanism and I will help advertise that. This is an area where I recommend using distributed, non-corporate sources of funding to ensure that the language evolution process remains as democratic as possible.

Performance

Performance was another common theme:

Faster performance; …

Speed. I’d really like to mix it into things more freely as a templating language, but it’s too noticeable a slow down.

If dhall-kubernetes finally becomes fast, our whole infrastructure config set (at Wire) can move there

Performance and UX. Performance currently is very bad for large dhall projects (like dhall-kubernetes)

Performance for larger scale usage (such as nix)

Speed.

I agree with this feedback. Improving performance is generally a never-ending process, because every time I improve the interpreter performance people begin using the language for even more ambitious projects that strain the interpreter (myself included).

Because performance is an open-ended problem, this is one of the areas where I’m most likely to solicit donations to fund somebody to work on the interpreter. That would also free me up to do more technical writing for the Dhall ecosystem.

If you are able and willing to improve the performance of the interpreter then let me know and I’ll work with you to secure some donation mechanism to fund your work. I am reasonably confident there are a few companies using Dhall that would fund improvements to interpreter performance. Similarly, if you are a company that can spare some budget to fund performance improvements, also reach out to me :)

Default record values

Another common theme was that people are struggling to port configuration formats that have optionally present keys to Dhall:

Optionally present keys in a dictionary

… Defaults for records

Having optionals that don’t need to be maked with None when not present

I’d love Dhall to become a type-safe version of Nix (the language) or a handy language for configuration files, but there are no optionally present keys in dictionaries now, so I cannot use Dhall for config files yet. Currently I’ll just stick with YAML

Dhall actually does have a design pattern for this sort of idiom, which is to override a default record with the present values, like this:

defaultRecord // { foo = 1 }

… so this might just be another case for needing better documentation for best practices. However, if you are not satisfied with that idiom then I invite you to open an issue on the issue tracker with your thoughts on the subject:
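The override idiom described above can be sketched end to end from Haskell. This is only an illustration under stated assumptions: the Server type, its field names, and the values are hypothetical, and the code assumes a recent version of the dhall Haskell package (where the decoding class is named FromDhall; older releases called it Interpret). The embedded Dhall expression defines a complete default record and then overrides a single field with the // operator, so the caller never has to spell out the fields it does not care about.

```haskell
{-# LANGUAGE DeriveAnyClass    #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Dhall (FromDhall)
import GHC.Generics (Generic)
import Numeric.Natural (Natural)

import qualified Dhall

-- Hypothetical configuration type, not taken from any real project
data Server = Server
    { host    :: Text
    , port    :: Natural
    , verbose :: Bool
    } deriving (Eq, Show, Generic, FromDhall)

-- A Dhall expression that starts from a default record and overrides
-- only the `port` field using the right-biased (//) operator
overridden :: Text
overridden =
       "let defaultServer = { host = \"localhost\", port = 8080, verbose = False } "
    <> "in  defaultServer // { port = 9000 }"

main :: IO ()
main = do
    -- Type-check, normalize, and decode the expression into the Haskell record
    server <- Dhall.input Dhall.auto overridden
    print (server :: Server)
```

Fields that are absent from the right-hand record keep their default values, which is how this pattern stands in for optionally present keys.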

Language stability / migration

People are beginning to request greater language stability as they begin to adopt Dhall at work:

Greater stability in the core language and prelude. There have been a bunch of changes last year, that while I agree with their purpose, I cranked down on adding more usage of Dhall until the ecosystem is more stable to avoid version brittleness

Smoother transitions between standard versions. Maybe if the last two versions of standard were supported we could provide deprecation warnings, and have something like rehash subcommand. This would allow us to update hash of existing imports only if existing (old) hash is valid.

  1. Stabilizing the language standard; 2. Stabilizing the prelude API.

… stability of upstream packages

There is one feature in the pipeline which should improve things here, which is support for stable hashes (i.e. semantic integrity checks):

Unstable hashes were one of the most commonly cited pain points when upgrading to the next language version. Support for stable hashes will be available in the next release of the language standard and the Haskell interpreter (along with backwards-compatible support for older semantic integrity checks).

Besides that, I don’t expect new language features to be too disruptive at this point. I expect most new features to be additive, and the most disruption that users might run into is some identifier becoming a reserved keyword.

Also, the Haskell implementation takes pains to make the migration process smoother by providing dhall lint support for migrating old code to new idioms. For example, dhall lint will automatically rewrite the old syntax for Optional literals to use the newer syntax.

One of the features of the Dhall language’s standardization process is that every new language binding adds a new vote on proposed changes, which raises the bar for changing the language. So as the language grows more mainstream I expect the language standard to stabilize due to this effect.

No Unicode in examples

Three respondents disliked the use of Unicode in tutorials and examples. This was an unusually specific point of feedback that recurred a few times throughout the responses:

Get rid of unicode. It is cool but it really scares off beginners to see examples with unicode.

Better documentation, without unicode symbols.

removing unicode syntax

I don’t have any objection to switching examples to use ASCII instead of Unicode, so I will do that. I think Unicode is prettier, but if it gets in the way of the learning process then I’ll switch to ASCII.

For people who hate Unicode syntax, period, the dhall command-line tool also supports an --ascii flag for emitting ASCII when formatting or linting code.

Developer experience

Most of the developer experience feedback revolved around editor support, although there were a few other recommendations:

Better error messages (I mean, I can tell a lot of work has gone into them, which i really appreciate, but I am still often baffled by them)

A community mailing list where I could ask dumb questions when I get stuck. There’s a stackoverflow tag but it seems quite low-traffic so I was put off by that.

… editor support

… editor integrations, …

Release on OSX/Windows …

Type / syntax errors that are easier to visually parse.

Ergonomics - language server, support for auto-complete in editors, seeing type errors in the editor without needing to run the dhall executable separately

I love Gabriel’s idea on LSP support to provide support for all editors in his state of Dhall address blog post. Making the development experience (including stability of the language) will be key to letting the configs loose outside of the services team which is a little more adept at employing functional and typed methods and tools than our Ruby applications developers on the adjacent team we support.

I’d love to be able to specify a dhall file as an argument to dhall/dhall-to-* rather than feeding to STDIN. Pipes and redirection can get clumsy when incorporating into scripts.

As I mentioned in my previous post, one goal we have for this year is to add a Dhall implementation of the language server protocol:

… and as I’m writing up this post a contributor has already done some initial groundwork on this (see the above issue).

Other language features

The most commonly requested language feature was (bidirectional) type inference:

Type inference

Also, having to literally specify type parameters to everything forces you to name and define all sorts of intermediate types, which makes things very messy. I don’t know enough about Dhall to know if this is possible/desirable to avoid

Lack of gradual typing for type arguments is the biggest deal breaker - it breaks the whole “gradual typing” story

Easing migrating existing large structures to Dhall by ensuring gradual typing is viable

I don’t expect Dhall to get bidirectional type inference at this point. The two main reasons are:

  • This is harder to standardize and port to multiple languages
  • This would be very disruptive at a point where we are trying to stabilize the language

So in my opinion that ship has sailed, although I no longer have veto power over proposed changes so there is still always the possibility that somebody puts forth a compelling proposal for type inference that proves me wrong.

Building maps in yaml/json with a dynamic set of keys is a challenge. The other side of the fence is less strongly typed and so they define fields that may or may not be present, or are keyed on userland values. This sucks but if dhall is going to target tools that use yaml/json, there needs to be a way with good ergonomics to build those values. …

I do plan on adding better support for working with weakly-typed dictionaries. As I mentioned earlier, I plan to port one of the more difficult ops-related configuration formats to Dhall to guide the language design and this will inform how I propose to support these weakly-typed dictionaries.

Turing completeness

I don’t plan on making the language Turing-complete. The absence of Turing completeness is Dhall’s sine qua non.

The ability to to simple math on anything other than Nats. Being able to add or multiply two Doubles would vastly increase Dhall’s usefulness to me.

I’d recommend opening an issue for this if you are interested in Double arithmetic. There is a bit to discuss here that won’t fit in this post:

  1. Simpler syntax for input data to write configuration files. 2. Recursion.

I’m not sure if (1) would be solved by the planned support for importing from JSON (as JSON syntax is still a bit clumsy for people to author in my opinion), but at the moment there aren’t any other plans along those lines.

I also don’t think Dhall will get native language support for recursion (or at least not anytime soon). Recursion will likely remain a design pattern, as described in this document:

… rather than a language feature.

Preserving comments

Two people requested a fix for a very specific bug, which is that the Haskell implementation swallows all comments except the leading comment when formatting code:

Preserving comments in the output of dhall lint.

https://github.com/dhall-lang/dhall-haskell/issues/145

I hope to get to this soon, because I understand what a pain this is (it bites me, too).

Packaging

Better “package” story. …

Packaging/versioning. Importing from URL makes builds prone to flakiness, local caching helps but is insufficient. Currently using Make to clone repositories at given tags and import locally, which can be a hassle since Make won’t fetch transitive dependencies for the types/expressions that are cloned. Expressions which import types off local disk refer to relative locations, which are prone to breakage if those locations change i.e. due to refactoring. Some kind of standard which allows for a Cargo-style definition of imported types, locked to a version, with a separate lifecycle for resolving imports would improve stability.

I’m a bit resistant to this feedback (although I might have misunderstood it). I think the traditional packaging model adds a lot of unnecessary complexity to distributing code. I view Dhall as the “Bash of functional programming”, where the way you author code is you just save a file and the way you consume code is you refer directly to the file you want (except that Dhall fixes some mistakes Bash made in this regard).

The other reason I question the traditional packaging model is that I don’t see the value that it adds above hosting Dhall code on GitHub or GitLab, especially given that Dhall can import from URLs and resolve relative references correctly.

However, I do think there is value in making Dhall packages easier to discover, browse, and document (i.e. like Hackage for the Haskell ecosystem).

Possibly old interpreters

… less verbose support for using union types more easily

Making it simpler to declare Sum types

I believe these respondents may be using an older version of the interpreter because this should be addressed in the latest release of the language and Haskell interpreter. See this page for more details:

However, reading this makes me realize that next year I should add a question asking respondents what implementation and what version they use. That would also help me gauge how long to support deprecated features (such as the constructors keyword).

Funding

“Books and/or merchandise” was the clear leader for funding mechanism, although I suspect that’s primarily because people want a book (paid or not) based on the overwhelming feedback in favor of documentation:

  • Books and/or merchandise - 24 (52.2%)
  • Crowdfunding (recurring) - i.e. Patreon - 18 (39.1%)
  • Donation button - 14 (30.4%)
  • Project bounties - 14 (30.4%)
  • Crowdfunding (one-time) i.e. Kickstarter - 9 (19.6%)
  • Opening PRS myself :) - 1 (2.2%)
  • Consulting - 1 (2.2%)
  • Open source sponsorship - 1 (2.2%)
  • Company donation for time/moral license - 1 (2.2%)

Note that I also plan to apply for grants for open source work. I didn’t list that as a funding option because my mind was already made up on that.

The funding mechanism that surprised me the most was “Donation button”. I thought that was something that people didn’t really do any more (i.e. not “hip”), but it was tied for third-most popular funding mechanism.

I didn’t list consulting as one of the funding mechanisms (which one respondent had to write in), because I’ve heard from a few sources that consulting can create perverse incentives because simplifying things means fewer billable hours. However, on more reflection I think it might be worth getting corporate sponsorship for performance improvements to the language (as previously mentioned), because that’s less likely to create the wrong incentives or lead to undue corporate influence over the language evolution.

Conclusions

This post doesn’t include the comments of praise in the interest of modesty, but I did read them all and really appreciated them! 🙂

Hopefully this gives people an idea of what my current priorities are and also helps others understand how they might be able to contribute to the Dhall ecosystem, too!

Also, if this post is the first you are hearing about the survey, you can still complete the survey and I’ll read your response even if it won’t be summarized in this post. I still get an e-mail notification for each new submission.

You can also use the Dhall language’s issue tracker to provide feedback of any kind, too. For more details on how to discuss, propose, or contribute changes, see the following guide:

Monday, February 11, 2019

Haskell command-line utility using GHC generics

Today, Justin Woo wrote a post about writing a simple Haskell command-line utility with minimal dependencies. The utility is a small wrapper around the nix-prefetch-git command.

In the post he called out people who recommend overly complex solutions on Twitter:

Nowadays if you read about Haskell on Twitter, you will quickly find that everyone is constantly screaming about some “advanced” techniques and trying to flex on each other

However, I hope to show that we can simplify his original solution by taking advantage of just one feature: Haskell’s support for generating code from data-type definitions. My aim is to convince you that this Haskell feature improves code clarity without increasing the difficulty. If anything, I consider this version less difficult both to read and write.

Without further ado, here is my solution to the same problem (official Twitter edition):

{-# LANGUAGE DeriveAnyClass        #-}
{-# LANGUAGE DeriveGeneric         #-}
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE OverloadedStrings     #-}
{-# LANGUAGE RecordWildCards       #-}

import Data.Aeson (FromJSON, ToJSON)
import Data.Text (Text)
import Options.Generic (Generic, ParseRecord)

import qualified Data.Aeson
import qualified Data.ByteString.Lazy
import qualified Data.Text.Encoding
import qualified Data.Text.IO
import qualified Options.Generic
import qualified Turtle

data Options = Options
    { branch   :: Bool
    , fetchgit :: Bool
    , hashOnly :: Bool
    , owner    :: Text
    , repo     :: Text
    , rev      :: Maybe Text
    } deriving (Generic, ParseRecord)

data NixPrefetchGitOutput = NixPrefetchGitOutput
    { url             :: Text
    , rev             :: Text
    , date            :: Text
    , sha256          :: Text
    , fetchSubmodules :: Bool
    } deriving (Generic, FromJSON)

data GitTemplate = GitTemplate
    { url    :: Text
    , sha256 :: Text
    } deriving (Generic, ToJSON)

data GitHubTemplate = GitHubTemplate
    { owner  :: Text
    , repo   :: Text
    , rev    :: Text
    , sha256 :: Text
    } deriving (Generic, ToJSON)

main :: IO ()
main = do
    Options {..} <- Options.Generic.getRecord "Wrapper around nix-prefetch-git"

    let revisionFlag = case (rev, branch) of
            (Just r , True ) -> "--rev origin/" <> r
            (Just r , False) -> "--rev " <> r
            (Nothing, _    ) -> ""

    let url = "https://github.com/" <> owner <> "/" <> repo <> ".git/"

    let command =
            "GIT_TERMINAL_PROMPT=0 nix-prefetch-git " <> url <> " --quiet " <> revisionFlag

    text <- Turtle.strict (Turtle.inshell command Turtle.empty)

    let bytes = Data.Text.Encoding.encodeUtf8 text

    NixPrefetchGitOutput {..} <- case Data.Aeson.eitherDecodeStrict bytes of
        Left  string -> fail string
        Right result -> return result

    if hashOnly
    then Data.Text.IO.putStrLn sha256
    else if fetchgit
    then Data.ByteString.Lazy.putStr (Data.Aeson.encode (GitTemplate {..}))
    else Data.ByteString.Lazy.putStr (Data.Aeson.encode (GitHubTemplate {..}))

This solution takes advantage of two libraries:

  • optparse-generic

    This is a library I authored which auto-generates a command-line interface (i.e. argument parser) from a Haskell datatype definition.

  • aeson

    This is a library that generates JSON encoders/decoders from Haskell datatype definitions.

Both libraries take advantage of GHC’s support for generating code statically from datatype definitions. This support is known as “GHC generics”. While a bit tricky for a library author to support, it’s very easy for a library user to consume.

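All a user has to do is enable two extensions, DeriveGeneric and DeriveAnyClass (the first two LANGUAGE pragmas in the example above), and then they can auto-generate an instance for any typeclass that implements GHC generics support by adding a clause of the form deriving (Generic, SomeTypeClass) to the end of their data type.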

You can see that in the above example, replacing SomeTypeClass with FromJSON, ToJSON, and ParseRecord.
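Stripped of the command-line logic, the same technique fits in a few lines. The following is a minimal sketch, assuming only the aeson library; the Person type and its fields are hypothetical and exist purely for illustration. The deriving clause is the only code needed to obtain both a JSON decoder and a JSON encoder.

```haskell
{-# LANGUAGE DeriveAnyClass    #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON, ToJSON)
import Data.ByteString.Lazy (ByteString)
import Data.Text (Text)
import GHC.Generics (Generic)

import qualified Data.Aeson
import qualified Data.ByteString.Lazy.Char8

-- Hypothetical record used only for illustration
data Person = Person
    { name :: Text
    , age  :: Int
    } deriving (Eq, Show, Generic, FromJSON, ToJSON)

-- Decode using the derived FromJSON instance; the JSON keys must match
-- the record field names
decodePerson :: ByteString -> Either String Person
decodePerson = Data.Aeson.eitherDecode

main :: IO ()
main =
    case decodePerson "{\"name\":\"Alice\",\"age\":30}" of
        Left  message -> fail message
        Right person  -> do
            print person
            -- Encode back to JSON using the derived ToJSON instance
            Data.ByteString.Lazy.Char8.putStrLn (Data.Aeson.encode person)
```

Adding or renaming a field in the record changes the accepted and emitted JSON accordingly, with no hand-written parsing code to keep in sync.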

And that’s it. There’s really not much more to it than that. The result is significantly shorter than the original example (which still omitted quite a bit of code) and (in my opinion) easier to follow because actual program logic isn’t diluted by superficial encoding/decoding concerns.

I will note that the original solution only requires libraries that are provided as part of a default GHC installation. However, given that the example is a wrapper around nix-prefetch-git, the user presumably already has Nix installed, so they can obtain the necessary libraries by running this command:

… which is one of the reasons I like to use Nix.