Monday, October 16, 2017

Advice for Haskell beginners

This post summarizes advice that I frequently give to Haskell beginners asking how to start out learning the language

First, in general I recommend reading the Haskell Programming from first principles book, mainly because the book teaches Haskell without leaving out details and also provides plenty of exercises to test your understanding. This is usually good enough if you are learning Haskell as your first language.

However, I would like to give a few additional tips for programmers who are approaching Haskell from other programming languages.

Learn Haskell for the right reasons

Some people learn Haskell with the expectation that they will achieve some sort of programming enlightenment or nirvana. You will be disappointed if you bring these unrealistic expectations to the language. Haskell is not an achievement to unlock or a trophy to be won because learning is a never-ending process and not a finish line.

I think a realistic expectation is to treat Haskell as a pleasant language to use that lets you focus on solving real problems (as opposed to wasting your time fixing silly self-induced problems like null pointers and "undefined is not a function").

Avoid big-design-up-front

Haskell beginners commonly make the mistake of trying to learn as much of the language as possible before writing their first program and overengineering the first draft. This will quickly burn you out.

You might come to Haskell from a dynamically typed background like JavaScript, Python, or Ruby where you learned to avoid refactoring large code bases due to the lack of static types. This aversion to refactoring promotes a culture of "big-design-up-front" where you try to get your project as close to correct on the first try so that you don't have to refactor your code later.

This is a terrible way to learn Haskell, for two reasons. First, Haskell has a much higher ceiling than most other programming languages, so if you wait until you hit that ceiling before building something you will wait a looooooong time. Second, refactoring is cheap in Haskell so you don't need to get things right the first time.

You will accelerate your learning process if you get dirty and make mistakes. Write really ugly and embarrassing code and then iteratively refine your implementation. There is no royal road to learning Haskell.

Avoid typeclass abuse

Specifically, avoid creating new typeclasses until you are more comfortable with the language.

Functional programming languages excel because many language features are "first class". For example, functions and effects are first class in Haskell, meaning that you can stick them in a list, add them, nest them, or pass them as arguments, which you can't (easily) do in imperative languages.

However, typeclasses are not first-class, which means that if you use them excessively you will quickly depend on advanced language features to do even simple things. Programming functionally at the term-level is much simpler and more enjoyable than the type-level Prolog that type-classes encourage.

Begin by learning how to solve problems with ordinary functions and ordinary data structures. Once you feel like you understand how to solve most useful problems with these simple tools then you can graduate to more powerful tools like typeclasses. Typeclasses can reduce a lot of boilerplate in proficient hands, but I like to think of them as more of a convenience than a necessity.

You can also take this approach with you to other functional languages (like Elm, Clojure, Elixir, or Nix). You can think of "functions + data structures" as a simple and portable programming style that will improve all the code that you write, Haskell or not.

Build something useful

Necessity is the mother of invention, and you will learn more quickly if you try to build something that you actually need. You will quickly convince yourself that Haskell is useless if you only use the language to solve Project Euler exercises or Sudoku puzzles.

You are also much more likely to get a Haskell job if you have a portfolio of one or two useful projects to show for your time. These sorts of projects demonstrate that you learned Haskell in order to build something instead of learning Haskell for its own sake.

Conclusion

Hopefully these tips will help provide some guard rails for learning the language for the first time. That's not to say that Haskell is perfect, but I think you will enjoy the language if you avoid these common beginner pitfalls.

Saturday, October 7, 2017

Why do our programs need to read input and write output?

I wrote this post to challenge basic assumptions that people make about software architecture, which is why I chose a deliberately provocative title. You might not agree with all the points that I am about to make, but I do hope that this post changes the way that you think about programming

This post is an attempt to restate in my own words what Conal Elliot (and others before him) have been saying for a while: modern programming is a Rube-Goldberg machine that could be much simpler if we change the way we compose code.

Most programmers already intuitively understand this at some level. They will tell you that the programming ecosystem is deeply flawed, fragmented, and overly complex. I know that I felt that way myself for a long time, but in retrospect I believe that my objections at the time were superficial. There were deeper issues with programming that I was blind to because they are so ingrained in our collective psyche.

Disclaimer: This post uses my pet configuration language Dhall to illustrate several points, mainly because Dhall is a constructive proof-of-concept of these ideas. The purpose of this post is not so much to convince you to use Dhall but rather to convince you to think about software in a new way

Input and output

Consider the title of this post for example:

"Why do our programs need to read input and write output?"

Most people will answer the title with something like:

  • "Our programs need a way to communicate with the outside world"
  • "Programs need to do something other than heat up CPUs"

Now suppose I phrased the question in a different way:

"What if only the compiler could read input and write output?"

"What's the difference?", you might ask. "Surely you mean that the language provides some library function that I can call to read or write input to some handles, whether they be file handles or standard input/output."

No, I mean that only the compiler implementation is allowed to read input or write output, but programs written within the compiled language cannot read input or write output. You can only compute pure functions within the language proper.

Again, this probably seems ridiculous. How could you communicate at all with the program?

Imports

Most languages have some way to import code, typically bundled in the form of packages. An imported function or value is something that our compiler reads as input statically at compile time as opposed to a value read by our program dynamically at run time.

Suppose I told you that our hypothetical programming language could only read input values by importing them

"Ridiculous!" you exclaim as you spit out your coffee. "Nobody would ever use such a language." You probably wouldn't even know where to begin since so many things seem wrong with that proposal

Perhaps you would object to the heavyweight process for publishing and subscribing to new values. You would recite the package management process for your favorite programming language:

  • Create a source module containing your value
  • Create a standard project layout
  • Create a package description file
  • Check your project into version control
  • Publish your package to a package repository

Perhaps you would object to the heavyweight process for configuring programs via imports? Your favorite programming language would typically require you to:

  • Retrieve the relevant program from version control
  • Modify the project description file to reference the newly published dependency
  • Modify project code to import your newly published value
  • Compile the program
  • Run the program

Why would a non-technical end user do any of that just to read and write values?

This is exactly the Rube-Goldberg machine I'm referring to. We have come to expect a heavyweight process for source code to depend on other source code

Importing paths

Distributing code doesn't have to be heavyweight, though. Consider Dhall's import system which lets you reference expressions directly by their paths. For example, suppose we saved the value True to a file named example.dhall:

$ echo 'True' > example.dhall

Another Dhall program can reference the above file anywhere the program expects a boolean value:

$ dhall <<< './example.dhall || False'
Bool

True

This is the exact same as if we had just replaced the path with the file's contents:

$ dhall <<< 'True || False'
Bool

True

Dhall doesn't need to support an explicit operation to read input because Dhall can read values by just importing them

Similarly, Dhall doesn't need to support an explicit write operation either. Just save a Dhall expression to a file using your favorite text editor.

"What if I need a way to automate the generation of files?"

You don't need to automate the process of saving a file because one file is always sufficiently rich to store as much information as you need. Dhall is a programmable configuration language which supports lists and records so any one file can store or programmatically generate any number of values. Files are human-generated artifacts which exist purely for our convenience but Dhall code does not behave any differently whether or not the program spans multiple files or a single file.

Most of the time people need to automate reads and writes because they are using non-programmable configuration file formats or data storage formats

Programmable configuration

You might object: "Configuration files shouldn't be Turing-complete!"

However, Dhall is not Turing-complete. Dhall is a total programming language, meaning that evaluation eventually halts. In practice, we don't actually care that much if Dhall halts, since we cannot guarantee that the program halts on reasonable human timescales. However, we can statically analyze Dhall and most of the Halting Problem objections to static analysis don't apply.

For example, Dhall can statically guarantee that programs never fail or throw exceptions (because obviously a configuration file should never crash). Dhall also lets you simplify confusing files by eliminating all indirection because Dhall can reduce every program to a canonical normal form.

In fact, most objections to programmable configuration files are actually objections to Turing-completeness

Importing URLs

Dhall also lets you import code by URL instead of path. Dhall hosts the Prelude of standard utilities online using IPFS (a distributed hashtable for the web),and you can browse the Prelude using this link, which redirects to the latest version of the Prelude:

For example, you can browse the above Prelude to find the not function hosted here:

... which has the following contents:

$ curl https://ipfs.io/ipfs/QmQ8w5PLcsNz56dMvRtq54vbuPe9cNnCCUXAQp6xLc6Ccx/Prelude/Bool/not
{-
Flip the value of a `Bool`

Examples:

```
./not True = False

./not False = True
```
-}
let not : Bool → Bool
    =   λ(b : Bool) → b == False

in  not

... and we can "apply" our URL to our file as if they were both just ordinary functions and values:

$ dhall <<< 'https://ipfs.io/ipfs/QmQ8w5PLcsNz56dMvRtq54vbuPe9cNnCCUXAQp6xLc6Ccx/Prelude/Bool/not ./example.dhall'
Bool

False

... or we could use let-expressions to improve readability without changing the result:

    let not = https://ipfs.io/ipfs/QmQ8w5PLcsNz56dMvRtq54vbuPe9cNnCCUXAQp6xLc6Ccx/Prelude/Bool/not

in  let example = ./example.dhall

in  not example

"That's a security vulnerability!", you protest. "You are ... literally ... injecting remote code into your program."

Playing the devil's advocate, I ask you what is wrong with remote code injection

"Well, for starters, if the URL is compromised the attacker can run arbitrary code like ..."

... reading and writing files? Dhall is totally pure and doesn't support any effects at all (besides heating up CPUs ☺).

This brings us full circle back to our original question:

"Why do our programs need to read input and write output?"

Conclusion

Software doesn't have to be so complex if we challenge our assumptions that got us into this mess

These ideas were heavily inspired the following post by Conal Elliot:

... and this post is not about Dhall so much as Conal's vision of an effect-free purely functional future for programming. I believe explicitly reading input and writing output will eventually become low-level implementation details of higher-level languages, analogous to allocating stack registers or managing memory.