Wednesday, December 9, 2015

How to contribute to the Haskell ecosystem

I wanted to share a few quick ways that beginning Haskell programmers can contribute to the Haskell ecosystem. I selected these tasks according to a few criteria:

  • They are fun! These tasks showcase enjoyable tricks
  • They are easy! They straightforwardly apply existing libraries
  • They are useful! You can probably find something relevant to your project

For each task I'll give a brief end-to-end example of what a contribution might look like and link to relevant educational resources.

This post assumes only that you have the stack build tool installed, which you can get from haskellstack.org. This tool takes care of the rest of the Haskell toolchain for you, so you don't need to install anything else.

Contribution #1: Write a parser for a new file format

Writing parsers in Haskell is just about the slickest thing imaginable. For example, suppose that we want to parse the PPM "plain" file format, which is specified like this [Source]:

Each PPM image consists of the following:

  1. A "magic number" for identifying the file type. A ppm image's magic number is the two characters "P3".
  2. Whitespace (blanks, TABs, CRs, LFs).
  3. A width, formatted as ASCII characters in decimal.
  4. Whitespace.
  5. A height, again in ASCII decimal.
  6. Whitespace.
  7. The maximum color value (Maxval), again in ASCII decimal. Must be less than 65536 and more than zero.
  8. A single whitespace character (usually a newline).
  9. A raster of Height rows, in order from top to bottom. Each row consists of Width pixels, in order from left to right. Each pixel is a triplet of red, green, and blue samples, in that order. Each sample is represented as an ASCII decimal number.

The equivalent Haskell parser reads almost exactly like the specification:

{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (guard)
import Data.Attoparsec.Text

data PPM = PPM
    { width             :: Int
    , height            :: Int
    , maximumColorValue :: Int
    , image             :: [[RGB]]
    } deriving (Show)

data RGB = RGB
    { red   :: Int
    , green :: Int
    , blue  :: Int
    } deriving (Show)

ppm3 :: Parser PPM
ppm3 = do
    "P3"
    skipMany1 space
    w <- decimal
    skipMany1 space
    h <- decimal
    skipMany1 space
    maxVal <- decimal
    guard (0 < maxVal && maxVal < 65536)
    space
    let sample = do
            lo <- decimal
            skipMany1 space
            return lo
    let pixel = do
            r <- sample
            g <- sample
            b <- sample
            return (RGB r g b)

    rows <- count h (count w pixel)
    return (PPM w h maxVal rows)

We can try to test our parser out on the following example file:

$ cat example.ppm
P3
4 4
255
0  0  0   100 0  0       0  0  0    255   0 255
0  0  0    0 255 175     0  0  0     0    0  0
0  0  0    0  0  0       0 15 175    0    0  0
255 0 255  0  0  0       0  0  0    255  255 255

We don't even have to compile a program to test our code. We can load our code into the Haskell REPL for quick feedback on whether or not our code works:

$ stack ghci attoparsec --resolver=lts-3.14
...
Prelude> :load ppm.hs
[1 of 1] Compiling Main             ( ppm.hs, interpreted )
Ok, modules loaded: Main.
*Main> txt <- Data.Text.IO.readFile "example.ppm"
*Main> parseOnly ppm3 txt
Right (PPM {width = 4, height = 4, maximumColorValue = 255, 
image = [[RGB {red = 0, green = 0, blue = 0},RGB {red = 100,
 green = 0, blue = 0},RGB {red = 0, green = 0, blue = 0},RGB
 {red = 255, green = 0, blue = 255}],[RGB {red = 0, green = 
0, blue = 0},RGB {red = 0, green = 255, blue = 175},RGB {red
 = 0, green = 0, blue = 0},RGB {red = 0, green = 0, blue = 0
}],[RGB {red = 0, green = 0, blue = 0},RGB {red = 0, green =
 0, blue = 0},RGB {red = 0, green = 15, blue = 175},RGB {red
 = 0, green = 0, blue = 0}],[RGB {red = 255, green = 0, blue
 = 255},RGB {red = 0, green = 0, blue = 0},RGB {red = 0, gre
en = 0, blue = 0},RGB {red = 255, green = 255, blue = 255}]]
})

Works like a charm!
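Once the file is a structured value, follow-up checks are ordinary functions over the record fields. The following is a minimal sketch that repeats the two data types so it stands alone. The names dimensionsMatch and samplesInRange are hypothetical, and the rule that no sample may exceed the maximum color value is an assumption: it does not appear in the excerpt of the specification quoted above.

```haskell
data RGB = RGB
    { red   :: Int
    , green :: Int
    , blue  :: Int
    } deriving (Show)

data PPM = PPM
    { width             :: Int
    , height            :: Int
    , maximumColorValue :: Int
    , image             :: [[RGB]]
    } deriving (Show)

-- True when there are `height` rows and every row has `width` pixels
dimensionsMatch :: PPM -> Bool
dimensionsMatch p =
    length (image p) == height p
        && all (\row -> length row == width p) (image p)

-- True when every sample lies between 0 and the maximum color value
-- (assumed rule, see the note above)
samplesInRange :: PPM -> Bool
samplesInRange p = all (all pixelOk) (image p)
  where
    ok n                = 0 <= n && n <= maximumColorValue p
    pixelOk (RGB r g b) = ok r && ok g && ok b
```

Loading this in the REPL and applying both functions to a parsed image is a quick way to sanity-check a file after parsing.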

You can very quickly get your hands dirty with Haskell by writing a parser that converts a file format you know and love into a more structured data type.
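If you would like to try the combinator style with nothing but the base library, the sketch below parses just the PPM header using Text.ParserCombinators.ReadP. It is an illustration under stated assumptions, not the attoparsec parser from this post: the Header type and the names number, spaces1, header, and parseHeader are hypothetical.

```haskell
import Control.Monad (guard)
import Data.Char (isDigit, isSpace)
import Text.ParserCombinators.ReadP (ReadP, munch1, readP_to_S, satisfy, string)

-- Hypothetical type holding only the header fields
data Header = Header
    { headerWidth  :: Int
    , headerHeight :: Int
    , headerMaxVal :: Int
    } deriving (Eq, Show)

-- One or more ASCII digits, read as an Int
number :: ReadP Int
number = fmap read (munch1 isDigit)

-- One or more whitespace characters
spaces1 :: ReadP ()
spaces1 = munch1 isSpace >> return ()

-- Magic number, width, height, and maximum color value,
-- followed by a single whitespace character
header :: ReadP Header
header = do
    _ <- string "P3"
    spaces1
    w <- number
    spaces1
    h <- number
    spaces1
    m <- number
    guard (0 < m && m < 65536)
    _ <- satisfy isSpace
    return (Header w h m)

-- Succeeds when the input starts with a well-formed header;
-- whatever follows the header is ignored
parseHeader :: String -> Maybe Header
parseHeader str = case readP_to_S header str of
    [(hdr, _)] -> Just hdr
    _          -> Nothing
```

For example, parseHeader "P3\n4 4\n255\n0 0 0" gives Just (Header 4 4 255), while input that begins with "P6" gives Nothing.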

To learn more about parser combinators in Haskell, I highly recommend this "functional pearl":

... as well as this attoparsec tutorial:

To see a "long form" example of attoparsec, check out this HTTP request parser written using attoparsec:

I use "long form" in quotes because the entire code is around 60 lines long.

Contribution #2: Write a useful command-line tool

Haskell's turtle library makes it very easy to write polished command-line tools in a tiny amount of code. For example, suppose that I want to build a simple command-line tool for managing a TODO list stored in a TODO.txt file. First I just need to provide a subroutine for displaying the current list:

{-# LANGUAGE OverloadedStrings #-}

import Turtle

todoFile = "TODO.txt"

todoItem = d%": "%s

display :: IO ()
display = sh (do
    (n, line) <- nl (input todoFile)
    echo (format todoItem n line) )

... a subroutine for adding an item to the list:

add :: Text -> IO ()
add txt = runManaged (do
    tempfile <- mktempfile "/tmp" "todo"
    output tempfile (input todoFile <|> pure txt)
    mv tempfile todoFile )

... and a subroutine for removing an item from the list:

remove :: Int -> IO ()
remove m = runManaged (do
    tempfile <- mktempfile "/tmp" "todo"
    output tempfile (do
        (n, line) <- nl (input todoFile)
        guard (m /= n)
        return line )
    mv tempfile todoFile )

... then I can just wrap them in a command line API. I create a command line parser that runs display by default if the command line is empty:

parseDisplay :: Parser (IO ())
parseDisplay = pure display

... then a command line parser for the add subcommand:

parseAdd :: Parser (IO ())
parseAdd =
    fmap add
        (subcommand "add" "Add a TODO item"
            (argText "item" "The item to add to the TODO list") )

... and a command line parser for the remove subcommand:

parseRemove :: Parser (IO ())
parseRemove =
    fmap remove
        (subcommand "rm" "Remove a TODO item"
            (argInt "index" "The numeric index of the TODO item to remove") )

Finally, I combine them into a single composite parser for all three subcommands:

parseCommand :: Parser (IO ())
parseCommand = parseDisplay <|> parseAdd <|> parseRemove

... and run the parser:

main = do
    command <- options "A TODO list manager" parseCommand
    exists  <- testfile todoFile
    when (not exists) (touch todoFile)
    command

... and I'm done! That's the full program:

{-# LANGUAGE OverloadedStrings #-}

import Turtle

todoFile = "TODO.txt"

todoItem = d%": "%s

display :: IO ()
display = sh (do
    (n, line) <- nl (input todoFile)
    echo (format todoItem n line) )

add :: Text -> IO ()
add txt = runManaged (do
    tempfile <- mktempfile "/tmp" "todo"
    output tempfile (input todoFile <|> pure txt)
    mv tempfile todoFile )

remove :: Int -> IO ()
remove m = runManaged (do
    tempfile <- mktempfile "/tmp" "todo"
    output tempfile (do
        (n, line) <- nl (input todoFile)
        guard (m /= n)
        return line )
    mv tempfile todoFile )

parseDisplay :: Parser (IO ())
parseDisplay = pure display

parseAdd :: Parser (IO ())
parseAdd =
    fmap add
        (subcommand "add" "Add a TODO item"
            (argText "item" "The item to add to the TODO list") )

parseRemove :: Parser (IO ())
parseRemove =
    fmap remove
        (subcommand "rm" "Remove a TODO item"
            (argInt "index" "The numeric index of the TODO item to remove") )

parseCommand :: Parser (IO ())
parseCommand = parseDisplay <|> parseAdd <|> parseRemove

main = do
    command <- options "A TODO list manager" parseCommand
    exists <- testfile todoFile
    when (not exists) (touch todoFile)
    command

We can compile that program into a native binary with a fast startup time on any platform (i.e. Windows, OS X, or Linux):

$ stack build turtle --resolver=lts-3.14
$ stack ghc --resolver=lts-3.14 -- -O2 todo.hs

... and verify that the program works:

$ ./todo add "Brush my teeth"
$ ./todo add "Shampoo my hamster"
$ ./todo
0: Brush my teeth
1: Shampoo my hamster
$ ./todo rm 0
$ ./todo
0: Shampoo my hamster
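The numbering and filtering that display, add, and remove perform on the file can also be modeled as pure functions over a list of lines. This is a sketch using only base, not turtle's API. The names numbered, displayLines, addItem, and removeItem are hypothetical.

```haskell
-- Pair each item with its index, counting from 0
numbered :: [String] -> [(Int, String)]
numbered = zip [0 ..]

-- Render each item as "index: text"
displayLines :: [String] -> [String]
displayLines items = [ show n ++ ": " ++ item | (n, item) <- numbered items ]

-- Append a new item to the end of the list
addItem :: String -> [String] -> [String]
addItem item items = items ++ [item]

-- Keep every item whose index differs from the one being removed
removeItem :: Int -> [String] -> [String]
removeItem m items = [ item | (n, item) <- numbered items, m /= n ]
```

The list comprehension in removeItem plays the same role as guard (m /= n) in the turtle version: lines that fail the condition are simply dropped.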

The program also auto-generates the usage and help information:

$ ./todo --help
A TODO list manager

Usage: todo ([add] | [rm])

Available options:
  -h,--help                Show this help text

Available commands:
  add
  rm
$ ./todo add
Usage: todo add ITEM
$ ./todo rm
Usage: todo rm INDEX
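To make explicit what the composite parser decides, here is a hand-rolled sketch of the same dispatch written against a plain list of arguments, using only base. The Command type and parseArgs are hypothetical names. The real options function from turtle does much more, including generating the help text shown above.

```haskell
import Text.Read (readMaybe)

-- The three things the tool can be asked to do
data Command
    = Display
    | Add String
    | Remove Int
    deriving (Eq, Show)

-- No arguments means display; otherwise match a subcommand
-- followed by exactly one argument
parseArgs :: [String] -> Maybe Command
parseArgs []            = Just Display
parseArgs ["add", item] = Just (Add item)
parseArgs ["rm", index] = fmap Remove (readMaybe index)
parseArgs _             = Nothing
```

Each equation corresponds to one of the alternatives joined by <|> in parseCommand, and the final equation covers the cases where the real tool prints usage information.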

Amazingly, you can delete all the type signatures from the above program and the program will still compile. Try it! Haskell's type inference and fast type-checking algorithm make it feel very much like a scripting language. The combination of type inference, fast startup time, and polished command line parsing makes Haskell an excellent choice for writing command-line utilities.

You can learn more about scripting in Haskell by reading the turtle tutorial, written for people who have no prior background in Haskell programming:

Contribution #3: Client bindings to a web API

Haskell's servant library lets you write very clean and satisfying bindings to a web API. For example, suppose that I want to define a Haskell client to the JSONPlaceholder test API. We'll use two example endpoints that the API provides.

A GET request against the /posts endpoint returns a list of fake posts:

[
  {
    "userId": 1,
    "id": 1,
    "title": "sunt aut facere repellat ...",
    "body": "quia et suscipit\nsuscipit ..."
  },
  {
    "userId": 1,
    "id": 2,
    "title": "qui est esse",
    "body": "est rerum tempore vitae\nsequi ..."
  },
...

... and a POST request against the same endpoint accepts a list of posts and returns them back as the response.

To write a client binding to this API, we just need to define a record representing APost:

data APost = APost
    { userId :: Int
    , id     :: Int
    , title  :: Text
    , body   :: Text
    } deriving (Show, Generic, FromJSON, ToJSON)

The last line instructs the Haskell compiler to auto-derive conversion functions between APost and JSON.

Now we just encode the REST API as a type:

-- We can `GET` a list of posts from the `/posts` endpoint
type GetPosts =                            "posts" :> Get  '[JSON] [APost]

-- We can `POST` a list of posts to the `/posts` endpoint
-- using the request body and get a list of posts back as
-- the response
type PutPosts = ReqBody '[JSON] [APost] :> "posts" :> Post '[JSON] [APost]

type API = GetPosts :<|> PutPosts

... and then the compiler will "automagically" generate API bindings:

getPosts :<|> putPosts =
    client (Proxy :: Proxy API) (BaseUrl Http "jsonplaceholder.typicode.com" 80)
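One ingredient behind this is that a string literal in a type, such as "posts", can be read back as an ordinary value at run time through a type class. The toy sketch below shows only that ingredient, using base. It is not servant's implementation: the :> defined here is a simplified stand-in, and End, HasPath, pathOf, PostsPath, and CommentsPath are hypothetical names.

```haskell
{-# LANGUAGE DataKinds           #-}
{-# LANGUAGE KindSignatures      #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators       #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Toy stand-in for a path combinator: a type-level string
-- followed by the rest of the route
data (segment :: Symbol) :> rest
infixr 4 :>

-- Marks the end of a route
data End

-- Recover the path segments described by a route type
class HasPath api where
    pathOf :: Proxy api -> [String]

instance HasPath End where
    pathOf _ = []

instance (KnownSymbol segment, HasPath rest) => HasPath (segment :> rest) where
    pathOf _ =
        symbolVal (Proxy :: Proxy segment) : pathOf (Proxy :: Proxy rest)

type PostsPath    = "posts" :> End
type CommentsPath = "posts" :> "1" :> "comments" :> End
```

In the REPL, pathOf (Proxy :: Proxy CommentsPath) evaluates to ["posts","1","comments"]. The compiler picks the instances by walking the type, which is the same general idea that lets a library derive request-building code from an API type.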

Now anybody can use our code to GET or POST lists of posts. We can also quickly test out our code within the Haskell REPL to verify that everything works:

$ stack ghci servant-server servant-client --resolver=lts-3.14
ghci> :load client.hs
[1 of 1] Compiling Main             ( client.hs, interpreted )
Ok, modules loaded: Main.
*Main> import Control.Monad.Trans.Either as Either
*Main Either> -- Perform a `GET` request against the `/posts` endpoint
*Main Either> runEitherT getPosts
Right [APost {userId = 1, id = 1, title = "sunt aut facere ...
*Main Either> -- Perform a `POST` request against the `/posts` endpoint
*Main Either> runEitherT (putPosts [APost 1 1 "foo" "bar"])
Right [APost {userId = 1, id = 1, title = "foo", body = "bar"}]

Here's the full code with all the extensions and imports that enable this magic:

{-# LANGUAGE DataKinds         #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE DeriveAnyClass    #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeOperators     #-}

import Data.Aeson (FromJSON, ToJSON)
import Data.Text (Text)
import GHC.Generics (Generic)
import Servant
import Servant.Client

data APost = APost
    { userId :: Int
    , id     :: Int
    , title  :: Text
    , body   :: Text
    } deriving (Show, Generic, FromJSON, ToJSON)

type GetPosts =                            "posts" :> Get  '[JSON] [APost]
type PutPosts = ReqBody '[JSON] [APost] :> "posts" :> Post '[JSON] [APost]
type API      = GetPosts :<|> PutPosts

getPosts :<|> putPosts =
    client (Proxy :: Proxy API) (BaseUrl Http "jsonplaceholder.typicode.com" 80)

To learn more about how this works, check out the servant tutorial here:

Note that servant is both a client and server library so everything you learn about auto-generating client side bindings can be reused to auto-generate a server, too!

To see a more long-form example of bindings to the Google Translate API, check out this code:

Conclusion

Suppose that you write up some useful code and you wonder: "What's next? How do I make this code available to others?". You can learn more by reading the stack user guide which contains complete step-by-step instructions for authoring a new Haskell project, including beginning from a pre-existing project template:

8 comments:

  2. excellent idea for a blogpost... as usual !

  3. A lot of great ideas here, thanks! Personally I have often been blocked by the feeling that an API wrapper should be exhaustive. Each of us often needs just a subset of an API for our own purposes. One might think to start small and let the community extend the project over time, but this seems to be an antipattern in comparison with automatically generated APIs like those provided by https://github.com/brendanhay, for example

  4. Another idea is to optimize code that's used in benchmarking here https://benchmarksgame.alioth.debian.org/u64q/haskell.html

    And, perhaps, write a blog post about how the optimization process was done. The gist is that for newbies in Haskell it's pretty hard to think about performance, because most mainstream languages don't have a notion of thunks and laziness. So, the more blog posts about it, the easier it is to build the new mindset.

  5. I'm pretty sure the PPM parser doesn't match its specification (nor does the example PPM file you give). See the words "pure binary" in point 9 of the spec? I don't think the parser implements that; instead, it seems to be reading decimal numbers in ASCII.

    Replies
    1. Oops! You're right. I was accidentally implementing PPM3, so I just updated the example to be a PPM3 parser instead.

  6. Inspiring article - I'd really like to get started! Where's a good place to find these kinds of projects? Are there any open source projects that are in need of these kinds of parsers, client bindings, etc?

    Replies
    1. Most projects like that are one-man shows since they are pretty easy. For the larger projects (like bindings to large APIs such as Amazon's or Google's) the bindings are auto-generated.

      I don't know if there is a centralized place that lists projects in need of help, but you can try the Github trending list for Haskell and see if any project interests you:

      https://github.com/trending?l=haskell

      If you want a high-impact area to contribute to, I'd personally recommend one of the Haskell tooling projects like:

      * https://github.com/fpco/ide-backend
      * https://github.com/leksah/leksah
      * https://github.com/haskell/haskell-mode
