Saturday, February 27, 2016

Auto-generate a command line interface from a data type

I'm releasing the optparse-generic library which uses Haskell's support for generic programming to auto-generate command-line interfaces for a wide variety of types.

For example, suppose that you define a record with two fields:

data Example = Example { foo :: Int, bar :: Double }

You can auto-generate a command-line interface tailored to that record like this:

{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}

import Options.Generic

data Example = Example { foo :: Int, bar :: Double }
    deriving (Generic, Show)

instance ParseRecord Example

main = do
    x <- getRecord "Test program"
    print (x :: Example)

This generates the following command-line interface:

$ stack runghc Example.hs -- --help
Test program

Usage: Example.hs --foo INT --bar DOUBLE

Available options:
  -h,--help                Show this help text

... and we can verify that the interface works by supplying the appropriate arguments:

$ stack runghc Example.hs -- --foo 1 --bar 2.5
Example {foo = 1, bar = 2.5}

You can also compile the program into a native executable binary:

$ stack ghc Example.hs
[1 of 1] Compiling Main             ( Example.hs, Example.o )
Linking Example ...
$ ./Example --foo 1 --bar 2.5
Example {foo = 1, bar = 2.5}

Features

The auto-generated interface tries to be as intelligent as possible. For example, if you omit the record labels:

data Example = Example Int Double

... then the fields will become positional arguments:

$ ./Example --help
Test program

Usage: Example INT DOUBLE

Available options:
  -h,--help                Show this help text

$ ./Example 1 2.5
Example 1 2.5

If you wrap a field in Maybe:

data Example = Example { foo :: Maybe Int }

... then the corresponding command-line flag/argument becomes optional:

$ ./Example --help
Test program

Usage: Example [--foo INT]

Available options:
  -h,--help                Show this help text

$ ./Example
Example {foo = Nothing}

$ ./Example --foo 2
Example {foo = Just 2}

If a field is a list of values:

data Example = Example { foo :: [Int] }

... then the corresponding command-line flag/argument can be repeated:

$ ./Example --foo 1 --foo 2
Example {foo = [1,2]}

$ ./Example
Example {foo = []}

If you wrap a value in First or Last:

data Example = Example { foo :: First Int, bar :: Last Int }

... then you will get the first or last match, respectively:

$ ./Example --foo 1 --foo 2 --bar 1 --bar 2
Example {foo = First {getFirst = Just 1}, bar = Last {getLast = Just 2}}

$ ./Example
Example {foo = First {getFirst = Nothing}, bar = Last {getLast = Nothing}}

You can even do fancier things like ask for the Sum or Product of all matching fields:

data Example = Example { foo :: Sum Int, bar :: Product Int }

... and it will do the "right thing":

$ ./Example --foo 1 --foo 2 --bar 1 --bar 2
Example {foo = Sum {getSum = 3}, bar = Product {getProduct = 2}}

$ ./Example
Example {foo = Sum {getSum = 0}, bar = Product {getProduct = 1}}

If a data type has multiple constructors:

data Example
    = Create { name :: Text, duration :: Maybe Int }
    | Kill   { name :: Text }

... then that translates to subcommands named after each constructor:

$ ./Example --help
Test program

Usage: Example (create | kill)

Available options:
  -h,--help                Show this help text

Available commands:
  create                   
  kill     

$ ./Example create --help
Usage: Example create --name TEXT [--duration INT]

Available options:
  -h,--help                Show this help text

$ ./Example kill --help
Usage: Example kill --name TEXT

Available options:

  -h,--help                Show this help text

$ ./Example create --name foo --duration 60
Create {name = "foo", duration = Just 60}

$ ./Example kill --name foo
Kill {name = "foo"}

This library also supports many existing Haskell data types out of the box. For example, if you just need to get a Double and Int from the command line you could just write:

{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}

import Options.Generic

main = do
    x <- getRecord "Test program"
    print (x :: (Double, Int))

... and that will parse two positional arguments:

$ ./Example --help
Test program

Usage: Example DOUBLE INT

Available options:
  -h,--help                Show this help text

$ ./Example 1.1 2
(1.1,2)

Compile-time safety

Haskell's support for generic programming is done completely at compile time. This means that if you ask for something that cannot be sensibly converted into a command-line interface your program will fail to compile.

For example, if you ask for a list of lists:

data Example = Example { foo :: [[Int]] }

.. then the compiler will fail with the following error message since you can't (idiomatically) model "repeated (repeated Ints)" on the command line:

    No instance for (ParseField [Int])
      arising from a use of ‘Options.Generic.$gdmparseRecord’
    In the expression: Options.Generic.$gdmparseRecord
    In an equation for ‘parseRecord’:
        parseRecord = Options.Generic.$gdmparseRecord
    In the instance declaration for ‘ParseRecord Example’

Conclusion

If you would like to use this package or learn more you can find this package:

on Github
on Hackage

I also plan to re-export this package's functionality from turtle to further simplify command-line programming.

Sunday, February 21, 2016

State of the Haskell Ecosystem - February 2016 Edition

Six months ago I released the first "State of the Haskell Ecosystem", a collaborative wiki documenting the maturity of the Haskell language for various application domains:

State of the Haskell Ecosystem

The primary goals of this wiki are to:

Advertise what areas the Haskell language and ecosystem excel at
Warn newcomers about common pitfalls so they avoid unpleasant surprises
Give new contributors ideas for where they can improve things

Every six months I plan to post about what changed since the last update in order to highlight any major changes or trends.

Education

The biggest improvement in the Haskell ecosystem was the Early Access release of the Haskell Programming from first principles book. The book is not yet released but I consider this book the best resource for people new to the language. The material is very beginner-friendly and written for somebody without any functional programming experience whatsoever.

The book is not free, but if you're really serious about learning Haskell the price is well worth it and this book will save you a lot of headaches.

The rating of the "Educational" section still remains "Immature" until this book is out of Early Access and finally released, but once the book is out I will finally mark Haskell as having "Mature" educational resources.

Read more about educational resources for Haskell

IDE support

For a long time vim and emacs were the Haskell editors of choice. Now more traditional IDEs like Atom and IntelliJ are starting to get Haskell support, but their respective Haskell plugins still need a bit more polish.

Also, the Haskell for Mac is supposed to work really well for learning the language if you have an OS X development environment.

However, my rating hasn't changed for IDE support, and I believe this is still the biggest gap in the Haskell ecosystem so I want to draw attention to this area for people interested in contributing to Haskell. Improving IDE support is the single easiest way to lower the entry barrier to newcomers.

If you're not sure what editor to contribute to I recommend the ide-haskell plugin for Atom. This editor and plugin are freely available and cross-platform and many users have reported an excellent experience with this plugin, although some setup issues still remain.

Another important area where newcomers can contribute is the Leksah IDE, which is a true integrated development environment for Haskell which is also written in Haskell.

Read more on Haskell IDE support

Front-end web programming

stack recently added support for ghcjs, meaning that it's now very easy to start a new ghcjs project. Previously, setting up ghcjs correctly was very difficult, but those days are over now.

The ghcjs ecosystem still has a long way to go before I would rate it as "Mature", but stack support is a big and necessary step in that direction.

Read more on compiling Haskell to Javascript

Standalone GUI applications

Right now I'm looking for just one really polished widget toolkit before I rate this area of the Haskell ecosystem "Mature".

Deech has made great strides in improving the ease of setup and use for the FLTK Haskell bindings and integration with visual interface builders. The setup process still needs a bit more polish but I think his work probably holds the most promise for a mature widget toolkit binding.

Read more about GUI applications in Haskell

Types

The Liquid Haskell extension has made some great strides in adding refinement types to the language. This is not yet an official language extension, but you can still use it today and it works really well. You can learn more about refinement types by reading the awesome Liquid Haskell tutorial:

Programming with refinement types

I already gave Haskell a "Best in class" rating for the type system, but advances like Liquid Haskell just further cement its lead.

Read more about Haskell's type system

Parsing

I upgraded the parsing rating from "Mature" to "Best in class". Haskell has always been a leader among languages when it comes to parsing, but I held off on a "Best in class" rating for a while because all the Haskell parsing libraries required you to sacrifice one of the following features:

Good error messages
Full backtracking (i.e. no need to left-factor the grammar)
First-class parsers (i.e. not a parser generator like happy)

The Earley library changed that and provides a well-rounded choice. That doesn't mean that I recommend Earley for all parsing use cases, though, and there are still great reasons to use other parsing libraries:

attoparsec remains the king of speed, generating parsers competitive in speed with C
trifecta remains the king of error messages, generating gorgeous clang-style errors

However, if Earley gets a little more polish then I'd probably switch to Earley as my default parsing library recommendation for new users.

Read more about parsing in Haskell

Distributed systems

The newly added glue-core / glue-* libraries give Haskell a new service toolkit useful. Haskell still gets an "Immature" rating in this area until I see people consolidate on a common stack for service-oriented architectures and report success stories in industry.

Read more on distributed programming in Haskell

New sections

Two generous contributors added two new sections to the wiki which I would like to highlight:

ARM processor support - contributed by @bburdette
Computer vision - contributed by Siddhanathan Shanmugam (@siddhanathan)

I would like to thank both of them for their contributions!

Conclusions

As always, visit the Github collaborative wiki for the most up-to-date information since this post will eventually go stale. Pull requests are always welcome, both for corrections and new additions.

Wednesday, February 3, 2016

From mathematics to map-reduce

There's more mathematics to programming than meets the eye. This post will highlight one such connection that explains the link between map-reduce and category theory. I will then conclude with some wild speculation about what this might imply for future programming paradigms.

This post assumes that you already know Haskell and explains the mathematics behind the map-reduce using Haskell concepts and terminology. This means that this post will oversimplify some of the category theory concepts in order to embed them in Haskell, but the overall gist will still be correct.

Background (Isomorphism)

In Haskell, we like to say that two types, s and t, are "isomorphic" if and only if there are two functions, fw and bw, of types

fw :: s -> t
bw :: t -> s

... that are inverse of each other:

fw . bw = id
bw . fw = id

We will use the symbol ≅ to denote that two types are isomorphic. So, for example, we would summarize all of the above by just writing:

s ≅ t

The fully general definition of isomorphism from category theory is actually much broader than this, but this definition will do for now.

Background (Adjoint functors)

Given two functors, f and g, f is left-adjoint to g if and only if:

f a -> b ≅ a -> g b

In other words, for them to be adjoint there must be two functions, fw and bw of types:

fw :: (f a -> b) -> (a -> g b)
bw :: (a -> g b) -> (f a -> b)

... such that:

fw . bw = id
bw . fw = id

These "functors" are not necessarily the same as Haskell's Functor class. The category theory definition of "functor" is more general than Haskell's Functor class and we'll be taking advantage of that extra generality in the next section.

Free functors

Imagine a functor named g that acted more like a type-level function that transforms one type into another type. In this case, g will be a function that erases a constraint named C. For example:

-- `g` is a *type-level* function, and `t` is a *type*
g (C t => t) = t

In other words, g "forgets" the C constraint on type t. We call g a "forgetful functor".

If some other functor, f is left-adjoint to g then we say that f is the "free C" (where C is the constraint that g "forgets").

In other words, a "free C" is a functor that is left-adjoint to another functor that forgets the constraint C.

Free monoid

The list type constructor, [], is the "free Monoid"

The "free Monoid" is, by definition, a functor [] that is left-adjoint to some other functor g that deletes Monoid constraints.

When we say that g deletes Monoid constraints we mean that:

g (Monoid m => m) = m

... and when we say that [] is left-adjoint to g that means that:

[] a -> b ≅ a -> g b

... and the type [a] is syntactic sugar for [] a, so we can also write:

[a] -> b ≅ a -> g b

Now substitute b with some type with a Monoid constraint, like this one:

b = Monoid m => m

That gives us:

[a] -> (Monoid m => m) ≅ a -> g (Monoid m => m)

... and since g deletes Monoid constraints, that leaves us with:

[a] -> (Monoid m => m) ≅ a -> m

The above isomorphism in turn implies that there must be two functions, fw and bw, of types:

fw :: ([a] -> (Monoid m => m)) -> (a -> m)
bw :: (a -> m) -> ([a] -> (Monoid m => m))

... and these two functions must be inverses of each other:

fw . bw = id
bw . fw = id

We can pull out the Monoid constraints to the left to give us these more idiomatic types:

fw :: (Monoid m => [a] -> m)) -> (a -> m)
bw :: Monoid m => ( a  -> m) -> ([a] -> m)

Both of these types have "obvious" implementations:

fw :: (Monoid m => [a] -> m)) -> (a -> m)
fw k x = k [x]

bw :: Monoid m => (a -> m) -> ([a] -> m)
bw k xs = mconcat (map k xs)

Now we need to prove that the fw and bw functions are inverse of each other. Here are the proofs:

-- Proof #1
fw . bw

-- eta-expand
= \k -> fw (bw k)

-- eta-expand
= \k x -> fw (bw k) x

-- Definition of `fw`
= \k x -> bw k [x]

-- Definition of `bw`
= \k x -> mconcat (map k [x])

-- Definition of `map`
= \k x -> mconcat [k x]

-- Definition of `mconcat`
= \k x -> k x

-- eta-reduce
= \k -> k

-- Definition of `id`
= id



-- Proof #2
bw . fw

-- eta-expand
= \k -> bw (fw k)

-- eta-expand
= \k xs -> bw (fw k) xs

-- Definition of `bw`
= \k xs -> mconcat (map (fw k) xs)

-- eta-expand
= \k xs -> mconcat (map (\x -> fw k x) xs)

-- Definition of `fw`
= \k xs -> mconcat (map (\x -> k [x]) xs)

-- map (f . g) = map f . map g
= \k xs -> mconcat (map k (map (\x -> [x]) xs))

-- ... and then a miracle occurs ...
--
-- In all seriousness this step uses a "free theorem" which says
-- that:
--
--     forall (k :: Monoid m => [a] -> m) . mconcat . map k = k . mconcat
--
-- We haven't covered free theorems, but you can read more about them
-- here: http://ttic.uchicago.edu/~dreyer/course/papers/wadler.pdf
= \k xs -> k (mconcat (map (\x -> [x]) xs)

-- This next step is a proof by induction, which I've omitted
= \k xs -> k xs

-- eta-reduce
= \k -> k

-- Definition of `id`
= id

Map reduce

Let's revisit the type and implementation of our bw function:

bw :: Monoid m => (a -> m) -> ([a] -> m)
bw k xs = mconcat (map k xs)

That bw function is significant because it is a simplified form of map-reduce:

First you "map" a function named k over the list of xs
Then you "reduce" the list using mconcat

In other words, bw is a pure "map-reduce" function and actually already exists in Haskell's standard library as the foldMap function.

The theory of free objects predict that all other functions of interest over a free object (like the free Monoid) can be reduced to the above fundamental function. In other words, the theory indicates that we can implement all other functions over lists in terms of this very general map-reduce function. We could have predicted the importance of "map-reduce purely from the theory of "free Monoids"!

However, there are other free objects besides free Monoids. For example, there are "free Monads" and "free Categorys" and "free Applicatives" and each of them is equipped with a similarly fundamental function that we can use to express all other functions of interest. I believe that each one of these fundamental functions is a programming paradigm waiting to be discovered just like the map-reduce paradigm.

Haskell for all