Thursday, January 29, 2015

Use Haskell for shell scripting

Right now dynamic languages are popular in the scripting world, to the dismay of people who prefer statically typed languages for ease of maintenance.

Fortunately, Haskell is an excellent candidate for statically typed scripting for a few reasons:

  • Haskell has lightweight syntax and very little boilerplate
  • Haskell has global type inference, so all type annotations are optional
  • You can type-check and interpret Haskell scripts very rapidly
  • Haskell's function application syntax greatly resembles Bash

However, Haskell has had a poor "out-of-the-box" experience for a while, mainly due to:

  • Poor default types in the Prelude (specifically String and FilePath)
  • Useful scripting utilities being spread over a large number of libraries
  • Insufficient polish or attention to user experience (in my subjective opinion)

To solve this, I'm releasing the turtle library, which provides a slick and comprehensive interface for writing shell-like scripts in Haskell. I've also written a beginner-friendly tutorial targeted at people who don't know any Haskell.

Overview

turtle is a reimplementation of the Unix command line environment in Haskell. The best way to explain this is to show what a simple "turtle script" looks like:

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import Turtle

main = do
    cd "/tmp"
    mkdir "test"
    output "test/foo" "Hello, world!"  -- Write "Hello, world!" to "test/foo"
    stdout (input "test/foo")          -- Stream "test/foo" to stdout
    rm "test/foo"
    rmdir "test"
    sleep 1
    die "Urk!"

If you make the above file executable, you can then run the program directly as a script:

$ chmod u+x example.hs
$ ./example.hs
Hello, world!
example.hs: user error (Urk!)

The turtle library renames a lot of existing Haskell utilities to match their Unix counterparts and places them under one import. This lets you reuse your shell scripting knowledge to get up and going quickly.

Shell compatibility

You can easily invoke an external process or shell command using proc or shell:

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import Turtle

main = do
    mkdir "test"
    output "test/file.txt" "Hello!"
    proc "tar" ["czf", "test.tar.gz", "test"] empty

    -- or: shell "tar czf test.tar.gz test" empty

Even people unfamiliar with Haskell will probably understand what the above program does.

Portability

"turtle scripts" run on Windows, OS X and Linux. You can either compile scripts as native executables or interpret the scripts if you have the Haskell compiler installed.

Streaming

You can build or consume streaming sources. For example, here's how you print all descendants of the /usr/lib directory in constant memory:

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import Turtle

main = view (lstree "/usr/lib")

... and here's how you count the number of descendants:

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import qualified Control.Foldl as Fold
import Turtle

main = do
    n <- fold (lstree "/usr/lib") Fold.length
    print n

... and here's how you count the number of lines in all descendant files:

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import qualified Control.Foldl as Fold
import Turtle

descendantLines = do
    file <- lstree "/usr/lib"
    True <- liftIO (testfile file)
    input file

main = do
    n <- fold descendantLines Fold.length
    print n

Exception Safety

turtle ensures that all acquired resources are safely released in the face of exceptions. For example, if you acquire a temporary directory or file, turtle will ensure that it's safely deleted afterwards:

example = do
    dir <- using (mktempdir "/tmp" "test")
    liftIO (die "The temporary directory will still be deleted!")

However, exception safety comes at a price. turtle forces you to consume all streams in their entirety so you can't lazily consume just the initial portion of a stream. This was a tradeoff I chose to keep the API as simple as possible.

Patterns

turtle supports Patterns, which are like improved regular expressions. Use Patterns as lightweight parsers to extract typed values from unstructured text:

$ ghci
>>> :set -XOverloadedStrings
>>> import Turtle
>>> data Pet = Cat | Dog deriving (Show)
>>> let pet = ("cat" *> return Cat) <|> ("dog" *> return Dog) :: Pattern Pet
>>> match pet "dog"
>>> [Dog]
>>> match (pet `sepBy` ",") "cat,dog,cat"
[[Cat,Dog,Cat]]

You can also use Patterns as arguments to commands like sed, grep, find and they do the right thing:

>>> stdout (grep (prefix "c") "cat")             -- grep '^c'
cat
>>> stdout (grep (has ("c" <|> "d")) "dog")      -- grep 'cat\|dog'
dog
>>> stdout (sed (digit *> return "!") "ABC123")  -- sed 's/[[:digit:]]/!/g'
ABC!!!

Unlike many Haskell parsers, Patterns are fully backtracking, no exceptions.

Formatting

turtle supports typed printf-style string formatting:

>>> format ("I take "%d%" "%s%" arguments") 2 "typed"
"I take 2 typed arguments"

turtle even infers the number and types of arguments from the format string:

>>> :type format ("I take "%d%" "%s%" arguments")
format ("I take "%d%" "%s%" arguments") :: Text -> Int -> Text

This uses a simplified version of the Format type from the formatting library. Credit to Chris Done for the great idea.

The reason I didn't reuse the formatting library was that I spent a lot of effort keeping the types as simple as possible to improve error messages and inferred types.

Learn more

turtle doesn't try to ambitiously reinvent shell scripting. Instead, turtle just strives to be a "better Bash". Embedding shell scripts in Haskell gives you the the benefits of easy refactoring and basic sanity checking for your scripts.

You can find the turtle library on Hackage or Github. Also, turtle provides an extensive beginner-friendly tutorial targeted at people who don't know any Haskell at all.

Saturday, January 10, 2015

total-1.0.0: Exhaustive pattern matching using traversals, prisms, and lenses

The lens library provides Prisms, which are a powerful way to decouple a type's interface from its internal representation. For example, consider the Either type from the Haskell Prelude:

data Either a b = Left a | Right b

Instead of exposing the Left and Right constructors, you could instead expose the following two Prisms:

_Left  :: Prism (Either a b) (Either a' b ) a a'

_Right :: Prism (Either a b) (Either a  b') b b'

Everything you can do with a constructor, you can do with the equivalent Prism. For example, if you want to build a value using a Prism, you use review:

review _Left  :: a -> Either a b
review _Right :: b -> Either a b

You can also kind-of pattern match on the Prism using preview, returning a Just if the match succeeds or Nothing if the match fails:

preview _Left  :: Either a b -> Maybe a
preview _Right :: Either a b -> Maybe b

However, preview is not quite as good as real honest-to-god pattern matching, because if you wish to handle every branch you can't prove in the types that your pattern match was exhaustive:

nonTypeSafe :: Either Char Int -> String
nonTypeSafe e =
    case preview _Left e of
        Just c  -> replicate 3  c
        Nothing -> case preview _Right e of
            Just n  -> replicate n '!'
            Nothing -> error "Impossible!"  -- Unsatisfying

However, this isn't a flaw with Prisms, but rather a limitation of preview. Prisms actually preserve enough information in the types to support exhaustive pattern matching! The trick for this was described a year ago by Tom Ellis, so I decided to finally implement his idea as the total library.

Example

Here's an example of a total pattern matching using the total library:

import Lens.Family.Total
import Lens.Family.Stock

total :: Either Char Int -> String       -- Same as:
total = _case                            -- total = \case
    & on _Left  (\c -> replicate 3  c )  --     Left  c -> replicate 3 c
    & on _Right (\n -> replicate n '!')  --     Right n -> replicate n '!'

The style resembles LambdaCase. If you want to actually supply a value in the same expression, you can also write:

e & (_case
    & on _Left  ...
    & on _Right ... )

There's nothing unsafe going on under the hood. on is just 3 lines of code:

on p f g x = case p Left x of
    Left  l -> f l
    Right r -> g r

(&) is just an operator for post-fix function application:

x & f = f x

... and _case is just a synonym for a type class method that checks if a type is uninhabited.

class Empty a where
    impossible :: a -> x

_case = impossible

The rest of the library is just code to auto-derive Empty instances using Haskell generics. The entire library is about 40 lines of code.

Generics

Here's an example of exhaustive pattern matching on a user-defined data type:

{-# LANGUAGE DeriveGeneric   #-}
{-# LANGUAGE TemplateHaskell #-}

import Control.Lens.TH (makePrisms)
import GHC.Generics (Generic)
import Lens.Family.Total

data Example a b c = C1 a | C2 b | C3 c deriving (Generic)

makePrisms ''Example

instance (Empty a, Empty b, Empty c) => Empty (Example a b c)

example :: Example String Char Int -> String  -- Same as:
example = _case                               -- example = \case
    & on _C1 (\s -> s              )          --     C1 s -> s
    & on _C2 (\c -> replicate 3  c )          --     C2 c -> replicate 3  c
    & on _C3 (\n -> replicate n '!')          --     C3 n -> replicate n '!'

This uses two features to remove most of the boiler-plate:

  • deriving (Generic) auto-generates the code for the Empty instance
  • makePrisms uses Template Haskell to auto-generate Prisms for our type.

For those who prefer the lens-family-* suite of libraries over lens, I opened up an issue against lens-family-th so that Lens.Family.TH.makeTraversals can be used in place of makePrisms for exhaustive pattern matching.

When we write this instance in our original code:

instance (Empty a, Empty b, Empty c) => Empty (Example a b c)

... that uses Haskell generics to auto-generate the implementation of the impossible function:

instance (Empty a, Empty b, Empty c) => Empty (Example a b c)
    impossible (C1 a) = impossible a
    impossible (C2 b) = impossible b
    impossible (C3 c) = impossible c

That says that the type Example a b c is uninhabited if and only if the types a, b, and c are themselves uninhabited.

When you write makePrisms ''Example, that creates the following three prisms:

_C1 :: Prism (Example a b c) (Example a' b  c ) a a'
_C2 :: Prism (Example a b c) (Example a  b' c ) b b'
_C3 :: Prism (Example a b c) (Example a  b  c') c c'

These Prisms are useful in their own right. You can export them in lieu of your real constructors and they are more powerful in several respects.

Backwards compatibility

There is one important benefit to exporting Prisms and hiding constructors: if you export Prisms you can change your data type's internal representation without changing your API.

For example, let's say that I change my mind and implement Example as two nested Eithers:

type Example a b c = Either a (Either b c)

Had I exposed my constructors, this would be a breaking change for my users. However, if I expose Prisms, then I can preserve the old API by just changing the Prism definition. Instead of auto-generating them using Template Haskell, I can just write:

import Lens.Family.Stock (_Left, _Right)

_C1 = _Left
_C2 = _Right . _Left
_C3 = _Right . _Right

Done! That was easy and the user of my Prisms are completely unaffected by the changes to the internals.

Under the hood

The trick that makes total work is that every branch of the pattern match subtly alters the type. The value that you match on starts out as:

Example String Char Int

... and every time you pattern match against a branch, the on function Voids one of the type parameters. The pattern matching flows from bottom to top:

example = _case
    & on _C1 (\s -> s              )  -- Step 3: Example Void   Void Void
    & on _C2 (\c -> replicate 3  c )  -- Step 2: Example String Void Void
    & on _C3 (\n -> replicate n '!')  -- Step 1: Example String Char Void
                                      -- Step 0: Example String Char Int

Finally, _case just checks that the final type (Example Void Void Void in this case) is uninhabited. All it does is check if the given type is an instance of the Empty type class, which only provides instances for uninhabited types. In this case Example Void Void Void is definitely uninhabited because Void (from Data.Void) is itself uninhabited:

instance Empty Void where
    impossible = absurd

Lenses

What's neat is that this solution works not only for Prisms and Traversals, but for Lenses, too! Check this out:

example :: (Int, Char) -> String     -- Same as:
example = _case                      -- example = \case
    & on _1 (\n -> replicate n '1')  --     (n, _) -> replicate n '1'

... and the identity lens works (which is just id) works exactly the way you'd expect:

example :: (Int, Char) -> String        -- Same as:
example = _case                         -- example = \case
    & on id (\(n, c) -> replicate n c)  --     (n, c) -> replicate n c

I didn't intend that when I wrote the library: it just fell naturally out of the types.

Conclusion

The total library now allows library authors to hide their constructors and export a backwards compatible Prism API all while still preserving exhaustive pattern matching.

Note that I do not recommend switching to a lens-only API for all libraries indiscriminately. Use your judgement when deciding whether perfect backwards compatibility is worth the overhead of a lens-only API, both for the library maintainers and your users.

I put the code under the Lens.Family.Total module, not necessarily because I prefer the Lens.Family.* hierarchy, but rather because I would like to see Edward Kmett add these utilities to lens, leaving my library primarily for users of the complementary lens-family-* ecosystem.

You can find the total library on Hackage or on GitHub.