Haskell for all: Use Haskell for shell scripting

Thursday, January 29, 2015

Use Haskell for shell scripting

Right now dynamic languages are popular in the scripting world, to the dismay of people who prefer statically typed languages for ease of maintenance.

Fortunately, Haskell is an excellent candidate for statically typed scripting for a few reasons:

Haskell has lightweight syntax and very little boilerplate
Haskell has global type inference, so all type annotations are optional
You can type-check and interpret Haskell scripts very rapidly
Haskell's function application syntax greatly resembles Bash

However, Haskell has had a poor "out-of-the-box" experience for a while, mainly due to:

Poor default types in the Prelude (specifically String and FilePath)
Useful scripting utilities being spread over a large number of libraries
Insufficient polish or attention to user experience (in my subjective opinion)

To solve this, I'm releasing the turtle library, which provides a slick and comprehensive interface for writing shell-like scripts in Haskell. I've also written a beginner-friendly tutorial targeted at people who don't know any Haskell.

Overview

turtle is a reimplementation of the Unix command line environment in Haskell. The best way to explain this is to show what a simple "turtle script" looks like:

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import Turtle

main = do
    cd "/tmp"
    mkdir "test"
    output "test/foo" "Hello, world!"  -- Write "Hello, world!" to "test/foo"
    stdout (input "test/foo")          -- Stream "test/foo" to stdout
    rm "test/foo"
    rmdir "test"
    sleep 1
    die "Urk!"

If you make the above file executable, you can then run the program directly as a script:

$ chmod u+x example.hs
$ ./example.hs
Hello, world!
example.hs: user error (Urk!)

The turtle library renames a lot of existing Haskell utilities to match their Unix counterparts and places them under one import. This lets you reuse your shell scripting knowledge to get up and going quickly.

Shell compatibility

You can easily invoke an external process or shell command using proc or shell:

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import Turtle

main = do
    mkdir "test"
    output "test/file.txt" "Hello!"
    proc "tar" ["czf", "test.tar.gz", "test"] empty

    -- or: shell "tar czf test.tar.gz test" empty

Even people unfamiliar with Haskell will probably understand what the above program does.

Portability

"turtle scripts" run on Windows, OS X and Linux. You can either compile scripts as native executables or interpret the scripts if you have the Haskell compiler installed.

Streaming

You can build or consume streaming sources. For example, here's how you print all descendants of the /usr/lib directory in constant memory:

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import Turtle

main = view (lstree "/usr/lib")

... and here's how you count the number of descendants:

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import qualified Control.Foldl as Fold
import Turtle

main = do
    n <- fold (lstree "/usr/lib") Fold.length
    print n

... and here's how you count the number of lines in all descendant files:

#!/usr/bin/env runhaskell

{-# LANGUAGE OverloadedStrings #-}

import qualified Control.Foldl as Fold
import Turtle

descendantLines = do
    file <- lstree "/usr/lib"
    True <- liftIO (testfile file)
    input file

main = do
    n <- fold descendantLines Fold.length
    print n

Exception Safety

turtle ensures that all acquired resources are safely released in the face of exceptions. For example, if you acquire a temporary directory or file, turtle will ensure that it's safely deleted afterwards:

example = do
    dir <- using (mktempdir "/tmp" "test")
    liftIO (die "The temporary directory will still be deleted!")

However, exception safety comes at a price. turtle forces you to consume all streams in their entirety so you can't lazily consume just the initial portion of a stream. This was a tradeoff I chose to keep the API as simple as possible.
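
One way to see where this tradeoff comes from is to model a stream as a value that can only be folded. The following is a minimal sketch using only base; the `Stream` type and the `each` and `guarded` helpers are hypothetical names made up for illustration and are not turtle's actual definitions.

```haskell
{-# LANGUAGE RankNTypes #-}

import Control.Exception (bracket_)

-- A toy stream: the only way to observe elements is to fold over all of them.
-- (Hypothetical model for illustration; not turtle's real Shell type.)
newtype Stream a = Stream
    { foldStream :: forall x . (x -> a -> IO x) -> x -> IO x }

-- Emit every element of a list, in order.
each :: [a] -> Stream a
each xs = Stream (\step begin -> go step begin xs)
  where
    go _    x []       = return x
    go step x (a:rest) = do
        x' <- step x a
        go step x' rest

-- Wrap a stream between an acquire and a release action.
-- bracket_ runs the release action even if the fold throws an exception.
guarded :: IO () -> IO () -> Stream a -> Stream a
guarded acquire release s = Stream (\step begin ->
    bracket_ acquire release (foldStream s step begin))

main :: IO ()
main = do
    -- Count the elements of a guarded stream.
    n <- foldStream
        (guarded (putStrLn "acquire") (putStrLn "release") (each "abc"))
        (\count _ -> return (count + 1))
        (0 :: Int)
    print n
```

In this model the consumer is handed to the stream rather than the other way round, so the stream decides when traversal ends. That is what lets the release action run reliably, and it is also why the consumer has no way to stop after the first few elements.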

Patterns

turtle supports Patterns, which are like improved regular expressions. Use Patterns as lightweight parsers to extract typed values from unstructured text:

$ ghci
>>> :set -XOverloadedStrings
>>> import Turtle
>>> data Pet = Cat | Dog deriving (Show)
>>> let pet = ("cat" *> return Cat) <|> ("dog" *> return Dog) :: Pattern Pet
>>> match pet "dog"
[Dog]
>>> match (pet `sepBy` ",") "cat,dog,cat"
[[Cat,Dog,Cat]]

You can also use Patterns as arguments to commands like sed, grep, find and they do the right thing:

>>> stdout (grep (prefix "c") "cat")             -- grep '^c'
cat
>>> stdout (grep (has ("c" <|> "d")) "dog")      -- grep 'c\|d'
dog
>>> stdout (sed (digit *> return "!") "ABC123")  -- sed 's/[[:digit:]]/!/g'
ABC!!!

Unlike many Haskell parsers, Patterns are fully backtracking, no exceptions.
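
To illustrate what "fully backtracking" means, here is a minimal sketch of a list-of-successes parser written against base only. The `Parser` type and the `text` and `matchAll` helpers are hypothetical stand-ins chosen for this example; they are not turtle's `Pattern` implementation.

```haskell
import Control.Applicative (Alternative (..))

-- A toy backtracking parser: each run returns every possible
-- (result, remaining input) pair. (Hypothetical model; not turtle's Pattern.)
newtype Parser a = Parser { runParser :: String -> [(a, String)] }

instance Functor Parser where
    fmap f (Parser p) = Parser (\s -> [ (f a, rest) | (a, rest) <- p s ])

instance Applicative Parser where
    pure a = Parser (\s -> [(a, s)])
    Parser pf <*> Parser pa =
        Parser (\s -> [ (f a, s2) | (f, s1) <- pf s, (a, s2) <- pa s1 ])

instance Alternative Parser where
    empty = Parser (const [])
    -- Keep the results of both branches, so that a later failure can
    -- fall back to the other branch.
    Parser p <|> Parser q = Parser (\s -> p s ++ q s)

-- Match a literal string at the start of the input.
text :: String -> Parser String
text t = Parser go
  where
    n = length t
    go s
        | take n s == t = [(t, drop n s)]
        | otherwise     = []

-- Keep only the parses that consume the whole input.
matchAll :: Parser a -> String -> [a]
matchAll p s = [ a | (a, "") <- runParser p s ]

main :: IO ()
main =
    -- The "a" branch leaves "bb", so the trailing "b" cannot finish the
    -- input; the parser falls back to the "ab" branch, which succeeds.
    print (matchAll ((text "a" <|> text "ab") *> text "b") "abb")
```

Running this prints `["b"]`. A parser that committed to the first successful alternative would report no match for the same input.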

Formatting

turtle supports typed printf-style string formatting:

>>> format ("I take "%d%" "%s%" arguments") 2 "typed"
"I take 2 typed arguments"

turtle even infers the number and types of arguments from the format string:

>>> :type format ("I take "%d%" "%s%" arguments")
format ("I take "%d%" "%s%" arguments") :: Int -> Text -> Text

This uses a simplified version of the Format type from the formatting library. Credit to Chris Done for the great idea.

The reason I didn't reuse the formatting library was that I spent a lot of effort keeping the types as simple as possible to improve error messages and inferred types.
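
For readers curious how a format string can determine the number and types of arguments, the idea can be sketched in a few lines of base Haskell. This is a toy model under stated assumptions: it uses `String` instead of `Text`, it needs an explicit `lit` helper (hypothetical) because it does not rely on `OverloadedStrings`, and it should not be read as the library's actual code.

```haskell
-- A toy version of typed format strings, using String instead of Text.
-- (Hypothetical model for illustration; not the library's actual code.)
newtype Format a b = Format { runFormat :: (String -> a) -> b }

-- Compose two format pieces; their argument types accumulate left to right.
(%) :: Format b c -> Format a b -> Format a c
f % g = Format (\k -> runFormat f (\s1 -> runFormat g (\s2 -> k (s1 ++ s2))))

-- A literal chunk that takes no arguments.
lit :: String -> Format r r
lit str = Format (\k -> k str)

-- A specifier that takes an Int argument.
d :: Format r (Int -> r)
d = Format (\k n -> k (show n))

-- A specifier that takes a String argument.
s :: Format r (String -> r)
s = Format (\k str -> k str)

-- Run a format, producing a function of the accumulated arguments.
format :: Format String r -> r
format f = runFormat f id

main :: IO ()
main = putStrLn
    (format (lit "I take " % d % lit " " % s % lit " arguments") 2 "typed")
```

Each specifier adds one argument to the front of the result type, so the composed format above has type `Format r (Int -> String -> r)` and `format` turns it into a function `Int -> String -> String`. Running the program prints `I take 2 typed arguments`.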

Learn more

turtle doesn't try to ambitiously reinvent shell scripting. Instead, turtle just strives to be a "better Bash". Embedding shell scripts in Haskell gives you the benefits of easy refactoring and basic sanity checking for your scripts.

You can find the turtle library on Hackage or Github. Also, turtle provides an extensive beginner-friendly tutorial targeted at people who don't know any Haskell at all.

22 comments:

Christopher Done, January 30, 2015 at 12:25 AM
Cool. :-) Looks very attractive, newbies and scripters alike should like it. Is there a story for piping like in shell-conduit?

I made a major version bump to formatting (6.2.0) to include this simplification, I'd been meaning to drop the Holey type for a while after it became clear abstracting over the particular monoid wasn't useful. Oh, I discovered this nifty Monoid instance recently, check it out: https://github.com/chrisdone/formatting#using-more-than-one-formatter-on-the-same-argument
Barry Kelly, January 30, 2015 at 2:08 AM
This isn't a meaningful alternative to bash until you show equivalents to at least these bash operators:

| (pipe)
& (background job) and wait
<() (create fifo pulling input from subshell)

Most shell scripts are gluing together other commands written in a variety of different languages. It's an orchestration language with simple job control and trivial parallelization via pipes.

At best, this currently seems to be a replacement for DOS's batch interpreter, which is scarcely even a scripting language.
ESTrader, January 30, 2015 at 10:51 AM
Great idea! I applaud the effort. I as well would request pipe as it's a mandatory feature.
John Wiegley, January 30, 2015 at 11:16 AM
What if we had a quasi-quoter from bash syntax to turtle? It wouldn't help for the interpreted case, but it would make writing compiled scripts dead easy, since your target audience presumably already knows that syntax. Then they'd have the power of Haskell within reach when they were ready for it.
Yuriy Syrovetskiy, February 2, 2015 at 8:01 AM
Why are some turtle functions IO actions and some Shell actions? How can they be unified? How about putting everything in Shell and writing `main = sh $ do ...`?
Gabriella Gonzalez, February 2, 2015 at 7:06 PM
I actually did consider wrapping all `IO` actions in `liftIO`. There are actually two ways to do this:

A) Use the inferred type of:

> MonadIO m => m r

B) Specialize the type to `Shell`:

> Shell r

There were two problems with approach `A`:

* I wanted concrete types to teach users with instead of type classes. One of the design constraints of the library was that it had to be teachable to absolute Haskell beginners.
* It's a leaky abstraction. Not every `IO` action in the Haskell ecosystem is wrapped in `liftIO`, so the moment users deviate from the turtle library they are suddenly hit with `liftIO` anyway.

There were a few problems with approach `B`:

* There's no function of type `Shell a -> IO a`. The best you can do is `Shell a -> IO (Maybe a)` or `Shell a -> IO [a]`. That means that once you wrap an `IO` action in `Shell` the result can no longer be reused within a naked `IO` block.
* This actually makes the teaching process rougher on new Haskell programmers because I don't have any useful `IO` actions within the library to use as examples to explain how `IO` works.

Generally there is a tension between the "principle of least permission" (giving things the narrowest types possible) and the proliferation of too many types. I decided that I could live with teaching new users the distinction between `IO` and `Shell` (since they'll have to learn it anyway if they want to use other Haskell libraries).

I have a lot more to say on this subject, but a good starting point is my post on the Functor Design Pattern (http://www.haskellforall.com/2012/09/the-functor-design-pattern.html), which talks about how I prefer to specialize components as much as possible and delay unifying their types until the very last moment. Vice versa, I discourage pre-unifying everything into a monolithic component framework.
Yuriy Syrovetskiy, February 3, 2015 at 7:37 AM
This comment has been removed by the author.
Unknown, July 25, 2015 at 4:24 PM
One of the main reasons I don't use Haskell for shell scripting is that I like having standalone scripts that work out-of-the-box. I often write shell scripts for things like configuration and installation where you can't assume anything but a minimal POSIX system. Using Haskell for this is troublesome because GHC needs to be installed first, whereas Bourne shell is available on all Unix flavors. This would be fine for doing Haskell-related tasks, but asking a user to install GHC just to do something entirely unrelated is a bit too much.

I think there is a way to get the best of both worlds though, if there is some sort of Haskell-to-shell compiler. It doesn't even have to be fancy … an EDSL would probably work.
Unknown, September 7, 2015 at 3:38 AM
Gabriel, you just made my day. I was using shelly for some scripting tasks, but I like your library much more. I've been playing with it for the last few hours and I already love it :-) Thank you!
Anonymous, September 10, 2015 at 4:11 AM
Dear Gabriel,

I would like to download some files in parallel but can't get it working. The code works fine without the "T.sh . T.using . T.fork" part. No separate processes are spawned. I don't know how to fix the problem. Can you help me out?

Thanks.

Source code is here:
https://gist.github.com/schmidh/fa6112719e08c626db44