Wednesday, December 30, 2015

Compile-time memory safety using Liquid Haskell

When indexing into a vector, you usually have one of two options:

  • enable bounds-checking to throw an exception if you index out of bounds or:
  • disable bounds-checking to improve performance.

Both of these options are still unsatisfactory, though. Even if your program fails fast with an exception, your program still failed.

Fortunately, Haskell programmers can now select a third option:

  • verify at compile time that you never index a vector out of bounds.

A new tool named Liquid Haskell makes this sort of verification possible. Liquid Haskell is a customizable static analysis tool for the Haskell language that eliminates a wide variety of programming errors at compile time that would normally be difficult to eliminate using Haskell's ordinary type system.

However, the Liquid Haskell tool still needs a lot of polish. I've been using Liquid Haskell for the last couple of months and in my experience the tool is just starting to become "usable in anger". I'm saying this as somebody who has reported eight issues with the tool so far and encountered a ninth issue while writing up this post.

I'll illustrate how Liquid Haskell works by:

  • implementing a binary search algorithm for Haskell's Vector type,
  • using Liquid Haskell to statically verify the complete absence of out-of-bound indexing, and then:
  • safely disabling bounds checking since I've proven the algorithm correct.

Along the way I'll highlight both the ups and downs of using Liquid Haskell so that you can preview what it's like to use the tool.

Let's begin with the following Haskell code for binary search over a sorted Vector. I've included a real mistake I made when first implementing this algorithm. See if you can spot the mistake:

import Data.Vector as Vector

{-| Postconditions:

    * If the result is `Just i`, then `0 <= i < length v`
-}
binarySearch :: Ord a => a -> Vector a -> Maybe Int
binarySearch x v = loop x v 0 (Vector.length v - 1)

{-| Preconditions:

    * `0  <= lo < length v`
    * `lo <= hi < length v`
-}
loop :: Ord a => a -> Vector a -> Int -> Int -> Maybe Int
loop x v lo hi = do
    let mid = (lo + hi) `div` 2
    if x < v ! mid
    then do
        let hi' = mid - 1
        if lo <= hi'
        then loop x v lo hi'
        else Nothing
    else if v ! mid < x
    then do
        let lo' = mid + 1
        if lo' <= hi
        then loop x v lo' hi
        else Nothing
    else Just mid

Let's try this out on a few values:

$ stack ghci --resolver=lts-3.19 vector
...
Prelude> :load binarysearch.hs
...
*Main> let v = Vector.fromList [1, 4, 5, 10]
*Main> binarySearch 4 v
Just 1
*Main> binarySearch 3 v
Nothing
*Main> binarySearch 1 v
Just 0
*Main> binarySearch 0 v
Nothing
*Main> binarySearch 10 v
Just 3
*Main> binarySearch 11 v
Nothing

Liquid Haskell

We can run this code through the Liquid Haskell tool in order to locate all potential out-of-bounds indexes. In order to install the tool we must have some SMT solver on our $PATH which Liquid Haskell uses to automate the deduction process. In my case, I installed the Z3 solver by downloading the latest stable release from this page (Version 4.4.1 at the time of this writing):

Then I installed Liquid Haskell using stack:

$ cat stack.yaml 
resolver: lts-3.19
packages: []
extra-deps:
- intern-0.9.1.4
- liquid-fixpoint-0.4.0.0
$ stack build liquidhaskell

... and then I run the liquid tool over my code like this:

$ stack exec liquid binarysearch.hs
LiquidHaskell Copyright 2009-15 Regents of the University of California. All Rights Reserved.


**** DONE:  Extracted Core using GHC *******************************************
 

**** DONE:  generateConstraints ************************************************
 

**** START: fixpoint ***********************************************************
 


========================================================
© Copyright 2009 Regents of the University of California.
All Rights Reserved.
========================================================
............../Unsatisfied Constraints:
{snip}
UNSAT

**** DONE:  fixpoint ***********************************************************
 

**** DONE:  solve **************************************************************
 

**** DONE:  annotate ***********************************************************
 

**** UNSAFE ********************************************************************


 binarysearch.hs:18:16-18: Error: Liquid Type Mismatch
   Inferred type
     VV : {VV : Int | VV == mid}
  
   not a subtype of Required type
     VV : {VV : Int | VV >= 0 && VV < vlen v}
  
   In Context
     VV  : {VV : Int | VV == mid}
     v   : {v : (Vector a) | 0 <= vlen v}
     ?a  : {?a : Int | ?a == lo + hi}
     hi  : Int
     mid : {mid : Int | mid == ?a / ?b && (?a >= 0 && ?b >= 0 => mid >= 0) &&
(?a >= 0 && ?b >= 1 => mid <= ?a)}
     ?b  : {?b : Int | ?b == 2}
     lo  : Int


 binarysearch.hs:22:23-24: Error: Liquid Type Mismatch
   Inferred type
     VV : {VV : Int | VV == lo}
  
   not a subtype of Required type
     VV : {VV : Int | VV < lo && VV >= 0}
  
   In Context
     VV : {VV : Int | VV == lo}
     lo : Int


 binarysearch.hs:24:17-19: Error: Liquid Type Mismatch
   Inferred type
     VV : {VV : Int | VV == mid}
  
   not a subtype of Required type
     VV : {VV : Int | VV >= 0 && VV < vlen v}
  
   In Context
     VV  : {VV : Int | VV == mid}
     v   : {v : (Vector a) | 0 <= vlen v}
     ?a  : {?a : Int | ?a == lo + hi}
     hi  : Int
     mid : {mid : Int | mid == ?a / ?b && (?a >= 0 && ?b >= 0 => mid >= 0) &&
(?a >= 0 && ?b >= 1 => mid <= ?a)}
     ?b  : {?b : Int | ?b == 2}
     lo  : Int


 binarysearch.hs:28:23-25: Error: Liquid Type Mismatch
   Inferred type
     VV : {VV : Int | VV == lo'}
  
   not a subtype of Required type
     VV : {VV : Int | VV < lo && VV >= 0}
  
   In Context
     VV  : {VV : Int | VV == lo'}
     ?a  : {?a : Int | ?a == lo + hi}
     ?c  : {?c : Int | ?c == 2}
     hi  : Int
     mid : {mid : Int | mid == ?a / ?c && (?a >= 0 && ?c >= 0 => mid >= 0) &&
(?a >= 0 && ?c >= 1 => mid <= ?a)}
     ?b  : {?b : Int | ?b == 1}
     lo  : Int
     lo' : {lo' : Int | lo' == mid + ?b}

Okay, so we got four errors on our first check. How concerned should we be?

Usually these errors fall into three categories:

  • Missing preconditions

    We fix these by formally documenting the preconditions of our own functions using Liquid Haskell's type system

  • Missing postconditions

    We fix these by formally documenting the postconditions of functions we use, also using Liquid Haskell's type system

  • Genuine bugs (i.e. your code is wrong even when all preconditions and postconditions are correctly documented)

    We fix these by correcting our code

When we first begin most of the errors will fall into the first two categories (missing preconditions or postconditions) and as things progress we may discover errors in the third category (genuine bugs).

Preconditions

Liquid Haskell documents the preconditions and postconditions of many Haskell functions "out-of-the-box" by providing "refined" type signatures for these functions. You can find all of the built-in refined type signatures here:

... and here's an example refined type from the Data.Vector module (slightly modified from the original):

assume ! :: v : Vector a -> { i : Int | 0 <= i && i < vlen v } -> a

(!) is Haskell's infix operator for indexing into a vector. Normally, the ordinary Haskell type signature for this operator would look like this:

(!) :: Vector a -> Int -> a

... which you can read roughly as saying: "The first (left) argument to this infix operator is a Vector containing elements of type a (where a can be any type). The second (right) argument to this infix operator is the index you want to retrieve. The result is the retrieved element.

Here is an example use of the operator:

$ stack ghci --resolver=lts-3.19 vector
Prelude> import Data.Vector as Vector
Prelude Vector> let v = Vector.fromList [1, 4, 5, 10]
Prelude Vector> v ! 0
1
Prelude Vector> v ! 3
10
Prelude Vector> v ! 2
5
Prelude Vector> v ! 4
*** Exception: ./Data/Vector/Generic.hs:249 ((!)): index out of bounds (4,4)

Oops! This operator is not safe because indexing can fail at runtime with an exception if you index out of bounds.

This is where Liquid Haskell comes into play. Liquid Haskell lets you write richer type signatures that document preconditions and postconditions. These type signatures resemble Haskell type signatures except that you can decorate them with logical predicates, like this:

assume ! :: v : Vector a -> { i : Int | 0 <= i && i < vlen v } -> a

This is an example of a precondition. The above refined type says: "The index, i, must be non-negative and must be less than the length of the vector, v, that you supplied for the first argument".

So how does Liquid Haskell know what the length of the vector is? Remember that Liquid Haskell needs to somehow verify this precondition at compile time even though we don't necessarily know what the input will be at compile time. In fact, Liquid Haskell actually doesn't know anything about Vectors at all, let alone their lengths, unless we teach it!

Postconditions

We can teach Liquid Haskell about Vector lengths by introducing a new "measure" named vlen:

measure vlen :: Vector a -> Int

The only thing we provide is the type of vlen, but there is no code associated with vlen. Just treat vlen as an abstract type-level placeholder for the length of the Vector. We can use vlen to give a more refined type to Vector.length:

assume Vector.length :: v : Vector a -> { n : Int | n == vlen v }

This is an example of a postcondition. You can read this type signature as saying: "We know for sure that whatever Int that length returns must be the length of the Vector".

The assume keyword indicates that we haven't proven this the correctness of this refined type. Instead we are asserting that the type is true. Any time you study a Liquid Haskell program you need to stare really hard at any part of the code that uses assume since that's an escape hatch that might compromise any safety guarantees.

Not all postconditions require the use of assume. In fact, Liquid Haskell can automatically promote a restricted subset of Haskell safely to the type level. Unfortunately, this subset does not include operations on Vectors, which is why we must assert the correctness of the above refined type.

Safety

Let's verify that the refined types for (!) and length work correctly using a small test example. We'll begin with a program that is not necessarily safe:

import Data.Vector as Vector

example :: Vector a -> a
example v = v ! 2

We run the above program through the liquid type-checker and get a type error (as expected) since the type-checker cannot verify that the input Vector has at least three elements:

$ stack exec example.hs
...
UNSAT

**** UNSAFE ********************************************************************


 safe.hs:4:17: Error: Liquid Type Mismatch
   Inferred type
     VV : {VV : Int | VV == ?a}
  
   not a subtype of Required type
     VV : {VV : Int | VV >= 0 && VV < vlen v}
  
   In Context
     VV : {VV : Int | VV == ?a}
     v  : {v : (Vector a) | 0 <= vlen v}
     ?a : {?a : Int | ?a == 2}

Don't worry if you don't understand the type error just yet. Right now all we care about is the line and column number, which point directly at the 2 in the source code:

import Data.Vector as Vector

example :: Vector a -> a
example v = v ! 2
--   ERROR HERE ^

There are two ways we can fix our program. The best solution is to add a precondition to the type of our example function. We can specify that our function only works for Vectors that have at least 3 elements:

import Data.Vector as Vector

{-@ example :: { v : Vector a | 3 <= vlen v } -> a @-}
example :: Vector a -> a
example v = v ! 2

We document this in a second parallel type signature embedded within a Haskell comment. Liquid Haskell is designed to be backwards compatible with the Haskell language and is not yet a formal language extension supported by ghc, which is why Liquid Haskell types have to live inside comments.

Once we add that precondition then the liquid Haskell type-checker verifies that our program is correct:

$ stack exec liquid example.hs
...
SAT

**** DONE:  fixpoint ***********************************************************
 

**** DONE:  solve **************************************************************
 

**** DONE:  annotate ***********************************************************
 

**** SAFE **********************************************************************

WARNING: Once you begin using this tool you will quickly grow addicted to that SAFE message. Using Liquid Haskell feels very much like a fun game where the reward is correct code!

Another way we can satisfy the type-checker is to check the Vector's length at runtime and return Nothing if the Vector is too small. We wrap successful results in Just if the Vector is large enough:

import Data.Vector as Vector

example :: Vector a -> Maybe a
example v =
    if 3 <= Vector.length v
    then Just (v ! 2)
    else Nothing

Notice how this program requires no Liquid Haskell type signature or type annotation. Liquid Haskell is smart enough to figure out that the precondition for (!) was already satisfied by the runtime check:

$ stack exec liquid example.hs
...
**** SAFE **********************************************************************

However, we prefer to avoid runtime checks if possible. Type-level refinements improve on runtime checks in two important ways:

  • Refinements are "free" - they incur no runtime cost
  • Refinements encourage pushing fixes upstream (by constraining the input) instead of propagating problems downstream (by returning a Maybe output)

Binary search - Continued

Let's revisit our binary search algorithm and study the first error message more closely:

 binarysearch.hs:18:16-18: Error: Liquid Type Mismatch
   Inferred type
     VV : {VV : Int | VV == mid}
  
   not a subtype of Required type
     VV : {VV : Int | VV >= 0 && VV < vlen v}
  
   In Context
     VV  : {VV : Int | VV == mid}
     v   : {v : (Vector a) | 0 <= vlen v}
     ?a  : {?a : Int | ?a == lo + hi}
     hi  : Int
     mid : {mid : Int | mid == ?a / ?b && (?a >= 0 && ?b >= 0 => mid >= 0) &&
(?a >= 0 && ?b >= 1 => mid <= ?a)}
     ?b  : {?b : Int | ?b == 2}
     lo  : Int

Your first reflex should be to jump to the line and column numbers, which point to this location:

{-| Invariants:

    * `0  <= lo < length v`
    * `lo <= hi < length v`
-}
loop :: Ord a => a -> Vector a -> Int -> Int -> Maybe Int
loop x v lo hi = do
    let mid = (lo + hi) `div` 2
    if x < v ! mid
    --         ^^^ ERROR HERE
    then do
        let hi' = mid - 1
        if lo <= hi'
        then loop x v lo hi'
        else Nothing
    else if v ! mid < x
    then do
        let lo' = mid + 1
        if lo' <= hi
        then loop x v lo' hi
        else Nothing
    else Just mid

This suggests that we provided an index that might be out of bounds and the error message confirms that:

   Inferred type
     VV : {VV : Int | VV == mid}
  
   not a subtype of Required type
     VV : {VV : Int | VV >= 0 && VV < vlen v}

This is Liquid Haskell's way of saying: "I can't prove that mid is within the vector bounds". Liquid Haskell will even explain its reasoning process, showing what refinements were in scope for that expression:

   In Context
     VV  : {VV : Int | VV == mid}
     v   : {v : (Vector a) | 0 <= vlen v}
     ?a  : {?a : Int | ?a == lo + hi}     
     hi  : Int
     mid : {mid : Int | mid == ?a / ?b && (?a >= 0 && ?b >= 0 => mid >= 0) &&
(?a >= 0 && ?b >= 1 => mid <= ?a)}
     ?b  : {?b : Int | ?b == 2}
     lo  : Int

This is an example of the first class of errors: missing preconditions. Liquid Haskell can't read our comment so Liquid Haskell has no way of knowing that lo and hi are within the vector bounds. Therefore, Liquid Haskell can't conclude that their midpoint, mid, is also within the vector bounds.

This is very easy to fix. We can transform our unstructured comment describing preconditions into a refined type for the loop function:

{-@
loop
    :: Ord a
    => x  : a
    -> v  : Vector a
    -> lo : { lo : Int | 0  <= lo && lo < vlen v }
    -> hi : { hi : Int | lo <= hi && hi < vlen v }
    -> Maybe Int
@-}
loop :: Ord a => a -> Vector a -> Int -> Int -> Maybe Int

Now our documentation is machine-checked and will never go out of date!

Refining our input

Now we get several new errors when we run our code through the liquid type-checker. I've only included the first error message for brevity:

 binarysearch.hs:8:29: Error: Liquid Type Mismatch
   Inferred type
     VV : {VV : Int | VV == ?a}
  
   not a subtype of Required type
     VV : {VV : Int | 0 <= VV && VV < vlen v}
  
   In Context
     VV : {VV : Int | VV == ?a}
     v  : {v : (Vector a) | 0 <= vlen v}
     ?a : {?a : Int | ?a == 0}

This points to a new subexpression:

binarySearch :: Ord a => a -> Vector a -> Maybe Int
binarySearch x v = loop x v 0 (Vector.length v - 1)
--               ERROR HERE ^

Liquid Haskell incorporated the new information we supplied through the refined type. The type-checker then worked backwards from loop's preconditions and found a problem in our code via the following reasoning process:

  • The type of loop says that the lo argument must be less than the length of the vector
  • We supplied 0 for the lo argument to loop
  • Therefore the Vector v needs to be at least length 1
  • But we never proved that the Vector has at least one element!

This is an example of the third class of errors: a genuine bug. I introduced this bug when I first wrote up this algorithm and Liquid Haskell caught my mistake. I never thought about the case where the Vector was empty. Many programmers smarter than me would carefully consider the corner case where the Vector was empty, but I use tools like Liquid Haskell so that I don't need to be smart (or careful).

In this case we don't want to refine the type of binarySearch to require non-empty Vector inputs since we want to support binary searches for Vectors of all sizes. Instead, we will add a special case to handle an empty Vector input:

binarySearch :: Ord a => a -> Vector a -> Maybe Int
binarySearch x v =
    if Vector.length v == 0
    then Nothing
    else loop x v 0 (Vector.length v - 1)

Refining our output

Now the type-checker gives a new error message:

 binarysearch.hs:25:16-18: Error: Liquid Type Mismatch
   Inferred type
     VV : {VV : Int | VV == mid}
  
   not a subtype of Required type
     VV : {VV : Int | VV >= 0 && VV < vlen v}
  
   In Context
     VV  : {VV : Int | VV == mid}
     v   : {v : (Vector a) | 0 <= vlen v}
     ?a  : {?a : Int | ?a == lo + hi}
     hi  : {hi : Int | lo <= hi && hi < vlen v}
     mid : {mid : Int | mid == ?a / ?b && (?a >= 0 && ?b >= 0 => mid >= 0) &&
(?a >= 0 && ?b >= 1 => mid <= ?a)}
     lo  : {lo : Int | 0 <= lo && lo < vlen v}
     ?b  : {?b : Int | ?b == 2}

Let's see where this points:

loop :: Ord a => a -> Vector a -> Int -> Int -> Maybe Int
loop x v lo hi = do
    let mid = (lo + hi) `div` 2
    if x < v ! mid
    --         ^^^ ERROR HERE
    then do
        let hi' = mid - 1
        if lo <= hi'
        then loop x v lo hi'
        else Nothing
    else if v ! mid < x
    then do
        let lo' = mid + 1
        if lo' <= hi
        then loop x v lo' hi
        else Nothing
    else Just mid

This is the same as the first error message! WHAT GIVES?!?!

Liquid Haskell already knows that both lo and hi are within the Vector bounds, but cannot deduce that their midpoint must also lie within the Vector bounds.

This is an example of the second class of errors: missing postconditions. In this case the issue here is a deficiency of Liquid Haskell's built-in Prelude. Without going into too many details, the built-in refinement for the div function uses integer division, but for some reason type-level division does not provide sufficient information for Liquid Haskell to deduce that the midpoint lies between lo and hi. I'm actually still in the process of narrowing downthe precise problem before reporting this on the Liquid Haskell issue tracker, so I may be interpreting this problem incorrectly.

Edit: Rhanjit Jhala explained that you can fix this by supplying the --real flag or adding this pragma to the top of the file:

{-@ LIQUID "--real" @-}

import Data.Vector as Vector

...

Once you supply this flag then the postcondition for div correctly satisfies the type-checker. The previous version of this post used an assume to enforce the relevant postcondition, which is no longer necessary with this flag.

Termination checking

We're now down to two error messages, the first of which is:

 binarysearch.hs:31:23-24: Error: Liquid Type Mismatch
   Inferred type
     VV : {VV : Int | VV == lo}
  
   not a subtype of Required type
     VV : {VV : Int | 0 <= VV && VV < vlen v && VV < lo && VV >= 0}
  
   In Context
     VV : {VV : Int | VV == lo}
     v  : {v : (Vector a) | 0 <= vlen v}
     lo : {lo : Int | 0 <= lo && lo < vlen v}

This one took me an hour to figure out and the error message originates from Liquid Haskell's termination checker! I only figured this error out because I already knew that Liquid Haskell had a built-in termination checker and even then I had to first minimize the code example before I fully understood the problem.

Liquid Haskell supports termination checking by default, which is actually pretty awesome despite the confusing error. Termination checking means that Liquid Haskell verifies that our code never endlessly loops. Or in other words, Liquid Haskell transforms Haskell into a "total" programming language (i.e. a programming language where computation always halts). You can also disable the termination checker completely or disable the checker for selected functions if you find the check too restrictive.

The termination checker proves termination by looking for some sort of "well-founded metric" that shows that the function will eventually terminate. This "well-founded metric" is usually an Int that decreases each time the function recurses and the recursion halts when the Int reaches 0.

By default, Liquid Haskell guesses and tries to use the first Int argument to the recursive function as the well-founded metric, which was the lo argument in our example. You get an error message like the above when Liquid Haskell guesses wrong. In our case, lo is not a suitable "well-founded" metric because lo does not decrease every time the function recurses. Quite the opposite: lo either stays the same or increases.

loop x v lo hi = do
    let mid = (lo + hi) `div` 2
    if x < v ! mid
    then do
        let hi' = mid - 1
        if lo <= hi'
        then loop x v lo hi'  -- `lo` stays the same
        else Nothing
    else if v ! mid < x
    then do
        let lo' = mid + 1
        if lo' <= hi
        then loop x v lo' hi  -- `lo` increases
        else Nothing
    else Just mid

The hi argument is also not a suitable well-founded metric. hi never increases, but hi does not necessarily decrease either:

loop x v lo hi = do
    let mid = (lo + hi) `div` 2
    if x < v ! mid
    then do
        let hi' = mid - 1
        if lo <= hi'
        then loop x v lo hi'  -- `hi` decreases
        else Nothing
    else if v ! mid < x
    then do
        let lo' = mid + 1
        if lo' <= hi
        then loop x v lo' hi  -- `hi` stays the same
        else Nothing
    else Just mid

However, we are not limited to using arguments as well-founded metrics. We can create our own custom metric that is a composite of the given arguments. In this case we do know that hi - lo always decreases on every iteration and we can instruct Liquid Haskell to use that as the well-founded metric using the following syntax:

{-@
loop
    :: Ord a
    => x  : a
    -> v  : Vector a
    -> lo : { lo : Int | 0  <= lo && lo < vlen v }
    -> hi : { hi : Int | lo <= hi && hi < vlen v }
    -> Maybe Int
    / [hi - lo]  -- The well-founded metric
@-}

SAFE

Once we add the well-founded metric the type-checker accepts our program!!!

stack exec liquid binarysearch.hs 
LiquidHaskell Copyright 2009-15 Regents of the University of California. All Rights Reserved.


**** DONE:  Extracted Core using GHC *******************************************
 

**** DONE:  generateConstraints ************************************************
 

**** START: fixpoint ***********************************************************
 


========================================================
© Copyright 2009 Regents of the University of California.
All Rights Reserved.
========================================================
........................../
SAT

**** DONE:  fixpoint ***********************************************************
 

**** DONE:  solve **************************************************************
 

**** DONE:  annotate ***********************************************************
 

**** SAFE **********************************************************************

That's it! We're done. Liquid Haskell analyzed the flow of our program and statically verified two properties about our program:

  • Our program will never fail from an out-of-bounds index error
  • Our program will never loop endlessly

The only assistance we had to provide was to document preconditions and postconditions in the types and the type-checker automated the rest of the proof.

This means that we can safely disable bounds checking by replacing (!) with Vector.unsafeIndex. Compile-time memory safety already rocks, but getting a free speed improvement is icing on the cake.

Edit: /u/f2u on /r/haskell points out that there is still a bug: the code can fail due to integer overflow.

You can further eliminate the possibility of integer overflow through the use of refinement types, but Liquid Haskell does not provide these extra refinements to protect against overflow by default. Instead you must opt-in to them by adding your own refinements.

The solution to prevent overflow is to replace this:

(lo + hi) `div` 2

... with this:

lo + (hi - lo) `div` 2

Exercise

If you remember correctly, our original code had a postcondition:

{-| Postconditions:

    * If the result is `Just i`, then `0 <= i < length v`
-}
binarySearch :: Ord a => a -> Vector a -> Maybe Int
binarySearch x v = loop x v 0 (Vector.length v - 1)

Try modifying the final code example (See the Appendix) to add this postcondition as a refinement to the type of binarySearch. Hint: you will also need to further refine the type of loop as well.

You will not need to assume any additional information if you do this correctly. The type-checker should be able to verify the newly added postcondition from the existing preconditions.

Conclusion

If you would like to learn more about Liquid Haskell, I highly recommend the official Liquid Haskell tutorial. This tutorial is well-written, comprehensive, and packed with practical examples:

Liquid Haskell is designed in the same spirit as Haskell: check as much as possible with as little input from the programmer. Despite all the bugs I run into, I keep coming back to this tool because of the very high power-to-weight ratio for formal verification.

Also note that the final program is still valid Haskell code so we don't sacrifice any compatibility with the existing Haskell toolchain by using Liquid Haskell.

Appendix

Here is the complete final program that type-checked, for reference:

{-@ LIQUID "--real" @-}

import Data.Vector as Vector

{-| Postconditions:

    * If the result is `Just i`, then `0 <= i < length v`
-}
binarySearch :: Ord a => a -> Vector a -> Maybe Int
binarySearch x v =
    if Vector.length v == 0
    then Nothing
    else loop x v 0 (Vector.length v - 1)

{-@
loop
    :: Ord a
    => x  : a
    -> v  : Vector a
    -> lo : { lo : Int | 0  <= lo && lo < vlen v }
    -> hi : { hi : Int | lo <= hi && hi < vlen v }
    -> Maybe Int
    / [hi - lo]
@-}
loop :: Ord a => a -> Vector a -> Int -> Int -> Maybe Int
loop x v lo hi = do
    let mid = (lo + hi) `div` 2
    if x < v ! mid
    then do
        let hi' = mid - 1
        if lo <= hi'
        then loop x v lo hi'
        else Nothing
    else if v ! mid < x
    then do
        let lo' = mid + 1
        if lo' <= hi
        then loop x v lo' hi
        else Nothing
    else Just mid

Wednesday, December 9, 2015

How to contribute to the Haskell ecosystem

I wanted to share a few quick ways that beginning Haskell programmers can contribute to the Haskell ecosystem. I selected these tasks according to a few criteria:

  • They are fun! These tasks showcase enjoyable tricks
  • They are easy! They straightforwardly apply existing libraries
  • They are useful! You can probably find something relevant to your project

For each task I'll give a brief end-to-end example of what a contribution might look like and link to relevant educational resources.

This post only assumes that you have the stack build tool installed, which you can get from haskellstack.com. This tool takes care of the rest of the Haskell toolchain for you so you don't need to install anything else.

Contribution #1: Write a parser for a new file format

Writing parsers in Haskell is just about the slickest thing imaginable. For example, suppose that we want to parse the PPM "plain" file format, which is specified like this [Source]:

Each PPM image consists of the following:

  1. A "magic number" for identifying the file type. A ppm image's magic number is the two characters "P3".
  2. Whitespace (blanks, TABs, CRs, LFs).
  3. A width, formatted as ASCII characters in decimal.
  4. Whitespace.
  5. A height, again in ASCII decimal.
  6. Whitespace.
  7. The maximum color value (Maxval), again in ASCII decimal. Must be less than 65536 and more than zero.
  8. A single whitespace character (usually a newline).
  9. A raster of Height rows, in order from top to bottom. Each row consists of Width pixels, in order from left to right. Each pixel is a triplet of red, green, and blue samples, in that order. Each sample is represented as an ASCII decimal number.

The equivalent Haskell parser reads almost exactly like the specification:

{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (guard)
import Data.Attoparsec.Text

data PPM = PPM
    { width             :: Int
    , height            :: Int
    , maximumColorValue :: Int
    , image             :: [[RGB]]
    } deriving (Show)

data RGB = RGB
    { red   :: Int
    , green :: Int
    , blue  :: Int
    } deriving (Show)

ppm3 :: Parser PPM
ppm6 = do
    "P3"
    skipMany1 space
    w <- decimal
    skipMany1 space
    h <- decimal
    skipMany1 space
    maxVal <- decimal
    guard (maxVal < 65536)
    space
    let sample = do
            lo <- decimal
            skipMany1 space
            return lo
    let pixel = do
            r <- sample
            g <- sample
            b <- sample
            return (RGB r g b)

    rows <- count h (count w pixel)
    return (PPM w h maxVal rows)

We can try to test our parser out on the following example file:

$ cat example.ppm
P6
4 4
255
0  0  0   100 0  0       0  0  0    255   0 255
0  0  0    0 255 175     0  0  0     0    0  0
0  0  0    0  0  0       0 15 175    0    0  0
255 0 255  0  0  0       0  0  0    255  255 255

We don't even have to compile a program to test our code. We can load our code into the Haskell REPL for quick feedback on whether or not our code works:

$ stack ghci attoparsec --resolver=lts-3.14
...
Prelude> :load ppm.hs
[1 of 1] Compiling Main             ( ppm.hs, interpreted )
Ok, modules loaded: Main.
*Main> txt <- Data.Text.IO.readFile "example.ppm"
*Main> parseOnly ppm3 txt
Right (PPM {width = 4, height = 4, maximumColorValue = 255, 
image = [[RGB {red = 0, green = 0, blue = 0},RGB {red = 100,
 green = 0, blue = 0},RGB {red = 0, green = 0, blue = 0},RGB
 {red = 255, green = 0, blue = 255}],[RGB {red = 0, green = 
0, blue = 0},RGB {red = 0, green = 255, blue = 175},RGB {red
 = 0, green = 0, blue = 0},RGB {red = 0, green = 0, blue = 0
}],[RGB {red = 0, green = 0, blue = 0},RGB {red = 0, green =
 0, blue = 0},RGB {red = 0, green = 15, blue = 175},RGB {red
 = 0, green = 0, blue = 0}],[RGB {red = 255, green = 0, blue
 = 255},RGB {red = 0, green = 0, blue = 0},RGB {red = 0, gre
en = 0, blue = 0},RGB {red = 255, green = 255, blue = 255}]]
})

Works like a charm!

You can very quickly get your hands dirty with Haskell by writing a parser that converts a file format you know and love into a more structured data type.

To learn more about parser combinators in Haskell, I highly recommend this "functional pearl":

... as well as this attoparsec tutorial:

To see a "long form" example of attoparsec, check out this HTTP request parser written using attoparsec:

I use "long form" in quotes because the entire code is around 60 lines long.

Contribution #2: Write a useful command-line tool

Haskell's turtle library makes it very easy to write polished command-line tools in a tiny amount of code. For example, suppose that I want to build a simple comand-line tool for managing a TODO list stored in a todo.txt file. First I just need to provide a subroutine for displaying the current list:

{-# LANGUAGE OverloadedStrings #-}

import Turtle

todoFile = "TODO.txt"

todoItem = d%": "%s

display :: IO ()
display = sh (do
    (n, line) <- nl (input todoFile)
    echo (format todoItem n line) )

... a subroutine for adding an item to the list:

add :: Text -> IO ()
add txt = runManaged (do
    tempfile <- mktempfile "/tmp" "todo"
    output tempfile (input todoFile <|> pure txt)
    mv tempfile todoFile )

... and a subroutine for removing an item from the list:

remove :: Int -> IO ()
remove m = runManaged (do
    tempfile <- mktempfile "/tmp" "todo"
    output tempfile (do
        (n, line) <- nl (input todoFile)
        guard (m /= n)
        return line )
    mv tempfile todoFile )

... then I can just wrap them in a command line API. I create a command line parser that runs display by default if the command line is empty:

parseDisplay :: Parser (IO ())
parseDisplay = pure display

... then a command line parser for the add subcommand:

parseAdd :: Parser (IO ())
parseAdd =
    fmap add
        (subcommand "add" "Add a TODO item"
            (argText "item" "The item to add to the TODO list") )

... and a command line parser for the remove subcommand:

parseRemove :: Parser (IO ())
parseRemove =
    fmap remove
        (subcommand "rm" "Remove a TODO item"
            (argInt "index" "The numeric index of the TODO item to remove") )

Finally, I combine them into a single composite parser for all three subcommands:

parseCommand :: Parser (IO ())
parseCommand = parseDisplay <|> parseAdd <|> parseRemove

... and run the parser:

main = do
    command <- options "A TODO list manager" parseCommand
    exists  <- testfile todoFile
    when (not exists) (touch todoFile)
    command

... and I'm done! That's the full program:

{-# LANGUAGE OverloadedStrings #-}

import Turtle

todoFile = "TODO.txt"

todoItem = d%": "%s

display :: IO ()
display = sh (do
    (n, line) <- nl (input todoFile)
    echo (format todoItem n line) )

add :: Text -> IO ()
add txt = runManaged (do
    tempfile <- mktempfile "/tmp" "todo"
    output tempfile (input todoFile <|> pure txt)
    mv tempfile todoFile )

remove :: Int -> IO ()
remove m = runManaged (do
    tempfile <- mktempfile "/tmp" "todo"
    output tempfile (do
        (n, line) <- nl (input todoFile)
        guard (m /= n)
        return line )
    mv tempfile todoFile )

parseDisplay :: Parser (IO ())
parseDisplay = pure display

parseAdd :: Parser (IO ())
parseAdd =
    fmap add
        (subcommand "add" "Add a TODO item"
            (argText "item" "The item to add to the TODO list") )

parseRemove :: Parser (IO ())
parseRemove =
    fmap remove
        (subcommand "rm" "Remove a TODO item"
            (argInt "index" "The numeric index of the TODO item to remove") )

parseCommand :: Parser (IO ())
parseCommand = parseDisplay <|> parseAdd <|> parseRemove

main = do
    command <- options "A TODO list manager" parseCommand
    exists <- testfile todoFile
    when (not exists) (touch todoFile)
    command

We can compile that program into an native binary on any platform (i.e. Windows, OS X, or Linux) with a fast startup time:

$ stack build turtle --resolver=lts-3.14
$ stack ghc --resolver=lts-3.14 -- -O2 todo.hs

... and verify that the program works:

$ ./todo add "Brush my teeth"
$ ./todo add "Shampoo my hamster"
$ ./todo
0: Brush my teeth
1: Shampoo my hamster
$ ./todo rm 0
$ ./todo
0: Shampoo my hamster

The program also auto-generates the usage and help information:

$ ./todo --help
A TODO list manager

Usage: todo ([add] | [rm])

Available options:
  -h,--help                Show this help text

Available commands:
  add
  rm
$ ./todo add
Usage: todo add ITEM
$ ./todo rm
Usage: todo rm INDEX

Amazingly, you can delete all the type signatures from the above program and the program will still compile. Try it! Haskell's type inference and fast type-checking algorithm makes it feel very much like a scripting language. The combination of type inference, fast startup time, and polished command line parsing makes Haskell an excellent choice for writing command-line utilities.

You can learn more about scripting in Haskell by reading the turtle tutorial, written for people who have no prior background in Haskell programming:

Contribution #3: Client bindings to a web API

Haskell's servant library lets you write very clean and satisfying bindings to a web API. For example, suppose that I want to define a Haskell client to to the JSONPlaceholder test API. We'll use two example endpoints that the API provides.

A GET request against the /posts endpoint returns a list of fake posts:

[
  {
    "userId": 1,
    "id": 1,
    "title": "sunt aut facere repellat ..."
    "body": "quia et suscipit\nsuscipit ..."
  },
  {
    "userId": 1,
    "id": 2,
    "title": "qui est esse",
    "body": "est rerum tempore vitae\nsequi ..."
  },
...

... and a POST request against the same endpoint accepts a list of posts and returns them back as the response.

To write a client binding to this API, we just need to define a record representing APost:

data APost = APost
    { userId :: Int
    , id     :: Int
    , title  :: Text
    , body   :: Text
    } deriving (Show, Generic, FromJSON, ToJSON)

The last line instructs the Haskell compiler to auto-derive conversion functions between APost and JSON.

Now we just encode the REST API as a type:

-- We can `GET` a list of posts from the `/posts` endpoint
type GetPosts =                            "posts" :> Get  '[JSON] [APost]

-- We can `POST` a list of posts to the `/posts` endpoint
-- using the request body and get a list of posts back as
-- the response
type PutPosts = ReqBody '[JSON] [APost] :> "posts" :> Post '[JSON] [APost]

type API = GetPosts :<|> PutPosts

... and then the compiler will "automagically" generate API bindings:

getPosts :<|> putPosts =
    client (Proxy :: Proxy API) (BaseUrl Http "jsonplaceholder.typicode.com" 80)

Now anybody can use our code to GET or POST lists of posts. We can also quickly test out our code within the Haskell REPL to verify that everything works:

$ stack ghci servant-server servant-client --resolver=lts-3.14
ghci> :load client.hs
[1 of 1] Compiling Main             ( httpbin.hs, interpreted )
Ok, modules loaded: Main.
*Main> import Control.Monad.Trans.Either as Either
*Main Either> -- Perform a `GET` request against the `/posts` endpoint
*Main Either> runEitherT getPosts
Right [APost {userId = 1, id = 1, title = "sunt aut facere ...
*Main Either> -- Perform a `POST` request against the `/posts` endpoint
*Main Either> runEitherT (putPosts [APost 1 1 "foo" "bar"])
Right [APost {userId = 1, id = 1, title = "foo", body = "bar"}]

Here's the full code with all the extensions and imports that enable this magic:

{-# LANGUAGE DataKinds         #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE DeriveAnyClass    #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeOperators     #-}

import Data.Aeson (FromJSON, ToJSON)
import Data.Text (Text)
import GHC.Generics (Generic)
import Servant
import Servant.Client

data APost = APost
    { userId :: Int
    , id     :: Int
    , title  :: Text
    , body   :: Text
    } deriving (Show, Generic, FromJSON, ToJSON)

type GetPosts =                            "posts" :> Get  '[JSON] [APost]
type PutPosts = ReqBody '[JSON] [APost] :> "posts" :> Post '[JSON] [APost]
type API      = GetPosts :<|> PutPosts

getPosts :<|> putPosts =
    client (Proxy :: Proxy API) (BaseUrl Http "jsonplaceholder.typicode.com" 80)

To learn more about how this works, check out the servant tutorial here:

Note that servant is both a client and server library so everything you learn about auto-generating client side bindings can be reused to auto-generate a server, too!

To see a more long-form example of bindings to the Google Translate API, check out this code:

Conclusion

Suppose that you write up some useful code and you wonder: "What's next? How do I make this code available to others?". You can learn more by reading the stack user guide which contains complete step-by-step instructions for authoring a new Haskell project, including beginning from a pre-existing project template: