Friday, November 1, 2013

Test stream programming using Haskell's `QuickCheck`

pipes is a stream programming library built on top of a foundation of basic category theory. The core of the library consists of a set of five categories that all intersect in a single streaming data type and the library's contract is the set of associated category laws.

For example, one such category is the "respond category", which pipes uses to implement for loops and ListT. The two key operations are (/>/), which is the category composition operator, and respond, which is the identity. These must satisfy the following category laws:

respond />/ f = f                  -- Left Identity

f />/ respond = f                  -- Right Identity

(f />/ g) />/ h = f />/ (g />/ h)  -- Associativity

Previously, I described how I manually proved the category laws using equational reasoning, which elicited a strong response from readers that I should also verify the laws empirically using Haskell's QuickCheck library. After all, Haskell's QuickCheck library shines at automated property-based testing. Why not just fire up QuickCheck and test something like:

> quickCheck $ \f -> f />/ respond == f

However, this leads to several problems:

  • You can't compare pipes for Equality
  • QuickCheck can't Show pipes when it discovers a counter-example
  • You can't generate Arbitrary pipes

The latter is the most challenging problem of the three to solve: how do we generate random pipes to test? This has to be done in such a way that it exercises the system and efficiently discovers corner cases.

Randomizing pipes

I decided to try encoding a random pipe as random sequence of operations (specifically Kleisli arrows). This actually works out quite well, because we already have two natural operations built into the core bidirectional API: request and respond. Both of them allow information to pass through twice, as illustrated by the following ASCII diagrams:

request :: Monad m => a' -> Proxy a' a y' y m a

          a'
          |
     +----|----+
     |    |    |
 a' <=====/    |
     |         |
 a  ======\    |
     |    |    |
     +----|----+
          v
          a


respond :: Monad m => a  -> Proxy x' x a' a m a'

          a
          |
     +----|----+
     |    |    |
     |    \ /==== a'
     |     X   |
     |    / \===> a
     |    |    |
     +----|----+
          v 
          a'

If we generate a random sequence of them them we just connect them head to toe. For simplicity, I will just assume that all inputs and outputs are of type Int:

pRandom1 = respond >=> request >=> respond

         Int
          |
     +----|----+
     |    |    |
     |    \ /==== Int
     |     X   |
     |    / \===> Int
     |    |    |
     +----|----+
          v 
         Int
          |
     +----|----+
     |    |    |
Int <=====/    |
     |         |
Int ======\    |
     |    |    |
     +----|----+
          v
         Int
          |
     +----|----+
     |    |    |
     |    \ /==== Int
     |     X   |
     |    / \===> Int
     |    |    |
     +----|----+
          v 
         Int

When we compose proxies using something like (>+>) (the bidirectional generalization of (>->)), we conceptually place these random chains side by side and match inputs with outputs. For example, if we generate the following two random sequences of request and respond:

pRandom1 = respond >=> request >=> respond
pRandom2 = request >=> request

... then this generates the following flow of information, beginning from the top-right Int.

pRandom1 >+> pRandom2:

                                  Int
                                   |
                              +----|----+
                              |    |    |
         Int<======================/    |
          |                   |         |
     +----|----+            /======\    |
     |    |    |           /  |    |    |
     |    \ /==== Int <=\ /   +----|----+
     |     X   |         X         |
     |    / \===> Int ==/ \        v
     |    |    |          |       Int
     +----|----+          |        |
          v               \   +----|----+
         Int               \  |    |    |
          |                 \======/    |
     +----|----+              |         |
     |    |    |            /======\    |
... <=====/    |           /  |    |    |
     |         |          /   +----|----+
... ======\    |          |        v
     |    |    |          |       Int
     +----|----+          |
          v               |
         Int              |
          |               |
     +----|----+          |
     |    |    |          |
     |    \ /==== Int     /
     |     X   |         /
     |    / \===> Int ==/
     |    |    |
     +----|----+
          v 
         Int

This shows how you can easily generate complex flows of information to exercise the API with just a few simple permutations of request and respond.

However, notice how the Int flows through the pipeline without interruption until the pipeline terminates. This means that unless we do something else we will always return the same Int we began with.

So we add two additional actions to our random repertoire which will provide diagnostic information about which paths we took. The first one is inc, which increments the Int flowing through it:

inc :: (Monad m) => Int -> m Int
inc n = return (n + 1)

The second is log, which stores the current value of the Int using a Writer [] base monad and reforwards the Int further along:

log :: Int -> Proxy a' a b' b (Writer [Int]) Int
log n = do
    lift (tell [n])
    return n

Now we can easily generate random Proxys just by assembling random chains of these four actions:

pRandom = log >=> inc >=> request >=> inc >=> inc >=> ...

We can use this same trick to generate arbitrary Servers and Clients (i.e. one-sided Proxys) by just selectively removing either request or respond from the list of permissible actions.

Once we have random Proxys, Servers, and Clients, we can test for equality by testing the values that each side returns and logs when given an initial value of 0:

p1 === p2 = run p1 == run p2
  where
    run p = runWriter (runEffect (p 0))

infix 4 ===

This comparison is pure because the Writer [] base monad is pure, so we can pass it as suitable property that QuickCheck can test. Well, almost...

We also need to be able to Show the randomized values that we selected so that QuickCheck can print any counter-examples it discovers. The solution, though, is pretty simple. We can use an intermediate representation that is just an enumeration. This just stores placeholders for each action in our chain, and these placeholders are Showable:

-- Place holder for a single `Proxy` action
data ProxyStep
    = ProxyRequest
    | ProxyRespond
    | ProxyLog
    | ProxyInc
    deriving (Enum, Bounded)

instance Arbitrary ProxyStep where
    arbitrary = arbitraryBoundedEnum
    shrink _  = []

instance Show ProxyStep where
    show x = case x of
        ProxyRequest -> "request"
        ProxyRespond -> "respond"
        ProxyLog     -> "log"
        ProxyInc     -> "inc"

-- Place holder for a single `Client` action
data ClientStep
    = ClientRequest
    | ClientLog
    | ClientInc
    deriving (Enum, Bounded)

instance Arbitrary ClientStep where
    arbitrary = arbitraryBoundedEnum
    shrink _  = []

instance Show ClientStep where
    show x = case x of
        ClientRequest -> "request"
        ClientLog     -> "log"
        ClientInc     -> "inc"

-- Place holder for a single `Server` action
data ServerStep
    = ServerRespond
    | ServerLog
    | ServerInc
    deriving (Enum, Bounded)

instance Arbitrary ServerStep where
    arbitrary = arbitraryBoundedEnum
    shrink _  = []

instance Show ServerStep where
    show x = case x of
        ServerRespond -> "respond"
        ServerLog     -> "log"
        ServerInc     -> "inc"

Then we tell QuickCheck to generate random lists of these actions which will form our complete Proxys, Clients, or Servers:

correct :: String -> String
correct str = case str of
    [] -> "return"
    _  -> str

-- Place holder for a `Proxy`
newtype AProxy = AProxy { unAProxy :: [ProxyStep] }

instance Arbitrary AProxy where
    arbitrary = fmap AProxy arbitrary
    shrink = map AProxy . shrink . unAProxy

instance Show AProxy where
    show = correct . intercalate " >=> " . map show . unAProxy

-- Place holder for a `Client`
newtype AClient = AClient { unAClient :: [ClientStep] }

instance Arbitrary AClient where
    arbitrary = fmap AClient arbitrary
    shrink = map AClient . shrink . unAClient

instance Show AClient where
    show = correct . intercalate " >=> " . map show . unAClient

-- Place holder for a `Server`
newtype AServer = AServer { unAServer :: [ServerStep] }

instance Arbitrary AServer where
    arbitrary = fmap AServer arbitrary
    shrink = map AServer . shrink . unAServer

instance Show AServer where
    show = correct . intercalate " >=> " . map show . unAServer

Once QuickCheck generates these randomized lists of actions we can convert them to honest-to-god Proxys/Servers/Clients for testing purposes using the following conversion functions:

aProxy
    :: AProxy -> Int -> Proxy Int Int Int Int (Writer [Int]) Int
aProxy = foldr (>=>) return . map f . unAProxy
  where
    f x = case x of
        ProxyRequest -> request
        ProxyRespond -> respond
        ProxyLog     -> log
        ProxyInc     -> inc

aClient :: AClient -> Int -> Client Int Int (Writer [Int]) Int
aClient = foldr (>=>) return . map f . unAClient
  where
    f x = case x of
        ClientRequest -> request
        ClientLog     -> log
        ClientInc     -> inc

aServer :: AServer -> Int -> Server Int Int (Writer [Int]) Int
aServer = foldr (>=>) return . map f . unAServer
  where
    f x = case x of
        ServerRespond -> respond
        ServerLog     -> log
        ServerInc     -> inc

Test-driving pipes

Combined with the previous (===) function for testing equality, I can now test high-level properties like associativity:

main = quickCheck $ \p1' p2' p3' ->
    let p1 = aServer p1'
        p2 = aProxy  p2'
        p3 = aClient p3'
    in  p1 >+> (p2 >+> p3) === (p1 >+> p2) >+> p3

QuickCheck then generates 100 random test cases and verifies that all of them obey the associativity law:

>>> main
++ OK, passed 100 tests.

However, this is still not enough. Perhaps my randomization scheme is simply not exercising corner cases sufficiently well. To really convince myself that I have a good randomization scheme I must try a few negative controls to see how effectively QuickCheck uncovers property violations.

For example, let's suppose that some enterprising young NSA employee were to try to commit a modification to the identity pipe pull' to try to log the second value flowing upstream. We could set up a test case to warn us if the modified pull' function failed to obey the identity law:

main = quickCheck $ \p1' p2' ->
    let p1    = aServer p1'

        -- The maliciously modified `pull` function
        pull' = request >=> respond >=> log >=> pull'

        p2    = aClient p2'
    in  p1 >+> (pull' >+> p2) === p1 >+> p2

QuickCheck would then snoop this out in a hurry and catch the law violation.

*** Failed! Falsifiable (after 12 tests and 6 shrinks):    
respond
request >=> request

Not only does QuickCheck detect violations, but it also goes out of its way to minimize the violation to the minimum reproducing test case. In this case, the way you read the QuickCheck output is that the minimal code necessary to trigger the violation is when:

p1 = respond
p2 = request >=> request

In other words, QuickCheck detects a spurious log on the left-hand side of the equation if p2 requests two values and p1 responds with at least one value. Notice that if p1 did not respond with at least one value then the left pipeline would terminate before p2's second request and avoid triggering the log statement.

QuickCheck can do this kind of minimization because of purity. Since our test case is pure, QuickCheck can safely run it repeatedly as it tries to shrink the counter-example, without having to worry that repeated runs will interfere with each other because of side effects or statefulness.

Here's another example, where we accidentally wrote pull wrong and inserted one extra request too many:

main = quickCheck $ \p1' p3' ->
    let p1    = aServer p1'
        pull' = request >=> respond >=> request >=> pull
        p3    = aClient p3'
    in  p1 >+> (pull' >+> p3) === p1 >+> p3

QuickCheck immediately pulls out this clever counter-example:

*** Failed! Falsifiable (after 15 tests and 8 shrinks):    
respond >=> respond
request >=> request >=> inc

This instructs p1 and p2 to exchange information twice. This triggers pull' to accidentally request one value too many after the second exchange and terminate early before p2 can call inc.

Examples like these give me confidence that permutations on these four actions suffice to build most useful counter-examples. I could probably even narrow it down to three commands by eliminating log, but for now I will keep it.

Conclusion

I believe this is the first example of a useful Arbitrary instance for randomizing a stream programming data type. This allows pipes to test more powerful properties than most stream programming libraries, which would normally settle for randomizing input to the system instead of randomizing the flow of control.

I want to thank Csernik Flaviu Andrei who took the time to write up all the pipes laws into a QuickCheck suite integrated with cabal test. Thanks to him you can now have even greater confidence in the correctness of the pipes library.

The next step would be to machine-check the pipes laws using Agda, which would provide the greatest assurance of correctness.