Monday, July 4, 2016

Auto-generate service API endpoints from records

Haskell has pretty cool support for code generation from data type definitions using GHC generics. So I thought: "why not generate a service from a data type?".

The basic idea is pretty simple. Given a data type definition like this:

data Command
    = Create { filepath :: FilePath, contents :: String }
    | Delete { filepath :: FilePath }

... we'll auto-generate two API endpoints:

  • /create?filepath=:string&contents=:string
  • /delete?filepath=:string

Each endpoint accepts query parameters matching the fields for their respective constructors:

$ curl 'localhost:8080/create?filepath=test.txt&contents=ABC'
"File created"
$ cat test.txt
ABC
$ curl 'localhost:8080/delete?filepath=test.txt'
"File deleted"
$ cat test.txt
cat: test.txt: No such file or directory

The complete code to build the server looks like this:

{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric  #-}

import Server.Generic
import System.Directory (removeFile)

data Command
    = Create { filepath :: FilePath, contents :: String }
    | Delete { filepath :: FilePath }
    deriving (Generic, ParseRecord)

handler :: Command -> IO String
handler (Create file text) = do
    writeFile file text
    return "File created"
handler (Delete file) = do
    removeFile file
    return "File deleted"

main :: IO ()
main = serveJSON 8080 handler

You can test it yourself by running:

$ stack build server-generic
$ stack runghc AboveExample.hs

... and then in a separate terminal you can hit each endpoint with curl as illustrated above.

GHC Generics

The Haskell magic is in this one line of code:

    deriving (Generic, ParseRecord)

This auto-generates code that tells the server how to marshal the route and query parameters into our Command data type. All we have to do is supply a handler that pattern matches on the incoming Command to decide what to do:

handler :: Command -> IO String
handler (Create file text) = do
    writeFile file text
    return "File created"
handler (Delete file) = do
    removeFile file
    return "File deleted"

You can read that as saying:

  • "If a client hits the /create endpoint, create the specified file"
  • "If a client hits the /delete endpoint, delete the specified file"

As an exercise, you can try modifying the handler to respond with the name of the file that was created or deleted.

However, you're not limited to query parameters. You can also parse data from path tokens by just omitting field labels from the data type:

{-# LANGUAGE DeriveGeneric #-}

import Server.Generic

data Command
    = Add      Double Double
    | Multiply Double Double
    deriving (Generic)

instance ParseRecord Command

handler :: Command -> IO Double
handler (Add      x y) = return (x + y)
handler (Multiply x y) = return (x * y)

main :: IO ()
main = serveJSON 8080 handler

If you run the above code, you get a server that has two endpoints:

  • /add/:double/:double
  • /multiply/:double/:double

You can run the server and test that they work:

$ curl 'localhost:8080/add/2/3'
5
$ curl 'localhost:8080/multiply/2/3'
6

This library also intelligently handles optional and repeated fields in data type definitions. For example, suppose we serve this data type::

{-# LANGUAGE DeriveGeneric  #-}
{-# LANGUAGE DeriveAnyClass #-}

import Server.Generic

data Example = Example
    { list     :: [Int]
    , optional :: Maybe   Int
    , first    :: First   Int
    , last     :: Last    Int
    } deriving (Generic, ParseRecord, ToJSON)

handler :: Example -> IO Example
handler = return  -- Serve decoded value back to client as JSON

main :: IO ()
main = serveJSON 8080 handler

... then the server will echo back the decoded type as JSON, correctly handling absent or repeated fields:

$ curl 'localhost:8080/example'
{"list":[],"first":null,"last":null,"optional":null}
$ curl 'localhost:8080/example?optional=1&list=1&list=2&first=1&first=2&last=1&last=2'
{"list":[1,2],"first":1,"last":2,"optional":1}

Also, these repeated and optional annotations work for path components, too, in case you were wondering:

{-# LANGUAGE DeriveGeneric  #-}
{-# LANGUAGE DeriveAnyClass #-}

import Server.Generic

data Example = Example [Int] (Maybe Text)
    deriving (Generic, ParseRecord, ToJSON)

handler :: Example -> IO Example
handler = return

main :: IO ()
main = serveJSON 8080 handler

The above server does "the right thing" and doesn't need to be told where the Ints end and the Text begins:

$ curl 'localhost:8080/example'
[[],null]
$ curl 'localhost:8080/example/1/2/foo'
[[1,2],"foo"]
$ curl 'localhost:8080/example/1/2/3'
[[1,2,3],null]
$ curl 'localhost:8080/example/foo'
[[],"foo"]

The server uses backtracking when parsing the route so the server knows when the Ints end and the Text begins.

Types

The whole thing is strongly typed, which means several things in the context of service programming.

For example, if you define a data type that expects an Int field, then by golly your handler will get an Int field or the server will automatically reject the request for you. You don't have to worry about checking that the field is present nor do you need to validate that the parameter decodes to an Int correctly. If you want the parameter to be optional then you need to make that explicit by marking the field as type Maybe Int.

You also don't have to handle fields that belong to other endpoints. Each endpoint only gets exactly the fields it requested; no more, no less. If a given endpoint gets the wrong set of path tokens or query parameters then the server rejects the request for you.

This is also strongly typed in the sense that more logic is pushed into the type and less logic goes in the handler. If you want just the first or last occurrence of a query parameter, you just annotate the type with First or Last, respectively. The more logic you push into declarative type-level programming the more you distill your handler to focus on business logic.

Caveats

I wrote this library to provide a quick an easy way to spin up Haskell web services but the library could still use some improvement. I'm not really a web developer so I only kind of know what I'm doing and could use help from people more knowledgeable than me.

The most notable deficiency is that the library does not take care to serve proper HTTP status codes for different types of errors. Every failed request returns a 404 status code.

Also, if the route is malformed the error message is a completely unhelpful "404 Not Found" error message that doesn't indicate how to fix the error.

Another blatant deficiency is that the server completely ignores the request method. I wasn't sure how to design this to work within the framework of data type generic programming.

If you have ideas about how to improve things I would greatly welcome any contributions.

Conclusions

People familiar with Haskell will recognize that this library resembles the servant library in some respects. The high-level difference is that this is a subset of servant that is much simpler but also significantly less featureful. For example, servant can also generate client-side bindings and Swagger resource declarations and servant also permits a much greater degree of customization.

This library focuses primarily on simple quick-and-dirty services; for anything more polished and production-ready you will probably want to try other Haskell service libraries. I just wanted to make it as easy as possible for people to get started with back-end development in Haskell and also show off how cool and powerful GHC generics can be.

If you would like to learn more about this library you can read the tutorial or if you would like to use the library you can obtain the code from Hackage or Github.

2 comments:

  1. "Every request returns a 404 status code"

    Did you mean 200 OK? 404s everywhere seems bad.

    However, returning 404 on a bad path with no further help seems ok.

    Some kind of ability to fail with a specific error code is useful. Servant uses EitherT but you can also do it with an exception; I prefer the EitherT approach TBH.

    For quick and dirty things, I'm not sure stuff like the request method (and, relatedly, Content-type negotiation) is all that important. Do you really need to be able to GET and POST to the same path? One possibility could be to do something like defining a singleton type

    data Post = Post

    and then include that as a field of the constructors that represent paths that should be POSTed to.

    ReplyDelete
    Replies
    1. Whoops! That should read "Every *failed* request ...". I fixed it. Thanks for pointing that out.

      Yeah, I like the idea of possibly adding a methods field to constructors to determine the request methods they accept

      Delete