Saturday, June 10, 2017

Translating a C++ parser to Haskell

Recently I translated Nix's derivation parser to Haskell and I thought this would make an instructive example of how C++ idioms map to Haskell idioms. This post targets people who understand Haskell's basic syntax but perhaps have difficulty translating imperative style to a functional style. I will also throw in some benchmarks at the end comparing Haskell performance to C++.

Nix derivations

Nix uses "derivations" to store instructions for how to build something. The corresponding C++ type is called Derivation and is defined in Nix's source:

struct Derivation : BasicDerivation
{
    DerivationInputs inputDrvs; /* inputs that are sub-derivations */

    /* Print a derivation. */
    std::string unparse() const;
};

... which in turn references this BasicDerivation type:

struct BasicDerivation
{
    DerivationOutputs outputs; /* keyed on symbolic IDs */
    PathSet inputSrcs; /* inputs that are sources */
    string platform;
    Path builder;
    Strings args;
    StringPairs env;

    virtual ~BasicDerivation() { };

    /* Return the path corresponding to the output identifier `id' in
       the given derivation. */
    Path findOutput(const string & id) const;

    bool willBuildLocally() const;

    bool substitutesAllowed() const;

    bool isBuiltin() const;

    bool canBuildLocally() const;

    /* Return true iff this is a fixed-output derivation. */
    bool isFixedOutput() const;

    /* Return the output paths of a derivation. */
    PathSet outputPaths() const;

};

We can translate the above C++ types to Haskell, even though Haskell is not an object-oriented language.

First, we can translate away the inheritance by either (A) using composition instead of inheritance, like this:

struct Derivation
{
    BasicDerivation basicDrv;

    DerivationInputs inputDrvs; /* inputs that are sub-derivations */

    /* Print a derivation. */
    std::string unparse() const;
};

... or (B) flattening the class hierarchy by combining both classes into an equivalent single-class definition, like this:

struct Derivation
{
    DerivationOutputs outputs; /* keyed on symbolic IDs */
    PathSet inputSrcs; /* inputs that are sources */
    string platform;
    Path builder;
    Strings args;
    StringPairs env;
    DerivationInputs inputDrvs; /* inputs that are sub-derivations */

    virtual ~Derivation() { };

    /* Return the path corresponding to the output identifier `id' in
       the given derivation. */
    Path findOutput(const string & id) const;

    bool willBuildLocally() const;

    bool substitutesAllowed() const;

    bool isBuiltin() const;

    bool canBuildLocally() const;

    /* Return true iff this is a fixed-output derivation. */
    bool isFixedOutput() const;

    /* Return the output paths of a derivation. */
    PathSet outputPaths() const;

    /* Print a derivation. */
    std::string unparse() const;
};

This post will flatten the class hierarchy for simplicity, but in general composition is the more flexible approach for translating inheritance to a functional style.
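
To make approach (A) concrete, here is a minimal sketch of what the composition encoding might look like in Haskell, reusing the type synonyms that we translate later in this post (this is not the encoding the rest of the post uses):

-- Approach (A): composition instead of inheritance.  `BasicDerivation` is a
-- hypothetical record translated from the C++ base class, and `Derivation`
-- embeds it as an ordinary field instead of inheriting from it.
data BasicDerivation = BasicDerivation
    { outputs   :: DerivationOutputs  -- ^ keyed on symbolic IDs
    , inputSrcs :: PathSet            -- ^ inputs that are sources
    , platform  :: String
    , builder   :: Path
    , args      :: Strings
    , env       :: StringPairs
    }

data Derivation = Derivation
    { basicDrv  :: BasicDerivation    -- ^ the embedded "base class"
    , inputDrvs :: DerivationInputs   -- ^ inputs that are sub-derivations
    }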

Second, we separate out all methods into standalone functions that take an object of that class as their first argument:

struct Derivation
{
    DerivationOutputs outputs; /* keyed on symbolic IDs */
    PathSet inputSrcs; /* inputs that are sources */
    string platform;
    Path builder;
    Strings args;
    StringPairs env;
    DerivationInputs inputDrvs; /* inputs that are sub-derivations */

    virtual ~Derivation() { };
};

/* Return the path corresponding to the output identifier `id' in
   the given derivation. */
Path findOutput(Derivation drv, const string & id) const;

bool willBuildLocally(Derivation drv) const;

bool substitutesAllowed(Derivation drv) const;

bool isBuiltin(Derivation drv) const;

bool canBuildLocally(Derivation drv) const;

/* Return true iff this is a fixed-output derivation. */
bool isFixedOutput(Derivation drv) const;

/* Return the output paths of a derivation. */
PathSet outputPaths(Derivation drv) const;

/* Print a derivation. */
std::string unparse(Derivation drv) const;

This is how people encoded object-oriented programming before there was such a thing as object-oriented programming, and this pattern is common in functional languages. The disadvantage is that it leads to an import-heavy programming style.

Now that we've reduced the code to simple data types and functions on those types, we can translate the C++ to Haskell:

data Derivation = Derivation
    { outputs   :: DerivationOutputs
    -- ^ keyed on symbolic IDs
    , inputSrcs :: PathSet
    -- ^ inputs that are sources
    , platform  :: String
    , builder   :: String
    , args      :: Strings
    , env       :: StringPairs
    , inputDrvs :: DerivationInputs
    }

-- | Return the path corresponding to the output identifier `id' in
-- the given derivation.
findOutput :: Derivation -> String -> Path

willBuildLocally :: Derivation -> Bool

substitutesAllowed :: Derivation -> Bool

isBuiltin :: Derivation -> Bool

canBuildLocally :: Derivation -> Bool

-- | Return true iff this is a fixed-output derivation.
isFixedOutput :: Derivation -> Bool

-- | Return the output paths of a derivation.
outputPaths :: Derivation -> PathSet

-- | Print a derivation.
unparse :: Derivation -> String

Since this post is all about parsing, we won't define or use any of these functions, so we'll set them aside for now and stick to the datatype definition:

data Derivation = Derivation
    { outputs   :: DerivationOutputs -- ^ keyed on symbolic IDs
    , inputSrcs :: PathSet           -- ^ inputs that are sources
    , platform  :: String
    , builder   :: String
    , args      :: Strings
    , env       :: StringPairs
    , inputDrvs :: DerivationInputs
    }

This isn't valid Haskell code yet because we haven't defined the other types it references, like DerivationOutputs and PathSet. We'll need to translate their C++ definitions to Haskell, too.

The DerivationOutput class resides in the same file:

struct DerivationOutput
{
    Path path;
    string hashAlgo; /* hash used for expected hash computation */
    string hash; /* expected hash, may be null */
    DerivationOutput()
    {
    }
    DerivationOutput(Path path, string hashAlgo, string hash)
    {
        this->path = path;
        this->hashAlgo = hashAlgo;
        this->hash = hash;
    }
    void parseHashInfo(bool & recursive, Hash & hash) const;
};

When we strip the methods, that translates to Haskell as:

data DerivationOutput = DerivationOutput
    { path     :: Path
    , hashAlgo :: String  -- ^ hash used for expected hash computation
    , hash     :: String  -- ^ expected hash, may be null
    }

All of the other C++ types are typedefs that reside either in the same file or in Nix's types.hh file.

I'll consolidate all the relevant typedefs here:

typedef string                             Path;
typedef set<Path>                          PathSet;
typedef list<string>                       Strings;
typedef set<string>                        StringSet;
typedef std::map<string, string>           StringPairs;
typedef std::map<Path, StringSet>          DerivationInputs;
typedef std::map<string, DerivationOutput> DerivationOutputs;

The Haskell analog of a C++ typedef is a type synonym, and the above C++ typedefs translate to the following type synonyms:

import Data.Map (Map)
import Data.Set (Set)

type Path              = String
type PathSet           = Set Path
type Strings           = [String]  -- [a] is Haskell syntax for "list of `a`s"
type StringSet         = Set String
type StringPairs       = Map String String
type DerivationInputs  = Map Path StringSet
type DerivationOutputs = Map String DerivationOutput

Note that Haskell type synonyms reverse the order of the types compared to C++. The new type that you define goes on the left and the body of the definition goes on the right. Haskell's order makes more sense to me, since I'm used to the same order when defining values like x = 5.

There are a few more changes that I'd like to make before we proceed to the parsing code:

First, Haskell's String type and default list type are inefficient in both time and space, so we will replace them with Text and Vector, respectively, which are more compact and faster:

import Data.Map (Map)
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)

type Path              = Text
type PathSet           = Set Path
type Strings           = Vector Text
type StringSet         = Set Text
type StringPairs       = Map Text Text
type DerivationInputs  = Map Path StringSet
type DerivationOutputs = Map Text DerivationOutput

Second, I prefer to use a separate type for Paths that is not synonymous with Text in order to avoid accidentally conflating the two:

import Filesystem.Path.CurrentOS (FilePath)
import Data.Map (Map)
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)
-- The Prelude `FilePath` is a synonym for `String`
import Prelude hiding (FilePath)

type Path              = FilePath
type PathSet           = Set Path
type Strings           = Vector Text
type StringSet         = Set Text
type StringPairs       = Map Text Text
type DerivationInputs  = Map Path StringSet
type DerivationOutputs = Map Text DerivationOutput

Third, I prefer to avoid type synonyms since I believe they make Haskell code harder to read. Instead, I fully inline all of the types, like this:

import Filesystem.Path.CurrentOS (FilePath)
import Data.Map (Map)
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)
import Prelude hiding (FilePath)

data Derivation = Derivation
    { outputs   :: Map Text DerivationOutput  -- ^ keyed on symbolic IDs
    , inputSrcs :: Set FilePath               -- ^ inputs that are sources
    , platform  :: Text
    , builder   :: Text
    , args      :: Vector Text
    , env       :: Map Text Text
    , inputDrvs :: Map FilePath (Set Text)
    }

data DerivationOutput = DerivationOutput
    { path     :: FilePath
    , hashAlgo :: Text    -- ^ hash used for expected hash computation
    , hash     :: Text    -- ^ expected hash, may be null
    }

Fourth, Haskell lets you auto-generate code to render the data type, which is useful for debugging purposes. All you have to do is add deriving (Show) to the end of the datatype definition, like this:

import Filesystem.Path.CurrentOS (FilePath)
import Data.Map (Map)
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)
import Prelude hiding (FilePath)

data Derivation = Derivation
    { outputs   :: Map Text DerivationOutput  -- ^ keyed on symbolic IDs
    , inputSrcs :: Set FilePath               -- ^ inputs that are sources
    , platform  :: Text
    , builder   :: Text
    , args      :: Vector Text
    , env       :: Map Text Text
    , inputDrvs :: Map FilePath (Set Text)
    } deriving (Show)

data DerivationOutput = DerivationOutput
    { path     :: FilePath
    , hashAlgo :: Text    -- ^ hash used for expected hash computation
    , hash     :: Text    -- ^ expected hash, may be null
    } deriving (Show)
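
As a quick sanity check of what deriving (Show) buys us, here is a minimal sketch (assuming the module above is in scope and OverloadedStrings is enabled; the store path below is made up for illustration):

{-# LANGUAGE OverloadedStrings #-}

-- The string literal for `path` works because the `FilePath` type from
-- `Filesystem.Path.CurrentOS` has an `IsString` instance
example :: DerivationOutput
example = DerivationOutput
    { path     = "/nix/store/00000000000000000000000000000000-example"
    , hashAlgo = ""
    , hash     = ""
    }

-- The derived `Show` instance lets us render the record for debugging:
main :: IO ()
main = print example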

Finally, we'll change the order of the Derivation fields to match the order that they are stored when serialized to disk:

import Filesystem.Path.CurrentOS (FilePath)
import Data.Map (Map)
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)
import Prelude hiding (FilePath)

data Derivation = Derivation
    { outputs   :: Map Text DerivationOutput  -- ^ keyed on symbolic IDs
    , inputDrvs :: Map FilePath (Set Text)
    , inputSrcs :: Set FilePath               -- ^ inputs that are sources
    , platform  :: Text
    , builder   :: Text
    , args      :: Vector Text
    , env       :: Map Text Text
    } deriving (Show)

data DerivationOutput = DerivationOutput
    { path     :: FilePath
    , hashAlgo :: Text    -- ^ hash used for expected hash computation
    , hash     :: Text    -- ^ expected hash, may be null
    } deriving (Show)

Derivation format

Nix stores derivations as *.drv files underneath the /nix/store directory. For example, here is what one such file looks like:

$ cat /nix/store/zzhs4fb83x5ygvjqn5rdpmpnishpdgy6-perl-MIME-Types-2.13.drv
Derive([("devdoc","/nix/store/15x9ii8c3n5wb5lg80cm8x0yk6zy7rha-perl-MIME-Types-2
.13-devdoc","",""),("out","/nix/store/93d75ghjyibmbxgfzwhh4b5zwsxzs44w-perl-MIME
-Types-2.13","","")],[("/nix/store/57h2hjsdkdiwbzilcjqkn46138n1xb4a-perl-5.22.3.
drv",["out"]),("/nix/store/cvdbbvnvg131bz9bwyyk97jpq1crclqr-MIME-Types-2.13.tar.
gz.drv",["out"]),("/nix/store/p5g31bc5x92awghx9dlm065d7j773l0r-stdenv.drv",["out
"]),("/nix/store/x50y5qihwsn0lfjhrf1s81b5hgb9w632-bash-4.4-p5.drv",["out"])],["/
nix/store/cdips4lakfk1qbf1x68fq18wnn3r5r14-builder.sh"],"x86_64-linux","/nix/sto
re/fi3mbd2ml4pbgzyasrlnp0wyy6qi48fh-bash-4.4-p5/bin/bash",["-e","/nix/store/cdip
s4lakfk1qbf1x68fq18wnn3r5r14-builder.sh"],[("AUTOMATED_TESTING","1"),("PERL_AUTO
INSTALL","--skipdeps"),("buildInputs",""),("builder","/nix/store/fi3mbd2ml4pbgzy
asrlnp0wyy6qi48fh-bash-4.4-p5/bin/bash"),("checkTarget","test"),("devdoc","/nix/
store/15x9ii8c3n5wb5lg80cm8x0yk6zy7rha-perl-MIME-Types-2.13-devdoc"),("doCheck",
"1"),("installTargets","pure_install"),("name","perl-MIME-Types-2.13"),("nativeB
uildInputs","/nix/store/nsa311yg8h93wfaacjk16c96a98bs09f-perl-5.22.3"),("out","/
nix/store/93d75ghjyibmbxgfzwhh4b5zwsxzs44w-perl-MIME-Types-2.13"),("outputs","ou
t devdoc"),("propagatedBuildInputs",""),("propagatedNativeBuildInputs",""),("src
","/nix/store/5smhymz7viq8p47mc3jgyvqd003ab732-MIME-Types-2.13.tar.gz"),("stdenv
","/nix/store/s3rlr45jzlzx0d6k2azlpxa5zwzr7xyy-stdenv"),("system","x86_64-linux"
)])

This corresponds to the following Haskell value using the types we just defined:

Derivation
  { outputs =
      Data.Map.fromList
        [ ( "devdoc"
          , DerivationOutput
              { path =
                  "/nix/store/15x9ii8c3n5wb5lg80cm8x0yk6zy7rha-perl-MIME-Types-2.13-devdoc"
              , hashAlgo = ""
              , hash = ""
              }
          )
        , ( "out"
          , DerivationOutput
              { path =
                  "/nix/store/93d75ghjyibmbxgfzwhh4b5zwsxzs44w-perl-MIME-Types-2.13"
              , hashAlgo = ""
              , hash = ""
              }
          )
        ]
  , inputDrvs =
      Data.Map.fromList
        [ ( "/nix/store/57h2hjsdkdiwbzilcjqkn46138n1xb4a-perl-5.22.3.drv"
          , Data.Set.fromList [ "out" ]
          )
        , ( "/nix/store/cvdbbvnvg131bz9bwyyk97jpq1crclqr-MIME-Types-2.13.tar.gz.drv"
          , Data.Set.fromList [ "out" ]
          )
        , ( "/nix/store/p5g31bc5x92awghx9dlm065d7j773l0r-stdenv.drv"
          , Data.Set.fromList [ "out" ]
          )
        , ( "/nix/store/x50y5qihwsn0lfjhrf1s81b5hgb9w632-bash-4.4-p5.drv"
          , Data.Set.fromList [ "out" ]
          )
        ]
  , inputSrcs =
      Data.Set.fromList
        [ "/nix/store/cdips4lakfk1qbf1x68fq18wnn3r5r14-builder.sh"
        ]
  , platform = "x86_64-linux"
  , builder =
      "/nix/store/fi3mbd2ml4pbgzyasrlnp0wyy6qi48fh-bash-4.4-p5/bin/bash"
  , args =
      Data.Vector.fromList
        [ "-e" , "/nix/store/cdips4lakfk1qbf1x68fq18wnn3r5r14-builder.sh" ]
  , env =
      Data.Map.fromList
        [ ( "AUTOMATED_TESTING" , "1" )
        , ( "PERL_AUTOINSTALL" , "--skipdeps" )
        , ( "buildInputs" , "" )
        , ( "builder"
          , "/nix/store/fi3mbd2ml4pbgzyasrlnp0wyy6qi48fh-bash-4.4-p5/bin/bash"
          )
        , ( "checkTarget" , "test" )
        , ( "devdoc"
          , "/nix/store/15x9ii8c3n5wb5lg80cm8x0yk6zy7rha-perl-MIME-Types-2.13-devdoc"
          )
        , ( "doCheck" , "1" )
        , ( "installTargets" , "pure_install" )
        , ( "name" , "perl-MIME-Types-2.13" )
        , ( "nativeBuildInputs"
          , "/nix/store/nsa311yg8h93wfaacjk16c96a98bs09f-perl-5.22.3"
          )
        , ( "out"
          , "/nix/store/93d75ghjyibmbxgfzwhh4b5zwsxzs44w-perl-MIME-Types-2.13"
          )
        , ( "outputs" , "out devdoc" )
        , ( "propagatedBuildInputs" , "" )
        , ( "propagatedNativeBuildInputs" , "" )
        , ( "src"
          , "/nix/store/5smhymz7viq8p47mc3jgyvqd003ab732-MIME-Types-2.13.tar.gz"
          )
        , ( "stdenv"
          , "/nix/store/s3rlr45jzlzx0d6k2azlpxa5zwzr7xyy-stdenv"
          )
        , ( "system" , "x86_64-linux" )
        ]
  }

We can express the serialization format using the following Extended Backus-Naur Form:

Derivation
    = 'Derive('
    , outputs
    , ','
    , inputDrvs
    , ','
    , inputSrcs
    , ','
    , platform
    , ','
    , builder
    , ','
    , args
    , ','
    , env
    , ')'

outputs = '[]' | '[', output, { ',', output }, ']'

output = '(', string, ',', path, ',', string, ',', string, ')'

inputDrvs = '[]' | '[', inputDrv, { ',', inputDrv }, ']'

inputDrv = '(', path, ',', strings, ')'

inputSrcs = paths

platform = string

builder = string

args = strings

env = '[]' | '[', stringPair, { ',', stringPair }, ']'

stringPair = '(', string, ',', string, ')'

strings = '[]' | '[', string, { ',', string }, ']'

paths = '[]' | '[', path, { ',', path }, ']'

string = '"', { char }, '"'

path = '"/', { char }, '"'

char = ( '\', <any character> ) | <character other than '"' or '\'>

Now we just need a way to convert from Nix's serialization format to the Derivation type.

Parsing derivations

Nix's parseDerivation function is what we will translate to Haskell. If you would like to follow along, you can find the completed parser code in Appendix A.

Let's start from the top:

static Derivation parseDerivation(const string & s)
{
    Derivation drv;
    istringstream_nocopy str(s);
    expect(str, "Derive([");

    /* Parse the list of outputs. */
    while (!endOfList(str)) {
        DerivationOutput out;
        expect(str, "("); string id = parseString(str);
        expect(str, ","); out.path = parsePath(str);
        expect(str, ","); out.hashAlgo = parseString(str);
        expect(str, ","); out.hash = parseString(str);
        expect(str, ")");
        drv.outputs[id] = out;
    }

    ...
}

static bool endOfList(std::istream & str)
{
    if (str.peek() == ',') {
        str.get();
        return false;
    }
    if (str.peek() == ']') {
        str.get();
        return true;
    }
    return false;
}

The first thing the C++ code parses is the string "Derive(" followed by a list of DerivationOutputs. The code consolidates the first '[' character of the list with the preceding "Derive(", which is why it actually matches "Derive([".

This code corresponds to the outputs field of our Derivation type:

data Derivation = Derivation
    { outputs   :: Map Text DerivationOutput  -- ^ keyed on symbolic IDs
    ...
    } deriving (Show)

data DerivationOutput = DerivationOutput
    { path     :: FilePath
    , hashAlgo :: Text    -- ^ hash used for expected hash computation
    , hash     :: Text    -- ^ expected hash, may be null
    } deriving (Show)

Derivation files store the outputs field of our Derivation type as a list of 4-tuples. The first field of each 4-tuple is a key in our Map and the remaining three fields are the corresponding value, which is marshalled into a DerivationOutput.

The C++ code interleaves the logic for parsing the list structure and parsing each element but our Haskell code will separate the two for clarity:

{-# LANGUAGE RecordWildCards   #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text.Lazy (Parser)
import Data.Map (Map)
import Data.Text (Text)
import Filesystem.Path.CurrentOS (FilePath)
import Prelude hiding (FilePath)

parseDerivation :: Parser Derivation
parseDerivation = do
    "Derive("

    let keyValue0 :: Parser (Text, DerivationOutput)
        keyValue0 = do
            "("
            key <- string
            ","
            path <- filepath
            ","
            hashAlgo <- string
            ","
            hash <- string
            ")"
            return (key, DerivationOutput {..})

    outputs <- mapOf keyValue0

    ...

-- We will fill these in later

mapOf :: Ord k => Parser (k, v) -> Parser (Map k v)
mapOf = ???

string :: Parser Text
string = ???

filepath :: Parser FilePath
filepath = ???

You can read the Haskell code as saying:

  • First match the string "Derive("
  • Now define a parser for a key-value pair called keyValue0, which will:
    • Match the string "("
    • Parse a string and store the result as a value named key
    • Match the string ","
    • Parse a path and store the result as a value named path
    • Match the string ","
    • Parse a string and store the result as a value named hashAlgo
    • Match the string ","
    • Parse a string and store the result as a value named hash
    • Match the string ")"
    • Return a key-value pair:
      • The key is key
      • The value is a DerivationOutput built from path/hashAlgo/hash
        • The {..} populates record fields with values of the same name
  • Use the mapOf utility to parse a list of key-value pairs as a Map

Also, the OverloadedStrings extension is the reason we can use naked string literals as parsers that match the given literal.
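
To make that concrete, here is a sketch of what such a literal desugars to: attoparsec provides an IsString instance for its Text parsers whose fromString is defined in terms of the string combinator, so a bare literal acts as a parser that matches exactly that text. (The Parser type below is the same one used throughout this post, just imported from the strict module.)

import Data.Attoparsec.Text (Parser)
import Data.Text (Text)

import qualified Data.Attoparsec.Text
import qualified Data.Text

-- Writing the bare literal "Derive(" as a statement in a `Parser` do block
-- behaves like this parser, which matches exactly that text:
derive :: Parser Text
derive = Data.Attoparsec.Text.string (Data.Text.pack "Derive(")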

If we really wanted to be like the C++ code we could put more than one statement on each line using semicolons, like this:

    let keyValue0 :: Parser (Text, DerivationOutput)
        keyValue0 = do
            "("; key <- string
            ","; path <- filepath
            ","; hashAlgo <- string
            ","; hash <- string
            ")"
            return (key, DerivationOutput {..})

... but I prefer to keep them on separate lines for readability.

The code has placeholders for three utilities we haven't defined yet with the following types:

-- This is a utility function that transforms a parser of key-value pairs into a
-- parser for a `Map`
mapOf
    :: Ord k
    -- ^ This is a "constraint" and not a function argument.  This constraint
    -- says that `k` can be any type as long as we can compare two values of
    -- type `k`
    => Parser (k, v)
    -- ^ This is the actual first function argument: a parser of key-value
    -- pairs.  The type of the key (which we denote as `k`) can be any type as
    -- long as `k` is comparable (due to the `Ord k` constraint immediately
    -- preceding this).  The type of the value (which we denote as `v`) can be
    -- any type
    -> Parser (Map k v)
    -- ^ This is the function output: a parser of a `Map k v` (i.e. a map from
    -- keys of type `k` to values of type `v`)

-- This is a utility which parses a string according to the EBNF rule
-- named `string`
string :: Parser Text

-- This is a utility which parses a path according to the EBNF rule
-- named `path`
filepath :: Parser FilePath

mapOf is fairly simple to define:

import qualified Data.Attoparsec.Text.Lazy
import qualified Data.Map

mapOf :: Ord k => Parser (k, v) -> Parser (Map k v)
mapOf keyValue = do
    keyValues <- listOf keyValue
    return (Data.Map.fromList keyValues)

-- | Given a parser for an element, return a parser for a list of elements
listOf :: Parser a -> Parser [a]
listOf element = do
    "["
    es <- Data.Attoparsec.Text.Lazy.sepBy element ","
    "]"
    return es

mapOf uses a helper function named listOf which parses a list of values. This parser takes advantage of the handy sepBy utility (short for "separated by") provided by Haskell's attoparsec library. You can read the implementation of listOf as saying:

  • Match the string "["
  • Match 0 or more elements separated by commas
  • Match the string "]"

Then you can read the implementation of mapOf as saying:

  • Parse a list of keyValue pairs
  • Use Data.Map.fromList to transform that into the corresponding Map
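
Here is a quick usage sketch, a hypothetical GHCi session that assumes :set -XOverloadedStrings and that the attoparsec modules and the definitions above are in scope (parseOnly lives in the strict Data.Attoparsec.Text module, but it accepts the same Parser type):

>>> Data.Attoparsec.Text.parseOnly (listOf Data.Attoparsec.Text.decimal) "[1,2,3]"
Right [1,2,3]

>>> let keyValue = do { "("; k <- Data.Attoparsec.Text.decimal; ","; v <- Data.Attoparsec.Text.decimal; ")"; return (k, v) }
>>> Data.Attoparsec.Text.parseOnly (mapOf keyValue) "[(1,2),(3,4)]"
Right (fromList [(1,2),(3,4)])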

We can now use mapOf and listOf to translate the next block of parsing code, too:

static Derivation parseDerivation(const string & s)
{
    ...

    /* Parse the list of input derivations. */
    expect(str, ",[");
    while (!endOfList(str)) {
        expect(str, "(");
        Path drvPath = parsePath(str);
        expect(str, ",[");
        drv.inputDrvs[drvPath] = parseStrings(str, false);
        expect(str, ")");
    }

    ...
}

static StringSet parseStrings(std::istream & str, bool arePaths)
{
    StringSet res;
    while (!endOfList(str))
        res.insert(arePaths ? parsePath(str) : parseString(str));
    return res;
}

The corresponding Haskell code is:

import qualified Data.Set

parseDerivation :: Parser Derivation
parseDerivation = do
    ...

    ","
    let keyValue1 = do
            "("
            key <- filepath
            ","
            value <- setOf string
            ")"
            return (key, value)
    inputDrvs <- mapOf keyValue1

    ...

setOf :: Ord a => Parser a -> Parser (Set a)
setOf element = do
    es <- listOf element
    return (Data.Set.fromList es)

The only difference is that the Haskell code doesn't define a parser for a set of strings. Instead, the Haskell code defines a more general parser for a set of any type of value.

The remaining parsing logic is fairly straightforward to translate. This C++ code:

static Derivation parseDerivation(const string & s)
{
    ...

    expect(str, ",["); drv.inputSrcs = parseStrings(str, true);
    expect(str, ","); drv.platform = parseString(str);
    expect(str, ","); drv.builder = parseString(str);

    /* Parse the builder arguments. */
    expect(str, ",[");
    while (!endOfList(str))
        drv.args.push_back(parseString(str));

    /* Parse the environment variables. */
    expect(str, ",[");
    while (!endOfList(str)) {
        expect(str, "("); string name = parseString(str);
        expect(str, ","); string value = parseString(str);
        expect(str, ")");
        drv.env[name] = value;
    }

    expect(str, ")");

    ...
}

... becomes this Haskell code:

import qualified Data.Vector

parseDerivation :: Parser Derivation
parseDerivation = do
    ...

    ","
    inputSrcs <- setOf filepath

    ","
    platform <- string

    ","
    builder <- string

    ","
    args <- vectorOf string

    ","
    let keyValue2 = do
            "("
            key <- string
            ","
            value <- string
            ")"
            return (key, value)
    env <- mapOf keyValue2

    ")"

    ...

vectorOf :: Parser a -> Parser (Vector a)
vectorOf element = do
    es <- listOf element
    return (Data.Vector.fromList es)

The only thing missing is to translate the C++ code for parsing strings and paths to Haskell. The original C++ code is:

/* Read a C-style string from stream `str'. */
static string parseString(std::istream & str)
{
    string res;
    expect(str, "\"");
    int c;
    while ((c = str.get()) != '"')
        if (c == '\\') {
            c = str.get();
            if (c == 'n') res += '\n';
            else if (c == 'r') res += '\r';
            else if (c == 't') res += '\t';
            else res += c;
        }
        else res += c;
    return res;
}

However, we won't naively translate that to Haskell because this is on our parser's critical path for performance. Haskell's attoparsec library only guarantees good performance if you use bulk parsing primitives when possible instead of character-at-a-time parsing loops.
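
For contrast, here is a sketch of what a direct character-at-a-time translation might look like (a hypothetical naiveString that we will not use, since it builds the result one character at a time):

{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text.Lazy (Parser)
import Data.Text (Text)

import qualified Data.Attoparsec.Text.Lazy
import qualified Data.Text

-- A character-at-a-time translation of the C++ loop: it accumulates a
-- `String` one cons cell at a time and packs it into `Text` at the end
naiveString :: Parser Text
naiveString = do
    "\""
    let loop = do
            c <- Data.Attoparsec.Text.Lazy.anyChar
            case c of
                '"'  -> return []
                '\\' -> do
                    c1 <- Data.Attoparsec.Text.Lazy.anyChar
                    let c2 = case c1 of
                            'n' -> '\n'
                            'r' -> '\r'
                            't' -> '\t'
                            _   -> c1
                    cs <- loop
                    return (c2 : cs)
                _    -> do
                    cs <- loop
                    return (c : cs)
    cs <- loop
    return (Data.Text.pack cs)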

Instead, our Haskell string literal parser will still be a loop, but each iteration of the loop will parse a block of text instead of a single character:

import qualified Data.Text.Lazy

string :: Parser Text
string = do
    "\""
    let predicate c = not (c == '"' || c == '\\')
    let loop = do
            text0 <- Data.Attoparsec.Text.Lazy.takeWhile predicate
            char0 <- Data.Attoparsec.Text.Lazy.anyChar
            text2 <- case char0 of
                '"'  -> return ""
                _    -> do
                    char1 <- Data.Attoparsec.Text.Lazy.anyChar
                    char2 <- case char1 of
                        'n' -> return '\n'
                        'r' -> return '\r'
                        't' -> return '\t'
                        _   -> return char1
                    text1 <- loop
                    return (Data.Text.Lazy.cons char2 text1)
            return (Data.Text.Lazy.fromStrict text0 <> text2)
    text <- loop
    return (Data.Text.Lazy.toStrict text)

In Haskell, loops become recursive definitions such as the above loop. You can read the above parser as saying:

  • Match a double quote character: "\""
  • Now, define a function named predicate
    • predicate takes a single character c as input
    • predicate returns True if c is neither a quote nor a backslash
  • Now define a loop named loop, which will:
    • Consume consecutive characters up to the first quote or backslash (text0)
    • Consume the next character (char0) and branch on its value:
      • If char0 is a double quote, then text2 is the empty string
      • If char0 is a backslash then:
        • Consume the next character (char1) and branch on its value:
          • If char1 is n/r/t then char2 is the matching escape code
          • Otherwise, char2 is just char1
        • Run loop again to parse the rest of the string (text1)
        • text2 is char2 prepended onto text1
    • Convert text0 to lazy Text and return it concatenated with text2
      • Concatenation is more efficient for lazy Text than for strict Text
  • Run our recursive loop and store the result as text
  • Transform our lazy Text back into a strict Text result
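
As a quick sanity check, here is a hypothetical GHCi session (again assuming :set -XOverloadedStrings and the definitions above) showing that the escape sequences come out as expected:

>>> Data.Attoparsec.Text.parseOnly string "\"foo\\nbar\""
Right "foo\nbar"

>>> Data.Attoparsec.Text.parseOnly string "\"say \\\"hi\\\"\""
Right "say \"hi\""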

Once we have a string parser we can then implement the filepath parser. The C++ version is:

static Path parsePath(std::istream & str)
{
    string s = parseString(str);
    if (s.size() == 0 || s[0] != '/')
        throw FormatError(format("bad path ‘%1%’ in derivation") % s);
    return s;
}

The corresponding Haskell code is:

filepath :: Parser FilePath
filepath = do
    text <- string
    case Data.Text.uncons text of
        Just ('/', _) -> do
            return (Filesystem.Path.CurrentOS.fromText text)
        _ -> do
            fail ("bad path ‘" <> Data.Text.unpack text <> "’ in derivation")

You can read that as saying:

  • Parse a string and store the result as a value named text
  • Inspect the beginning of the string and branch on the result:
    • If the beginning of the string is '/', then convert the string to a path
    • If the string is empty or does not begin with '/', then fail with a parse error

If we wanted to more closely match the C++ version, we could have done something like this:

import Prelude hiding (FilePath)
import Filesystem.Path.CurrentOS (FilePath)

import qualified Data.Text
import qualified Filesystem.Path.CurrentOS

filepath :: Parser FilePath
filepath = do
    text <- string
    case Data.Text.uncons text of
        Just ('/', _) -> do
            return ()
        _ -> do
            fail ("bad path ‘" <> Data.Text.unpack text <> "’ in derivation")
    return (Filesystem.Path.CurrentOS.fromText text)

The reason this works is that Haskell's return is not the same as return in C/C++/Java: the Haskell return does not exit from the surrounding subroutine. Indeed, there is no such thing as a "surrounding subroutine" in Haskell, and that's a good thing!

In this context the return function is just a Parser that consumes no input and produces whatever value you give it. More generally, return denotes a subroutine that does nothing and produces a value, and you can sequence it just like any other command.
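
Here is a minimal sketch illustrating the point (a hypothetical pair parser that reuses the string parser from this post): the return () in the middle does nothing and does not exit the parser, so the "," and the second string still have to be parsed.

pair :: Parser (Text, Text)
pair = do
    "("
    x <- string
    return ()   -- a no-op: consumes no input, parsing continues below
    ","
    y <- string
    ")"
    return (x, y)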

Benchmarks

Let's test the performance of our parser on a sample derivation file:

import Criterion (Benchmark)

import qualified Criterion
import qualified Criterion.Main
import qualified Data.Attoparsec.Text.Lazy
import qualified Data.Text.Lazy.IO
import qualified Nix.Derivation

main :: IO ()
main = Criterion.Main.defaultMain benchmarks

benchmarks :: [Benchmark]
benchmarks =
    [ Criterion.Main.env
        (Data.Text.Lazy.IO.readFile "/nix/store/zx3rshaya690y0xlc64jb8i12ljr8nyp-ghc-8.0.2-with-packages.drv")
        bench0
    ]
  where
    bench0 example =
        Criterion.bench "example" (Criterion.nf parseExample example)

    parseExample =
        Data.Attoparsec.Text.Lazy.parse Nix.Derivation.parseDerivation

... where /nix/store/zx3rshaya690y0xlc64jb8i12ljr8nyp-ghc-8.0.2-with-packages.drv is a 15 KB file that you can find in Appendix B of this post. This benchmark gives the following results:

Running 1 benchmarks...
Benchmark benchmark: RUNNING...
benchmarking example
time                 3.230 ms   (3.215 ms .. 3.244 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 3.265 ms   (3.251 ms .. 3.285 ms)
std dev              54.87 μs   (41.41 μs .. 74.99 μs)

Benchmark benchmark: FINISH

Our derivation file is 15,210 characters long, so that works out to roughly 200 nanoseconds per character (3.23 ms ÷ 15,210 characters ≈ 212 ns). That's not bad, but it could still use improvement. However, I stopped optimizing at this point because some experiments showed that parsing was no longer the bottleneck, even for a trivial program.

I compared the performance of an executable written in Haskell to the nix-store executable (written in C++) to see how fast each one could display the outputs of a list of derivations. I ran them on 169 derivations all beginning with the letter z in their hash:

$ ls -d /nix/store/z*.drv | wc -l
169

The nix-store command lets you do this with nix-store --query --outputs:

$ nix-store --query --outputs /nix/store/z*.drv
/nix/store/qq46wcgwk7lh7v5hvlsbr3gi30wh7a81-ansi-wl-pprint-0.6.7.3
/nix/store/sn0v9rkg0q5pdhm6246c7sigrih22k9h-tagged-0.8.5
/nix/store/zsryzwadshszfnkm740b2412v88iqgi4-semigroups-0.18.2
/nix/store/mxl1p0033xf8yd6r5i6h3jraz40akqyb-perl-DBIx-Class-0.082840-devdoc
...

I compared that to the following Haskell program, which parses a list of paths from the command line and then displays their outputs:

{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text.Lazy (Result(..))

import qualified Data.Attoparsec.Text.Lazy
import qualified Data.Text.Lazy.IO
import qualified Nix.Derivation
import qualified Options.Generic

main :: IO ()
main = do
    paths <- Options.Generic.getRecord "Get the outputs of a Nix derivation"
    mapM_ process (paths :: [FilePath])

process :: FilePath -> IO ()
process path = do
    text <- Data.Text.Lazy.IO.readFile path
    case Data.Attoparsec.Text.Lazy.parse Nix.Derivation.parseDerivation text of
        Fail _ _ string   -> fail string
        Done _ derivation -> do
            let printOutput output = print (Nix.Derivation.path output)
            mapM_ printOutput (Nix.Derivation.outputs derivation)

... which gives this output:

$ query-outputs /nix/store/z*.drv
FilePath "/nix/store/qq46wcgwk7lh7v5hvlsbr3gi30wh7a81-ansi-wl-pprint-0.6.7.3"
FilePath "/nix/store/sn0v9rkg0q5pdhm6246c7sigrih22k9h-tagged-0.8.5"
FilePath "/nix/store/zsryzwadshszfnkm740b2412v88iqgi4-semigroups-0.18.2"
FilePath "/nix/store/mxl1p0033xf8yd6r5i6h3jraz40akqyb-perl-DBIx-Class-0.082840-devdoc"
...

I benchmarked both of these executables using my bench utility. Benchmarks show that both executables take essentially the same amount of time to process all 169 derivation files:

$ bench 'nix-store --query /nix/store/z*.drv'
benchmarking nix-store --query /nix/store/z*.drv
time                 84.19 ms   (83.16 ms .. 85.40 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 84.33 ms   (83.92 ms .. 84.84 ms)
std dev              781.0 μs   (581.5 μs .. 1.008 ms)

$ bench 'query-outputs /nix/store/z*.drv'
benchmarking query-outputs /nix/store/z*.drv
time                 83.52 ms   (82.88 ms .. 83.85 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 84.12 ms   (83.83 ms .. 84.67 ms)
std dev              606.0 μs   (161.1 μs .. 849.9 μs)

Also, note that about 9 milliseconds of each measurement is overhead from the benchmark tool launching a subprocess:

$ bench true
benchmarking true
time                 9.274 ms   (9.161 ms .. 9.348 ms)
                     0.998 R²   (0.995 R² .. 1.000 R²)
mean                 9.324 ms   (9.233 ms .. 9.502 ms)
std dev              333.5 μs   (183.1 μs .. 561.9 μs)
variance introduced by outliers: 15% (moderately inflated)

... so if you factor out that overhead, both tools process derivations at a rate of about 440 microseconds per file ((84 ms - 9 ms) ÷ 169 files ≈ 0.44 ms). Given that the Haskell executable is just as efficient as the C++ one, I figured there was no point in optimizing the code further. The first draft is simple, clear, and efficient enough.

Conclusion

Hopefully this helps people see that you can translate C++ parsing code to Haskell. The main difference is that Haskell parsing libraries provide some higher-level abstractions and Haskell programs tend to define loops via recursion instead of iteration.

The Haskell code is simpler than the C++ code and efficient, too! This is why I recommend Haskell to people who want a high-level programming language without sacrificing performance.

I also released the above Haskell parser as part of the nix-derivation library in case people are interested in using this code. You can find the library on Hackage or on GitHub.

Appendix A: Completed parser

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards   #-}

import Data.Attoparsec.Text.Lazy (Parser)
import Data.Map (Map)
import Data.Monoid ((<>))
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)
import Nix.Derivation.Types (Derivation(..), DerivationOutput(..))
import Prelude hiding (FilePath)
import Filesystem.Path.CurrentOS (FilePath)

import qualified Data.Attoparsec.Text.Lazy
import qualified Data.Map
import qualified Data.Set
import qualified Data.Text
import qualified Data.Text.Lazy
import qualified Data.Vector
import qualified Filesystem.Path.CurrentOS

-- | Parse a derivation
parseDerivation :: Parser Derivation
parseDerivation = do
    "Derive("

    let keyValue0 = do
            "("
            key <- string
            ","
            path <- filepath
            ","
            hashAlgo <- string
            ","
            hash <- string
            ")"
            return (key, DerivationOutput {..})
    outputs <- mapOf keyValue0

    ","

    let keyValue1 = do
            "("
            key <- filepath
            ","
            value <- setOf string
            ")"
            return (key, value)
    inputDrvs <- mapOf keyValue1

    ","

    inputSrcs <- setOf filepath

    ","

    platform <- string

    ","

    builder <- string

    ","

    args <- vectorOf string

    ","

    let keyValue2 = do
            "("
            key <- string
            ","
            value <- string
            ")"
            return (key, value)
    env <- mapOf keyValue2

    ")"

    return (Derivation {..})

string :: Parser Text
string = do
    "\""
    let predicate c = not (c == '"' || c == '\\')
    let loop = do
            text0 <- Data.Attoparsec.Text.Lazy.takeWhile predicate
            char0 <- Data.Attoparsec.Text.Lazy.anyChar
            text2 <- case char0 of
                '"'  -> return ""
                _    -> do
                    char1 <- Data.Attoparsec.Text.Lazy.anyChar
                    char2 <- case char1 of
                        'n' -> return '\n'
                        'r' -> return '\r'
                        't' -> return '\t'
                        _   -> return char1
                    text1 <- loop
                    return (Data.Text.Lazy.cons char2 text1)
            return (Data.Text.Lazy.fromStrict text0 <> text2)
    text <- loop
    return (Data.Text.Lazy.toStrict text)

filepath :: Parser FilePath
filepath = do
    text <- string
    case Data.Text.uncons text of
        Just ('/', _) -> do
            return (Filesystem.Path.CurrentOS.fromText text)
        _ -> do
            fail ("bad path ‘" <> Data.Text.unpack text <> "’ in derivation")

listOf :: Parser a -> Parser [a]
listOf element = do
    "["
    es <- Data.Attoparsec.Text.Lazy.sepBy element ","
    "]"
    return es

setOf :: Ord a => Parser a -> Parser (Set a)
setOf element = do
    es <- listOf element
    return (Data.Set.fromList es)

vectorOf :: Parser a -> Parser (Vector a)
vectorOf element = do
    es <- listOf element
    return (Data.Vector.fromList es)

mapOf :: Ord k => Parser (k, v) -> Parser (Map k v)
mapOf keyValue = do
    keyValues <- listOf keyValue
    return (Data.Map.fromList keyValues)

Appendix B: Example derivation

Derive([("out","/nix/store/w3zbr9zj9mn08hnirn34wsxhry40qi3c-ghc-8.0.2-with-packa
ges","","")],[("/nix/store/0cyv377kjnhjc9j1pb0m530lczqj4ksm-optparse-generic-1.1
.5.drv",["out"]),("/nix/store/0w9vy2hmz50j0yhlbj519hnpjbvqhjrj-cookie-0.4.2.1.dr
v",["out"]),("/nix/store/1b75igh40c9agy3sfyl5n7av4070swvn-old-locale-1.0.0.7.drv
",["out"]),("/nix/store/1g2qxhbpk7qjyz8qbami29bn7qmnmgpk-tagged-0.8.5.drv",["out
"]),("/nix/store/20m5alpbwyvyhh43aq3prw07g48apdnj-parsers-0.12.4.drv",["out"]),(
"/nix/store/2bmxgjskcw4vdmcqrw9pc9yjffsqn3i9-byteable-0.1.1.drv",["out"]),("/nix
/store/3fji5p4x9j0cb3q3lp8amrj0qak9d471-asn1-encoding-0.9.5.drv",["out"]),("/nix
/store/43hyjsydndk7vsdjs94why36s8isn6fw-kan-extensions-5.0.1.drv",["out"]),("/ni
x/store/4hkya8j2isw660pj6b0q3by85q2wz1zw-free-4.12.4.drv",["out"]),("/nix/store/
56l353i7v6i7i5vkk2qx4wi4r6p4xll1-void-0.7.2.drv",["out"]),("/nix/store/5c748d8gm
rmg2gy4792a0kzp5bjw8sgr-cereal-0.5.4.0.drv",["out"]),("/nix/store/5d3v9g9jjqznbp
xrlgvcyvmqqz2ffpgc-fingertree-0.1.1.0.drv",["out"]),("/nix/store/5hx7hjjrwqa4zjd
9ql224aif86ncj764-hook.drv",["out"]),("/nix/store/5rpa05i9i5p3i0a06lhyvgg1nvlwnl
fi-unordered-containers-0.2.8.0.drv",["out"]),("/nix/store/5x6d3f9krpqlmzhmk71qf
7m97g38hba1-base-prelude-1.0.1.1.drv",["out"]),("/nix/store/61fzrmaxsfc9q4qzsdcr
saqgg05hr6xi-bifunctors-5.4.2.drv",["out"]),("/nix/store/6l4s2nlxc9fq8c3y3j2k2c7
af5llx278-hashable-1.2.6.0.drv",["out"]),("/nix/store/6n2kl1fnn66a24ipjm1dxjhhvn
i1404r-mtl-2.2.1.drv",["out"]),("/nix/store/6qggipw2ra59q6333y25gywllbbcx3p5-hou
rglass-0.2.10.drv",["out"]),("/nix/store/7545pmiaccgvkxjfvl9cm0qk7y1x96wi-reflec
tion-2.1.2.drv",["out"]),("/nix/store/75iir4x52007r0fq41kwk5cdfvmi02jp-profuncto
rs-5.2.drv",["out"]),("/nix/store/7ah4kd8kbwsfr350wkr0y4i0h6gm7vc8-base64-bytest
ring-1.0.0.1.drv",["out"]),("/nix/store/7d6yxihb828lgs4199f81k17jh8987z6-lndir-1
.0.3.drv",["out"]),("/nix/store/7f6ddryzkw9jckayqs1gdz18njrqd0fq-random-1.1.drv"
,["out"]),("/nix/store/8p1f0rs49czq74yxlfcimlag9wnbwsc5-http-client-tls-0.3.4.1.
drv",["out"]),("/nix/store/9w2n7jqc9ll78r7xj31ckrqcq6g8g8kf-integer-logarithms-1
.0.1.drv",["out"]),("/nix/store/a2ar311g8chbi4ila55qzi3dfp9g5zr6-blaze-html-0.8.
1.3.drv",["out"]),("/nix/store/ahypsxsxcczsllax40jnccdg5ilps2lq-http-client-0.5.
6.1.drv",["out"]),("/nix/store/as62r0pdaq0q76rxz719xy33vqa7xcal-double-conversio
n-2.0.2.0.drv",["out"]),("/nix/store/b67b65arib97rsl4z5iqz03gf24ymvz5-http-types
-0.9.1.drv",["out"]),("/nix/store/bczn7hbvp39aplp70gvmyijdysvkyspg-primitive-0.6
.1.0.drv",["out"]),("/nix/store/bwf0a834k4jf5ss2ccribn9w7g2r3j3m-stdenv.drv",["o
ut"]),("/nix/store/ckl2x2vkqj82k4b7c5l8p611g6jmfbsz-zlib-0.6.1.2.drv",["out"]),(
"/nix/store/clxg57lhlflbjrk6w3fv51fxjnqkk7q4-transformers-compat-0.5.1.4.drv",["
out"]),("/nix/store/d1n1p6mdabwkgkc7y6151j37c4kqh1a2-exceptions-0.8.3.drv",["out
"]),("/nix/store/dg6n7519y227s9c867wqi2v40cj41zqy-attoparsec-0.13.1.0.drv",["out
"]),("/nix/store/f3l740wl94r84fgsiindy88jppcjya6l-text-format-0.3.1.1.drv",["out
"]),("/nix/store/f67vqhk71lrab7ncx8fz8bj7iggmm66f-cryptonite-0.21.drv",["out"]),
("/nix/store/fdq2dn4gal13xl9jbyk8igvaw5f2x9b5-blaze-builder-0.4.0.2.drv",["out"]
),("/nix/store/fr1acpclaljwizrvic520wdf36kmxjwr-blaze-markup-0.7.1.1.drv",["out"
]),("/nix/store/fyi4gg70v1lgjz03v07flnmjr8x55mqk-async-2.1.1.1.drv",["out"]),("/
nix/store/ginljsxbpxli394mc06gvqkmvddhqwlc-x509-store-1.6.2.drv",["out"]),("/nix
/store/gq055a1910w9q6mbb5kf6p6igzg6b5ai-StateVar-1.1.0.4.drv",["out"]),("/nix/st
ore/hhx5xjb6cm5rdkri763669bf6karrnpn-parsec-3.1.11.drv",["out"]),("/nix/store/ip
7nh1r7mj4qwgra27x8i6nyz6yd1ggd-prelude-extras-0.4.0.3.drv",["out"]),("/nix/store
/iqd84gv7b8dq5kddxyjimaqqlxjpqdzk-vector-0.11.0.0.drv",["out"]),("/nix/store/j24
c6d5zv7nim3rkmzzapk6x61lzgizq-charset-0.3.7.1.drv",["out"]),("/nix/store/j6zji0j
n6cm8b4i0fmakksk1cp54bhn0-asn1-types-0.3.2.drv",["out"]),("/nix/store/l3wmibr3b1
b3a8ql8ypy860209iqbasg-connection-0.2.8.drv",["out"]),("/nix/store/lg64zgciix964
4hzkfc02rfbq4qgcrf8-memory-0.14.3.drv",["out"]),("/nix/store/lnxgjiywc89iaby3g0n
a1sc4hryvnikq-trifecta-1.6.2.1.drv",["out"]),("/nix/store/lvm3zp40qfdqr0v9i27z7d
qpdwlxprbl-text-1.2.2.1.drv",["out"]),("/nix/store/m7l8bg4k82snsl759k2mlkjlb8g03
52a-foundation-0.0.7.drv",["out"]),("/nix/store/mi1fdfdkc5qc7iq2ry6095ayp9cqn075
-x509-system-1.6.4.drv",["out"]),("/nix/store/mpql2q0b6a1m2vkb114f9l2s8dhy09zv-a
sn1-parse-0.9.4.drv",["out"]),("/nix/store/mq338r0an8lj00g88c6rpylbnmds7fbx-adju
nctions-4.3.drv",["out"]),("/nix/store/n4wyn46xw0nw8a3rhqw47xd4h6bgnn5w-lens-4.1
5.1.drv",["out"]),("/nix/store/nv7frilmipcpylijp492l3hc0s2cmgw6-tls-1.3.10.drv",
["out"]),("/nix/store/nwapw7zf014frf49c0b7y5694jyc38hm-streaming-commons-0.1.17.
drv",["out"]),("/nix/store/pcg29qa8fm9niixbjy0r7bbp3s4jxk62-neat-interpolation-0
.3.2.1.drv",["out"]),("/nix/store/pg609c09rfqzyfn8l4hsc1q2xy50w4p8-semigroupoids
-5.1.drv",["out"]),("/nix/store/pra6ynwnksgks1xxv2l7h48swjq4vb2j-data-default-cl
ass-0.1.2.0.drv",["out"]),("/nix/store/pz3s86hbxvwr7m4x7cpz5h8z124wgk4x-x509-1.6
.5.drv",["out"]),("/nix/store/qi0668xlc3q03n74k1wrqri7ss7bvphk-stm-2.4.4.1.drv",
["out"]),("/nix/store/ql8bpbnl7x7ybn3rnsknpkpwvlz7s2nz-distributive-0.5.2.drv",[
"out"]),("/nix/store/qr8wf0b1lqwxwi6ban2k307jy91bj640-reducers-3.12.1.drv",["out
"]),("/nix/store/r44a3jm3q5rhi75rl1m6jr1vgwpiyw02-hnix-0.3.4.drv",["out"]),("/ni
x/store/rqcq6jigs1sj53f8wrbff3s06wzazfqw-comonad-5.0.1.drv",["out"]),("/nix/stor
e/s1ymda8d763cn5gq4cw107h19xs1ddz0-ansi-wl-pprint-0.6.7.3.drv",["out"]),("/nix/s
tore/sdx411558r03fdvfi3p6wzfsi701sv4w-system-fileio-0.3.16.3.drv",["out"]),("/ni
x/store/v0srwl68sz6dirasq53bd3ddjipa1d5b-deriving-compat-0.3.6.drv",["out"]),("/
nix/store/vpqjk2wral953nnqnhvp8zbmkbhnyxls-x509-validation-1.6.5.drv",["out"]),(
"/nix/store/vr8scnq8lxgc0m6k7bqjwi4fg0k55lxn-data-fix-0.0.4.drv",["out"]),("/nix
/store/vwhic7ibwkzqk65mqicb29d5qz06gkns-socks-0.5.5.drv",["out"]),("/nix/store/w
6a3c55nhmpcia6cvdg31nqsc7v910lc-ansi-terminal-0.6.2.3.drv",["out"]),("/nix/store
/wdgbs33iwqadfmlaymw00k6iwnf3as7z-mime-types-0.1.0.7.drv",["out"]),("/nix/store/
wld7wjy6lws02rky68mpg0x591wv0j6v-pem-0.2.2.drv",["out"]),("/nix/store/wx9vx1z55b
zkzym0lzbgpzd7rrsx9w9b-scientific-0.3.4.12.drv",["out"]),("/nix/store/x2dkgpklc1
adq1cgg1k8ykdqv7ghwhzm-system-filepath-0.4.13.4.drv",["out"]),("/nix/store/x50y5
qihwsn0lfjhrf1s81b5hgb9w632-bash-4.4-p5.drv",["out"]),("/nix/store/x8k0rsb1ig82v
dls0dc6jdlny7r04izj-parallel-3.2.1.1.drv",["out"]),("/nix/store/xbygsq84395vhj7b
nh7786i9864jf9i9-ghc-8.0.2.drv",["out"]),("/nix/store/xp7jayhmiphx0zqxx9dxrk673s
hhj89l-optparse-applicative-0.13.2.0.drv",["out"]),("/nix/store/xzda3rxckhf0h3lp
1hr6wanyig9s9y1p-utf8-string-1.0.1.1.drv",["out"]),("/nix/store/y4ll9c29g76jzycl
7zhdmqzxgciyrfr1-case-insensitive-1.2.0.9.drv",["out"]),("/nix/store/y8l0lv08hfi
6qnrzd25dxgi4712yjf9f-base-orphans-0.5.4.drv",["out"]),("/nix/store/z036z61lsrk2
gqbwljix0akzhz2bgl8j-semigroups-0.18.2.drv",["out"]),("/nix/store/z8vpk1rwkikc8p
g20vyg5kvsdv626ksw-dhall-1.3.0.drv",["out"]),("/nix/store/zdx2r8q401h7xcyh7jg0cn
p092iwlhmv-contravariant-1.4.drv",["out"]),("/nix/store/zg5as9jrs5vfa5iw7539vihm
wm436g1q-network-uri-2.6.1.0.drv",["out"]),("/nix/store/zvxd18a65gwcg3bz7v1rb0h5
9w9wwi9d-network-2.6.3.1.drv",["out"])],["/nix/store/9krlzvny65gdc8s7kpb6lkx8cd0
2c25b-default-builder.sh"],"x86_64-linux","/nix/store/fi3mbd2ml4pbgzyasrlnp0wyy6
qi48fh-bash-4.4-p5/bin/bash",["-e","/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-
default-builder.sh"],[("allowSubstitutes",""),("buildCommand","mkdir -p $out\nfo
r i in $paths; do\n  /nix/store/lnai0im3lcpb03arxfi0wx1dm7anf4f8-lndir-1.0.3/bin
/lndir $i $out\ndone\n. /nix/store/plmya6mkfvq658ba7z6j6n36r5pdbxk5-hook/nix-sup
port/setup-hook\n\n# wrap compiler executables with correct env variables\n\nfor
 prg in ghc ghci ghc-8.0.2 ghci-8.0.2; do\n  if [[ -x \"/nix/store/s0hpng652hsn4
0jy4kjdh1x0jm86dx9l-ghc-8.0.2/bin/$prg\" ]]; then\n    rm -f $out/bin/$prg\n    
makeWrapper /nix/store/s0hpng652hsn40jy4kjdh1x0jm86dx9l-ghc-8.0.2/bin/$prg $out/
bin/$prg                           \\\n      --add-flags '\"-B$NIX_GHC_LIBDIR\"'
                   \\\n      --set \"NIX_GHC\"        \"$out/bin/ghc\"     \\\n 
     --set \"NIX_GHCPKG\"     \"$out/bin/ghc-pkg\" \\\n      --set \"NIX_GHC_DOC
DIR\" \"$out/share/doc/ghc/html\"                  \\\n      --set \"NIX_GHC_LIB
DIR\" \"$out/lib/ghc-8.0.2\"                  \\\n      \n  fi\ndone\n\nfor prg 
in runghc runhaskell; do\n  if [[ -x \"/nix/store/s0hpng652hsn40jy4kjdh1x0jm86dx
9l-ghc-8.0.2/bin/$prg\" ]]; then\n    rm -f $out/bin/$prg\n    makeWrapper /nix/
store/s0hpng652hsn40jy4kjdh1x0jm86dx9l-ghc-8.0.2/bin/$prg $out/bin/$prg         
                  \\\n      --add-flags \"-f $out/bin/ghc\"                     
      \\\n      --set \"NIX_GHC\"        \"$out/bin/ghc\"     \\\n      --set \"
NIX_GHCPKG\"     \"$out/bin/ghc-pkg\" \\\n      --set \"NIX_GHC_DOCDIR\" \"$out/
share/doc/ghc/html\"                  \\\n      --set \"NIX_GHC_LIBDIR\" \"$out/
lib/ghc-8.0.2\"\n  fi\ndone\n\nfor prg in ghc-pkg ghc-pkg-8.0.2; do\n  if [[ -x 
\"/nix/store/s0hpng652hsn40jy4kjdh1x0jm86dx9l-ghc-8.0.2/bin/$prg\" ]]; then\n   
 rm -f $out/bin/$prg\n    makeWrapper /nix/store/s0hpng652hsn40jy4kjdh1x0jm86dx9
l-ghc-8.0.2/bin/$prg $out/bin/$prg --add-flags \"--global-package-db=$out/lib/gh
c-8.0.2/package.conf.d\"\n  fi\ndone\n$out/bin/ghc-pkg recache\n\n$out/bin/ghc-p
kg check\n\n"),("buildInputs",""),("builder","/nix/store/fi3mbd2ml4pbgzyasrlnp0w
yy6qi48fh-bash-4.4-p5/bin/bash"),("extraOutputsToInstall","out doc"),("ignoreCol
lisions",""),("name","ghc-8.0.2-with-packages"),("nativeBuildInputs",""),("out",
"/nix/store/w3zbr9zj9mn08hnirn34wsxhry40qi3c-ghc-8.0.2-with-packages"),("passAsF
ile","buildCommand"),("paths","/nix/store/rlsammwp1ib8d3d9qgbppmdhkbdfg3i9-deriv
ing-compat-0.3.6 /nix/store/v2qsqznrik64f46msahvgg7dmaiag18k-hnix-0.3.4 /nix/sto
re/vbkqj8zdckqqiyjh08ykx75fwc90gwg4-optparse-applicative-0.13.2.0 /nix/store/6m7
qia8q0rkdkzvmiak38kdscf27malf-optparse-generic-1.1.5 /nix/store/r687llig7vn9x15h
hkmfak01ff7082n6-utf8-string-1.0.1.1 /nix/store/j6gvad67dav8fl3vdbqmar84kgmh5gar
-reducers-3.12.1 /nix/store/i8wf08764lknc0f9ja12miqvg509jn1k-fingertree-0.1.1.0 
/nix/store/301hq4fabrpbi3l47n908gvakkzq1s88-blaze-markup-0.7.1.1 /nix/store/055m
hi44s20x5xgxdjr82vmhnyv79pzl-blaze-html-0.8.1.3 /nix/store/vnc1yyig90skcwx3l1xrb
p1jqwmmb9xv-trifecta-1.6.2.1 /nix/store/vraffi24marw5sks8b78xrim6c8i1ng6-double-
conversion-2.0.2.0 /nix/store/kwdk03p0lyk5lyll1fp7a6z20j17b3sx-text-format-0.3.1
.1 /nix/store/zn5hlw3y94sbli4ssygr2w04mpb396zs-system-filepath-0.4.13.4 /nix/sto
re/jn7lbnk0gsirj8kb02an31v8idy7ym3c-system-fileio-0.3.16.3 /nix/store/9frfci9ywf
9lc216ci9nwc1yy0qwrn1b-integer-logarithms-1.0.1 /nix/store/rps46jwa7yyab629p27la
r094gk8dal2-scientific-0.3.4.12 /nix/store/c4a3ynvnv3kdxgd7ngmnjhka4mvfk8ll-atto
parsec-0.13.1.0 /nix/store/kc34l1gpzh65y4gclmv4dgv6agpmagdi-parsers-0.12.4 /nix/
store/1kf78yxf3lliagb5rc5din24iq40g96y-base-prelude-1.0.1.1 /nix/store/hi868d12p
kzcbzyvp7a7cigc58mp2lmg-neat-interpolation-0.3.2.1 /nix/store/h00jrbdvzj4yfy796j
8vq00lkd1gxr6w-primitive-0.6.1.0 /nix/store/vys8qsf317rn8qwy00p80zlywb47lqwz-vec
tor-0.11.0.0 /nix/store/wchch11312m3lxkwl8rad04x02svcs3i-reflection-2.1.2 /nix/s
tore/jj1kfv52mjxp54flz8v5ba64va3hvy22-parallel-3.2.1.1 /nix/store/jwj23y7vfvs14j
drkw1py9q7lm9fyhy4-adjunctions-4.3 /nix/store/px4979la9b98knwv36551zg3p5jb69lw-k
an-extensions-5.0.1 /nix/store/2cp1ar0f73jrcn231ai07zpwayy735j2-semigroupoids-5.
1 /nix/store/3nkxw5wdadckz28laijrvwdkkfqp07sb-profunctors-5.2 /nix/store/bd3njvy
0ahcsqw47vaz5zayhx34hari7-prelude-extras-0.4.0.3 /nix/store/zdp7zqasz1l1wifpngbg
6ngq189gbbqh-free-4.12.4 /nix/store/n7c5ynfqc6j570bbyaajqx34c3pvfvph-tagged-0.8.
5 /nix/store/xdkhd7mkqj2mmcami8ycmf7j0valwp5h-distributive-0.5.2 /nix/store/9dxb
a4g9x0xjj21r3vchqnh4rdwbc31b-void-0.7.2 /nix/store/dahah2ivrn4hc5gjygnlvxlad2399
zqh-StateVar-1.1.0.4 /nix/store/f2rdi1bx46fs165n1j316k5w90ab6lwy-contravariant-1
.4 /nix/store/mgg9rsvhvn4dd4qzv559nn24iqvspjnb-comonad-5.0.1 /nix/store/18n8i570
pf4gpszdyc0bki9qxm1p9xd7-bifunctors-5.4.2 /nix/store/d8ys5wq4wrvdjqw0bzv3y23zqpr
khjs2-base-orphans-0.5.4 /nix/store/j4hbyhnj4a2z4z4vb1437vk7ha0b287a-lens-4.15.1
 /nix/store/ra3jh12mbyz82n4gvj2bam77vl8aabbq-x509-system-1.6.4 /nix/store/ps8915
q1047frp891jg1anp85ads0s9b-x509-validation-1.6.5 /nix/store/5vrgrls6l1cdsbbznis3
9chx8scq2r98-x509-store-1.6.2 /nix/store/7vvg8y8fp0s50qiciq11irfvh31f1q58-pem-0.
2.2 /nix/store/myv75wk9s19f8vms2dcy6sl773288zy4-asn1-parse-0.9.4 /nix/store/kwyc
1jdz09lazw21qpc96wyamxalcg11-x509-1.6.5 /nix/store/gadc7c6d1lqn0wqk29bhn56is67x0
r45-cryptonite-0.21 /nix/store/ix26y5rpidwpgjzrsixz0ff59j1p1swr-foundation-0.0.7
 /nix/store/n784p4qh18zx9v8ag3n3ypszq1kifjjr-memory-0.14.3 /nix/store/h3qq6m5ahd
b4kw784gcvx2skil8ilks8-hourglass-0.2.10 /nix/store/dn65dl65spk4j0sky2zpdig75c42y
cj1-asn1-types-0.3.2 /nix/store/s5jklkk0y6i7d8h3akgsciv1kv2js786-asn1-encoding-0
.9.5 /nix/store/g5qjgns5cyz9c5xw4w5s2iji1kbhg47z-tls-1.3.10 /nix/store/iyllk46by
75f428pwis9v74jpr1rmk4x-cereal-0.5.4.0 /nix/store/b22wyyl3wdl6kb7gkpk3yxnynk340l
ya-socks-0.5.5 /nix/store/05r3i8w2n7hbxqyb4w8rina9rldyacd3-byteable-0.1.1 /nix/s
tore/xjbl6w60czyfqlfwwfs5q93by144yr1n-connection-0.2.8 /nix/store/j10yqzk323rvnw
gsk3nj7rgmvqlv035a-http-client-tls-0.3.4.1 /nix/store/vf84v2398g55mai2gjh2d9gipw
izhhzd-zlib-0.6.1.2 /nix/store/7h7vy3mi603y536dgvxwfglaacxw5ra8-async-2.1.1.1 /n
ix/store/y6hh2ifv35afw1j5phpzp1y72x532izn-streaming-commons-0.1.17 /nix/store/f5
jdarp8djisa1wrv4bv1saimrabcb3f-random-1.1 /nix/store/18vpnmd28bnjib6andw8bx522wc
b3zwa-parsec-3.1.11 /nix/store/i3ra66pcpj0v9wq3m00gh9i72br2bki3-network-uri-2.6.
1.0 /nix/store/2ck9avbwacfpi16p2ib2shw951mx33pz-network-2.6.3.1 /nix/store/rz022
7nv8n8kdrxjg3arya6r2ixxjh4h-mime-types-0.1.0.7 /nix/store/rx71j4kg0l02dginiswnmw
swdq9i9msv-http-types-0.9.1 /nix/store/y2ca4scn0n2f9qsmvsiixcnx11793jlf-transfor
mers-compat-0.5.1.4 /nix/store/bzicr83ibzzzbab6cjkb3i95sc8cvxy9-stm-2.4.4.1 /nix
/store/qk5pl6r2h0vfkhhwjgrv8x1ldf8dyj5a-mtl-2.2.1 /nix/store/0d6k71ljl108dgq1l7l
3pz12bfwv0z4h-exceptions-0.8.3 /nix/store/z5k23ymwjhhpd670a7mcsm1869hlpncf-old-l
ocale-1.0.0.7 /nix/store/k4an783d4j3m48fqhx7gpnizqg2ns38j-data-default-class-0.1
.2.0 /nix/store/p5867jsig02zi0ynww9w4916nm0k527s-cookie-0.4.2.1 /nix/store/wy7j4
2kqlw1sskagmyc1bzb0xv04s2na-case-insensitive-1.2.0.9 /nix/store/j35339b0nk7k3qaq
3m75nl3i4x603rqf-blaze-builder-0.4.0.2 /nix/store/33mip0ql9x1jjbhi34kf8izh4ilyf2
k0-base64-bytestring-1.0.0.1 /nix/store/29a73kd2jkwvfdcrhysmi5xjr7nysrxf-http-cl
ient-0.5.6.1 /nix/store/d2hy666g79qvhmbh520x5jclwvnr1gk2-text-1.2.2.1 /nix/store
/2bdzia66lg08d5zngmllcjry2c08m96j-hashable-1.2.6.0 /nix/store/7kdgc6c0b21s9j5qgg
0s0gxj7iid2wk5-unordered-containers-0.2.8.0 /nix/store/zsryzwadshszfnkm740b2412v
88iqgi4-semigroups-0.18.2 /nix/store/h2c0kz3m83x6fkl2jzkmin8xvkmfgs7s-charset-0.
3.7.1 /nix/store/gapj6j0ya5bi9q9dxspda15k50gx8f1v-ansi-terminal-0.6.2.3 /nix/sto
re/l46769n2p6rlh936zrbwznq3zxxa6mjd-ansi-wl-pprint-0.6.7.3 /nix/store/p7zmpgz0sq
5pamgrf1xvhvidc3m4cfmk-dhall-1.3.0 /nix/store/938ndd0mqfm148367lwhl6pk5smv5bm0-d
ata-fix-0.0.4 /nix/store/s0hpng652hsn40jy4kjdh1x0jm86dx9l-ghc-8.0.2"),("preferLo
calBuild","1"),("propagatedBuildInputs",""),("propagatedNativeBuildInputs",""),(
"stdenv","/nix/store/685n25b9yc8sds57vljk459ldly1xyhn-stdenv"),("system","x86_64
-linux")])
