Thursday, July 19, 2012

First-class modules without defaults

Recently Chris Doner proposed a first-class module approach which uses the Default type-class and then I revised his approach to not use type-classes at all, encoding all the information in dictionaries. I'm using this post to expand upon my variation of Chris's approach and show how one would translate his approach to my approach and explain what I believe are the advantages of this improvement.


Dictionaries


This approach builds off the classic encoding of a type-class as a dictionary. The trick is simple, first you convert a module to an equivalent type-class representing that module's interface, then you convert that type-class to a dictionary.

As an example, I'm going to start from Dan Burton's Modular Prelude and rework his ByteStringModule type to use my improvement and then show how they differ.

Dan's ByteStringModule looks like this:
data ByteStringModule = S
  { map :: (Word8 -> Word8) -> ByteString -> ByteString
  , concatMap :: (Word8 -> ByteString) -> ByteString -> ByteString
  , filter :: (Word8 -> Bool) -> ByteString -> ByteString
  , length :: ByteString -> Int
  , singleton :: Word8 -> ByteString
  , null :: ByteString -> Bool
  , pack :: [Word8] -> ByteString
  , unpack :: ByteString -> [Word8]
  , empty :: ByteString
  , readFile :: FilePath -> IO ByteString
  , writeFile :: FilePath -> ByteString -> IO ()
  , break :: (Word8 -> Bool) -> ByteString -> (ByteString, ByteString)
  , span :: (Word8 -> Bool) -> ByteString -> (ByteString, ByteString)
  , dropWhile :: (Word8 -> Bool) -> ByteString -> ByteString
  , takeWhile :: (Word8 -> Bool) -> ByteString -> ByteString
  , any :: (Word8 -> Bool) -> ByteString -> Bool
  , all :: (Word8 -> Bool) -> ByteString -> Bool
  , splitAt :: Int -> ByteString -> (ByteString, ByteString)
  }


_Data_ByteString_ :: ByteStringModule
_Data_ByteString_ = S
  { null = ...
    ... }

instance Default ByteStringModule where
  def = _Data_ByteString_
To use my approach, you instead parametrize the dictionary on the type of the "string-like" thing:
data StringModule s = String
  { map :: (Word8 -> Word8) -> s -> s
  , concatMap :: (Word8 -> s) -> s -> s
  , filter :: (Word8 -> Bool) -> s -> s
  , length :: s -> Int
  , singleton :: Word8 -> s
  , null :: s -> Bool
  , pack :: [Word8] -> s
  , unpack :: s -> [Word8]
  , empty :: s
  , readFile :: FilePath -> IO s
  , writeFile :: FilePath -> s -> IO ()
  , break :: (Word8 -> Bool) -> s -> (s, s)
  , span :: (Word8 -> Bool) -> s -> (s, s)
  , dropWhile :: (Word8 -> Bool) -> s -> s
  , takeWhile :: (Word8 -> Bool) -> s -> s
  , any :: (Word8 -> Bool) -> s -> Bool
  , all :: (Word8 -> Bool) -> s -> Bool
  , splitAt :: Int -> s -> (s, s)
  }
Using this approach you can encode String, ByteString and Text, all using the same dictionary type:
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import qualified Data.Text as T
import qualified Prelude as P

lazyByteString :: StringModule L.ByteString
lazyByteString = String {
    null = L.null,
    ... }

strictByteString :: StringModule S.ByteString
strictByteString = String {
    null = S.null,
    ... }

text :: StringModule T.Text
text = String {
    null = T.null,
    ... }

string :: StringModule P.String
string = String {
    null = P.null,
    ... }
Here I've parametrized the dictionary on the "string-like" type, so we can reuse the same dictionary type for all of them. This represents the dictionary equivalent of the following type-class:
class StringModule s where
    map :: (Word8 -> Word8) -> s -> s
    concatMap :: (Word8 -> s) -> s -> s
    filter :: (Word8 -> Bool) -> s -> s
    length :: s -> Int
    singleton :: Word8 -> s
    null :: s -> Bool
    pack :: [Word8] -> s
    unpack :: s -> [Word8]
    empty :: s
    readFile :: FilePath -> IO s
    writeFile :: FilePath -> s -> IO ()
    break :: (Word8 -> Bool) -> s -> (s, s)
    span :: (Word8 -> Bool) -> s -> (s, s)
    dropWhile :: (Word8 -> Bool) -> s -> s
    takeWhile :: (Word8 -> Bool) -> s -> s
    any :: (Word8 -> Bool) -> s -> Bool
    all :: (Word8 -> Bool) -> s -> Bool
    splitAt :: Int -> s -> (s, s)    

Comparison


This improvement has several advantages over Chris's original approach:
  • You don't have to define a new data structure for each "instance" of the module.
  • The shared dictionary type guarantees that all "instance"s expose the same signature
  • You can program generically over the StringModule "class"
  • No type-classes are required, so you can define multiple competing "instances" of the same module without conflicts.
  • You can qualify first-class modules instead of completely unpacking them.

Syntax


Now I'll show how you syntactically translate all the features of Chris's modules to my improved version. Let's assume we have some module Data.String, that exports the above four StringModule instances (i.e. lazyByteString, strictByteString, text, and string.

Using Chris's approach, you distinguish which module you wish to unpack into the current scope by choosing which constructor you unpack into:
zot :: (L.ByteString, S.ByteString)
zot = (a,b) where
  a = pack [1,2,3] where L{..} = def
  b = pack [1,2,3] where B{..} = def
Using the improved version, you distinguish which module you wish to unpack by explicitly selecting which dictionary you unpack:
zot = (a,b) where
  a = pack [1,2,3] where String{..} = lazyByteString
  b = pack [1,2,3] where String{..} = strictByteString
Hmmm, those are long names. Wouldn't it be nice if we could somehow alias them?
l = lazyByteString
s = strictByteString

zot = (a,b) where
  a = pack [1,2,3] where String{..} = l
  b = pack [1,2,3] where String{..} = s
Oh yeah! Dictionaries are first-class because they are ordinary Haskell values, so renaming them is easy. We can even locally alias modules, something which the ordinary module system cannot do:
zot = (a,b) where
  l = lazyByteString
  s = strictByteString
  a = pack [1,2,3] where String{..} = l
  b = pack [1,2,3] where String{..} = s
Also, using my improvement we can program generically over the string dictionary type:
zot :: StringModule s -> (s, s)
zot s = (a,b) where
  String{..} = s
  a = pack [1,2,3]
  b = pack [1,2,3]
This is just the dictionary version of a class constraint, where you pass the instance as an ordinary parameter. This is not possible using Chris's approach, which is one reason I am advocating this change.

Another advantage of this improvement is that you don't even need to unpack the module at all. You get qualification for free:
l = lazyByteString
s = strictByteString

zot = (a,b) where
  a = pack l [1,2,3]
  b = pack s [1,2,3]
In fact, if you alias first-class modules to single-letter names (as you might do with ordinary modules), then the syntactic overhead is identical to using ordinary modules: 2 extra characters, except as a suffix instead of a prefix.

Using Chris's approach, you would have to use the ordinary module system to qualify which pack you meant:
zot s = (a, b) where
  a = L.pack def [1,2,3]
  b = S.pack def [1,2,3]
So using his approach there is actually more syntactic overhead for qualifying imports, plus you must rely on the ordinary module system to namespace qualified imports.

Also, using his approach there is no good way to nest a module within a module and still qualify a nested import, since you can't program generically over the outer module. So there would be no good way to do something like this:
data OuterModule a = Out { outVal :: a }
data InnerModule a = In  {  inVal :: a }

dict :: OuterModule (InnerModule String)
dict = Out (In "Hello, world!")

contrived :: String
contrived = inVal (outVal dict) -- nested qualified import


Common features


No matter which approach you like, there are several cool features that both approaches share. For example, you can unpack unqualified names into the top-level global namespace. For Chris's approach you would insert the following top-level declaration:
L {..} = def
... and for my variation you would use:
String {..} = lazyByteString
Also, I will argue (to the death!) that both approaches are superior to type-class-based approaches. While I believe type-classes are okay for theoretically-grounded constructs (like Monad or Category), I believe that the dictionary approach is superior for banal interfaces like ListLike/ListModule or StringLike/StringModule since it is completely first-class.


Conclusions


This post is NOT intended to rip on Chris, but simply to improve on his original proposal. He had the two brilliant ideas of both using "type-classes" as modules and using the RecordWildCards extension to unpack names unqualified. I think the only mistake he made was unnecessarily filtering everything through the `Default` type-class and I only want to say that I think the pure dictionary approach is a strict improvement on his otherwise already brilliant idea.

15 comments:

  1. I'm thinking. A common porpouse of modules is to hide implementation info to the user, then they are not only interfaces.

    For example:

    --------------
    module (Stack,push,pop,emptyS,isEmptyS) where

    data Stack a = MkStack {stack :: [a]} -- we don't export the abstraction function

    push x = MkStack . (x:) . stack
    pop = MkStack . tail . stack
    emptyS = MkStack []
    isEmptyS = null . stack
    --------------

    And the user doesn't know that I implement a Stack using a List. I can impose invariants too, because I know that the user cannot get access to any implementation of functions provided by the interface. I can't do the same with these first-class modules that you are talking about, right? (I don't see how)

    ReplyDelete
    Replies
    1. module Gabriel.Stack (S.Stack, someStackInstance)

      import qualified Sawady.Stack as S

      data StackLike a s = StackLike {
      push :: a -> s -> s,
      pop :: s -> s,
      emptyS :: s,
      isEmptyS :: s -> Bool }

      someStackInstance :: StackLike a (S.Stack a)
      someStackInstance = StackLike {
      push = S.push,
      pop = S.pop,
      emptyS = S.emptyS,
      isEmptyS = S.isEmptyS }

      -- Some other module
      {-# LANGUAGE RecordWildCards #-}

      import Gabriel.Stack

      StackLike{..} = someStackInstance

      Think of these dictionaries as being just like type classes. The above is the dictionary equivalent of a multi-parameter type-class.

      Delete
    2. Oh sorry, I thought that you want to replace the haskell module system too, but I see that It's only for typeclasses.

      One last question though, how can I provide a default implementation for some function of the interface (like typeclasses can)

      Delete
    3. No, it IS a replacement for the Haskell module system. Chris's big insight was that modules and type-classes are basically the same thing.

      For default function implementations, I create a smart constructor where you pass it the minimal subset necessary to define the interface and it fills out the rest.

      Delete
    4. Why do you use "module" and "import" keywords in your example then?

      My point is: I can choose what to export with modules but this don't apply for typeclasses. Are they same thing? Modules don't provide had-oc polymorphism by themselves either.

      Delete
    5. The import was just a cute thing to refer to your pseudocode. You could just as well have put those in the same file as the functions you wrote, assuming you rename your hidden primitives and only export the dictionary.

      So, there is only one thing you cannot conveniently do, which is selectively hide certain members. You can still do something like:

      Stack push pop emptyS _ = someStackInstance

      ... but this would grow cumbersome if it was a big module and you only wanted to hide one member.

      However, selectively importing just a few elements is pretty easy:

      (push, pop) = ((,) <$> push <*> pop) someStackInstance

      Ordinary modules don't provide ad-hoc polymorphism, but modules as dictionaries do (as I illustrated in the post). This is an advantage of first-class modules over ordinary modules. This lets you just import just one module for bytestrings (i.e. Data.ByteString could now export both strict and lazy bytestring implementations), and then you can very easily switch between both implementations using all the tricks I described above. Ordinary modules don't allow you to locally alias modules or unpack names within a local scope, but first-class modules do.

      Delete
  2. What you call "Chris's big insight" is well known Haskell folklore.

    ReplyDelete
    Replies
    1. Oops! :)

      Although, I think the Haskell old guard takes for granted how much the community newcomers (like me) understand. If there's no oral tradition then we are forced to reinvent everything ourselves.

      Delete
  3. This can't really be considered a replacement for Haskell modules when it only encapsulates value components as opposed to the full shebang of values, data types, classes, and instances. How would I define, e.g., a Set type and hide the constructors so that clients are forced to use my provided functions to interact with it?

    What this shows is that Haskell modules could really use a notion of interface -- analogous to your moduley data types -- to allow matching, parameterization, and perhaps even separate compilation.

    ReplyDelete
    Replies
    1. Yeah, several people on reddit pointed this out to me. I agree with your entire comment.

      Delete
    2. By the way, in case you haven't seen it, this paper is highly relevant:

      Mark P. Jones, "Using Parameterized Signatures to Express Modular Structure", POPL 1996.

      Delete
    3. No, I haven't. I will read it. Thanks!

      Delete
    4. Note that you can do what you ask with GADTs or ExistentialQuantification, although there do appear to be limitations (e.g. no defining of classes within a value etc.):

      {-# LANGUAGE ExistentialQuantification, Rank2Types, RecordWildCards #-}
      import qualified Data.Set as Set

      data SetModule = forall s. SetModule {
      empty :: forall a. s a
      , insert :: forall a. Ord a => a -> s a -> s a
      , null :: forall a. s a -> Bool
      }

      dataSetM :: SetModule
      dataSetM = SetModule {
      empty = Set.empty
      , insert = Set.insert
      , null = Set.null
      }


      useSets :: SetModule -> Bool
      useSets sm@SetModule{..} = null (insert 3 empty)

      Delete
  4. String and Text work with Char, not Word8. ByteString works with Word8, not Char. How does this design address that?

    ReplyDelete
    Replies
    1. You can add another type parameter for the individual element type:

      data StringModule s c = String
      __{ map :: (c -> c) -> s -> s
      __, concatMap :: (c -> s) -> s -> s
      __, filter :: (c -> Bool) -> s -> s
      __, length :: s -> Int
      __...

      .. then you'd have a module of type `StringModule ByteString Word8` for bytestrings and a module of type `StringModule Text Char` for text.

      Delete