Sunday, March 3, 2013

binary 0.7

binary 0.7 has just been released and compared to the current version included in the Haskell Platform, binary, it brings a few new things. Most of this has also been available in the 0.6.* versions which already have been available on hackage for some time.

Incremental interface and improved error reporting

This is probably the most requested feature, and was introduced already in binary 0.6.
In older versions of binary, you could run the Get monad with the following function:
runGet :: Get a -> Lazy.ByteString -> a
It takes the monad you want to execute, together a with a lazy ByteString containing all the input it will need to finish. While this is straight forward and easy to use, it does not meet all needs:
  1. There is no error reporting (compare with a function returning Maybe a, or Either ErrorMessage a). If a decoder error happens, it'll just call error :: String -> a.
  2. What if you don't have all input when you start to decode, like for example if you're reading from a large file or a socket? Previously, lazy IO has been used to address this, but we will look into why this is not always suitable.
To address (1), we introduce the data type Decoder as the return type for our new run function.
-- | Incremental interface to run a decoder.
runGetIncremental :: Get a -> Decoder a
A Decoder can have successfully finished in which case it'll be the Done constructor and have the final value, or have encountered a problem, in which case it'll be the Fail constructor together with the error message. In both cases, you will also get the remaining unused input, and the number of bytes consumed when the decoder succeeded or failed. In the next section, we will see why the this new run function does not need input as a second argument.

For (2), the traditional answer has been to use lazy I/O to get a lazy stream of data, which will hide that I/O is going on, and we can pretend that we already have all the input in memory. While this is fine for some scenarios, we have given up control over error handling and file handle resources. For some applications, this is simply not good enough.

Here's where the incremental interface can help. It works in the same way in binary as seen in other libraries for parsing. Instead of taking any input, it immediately produces a Decoder.  In addition to Done and Fail mentioned above, it also has the Partial constructor, meaning that the Decoder needs more input in order to finish.
-- | A decoder produced by running a 'Get' monad.
data Decoder a
  = Done !B.ByteString !ByteOffset a
    -- ^ The decoder has successfully finished. Except for the
    -- output value you also get any unused input as well as the
    -- number of bytes consumed.
  | Fail !B.ByteString !ByteOffset String
    -- ^ The decoder ran into an error. The decoder either used
    -- 'fail' or was not provided enough input. Contains any
    -- unconsumed input and the number of bytes consumed.
  | Partial (Maybe B.ByteString -> Decoder a)
    -- ^ The decoder has consumed the available input and needs
    -- more to continue. Provide 'Just' if more input is available
    -- and 'Nothing' otherwise, and you will get a new 'Decoder'.

GHC Generics

binary has also got support for automatically generating instances of the Binary class for your data types, using GHC's generics.
{-# LANGUAGE DeriveGeneric #-}

import Data.Binary
import GHC.Generics (Generic)

data Foo = Foo
         deriving (Generic)

-- GHC will automatically fill out the instance
instance Binary Foo
The was committed to binary by Bryan O'Sullivan, though pieces copied from the cereal package where it was written by Bas van Dijk.


binary now implements the Alternative class, and you can use the <|> combinator to try a decoder and fallback to another if the first failed. Although many binary formats don't strictly require backtracking in the decoder, it can be used for convenience or if it leads to code which is easier to read. In the example below, we parse frames in some binary format, trying first one kind of frames, then the other. If it's neither kind, we fail altogether.

data Frame = FrameA {- some fields specific to A frames -}
           | FrameB {- some fields specific to B frames -}

decodeFrame :: Get Frame
decodeFrame = decodeFrameA <|> decodeFrameB <|> fail "not a frame!"

decodeFrameA :: Get Frame
decodeFrameA = do
  word <- getWord8
  unless (isFrameAMagicWord word) $ fail "not an A-frame!"
  {- continue decoding... -}
  return (FrameA {..})

decodeFrameB :: Get Frame
decodeFrameB = do
  word <- getWord8
  unless (isFrameBMagicWord word) $ fail "not an B-frame!"
  {- continue decoding... -}
  return (FrameB {..})

In this particular example, though, it's not required to have backtracking, as we could have first decoded the first Word8, and then chosen to continue with either FrameA or FrameB. This continues to be the better choice for performance and should be preferred when possible, but one can imagine a more complicated binary format where using <|> is required or simply leads to cleaner code.


For each use of a primitive decoder, such as getWord8, getWord32le, and so on, binary needs to check whether there is sufficient input left, or if it needs to demand more input from the caller (by returning a Partial, as discussed above).

decodeWordTuple :: Get (Word32, Word16)
decodeWordTuple = do
  a <- getWord32le
  b <- getWord16le
  return (a,b)

Here, binary will check twice whether there is enough input. First for the 4 bytes of the Word32, and a second time whether there is 2 bytes left for the Word16. Once could think this is wasteful, and that we should only check once at the beginning for the 6 bytes that we know will be required. Unfortunately, with a monadic API this is not possible (maybe possible with some new fancy kind of rewrite rules?).
When using the applicative class though, binary tries to be clever and join these checks.

decodeWordTuple :: Get (Word32, Word16)
decodeWordTuple = (,) <$> getWord32le <*> getWord16le

It does this by using GHC's rewrite rules to rewrite the expression into a single decoder. Note that for this to make a worthwhile difference to the performance of your application, your application needs to already be quite tuned. The rewriting also relies heavily on that all decoder functions gets inlined, and can therefore be a bit hard to trigger.

Backwards compatibility

I'm pleased to say that while the internals of binary has undergone major changes, it still largely provides the same API to the developer. A few convenience functions have been added, and a few functions that doesn't make sense any more have been removed. Read the haddocks for the full API here.

decodeFileOrFail :: Binary a => FilePath
                 -> IO (Either (ByteOffset, String) a)
decodeOrFail :: Binary a =>
  Lazy.ByteString -> Either (Lazy.ByteString, ByteOffset, String)
                            (Lazy.ByteString, ByteOffset, a)

runGetIncremental :: Get a -> Decoder a
runGetOrFail :: Get a -> Lazy.ByteString
  -> Either (Lazy.ByteString, ByteOffset, String)
            (Lazy.ByteString, ByteOffset, a)
pushChunk :: Decoder a -> ByteString -> Decoder a
pushChunks :: Decoder a -> Lazy.ByteString -> Decoder a
pushEndOfInput :: Decoder a -> Decoder a
uncheckedLookAhead :: Int64 -> Get Lazy.ByteString
uncheckedSkip :: Int64 -> Get ()

The functions lookAhead and lookAheadM which disappeared in binary-0.6.* and have been brought back in this release, making it easier to transition from binary-0.5.


The package is available on hackage.
The code is available on github.

Contributors since the last Haskell Platform version:

  • Lennart Kolmodin
  • Johan Tibell
  • Bryan O'Sullivan
  • Bas van Dijk
  • Paolo Capriotti
  • Eyal Lotem
  • Gabor Greif


Andrea Vezzosi said...

How does the performance compare to binary-0.5? is the better error reporting costing much?

Lennart Kolmodin said...

binary is now using continuation-passing style, which GHC can generate efficient code from. The old binary was a state monad.

binary-0.7 seems to be as fast as binary-0.5, never slower, and at times up to 5x faster (or even more in some extreme cases).
Of course, to see a large speedup it requires that your code is already quite tuned. If the biggest cost in your decoder is not the code from binary, then even binary-0.7 won't make it faster.

So, it seems the error reporting comes for free.

fmaste said...

Have you thought of using the operational package to solve the problem of the sufficient input left? You could reify the monad and obtain something like an AST of the parser. Although I don't know about the performance of the solution.

Lennart Kolmodin said...

'operational' looks like deep embedding of a dsl.
binary indeed does something similar. However, binary is shipped together with GHC, so I don't want to add dependencies which are not strictly required.

fmaste said...

Do you think that knowing in advance how many bytes a monadic function like for example {getWord8 >> getWord8 >> getWord8} needs will provide a significant speed improvement?

Is it possible to do this with the kind of deep embedding you have now?

If yes to the first question and no to the second I can try building a wrapper package using the operational or free package to see how it works.

Lennart Kolmodin said...

It's not possible to build a monadic parser that statically can determine how much input it'll require, so the binary package is not able to do this.
It does not matter whether it's a deep or shallow embedding, it just can't be done with a monad.
To statically determine the required input, I've found the best way is to use arrows.
Read about the limitation of the monads, and why arrows can solve this, here's a great tutorial I found:

You can find my binary-arrow experiment here;
However, it seems that it does not always result in a great speedup, it seems that doing allocation is more expensive than boundary checks, and therefore there is little to save by eliminating the boundary checks.
Also, the way I wrote the arrow library, it's not possible to have recursive parsers due to the very agressive inlining that it does.
Next, I wanted to rewrite it into CPS in hope that GHC can optimise better, but I have not yet come around to doing it.
If you come to a different conclusion about the efficiency of statically knowing more about the parser, I'm of course interested to hear :)

fmaste said...

Thanks for such a complete answer, it seems that you have tried all the possibilities. I'll look at them and specially the arrows tutorial.