Serialisation of PXP DTDs
From: Mauricio Fernandez <mfp@a...>
Subject: Re: [Caml-list] Re: Serialisation of PXP DTDs
On Fri, Oct 24, 2008 at 10:03:47AM -0400, Markus Mottl wrote:
> On Fri, Oct 24, 2008 at 5:11 AM, Mikkel Fahnøe Jørgensen
> <mikkel@dvide.com> wrote:
> > I guess this discussion is an overkill for the problem at hand, but
> > speaking of binary extensible protocols, have you looked at ASN.1? It
> > is an abstraction over any number of encodings. At least one binary
> > encoding has extension bits to allow future growth of object
> > collections and similar.
> 
> Note that it is perfectly safe to grow sum types with bin-prot.  It
> was designed that way intentionally.  It's just not safe to reorder or
> remove elements.  Nobody needs to reorder elements, because it doesn't
> make any operational difference in the program.  Backward
> compatibility of protocols you define necessarily requires the
> presence of old constructors in sum types anyway so you may not want
> to remove those in any case.  There is hardly any harm from the
> protocol perspective in leaving old constructors in there.
> 
> Note, too, that polymorphic variants even allow reordering with
> bin-prot. (...)
> 
> Except for human-readability, I think bin-prot should scale very well
> on the other requirements of serialization protocols once it has been
> ported to architectures with unusual endianness (almost all machines
> are little endian nowadays so hardly anybody on this list should be
> affected).

Unfortunately, growing sum types is far from being the only protocol extension
of interest. There is a trivial extension which, I suspect, will be at
least as common in practice, namely adding new fields to a record (or new
elements to a tuple). bin-prot cannot handle it adequately: a
self-describing format like the one I'm working on is required.
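To make the failure concrete, here is a minimal OCaml sketch assuming a toy positional encoding, where values are written back to back with no field names and no lengths. It models only the non-self-describing property, not bin-prot's actual wire format, and `write_int`, `read_int`, `write_new_msg` and `read_old_msg` are hypothetical helpers. A new producer adds field `b`; an old consumer that still expects only `a` then misreads whatever follows the record:

```ocaml
(* Toy positional encoding: values written back to back as 8-byte
   little-endian ints, with no field names and no lengths. Hypothetical;
   it models only the non-self-describing property, not bin-prot's format. *)
let write_int buf n = Buffer.add_int64_le buf (Int64.of_int n)

let read_int bytes pos =
  let n = Int64.to_int (Bytes.get_int64_le bytes !pos) in
  pos := !pos + 8;
  n

type old_record = { a : int }
type new_record = { a' : int; b : int }

(* The new producer writes the extended record followed by a sequence number. *)
let write_new_msg (r : new_record) (seq : int) : bytes =
  let buf = Buffer.create 24 in
  write_int buf r.a';
  write_int buf r.b;
  write_int buf seq;
  Buffer.to_bytes buf

(* The old consumer still expects the record to contain only [a]. *)
let read_old_msg (bytes : bytes) : old_record * int =
  let pos = ref 0 in
  let a = read_int bytes pos in
  let seq = read_int bytes pos in
  ({ a }, seq)

let () =
  let { a }, seq = read_old_msg (write_new_msg { a' = 1; b = 2 } 42) in
  (* Field b is silently misread as the sequence number. *)
  Printf.printf "a = %d, seq = %d\n" a seq
```

Running it prints `a = 1, seq = 2`: nothing in the bytes tells the old reader where the record ends, so it does not fail, it just decodes the wrong value.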

You might argue that this extension is subsumed by the ability to grow sum types,
since you can go from

    type record = { a : int } with bin_io
    type msg = A of record with bin_io

to 

    type record1 = { a : int } with bin_io
    type record2 = { a' : int; b : int } with bin_io
    type msg = A of record1 | B of record2 with bin_io

(Note how special care has to be taken to tag the record by wrapping it in
a constructor: "explicit tagging" in ASN.1 parlance.)

However, this merely solves part of the problem: it guarantees that every
serialization according to the old type is also a valid serialization of the
updated type, or, in other words, that new consumers can read data written
by old producers. Even with the above encoding (which has to be carefully
constructed; an arbitrary type definition won't do), bin-prot still requires
that producers not be updated before consumers, since an old consumer cannot
decode the new constructor.
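A minimal sketch of this asymmetry, assuming a toy tag-indexed encoding: one byte for the constructor index, then the payload fields back to back. The layout, the `Old`/`New` modules and the `Unknown_constructor` exception are hypothetical and only loosely modelled on bin-prot, which likewise identifies constructors by their position in the type definition:

```ocaml
(* Toy tag-indexed encoding of a sum type: one byte for the constructor
   index, then payload fields as 8-byte little-endian ints. Hypothetical
   layout, only loosely modelled on bin-prot. *)
exception Unknown_constructor of int

let write_int buf n = Buffer.add_int64_le buf (Int64.of_int n)

let read_int bytes pos =
  let n = Int64.to_int (Bytes.get_int64_le bytes !pos) in
  pos := !pos + 8;
  n

type record1 = { a : int }
type record2 = { a' : int; b : int }

(* Old protocol: a single constructor, index 0. *)
module Old = struct
  type msg = A of record1

  let write (A r) =
    let buf = Buffer.create 9 in
    Buffer.add_uint8 buf 0;
    write_int buf r.a;
    Buffer.to_bytes buf

  let read bytes =
    let pos = ref 1 in
    match Bytes.get_uint8 bytes 0 with
    | 0 -> A { a = read_int bytes pos }
    | tag -> raise (Unknown_constructor tag)
end

(* New protocol: B appended at the end, so A keeps index 0. *)
module New = struct
  type msg = A of record1 | B of record2

  let write msg =
    let buf = Buffer.create 17 in
    (match msg with
     | A r -> Buffer.add_uint8 buf 0; write_int buf r.a
     | B r -> Buffer.add_uint8 buf 1; write_int buf r.a'; write_int buf r.b);
    Buffer.to_bytes buf

  let read bytes =
    let pos = ref 1 in
    match Bytes.get_uint8 bytes 0 with
    | 0 -> A { a = read_int bytes pos }
    | 1 ->
        let a' = read_int bytes pos in
        let b = read_int bytes pos in
        B { a'; b }
    | tag -> raise (Unknown_constructor tag)
end

let () =
  (* New consumer, old data: decodes, because A kept constructor index 0. *)
  (match New.read (Old.write (Old.A { a = 1 })) with
   | New.A { a } -> Printf.printf "new consumer read A %d\n" a
   | New.B _ -> print_endline "unexpected");
  (* Old consumer, new data: fails as soon as the new constructor B is used. *)
  match Old.read (New.write (New.B { a' = 1; b = 2 })) with
  | Old.A _ -> print_endline "unexpected"
  | exception Unknown_constructor tag ->
      Printf.printf "old consumer: unknown constructor %d\n" tag
```

Old data decodes fine with the new reader, while the old reader raises `Unknown_constructor 1` the moment a new producer emits `B`; hence the update-consumers-first ordering.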

My design lifts that restriction and allows an old consumer to read the data
from a new producer when new fields have been added to a record or a tuple. 
It even allows a node to operate on data it doesn't understand completely
(e.g., when a new constructor is used): it can for instance update one
field it does know while leaving those it is unable to interpret (or doesn't
even know about!) unmodified. I think this is very important in many of the
scenarios where one would need an extensible binary protocol. Google's
Protocol Buffers support this; I'm not sure this is explicitly supported by
Facebook's Thrift compiler, but IIRC the protocol should allow it.
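The field-preservation idea can be sketched with a tag-length-value layout in the spirit of Protocol Buffers' numbered fields. This is a minimal illustration under stated assumptions (one-byte tags and lengths, integers stored as decimal text for brevity), not the format described in this message; `encode_fields`, `decode_fields` and `increment_a` are hypothetical names. An old consumer that only knows tag 1 updates that field and passes every other field through verbatim:

```ocaml
(* Self-describing toy format: each field is (tag byte, length byte, payload).
   Hypothetical layout for illustration; payloads must be shorter than 256 bytes. *)
type field = { tag : int; payload : string }

let encode_fields (fields : field list) : string =
  let buf = Buffer.create 64 in
  List.iter
    (fun { tag; payload } ->
      Buffer.add_char buf (Char.chr tag);
      Buffer.add_char buf (Char.chr (String.length payload));
      Buffer.add_string buf payload)
    fields;
  Buffer.contents buf

let decode_fields (s : string) : field list =
  let rec go pos acc =
    if pos >= String.length s then List.rev acc
    else
      let tag = Char.code s.[pos] in
      let len = Char.code s.[pos + 1] in
      let payload = String.sub s (pos + 2) len in
      (* The length prefix lets a reader skip fields it cannot interpret. *)
      go (pos + 2 + len) ({ tag; payload } :: acc)
  in
  go 0 []

(* Old consumer: knows only tag 1 (field [a]); every other field is opaque. *)
let increment_a (msg : string) : string =
  decode_fields msg
  |> List.map (fun f ->
         if f.tag = 1 then
           { f with payload = string_of_int (int_of_string f.payload + 1) }
         else f (* unknown field: kept verbatim *))
  |> encode_fields

let () =
  (* A new producer writes a = 10 (tag 1) plus a field b = 7 (tag 2)
     that the old consumer has never heard of. *)
  let from_new_producer =
    encode_fields [ { tag = 1; payload = "10" }; { tag = 2; payload = "7" } ]
  in
  decode_fields (increment_a from_new_producer)
  |> List.iter (fun f -> Printf.printf "tag %d: %s\n" f.tag f.payload)
```

It prints `tag 1: 11` and `tag 2: 7`: field `b` survives the round trip even though the old consumer never knew it existed.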

AFAICS the ability to process data not understood in full requires the use of
a self-describing format like the one I'm working on.

-- 
Mauricio Fernandez  -   http://eigenclass.org