Serialisation of PXP DTDs
Date: -- (:)
From: Mauricio Fernandez <mfp@a...>
Subject: Re: [Caml-list] Re: Serialisation of PXP DTDs
On Thu, Oct 23, 2008 at 12:26:54PM -0700, Dario Teixeira wrote:
> > I mean, as long as types are as simple as pairs we will
> > probably write down the very same S-expression, but for more
> > complex types you end up having to choose how to encode them
> > in S-expressions.  Such design choices may need to be changed
> > in the future as more types are supported.  I fail to see
> > why the future-proofness of such choices
> > should be better than that of bin-prot.
> Hi,
> Well, there's several types of "future-proofness".  If in the far-future I
> was faced with the task of reverse-engineering and deserialising a structure
> about whose contents I only had a rough idea, then a human-readable
> text-format like that of S-expressions would simplify things enormously.  On
> a more down-to-earth scenario, bear in mind that S-expressions offer
> forward-compatibility as long as you are only adding to a structure.
> For example, suppose I have a type foobar_t with two
> constructors:
> type foobar_t = One | Two
> If later on I add a third constructor "Three" to this type,
> the deserialiser for the new version can still read S-expressions
> written with the serialiser for the old version.

I have been working for a while on a self-describing, compact, extensible
binary protocol, along with an OCaml implementation which I intend to release
before too long.

It differs from sexplib and bin-prot in two main ways:
* the data model is deliberately more limited, as the format is meant to be
  de/encodable in multiple languages.
* it is extensible at several levels, achieving both forward and backward
  compatibility across changes in the data type.

You can think of it as an extensible Protocol Buffers[1] with a richer data
model (albeit not in 1:1 accordance with OCaml's, for the above-mentioned
reasons).

Of the criteria you gave in another message, namely
(1) ease of use
(2) "future-proofness"
(3) portability
(4) human-readability,

it does fairly well at the first three, especially (2) and (3), which
were poorly supported by existing solutions (I looked into bin-prot, sexplib,
Google's Protocol Buffers, Thrift and XDR; I also referred to IIOP and ITU-T
X.690 DER during the design). Being a binary format, it obviously doesn't do
that well at (4), but it is possible to get a human-readable dump of the
binary data even in the absence of the interface definition, making
reverse-engineering no harder than with sexplib (and arguably easier in some
ways).

For example, here's a bogus message definition to illustrate (2) and (4).
This protocol definition is fed to the compiler, which generates the OCaml
type definitions, as well as the encoders/decoders and pretty-printers (as you
can see, the specification uses a mix of OCaml, Haskell and C++ syntax, but
it's pretty clear IMO):

    type sum_type 'a 'b 'c = A 'a | B 'b | C 'c

    message complex_rtt =
      A {
        a1 : [(int * [|bool|])];
        a2 : [ sum_type<int, string, long> ]
      }
    | B {
        b1 : bool;
        b2 : (string * [int])
      }
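
As a rough illustration only, a definition like the one above could map onto
OCaml types along the following lines. The module names Sum_type and
Complex_rtt follow the pretty-printer output shown later in this message; the
record type names a_fields and b_fields, and the use of Int64.t for "long",
are assumptions made for the sketch. The code actually generated by the
compiler may well differ.

```ocaml
(* Hypothetical sketch of types a compiler could generate from the
   definition above; this is not the tool's actual output. *)
module Sum_type = struct
  type ('a, 'b, 'c) t = A of 'a | B of 'b | C of 'c
end

module Complex_rtt = struct
  (* Fields carried by the A constructor. *)
  type a_fields = {
    a1 : (int * bool array) list;
    (* Mapping "long" to Int64.t is an assumption. *)
    a2 : (int, string, Int64.t) Sum_type.t list;
  }

  (* Fields carried by the B constructor. *)
  type b_fields = {
    b1 : bool;
    b2 : string * int list;
  }

  type t = A of a_fields | B of b_fields
end

(* A small value of the A variant, built by hand. *)
let example : Complex_rtt.t =
  Complex_rtt.A {
    Complex_rtt.a1 = [ (-5378, [| false; true |]) ];
    a2 = [ Sum_type.A 83 ];
  }
```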

The protocol is extensible in the sense that you can add new constructors to a
sum or message type, add new elements to a tuple, and replace any primitive
type by a sum type including the original type. For instance, if at some point
in time we find that the b1 field should have a different type, we can do

    type bool_or_something 'a = Orig unboxed_bool | New_constructor 'a

and then

    | B { b1 : bool_or_something<some_type>; ... }

This, along with a way to specify default values, allows both forward and
backward compatibility.
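
To make the idea concrete, here is a toy sketch in OCaml of how widening b1
from a bool to a sum type can stay compatible in both directions. Everything
in it is an assumption made for illustration: the wire type, the tag number 1,
and the function names are hypothetical, and "unboxed" is read here as meaning
that Orig is written exactly like the original bare bool. The real encoding is
not described in this message.

```ocaml
(* Toy wire representation; purely illustrative. *)
type wire =
  | Prim_bool of bool
  | Prim_string of string
  | Tagged of int * wire list   (* constructor tag plus arguments *)

type 'a bool_or_something = Orig of bool | New_constructor of 'a

(* New writer: Orig is emitted as the bare primitive (the "unboxed" case),
   so readers built against the old definition still understand it. *)
let encode_b1 encode_arg = function
  | Orig b -> Prim_bool b
  | New_constructor x -> Tagged (1, [ encode_arg x ])

(* New reader: accepts data from old writers (bare bool) and new writers. *)
let decode_b1_new decode_arg = function
  | Prim_bool b -> Some (Orig b)
  | Tagged (1, [ arg ]) ->
      (match decode_arg arg with
       | Some x -> Some (New_constructor x)
       | None -> None)
  | _ -> None

(* Old reader: only knows bool, and falls back to a default value when it
   meets a constructor it has never heard of. *)
let decode_b1_old ~default = function
  | Prim_bool b -> b
  | _ -> default

let encode_string s = Prim_string s
let decode_string = function Prim_string s -> Some s | _ -> None
```

In this sketch the new reader handles old data because a bare bool decodes to
Orig, and the old reader handles new data because an unknown constructor is
replaced by the default.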

The compiler generates a pretty printer for these structures, useful for
debugging. Here's a message generated randomly:

  {
  Complex_rtt.a1 =
   [ ((-5378), [| false; false; false; true; true |]);
     (3942717140522000971, [| false; true; true; true; false |]);
     ((-6535386320450295), [| false |]); ((-238860767206), [|  |]);
     (1810196202, [| false; false; true; true |]) ];
  Complex_rtt.a2 =
   [ Sum_type.A (-13830); Sum_type.A 369334576; Sum_type.A 83;
     Sum_type.A (-3746796577167465774); Sum_type.A (-1602586945) ] }

Now, this is the information decoded in the absence of the above definitions
(in other words, what you'd have to work with if you were reverse-engineering
the format):

T0 {
     T0 [
          T0 { Vint_t0 (-5378);
               T0 [ Vint_t0 0; Vint_t0 0; Vint_t0 0; Vint_t0 (-1);
                    Vint_t0 (-1)]};
          T0 { Vint_t0 3942717140522000971;
               T0 [ Vint_t0 0; Vint_t0 (-1); Vint_t0 (-1); Vint_t0 (-1);
                    Vint_t0 0]};
          T0 { Vint_t0 (-6535386320450295); T0 [ Vint_t0 0]};
          T0 { Vint_t0 (-238860767206); T0 [ ]};
          T0 { Vint_t0 1810196202;
               T0 [ Vint_t0 0; Vint_t0 0; Vint_t0 (-1); Vint_t0 (-1)]}];
     T0 [ T0 { Vint_t0 (-13830)}; T0 { Vint_t0 369334576}; T0 { Vint_t0 83};
          T0 { Vint_t0 (-3746796577167465774)}; T0 { Vint_t0 (-1602586945)}]}

(I'm still changing some details so it might look better than this shortly.)
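
For anyone wondering how such an untyped dump relates to the typed view, here
is a small OCaml sketch. The value type and its constructor names are
hypothetical stand-ins for the generic tree (they are not the library's API);
the only facts taken from the dump above are that tuples and lists show up as
nested groups of values and that false and true appear as 0 and -1.

```ocaml
(* Hypothetical generic tree mirroring the dump; names are illustrative. *)
type value =
  | Vint of int
  | Tuple of value list    (* printed with braces in the dump *)
  | Htuple of value list   (* printed with brackets in the dump *)

(* In the dump, false appears as 0 and true as -1. *)
let to_bool = function
  | Vint 0 -> Some false
  | Vint (-1) -> Some true
  | _ -> None

(* Apply a partial conversion to every element; fail if any element fails. *)
let rec all_some f = function
  | [] -> Some []
  | x :: xs ->
      (match f x, all_some f xs with
       | Some y, Some ys -> Some (y :: ys)
       | _ -> None)

(* Recover one element of a1: an int paired with an array of bools. *)
let to_a1_elem = function
  | Tuple [ Vint n; Htuple bs ] ->
      (match all_some to_bool bs with
       | Some l -> Some (n, Array.of_list l)
       | None -> None)
  | _ -> None
```

Applied to the first entry of the dump, to_a1_elem gives back the pair shown
first in the pretty-printed a1 field.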

It's not a drop-in solution like sexplib's "with sexp", by design (since it is
meant to allow interoperability between different languages), but it's still
fairly easy to use.

If you're interested in this, tell me and I'll let you know when it's ready for
serious usage.


Mauricio Fernandez