Serialisation of PXP DTDs
Date: 2008-10-23 (22:17)
From: Gerd Stolpmann <info@g...>
Subject: Re: [Caml-list] Re: Serialisation of PXP DTDs

On Thursday, 23.10.2008 at 23:05 +0200, Mauricio Fernandez wrote:
> I have been working for a while on a self-describing, compact, extensible
> binary protocol, along with an OCaml implementation which I intend to release
> in not too long.
>
> It differs from sexplib and bin-prot in two main ways:
> * the data model is deliberately more limited, as the format is meant to be
>   de/encodable in multiple languages.
> * it is extensible at several levels, achieving both forward and backward
>   compatibility across changes in the data type
>
> You can think of it as an extensible Protocol Buffers[1] with a richer data
> model (albeit not in 1:1 accordance with OCaml's for the above mentioned
> reason).

Have you looked at ICEP (see zeroc.com)? It has bindings for many
languages, even for OCaml (http://oss.wink.com/hydro/). It is, however,
not self-describing. Anyway, you may find ideas for portability there.

Gerd

> In the criteria you gave in another message, namely
> (1) ease of use
> (2) "future-proofness"
> (3) portability
> (4) human-readability,
>
> it does fairly well at the first three --- especially at (2) and (3), which
> were poorly supported by existing solutions (I looked into bin-prot, sexplib,
> Google's Protocol Buffers, Thrift and XDR; I also referred to IIOP and ITU-T
> X.690 DER during the design). Being a binary format, it obviously doesn't do
> that well at (4), but it is possible to get a human-readable dump of the
> binary data even in the absence of the interface definition, making
> reverse-engineering no harder than sexplib (and arguably easier in some ways).
>
> For example, here's a bogus message definition to illustrate (2) and (4).
> This protocol definition is fed to the compiler, which generates the OCaml
> type definitions, as well as the encoders/decoders and pretty-printers (as you
> can see, the specification uses a mix of OCaml, Haskell and C++ syntax, but
> it's pretty clear IMO)
>
>   type sum_type 'a 'b 'c = A 'a | B 'b | C 'c
>
>   message complex_rtt =
>       A {
>         a1 : [(int * [|bool|])];
>         a2 : [ sum_type<int, string, long> ]
>       }
>     | B {
>         b1 : bool;
>         b2 : (string * [int])
>       }
>
> The protocol is extensible in the sense that you can add new constructors to a
> sum or message type, add new elements to a tuple, and replace any primitive
> type by a sum type including the original type. For instance, if at some point
> in time we find that the b1 field should have a different type, we can do
>
>   type bool_or_something 'a = Orig unboxed_bool | New_constructor 'a
>
> and then
>   ...
>   | B { b1 : bool_or_something<some_type>; ... }
>
> This, along with a way to specify default values, allows both forward and
> backward compatibility.
>
> The compiler generates a pretty printer for these structures, useful for
> debugging.
> Here's a message generated randomly:
>
>   {
>     Complex_rtt.a1 =
>       [ ((-5378), [| false; false; false; true; true |]);
>         (3942717140522000971, [| false; true; true; true; false |]);
>         ((-6535386320450295), [| false |]); ((-238860767206), [| |]);
>         (1810196202, [| false; false; true; true |]) ];
>     Complex_rtt.a2 =
>       [ Sum_type.A (-13830); Sum_type.A 369334576; Sum_type.A 83;
>         Sum_type.A (-3746796577167465774); Sum_type.A (-1602586945) ] }
>
> Now, this is the information decoded in the absence of the above definitions
> (iow., what you'd have to work with if you were reverse-engineering the
> protocol):
>
>   T0 {
>     T0 [
>       T0 { Vint_t0 (-5378);
>            T0 [ Vint_t0 0; Vint_t0 0; Vint_t0 0; Vint_t0 (-1);
>                 Vint_t0 (-1)]};
>       T0 { Vint_t0 3942717140522000971;
>            T0 [ Vint_t0 0; Vint_t0 (-1); Vint_t0 (-1); Vint_t0 (-1);
>                 Vint_t0 0]};
>       T0 { Vint_t0 (-6535386320450295); T0 [ Vint_t0 0]};
>       T0 { Vint_t0 (-238860767206); T0 [ ]};
>       T0 { Vint_t0 1810196202;
>            T0 [ Vint_t0 0; Vint_t0 0; Vint_t0 (-1); Vint_t0 (-1)]}];
>     T0 [ T0 { Vint_t0 (-13830)}; T0 { Vint_t0 369334576}; T0 { Vint_t0 83};
>          T0 { Vint_t0 (-3746796577167465774)}; T0 { Vint_t0 (-1602586945)}]}
>
> (I'm still changing some details so it might look better than this shortly.)
>
> It's not a drop-in solution like sexplib's "with sexp", by design (since it is
> meant to allow interoperability between different languages), but it's still
> fairly easy to use.
>
> If you're interested in this, tell me and I'll let you know when it's ready for
> serious usage.
>
> [1] http://code.google.com/p/protobuf/
>
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------
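
The extensibility step described in the quoted message (replacing `b1 : bool` with a sum type that still includes the original type, plus a default value for old readers) can be sketched in plain OCaml. This is only an illustration, not the output of the compiler mentioned in the message, which is not shown there. The modules `V1` and `V2` and the functions `upgrade` and `downgrade` are hypothetical names, `long` is assumed to map to `int64`, `unboxed_bool` is rendered as a plain `bool`, and `some_type` is left as a type parameter.

```ocaml
(* Hypothetical OCaml rendering of the quoted definitions (needs OCaml >= 4.03
   for inline records). None of these names come from the actual compiler. *)

module Sum_type = struct
  type ('a, 'b, 'c) t = A of 'a | B of 'b | C of 'c
end

(* The replacement type for b1: the original bool is kept as one case. *)
type 'a bool_or_something = Orig of bool | New_constructor of 'a

(* Original message type: b1 is a plain bool. *)
module V1 = struct
  type complex_rtt =
    | A of { a1 : (int * bool array) list;
             a2 : (int, string, int64) Sum_type.t list }
    | B of { b1 : bool; b2 : string * int list }
end

(* Evolved message type: only the type of b1 has changed. *)
module V2 = struct
  type 'a complex_rtt =
    | A of { a1 : (int * bool array) list;
             a2 : (int, string, int64) Sum_type.t list }
    | B of { b1 : 'a bool_or_something; b2 : string * int list }
end

(* Backward compatibility: a new reader accepts old data by wrapping the
   original bool in Orig. *)
let upgrade : V1.complex_rtt -> 'a V2.complex_rtt = function
  | V1.A { a1; a2 } -> V2.A { a1; a2 }
  | V1.B { b1; b2 } -> V2.B { b1 = Orig b1; b2 }

(* Forward compatibility: an old reader falls back to a default value when it
   meets a constructor it does not know. *)
let downgrade ~default : 'a V2.complex_rtt -> V1.complex_rtt = function
  | V2.A { a1; a2 } -> V1.A { a1; a2 }
  | V2.B { b1 = Orig b; b2 } -> V1.B { b1 = b; b2 }
  | V2.B { b1 = New_constructor _; b2 } -> V1.B { b1 = default; b2 }
```

In this sketch the compatibility logic lives in two explicit conversion functions; in the protocol described above it would presumably be handled by the generated decoders instead.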