About adding embedded user types into Ocaml bytecode runtime

From: Basile STARYNKEVITCH (Basile.Starynkevitch@wanadoo.fr)
Date: Fri Feb 19 1999 - 23:53:43 MET


From: Basile STARYNKEVITCH <Basile.Starynkevitch@wanadoo.fr>
Date: Fri, 19 Feb 1999 23:53:43 +0100 (CET)
To: caml-list@inria.fr
Subject: About adding embedded user types into Ocaml bytecode runtime

[[posted to the CAML list]]

         Time-stamp: <1999-Feb-19 23:50:06 Basile STARYNKEVITCH>

                 Some thoughts and suggestions about
                    adding embedded user types
              into the Ocaml 2.01 byte-code runtime system

                      Basile STARYNKEVITCH

               home: <basile.starynkevitch@wanadoo.fr>
                 work: <basile.starynkevitch@cea.fr>
________________________________________________________________________

Ocaml can be used to drive or embed existing applications or routines
(such as numerical computations, image processing, databases, ...)
coded or interfaced to C. We assume in the sequel that this is a
legitimate (even if unexpected or unusual) usage for Ocaml. Other
existing embedding languages (e.g. Python, Tcl, Perl, ...) were designed
for easy embedding, but lack Ocaml's features that we consider very
important (such as the functional and object paradigms, the Ocaml type
safety, and performance of the implementation). We are dealing only
with the byte-code interpreter and compiler (i.e. ocaml, ocamlc,
ocamlrun, ocamlmktop) and not with the native compiler (ocamlopt). We
refer to the ocaml-2.01 byte-code implementation (released in December
1998, see "http://caml.inria.fr/pub/old_caml_site/ocaml/" for details).

We assume the reader does know very well how to interface C with
Objective Caml and vice versa. See the Ocaml 2.01 reference manual,
chapter 14 for many details. We also assume some familiarity with
byte-code interpreter and garbage collection implementation
techniques, since the suggestions given below are aimed towards the
Ocaml developpers or runtime-system hackers.

Several ideas expressed here are inspired (or copied) from Mathias
Blume's vscm (see "http://www.kurims.kyoto-u.ac.jp/~blume/vscm/" for
details) Scheme implementation. I did start to implement them, but
lack of time and political issues (I failed to get from inside my
organization written permission to publish substantial patches to the
ocaml runtime system) made me abort this work. I won't be able to
continue it, since I'm taking a new job. I'm publishing this on this
list in case someone becomes interested.... (and a disk crash wiped
out the small stuff I did actually begin to implement, so even if I
wanted and was permitted to, I can't give any code, since I lost it;
however design ideas are still fresh in my head...)

An important issue about embedding existing applications into Ocaml is
supporting existing (although perhaps patch-able) user data types,
whose representation is already somehow fixed (e.g., as a C
structure). For instance, an hypothetical geometrical package might
expect to deals with 3d points like

       struct point3d_st {
          float p3_x,p3_y,p3_z;
       }

of course, this can be declared in Ocaml as

      type point3d_t = { p3x: float; p3y: float; p3z: float }

but this is represented very differently in Ocaml (which uses boxed
double-precision numbers) - in an incompatible way with the previous C
declaration (which uses unboxed native single precision floats).

The previous struct point3d_st could partly be used in the existing
ocaml-2.01 implementation: just box it as an abstract value,
alloc()-ated with Abstract_tag. Doing so works fine for the garbage
collector, but we have to give up the very useful serialization
facilities (ie the Marshal module from the ocaml standard library). We
could also just allocate such point3d_st structures outside the Ocaml
heap (this is possible since ocaml permits as values but ignores any
pointer outside its heap) but we then loose the garbage collector.

Another example would be a numerical package. Suppose we want to have
single precision matrices, each belonging to a numerical mesh and
grid. In an hypothetical numerical code, the high level computation
could be coded in (interpreted byte-coded) Ocaml while the low level
numerical engine is coded in C (or even Fortran77 glued thru C). So,
meshes and grids are completely represented by Ocaml values, but
matrices would be something like

       struct matrix_st {
       /* the mesh owning this matrix */
          value mesh; /* an Ocaml value */
       /* matrix dimensions */
          int width, height;
       /* the grid related to this matrix */
          value grid; /* an Ocaml value */
       /* the matrix proper, actually width*height doubles */
          double mat[0];
       }

Alternatively, the mat field could be a (separately) malloc()-ed array
(to overcome the 8Mbyte ocaml limitation on 32 bits machines), then
the last field would just be

      /* the matrix pointer, should be malloc()-ed & free()-d */
          double* mat;

This struct matrix_st is *not* usable in the current ocaml-2.01
runtime, which assumes that every ocaml value is either made only of
ocaml values (ie ocaml allocated pointers or tagged integers) or only
of non pointer data.

Our aim is just to be able to deal with such simple data types as
given above, with minimal changes to the existing ocaml-2 runtime
machinery. We do not pretend to handle more complex stuff (such as C++
virtual classes mixing ocaml values with Xlib windows!). So the main
assumption is that user data types can [pretend to] be represented in
a contiguous memory zone (as all ocaml values are), and contain a
fixed number of ocaml values at fixed offsets; also, it has sense to
move or copy (word by word) such user data. We do assume that the user
might need several (say a few dozens, perhaps a hundred) distinct user
types. We also suppose that such user data values won't be common, and
so need not to be efficiently handled.

User types could be assigned a new User_tag, and the ocaml runtime
could be patched to handle it appropriately. This involve changes to
the allocator, garbage collector, and serializer. Assigning a separate
tag to each different user type would be another option, but the tag
space is very tight on 32 bits machines (only one byte for tags), so I
believe this is not practical.

As always, user data would start with a one word header containing
User_tag and the allocated size in words. After the header, a pointer
to a user data descriptor (which we'll define later) will give the
necessary meta-information. Each user data type has its own user
supplied data descriptor. So the matrix structure is actually

       struct matrix_st {
            /* all user data start with _header & _descr */
       /* the header word */
          header_t _header;
       /* the data descriptor */
          struct userdatadescr_st *_descr;
       /* the mesh owning this matrix */
          value mesh; /* an Ocaml value */
       /* matrix dimensions */
          int width, height;
       /* the grid related to this matrix */
          value grid; /* an Ocaml value */
      /* the matrix pointer, unique [=unshared], should be malloc()-ed */
          double* mat;
       }

All such matrices share a common descriptor:

       struct userdatadescr_st matrix_desc;

This descriptor have a unique (human and machine readable) name (a
string), say "matrixd"

The Ocaml runtime cares about user data thru the memory management
(allocation & garbage collection) and the serialization. The garbage
collector is generational copying -hence with moving data & pointers.

I consider that the user data has a scalar and a composite part. The
atomic part contains all non pointer, this means all non value fields
(ie width, height, mat for our matrix_st structure). The composite
part contains all value fields (ie mesh and grid for our matrix_st).

Memory management (ie GC) cares mostly about pointers, so is mostly
concerned with the composite part. Serialization also deals with the
atomic part (which can be safely copied by the GC moving routines).

I suppose that the composite part is made of a fixed number of values
having fixed offsets. This could be described by some data, but I
borrowed M.Blume's idea of defining an iterator for them. This means
that the user should provide (in the data descriptor) a routine
which calls a function f on all the values field addresses. In our
matrix case, the iterator is just

      void matrix_iterator(value mat_v, void (*f)(), void* data)
      {
         struct matrix_st *m = mat_v;
         f(&m->mesh, data);
         f(&m->grid, data);
      }

Notice that the iterator applies the function f to each pointer
address (and not to each value). So f takes a value* first argument
and a void* second (generic) argument. In particular, the garbage
collector calls the iterator to move pointers (this involves a bit of
trickery: some parts of the GC will use the iterator on old objects to
compute offset of values, and copy them in the new object at the same
offset)!

  
I did check that this is sufficient in all ocaml-2.01's runtime to
handle inside value pointers. (Some examples below).

Serialization of user data can be done as follow (see files
ocaml-2.01/byterun/intern.c & ocaml-2.01/byterun/extern.c which should
both be significantly patched and enhanced) :

  * Every user data has a descriptor (such as matrixdesc). This
    descriptor has a unique name. We could always output after a code
    for user data the full descriptor name, but this might take a lot
    of space if a given user data type is used a lot in a single
    serialization operation (for instance, if we marshal an array of
    our matrices, it would be stupid to output a lot of times the
    "matrixd" string). It is better to output this name the first time
    (and register it in a dictionnary with a unique index) and to just
    output an index the other times. So we need to handle a set of
    already used data descriptors in extern.c. If a user data
    (e.g. one of our matrices) is to be externed a first time (see in
    file extern.c:205 the extern_rec routine) we output CODE_USERDATA
    then its new index (eg 3 if we had seen three other user type
    descriptors before) then the descriptor name (here the null
    terminated string "matrixd") . When another of our matrices is to
    be externed, we output CODE_SEENUSERDATA followed with its index
    number -ie 3 again-.

  * since extern.c needs both the size_32 (size in words on a 32 bits
    machine) and the size_64 (size in words on a 64 bits machine), the
    user should provide in the descriptor a routine computing the
    size_32 and another routine computing the size_64. Here they are
    for our matrices (the size that count is only the zone allocated
    by ocaml; the malloc-ed matrix doesn't count; here the size is 7
    words on both 32 & 64 bits architectures):

    /* value of matrix_desc.desc_sizer32 */
    unsigned matrix_size_32(value mat_v) {
         struct matrix_st *m = mat_v;
         return 7;
    }

    /* value of matrix_desc.desc_sizer64 */
    unsigned matrix_size_64(value mat_v) {
         struct matrix_st *m = mat_v;
         return 7;
    }

  the extern_rec routine should be patched to compute and output the
  allocated size32 and size64 (in words) of the user data.

  * After that, in all cases, we call a user provided
    externalization routine to output the atomic part. This routine
    uses a small set of tiny externalization primitives (to be coded
    once for all in extern.c and given public names such as
    camlextern_*) for primitive C types (ints, floats, doubles,
    strings...). For our matrices the externalizer is therefore :

      /* externalize the atomic part of matrices */
    /* value of matrix_desc.desc_externalizer */
      void matrix_externalizer(value mat_v) {
         struct matrix_st *m = mat_v;
         int ix, dim;
         camlextern_int(m->width);
         camlextern_int(m->height);
         dim=m->width*m->height;
         assert(dim==0 || m->mat!=(float*)0);
         for (ix=0; ix<dim; ix++)
            camlextern_float(m->mat[i]);
      }

   * of course, dual to the externalizer, the user should provide an
     internalizer (see in intern.c:109 the intern_rec routine), like:

      /* internalize the atomic part of matrices - return 0 iff ok */
    /* value of matrix_desc.desc_internalizer */
      int matrix_internalizer(value mat_v) {
         struct matrix_st *m = mat_v;
         int ix, dim;
         camlintern_int(m->width);
         camlintern_int(m->height);
         dim=m->width*m->height;
         if (dim>0) {
            m->mat=malloc(sizeof(m[0])*dim);
            if (!m->mat)
               return 1;
            for (ix=0; ix<dim; ix++)
               camlintern_float(&m->mat[i]);
         } else
           m->mat=0;
         return 0;
      }
      
To support our claim that an iterator on all values adresses' is
enough, here is the case to add to extern_rec (in extern.c:310)
for user data:

     case User_tag:
        { struct userdatadescr_st *desc=Userdata_descr(v);
          int id, ln, sz32, sz64;
          id=seendescrindex(desc); /* seen index of descriptor or -1 */
          if (id>=0) {
            writecode16(CODE_SEENUSERDATA, id);
          } else { /* new descriptor */
            id=registerdescr(desc); /* register this descriptor */
            writecode16(CODE_USERDATA, id);
            ln=strlen(desc->descr_name);
            /* write the name length and the name string */
            Write(ln);
            writeblock(desc->descr_name, ln);
         };
         /* compute and write the 32 & 64 bits sizes, so allocation on
            interning is easy */
         sz32=(desc->descr_sizer32)(v);
         sz64=(desc->descr_sizer64)(v);
         write32(sz32);
         write32(sz64);
         /* externalize the atomic part */
         (desc->descr_externalizer)(v);
         /* iterate on the composite part */
         (desc->descr_iterator)(v,extern_rec_wrapper,(void*)0);
        };
        break;

   Of course we need the obvious:

        static void extern_rec_wrapper(value *v, void* d) {
           extern_rec(*v);
        }

The garbage collector should support finalization of user data (with
the same restrictions as for Final_tag). When destroyed by the GC, our
matrices should free the malloc-ed float array:

   /* finalizer of matrices */
    /* value of matrix_desc.desc_finalizer */
      void matrix_finalizer(value mat_v) {
         struct matrix_st *m = mat_v;
         if (m->mat) free(m->mat);
         m->mat = (double*)0;
         m->width = m->height = 0;
     }

Now we can give the data descriptor declaration (to be added into the
ocaml runtime, which should also expose now some previously static
routines, like extern_rec and so on...):

   struct userdatadescr_st {
     unsigned descr_magic; /* always 0xbad570f */
     char *descr_name;
     void (*desc_iterator)(value v, void (*f)(), void *data);
     unsigned (*desc_sizer32)(value v);
     unsigned (*desc_sizer64)(value v);
     void (*desc_externalizer)(value v);
     int (*desc_internalizer)(value v);
     void (*desc_finalizer)(value v);
   }

Dealing with named data descriptors is very similar to dealing with
named external primitives. The runtime should be patched accordingly
(this involve changing the format of bytecode executables)

Of course, the user data should be allocated by user supplied
primitives calling alloc with User_tag.

The garbage collector can be patched to accomodate such user
data. However, on some 32 bits machines (those, like old Sparcs, which
requires doubleword aligned double precision floats) the user data
objects should either always be allocated at a doubleword (this is
probably tricky, but useful) or their double fields should be accessed
with a special routine (similar to Store_double_val)

Implementing this ideas and submitting a patch to Ocaml developpers at
INRIA is leaved as an exercice to the intrepid and clever reader. I am
pretty sure it can be done. However, I would suggest first to the
eventual implementor to check before that he or she will be permitted
to submit patches to INRIA. Good luck!

Please let me know if you actually do use these (or similar) ideas to
enhance Ocaml (interpreted bytecode) runtime [[I'll even be very
pleased & honored if you cite my name in the source code in that
case; but this is up to you]].

If anyone want some more precision, please email me at work
<basile.starynkevitch@cea.fr> and at home
<basile.starynkevitch@wanadoo.fr>.

Thanks for reading.

-- 

Basile STARYNKEVITCH - 8 rue de la Faiencerie, 92340 BOURG LA REINE (France) tel 01.46.65.45.53. email = basile point starynkevitch at wanadoo point fr



This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:20 MET