Re: building stubs and modules from ANSI C include files

Xavier Leroy (Xavier.Leroy@inria.fr)
Wed, 20 Sep 1995 14:12:42 +0200 (MET DST)

From: Xavier Leroy <Xavier.Leroy@inria.fr>
Message-Id: <199509201212.OAA12294@pauillac.inria.fr>
Subject: Re: building stubs and modules from ANSI C include files
To: jsparkes@bnr.ca (jeff)
Date: Wed, 20 Sep 1995 14:12:42 +0200 (MET DST)
In-Reply-To: <"24806 Tue Sep 19 11:05:34 1995"@bnr.ca> from "jeff" at Sep 19, 95 11:03:00 am

> It seems that the "external" keyword in CSL gives most of the
> information about how to call a C function. It should be possible to
> build the stub routines from a module description.
>
> And it should be possible to build a module description from an ANSI C
> include file which gives the types of all of the arguments.

Well - almost. This has been done for some of the X11 include files by
Brad Chen and also Daniel de Rauglaudre, but it worked well because
the X11 functions have a consistent, fairly well-behaved interface.

For other C libraries, there are a number of problems to address,
those you mentioned:

> - no way to say that returned memory needs to be freed
> - how to raise exceptions, and which ones...
> - descriptions of C structures that are returned or passed in
> - currently no interface to dlopen if it exists. Easy to add on
> Solaris, at least.

and also:

- "out" and "in/out" parameters; in particular, C functions that
return several results via out parameters are best encoded as
Caml functions that return a tuple or record;

- to copy or not to copy; for instance, a C function that takes a
string can usually be passed a pointer to the first character of the
Caml string in the Caml heap; but if the function is going to keep
that pointer around (e.g. in a global or static variable), then the
GC will move the Caml string unpredictably, so it must be copied to a
malloc-ed block which is then passed to the C function.

The latter point shows that neither the Caml type "string" nor the C
type "char *" is sufficient to generate the stub code; extra info is
needed to determine whether string parameters must be copied or not.

Shail Aditya worked on this problem during a summer internship at
INRIA in '92. His approach was to provide declarations for the C types
and functions in a language (called MLC) that is a superset of Caml's
interfaces. From these declarations, both the Caml interface and the C
stub functions are generated.

For instance, there would be two string types in MLC: "string" and
"copied_string". Both are mapped to "string" in Caml and "char *" in
C, but the conversion functions are different (no copy for "string",
malloc and copy for "copied_string").

MLC also provides various mechanisms to add pre- and post-processing
(arbitrary C code) to generated stubs. A typical post-processing is to
check the return code and raise an exception if it denotes an error.

For various reasons, Shail's generator wasn't used much afterwards,
but I still have the code and can make it available if there's
interest and Shail agrees.

Nowadays, there might be other options worth investigating. For
instance, if your libraries have interfaces written in IDL, then it
could make sense to start with the IDL spec to generate the stub code.

Regards,

- Xavier Leroy