
Creating and modifying Objective CAML values from C

A C function called from Objective CAML can modify its arguments in place, or return a newly-created value. This value must match the Objective CAML type for the function result. For base types, several C macros are provided to convert a C datum to an Objective CAML value. For structured types, the new value must be allocated in the Objective CAML heap, with the correct size, and its fields initialized with values of the correct types. Considerable care is required here: it is easy to construct bad values from C, and these bad values may crash the Objective CAML program.

Any allocation in the Objective CAML heap can trigger a garbage collection, which will deallocate unused memory blocks and may move live blocks. Therefore, any Objective CAML value manipulated from C must be registered with the Objective CAML garbage collector if it is to survive the allocation of a new block. Such values must be treated as extra memory roots by the garbage collector. To this end, several macros are provided for registering extra roots with the garbage collector.

Finally, C code can allocate Objective CAML heap blocks that contain C data instead of Objective CAML values. This C data will then benefit from Objective CAML's automatic memory management. If the C data requires explicit deallocation, a finalization function can be attached to the heap block.

Modifying Objective CAML values

The following macros allow the creation of immediate Objective CAML values from the corresponding C data, and the modification of structured values in place.


   
Val_long(l)                 return the value representing the long integer l
Val_int(i)                  return the value representing the integer i
Val_bool(x)                 return false if x=0, true otherwise
Val_true                    the representation of true
Val_false                   the representation of false
Val_unit                    the representation of ()

Store_field(b,n,v)          store the value v in the n-th field of block b
Store_double_field(b,n,d)   store the float d in the n-th field of the float array b

Figure 12.10: Creation of immediate values and modification of structured blocks.
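
As a rough illustration of what the conversion macros do, the sketch below models the tagged representation of immediate integers in plain C: an integer n is stored as 2n+1, so that the low bit distinguishes it from a word-aligned pointer. The names model_value, model_val_long, model_long_val, model_is_long and model_val_bool are hypothetical stand-ins written for this example; they are not the macros of mlvalues.h, whose definitions may differ in detail.

```c
#include <assert.h>
#include <limits.h>

typedef long model_value;   /* hypothetical stand-in for the runtime's value type */

/* Encode a C integer as a tagged immediate: double it and set the low bit. */
model_value model_val_long(long n)
{
    /* One bit is given up for the tag, so n must fit in one bit less than a long. */
    assert(n >= LONG_MIN / 2 && n <= (LONG_MAX - 1) / 2);
    return 2 * n + 1;
}

/* Decode a tagged immediate back to a C integer. */
long model_long_val(model_value v)
{
    assert((v & 1) != 0);   /* only immediates may be decoded */
    return (v - 1) / 2;
}

/* An immediate has its low bit set; a pointer to a heap block would not. */
int model_is_long(model_value v) { return (v & 1) != 0; }

/* Booleans are immediates too: false is encoded like 0, true like 1. */
model_value model_val_bool(int x) { return model_val_long(x != 0); }
```

In this model the integer 21 is encoded as 43, and any nonzero C integer passed to model_val_bool yields the same word as the encoding of 1.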


Moreover, the macros Byte and Byte_u can be used on the left-hand side of an assignment to modify the characters of a string. The Field macro can also be used for assignment on blocks with tag Abstract_tag or Final_tag; use Store_field for blocks with tag between 0 and No_scan_tag-1. The following function reverses a character string in place:

#include <caml/mlvalues.h>
/* Exchange the characters at positions i and j of the string v. */
void swap_char(value v, int i, int j)
{ char c=Byte(v,i); Byte(v,i)=Byte(v,j); Byte(v,j)=c; }
value swap_string (value v)
{
int i,j,t = string_length(v) ;
for (i=0,j=t-1; i<t/2; i++,j--) swap_char(v,i,j) ;
return v ;
}

# external mirror : string -> string = "swap_string" ;;
external mirror : string -> string = "swap_string"
# mirror "abcdefg" ;;
- : string = "gfedcba"


Allocating new blocks

The functions listed in figure 12.11 allocate new blocks in the Objective CAML heap.

   
alloc(n, t) return a new block of size n words and tag t
alloc_tuple(n) same, with tag 0
alloc_string(n) return an uninitialized string of length n characters
copy_string(s) return a string initialized with the C string s
copy_double(d) return a block containing the double float d
alloc_array(f, a) return a block representing an array, initialized by applying
  the conversion function f to each element of the C array of
  pointers a, null-terminated.
copy_string_array(p) return a block representing an array of strings, obtained
  from the C string array p (of type char **), null-terminated.

Figure 12.11: Functions for allocating blocks.


The function alloc_array takes an array of pointers a, terminated by a null pointer, and a conversion function f taking a pointer and returning a value. The result of alloc_array is an Objective CAML array containing the results of applying f in turn to each pointer in a. In the following example, the function make_str_array uses alloc_array to convert a C array of strings.

#include <caml/mlvalues.h>
#include <caml/alloc.h>
value make_str (char *s) { return copy_string(s); }
value make_str_array (char **p) { return alloc_array(make_str,p) ; }
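
One plausible way to picture the work done by alloc_array is as two passes over the C array: count the elements up to the null terminator, then apply the conversion function to each element in turn. The sketch below models this in plain C, storing the results in a malloc'd array rather than in the Objective CAML heap. The names model_array, model_length and model_alloc_array are hypothetical, and the runtime's actual implementation may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type standing in for an array of converted elements. */
typedef struct {
    size_t size;
    long  *items;
} model_array;

/* Hypothetical conversion function: here, the length of a C string. */
long model_length(const char *s) { return (long) strlen(s); }

/* Apply f to each pointer of the null-terminated array a. */
model_array model_alloc_array(long (*f)(const char *), const char **a)
{
    model_array res = { 0, NULL };
    size_t n = 0, i;
    assert(f != NULL && a != NULL);
    while (a[n] != NULL) n++;              /* first pass: count up to the terminator */
    if (n == 0) return res;                /* empty input: nothing to allocate */
    res.items = malloc(n * sizeof *res.items);
    if (res.items == NULL) return res;     /* allocation failure: empty result */
    for (i = 0; i < n; i++) res.items[i] = f(a[i]);   /* second pass: convert */
    res.size = n;
    return res;
}
```

The caller owns res.items and must release it with free; in the real runtime, the result lives in the Objective CAML heap and is reclaimed by the garbage collector instead.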


It is sometimes necessary to allocate blocks of size 0, for instance to represent an empty Objective CAML array. Such a block is called an atom.

# inspect [| |] ;;
....memory block: size=0 - structured block (tag=0):
- : '_a array = [||]


Because atoms are allocated statically and do not reside in the dynamic part of the Objective CAML heap, the allocation functions in figure 12.11 must not be used to allocate atoms. Instead, atoms are created in C by the macro Atom(t), where t is the desired tag for the block of size 0.

Storing C data in the Objective CAML heap

It is sometimes convenient to use the Objective CAML heap to store arbitrary C data that does not respect the constraints imposed by the garbage collector. In this case, blocks with tag Abstract_tag must be used.

A natural example is the manipulation of native C integers (of size 32 or 64 bits) in Objective CAML. Since these integers are not tagged as the Objective CAML garbage collector expects, they must be kept in one-word heap blocks with tag Abstract_tag.

#include <caml/mlvalues.h>
#include <caml/alloc.h>
#include <stdio.h>

value Cint_of_OCAMLint (value v)
{
value res = alloc(1,Abstract_tag) ;
Field(res,0) = Long_val(v) ;
return res ;
}

value OCAMLint_of_Cint (value v) { return Val_long(Field(v,0)) ; }

value Cplus (value v1,value v2)
{
value res = alloc(1,Abstract_tag) ;
Field(res,0) = Field(v1,0) + Field(v2,0) ;
return res ;
}

value printCint (value v)
{
printf ("%ld",(long) Field(v,0)) ; fflush(stdout) ;
return Val_unit ;
}

# type cint
external cint_of_int : int -> cint = "Cint_of_OCAMLint"
external int_of_cint : cint -> int = "OCAMLint_of_Cint"
external plus_cint : cint -> cint -> cint = "Cplus"
external print_cint : cint -> unit = "printCint" ;;


We can now work on native C integers, without losing the use of the tag bit, while remaining compatible with Objective CAML's garbage collector. However, such integers are heap-allocated, instead of being immediate values, which renders arithmetic operations less efficient.

# let a = 1000000000 ;;
val a : int = 1000000000
# a+a ;;
- : int = -147483648
# let c = let b = cint_of_int a in plus_cint b b ;;
val c : cint = <abstr>
# print_cint c ; print_newline () ;;
2000000000
- : unit = ()
# int_of_cint c ;;
- : int = -147483648


Finalization functions

Abstract blocks can also contain pointers to memory blocks allocated outside the Objective CAML heap. We know that Objective CAML blocks that are no longer used by the program are deallocated by the garbage collector. But what happens to a block allocated in the C heap and referenced by an abstract block that was reclaimed by the GC? To avoid memory leaks, we can associate a finalization function with the abstract block; this function is called by the GC before reclaiming the abstract block.

An abstract block with an attached finalization function is allocated via the function alloc_final(n, f, used, max). For efficiency reasons, the Objective CAML garbage collector does not reclaim heap blocks as soon as they become unused, but some time later. The ratio used/max controls the proportion of finalized abstract blocks that the garbage collector may leave allocated although they are no longer used. A ratio of 0 (that is, used = 0) lets the garbage collector work at its usual pace; higher ratios (no greater than 1) cause it to work harder and spend more CPU time finding unused finalized blocks and reclaiming them.

The following program manipulates arrays of C integers allocated in the C heap via malloc. To allow the Objective CAML garbage collector to reclaim these arrays automatically, the create function wraps them in a finalized abstract block, containing both a pointer to the array and the finalization function finalize_it.

#include <stdlib.h>
#include <stdio.h>
#include <caml/mlvalues.h>
#include <caml/alloc.h>

typedef struct {
int size ;
long * tab ; } IntTab ;

IntTab *alloc_it (int s)
{
IntTab *res = malloc(sizeof(IntTab)) ;
res->size = s ;
res->tab = (long *) malloc(sizeof(long)*s) ;
return res ;
}
void free_it (IntTab *p) { free(p->tab) ; free(p) ; }
void put_it (int n,long q,IntTab *p) { p->tab[n] = q ; }
long get_it (int n,IntTab *p) { return p->tab[n]; }

void finalize_it (value v)
{
IntTab *p = (IntTab *) Field(v,1) ;
int i;
printf("reclamation of an IntTab by finalization [") ;
for (i=0;i<p->size;i++) printf("%ld ",p->tab[i]) ;
printf("]\n"); fflush(stdout) ;
free_it ((IntTab *) Field(v,1)) ;
}
value create (value s)
{
value block ;
block = alloc_final (2, finalize_it,Int_val(s)*sizeof(IntTab),100000) ;
Field(block,1) = (value) alloc_it(Int_val(s)) ;
return block ;
}
value put (value n,value q,value t)
{
put_it (Int_val(n), Long_val(q), (IntTab *) Field(t,1)) ;
return Val_unit ;
}
value get (value n,value t)
{
long res = get_it (Int_val(n), (IntTab *) Field(t,1)) ;
return Val_long(res) ;
}
The C functions visible from Objective CAML are: create, put and get.

# type c_int_array
external cia_create : int -> c_int_array = "create"
external cia_get : int -> c_int_array -> int = "get"
external cia_put : int-> int -> c_int_array -> unit = "put" ;;


We can now manipulate our new data structure from Objective CAML:

# let tbl = cia_create 10 and tbl2 = cia_create 10
in for i=0 to 9 do cia_put i (i*2) tbl done ;
for i=0 to 9 do print_int (cia_get i tbl) ; print_string " " done ;
print_newline () ;
for i=0 to 9 do cia_put (9-i) (cia_get i tbl) tbl2 done ;
for i=0 to 9 do print_int (cia_get i tbl2) ; print_string " " done ;;
0 2 4 6 8 10 12 14 16 18
18 16 14 12 10 8 6 4 2 0 - : unit = ()


We now force a garbage collection to check that the finalization function is called:

# Gc.full_major () ;;
reclamation of an IntTab by finalization [18 16 14 12 10 8 6 4 2 0 ]
reclamation of an IntTab by finalization [0 2 4 6 8 10 12 14 16 18 ]
- : unit = ()
In addition to freeing C heap blocks, finalization functions can also be used to close files, terminate processes, etc.
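
To make the last remark concrete, here is a small model in plain C of a finalized block whose finalization function closes a file. It only imitates the order of events (finalizer first, then release of the block) and does not use the Objective CAML runtime. All names (model_block, model_alloc_final, model_close_file, model_reclaim, model_finalized_count) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of a finalized block: a finalizer plus one word of C data. */
typedef struct model_block {
    void (*finalize)(struct model_block *);
    void *payload;
} model_block;

int model_finalized_count = 0;   /* number of finalizers run so far */

/* Finalizer that closes the FILE stored in the payload, if any. */
void model_close_file(model_block *b)
{
    if (b->payload != NULL) fclose((FILE *) b->payload);
    b->payload = NULL;
    model_finalized_count++;
}

/* Allocate a block and attach the finalizer f to it. */
model_block *model_alloc_final(void (*f)(model_block *), void *payload)
{
    model_block *b = malloc(sizeof *b);
    if (b == NULL) return NULL;
    b->finalize = f;
    b->payload = payload;
    return b;
}

/* What a collector does once it has found the block unreachable. */
void model_reclaim(model_block *b)
{
    assert(b != NULL);
    if (b->finalize != NULL) b->finalize(b);   /* the finalizer runs first */
    free(b);                                   /* then the block itself is released */
}
```

In the real runtime the program never calls anything like model_reclaim itself: the garbage collector decides when a finalized block is unreachable, which is why the moment at which the file is closed cannot be predicted.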

Garbage collection and C parameters and local variables

A C function can trigger a garbage collection, either during an allocation (if the heap is full), or voluntarily by calling void Garbage_collection_function ().

Consider the following example. Can you spot the error?

#include <caml/mlvalues.h>
#include <caml/memory.h>

value identity (value x)
{
Garbage_collection_function() ;
return x;
}

# external id : 'a -> 'a = "identity" ;;
external id : 'a -> 'a = "identity"
# id [1;2;3;4;5] ;;
- : int list = [538917758; 538917752; 538917746; 538917740; 538917734]
The list passed as parameter to id, hence to the C function identity, can be moved or reclaimed by the garbage collector. In the example, we forced a garbage collection, but any allocation in the Objective CAML heap could have triggered a garbage collection as well. The anonymous list passed to id was reclaimed by the garbage collector, because it is not reachable from the set of known roots. To avoid this, any C function that allocates anything in the Objective CAML heap must tell the garbage collector about the C function's parameters and local variables of type value. This is achieved by using the macros described next.

For parameters, these macros are used within the body of the C function as if they were additional declarations:
CAMLparam1(v) : for one parameter v of type value
CAMLparam2(v1,v2) : for two parameters
...   ...
CAMLparam5(v1,...,v5) : for five parameters
CAMLparam0 ; : required when there are no value parameters.
If the C function has more than five value parameters, the first five are declared with the CAMLparam5 macro, and the remaining parameters with the macros CAMLxparam1, ..., CAMLxparam5, used as many times as necessary to list all value parameters. For instance, for 12 parameters of type value:

CAMLparam5(v1,...,v5);
CAMLxparam5(v6,...,v10);
CAMLxparam2(v11,v12);

Local variables of type value must also be registered with the garbage collector, using the macros CAMLlocal1, ..., CAMLlocal5. These macros are used in place of the normal C declarations of the variables. An array of values is declared with CAMLlocalN(tbl,n), where n is the number of elements of the array tbl. Finally, to return from the C function, we must use the macro CAMLreturn instead of C's return construct.

Here is the corrected version of the previous example:

#include <caml/mlvalues.h>
#include <caml/memory.h>
value identity2 (value x)
{
CAMLparam1(x) ;
Garbage_collection_function() ;
CAMLreturn(x) ;
}

# external id : 'a -> 'a = "identity2" ;;
external id : 'a -> 'a = "identity2"
# let a = id [1;2;3;4;5] ;;
val a : int list = [1; 2; 3; 4; 5]
We now obtain the expected result.
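
To see why registration matters, consider the following toy model of a moving collector, written in plain C without the Objective CAML runtime. The heap holds a single block; a collection copies it to another area, updates every registered pointer variable, and overwrites the old copy. A pointer that was not registered keeps designating the stale copy, much like the parameter x of identity. Every name beginning with toy_ is hypothetical, and the model is deliberately far simpler than the real collector.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_WORDS     5
#define TOY_MAX_ROOTS 8

static long   toy_space[2][TOY_WORDS];    /* two areas, holding one block */
static int    toy_current = 0;            /* index of the area in use */
static long **toy_roots[TOY_MAX_ROOTS];   /* addresses of registered variables */
static size_t toy_root_count = 0;

/* Allocate the only block of the toy heap and fill it with 1..TOY_WORDS. */
long *toy_alloc(void)
{
    int i;
    for (i = 0; i < TOY_WORDS; i++) toy_space[toy_current][i] = i + 1;
    return toy_space[toy_current];
}

/* Tell the collector where a pointer variable lives (roughly the role of CAMLparam). */
void toy_register_root(long **root)
{
    assert(toy_root_count < TOY_MAX_ROOTS);
    toy_roots[toy_root_count++] = root;
}

/* Forget all registered variables (roughly what happens on CAMLreturn). */
void toy_clear_roots(void) { toy_root_count = 0; }

/* Move the block to the other area, update registered roots, scrub the old copy. */
void toy_collect(void)
{
    int old = toy_current, next = 1 - toy_current, i;
    size_t r;
    for (i = 0; i < TOY_WORDS; i++) toy_space[next][i] = toy_space[old][i];
    for (r = 0; r < toy_root_count; r++)
        if (*toy_roots[r] == toy_space[old]) *toy_roots[r] = toy_space[next];
    for (i = 0; i < TOY_WORDS; i++) toy_space[old][i] = -1;   /* stale data */
    toy_current = next;
}
```

If two variables hold the result of toy_alloc and only one of them is passed to toy_register_root, then after toy_collect the registered one still reads 1 to 5, while the other reads the scrubbed values.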

Calling an Objective CAML closure from C

To apply a closure (i.e. an Objective CAML function value) to one or several arguments from C, we can use the functions declared in the header file callback.h.
callback(f,v) : apply the closure f to the argument v,
callback2(f,v1,v2) : same, to two arguments,
callback3(f,v1,v2,v3) : same, to three arguments,
callbackN(f,n,tbl) : same, to n arguments stored in the array tbl.
All these functions return a value, which is the result of the application.

Registering Objective CAML functions with C

The callback functions require the Objective CAML function to be applied as a closure, that is, as a value that was passed as an argument to the C function. We can also register a closure from Objective CAML, giving it a name, then later refer to the closure by its name in a C function.

The function register from module Callback associates a name (of type string) with a closure or with any other Objective CAML value (of any type, that is, 'a). This closure or value can be recovered from C using the C function caml_named_value, which takes a character string as argument and returns a pointer to the closure or value associated with that name, if it exists, or the null pointer otherwise.

An example is in order:

# let plus x y = x + y ;;
val plus : int -> int -> int = <fun>
# Callback.register "plus3_ocaml" (plus 3);;
- : unit = ()
#include <caml/mlvalues.h>
#include <caml/memory.h>
#include <caml/callback.h>

value plus3_C (value v)
{
CAMLparam1(v);
CAMLlocal1(f);
f = *(caml_named_value("plus3_ocaml")) ;
CAMLreturn(callback(f,v)) ;
}

# external plusC : int -> int = "plus3_C" ;;
external plusC : int -> int = "plus3_C"
# plusC 1 ;;
- : int = 4
# Callback.register "plus3_ocaml" (plus 5);;
- : unit = ()
# plusC 1 ;;
- : int = 6
Do not confuse the declaration of a C function with external and the registration of an Objective CAML closure with the function register. In the former case, the declaration is static: the correspondence between the two names is established at link time. In the latter case, the binding is dynamic: the correspondence between the name and the closure is established at run time. In particular, the name--closure binding can be modified dynamically by registering a different closure under the same name, thus modifying the behavior of C functions using that name.
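
The dynamic nature of this binding can be modeled in a few lines of plain C: a table maps names to closures, registering an existing name replaces its closure, and the C function looks the name up again on every call. This is only a sketch of the idea, not the runtime's implementation; model_closure, model_register, model_named_value, model_plus and model_plus_c are hypothetical names standing in for Callback.register, caml_named_value and plus3_C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical closure: a code pointer plus one captured integer. */
typedef struct {
    long (*code)(long env, long arg);
    long env;
} model_closure;

#define MODEL_MAX_NAMES 16
static struct { const char *name; model_closure clos; } model_table[MODEL_MAX_NAMES];
static size_t model_count = 0;

/* Model of Callback.register: bind name to clos, replacing any earlier binding. */
void model_register(const char *name, model_closure clos)
{
    size_t i;
    for (i = 0; i < model_count; i++)
        if (strcmp(model_table[i].name, name) == 0) { model_table[i].clos = clos; return; }
    assert(model_count < MODEL_MAX_NAMES);
    model_table[model_count].name = name;   /* the caller keeps the string alive */
    model_table[model_count].clos = clos;
    model_count++;
}

/* Model of caml_named_value: pointer to the bound closure, or NULL if unbound. */
model_closure *model_named_value(const char *name)
{
    size_t i;
    for (i = 0; i < model_count; i++)
        if (strcmp(model_table[i].name, name) == 0) return &model_table[i].clos;
    return NULL;
}

/* Code shared by the closures (plus 3) and (plus 5): add the captured integer. */
long model_plus(long env, long arg) { return env + arg; }

/* Model of plus3_C: the name is resolved again on every call. */
long model_plus_c(long v)
{
    model_closure *f = model_named_value("plus3_ocaml");
    assert(f != NULL);
    return f->code(f->env, v);
}
```

Registering a closure that captures 3 makes model_plus_c(1) return 4; registering one that captures 5 under the same name makes the same call return 6, without model_plus_c being changed or relinked.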

