Pointers exist in Caml, and in fact they spread all over the place.
They are used either implicitly (in the most cases), or explicitly
(in the rare occasions where implicit pointers are not more handy).
The vast majority of pointers usages that are found in usual
programming languages simply disappear in Caml, or more exactly, those
pointers are totally automatically handled by the compiler and the
Caml programmer can safely just ignore their existence, focusing on
the semantics of his program.
For instance lists or trees are defined without explicit pointers
(we simply use a concrete data type definition). The underlying
implementation indeed uses pointers, but this is transparent to the
programmer since pointers handling is done by the compiler.
In the rare occasions where explicit pointers are needed (the most common case is when translating in Caml an algorithm described in a classic imperative language), Caml provides references that are full-fledged pointers, even first class citizen pointers (references can be passed as argument, embedded into arbitrary data structures, and returned as function results).
ref
You can program directly with explicit references if you want to, but this is normally a vast of time and effort.
Let's examine the simple example of linked lists (integer lists to be simple). This data type is defined in C (or in Pascal) using explicit pointers, for instance:
/* Cells and lists type */ struct cell { int hd; struct cell *tl; }; typedef struct cell cell, *list;
We can translate this in Caml, using a sum type definition, without pointers:
type list = | Nil | Cons of int * list;;
Cell lists are thus represented as pairs, and the recursive
structure of lists is evident, with the two alternatives: empty list
(the Nil
constructor) and non empty list (the
Cons
constructor).
Automatic management of pointers and automatic memory allocation
shine when allocating list values:
one just writes Cons (x, l)
to add
x
in front of the list l
. In C, you need to
write this function, to allocate a new cell and then fill its
fields. For instance:
/* The empty list */ #define nil NULL /* The constructor of lists */ list cons (element x, list l) { list result; result = (list) malloc (sizeof (cellule)); result -> hd = x; result -> tl = l; return (result); }
We thus see that fields of list cells in the C program have to be mutable, otherwise initialization is impossible. By contrast in Caml, allocation and initialization are merged into a single basic operation: constructor application. This way, immutable data structures are definable (those data types are often referred to as ``pure'' or ``functional'' data structures). If physical modifications are necessary for other reasons than mere initialization, Caml provides records with mutable fields. For instance, a list type defining lists whose elements can be in place modified could be written:
type list = | Nil | Cons of cell and cell = {mutable hd : int; tl : list};;
If the structure of the list itself must also be modified (cells must be
physically removed from the list), the tl
field would also
be declared as mutable:
type list = | Nil | Cons of cell and cell = {mutable hd : int; mutable tl : list};;
Physical assignments are still useless to allocate mutable data:
you write Cons {hd = 1; tl = l}
to add 1
to
the list l
. Physical assignments that remain in Caml programs
should be just those assignments that are mandatory to implement the
algorithm at hand.
Very often, pointers are used to implement physical modifications of data structures. In Caml programs, this means using vectors or mutable fields in records. For this kind of use of pointers, the Pascal's instruction:
x^.label := val
(where x
is a value of a
record having a label
field)
corresponds to the Caml construct
x.label <- val
((where x
is a value of a
record having a label
mutable field)
^
symbol simply disappears, since
dereferencing is automatically handled by the Caml compiler.
In conclusion:
You can use explicit pointers in Caml, exactly as in Pascal or C, but
this is not natural, since you get back the usual drawbacks and
difficulties of explicit pointers manipulation of classical
algorithmic languages. See a more complete example below.
The general pointer type can be defined using the semantics definition of an explicit pointer: an explicit pointer is either null, or a pointer to an assignable memory location:
type 'a pointer = Null | Pointer of 'a ref;;
Basic operations of dereferencing and assignment for explicit
pointers are easy to define. Dereferencing corresponds to reading the
pointer's designated values and assignment means writing to the
pointer's designated memory location.
In order to use notations reminiscent the traditional Pascal
usage, we define dereferencing as a prefix operator named
!^
, and assignment as the infix operator
^:=
.
let ( !^ ) = function | Null -> invalid_arg "Attempt to dereference the null pointer" | Pointer r -> !r;; val ( !^ ) : 'a pointer -> 'a = <fun> let ( ^:= ) p v = match p with | Null -> invalid_arg "Attempt to assign the null pointer" | Pointer r -> r := v;; val ( ^:= ) : 'a pointer -> 'a -> unit = <fun>
We also have to define the allocation of a new (fresh) pointer, initially pointing to a given value:
let new_pointer x = Pointer (ref x);; val new_pointer : 'a -> 'a pointer = <fun>
For instance, let's define and then assign a pointer to an integer:
#let p = new_pointer 0;; val p : int pointer = Pointer (ref 0) #p ^:= 1;; - : unit = () #!^p;; - : int = 1
The definition of lists with explicit pointers involves the
recursive definition of two types: the cell
type that
describes the contents of lists' cells and the ilist
type
that describes lists as explicit pointers to the initial cell of the
designated list (this is just mimicking the classical encoding of
lists in usual imperative languages):
(* The list type ``à la Pascal'' *) type ilist = cell pointer and cell = {mutable hd : int; mutable tl : ilist};;
We also have to define allocation of a new cell, the list constructor, and its associated destructors.
let new_cell () = {hd = 0; tl = Null};; val new_cell : unit -> cell = <fun> let cons x l = let c = new_cell () in c.hd <- x; c.tl <- l; (new_pointer c : ilist);; val cons : int -> ilist -> ilist = <fun> let hd (l : ilist) = !^l.hd;; val hd : ilist -> int = <fun> let tl (l : ilist) = !^l.tl;; val tl : ilist -> ilist = <fun>
We can now write all kind of classical algorithms, based on pointers manipulation, with their associated loops, their unwanted sharing problems and their null pointer errors. For instance, list concatenation, as often described in the literature on algorithms, physically modifies its first list argument, hooking the second list to the end of the first:
(* Physical append *) let append (l1 : ilist) (l2 : ilist) = let temp = ref l1 in while tl !temp <> Null do temp := tl !temp done; !^ !temp.tl <- l2;; val append : ilist -> ilist -> unit = <fun>
(* An example: *) let l1 = cons 1 (cons 2 Null);; val l1 : ilist = Pointer (ref {hd = 1; tl = Pointer (ref {hd = 2; tl = Null})}) let l2 = cons 3 Null;; val l2 : ilist = Pointer (ref {hd = 3; tl = Null}) append l1 l2;; - : unit = ()
The lists l1
and l2
are effectively catenated:
l1;; - : ilist = Pointer (ref {hd = 1; tl = Pointer (ref {hd = 2; tl = Pointer (ref {hd = 3; tl = Null})})})
Don't miss this typical (and unexpected) side effect of physical
list concatenation: the list l1
now contains the
concatenation of the two lists l1
and l2
,
thus the list l1
no longer exists: in some sense,
append
consumes or destroys its first
argument. In other words, the value of a list data now can depend on
its history, meaning that a list value can be a function of the entire
sequence of function calls that have used this list value since its
initial creation. This strange behavior can be powerful, but it also
leads to a lot of difficulties. Try for instance, the seemingly
harmless:
append l1 l1;; - : unit = ()
Then evaluate l1
:
l1;;
In conclusion: general explicit pointers manipulation is possible and easy in Caml. However, this programming style should be reserved to peculiar situation where the more natural functional data structures cannot be used. Since explicit pointers manipulation is always difficult and error prone, it should be considered as a kind of black magic that is devoted to brave soul experts or masochists. You have been warned!
To go beyond the Pascal type system, we define polymorphic lists using explicit pointers; here is a simple implementation of those polymorphic mutable lists:
type 'a list = 'a cell pointer and 'a cell = {mutable hd : 'a pointer; mutable tl : 'a list};; let new_cell () = {hd = Null; tl = Null};; let cons x l = let c = new_cell () in c.hd <- new_pointer x; c.tl <- l; (new_pointer c : 'a lists);; let hd (l : 'a lists) = !^l.hd;; let tl (l : 'a lists) = !^l.tl;; let append (l1 : 'a lists) (l2 : 'a lists) = let temp = ref l1 in while tl !temp <> Null do temp := tl !temp done; !^ !temp.tl <- l2;;
Contact the author Pierre.Weis@inria.fr