This site is updated infrequently. For up-to-date information, please visit the new OCaml website at ocaml.org.

cyclic value construction ("let rec")
• Benoit Deboursetty
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
 Date: -- (:) From: Benoit Deboursetty Subject: cyclic value construction ("let rec")
```Following the "let rec" thread, let me sum up a few of the problems which
occur frequently during value creation, as this list showed it.

* Currently, O'CaML does not allow the creation of arbitrarily big cycles
in a non-mutable value. For instance, you cannot write a program that
computes a cyclic list ln of size n
l_n = 1 :: 2 :: 3 :: ... :: n :: l_n
using only the built-in list type.

* This is annoying because O'CaML handles immutable lists and trees so
well sometimes you wish it could do the same with anything else.

* I have seen three basic classes of solutions when you want to deal
with values possibly showing cycles. Here I will assume that the values
you're working on is an arbitrary graph and its nodes carry data .

1) index each node of the value in an array and instead of pointing at
a data, point at its index.

type 'a graph = ('a * int list) array

let circle n =
Array.init n (fun i -> (i, [(i+1) mod n]))

circle : int -> int graph

Pros: by externally describing links, you are able to "see what you're
doing". In particular, when you traverse the value, you can know in
constant time whether you are on a data node that you have already seen.
(you can mark where you have been in an array with the same indices). You
need this whenever you want to implement something like "map" on the data
carried by nodes of your graph. The majority of graph algorithms also
require this "outsight" of the value.

Cons: joining two such graphs is complex, thus this construction is only
good for graphs you want to build at once. Arrays are mutable (with all
the problems possibly induced by passing around mutable arguments);
invalid values exist (for instance when they refer out-of-bound indices);
access to neighbours is indirect (you first go to the index in the table).

2) use mutable fields, then build the data incrementally.

type 'a node = { data: 'a; mutable next: 'a node list }
let circle n =
let temp = Array.init n (fun i -> { data = i; next = [] }) in
for i = 0 to n - 1 do
temp.(i).next <- [temp.((i+1) mod n)];
done;
temp.(0)

circle : int -> int node

Pros: resembles C and its "do-what-I'm-telling-you" coding style.
Sometimes it's good. Sometimes you don't have the extra indirection caused
by the index. Values are always valid. You have access to neighbours in
patterns.

Cons: resembles C and its "do-what-I'm-telling-you" coding style.
Program-wide mutability is a design problem when you only need it at value
building time. Also, sometimes, you are obliged to use an option for the
type of your mutable field. In the end, you find yourself with a complex
data structure with 'Some _'s everywhere and not a single 'None'. In those
cases when after build you expect every option of a given field to be a
Some, this represents another extra indirection (*) and the source of
invalid values.

3) use lazy evaluation for the successors of each node.

type 'a node = { data: 'a ; next: 'a node list Lazy.t }

let circle n =
let links = Array.make n [] in
let nodes = Array.init n
(fun i -> { data = i; next = lazy links.(i) })
in
for i = 0 to n - 1 do
done;
(* just to show the resulting structure is finite *)
Array.iter (fun x -> ignore (Lazy.force x.next)) nodes;
nodes.(0)

Pros: gives the same power as mutable fields and more, with a less
imperative style. The only way of handling infinite structures. Depending
on the situation, you may have access to neighbours in patterns, but not
surely.

Cons: correct graph building with lazy evaluation is not easy. Infinite
recursion is a common pitfall; building an infinite value when you wanted
it finite is another one. Also, with current O'CaML version, Lazy.t is not
abstract, and is implemented as a reference, hence it is mutable. Finally,
even if your were so careful as to keep your graph finite, you still have
to use Lazy.force each time you want to access a node's neighbour even
when you could have force'd everything in advance. You have no means of
assessing that a value is finite, you cannot tell whether marshalling
such a value will work.

I must have omitted many other solutions which I don't know. From this
build difficulty come many frustrations like "if only I could freeze all
mutable fields of this value at once" or things like that. I am in no way
criticizing O'CaML for being unable to handle things C, C++ and Java do
with "null pointers"! Just trying to figure out what is at the basis of
all these questions.

I have this little hack that will allow you to freeze mutable fields, but
it relies on a current exposed unsafeness with Marshalling. You can freeze
a value with mutable fields by marshalling it into a string, then
un-marshalling it back into a value of the same type with nonmutable
fields... This makes basically a copy of a value and changes the type on
the way.

It would be interesting to know whether people have other workarounds.
Also, a generic graph module which would implement "map", "iter" etc. in a
cycle-safe way (like Marshal) would perhaps be interesting in the
'user-contributed stdlib extension'...

What do you think?

Benoît de Boursetty.

(*) I don't know how options are implemented, but at least at