cyclic value construction ("let rec")

From: Benoit Deboursetty (debourse@pascal.enst.fr)
Date: Thu Mar 30 2000 - 22:12:33 MET DST

  • Next message: Jacques Garrigue: "Re: Semantic of label: The best (only ?) solution to merge both mode"

    Following the "let rec" thread, let me sum up a few of the problems which
    occur frequently during value creation, as this list showed it.

     * Currently, O'CaML does not allow the creation of arbitrarily big cycles
    in a non-mutable value. For instance, you cannot write a program that
    computes a cyclic list ln of size n
       l_n = 1 :: 2 :: 3 :: ... :: n :: l_n
    using only the built-in list type.

     * This is annoying because O'CaML handles immutable lists and trees so
    well sometimes you wish it could do the same with anything else.

     * I have seen three basic classes of solutions when you want to deal
    with values possibly showing cycles. Here I will assume that the values
    you're working on is an arbitrary graph and its nodes carry data .

        1) index each node of the value in an array and instead of pointing at
    a data, point at its index.

    type 'a graph = ('a * int list) array

    let circle n =
      Array.init n (fun i -> (i, [(i+1) mod n]))

    circle : int -> int graph

      Pros: by externally describing links, you are able to "see what you're
    doing". In particular, when you traverse the value, you can know in
    constant time whether you are on a data node that you have already seen.
    (you can mark where you have been in an array with the same indices). You
    need this whenever you want to implement something like "map" on the data
    carried by nodes of your graph. The majority of graph algorithms also
    require this "outsight" of the value.

      Cons: joining two such graphs is complex, thus this construction is only
    good for graphs you want to build at once. Arrays are mutable (with all
    the problems possibly induced by passing around mutable arguments);
    invalid values exist (for instance when they refer out-of-bound indices);
    access to neighbours is indirect (you first go to the index in the table).
    You don't have access to neighbours in patterns.

        2) use mutable fields, then build the data incrementally.

    type 'a node = { data: 'a; mutable next: 'a node list }
    let circle n =
      let temp = Array.init n (fun i -> { data = i; next = [] }) in
      for i = 0 to n - 1 do
        temp.(i).next <- [temp.((i+1) mod n)];
      done;
      temp.(0)

    circle : int -> int node

      Pros: resembles C and its "do-what-I'm-telling-you" coding style.
    Sometimes it's good. Sometimes you don't have the extra indirection caused
    by the index. Values are always valid. You have access to neighbours in
    patterns.

      Cons: resembles C and its "do-what-I'm-telling-you" coding style.
    Program-wide mutability is a design problem when you only need it at value
    building time. Also, sometimes, you are obliged to use an option for the
    type of your mutable field. In the end, you find yourself with a complex
    data structure with 'Some _'s everywhere and not a single 'None'. In those
    cases when after build you expect every option of a given field to be a
    Some, this represents another extra indirection (*) and the source of
    invalid values.

       3) use lazy evaluation for the successors of each node.

    type 'a node = { data: 'a ; next: 'a node list Lazy.t }

    let circle n =
      let links = Array.make n [] in
      let nodes = Array.init n
        (fun i -> { data = i; next = lazy links.(i) })
      in
      for i = 0 to n - 1 do
        links.(i) <- [nodes.((i+1) mod n)]
      done;
      (* just to show the resulting structure is finite *)
      Array.iter (fun x -> ignore (Lazy.force x.next)) nodes;
      nodes.(0)

      Pros: gives the same power as mutable fields and more, with a less
    imperative style. The only way of handling infinite structures. Depending
    on the situation, you may have access to neighbours in patterns, but not
    surely.

      Cons: correct graph building with lazy evaluation is not easy. Infinite
    recursion is a common pitfall; building an infinite value when you wanted
    it finite is another one. Also, with current O'CaML version, Lazy.t is not
    abstract, and is implemented as a reference, hence it is mutable. Finally,
    even if your were so careful as to keep your graph finite, you still have
    to use Lazy.force each time you want to access a node's neighbour even
    when you could have force'd everything in advance. You have no means of
    assessing that a value is finite, you cannot tell whether marshalling
    such a value will work.

    I must have omitted many other solutions which I don't know. From this
    build difficulty come many frustrations like "if only I could freeze all
    mutable fields of this value at once" or things like that. I am in no way
    criticizing O'CaML for being unable to handle things C, C++ and Java do
    with "null pointers"! Just trying to figure out what is at the basis of
    all these questions.

    I have this little hack that will allow you to freeze mutable fields, but
    it relies on a current exposed unsafeness with Marshalling. You can freeze
    a value with mutable fields by marshalling it into a string, then
    un-marshalling it back into a value of the same type with nonmutable
    fields... This makes basically a copy of a value and changes the type on
    the way.

    It would be interesting to know whether people have other workarounds.
    Also, a generic graph module which would implement "map", "iter" etc. in a
    cycle-safe way (like Marshal) would perhaps be interesting in the
    'user-contributed stdlib extension'...

    What do you think?

    Benoît de Boursetty.

    (*) I don't know how options are implemented, but at least at
    language-level this adds an indirection



    This archive was generated by hypermail 2b29 : Sun Apr 02 2000 - 23:37:43 MET DST