implementation of set (standard library)

From: Markus Mottl (mottl@miss.wu-wien.ac.at)
Date: Tue Jan 26 1999 - 01:15:37 MET


From: Markus Mottl <mottl@miss.wu-wien.ac.at>
Message-Id: <199901260015.BAA17324@miss.wu-wien.ac.at>
Subject: implementation of set (standard library)
To: caml-list@inria.fr (OCAML)
Date: Tue, 26 Jan 1999 01:15:37 +0100 (MET)

Hello again,

I have taken a look at the implementation of sets in the standard library
and thought that the "add" function could be implemented differently
(possibly faster). Here the alternative code:

  exception Found

  let rec add2 x s =
    let rec add2' = function
      Empty -> Node(Empty, x, Empty, 1)
    | Node(l, v, r, _) as t ->
        let c = Ord.compare x v in
        if c = 0 then raise Found else
        if c < 0 then bal (add2' l) v r else bal l v (add2' r) in
    try add2' s with Found -> s

I have measured the performance of this function:

If about 5% (or more, probably less) of all "adds" meet an already
existing element in the set, this function is faster.

One should also note that the memory requirements are lower: there is
no copying of paths in the case of already existing elements.

This trick could also be applied to "add" in "Map".
Maybe there are other functions where it can be applied.

Two functions I would really like to see in "Set":

  exception Found

  let add_elt x s =
    let el = ref x in
    let rec add_elt' = function
        Empty -> Node(Empty, x, Empty, 1)
      | Node(l, v, r, _) as t ->
          let c = Ord.compare x v in
          if c = 0 then begin el := v; raise Found end else
          if c < 0 then bal (add_elt' l) v r
          else bal l v (add_elt' r)
    in try (add_elt' s, x) with Found -> (s, !el)

  let rec find x = function
      Empty -> raise Not_found
    | Node(l, v, r, _) ->
        let c = Ord.compare x v in
        if c = 0 then v else
        if c < 0 then find x l
        else find x r

The "alternative" add function returns a tuple of the set and the element.
Note: the element could be a (mutable) object. It could be possible that
one wants to insert an object into a set and send further messages to it.
But if the object already exists in the set (or better: the comparison
function tells me so), it would be fine to send the messages to the object
already in the set and to discard the object that would have been added.

At least the "find"-function in "set" would be fine. The reason is the
same as above: one could send messages to an object found in the set.

The only workaround I have at the moment is to use a map instead, where
key and value are the same object -> this allows me to use find. Probably
not very elegant...

A "last wish": an iteration function that respects the order of elements
in set and also map. At the moment there is just the function "elements"
(only in set), which allows to get the elements "in order". An ordered
insertion function would be very powerful, because one could emulate other
functions like e.g. "fold" with it (with mutable accumulators), too - but
the elements would be used in the computation in the appropriate order...

Best regards,
Markus

-- 
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl



This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:18 MET