[Caml-list] Efficient and canonical set representation?

 Date: 2003-11-06 (17:03) From: Brian Hurt Subject: Re: [Caml-list] Efficient and canonical set representation?
```On Thu, 6 Nov 2003, Harrison, John R wrote:

> Does anyone know a representation of finite sets over an orderable polymorphic type
> that's (1) efficient and (2) canonical? Even better would be a CAML or OCaml
> implementation. More precisely I'm looking for:
>
>   1. Log-time lookup and insertion, and linear-time union, intersection etc.
>
>   2. Equal sets are represented by the same object.

Two is the tricky one to implement.  Imagine a case where I have set A
with its elements, and set B with all the elements of set A less one, but
inserted in a different order.  B is a different object than A (the two
sets are not equal).  Now you add that one last element from A to B; you
want the insert routine to return A.  This means that the insert routine
has to know that A exists, and has to compare the new B to A to determine
that it should return A and not B.  It can be done, but it's not trivial.

Games with structure definitions don't help, because OCaml will happily
allocate different structures with the same data (this is why 1. == 1. is
false).  With a balanced tree structure you can implement the naive
equality comparison in linear time (the series i/2^i converges, allowing
you to enumerate the elements in linear time).  If you need faster (average)
compares, there are a number of shortcuts you can take.  For example, you
can keep the number of elements currently in the set handy, and if the
element counts don't match, obviously the sets won't be equal.
Fancier, you can also keep a hash of all elements in the set: if the hashes
aren't equal, you can guarantee the sets aren't equal.  Be careful with