
Canonical Set/Map datastructure?

 Date: 2008-03-05 (17:27) From: Alain Frisch Subject: Re: [Caml-list] Canonical Set/Map datastructure?
```
Berke Durak wrote:
> The Map and Set modules use AVL trees which are efficient but not
> canonical - a given
> set of elements can have more than one representation.  This means that
> you cannot use
> ad hoc comparison on sets and maps, and this is why they are presented
> as functors.
>
> Does anyone know if, in the many years that have passed since the
> implementation of
> those fine modules, someone has invented a (functional) datastructure
> that is as
> efficient while being canonic?

Well, Patricia trees have been around for many years and they satisfy
this property. They also allow set operations (union, intersection, ...)
in linear time (and I explain below how this can be optimized to
something which is really efficient for some applications).
Jean-Christophe Filliâtre has an implementation on his web page.

Patricia trees work fine when the set elements can easily be represented
as strings of bits. So if you can map your elements to integers, that's
ok. Otherwise, you can hash-cons your elements to get unique integers
for them.

Something that Jean-Christophe's implementation doesn't do but which is
quite easy to add is to use hash-consing on patricia trees themselves,
that is, to memoize their constructors in order to get unique physical
representation and maximal sharing. That way, you get:

structural equality = physical equality = set equality

With this property, set operations on patricia trees can be optimized
with reflexivity properties (e.g. the inner loop of the union function
can start by checking equality of its arguments).

Also, you get a nice unique integer for each tree. This allows you to
memoize efficiently set operations (like union, intersection, for which
you can use memoization in the inner loop, not only at toplevel), and to
build sets of sets (and so on).

-- Alain

```
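
The hash-consing idea described in the message can be sketched roughly as follows. This is only an illustration under stated assumptions, not Jean-Christophe Filliâtre's code or the author's own: it assumes non-negative integer keys, uses little-endian Patricia trees, and keeps nodes in ordinary (non-weak) hash tables. All names (`hashcons`, `branch`, `union_memo`, and so on) are made up for this example.

```ocaml
(* Hash-consed little-endian Patricia trees over non-negative ints.
   Every node carries a unique id; equal sets share one physical node. *)
type t = { id : int; node : node }
and node =
  | Empty
  | Leaf of int
  | Branch of int * int * t * t  (* prefix, branching bit, left, right *)

(* A node is identified by its shape and the ids of its children. *)
type key = KLeaf of int | KBranch of int * int * int * int

let table : (key, t) Hashtbl.t = Hashtbl.create 251
let counter = ref 0
let empty = { id = 0; node = Empty }

(* Memoized constructor: return the existing node for [key] if any. *)
let hashcons key node =
  match Hashtbl.find_opt table key with
  | Some t -> t
  | None ->
    incr counter;
    let t = { id = !counter; node } in
    Hashtbl.add table key t;
    t

let leaf k = hashcons (KLeaf k) (Leaf k)

let branch p m l r =
  match l.node, r.node with
  | Empty, _ -> r
  | _, Empty -> l
  | _ -> hashcons (KBranch (p, m, l.id, r.id)) (Branch (p, m, l, r))

let zero_bit k m = k land m = 0
let mask k m = k land (m - 1)
let match_prefix k p m = mask k m = p
let branching_bit p0 p1 = let x = p0 lxor p1 in x land (-x)

(* Combine two trees whose prefixes p0 and p1 disagree. *)
let join p0 t0 p1 t1 =
  let m = branching_bit p0 p1 in
  if zero_bit p0 m then branch (mask p0 m) m t0 t1
  else branch (mask p0 m) m t1 t0

let rec mem k t =
  match t.node with
  | Empty -> false
  | Leaf j -> k = j
  | Branch (_, m, l, r) -> mem k (if zero_bit k m then l else r)

let rec add k t =
  match t.node with
  | Empty -> leaf k
  | Leaf j -> if j = k then t else join k (leaf k) j t
  | Branch (p, m, l, r) ->
    if match_prefix k p m then
      if zero_bit k m then branch p m (add k l) r
      else branch p m l (add k r)
    else join k (leaf k) p t

let of_list l = List.fold_left (fun s k -> add k s) empty l

(* Results of union, memoized on the (ordered) pair of node ids. *)
let union_memo : (int * int, t) Hashtbl.t = Hashtbl.create 251

let rec union s t =
  if s == t then s  (* reflexivity shortcut, valid thanks to sharing *)
  else begin
    let key = if s.id <= t.id then (s.id, t.id) else (t.id, s.id) in
    match Hashtbl.find_opt union_memo key with
    | Some u -> u
    | None ->
      let u = union_raw s t in
      Hashtbl.add union_memo key u;
      u
  end

and union_raw s t =
  match s.node, t.node with
  | Empty, _ -> t
  | _, Empty -> s
  | Leaf k, _ -> add k t
  | _, Leaf k -> add k s
  | Branch (p, m, s0, s1), Branch (q, n, t0, t1) ->
    if m = n && p = q then branch p m (union s0 t0) (union s1 t1)
    else if m < n && match_prefix q p m then
      if zero_bit q m then branch p m (union s0 t) s1
      else branch p m s0 (union s1 t)
    else if m > n && match_prefix p q n then
      if zero_bit p n then branch q n (union s t0) t1
      else branch q n t0 (union s t1)
    else join p s q t
```

With this sketch, `of_list [1; 2; 3] == of_list [3; 1; 2]` holds, so set equality is a pointer comparison, and each set's `id` can serve as a key when building sets of sets. Because the tables here are ordinary `Hashtbl`s, nodes are never reclaimed; a longer-lived program would probably want weak tables instead.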