Canonical Set/Map datastructure?
Date: 2008-03-05 (17:27)
From: Alain Frisch <alain@f...>
Subject: Re: [Caml-list] Canonical Set/Map datastructure?
Berke Durak wrote:
> The Map and Set modules use AVL trees, which are efficient but not
> canonical: a given set of elements can have more than one
> representation. This means that you cannot use ad hoc comparison on
> sets and maps, and this is why they are presented as functors.
> Does anyone know if, in the many years that have passed since the
> implementation of those fine modules, someone has invented a
> (functional) data structure that is as efficient while being canonical?

Well, Patricia trees have been around for many years and they satisfy 
this property. They also allow set operations (union, intersection, ...) 
in linear time (and I explain below how this can be optimized to 
something which is really efficient for some applications). 
Jean-Christophe Filliâtre has an implementation on his web page.
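For readers who have not met Patricia trees, here is a minimal sketch of the little-endian variant (in the style described by Okasaki and Gill) over non-negative ints. It is not Jean-Christophe's code, and the names mem, add, union and of_list are hypothetical. The non-negativity assumption matters because union compares branching bits with `<`. Since the shape of the tree depends only on the set it holds, OCaml's built-in structural equality also works as set equality.

```ocaml
(* Little-endian Patricia trees over non-negative ints (sketch). *)
type t =
  | Empty
  | Leaf of int
  | Branch of int * int * t * t  (* prefix, branching bit, left, right *)

(* The bits of k below the branching bit m. *)
let mask k m = k land (m - 1)
let zero_bit k m = k land m = 0
let match_prefix k p m = mask k m = p

(* Lowest bit at which p0 and p1 differ (assumes p0 <> p1). *)
let branching_bit p0 p1 =
  let x = p0 lxor p1 in
  x land (-x)

let rec mem k = function
  | Empty -> false
  | Leaf j -> j = k
  | Branch (p, m, l, r) ->
    match_prefix k p m && mem k (if zero_bit k m then l else r)

(* Combine two non-empty trees whose prefixes p0 and p1 disagree. *)
let join p0 t0 p1 t1 =
  let m = branching_bit p0 p1 in
  if zero_bit p0 m then Branch (mask p0 m, m, t0, t1)
  else Branch (mask p0 m, m, t1, t0)

let rec add k t =
  match t with
  | Empty -> Leaf k
  | Leaf j -> if j = k then t else join k (Leaf k) j t
  | Branch (p, m, l, r) ->
    if match_prefix k p m then
      if zero_bit k m then Branch (p, m, add k l, r)
      else Branch (p, m, l, add k r)
    else join k (Leaf k) p t

(* Merge two trees, descending only where their prefixes overlap. *)
let rec union s t =
  match s, t with
  | Empty, t | t, Empty -> t
  | Leaf k, t | t, Leaf k -> add k t
  | Branch (p, m, s0, s1), Branch (q, n, t0, t1) ->
    if m = n && p = q then Branch (p, m, union s0 t0, union s1 t1)
    else if m < n && match_prefix q p m then
      (* t fits inside one child of s *)
      if zero_bit q m then Branch (p, m, union s0 t, s1)
      else Branch (p, m, s0, union s1 t)
    else if m > n && match_prefix p q n then
      (* s fits inside one child of t *)
      if zero_bit p n then Branch (q, n, union s t0, t1)
      else Branch (q, n, t0, union s t1)
    else join p s q t

let of_list l = List.fold_left (fun t k -> add k t) Empty l
```

Inserting the same elements in a different order yields the same tree, so `of_list [1; 2; 3] = of_list [3; 2; 1]` holds, which is exactly what the AVL-based Set cannot promise.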

Patricia trees work fine when the set elements can easily be represented 
as strings of bits. So if you can map your elements to integers, that's 
ok. Otherwise, you can hash-cons your elements to get unique integers 
for them.
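A minimal sketch of giving elements unique integers, assuming structural equality is the right notion of "same element". The helper name make_interner is hypothetical; a full hash-consing library would also hand back a shared canonical copy of the value, and might use a weak table so unused entries can be collected.

```ocaml
(* Hypothetical helper: returns a function mapping each distinct value
   (compared with structural equality) to a unique int, 0, 1, 2, ... *)
let make_interner () =
  let table = Hashtbl.create 64 in
  let next = ref 0 in
  fun x ->
    match Hashtbl.find_opt table x with
    | Some id -> id
    | None ->
      let id = !next in
      incr next;
      Hashtbl.add table x id;
      id

(* Example: strings interned before being stored in an int-keyed set. *)
let intern_string : string -> int = make_interner ()
```

The resulting ints can then serve as keys for an integer Patricia tree.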

Something that Jean-Christophe's implementation doesn't do but which is 
quite easy to add is to use hash-consing on Patricia trees themselves, 
that is, to memoize their constructors in order to get unique physical 
representation and maximal sharing. That way, you get:

  structural equality = physical equality = set equality

With this property, set operations on Patricia trees can be optimized 
with reflexivity properties (e.g. the inner loop of the union function 
can start by checking equality of its arguments).
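Here is a sketch of the hash-consing idea from the two paragraphs above, assuming non-negative int keys. Every node carries a unique id, the smart constructors leaf and branch return an existing node when one with the same shape was already built, and union starts with a physical-equality test. All names are hypothetical. The table is a plain Hashtbl, so nodes are never reclaimed; a production version would more likely use a weak table such as the standard library's Weak.Make.

```ocaml
(* Hash-consed little-endian Patricia trees over non-negative ints (sketch). *)
type t = { id : int; node : node }  (* id is unique per distinct tree *)
and node =
  | Empty
  | Leaf of int
  | Branch of int * int * t * t  (* prefix, branching bit, left, right *)

(* Shallow equality and hash: children are compared by address, which is
   enough because they are themselves hash-consed. *)
module Shape = struct
  type t = node
  let equal a b =
    match a, b with
    | Empty, Empty -> true
    | Leaf i, Leaf j -> i = j
    | Branch (p, m, l, r), Branch (p', m', l', r') ->
      p = p' && m = m' && l == l' && r == r'
    | _ -> false
  let hash = function
    | Empty -> 0
    | Leaf i -> Hashtbl.hash (1, i)
    | Branch (p, m, l, r) -> Hashtbl.hash (2, p, m, l.id, r.id)
end

module H = Hashtbl.Make (Shape)

let table = H.create 1024
let next_id = ref 0

(* Return the unique tree with shape n, creating it on first request. *)
let hashcons n =
  match H.find_opt table n with
  | Some t -> t
  | None ->
    let t = { id = !next_id; node = n } in
    incr next_id;
    H.add table n t;
    t

(* Memoized constructors. *)
let empty = hashcons Empty
let leaf k = hashcons (Leaf k)
let branch p m l r = hashcons (Branch (p, m, l, r))

let mask k m = k land (m - 1)
let zero_bit k m = k land m = 0
let match_prefix k p m = mask k m = p
let branching_bit p0 p1 = let x = p0 lxor p1 in x land (-x)

let join p0 t0 p1 t1 =
  let m = branching_bit p0 p1 in
  if zero_bit p0 m then branch (mask p0 m) m t0 t1
  else branch (mask p0 m) m t1 t0

let rec add k t =
  match t.node with
  | Empty -> leaf k
  | Leaf j -> if j = k then t else join k (leaf k) j t
  | Branch (p, m, l, r) ->
    if match_prefix k p m then
      if zero_bit k m then branch p m (add k l) r
      else branch p m l (add k r)
    else join k (leaf k) p t

let rec union s t =
  if s == t then s  (* physical equality is set equality here *)
  else
    match s.node, t.node with
    | Empty, _ -> t
    | _, Empty -> s
    | Leaf k, _ -> add k t
    | _, Leaf k -> add k s
    | Branch (p, m, s0, s1), Branch (q, n, t0, t1) ->
      if m = n && p = q then branch p m (union s0 t0) (union s1 t1)
      else if m < n && match_prefix q p m then
        if zero_bit q m then branch p m (union s0 t) s1
        else branch p m s0 (union s1 t)
      else if m > n && match_prefix p q n then
        if zero_bit p n then branch q n (union s t0) t1
        else branch q n t0 (union s t1)
      else join p s q t

let of_list l = List.fold_left (fun t k -> add k t) empty l
```

Two sets built by different routes end up as the very same heap object, so `==` answers set equality in constant time, and union skips whole shared subtrees.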

Also, you get a nice unique integer for each tree. This allows you to 
efficiently memoize set operations (like union and intersection, for which 
you can use memoization in the inner loop, not only at toplevel), and to 
build sets of sets (and so on).
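To keep this sketch short, it swaps Patricia trees for hash-consed sorted lists, which are also canonical. The point is only that each set's unique id can key a memo table that is consulted on every recursive call (not only at top level), and can order sets of sets with a constant-time comparison. All names (hashcons, of_list, union, misses, SetOfSets) are hypothetical, and the memo table is never cleared.

```ocaml
(* Hash-consed sorted int lists (sketch): canonical sets with unique ids. *)
type set = { id : int; cell : cell }
and cell = Nil | Cons of int * set

(* Shallow equality and hash; tails are compared by address. *)
module Cell = struct
  type t = cell
  let equal a b =
    match a, b with
    | Nil, Nil -> true
    | Cons (x, s), Cons (y, s') -> x = y && s == s'
    | _ -> false
  let hash = function
    | Nil -> 0
    | Cons (x, s) -> Hashtbl.hash (x, s.id)
end

module H = Hashtbl.Make (Cell)

let table = H.create 1024
let next_id = ref 0

let hashcons c =
  match H.find_opt table c with
  | Some s -> s
  | None ->
    let s = { id = !next_id; cell = c } in
    incr next_id;
    H.add table c s;
    s

let nil = hashcons Nil
let cons x s = hashcons (Cons (x, s))

(* Sorting and removing duplicates gives the canonical form. *)
let of_list l = List.fold_right cons (List.sort_uniq compare l) nil

(* Memo table keyed on the ids of both arguments, used in the inner loop. *)
let memo : (int * int, set) Hashtbl.t = Hashtbl.create 1024
let misses = ref 0  (* number of results actually computed *)

let rec union s t =
  if s == t then s
  else
    let key = (s.id, t.id) in
    match Hashtbl.find_opt memo key with
    | Some r -> r
    | None ->
      incr misses;
      let r =
        match s.cell, t.cell with
        | Nil, _ -> t
        | _, Nil -> s
        | Cons (x, s'), Cons (y, t') ->
          if x < y then cons x (union s' t)
          else if x > y then cons y (union s t')
          else cons x (union s' t')
      in
      Hashtbl.add memo key r;
      r

(* Sets of sets: ids give an arbitrary but consistent total order. *)
module SetOfSets = Set.Make (struct
  type t = set
  let compare a b = Int.compare a.id b.id
end)
```

A repeated union of the same two sets is answered from the table without recomputation, and two equal sets collapse to a single element of SetOfSets.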

-- Alain