English version
Accueil     À propos     Téléchargement     Ressources     Contactez-nous    

Ce site est rarement mis à jour. Pour les informations les plus récentes, rendez-vous sur le nouveau site OCaml à l'adresse ocaml.org.

Browse thread
Canonical Set/Map datastructure?
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2008-03-07 (10:09)
From: Berke Durak <berke.durak@e...>
Subject: Re: [Caml-list] Canonical Set/Map datastructure?
Harrison, John R a écrit :
> Hi Berke,
> | However, the idea of combining hash-consing and Patricia trees,
> | however elegant, does not suit my problem.  Basically, you are
> | cheating by using an auxiliary data structure, the hashtable (which is
> | also O(n^2) worst-case).
> The code I pointed you to uses hashes, but not hash tables. As for the
> worst-case efficiency, what you say is true, but it's unlikely to
> matter in practice. And I suspect it may be unavoidable in principle,
> which is the interesting thing I learned last time this was discussed.
> See for example the following paper and its references:

Hi John,

Thanks for your code.  I'm sorry I thought you maintained a separate hash table.
It's very interesting code and I'll give it a try.

But it seems to have two properties which make it unsuitable for
the particular application I had in mind:

- The theoretical worst-case performance of hash-based data structures
can be attained if the hash function has bad dispersal.  As the code
relies on Hashtbl.hash, which does not hash deeply, this could
potentially be a proble, in particular if your keys have long "common
prefixes" that are not distinguished by the hash function.  It might
work well in practice but I feel a little uncomfortable.

- Also, it does not preserve the natural order for keys, and that
is particularly bad, because I often use, for instance, float-indexed
maps or sets as a substitute for heaps.

The paper with the inverse cubic lower bound is *very* interesting; we don't
see plausible lower bounds often in complexity theory, especially with such
large assumptions (just bounded out-degree for the graph nodes).

So it seems there is little hope to have a drop-in replacement for Set
or Map that is compatible with the natural order of things, a.k.a.,