On Fri, 23 Apr 2004, Xavier Leroy wrote:
> > I think a good addition to the Hashtbl-module
> > would be a function, that gives back a list of keys
> > that are in the hash.
>
> With your specification (no repetitions in the list), that function
> would run in quadratic time, which is a sure sign that lists aren't
> the right data structure here. (More generally speaking, "lists
> without repetitions" is almost always the wrong data structure.)

No, I think creating such a list would take O(n log n) time.

OK, we're starting with a hash table. That means we have a set of
buckets, and each bucket is a set of key/data pairs. Assume the same key
can be inserted multiple times (can it?). In that case, all duplicate
keys end up in the same bucket, since equal keys hash to the same value.
So, for each bucket, I sort all entries in the bucket by key (worst case,
I only have one bucket and sorting is O(n log n)). Once sorted, I go
through and eliminate duplicates, which is now an O(n) algorithm:

let uniq lst =
  let rec loop accum = function
    | [] -> List.rev accum
    | x :: [] -> List.rev (x :: accum)
    | x :: y :: t ->
      if (x = y) then
        loop accum (x :: t)
      else
        loop (x :: accum) (y :: t)
  in
  loop [] lst
;;

(You can do it more efficiently than this, but this gets the idea across.)

Voilà: uniqueness in subquadratic time. And in practice, approaching
linear time: hashtables with lots of elements in a single bucket are
computationally expensive anyway, so you're likely to be sorting a whole
bunch of short (1 and 2 element) lists.
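
For what it's worth, here is a rough sketch of how the pieces might fit
together. The standard Hashtbl interface does not expose individual
buckets, so this version just collects every key with Hashtbl.fold and
sorts the whole list once, which corresponds to the single-bucket worst
case above (O(n log n)). The name `keys` is my own and is not part of the
Hashtbl module. It also assumes the keys can be ordered with the
polymorphic `compare`. On the "can it?" question: Hashtbl.add on an
existing key adds a second binding rather than replacing the first, and
Hashtbl.fold visits both, which is why the duplicates show up at all.

```ocaml
(* Drop adjacent duplicates from a list that is already sorted. O(n). *)
let uniq lst =
  let rec loop accum = function
    | [] -> List.rev accum
    | [x] -> List.rev (x :: accum)
    | x :: (y :: _ as rest) ->
      if x = y then loop accum rest else loop (x :: accum) rest
  in
  loop [] lst

(* Hypothetical helper, not part of the standard Hashtbl module:
   collect every key (one per binding, so duplicates are possible),
   sort them, then remove adjacent duplicates. *)
let keys tbl =
  let all = Hashtbl.fold (fun k _ acc -> k :: acc) tbl [] in
  uniq (List.sort compare all)

let () =
  let tbl = Hashtbl.create 16 in
  Hashtbl.add tbl "a" 1;
  Hashtbl.add tbl "a" 2;   (* second binding for "a", hides the first *)
  Hashtbl.add tbl "b" 3;
  assert (keys tbl = ["a"; "b"])
```

A per-bucket version would sort many short lists instead of one long one,
which is where the near-linear behaviour in practice would come from, but
it would have to live inside the Hashtbl implementation itself.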
