Version française
Home     About     Download     Resources     Contact us    
Browse thread
[oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: oliver@f...
Subject: [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
----- Forwarded message from oliver -----

To: Xavier Leroy <xavier.leroy@inria.fr>
Subject: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys

On Fri, Apr 23, 2004 at 06:04:07PM +0200, Xavier Leroy wrote:
> > Perhaps I'm all wrong, but when I have to get rid of repetitions in a 
> > list, I first sort it in O(n log n), then remove the repetitions in O(n).
> > Jean-Baptiste.
> 
> OK, fair enough :-)  The point I was trying to make (not very well, I
> agree) is that "list without repetition" or "sorted list without
> repetition" is often not the best data structure for the job.
[...]

NOT the datastructure FOR THE JOB BUT FOR THE RESULT!


So, ok, because people now are posting their code, I send
one of me. I had many different implementations, maybe
one ore the other is more suited to do the task.
The one that I invented some minutes ago looks like this:



let keys_of_hash hash =
     let keys = Hashtbl.create 10000 in (* stupid hard coded value *)
     let sample k v = Hashtbl.replace keys k true in
     Hashtbl.iter sample hash;
     Hashtbl.fold (fun k v init -> k::init) keys []


Because there is no Hashtbl.size in the standard lib,
which gives back the number of over-all bindings in
the hash, I had to insert some stupid value.
(I was to lazy to write it by myself. It also
 does not make sense to walk through the whole
 hash to get the Hashtbl.size and then to allocate
 the keys-.Hash and then To iterate and then to fold...
 ...there will be more efficient solutions than that...
 btw: I don't know how inefficient it would be to use
 Hashtbl.create 1  and then let grow the hash itself...)


OK, it's possible to calculate Hashtbl.size with
Hastbl.iter, but when Hashtbl.size would be part
of the standard Hashtbl-lib, this would make sense.

I hope we can agree on that Hashtbl.size would return
an int?! ;-)

Hashtbl.create needs an int for the initial size, and
doing things like the above stuff, would be more
consistently when there would be a Hashtbl.size
inside the standard Hastbl-module.
a) The return type is clear (isn't it?!)
b) Estimating initial size of hashtables can be predicted in
   similar cases like the above one.

But nevertheless I insist in Hashtbl.keys as a very useful
function.
You then can use Hastbl.find as well as Hashtbl.find_all
for retrieval of all contents, depending on your needs.

Why I think Hashtbl.keys makes more sense than Hashtbl.values?
Because it reflects the sense of a Hashtable, to have direct
access to the values VIA THE KEYS.
And that also is the reason, why I think Hastbl.keys should
necessarily be part of the lib.

Also I think it will be (much) more efficient to do it inside the
Hashtbl-lib instead of doing it with such outside-approaches.
The direct way inside the lib seems to make more sense to me here.


Ciao,
   Oliver

----- End forwarded message -----

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners