Version française
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] Hashtbl and destructive operations on keys
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Jacques Garrigue <garrigue@k...>
Subject: Re: [Caml-list] Hashtbl and destructive operations on keys
From: Thomas Fischbacher <Thomas.Fischbacher@Physik.Uni-Muenchen.DE>

> What I rather want to do is roughly along the following lines:
> 
> 
> let nonconsing_accum f_combine f_copy ht h_key h_val =
>   if Hashtbl.mem ht h_key
>   then Hashtbl.replace ht h_key 
>       (f_combine (Hashtbl.find ht h_key) h_val)
>   else Hashtbl.add ht (f_copy h_key) h_val
> ;;
> 
> let result =
>   (
>    let ht = Hashtbl.create 10 in
>    let arr = Array.make 4 0 in
>    for i = 0 to 1000 do
>      for k = 0 to 4-1 do
>        arr.(k) <- Random.int 5;
>      done;
>      nonconsing_accum (+) Array.copy ht arr 1;
>    done;
>    Hashtbl.fold (fun k v so_far -> (k,v)::so_far) ht [];
>   )
> ;;
> 
> Former experience with CMU CL tells me that this technique of not 
> unnecessarily consing hash keys can easily lead to a performance gain of 
> a factor of 10 - depending on the application, of course.
> 
> But I have a very bad feeling about it as long as I do not have any 
> guarantee that I really can re-use the corpus of a hash key passed to 
> Hashtbl.replace.

If I understand correctly your intention, you are asking whether
Hashtbl.replace may end-up putting parts of your key inside the
hash table?
Looking at the source code for Hashtbl.replace, this may only happen
if an equivalent key was not already in the table, and since your
program only calls replace for keys already in the table, this is safe.

But you probably already know that, so what you really want is a
sentence like "the key itself is inserted in the hashtable only if
there was no element to replace, otherwise the original key is kept"
added to the documentation.
Note however that such a comment may be seen as an overspecification:
it is more confusing than useful for normal use...

Moreover, the behaviour for the Map module is different: there, the
new key is used. (This is for the add function, but it happens to have
the same semantics as Hashtbl.replace)
And this is also different from using Hashtbl.remove followed by
Hashtbl.add, which was supposed to have the same semantics as
Hashtbl.replace.

> Yes, one answer certainly is "just make the hash value a ref and work on 
> its contents". But there are indeed some crazy situations where one is 
> better off using the above technique.

I'm wondering what kind of crazy situation.
Looks to me that that using a ref would be more efficient:
  let nonconsing_accum f_combine f_copy ht h_key h_val =
    try
      let h_ref = Hashtbl.find ht h_key in
      h_ref := f_combine !h_ref h_val
    with Not_found ->
      Hashtbl.add ht (f_copy h_key) (ref h_val)
  ;;
In your approach your are accessing 3 times the hash table, where once
should be enough.

However one might argue that nonconsing_accum should be a primitive on
hashtables, as it can be implemented more efficiently by directly
accessing the internal representation.

Jacques Garrigue

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners