Version française
Home     About     Download     Resources     Contact us    
Browse thread
Alternative generic hash function
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Dmitri Lomov <dsl@t...>
Subject: Re: Alternative generic hash function
AFA I understand, Hastable.hash_pure will be hard to implement in run-time,
as now at run-time it is not known whether given data structure, or 
given data structure field is mutable or not.
Such information is not propagated to run-time by ocaml compiler.

Possibly your option here is to divide your data structure into mutable
and immutable parts, and than use Hashtable.hash on immutable part.
Then all will work well.

This, however, does not cover the example you present. 
But actually the nature of hashing assumes the hashed object has
some immutable part. For example, when hashing strings you do not assume 
to get the same hash value for mutable string if you change some character of it.
In your example everything in an object changes - it does not have any 
immutable properties/elements/fields to "hook" hash to.

By the way, a rough hashing function for all O'Caml objects can be based
on block length and/or tag (they are accesible via Obj module).
Such hash function will tend to be very biased, due to biased
distribution of lengths and tags, but in very bad situations it 
may help.

I use such function in my (forthcoming :-)) dynamic code generation library 
for O'Caml to keep external references from code block. 

Regards,
Dmitri

Manuel Fahndrich wrote:
> 
> The generic hash function Hashtbl.hash traverses into mutable data
> structures. So I get different hash values depending on updates of the data
> structure.
> 
> # let x = ref None;;
> val x : '_a option ref = {contents=None}
> # Hashtbl.hash x;;
> - : int = 0
> # x := Some 5;;
> - : unit = ()
> # Hashtbl.hash x;;
> - : int = 5
> 
> As was pointed out times before on this list, Hashtbl.hash is thus
> unsuitable for hashing on data structures that contain mutable data.
> 
> I'm wondering what the design rational is for this choice. I'd rather have a
> generic hash function (Hashtbl.hash_pure) that never looks into mutable
> structures (skips refs and mutable fields of records and classes).
> The reason being that if I want to hash on data that contains references,
> I'm more likely to want to find the structure independent of the contents of
> any mutable subparts.
> 
> So yes, a hashtable containing int refs would pointless, because it maps
> them all to the same bucket. But I frequently use hashing with equality
> being ==. Whenever I do this, I need to actually add some unique identifier
> to the data structure just for the hashing operation. Having an alternate
> hash function (Hashtbl.hash_pure) would make such id's (and their managment)
> unnecessary.
> 
> I see how the current working of Hashtbl.hash is useful for data structures
> that contain mutable parts in the case where such structures don't change
> between the time they are stored and later looked up in the hash table. But
> I'm wondering how often this actually arises.
> 
> Furthermore, using Hashtbl.hash_pure in place of Hashtbl.hash would not
> break any old code (except for performance), since all it does is make the
> equivalence classes bigger. On the other hand, using Hashtbl.hash with the
> misconception that it acts as Hashtbl.hash_pure breaks lots of code.
> 
> I rest my case.
> 
>         Manuel Fahndrich
 
_________________________________________________________________
Dmitri S. Lomov
mailto:dsl@tepkom.ru    ICQ#: 20524819 (Rusty)
http://users.tepkom.ru/dsl, http://young.tepkom.ru
ftp://young.tepkom.ru
+7 (812) 428-46-57 (b)   +7 (812) 295-94-15 (h)