Version française
Home     About     Download     Resources     Contact us    

This site is updated infrequently. For up-to-date information, please visit the new OCaml website at

Browse thread
[ANN] Weaktbl: a weak hash table library
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2007-10-03 (10:51)
From: Zheng Li <li@p...>
Subject: Re: [ANN] Weaktbl: a weak hash table library


skaller <> writes:
>>  * Both keys and associated values are weakly stored. A binding exists until
>>    the key is no longer referenced anywhere
> I'm a bit confused what it means..
> If the key is weak, it is useless except for terms, i.e. variants
> or records of another data structure, where you are placing a pointer
> in the table as a key. You can't put values (int, float, etc) in the 
> key field, because they're immediately unreferenced.
What happen if you put a noncollectable value (e.g. int) into a weak data
structure (e.g. standard weak array/hashtable)? It won't disappear from the
data structure unless you manually remove it. 

It's the same here, if you use noncollectable data as the key, the key won't
disappear, hence the binding won't disappear (according to our principle -- key
decide whether the binding disappear). So at least it's consistent with the
standard Weak module. If you're using noncollectable value as key, you're
actually using Weaktbl as normal Hashtbl, that's still not bad at all.

> So you could use this thing as a cache, where for example you have
> a tree and calculate some property of each node, and cache it
> in the table.
> The problem is .. physical comparison in Ocaml is unstable,
> so you cannot hash pointers using polymorphic hash, 
> but pointers are the only kind that persist. 
> So I have to conclude you must use the Weaktbl with a functor,
> instantiating the hash function to something which does a computation
> on the value, and this claim:

Weaktbl provides the same interface as Hashtbl, i.e. a polymorphic one and a
functorial one. The functorial signature is exactly:

  module Make (H : Hashtbl.HashedType) : Hashtbl.S with type key = H.t

The polymorphic one is equivalent to having "equal x y = (compare x y) = 0" and
"hash = Hashtbl.hash", just as the Hashtbl's polymorphic interface did.

You don't have to (shouldn't, as in standard Hashtbl) use a hash function
related to physical property of data, just a normal hash (as the standard one)
based on structural property is enough.

Actually, it's the "equal" matters to the semantic. Here I'd like to clarify it
a bit, and ask for some *advices*:

- If physical equivalence (==) provided as "equal", there should be nothing
  ambiguous here, because every key is unique. A key disappears, all its
  bindings disappear. Querying with a key (e.g. using find/find_all/remove
  etc.), only those bindings added with this exact key will be given. The
  semantic is clear, every key is only related to binding added by itself ---
  because in each equivalent class there is just one value.

- If other equivalent relation provided as "equal", e.g. "let equal = (=)" or
  "let equal x y = (fst x) = (fst y)". No doubt, when queering with a key , all
  bindings being added by its equivalent class should be given --- that's what
  we define the equivalent relation for. However, on the other problem: when a
  binding should disappear, we have two kinds of semantics as candidate:

  * The first one is used in the current library: a binding is physically
    related to the key with which it was added with. I.e., each key is still
    physically unique, and its presence decide the aliveness of all the
    bindings added with it. Still, a key disappear, the bindings added by it
    disappear. This has nothing to do with the equivalent relationship ---
    equivalent class relation is only used for querying and manipulation. E.g.,
    if we define "equal = (=)" and  $ka and $kb are keys structurally
    equivalent but physically nonequivalent, the presence of bindings added by
    $ka and bindings added by $kb is still independently decided by the
    presence of $ka and $kb. But when doing find_all/find/remove/replace etc.,
    using no matter $ka or $kb should give you the same result.

  * Another possibility is: a binding will exist until all keys that have
    contributed to this equivalent class disappear, or say as far as there is
    at least one key that ever contributed to this class, all bindings of this
    class will remain; when the last key of this class disappears, the bindings
    belong to this class disappear all together.

  The rational behind the second approach is: people can sometimes mix up
  structurally equivalent but physically nonequivalent keys. E.g. they can use
  key $ka to add some binding, but *accidentally* switch to $ka's structural
  equivalence $kb afterwards to add more bindings. (That's normal, for example,
  you filter $ka through a function which doesn't return $ka itself but its
  copy among the result, and you don't insist in using the original $ka, but
  instead using the one in the output.) After a while,  $ka is probably no
  longer used and hence GCed sometime later. In the first approach, those
  bindings added with $ka will just disappear which sounds mysterious; in the
  second approach, all bindings remain as far as one of $ka or $kb is alive.

  So people may think the second approach is usually convenient, because it's
  more conservative and safe, and usually people don't care much about the more
  memory occupation as far as it won't leak overall. But the second approach
  doesn't fully solve its problem and can produce even more mysterious
  situation: since its rational is assume people can make mistakes, then
  think about the following situation: after you using $kb as the key for a
  while, you accidentally (again) switch to $kc, however you hasn't got any
  chance to use $kc to manipulate bindings yet (so the weaktbl knows nothing
  about the existence of $kc), you just hold it because you believe it's $k(b)
  (and $k(a)). After a while, $ka and $kb disappear, all the bindings are're still holding $kc.

I actually implemented both of them, the switch is just one line -- the first
approach use a weak stack inside, the second use a common stack. However I
don't want to provide both to confuse people. I'd like to hear more suggestions
to decide which one is desired.

Thanks in advance.

Zheng Li