Alternative generic hash function

From: Manuel Fahndrich (maf@microsoft.com)
Date: Thu May 25 2000 - 00:04:11 MET DST

  • Next message: Pierre Weis: "Re: Alternative generic hash function"

    The generic hash function Hashtbl.hash traverses into mutable data
    structures. So I get different hash values depending on updates of the data
    structure.

    # let x = ref None;;
    val x : '_a option ref = {contents=None}
    # Hashtbl.hash x;;
    - : int = 0
    # x := Some 5;;
    - : unit = ()
    # Hashtbl.hash x;;
    - : int = 5

    As was pointed out times before on this list, Hashtbl.hash is thus
    unsuitable for hashing on data structures that contain mutable data.

    I'm wondering what the design rational is for this choice. I'd rather have a
    generic hash function (Hashtbl.hash_pure) that never looks into mutable
    structures (skips refs and mutable fields of records and classes).
    The reason being that if I want to hash on data that contains references,
    I'm more likely to want to find the structure independent of the contents of
    any mutable subparts.

    So yes, a hashtable containing int refs would pointless, because it maps
    them all to the same bucket. But I frequently use hashing with equality
    being ==. Whenever I do this, I need to actually add some unique identifier
    to the data structure just for the hashing operation. Having an alternate
    hash function (Hashtbl.hash_pure) would make such id's (and their managment)
    unnecessary.

    I see how the current working of Hashtbl.hash is useful for data structures
    that contain mutable parts in the case where such structures don't change
    between the time they are stored and later looked up in the hash table. But
    I'm wondering how often this actually arises.

    Furthermore, using Hashtbl.hash_pure in place of Hashtbl.hash would not
    break any old code (except for performance), since all it does is make the
    equivalence classes bigger. On the other hand, using Hashtbl.hash with the
    misconception that it acts as Hashtbl.hash_pure breaks lots of code.

    I rest my case.

            Manuel Fahndrich



    This archive was generated by hypermail 2b29 : Thu May 25 2000 - 09:07:03 MET DST