Version française
Home     About     Download     Resources     Contact us    
Browse thread
Maximum non-constant constructors
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Jacques Garrigue <garrigue@m...>
Subject: Re: [Caml-list] Maximum non-constant constructors
From: Jon Harrop <jon@ffconsultancy.com>
> On Thursday 17 March 2005 14:00, Eric Cooper wrote:
> > For example, for all strings XXX, the variants `XXX and `zyctRecXXX
> > collide.
> > My guess is that none of them are likely to be chosen by
> > humans, but might occur in program-generated code.
> 
> Good point! I think this justifies the existence of a library function to 
> compare the hashes of string names of polymorphic variants. Does such a 
> function exist?

Here it is:

let hash_variant s =
  let accu = ref 0 in
  for i = 0 to String.length s - 1 do
    accu := 223 * !accu + Char.code s.[i]
  done;
  (* reduce to 31 bits *)
  accu := !accu land (1 lsl 31 - 1);
  (* make it signed for 64 bits architectures *)
  if !accu > 0x3FFFFFFF then !accu - (1 lsl 31) else !accu

It is also available in the C runtime (where it is useful for
interfacing) but if you really need it in your program just copy the
above code.

To give you an idea of the probability of conflict, here are the
conclicts I found in a 235882 word dictionary (/usr/share/dict/web2 on
FreeBSD)
Tag `Cretacic conflicts with `cornigerous
Tag `gedackt conflicts with `aquicolous
Tag `myeloencephalitis conflicts with `adequative
Tag `Nemorensian conflicts with `condor
Tag `nonrioter conflicts with `anematosis
Tag `omphaloma conflicts with `crimelessness
Tag `persecute conflicts with `paraenetic
Tag `soredioid conflicts with `genitorial
Tag `subpopulation conflicts with `oxyquinone
Tag `sympathy conflicts with `herbman
Tag `trophaeum conflicts with `prepossession
Tag `unbreaded conflicts with `neback
Tag `undragooned conflicts with `nuisance
Tag `unlistened conflicts with `disturb
Tag `veratrine conflicts with `absolutely
You will get different conflict if you change the capitalization, but
not more.

So much for the birthday paradox: variants fully use 31 bits, so that
the probability only gets more than 0.5 when you have more than 32000
constructors in the same type... Of course this is assuming a perfect
distribution, but as the above dictionary shows, reality is close to
that.

By the way, the check is not at link time, as was stated in another
message, but at compile time. It is the typing of variants itself that
guarantees that no such conflict will go undetected. So you will be
warned early enough.
The same hash function is also used for method names (with the same
compile-time check), but fortunately it is hard to imagine an object
with more than 10000 methods.

Jacques Garrigue