Browse thread
range of hash function
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: | 2010-02-20 (12:39) |
From: | ygrek <ygrekheretix@g...> |
Subject: | Re: [Caml-list] range of hash function |
On Sat, 20 Feb 2010 11:52:05 +0100 Goswin von Brederlow <goswin-v-b@web.de> wrote: > Grégoire Seux <kamaradclimber@gmail.com> writes: > > > hello ! > > > > i would like to use the polymorhpic hash function on strings. But i would like > > to know what is the probability of a collision between two hashes. > > > > my first question is about the range of the Hashtbl.hash function: what is its > > range ? ( string -> [1..N] ?) > > It is range [min_int..max_int]. Meaning 31 or 63 bits. > > > the second question is : can i assume that the result is a uniform distribution > > over [1..N] ? (for 10? words which is an estimation of the english vocabulary > > size) The best way is to test yourself. Or rely on someone else's tests :) http://alaska-kamtchatka.blogspot.com/2009/06/hash-n-mix.html > You can read the source, analyze the math and try to figure out a > collision. From past experience every widely used hash function so far > has been shown to be vulnerable. Like md5 is now considered compromised > due to the ease of creating a collision. Sha1 is deemed unsafe too. It > is just a matter of how much brainpower and cpu time you want to throw > at it. Or pure bad luck. Hashtbl.hash function is not required to be cryptographically strong, but rather uniformly random on "usual" input data and fast. -- ygrek http://ygrek.org.ua