[Caml-list] Big executables from ocamlopt; dynamic libraries again
Date: -- (:)
From: tim@f...
Subject: Hashing research (was Re: [Caml-list] Big executables ...)
>IMHO this is a perfect research problem:
>Find a mapping H:S->B where S is the set of module signatures and
>B is the set of binary (arbitrary length) strings. Such that if and only if
>s_1 is a subset of s_2 then there is some relation between H(s_1) and
>H(s_2), thus  s_1<s_2 iff H(s_1) R H(s_2).
>Perhaps you could drop "and only if" and let H(s_1) R H(s_2) imply
>s_1 < s_2 with 99.9...% certainty.

I think you can't do it with constant-sized hashes.  For instance, if
s_2 has 100 elements, then it has 2 ** 100 subsets.  Since R has to
behave correctly on most of those 2 ** 100 subsets, those subsets need
to have almost 2 ** 100 different hashes, so your hash can't be much
shorter than 100 bits.
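
The counting step can be checked on a small case.  This is only a
sketch of the pigeonhole part of the argument, not of the relation R
itself: the names truncated_hash and count_distinct_hashes are made up
for illustration, and truncated MD5 stands in for "some short hash".
With 10 elements there are 2 ** 10 subsets, and an 8-bit hash has only
2 ** 8 possible values, so many subsets must share a hash.

```python
import hashlib
from itertools import combinations


def truncated_hash(subset, bits):
    """Hash a subset of names and keep only the top `bits` bits of the MD5."""
    data = ",".join(sorted(subset)).encode("utf-8")
    full = int.from_bytes(hashlib.md5(data).digest(), "big")  # 128-bit integer
    return full >> (128 - bits)


def count_distinct_hashes(elements, bits):
    """Return (number of subsets, number of distinct truncated hashes)."""
    seen = set()
    total = 0
    for size in range(len(elements) + 1):
        for subset in combinations(elements, size):
            seen.add(truncated_hash(subset, bits))
            total += 1
    return total, len(seen)


if __name__ == "__main__":
    elements = ["f%d" % i for i in range(10)]  # hypothetical signature entries
    total, distinct = count_distinct_hashes(elements, bits=8)
    # `distinct` can never exceed 2 ** 8, however good the hash is.
    print(total, "subsets,", distinct, "distinct 8-bit hashes")
```

The same bound applies with 100 elements and any hash shorter than 100
bits; the small numbers just make it cheap to enumerate.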

You have to know the name for each entry point into the library anyway
so you can do the linking.  We could just have one hash for the type
per entry point.  Hmm; MD5 is only 16 bytes, or 32 bytes of hex, or 22
bytes of base 62 (digits plus upper and lower case letters), so maybe
we just append the MD5 checksum to the end of the symbol.  If that's
too much and we're willing to have less-than-cryptographic security we
could truncate the added checksum to whatever number of bits is small
enough and still have a very good chance of getting the right answer.
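
A minimal sketch of the symbol-suffix idea, under several assumptions:
the type is available as a text string, the checksum is joined to the
symbol with "_", and when truncating we keep the leading bits of the
MD5.  The names base62_width, type_tag and mangle are hypothetical and
are not part of any existing linker or compiler.

```python
import hashlib
import string

# Digits plus upper and lower case letters: 62 symbols.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_width(bits):
    """Smallest number of base-62 digits that can hold any `bits`-bit value."""
    width = 0
    while 62 ** width < 2 ** bits:
        width += 1
    return width


def type_tag(type_signature, bits=128):
    """Fixed-width base-62 text of the first `bits` bits of the type's MD5."""
    digest = hashlib.md5(type_signature.encode("utf-8")).digest()  # 16 bytes
    value = int.from_bytes(digest, "big") >> (128 - bits)  # keep leading bits
    chars = []
    for _ in range(base62_width(bits)):
        value, remainder = divmod(value, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def mangle(symbol, type_signature, bits=128):
    """Append the type checksum to the symbol name (separator is an assumption)."""
    return symbol + "_" + type_tag(type_signature, bits)


if __name__ == "__main__":
    print(mangle("List_map", "('a -> 'b) -> 'a list -> 'b list"))
    print(mangle("List_map", "('a -> 'b) -> 'a list -> 'b list", bits=64))
```

With the full 128 bits the suffix is 22 characters; truncating to 64
bits brings it down to 11, at the cost of a higher (but still tiny)
chance that two different types get the same suffix.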

Tim Freeman