[Caml-list] Big executables from ocamlopt; dynamic libraries again
Subject: Hashing research (was Re: [Caml-list] Big executables ...)
From: georg.g@home.se
>IMHO this is a perfect research problem:
>
>Find a mapping H: S -> B where S is the set of module signatures and
>B is the set of binary (arbitrary length) strings. Such that if and only if
>s_1 is a subset of s_2 then there is some relation between H(s_1) and
>H(s_2), thus s_1 < s_2 iff H(s_1) R H(s_2).
>
>Perhaps you could drop "and only if" and let H(s_1) R H(s_2) imply
>s_1 < s_2 with 99.9...% certainty.

I think you can't do it with constant-sized hashes. For instance, if
s_2 has 100 elements, then it has 2 ** 100 subsets. Since R has to
behave correctly on most of those 2 ** 100 subsets, those subsets need
to have almost 2 ** 100 different hashes, so your hash can't be much
less than 100 bits.

You have to know the name for each entry point into the library anyway
so you can do the linking. We could just have one hash for the type
per entry point. Hmm; MD5 is only 16 bytes, or 32 bytes of hex, or 22
bytes of base 62 (digits plus upper and lower case letters), so maybe
we just append the MD5 checksum to the end of the symbol. If that's
too much and we're willing to have less-than-cryptographic security, we
could truncate the added checksum to whatever number of bits is small
enough and still have a very good chance of getting the right answer.

Tim Freeman
tim@fungible.com
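
To make the symbol-suffix idea concrete, here is a minimal sketch in Python (chosen only because MD5 is in its standard library). It is not how ocamlopt mangles names. The function names `base62` and `mangle`, the "_" separator, and the use of a printed type expression as the hash input are all assumptions made for illustration.

```python
import hashlib
import string

# Digits plus upper and lower case letters: 62 symbols in total.
BASE62_ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62(n: int, width: int) -> str:
    """Encode a non-negative integer as a fixed-width base-62 string."""
    chars = []
    for _ in range(width):
        n, r = divmod(n, 62)
        chars.append(BASE62_ALPHABET[r])
    return "".join(reversed(chars))


def mangle(symbol: str, type_text: str, bits: int = 128) -> str:
    """Append a (possibly truncated) MD5 checksum of the type to the symbol.

    `type_text` stands in for some canonical printing of the entry
    point's type; how to canonicalize it is outside this sketch.
    `bits` must be between 1 and 128.
    """
    digest = hashlib.md5(type_text.encode("utf-8")).digest()  # 16 bytes
    # Keep only the top `bits` bits of the 128-bit checksum.
    value = int.from_bytes(digest, "big") >> (128 - bits)
    # Smallest number of base-62 characters that can hold `bits` bits.
    width = 1
    while 62 ** width < 2 ** bits:
        width += 1
    return symbol + "_" + base62(value, width)


if __name__ == "__main__":
    t = "('a -> 'b) -> 'a list -> 'b list"
    print(mangle("map", t))           # full 128-bit checksum, 22 characters
    print(mangle("map", t, bits=32))  # truncated checksum, 6 characters
```

With a scheme like this, a client compiled against a different type for `map` would ask for a different symbol name, so the mismatch would surface as an ordinary unresolved symbol at link time. Truncating to fewer bits shortens the suffix at the price of a higher chance that two different types collide.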