Browse thread
[Caml-list] Explaining bit sets
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | tim@f... |
| Subject: | Re: [Caml-list] Holding a set of random integers of very wide range? |
>For reasons I tried to explain in a previous draft of this e-mail and >has been cut due to the amount of space it took, I need to generate >multiple permuted lists of integers ranging from 0 to approximately >2^100 (or more, it's a bit open-ended unfortunately). > >Since I only need one value at a time, I can use a lazy list for >this. i.e., the head holds a value and the tail is a suspension of the >state I need to generate the next value. No big deal there. > >The big deal, for me anyways, is that the state I need is tracking the >integers that I've already used (so that I don't generate the same one >twice) given that the range of possible values is so large. If your integers have enough bits, then a list of random numbers is indistinguishable from a random permutation. This assumes you only have time to look at a reasonable number of values from the permutation. Do the math to figure out what "big enough" is. For example, in the past year I designed a scheme that assumed that two files were identical if and only if their 64-bit checksums were identical, and there were only a million or so files, so I did the math and concluded that the chances of a collision were small enough. It worked fine. If you're paranoid then use the MD5 checksums in the Digest module to generate random numbers; otherwise you can use a linear congruential pseudo-random number generator as mentioned by another poster and hope that there's no interesting interaction between the number generation scheme and the other details of your algorithm. >I'm currently using the very low-tech solution of maintaining a list >of ranges of used values. (i.e., when I use a value that would link 2 >different ranges, I go coalesce the list. The worst case would be if >I picked all the odd integers or all the even integers.) > >Given the recent talk of Patricia trees and bit sets, I was wondering >if some variant of either one of those, or some other representation >would be better for what I'm doing? (I figure at the very least I >ought to be using a tree of ranges of used values. I'm currently >using the Big_int module to represent my integers.) If your integers are large enough, then you can ignore the possiblity of a collision and no data structure is needed. Otherwise, is there a reason not to use a hash table? -- Tim Freeman tim@fungible.com GPG public key fingerprint ECDF 46F8 3B80 BB9E 575D 7180 76DF FE00 34B1 5C78 ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners