large hash tables
| Date: | -- (:) |
| From: | Berke Durak <berke.durak@e...> |
| Subject: | Re: [Caml-list] large hash tables |
Francois Rouaix wrote:
> In the resizing code there is a non-tailrec function (insert_bucket).
> This is most likely causing the stack overflow, as I can't see any other
> non tail recursive function at first glance. Looks like it's not tail
> rec in order to maintain an invariant on the order of elements. If that
> invariant is not useful to you, you might want to write a slightly
> different version of the Hashtbl module, where insert_bucket would be
> tail rec.
> Also, during resizing, memory usage will be twice the memory required
> for the table (roughly), since the bucket array remains available until
> the resize is completed, so all the bucket contents exist in two
> versions (old and new). You might want to stick to a large initial size
> and do not attempt resizing.

In that case, a quick hack could also be to slightly randomize the key, as in

  let digest_bytes = 5
  let keyify u = (String.sub (Digest.string u) 0 digest_bytes, u)

and then apply keyify to the key (not to the table) when adding:

> let read_whole_chan chan =
>   let movieMajor = Hashtbl.create 777777 in
>
>   let rec loadLines count =
>     let line = input_line chan in
>     let murList = Pcre.split line in
>     match murList with
>     | m::u::r::[] ->
>       let rFloat = float_of_string r in
>       Hashtbl.add movieMajor (keyify m) (u, rFloat);
>       if count mod 10000 = 0 then
>         Printf.printf "count: %d, m: %s, u: %s, r: %f \n" count m u rFloat;
>       loadLines (count + 1)
>     | _ -> raise SplitError
>   in
>
>   try
>     loadLines 0
>   with
>     End_of_file -> ()
> ;;

--
Berke DURAK
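
For reference, here is a minimal, self-contained sketch of the keyify idea using only the standard library. It is an illustration under assumptions, not the code from the thread: `split_line`, `Split_error`, `load_lines` and `ratings_for` are hypothetical names, splitting on spaces stands in for `Pcre.split`, and the input is a list of strings rather than a channel. Note that keyify is deterministic, so every lookup has to go through keyify as well.

```ocaml
(* Number of leading digest bytes kept in the key (value from the message). *)
let digest_bytes = 5

(* Pair the original key with a short prefix of its MD5 digest.
   Digest.string returns a 16-byte string, so a 5-byte prefix always exists. *)
let keyify (u : string) : string * string =
  (String.sub (Digest.string u) 0 digest_bytes, u)

(* Hypothetical stand-in for Pcre.split: split on spaces, drop empty fields. *)
let split_line (line : string) : string list =
  List.filter (fun s -> s <> "") (String.split_on_char ' ' line)

(* Hypothetical exception carrying the offending line. *)
exception Split_error of string

(* Hypothetical loader: each line is "movie user rating".
   The table is keyed on (keyify movie) and stores (user, rating). *)
let load_lines (lines : string list) =
  let table = Hashtbl.create 1024 in
  List.iter
    (fun line ->
      match split_line line with
      | [m; u; r] -> Hashtbl.add table (keyify m) (u, float_of_string r)
      | _ -> raise (Split_error line))
    lines;
  table

(* Lookups must apply keyify too, since the stored key is the pair. *)
let ratings_for table (m : string) = Hashtbl.find_all table (keyify m)
```

With this, `ratings_for table "m1"` returns every (user, rating) pair added for movie "m1", most recently added first, which is the documented behaviour of `Hashtbl.find_all`.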