Version française
Home     About     Download     Resources     Contact us    

This site is updated infrequently. For up-to-date information, please visit the new OCaml website at

Browse thread
Re: large hash tables
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2008-02-20 (08:37)
From: David Allsopp <dra-news@m...>
Subject: RE: [Caml-list] Re: large hash tables
One further comment on the code below, as you've said that you're a
long-time C++ coder is:

> if (count mod 1000000) == 0 then begin

I think you probably mean to use a single = here: although the semantics for
(=) and (==) are the same for ints, I reckon you may have used == as the
C/C++ equality operator. The section "Comparisons" in module Pervasives
explains the difference between the = and == operators in OCaml.

Additionally, there is syntactic sugar for Array.set allowing you write:

> Array.set movieMajor mInt newList;


movieMajor.(mInt) <- newList;

IMHO, the Stack_overflow you saw with the Hashtbl implementation  in the
StdLib (assuming it really is there) is a bug and should be reported - if
OCaml's got enough memory left to fill the table, then it shouldn't crash
resizing, even it means that the resizing is potentially slower (a slower
version could always be used if the buckets are detected to be huge or in
response to a caught Stack_overflow - though that could prove tricky).


-----Original Message-----
[] On Behalf Of John Caml
Sent: 20 February 2008 06:19
Subject: [Caml-list] Re: large hash tables

Thank you all for the assistance.

I've resolved the Stack_overflow problem by using an Array instead of
a Hashtbl; my keys were just consecutive integers, so this later
approach is clearly preferable.

However, the memory usage is still pretty takes nearly an
order of magnitude more memory than the equivalent C++ program. While
the C++ program required 800 MB, my ocaml program requires roughly 6
GB. Am I doing something very inefficiently? My revised code appears

Also, if you have any other coding suggestions I'd appreciate hearing
them. I'm a long-time coder but new to Ocaml and eager to learn.


exception SplitError

let loadWholeFile filename =
    let infile = open_in filename
    and movieMajor = Array.make 17770 [] in

    let rec loadLines count =
        let line = input_line infile in
        let murList = Pcre.split line in

        match murList with
            | m::u::r::[] ->
                let rFloat = float_of_string r
                and mInt = int_of_string m
                and uInt = int_of_string u in

                let newElement = (uInt, rFloat)
                and oldList = movieMajor.(mInt) in
                let newList = List.rev_append [newElement] oldList in
                Array.set movieMajor mInt newList;

                if (count mod 1000000) == 0 then begin
                    Printf.printf "count: %d\n" count;
                    flush stdout;

                    loadLines (count + 1)

            | _ -> raise SplitError

        loadLines 0
        End_of_file -> close_in infile;

let filename = Sys.argv.(1);;
let str = loadWholeFile filename;;

Caml-list mailing list. Subscription management:
Beginner's list:
Bug reports: