Version française
Home     About     Download     Resources     Contact us    

This site is updated infrequently. For up-to-date information, please visit the new OCaml website at

Browse thread
Re: large hash tables
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2008-02-20 (05:18)
From: John Caml <camljohn42@g...>
Subject: Re: large hash tables
Thank you all for the assistance.

I've resolved the Stack_overflow problem by using an Array instead of
a Hashtbl; my keys were just consecutive integers, so this later
approach is clearly preferable.

However, the memory usage is still pretty takes nearly an
order of magnitude more memory than the equivalent C++ program. While
the C++ program required 800 MB, my ocaml program requires roughly 6
GB. Am I doing something very inefficiently? My revised code appears

Also, if you have any other coding suggestions I'd appreciate hearing
them. I'm a long-time coder but new to Ocaml and eager to learn.


exception SplitError

let loadWholeFile filename =
    let infile = open_in filename
    and movieMajor = Array.make 17770 [] in

    let rec loadLines count =
        let line = input_line infile in
        let murList = Pcre.split line in

        match murList with
            | m::u::r::[] ->
                let rFloat = float_of_string r
                and mInt = int_of_string m
                and uInt = int_of_string u in

                let newElement = (uInt, rFloat)
                and oldList = movieMajor.(mInt) in
                let newList = List.rev_append [newElement] oldList in
                Array.set movieMajor mInt newList;

                if (count mod 1000000) == 0 then begin
                    Printf.printf "count: %d\n" count;
                    flush stdout;

                    loadLines (count + 1)

            | _ -> raise SplitError

        loadLines 0
        End_of_file -> close_in infile;

let filename = Sys.argv.(1);;
let str = loadWholeFile filename;;