
large hash tables
Date: 2008-02-22 (14:19)
From: Brian Hurt <bhurt@j...>
Subject: Re: [Caml-list] large hash tables
John Caml wrote:

>The equivalent C++ program uses 874 MB of memory in total. Each of the
>1 million records is stored in a vector using 1 single-precision float
>and 1 int. Indeed, my machine is AMD64 so Ocaml int's are presumably 8
>bytes.
C int's on AMD64 are still 4 bytes; longs are 8 bytes.  You can prove
this by compiling a quick program:

#include <stdio.h>
int main(void) {
    printf("Ints are %lu bytes long.\n", (unsigned long) sizeof(int));
    return 0;
}
>I've rewritten my Ocaml program again, this time using Bigarray. Its
>memory usage is now the same as under C++, so that's good news.
>However, my program is quite ugly now, and it's actually more than
>twice as long as my C++ program. Any suggestions for simplifying this
>program? The way I initialize the "movieMajor" Array seems especially
>wonky, but I couldn't figure out a better way.
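For reference, a compact two-array layout along those lines might look like this.  This is only a sketch with invented names (`ratings`, `movie_ids`, a dummy fill loop), not John's actual code: one float32 Bigarray and one int Bigarray, so each record costs 4 + 8 unboxed bytes instead of a boxed block.

```ocaml
(* One million (float, int) records stored unboxed in two Bigarrays. *)
let n = 1_000_000

let ratings =
  Bigarray.Array1.create Bigarray.float32 Bigarray.c_layout n

let movie_ids =
  Bigarray.Array1.create Bigarray.int Bigarray.c_layout n

let () =
  (* Dummy data, just to show the access pattern. *)
  for i = 0 to n - 1 do
    Bigarray.Array1.set ratings i 3.5;
    Bigarray.Array1.set movie_ids i i
  done;
  Printf.printf "ratings.(0) = %f\n" (Bigarray.Array1.get ratings 0)
```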
It's generally a good idea to back off and think about what problem 
you're trying to solve.

Where Ocaml generally wins on memory utilization is with immutable data
structures that share data instead of copying it.  This is where a lot
of the decisions Ocaml made on how to represent things suddenly make
sense, if you think in terms of data sharing.  And in a lot of
complicated "real" code, the memory gains from sharing, from never
having to copy anything, are huge.
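A tiny example of that sharing with immutable lists (my own illustration, not from John's program): prepending to a list reuses the existing cells rather than copying them, and physical equality (==) shows the tails really are the same memory.

```ocaml
(* Immutable lists share structure: consing onto a list reuses
   the existing cells instead of copying them. *)
let base = [2; 3; 4]
let a = 1 :: base   (* the tail of a is base itself, not a copy *)
let b = 0 :: base   (* b shares the very same tail *)

let () =
  (* (==) is physical equality: both tails are the same cells. *)
  assert (List.tl a == base);
  assert (List.tl b == base);
  Printf.printf "a and b share %d cells\n" (List.length base)
```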