Browse thread
Re: large hash tables
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Damien Doligez <damien.doligez@i...> |
| Subject: | Re: [Caml-list] Re: large hash tables |
On 2008-02-20, at 06:18, John Caml wrote: > However, the memory usage is still pretty bad...it takes nearly an > order of magnitude more memory than the equivalent C++ program. While > the C++ program required 800 MB, my ocaml program requires roughly 6 > GB. Am I doing something very inefficiently? My revised code appears > below. In order to optimize for space, you need to understand how OCaml represents things in memory. > match murList with > | m::u::r::[] -> > let rFloat = float_of_string r > and mInt = int_of_string m > and uInt = int_of_string u in > > let newElement = (uInt, rFloat) > and oldList = movieMajor.(mInt) in > let newList = List.rev_append [newElement] oldList in > Array.set movieMajor mInt newList; Here, you have an array of lists of pairs. The space used by the array is 1 + length + size of contents. The space used by a list is 3 * length + size of the contents. The space used by a pair is 3 + size of contents. The space used by an int is 0 (it's already counted in the container). The space used by a float is a bit tricky: it's normally 3, but float arrays and records of floats are special: it's 1 in this case. All these numbers are in 4-byte words, assuming a 32-bit architecture. On a 64-bit machine, the unit is 8-byte words and floats are size 2 (normally) or 0 (in arrays and records of floats). Here your memory usage will be : 1 + 17770 + 300M + 300M + 200M = 800M, or about 6.4 GB (assuming you have a 64-bit machine). You can do much better by using a tuple of arrays instead of an array of lists of tuples. If you use three arrays (for m, u, and r), read all the data in the arrays, and then do a radix sort (and fill a small indexing array), you'll be down to 2.4 GB of memory. If that's still too much, you could use bigarrays with 32-bit ints and floats, and get down to 1.2 GB. I don't see how you can get 800M in your C program unless your ints are 16-bit, in which case you can do it with bigarrays too. The only problem with all this, is that you'll have to write the sorting code yourself. In conclusion OCaml is not really well-suited to tight memory situations, but usually you can manage. -- Damien