From: Richard Jones <rich@a...>
Subject: Re: [Caml-list] large hash tables
My version's a bit longer than yours, but hopefully more idiomatic
and easier to understand.

Program - http://www.annexia.org/tmp/movies.ml
Create the test file - http://www.annexia.org/tmp/make_movies.ml

It's best to read the program like this:

(1) Start with the _interface_ ('signature') of the new ExtArray1
module & type.  _Ignore_ the implementation of this module for now.

(2) Then look at the main part of the program (from where we allocate
the result array down through the loop which reads the data).

(3) Then look at the implementation of the module.  The main
complexity is that you can't extend a Bigarray in place, so you have
to keep reallocating it (in large chunks for efficiency).
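
For readers who can't fetch the program, here is a minimal sketch of the
idea in (3): a growable wrapper around a one-dimensional Bigarray that
reallocates in chunks.  It is not the code from movies.ml.  The
signature, the field names, the element type (int) and the default
chunk size are all assumptions made for illustration.  It assumes an
OCaml where Bigarray is part of the standard library (4.07 or later);
older compilers need to link the bigarray library.

```ocaml
(* Hypothetical sketch of an extensible Bigarray.Array1 wrapper.
   Names and the chunk size are assumptions, not taken from movies.ml. *)
module ExtArray1 : sig
  type t
  val create : ?chunk:int -> unit -> t   (* chunk = growth step, in elements *)
  val length : t -> int                  (* number of elements appended so far *)
  val append : t -> int -> unit
  val get : t -> int -> int
end = struct
  type t = {
    mutable data : (int, Bigarray.int_elt, Bigarray.c_layout) Bigarray.Array1.t;
    mutable len : int;   (* slots actually in use; <= Array1.dim data *)
    chunk : int;         (* how many slots to add on each reallocation *)
  }

  let alloc n = Bigarray.Array1.create Bigarray.int Bigarray.c_layout n

  let create ?(chunk = 1_000_000) () =
    if chunk <= 0 then invalid_arg "ExtArray1.create";
    { data = alloc chunk; len = 0; chunk }

  let length t = t.len

  let append t v =
    if t.len = Bigarray.Array1.dim t.data then begin
      (* Full: allocate a larger array and copy the old contents into
         its prefix.  A Bigarray cannot be resized in place. *)
      let bigger = alloc (t.len + t.chunk) in
      Bigarray.Array1.blit t.data (Bigarray.Array1.sub bigger 0 t.len);
      t.data <- bigger
    end;
    t.data.{t.len} <- v;
    t.len <- t.len + 1

  let get t i =
    if i < 0 || i >= t.len then invalid_arg "ExtArray1.get";
    t.data.{i}
end
```

Growing by a fixed chunk keeps the slack bounded by one chunk, at the
cost of one full copy per chunk.  Doubling the capacity instead would
copy less often but can leave up to half the array unused.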

I measured it as taking some 230 MB for a 10 million line data file,
but that doesn't necessarily mean it'll take 2 GB for 100 million
lines because there's some space overhead which will decline as a
proportion of the total memory used.
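
To spell out that reasoning, here is a rough model.  It assumes memory
use is a fixed overhead F plus a constant per-line cost b; both symbols
are introduced here for illustration and neither was measured
separately.

```latex
\begin{align*}
M(n) &= F + b\,n \\
M(10^7) &= F + 10^7\,b = 230\ \text{MB} \\
M(10^8) &= F + 10 \cdot (230\ \text{MB} - F) = 2300\ \text{MB} - 9F
\end{align*}
```

So under this model the 100 million line case comes in below the naive
tenfold extrapolation by nine times whatever the fixed overhead turns
out to be.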

Rich.

-- 
Richard Jones
Red Hat