large hash tables
| Date: | -- (:) |
| From: | John Caml <camljohn42@g...> |
| Subject: | Re: [Caml-list] large hash tables |
Richard, thank you so much for revising my program. I learned a lot from
reading over your changes, and the program works very nicely now. It uses
1.2 GB for all 1 million items, which is efficient enough for all
practical purposes. Thanks again.
John
On Thu, Feb 21, 2008 at 4:33 PM, Richard Jones <rich@annexia.org> wrote:
> My version's a bit longer than your version, but hopefully more
> idiomatic and easier to understand.
>
> Program - http://www.annexia.org/tmp/movies.ml
> Create the test file - http://www.annexia.org/tmp/make_movies.ml
>
> It's best to read the program like this:
>
> (1) Start with the _interface_ ('signature') of the new ExtArray1
> module & type. _Ignore_ the implementation of this module for now.
>
> (2) Then look at the main part of the program (from where we allocate
> the result array down through the loop which reads the data).
>
> (3) Then look at the implementation of the module. The main
> complexity is that you can't just extend a Bigarray, but you have to
> keep reallocating it (in large chunks for efficiency).
>
> I measured it as taking some 230 MB for a 10 million line data file,
> but that doesn't necessarily mean it'll take 2 GB for 100 million
> lines because there's some space overhead which will decline as a
> proportion of the total memory used.
>
>
>
> Rich.
>
> --
> Richard Jones
> Red Hat
>
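
For readers without access to movies.ml, point (3) of the quoted message (a Bigarray cannot be extended in place, so it is reallocated in large chunks) can be sketched roughly as below. This is only an illustration under stated assumptions, not Richard's actual ExtArray1 code: the module name GrowArray, its functions, and the fixed additive chunk size are all hypothetical. It also assumes OCaml 4.07 or later, where Bigarray is part of the standard library.

```ocaml
(* Hypothetical sketch of a growable int array backed by Bigarray.Array1.
   GrowArray and all its function names are illustrative only; they are
   not taken from movies.ml. *)
module GrowArray : sig
  type t
  val create : chunk:int -> t      (* chunk = elements added per reallocation *)
  val length : t -> int            (* number of elements appended so far *)
  val append : t -> int -> unit
  val get : t -> int -> int
end = struct
  open Bigarray

  type t = {
    mutable data : (int, int_elt, c_layout) Array1.t;  (* backing store *)
    mutable len : int;                                 (* slots in use *)
    chunk : int;
  }

  let create ~chunk =
    if chunk <= 0 then invalid_arg "GrowArray.create";
    { data = Array1.create int c_layout chunk; len = 0; chunk }

  let length t = t.len

  (* A Bigarray has a fixed size, so growing means allocating a larger
     one and copying the old contents into its prefix. *)
  let grow t =
    let old_cap = Array1.dim t.data in
    let fresh = Array1.create int c_layout (old_cap + t.chunk) in
    Array1.blit t.data (Array1.sub fresh 0 old_cap);
    t.data <- fresh

  let append t x =
    if t.len = Array1.dim t.data then grow t;
    t.data.{t.len} <- x;
    t.len <- t.len + 1

  let get t i =
    if i < 0 || i >= t.len then invalid_arg "GrowArray.get";
    t.data.{i}
end

let () =
  let a = GrowArray.create ~chunk:4 in
  for i = 0 to 9 do GrowArray.append a (i * i) done;
  Printf.printf "%d items, last = %d\n" (GrowArray.length a) (GrowArray.get a 9)
  (* prints "10 items, last = 81" *)
```

A larger chunk means fewer reallocations and less copying, at the cost of more unused space at the end of the array. That slack is one plausible source of the overhead mentioned in the quoted message, though the message itself does not say so.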