large hash tables

From: John Caml <camljohn42@g...>
Subject: Re: [Caml-list] large hash tables
Richard, thank you so much for revising my program. I learned a lot
from reading over your changes, and the program works very nicely now:
1.2 GB for all 1 million items, which is efficient enough for all
practical purposes. Thanks again.

John


On Thu, Feb 21, 2008 at 4:33 PM, Richard Jones <rich@annexia.org> wrote:
> My version's a bit longer than yours, but hopefully it's more
>  idiomatic and easier to understand.
>
>  Program - http://www.annexia.org/tmp/movies.ml
>  Create the test file - http://www.annexia.org/tmp/make_movies.ml
>
>  It's best to read the program like this:
>
>  (1) Start with the _interface_ ('signature') of the new ExtArray1
>  module & type.  _Ignore_ the implementation of this module for now.
>
>  (2) Then look at the main part of the program (from where we allocate
>  the result array down through the loop which reads the data).
>
>  (3) Then look at the implementation of the module.  The main
>  complexity is that you can't simply extend a Bigarray; you have to
>  keep reallocating it (in large chunks, for efficiency).  A minimal
>  sketch of that technique follows below.
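>
>  To make the reallocation idea concrete, here is an illustrative
>  sketch of such a module (not the actual movies.ml code; read that for
>  the real details).  It stores floats in a Bigarray that is grown a
>  fixed chunk at a time:
>
>    open Bigarray
>
>    module ExtArray1 : sig
>      type t
>      val create : unit -> t
>      val append : t -> float -> unit   (* add an element at the end *)
>      val get : t -> int -> float
>      val length : t -> int
>    end = struct
>      let chunk = 1024 * 1024           (* grow in 1M-element chunks *)
>
>      type t = {
>        mutable data : (float, float64_elt, c_layout) Array1.t;
>        mutable used : int;             (* elements actually stored *)
>      }
>
>      let create () =
>        { data = Array1.create float64 c_layout chunk; used = 0 }
>
>      (* Bigarrays can't be resized in place: allocate a larger one
>         and blit the old contents across. *)
>      let grow t =
>        let len = Array1.dim t.data in
>        let data' = Array1.create float64 c_layout (len + chunk) in
>        Array1.blit t.data (Array1.sub data' 0 len);
>        t.data <- data'
>
>      let append t x =
>        if t.used >= Array1.dim t.data then grow t;
>        t.data.{t.used} <- x;
>        t.used <- t.used + 1
>
>      let get t i =
>        if i < 0 || i >= t.used then invalid_arg "ExtArray1.get";
>        t.data.{i}
>
>      let length t = t.used
>    end
>
>    (* Example: append ten million elements, then read one back. *)
>    let () =
>      let a = ExtArray1.create () in
>      for i = 0 to 9_999_999 do ExtArray1.append a (float_of_int i) done;
>      Printf.printf "%d elements, last = %g\n"
>        (ExtArray1.length a) (ExtArray1.get a (ExtArray1.length a - 1))
>
>  Growing by a fixed chunk keeps peak memory predictable, and the cost
>  of the occasional blit is amortized over the million appends between
>  reallocations.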
>
>  I measured it as taking some 230 MB for a 10-million-line data file,
>  but that doesn't necessarily mean it'll take 2 GB for 100 million
>  lines, because there's some space overhead which will decline as a
>  proportion of the total memory used.
>
>  Rich.
>
>  --
>  Richard Jones
>  Red Hat
>