Version française
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] Gripes with array
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Basile Starynkevitch [local] <basile.starynkevitch@i...>
Subject: Re: [Caml-list] Gripes with array
On Thu, Sep 09, 2004 at 11:08:50AM +0200, Olivier Andrieu wrote:
>  Richard Jones [Thu, 9 Sep 2004]:
>  > On Thu, Sep 09, 2004 at 09:17:25AM +0200, Jean-Christophe Filliatre wrote:
>  > > 
>  > > Jon Harrop writes:
>  > >  > 
>  > >  > Does anyone have any pointers to information about the origin of the size 
>  > >  > limit for arrays? [....]
>  > > 
>  > > In ocaml sources, the  file byterun/mlvalues.h gives all details about
>  > > the  block  header structure.  [....]
>  > > 
>  > > But I must  agree with you: this is definitely too  small and we could
>  > > imagine  that, when the  tag says  a block  is an  array, the  size is
>  > > stored within the first (or the last) field instead.

I do agree that using Bigarrays is the way to go, until everyone has a
64 bits machine. For completeness, we could also consider the
following scheme (which I am NOT volunteering to implement!)

   The header layout remains the same (so only 22 bits for size on 32
   bits machine), but if the size is all bit ones, the block is
   actually a fixed block, and the real array size is the word before
   the header.

However, I see another potential problem (which could already
potentially appear today, with standard 0caml 3.08, on 64 bits
machines with more than 1Gbyte of RAM). If you have a huge array of
pointers, the garbage collector (even the minor one) has to scan the
full array - and this scan is "atomic" in the sense that it is not
interruptible (and I believe that designing a GC which incrementally
scans big values by chunks is not trivial, given Ocaml GC performance
needs). So for an array of say 300 million pointers, the GC has to
scan it, which takes a significant time (I would guess several tenths
of seconds at least, just scanning this single array).

I am asking, do lucky people with a 64 bits machines and plenty of RAM
did experiment some bizarre GC behavior when handling such monster
pointer arrays in Ocaml - for example,
   let monster =  Array.init 300_000_000 (fun i -> ((Printf.sprintf "int%d" i), i))

More practically, I would be curious to hear from people having run
Ocaml programs (on rather big 64 bits hardware, with multigigabyte
RAM) in processes of more than a gigabyte of Ocaml heap! Does the GC
works well in its default setting, or have they to tune it?

>  > I have a similar problem with the maximum size of strings.  In
>  > practical terms, it limits the size of file uploads to COCANWIKI to
>  > around 6 MB (ie., not very much) [not the full 16 MB because of
>  > character escaping, but even 16 MB would be far too small]. [....]

FWIW, I had similar limitations in Poesia more than a year ago. I
solved it by specifying that Poesia (a web filter) won't handle web
content of more than ten million bytes. (Maybe an enhanced buffer
package, representing buffer contents in array of strings, could

>  > Does the tag field need to be so wide?  What does the tag mean if it
>  > has different values < No_scan_tag (251)?
> it's for variants (with or without arguments)

Apparently, people are less bitten by the maximal number of
variants. I guess that most applications don't have big sum (ie
variant) types with more than a hundred of non-empty choices (ie
Choice of ... construct).

Regarding very big data structures, I tend to believe that they should
be more organized than just a single monster array (hence the current
array limits on 32 bits machine is a sensible tradeoff), even on 64
bits irons. But I have no practical experience on these.

Basile STARYNKEVITCH -- basile dot starynkevitch at inria dot fr
Project -  temporarily --- all opinions are only mine 

To unsubscribe, mail Archives:
Bug reports: FAQ:
Beginner's list: