Version française
Home     About     Download     Resources     Contact us    
Browse thread
Array 4 MB size limit
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Xavier Leroy <Xavier.Leroy@i...>
Subject: Re: [Caml-list] Array 4 MB size limit
> I was greatly surprised when I found out there was such a low limit on
> arrays.  Is there a reason for this?  Will this limit ever be increased?

As Brian Hurt explained, this limit comes from the fact that heap
object sizes are stored in N-10 bits, where N is the bit width of the
processor (32 or 64).

Historical digression: this representation decision was initially
taken when designing Caml Light in 1989-1990.  At that time, even
professional workstations had 16 M of RAM at best, so limiting arrays to
4M elements was reasonable.  The decision was then reconsidered in
1995 during the redesign that led to OCaml.  At that time,
64-bit architectures were all the rage: OCaml was actually implemented
on a 64-bit Alpha, and only then backported to 32-bit machines.
So, the original header format was kept, since it makes complete sense
on a 64-bit architecture.  Little did I know that the 32-bitters would
survive so long.  Now, it's 2006, and 64-bit processors are becoming
universally available, in desktop machines at least.  (I've been
running an AMD64 PC at home since january 2005 with absolutely zero
problems.)  So, no the data representations of OCaml are not going to
change to lift the array size limit on 32-bit machines.

> Is the limit a limit on the number of elements or the total size?  The
> language in Sys.max_array_size implies the former, but the fact the
> limit is halved for floats implies the latter.  If I had a record type
> with 5 floats, will the limit then by Sys.max_array_size / 10?

No.  In general, Caml arrays are not unboxed, meaning that your array
of 5-float records is actually an array of pointers to individual
blocks holding 5 floats.  The only exception is for arrays of floats,
which are unboxed.

> Is there
> some sort of existing ArrayList module that works around this problem?
> Ideally, I'd like to have something like C++'s std::vector<> type, which
> can be dynamically resized.  Do I have to write my own? :(

A better idea would be to determine exactly what data structure you need:
which abstract operations, what performance requirements, etc.  C++
and Lisp programmers tend to encode everything as arrays or lists,
respectively, but quite often these are not the best data structure
for the application of interest.

> Also, the fact that using lists crashes for the same data set is
> surprising.  Is there a similar hard limit for lists, or would this be a
> bug?  Should I post a test case?

Depends on the platform you use.  In principle, Caml should report
stack overflows cleanly, by throwing a Stack_overflow exception.
However, this is hard to do in native code as it depends a lot on the
processor and OS used.  So, some combinations (e.g. x86/Linux) will
report stack overflows via an exception, and others will let the
kernel generate a segfault.

If you're getting the segfault under x86/Linux for instance, please
post a test case on the bug tracking system.  It's high time that
Damien shaves :-)

- Xavier Leroy