Array 4 MB size limit
| From: | Xavier Leroy <Xavier.Leroy@i...> |
| Subject: | Re: [Caml-list] Array 4 MB size limit |
> I was greatly surprised when I found out there was such a low limit on
> arrays. Is there a reason for this? Will this limit ever be increased?

As Brian Hurt explained, this limit comes from the fact that heap object sizes are stored in N-10 bits, where N is the bit width of the processor (32 or 64).

Historical digression: this representation decision was initially taken when designing Caml Light in 1989-1990. At that time, even professional workstations had 16 M of RAM at best, so limiting arrays to 4M elements was reasonable.

The decision was then reconsidered in 1995 during the redesign that led to OCaml. At that time, 64-bit architectures were all the rage: OCaml was actually implemented on a 64-bit Alpha, and only then backported to 32-bit machines. So, the original header format was kept, since it makes complete sense on a 64-bit architecture. Little did I know that the 32-bitters would survive so long.

Now, it's 2006, and 64-bit processors are becoming universally available, in desktop machines at least. (I've been running an AMD64 PC at home since January 2005 with absolutely zero problems.) So, no, the data representations of OCaml are not going to change to lift the array size limit on 32-bit machines.

> Is the limit a limit on the number of elements or the total size? The
> language in Sys.max_array_size implies the former, but the fact the
> limit is halved for floats implies the latter. If I had a record type
> with 5 floats, will the limit then be Sys.max_array_size / 10?

No. In general, Caml arrays are not unboxed, meaning that your array of 5-float records is actually an array of pointers to individual blocks holding 5 floats. The only exception is for arrays of floats, which are unboxed.

> Is there
> some sort of existing ArrayList module that works around this problem?
> Ideally, I'd like to have something like C++'s std::vector<> type, which
> can be dynamically resized. Do I have to write my own? :(

A better idea would be to determine exactly what data structure you need: which abstract operations, what performance requirements, etc. C++ and Lisp programmers tend to encode everything as arrays or lists, respectively, but quite often these are not the best data structure for the application of interest.

> Also, the fact that using lists crashes for the same data set is
> surprising. Is there a similar hard limit for lists, or would this be a
> bug? Should I post a test case?

Depends on the platform you use. In principle, Caml should report stack overflows cleanly, by throwing a Stack_overflow exception. However, this is hard to do in native code as it depends a lot on the processor and OS used. So, some combinations (e.g. x86/Linux) will report stack overflows via an exception, and others will let the kernel generate a segfault. If you're getting the segfault under x86/Linux for instance, please post a test case on the bug tracking system.

It's high time that Damien shaves :-)

- Xavier Leroy
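
The size arithmetic described in the message can be checked with a few lines of OCaml. This is only a minimal sketch: `header_limit` and `float_limit` are hypothetical helper names, not part of the standard library, and the float estimate assumes that an unboxed float occupies 64 bits. Only `Sys.word_size` and `Sys.max_array_size` come from the standard `Sys` module.

```ocaml
(* Hypothetical helper: largest block size, in words, that fits in a
   header size field of (word_size - 10) bits, as described above. *)
let header_limit word_size = (1 lsl (word_size - 10)) - 1

(* Hypothetical helper: assuming an unboxed float takes 64 bits, a float
   array holds fewer elements when the word is narrower than 64 bits. *)
let float_limit word_size = header_limit word_size / (64 / word_size)

let () =
  (* Print the values for the machine this program runs on. *)
  Printf.printf "word size            : %d bits\n" Sys.word_size;
  Printf.printf "Sys.max_array_size   : %d\n" Sys.max_array_size;
  Printf.printf "2^(N-10) - 1         : %d\n" (header_limit Sys.word_size);
  Printf.printf "float array estimate : %d\n" (float_limit Sys.word_size)
```

For N = 32 the formula gives 4194303 elements, the "4M" mentioned above, and half of that (2097151) for float arrays, which matches the halving the original poster noticed.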