[Caml-list] Bigarray is a pig
From: David McClain <dmcclain1@m...>
Subject: Re: [Caml-list] Bigarray is a pig
Here is a case where even paying the price of a function call, no matter how
indirect, is well worth it...

I just finished an implementation of memory mapped files for binary array
access from scientific datasets stored on disk. The implementation was in
C++, along with pseudo-pointers that allow natural usage. Every array access has to
check whether or not the position is currently mapped into memory. If so, it
simply returns that address for get or set. If not, then it has to sync the
dirty pages to disk, then remap another segment of the file into memory
before continuing as before.
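
To make the check-then-remap step concrete, here is a minimal sketch of that logic using the POSIX calls mmap, msync and munmap. It is not the implementation described above: the class name MappedWindow, the methods at() and remap_count(), the fixed window size and the choice of 32-bit words are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical sketch: a sliding memory-mapped window over a file of
// 32-bit words. Only one window of the file is mapped at any time.
class MappedWindow {
public:
    MappedWindow(const char* path, std::size_t file_bytes, std::size_t window_bytes)
        : file_bytes_(file_bytes), window_bytes_(window_bytes) {
        const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        // mmap offsets must be page-aligned, so use whole pages per window.
        if (window_bytes_ == 0 || window_bytes_ % page != 0)
            throw std::invalid_argument("window must be a multiple of the page size");
        fd_ = open(path, O_RDWR | O_CREAT, 0644);
        if (fd_ < 0) throw std::runtime_error("open failed");
        // Size the file up front so every mapped page is backed by the file.
        if (ftruncate(fd_, static_cast<off_t>(file_bytes_)) != 0) {
            close(fd_);
            throw std::runtime_error("ftruncate failed");
        }
    }
    ~MappedWindow() { unmap(); close(fd_); }
    MappedWindow(const MappedWindow&) = delete;
    MappedWindow& operator=(const MappedWindow&) = delete;

    // Every access checks whether the position is currently mapped.
    // If so, it just returns the address; if not, it remaps first.
    std::uint32_t& at(std::size_t index) {
        const std::size_t byte = index * sizeof(std::uint32_t);
        if (byte + sizeof(std::uint32_t) > file_bytes_)
            throw std::out_of_range("index past end of file");
        if (base_ == nullptr || byte < start_ || byte >= start_ + length_)
            remap(byte);
        return *reinterpret_cast<std::uint32_t*>(base_ + (byte - start_));
    }

    std::size_t remap_count() const { return remaps_; }

private:
    // Sync dirty pages to disk, then drop the current mapping.
    void unmap() {
        if (base_ != nullptr) {
            msync(base_, length_, MS_SYNC);
            munmap(base_, length_);
            base_ = nullptr;
        }
    }
    // Map the window-aligned segment of the file that contains `byte`.
    void remap(std::size_t byte) {
        unmap();
        start_ = (byte / window_bytes_) * window_bytes_;
        length_ = std::min(window_bytes_, file_bytes_ - start_);
        void* p = mmap(nullptr, length_, PROT_READ | PROT_WRITE, MAP_SHARED,
                       fd_, static_cast<off_t>(start_));
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        base_ = static_cast<unsigned char*>(p);
        ++remaps_;
    }

    std::size_t file_bytes_;
    std::size_t window_bytes_;
    int fd_ = -1;
    unsigned char* base_ = nullptr;
    std::size_t start_ = 0;   // file offset of the mapped window
    std::size_t length_ = 0;  // bytes currently mapped
    std::size_t remaps_ = 0;
};
```

With this sketch, w.at(i) = value writes through the mapping and w.at(i) reads it back. The reference returned by at() is only valid until the next call that triggers a remap, which is one reason a real pseudo-pointer type would need more care than this.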

I am in the process of writing an OCaml interface to this as we speak. But
is it worth doing? My tests show that for pure sequential writing, the
memory mapped I/O is about 3.5 times as fast (350%) as buffered I/O with
fwrite(). That's nice... but for pseudo-random access, where I sequentially
march upward in memory and write that address and the two surrounding
addresses at +/-4KB, which is similar to some scientific array access
patterns, the memory mapped I/O is about 200 times as fast (20,000%) as
buffered I/O.

Of course, I chose that 4 KB as a ticklish offset because it both matches
the page frame size and will cause some stumbling in the memory mapped I/O.
And it also happens to be the more or less standard size of a buffer for
fwrite. I'm writing 24 MBytes of data overall, one longword at a time.
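
For reference, the buffered-I/O side of that access pattern could be sketched roughly as follows. This is not the test program itself: the names touched_offsets and write_pattern_stdio, the clipping of neighbours that fall outside the file, and the value written are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <initializer_list>
#include <vector>

constexpr long kStride = 4096;  // the "ticklish" +/-4 KB offset
constexpr long kWord = static_cast<long>(sizeof(std::uint32_t));

// Offsets written for one position: the position itself and its two
// neighbours at +/-4 KB, dropping any that fall outside the file.
std::vector<long> touched_offsets(long pos, long file_bytes) {
    std::vector<long> out;
    for (long off : {pos - kStride, pos, pos + kStride}) {
        if (off >= 0 && off + kWord <= file_bytes) out.push_back(off);
    }
    return out;
}

// March upward through the file one longword at a time, writing each
// touched offset with fseek + fwrite. Returns the number of words written.
std::size_t write_pattern_stdio(const char* path, long file_bytes) {
    assert(file_bytes >= 0);
    std::FILE* f = std::fopen(path, "wb+");
    if (f == nullptr) return 0;
    std::size_t writes = 0;
    for (long pos = 0; pos + kWord <= file_bytes; pos += kWord) {
        const std::uint32_t value = static_cast<std::uint32_t>(pos);
        for (long off : touched_offsets(pos, file_bytes)) {
            if (std::fseek(f, off, SEEK_SET) != 0) continue;
            writes += std::fwrite(&value, sizeof value, 1, f);
        }
    }
    std::fclose(f);
    return writes;
}
```

Each seek to a neighbour 4 KB away is likely to land outside whatever stdio currently has buffered, which would force a flush and refill on almost every write. That is one plausible reading of why the fwrite figures collapse for this pattern.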

The effective throughput is about 72 MB/sec for sequential access and 100
MB/sec for randomized access, compared with 20 MB/sec sequential and 0.5
MB/sec randomized for fwrite buffered I/O.

Disks are slow. File systems are slower yet. By letting the Mach kernel
handle the I/O directly on page faults, I end up squeezing a lot more
performance from the data handling system. This is still orders of magnitude
slower than my effective computation rate, and so the cost of all the bounds
checking and subroutine calling is lost in the noise.

[Tests were performed on a stock 1.25 GHz G4 eMac].

David McClain
Senior Corporate Scientist
Avisere, Inc.

+1.520.390.7738 (USA)
david.mcclain@avisere.com


