Array.init of a float array needlessly initializes the array with (f 0) #6065
Comments
Comment author: @jhjourdan What architecture are you using? Are you sure the initialization is the actual bottleneck (could it be the allocation)? Do you have a practical benchmark showing this? If necessary, I would rather optimize caml_make_vect: it is not normal that a simple memset-like loop is a CPU bottleneck. The actual initialization by the OCaml code (map, init) should take at least comparable time.
Comment author: @rixed I'm using an ordinary amd64 PC.
Annotated caml_make_vect at the hottest point:

    12.05 :   64fbae:   add    $0x8,%rax

I tried to implement the suggested patch to measure the gain, but I failed to build the compiler with the new runtime; I tried make core/coreboot/bootstrap and other recipes mentioned here and there, to no avail.
Comment author: @xavierleroy Interesting suggestion, thanks. Some thoughts:
Comment author: @rixed Thank you for the suggestions. I'm not sure I will be able to specialize my Array.map, though, since the float type comes from a functor argument. My arrays typically hold 2 to 4 items (so, quite small). Also, what is the procedure for changing the runtime in a recent OCaml source distribution? Maybe I should open another feature request to have this documented in the README?
Comment author: @jhjourdan I am a bit surprised by your performance problems with such small arrays. Could you please provide a repro case?
Comment author: @rixed The actual program is a small demonstration of a vector graphics library; see for yourself: opam switch testperf --alias-of 4.02.0dev+fp should compile and run the program and record its runtime performance.
Comment author: @jhjourdan I get the following:

    Building world of radius 150....

Using gdb, the backtrace is:

    #0  0x0000000000000000 in ?? ()

So it seems it is trying to draw something, but my OpenGL implementation does not like this. Is there a way to run it without rendering?
Comment author: @jhjourdan I have finally been able to run it. I did the following experiment: I duplicated the initialization loop in caml_make_vect (so the second instance of the loop does nothing useful; it just takes time). After recompiling everything, the second instance seems to take much less time than the first. To me, this means the first loop almost always takes cache misses, while the second one hits in the cache. Note that the assembly code generated for both loops is very similar. My conclusion is that if we remove the loop (or replace it with the initialization of the first field only), performance won't improve much, because the cache misses will still appear in Array.map or wherever the array is first written.
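The duplicated-loop experiment can be approximated outside the runtime with a small C sketch. It assumes that a plain malloc'd buffer is a reasonable stand-in for the OCaml heap and that ISO C clock() is precise enough at the buffer size chosen; the helpers fill_doubles, sum_doubles and time_two_passes are hypothetical names, not the patch used above. It times two identical store loops over the same freshly allocated buffer, reading the buffer back between passes (outside the timed regions) so the compiler cannot discard the first pass as dead stores.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* The store loop under test: write v into each of the n slots. */
void fill_doubles(double *buf, size_t n, double v)
{
  for (size_t i = 0; i < n; i++)
    buf[i] = v;
}

/* Read every slot back so all stores stay observable (not timed). */
double sum_doubles(const double *buf, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++)
    s += buf[i];
  return s;
}

/* Time two identical fill passes over one fresh buffer, mimicking a
   duplicated caml_make_vect loop. Writes CPU seconds per pass into
   *first and *second; returns 0 on success, -1 on failure. */
int time_two_passes(size_t n, double *first, double *second)
{
  assert(n > 0);
  double *buf = malloc(n * sizeof(double));
  if (buf == NULL)
    return -1;

  clock_t t0 = clock();
  fill_doubles(buf, n, 1.0);      /* first touch of fresh memory */
  clock_t t1 = clock();
  double s1 = sum_doubles(buf, n);

  clock_t t2 = clock();
  fill_doubles(buf, n, 2.0);      /* same lines again, possibly cached */
  clock_t t3 = clock();
  double s2 = sum_doubles(buf, n);

  free(buf);
  *first = (double)(t1 - t0) / CLOCKS_PER_SEC;
  *second = (double)(t3 - t2) / CLOCKS_PER_SEC;
  /* Sums of 1.0 and 2.0 are exact for any n below 2^53. */
  return (s1 == (double)n && s2 == 2.0 * (double)n) ? 0 : -1;
}
```

With a buffer much larger than the caches, the first pass also pays for page faults on newly mapped memory, which the minor heap does not see for 2- to 4-element arrays, so the numbers are only a qualitative illustration of the first-touch cost.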
Comment author: @rixed Sorry for the segfault; you probably figured out how to disable rendering. Yes, my use case may not benefit much from this change, but for larger arrays the first initialization may cause the same cache lines to be loaded several times, entailing more latency than necessary.
The […] As I mentioned in my comment above, similar optimizations might be possible for generic array operations such as […]. Closing this report.
Original bug ID: 6065
Reporter: @rixed
Status: acknowledged (set by @xavierleroy on 2013-07-06T08:34:18Z)
Resolution: open
Priority: normal
Severity: feature
Version: 4.00.1
Target version: later
Category: standard library
Monitored by: @gasche @jmeber
Bug description
I'm using small arrays of floats to store vectors, and caml_make_vect ranks high in CPU consumption. Looking at the code, it seems to me that in many cases (Array.init, Array.map, etc., actually all but Array.make) caml_make_vect could leave the array uninitialized (since it's a Double_array_tag block), because the code of Array.init, map, etc. is going to properly initialize the array anyway.
So I propose to introduce caml_init_vect, which would be the same as caml_make_vect but without the Store_double_field() calls that copy the init value into the float array. Of course, both caml_init_vect and caml_make_vect could be wrappers around a third function taking an additional parameter such as "need_init".
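A minimal C sketch of the proposed split, assuming a plain malloc'd buffer of doubles as a stand-in for a Double_array_tag block (the real runtime allocates on the OCaml heap and stores through Store_double_field). The names float_vect, make_float_vect_gen, make_float_vect, init_float_vect and array_init_float are hypothetical illustrations, not runtime APIs; only the "need_init" idea comes from the report.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an OCaml float array (Double_array_tag block):
   a length plus a heap buffer of unboxed doubles. */
typedef struct {
  size_t len;
  double *data;
} float_vect;

/* Shared worker with the proposed "need_init" flag: allocate len doubles
   and store init into each slot only when need_init is nonzero. */
float_vect make_float_vect_gen(size_t len, double init, int need_init)
{
  float_vect v;
  v.len = len;
  v.data = malloc((len > 0 ? len : 1) * sizeof(double));
  if (v.data == NULL)
    abort();                    /* the sketch does not model allocation failure */
  if (need_init)
    for (size_t i = 0; i < len; i++)
      v.data[i] = init;         /* the fill loop the report proposes to skip */
  return v;
}

/* caml_make_vect-like: every slot must hold init (Array.make semantics). */
float_vect make_float_vect(size_t len, double init)
{
  return make_float_vect_gen(len, init, 1);
}

/* caml_init_vect-like (hypothetical): slots are left unspecified, so the
   caller must write every element before reading it. */
float_vect init_float_vect(size_t len)
{
  return make_float_vect_gen(len, 0.0, 0);
}

/* Array.init-like for floats: calls f once per index in increasing order
   and writes each result exactly once, with no pre-fill by f(0). */
float_vect array_init_float(size_t len, double (*f)(size_t))
{
  assert(f != NULL);
  float_vect v = init_float_vect(len);
  for (size_t i = 0; i < len; i++)
    v.data[i] = f(i);
  return v;
}

void free_float_vect(float_vect v)
{
  free(v.data);
}
```

In this model, array_init_float for a length of 4 performs four stores, whereas the current Array.init path (create l (f 0), then a loop from index 1) performs seven: four from the fill and three overwrites. Leaving the block unfilled is acceptable in the real runtime only because the GC does not scan the contents of Double_array_tag blocks; as @jhjourdan's cache experiment suggests, the saving may still be small, since the first real write then takes the cache misses.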