New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster bigarray access #5203
Comments
Comment author: @alainfrisch I'm playing with a similar test: let ba () = let arr () = I obtain: ba 1.62s In the "ba" case, the inner loop is: .L103: Several things can be improved with only a minor effect on performance:
The real win is obtained by lifting the extra pointer indirection related to bigarrays outside the loop:
.L103: With only this modification, we get the same performance as for regular arrays (this is not suprising, given that the inner loop is now the same). How difficult would it be to perform this optimization automatically? Is it specified that the "data" field of the caml_ba_array structure is immutable (it should not be modified by C code)? |
Comment author: @alainfrisch As far as I can say, one of the difficulties to implement this optimization (compared to unboxing of floats, for instance) is that we need to manipulate |
Comment author: @Chris00
I do not remember reading such a thing but I do not know either any library that changes the underlying data pointer after creating the bigarray. I believe lifting this indirection as much as possible outside blocks — especially loops — would be worthwhile because bigarrays get/set are of course usually mixed with loops (whose efficiency is important) and it would be nice to be as fast as C. |
Comment author: @alainfrisch I'm attaching a patch that tries to remove the extra indirection. An expression let a = E1 (* some bigarray expression ) in is compiled by extracting the data pointer from the bigarray once (after a is bound) and using this pointer in E2. Some care is needed to ensure that the bigarray value itself is not collected. I'm not 100% sure this patch is sound! On my example, I get the same performance for the array and bigarray versions. |
Comment author: @alainfrisch I forgot this chunk in the attached patch: Index: asmcomp/amd64/emit.mlp--- asmcomp/amd64/emit.mlp (revision 10927)
|
Comment author: @Chris00 Here are my experiments. The patch has indeed a good impact. “set_ba” and “set_arr” are your tests, “ba” and “arr” are mine (creating the array in the function and with a “for j = 0 to m-1” loop around the loop over the array). I set n = 10_000 and m = 1000. ocaml 3.11.2 (unpatched): Rate set_ba ba arr set_arr set_ba 49.6+-0.1/s -- -30% -34% -50% ba 71.2+-0.2/s 43% -- -5% -28% arr 74.7+-0.1/s 50% 5% -- -25% set_arr 99.0+-0.4/s 100% 39% 33% -- ocaml 3.12.1+dev5 (2010-10-12), no patch: Rate set_ba ba arr set_arr set_ba 49.7+-0.1/s -- -30% -50% -50% ba 70.9+-0.4/s 43% -- -29% -29% arr 99.3+-0.3/s 100% 40% -- [-0%] set_arr 99.4+-0.2/s 100% 40% [0%] -- ocaml 3.12.1+dev5 (2010-10-12), after patch: Rate ba set_ba set_arr arr ba 71.9+-0.2/s -- [-2%] -26% -26% set_ba 73.2+-1.5/s [2%] -- -25% -25% set_arr 97.0+-1.0/s 35% 32% -- [-1%] arr 97.8+-0.3/s 36% 33% [1%] -- |
Comment author: @Chris00 Once (something like) this patch is determined sound, it would be good to perform the optimization also on “fun vec -> E2” (loops over bigarrays usually happen in functions computing things). |
Comment author: @damiendoligez You should make sure that the data pointer is not considered as a root by the GC. I looked at the patch but couldn't see anything related to this problem. This is because in the (not so far) future, it will be forbidden to have pointers outside the heap as OCaml values. |
Comment author: @alainfrisch I don't think there is a way, currently, to mark values at the cmm level as being non-GC pointers. Maybe this should be added at the same time pointers outside the heap are forbidden (do you have an idea when this will happen?). Also, the patch is unsafe in the presence of GADTs, so I assume it won't be included anyway in its current form. |
The CSE optimization that was added in 4.02 can often factor out the extra indirection caused by bigarrays. But it cannot lift it outside loops, like this report suggests. Other code motion optimizations could be considered. But I'd rather have a generic optimization than something specific to bigarrays. So, I'll close this report. |
Original bug ID: 5203
Reporter: @Chris00
Status: acknowledged (set by @damiendoligez on 2011-01-20T09:46:52Z)
Resolution: open
Priority: normal
Severity: feature
Version: 3.12.1+dev
Category: back end (clambda to assembly)
Tags: patch
Monitored by: @ygrek @hcarty @Chris00 @alainfrisch
Bug description
Bigarray access seems to be 20–30% slower than array access. For example,
type vec = (float, float64_elt, c_layout) Array1.t
let ba (a: vec) =
let s = ref 0. in
for i = 0 to n-1 do s := !s +. a.{i} done
let arr (a: float array) =
let s = ref 0. in
for i = 0 to n-1 do s := !s +. a.(i) done
ran on (big)arrays of size 10000 gives
ba 72519+-1524/s -- -26%
arr 97826+-1351/s 35% --
on an Intel core i7 Q 820 @ 1.73GHz (x86_64) running Debian GNU/Linux. Would it be possible to make access to (float) bigarrays as fast as (float) arrays when their type is statically known? That would be very much appreciated!
File attachments
The text was updated successfully, but these errors were encountered: