Browse thread
[Caml-list] matrix-matrix multiply - O'Caml is 6 times slower than C
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Xavier Leroy <xavier.leroy@i...> |
| Subject: | Re: [Caml-list] matrix-matrix multiply - O'Caml is 6 times slower than C |
> Looking at the assembly produced by O'Caml and GCC, it appears that GCC > is performance loop unrolling (as requested with -funroll-loops) and > strength reduction in the inner loops. I can easily see why these two > optimizations can result in such a tremendous performance difference. > > My question is this: I can obviously performance loop unrolling myself > by hand - does ocamlopt perform strength reduction? No, it doesn't. It doesn't hoist loop-invariant computations either. (See http://caml.inria.fr/ocaml/numerical.html) Multiplications are pretty expensive on modern processors, so your code would run much faster if you hoist the computation i*120 out of the j and k loops, and strength-reduce k*120 in the innermost loop. Another option worth considering is to swap the j and k loops, thus making k*120 and a.(i*120+k) two invariants of the inner loop. In this case, you might not even have to strength-reduce k*120. Other loop optimizations such as loop unrolling would (I guess) make much less of a difference -- modern processors are very good at branch prediction, making loop overhead low. Some more thoughts: Bagley's code works on a "proper" matrix (an array of arrays), while yours flatten the matrix into one single array. There are pros and cons to either approach, but notice that your approach (without strength reduction) entails fewer loads but more multiplications, which can be more expensive... Tiling loops is good for really large matrices, but yours (in this example) occupies "only" 115 Kb, thus fits comfortably in L2 cache. Perhaps tiling isn't beneficial in this case. - Xavier Leroy ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners