Browse thread
Performance questions, -inline, ...
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Kuba Ober <ober.14@o...> |
| Subject: | Re: [Caml-list] Performance questions, -inline, ... |
On Saturday 05 January 2008, Jon Harrop wrote: > Optimizing numerical code is discussed in detail in my book OCaml for > Scientists. You may also be interested in a very similar thread where I > optimized someone's almost identical code on the beginners list recently. > There is also this relevant blog post of mine: > > http://ocamlnews.blogspot.com/2007/09/spectral-norm.html > > Essentially, your benchmark has rediscovered the fact that abstractions > (HOFs, polymorphism etc.) are prohibitively slow for high-performance > numerics and must be avoided. In the case of the particular OCaml implementation - yes. In general - no. I hope we agree here. > On Thursday 03 January 2008 16:28:30 Kuba Ober wrote: > > I haven't looked at assembly output yet, but I've run into some > > unexpected behavior in my benchmarks. > > Your benchmarks aren't sufficiently well defined to convey information > about anything specific, so you're highly likely to misinterpret what you > see. They are straight rewrites from C code and are used to compare how gcc and OCaml stack up. > > This was compiled by ocamlopt -inline 100 -unsafe, > > You should use Array.unsafe_get and _set rather than the command-line > option -unsafe because the latter breaks correct code. This option is not affecting the execution speed in my case and thus can be dropped. > > What I wonder is why vector-to-vector add is so much faster than > > (constant) scalar to vector add. Vectors are preinitialized each time > > with a 1.0000, 1.0001, ... sequence. > > This is also highly likely to be platform and architecture dependent. The benchmark was run on the same machine and on the same day as the C code it was rewritten from :) > > (* generic scalar operation *) > > let op1 op const nloop = > > let accum = ref start in > > for i = 1 to nloop do > > accum := op !accum const > > done > > You probably meant to return "!accum". The return value is ignored anyway. It's a benchmark, noone cares what the result is. Or is it for performance reasons?? > > (* generic vector operation *) > > let op2 op const a b (nloop : int) = > > let len = Array.length a in > > for j = 0 to len-1 do > > for i = 0 to len-1 do > > b.(i) <- op a.(i) b.(i) > > done; > > done > > > > (** addition **) > > let add1 nloop = > > let accum = ref start in > > for i = 1 to nloop do > > accum := !accum +. addconst > > done > > Again, should probably return "!accum". However, you can encourage OCaml to > unbox the float by returning "!accum + 0.0" instead. OK, I don't quite get it. Are you talking about what the function should return? If so, are you implying that the function body will be compiled differently (better?) if a different type is returned? > > let add2 = op1 ( +. ) addconst > > This should be slower than "add1". But shouldn't. The way I write the code in this case should not affect the assembly the least bit. The loop with explicit operand, functional recursion and generic function-as-an-argument approach should all generate same assembly. Obviously enough they don't :) > > let add3 a b nloop = > > let len = Array.length a in > > for j = 0 to len-1 do > > for i = 0 to len-1 do > > b.(i) <- a.(i) +. addconst > > done; > > done > > The loop over "j" can be removed. Well, the goal was to iterate Array.length^2 times :) > > let add4 = op2 ( +. ) addconst > > This will be slow because "op2" is polymorphic and "+." will not be > inlined. Yeah, I see that, but that shouldn't be the case if OCaml were to be serious in that department :) > > let add5 a b nloop = > > let len = Array.length a in > > for j = 0 to len-1 do > > for i = 0 to len-1 do > > b.(i) <- a.(i) +. b.(i) > > done; > > done > > This increments "b" by "a" many times. Replace the repeated adds with a > single multiply for each element. It's benchmark code. It's supposed to check performance of increment-vector-by-a-vector operation. I was hoping it would be obvious enough :( Cheers, Kuba