Performance questions, inline, ...
Date:   (:) 
From:  Kuba Ober <ober.14@o...> 
Subject:  Performance questions, inline, ... 
I haven't looked at the assembly output yet, but I've run into some unexpected behavior in my benchmarks. The code was compiled with ocamlopt -inline 100 -unsafe; the results and code are below. (MIPS is obtained by dividing 50 million iterations by the (Unix.times ()).Unix.tms_utime it took to run.) I haven't included the timing etc. code (it's part of a larger benchmark).

What I wonder is why the vector-to-vector add is so much faster than the (constant) scalar-to-vector add. The vectors are preinitialized each time with a 1.0000, 1.0001, ... sequence.

Also, the very bad performance of the generic vector-to-vector add *with* inlining is another puzzler, whereas the generic scalar-to-scalar add performs similarly to the straight-coded one.

Cheers, Kuba

* add1: add scalar to scalar            120 MIPS
* add3: add scalar to vector            250 MIPS
* add5: add vector to vector            320 MIPS
* add2: generic add scalar to scalar    100 MIPS
* add4: generic add vector to vector     38 MIPS

    let start = 1.3

    (* generic scalar operation *)
    let op1 op const nloop =
      let accum = ref start in
      for i = 1 to nloop do
        accum := op !accum const
      done

    (* generic vector operation *)
    let op2 op const a b (nloop : int) =
      let len = Array.length a in
      for j = 0 to len-1 do
        for i = 0 to len-1 do
          b.(i) <- op a.(i) b.(i)
        done;
      done

    (** addition **)
    let add1 nloop =
      let accum = ref start in
      for i = 1 to nloop do
        accum := !accum +. addconst
      done

    let add2 = op1 ( +. ) addconst

    let add3 a b nloop =
      let len = Array.length a in
      for j = 0 to len-1 do
        for i = 0 to len-1 do
          b.(i) <- a.(i) +. addconst
        done;
      done

    let add4 = op2 ( +. ) addconst

    let add5 a b nloop =
      let len = Array.length a in
      for j = 0 to len-1 do
        for i = 0 to len-1 do
          b.(i) <- a.(i) +. b.(i)
        done;
      done
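The timing code is not part of the message, so the following is only a rough sketch of how a MIPS figure of this kind could be computed: iterations divided by elapsed CPU time, scaled to millions. It is not the original benchmark harness. The helper name mips and the value of addconst are assumptions made for illustration, and Sys.time is swapped in for (Unix.times ()).Unix.tms_utime so that the sketch needs no extra library (Sys.time reports processor time for the whole program rather than user time alone).

```ocaml
(* Hypothetical timing helper; NOT the original benchmark code. *)

let iterations = 50_000_000   (* iteration count stated in the message *)
let start = 1.3
let addconst = 1.0001         (* assumed value; the message does not show it *)

(* Run [f ()] once and return millions of iterations per CPU second.
   Sys.time stands in for (Unix.times ()).Unix.tms_utime here. *)
let mips ~iterations f =
  let t0 = Sys.time () in
  f ();
  let dt = Sys.time () -. t0 in
  float_of_int iterations /. dt /. 1e6

(* Scalar-to-scalar add, as in add1, but returning the accumulator
   so the result can be observed by the caller. *)
let add1 nloop =
  let accum = ref start in
  for _i = 1 to nloop do
    accum := !accum +. addconst
  done;
  !accum

let () =
  let r =
    mips ~iterations (fun () ->
        (* opaque_identity keeps the result from being discarded *)
        ignore (Sys.opaque_identity (add1 iterations)))
  in
  Printf.printf "add1: %.0f MIPS\n" r
```

Each of the other variants could be timed the same way by wrapping the call in a closure, for example mips ~iterations (fun () -> add3 a b iterations), provided the iteration count passed to mips matches the number of inner-loop steps actually executed.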