This site is updated infrequently. For up-to-date information, please visit the new OCaml website at ocaml.org.

[Caml-list] Does Caml have slow arithmetics ?
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
 Date: -- (:) From: Jacques GARRIGUE Subject: Re: [Caml-list] Does Caml have slow arithmetics ?
```From: Jon Harrop <postmaster@jdh30.plus.com>

> Funny you should say that. I've only recently been trying my hand at such
> low-level optimisations in OCaml and, to my surprise, I'm terrible at it (no
>
> Specifically, I recalled a post that ocamlopt doesn't do CSE. On the basis of
> this I thought that I could optimise my original code:
>
>       let pivot = (first + last)/2 in
>       let sum1=helper (scale+1) (2*entry) first pivot in
>       let sum2=helper (scale+1) (2*entry+1) pivot last in
>       Array.unsafe_set data2 entry (sum1 -. sum2);
>       sum1 +. sum2
>
> by performing some CSE manually, for example:
>
>       let pivot = (first + last)/2 in
>       let (sum1, sum2) =
>         let scale=scale+1 and entry=2*entry in
>         (helper scale entry first pivot, helper scale (entry+1) pivot last)
>       in
>       Array.unsafe_set data2 entry (sum1 -. sum2);
>       sum1 +. sum2

Interesting.
Actually, I suppose that this kind of situation could be optimized,
by working at the Clambda level. This is a bit unfortunate, as most
"non-local" optimizations are expected to be done at earlier stages,
but would be necessary to cope with inlining too.

Yet, in this example sharing the computation for scale and entry is
useless: addition and multiplication by a power of 2 cost virtually
nothing anyway.

> I also tried applying part of the arguments to Array.set to avoid
> the creation of a pair:
>
>       let pivot = (first + last)/2 in
>       let set=Array.unsafe_set data2 entry in
>       let scale=scale+1 and entry=2*entry in
>       let sum1=helper scale entry first pivot in
>       let sum2=helper scale (entry+1) pivot last in
>       set (sum1 -. sum2);
>       sum1 +. sum2
>
> Contrary to my expectations, these revised versions are 30 and 70%
> slower than the original, respectively.

This is a well-known behaviour, building the closure "set" is costly
for two reasons: a data structure holding it must be allocated
(causing eventually GC), and calls through a closure cost more than
direct ones (particularly if the "helper" function is known.)
In your case, there is even a 3rd reason: unsafe_set is not a real
function, but a primitive, which just generates the operations to
modify an array. By wrapping it in set, you make it a function.
And for some reason, ocamlopt does not inline set, leaving you with a
function call where there should have been none.

You may look at an abstraction of the code generated with the -cmm
option. For the above code it gives you:

(data
int 3319
"camlPivot__2":
int 7

(if (!= (load unsigned int8 (+a prim/73 -4)) 254)
(extcall "caml_modify" (+a (+a prim/73 (<< prim/72 1)) -2) prim/71 unit)
(store float64u (+a (+a prim/73 (<< prim/72 2)) -4)
1a)

(function camlPivot__pivot_57
(let
(pivot/64 (+ (<< (/ (>>s (+ (+ first/58 last/59) -1) 1) 2) 1) 1)
set/65 (app "caml_apply2" data2/60 entry/61 "camlPivot__2" addr)
scale/66 (+ scale/62 2) entry/67 (+ (* 4 (>>s entry/61 1)) 1)
sum1/68
(app "caml_apply4" scale/66 entry/67 first/58 pivot/64 helper/63 addr)
sum2/69
(app "caml_apply4" scale/66 (+ entry/67 2) pivot/64 last/59 helper/63
unit)