ocamlopt
compiler is poorly named because it
basically does not optimize the user's code. That is, everything will
be executed pretty much as written in the source. The only exception
is function applications, where the compiler inlines small function
bodies. But function definitions fun x -> ...
will
always heap-allocate a closure (no lambda-lifting, no closure
hoisting). Arithmetic expressions will be computed at the point where
they occur in the source (no loop-invariant code motion). So, the
programmer is very much in control.
a.(i).(j)
reads the i
-th
element of a, which is a pointer to an array, then read the
j
-th element of that array. It's about as efficient as
the standard layout for matrices when the dimensions are not known at
compile-time (we're trading a dereference for a multiplication), but
can be a big loss if the dimensions are statically known and one
happens to be, say, a power of 2.
A first consequence is that rows of a matrix are really individual
arrays: they need not be the same length (great for triangular matrices)
and a whole row of a matrix can be replaced in constant time: just do
a.(i) <- some_array
.
There might also be some creative things you can
do by sharing a row between several matrices.
Another consequence is that matrices are row-major (hope that's the right term): the first coordinate should vary slower than the second. E.g. to sum two N*N matrices, don't write
for j = 0 to N-1 do for i = 0 to N-1 do c.(i).(j) <- a.(i).(j) + b.(i).(j) done donebut write:
for i = 0 to N-1 do let row_a = a.(i) and row_b = b.(i) and row_c = c.(i) in for j = 0 to N-1 do row_c.(j) <- row_a.(j) + row_b.(j) done done(Notice the manual optimization: three loop-invariant expressions have been lifted outside of the inner loop.)
One of the ancestors of Objective Caml, the Gallium experimental compiler, implemented a fairly aggressive type-based unboxing strategy that would completely eliminate boxing from Fortran-style code. i.e. code without polymorphic functions. Unfortunately, it turned out to complicate greatly the runtime system and make the garbage collector less efficient, hence hampering the performance of symbolic code. Since symbolic computation is still the main application of ML, I dropped this unboxing strategy.
However, some of the unboxing tricks developed for Gallium are still used in Objective Caml, at least on a local scale (where they don't interact with the garbage collector and everything). Here are some hints on which floats are boxed and which are not:
Example: fun x -> 3.14 *. x *. x
The parameter x
and the result are boxed, but not
x *. x
.
let
-bound results of arithmetic operations are not boxed
if they are only used later as arguments to arithmetic operations.
Example:
let x = y +. z in x *. x x not boxed let x = y +. z in x *. x *. f x x boxed (passed as argument)
An access in a float array or in such a record avoids boxing in the same way as for arithmetic operations.
Example: a.(i) <- 2.0 *. a.(i)
performs only one float
read and one float store, instead of two reads, a boxing and a store as
in most ML implementations.
for
and while
loops instead
of recursive functions. let x = y +. z in for i = 0 to N-1 do a.(i) <- x *. a.(i) donedoes not box x, but
let x = y +. z in let rec iter i = if i >= N then () else (a.(i) <- x *. a.(i); iter(i+1)) in iter 0is not only unreadable, but causes
x
to be boxed because it is
free in iter
.
type complex = {re: float; im: float}saves 4 loads and 2 allocations for each complex addition compared with
type complex = float * float
The illusion of a parametric array type 'a array
is maintained
at some cost: when accessing an array whose static type is
'a array
(e.g. in a polymorphic function), a run-time test
is generated to distinguish the
two array formats, and some additional boxing/unboxing may take place
if it's a float array. When more is known on the static type of the
array, no test is generated and no extra boxing takes place.
So, for maximal performance, don't operate over float arrays with polymorphic array functions. For instance, transposing a float matrix with
let transpose m = let m' = Array.new_matrix (Array.length m.(0)) (Array.length m) in for i = 0 to Array.length m do let row_i = m.(i) in for j = 0 to Array.length row_i do m'.(j).(i) <- row_i.(j) done done; m'is much more efficient if you add a type constraint:
let transpose (m: float array array) = ...
=
, <
,
compare
, etc, return incorrect results when one of the
arguments is a float NaN (``Not a Number''). Namely, NaNs are
considered equal to any other float, while they should be different
from any float including themselves. However, when =
,
<
, etc, are used with the static type
float -> float -> bool
, then the compiler bypasses the
polymorphic comparisons and calls the floating-point comparisons
directly. The floating-point comparisons do conform to the IEEE
standard. So, if you want to define a ``is Not a Number'' predicate that builds on the fact that NaN is the only number different from itself, don't write
let isnan x = x <> xbecause you'll get a polymorphic comparison that doesn't work over NaN. Instead, force the floating-point comparison by writing
let isnan (x : float) = x <> x
ocamlopt
.
Caml -
Objective Caml -
Projet Cristal -
Homepages -
Contact the Caml maintainers
Author: Xavier Leroy --
Last modified: 2002/07/31