Multiplication of matrix in C and OCaml
From: ls-ocaml-developer-2006@m...
Subject: Re: [Caml-list] Multiplication of matrix in C and OCaml

Jacques Garrigue <garrigue@math.nagoya-u.ac.jp> writes:

> From: Frédéric Gava <gava@univ-paris12.fr>
>
>> Sorry for the inconvenience and this stupid error: I am a very bad C 
>> programmer.
>> 
>> But I don't get the same performance as Jacques Garrigue :-( I am trying
>> to benchmark a parallel matrix multiplication algorithm and compare
>> C+MPI with OCaml+MPI (I want to show that OCaml is efficient
>> enough for high-performance computing; that community largely prefers
>> Fortran or C...)
>> 
>> 
>> a) with a "polymorphic" C program (using 
>> "multiply_complex_generic(i,complexe_add,complexe_mult,a,b,c);")
>> 
>> time ./cmult 600 2 602 1
>> real    0m18.402s
>> user    0m17.333s
>> sys     0m0.044s
>> 
>> b) with a monomorphic C program (using "multiply_complex(i,a,b,c);")
>> 
>> time ./cmult 600 2 602 1
>> real    0m5.604s
>> user    0m5.556s
>> sys     0m0.036s
>
> Interesting. It all depends on the compiler.
> With gcc 3.4, as provided in FreeBSD, I get almost no difference
> between your polymorphic and monomorphic versions. But if I switch to
> gcc 4.1, the monomorphic version is indeed much faster. Actually, what
> I get is:
> gcc 3.4 polymorphic: 15s
> gcc 4.1 polymorphic: 20s
> gcc 3.4 monomorphic: 15s
> gcc 4.1 monomorphic:  7s
> So it looks like gcc 4.1 is better for monomorphic code, but worse for
> function calls...
> Note that in my case, this is still within a factor of 2 of OCaml (which
> is about as fast as gcc 3.4).

> But your C compiler may be doing some other platform-specific
> optimizations. The only way to know what is happening is to look at the
> generated assembler.

I'm just wondering about that. AFAICS, none of the data produced
during the matrix multiplication is used for anything. So I wouldn't
rule out that, for the monomorphic code, gcc 4.1 simply decides to
drop most of the actual work after doing some data-flow analysis
(i.e. dead-code elimination). Gcc 4.1 is, I think, known for some
aggressive optimizations. I'd feel better if the code were benchmarked
so that the result of the multiplication is written to a file, and if
the run time were measured for various problem sizes (numbers of
matrices) so that the constant contribution of that output can be
subtracted. One would get a linear dependency

   t(n) = C + K*n

where n is the number of matrices, C the constant overhead and K the
cost per matrix. Fitting a straight line to the data points yields K,
which can be used to compare the efficiency of C (under various
compiler versions and options) and OCaml against each other without
having to worry about no-op optimizations or constant startup costs
(like load times and runtime initialization, which might be a bit
higher with OCaml, though certainly not on the order of seconds).

Comparing the output to expected results would also help to ensure
that the code is correct, which I still find difficult to verify at
first glance.

Regards -- Markus