English version
Accueil     À propos     Téléchargement     Ressources     Contactez-nous    

Ce site est rarement mis à jour. Pour les informations les plus récentes, rendez-vous sur le nouveau site OCaml à l'adresse ocaml.org.

Browse thread
OC4MC : OCaml for Multicore architectures
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2009-09-25 (21:28)
From: Jon Harrop <jon@f...>
Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Friday 25 September 2009 05:07:21 Jacques Garrigue wrote:
> Your benchmark seems strange to me, as you are comparing apples with
> oranges.

In some sense, yes. I was interested in the performance of the 
defacto-standard hash table implementations and not the performance that can 
be obtained by reinventing the wheel.

> Hashtables in Python are a basic feature of the language, 
> and they are of course implemented in C. In ocaml, they are
> implemented in ocaml (except the hashing function, which has to be
> polymorphic), using an array of association lists!
> (Actually the pairs are flattened for better performance, but still)
> What is impressive is that you don't need any special optimization to
> get reasonably good performance.

OCaml is 4x slower than F# on that benchmark for several reasons:

1. Overhead of 31-bit int arithmetic.

2. Lack of constant table sizes in the implementation and OCaml's failure to 
optimize mod-by-a-constant.

3. No monomorphization.

You can write a far more efficient hash table implementation in F# than you 
can in OCaml because it addressed all of those deficiencies.

> Actually the only tuning you need is to start from a reasonable table size,
> which you didn't... 

No, the exact opposite is true: OCaml had the unfair advantage of starting 
from the optimal table size for the problem whereas F# started from the 
default size and had to resize. If you level the playing field then OCaml is 
8x slower than F#.

> > Even if that were not the case, the idea of cherry picking interpreted
> > scripting languages to compete with because OCaml has fallen so far
> > behind mainstream languages (let alone modern languages) is embarrassing.
> > What's next, OCaml vs Bash for your high performance needs?
> OCaml was never touted as an HPC language!

I started learning OCaml because people were running high performance OCaml 
code on a 256-CPU supercomputer in Cambridge. I have been touting OCaml for 
HPC ever since. Thousands of scientists and engineers all over the world have 
used OCaml for technical computing and chose it precisely because it was 
competitively performant.

> The only claim I've seen is that it intends to stay within 2x of C for most
> applications. (Which is not so easy these days, gcc getting much faster.)

Yes. The infrastructure for compiler writers is improving rapidly as well 
though, e.g. LLVM.

> Actually, I believe that Philippe's point is rather different.
> Making a functional language work well on multicores is difficult.
> If I tell you that you just have to modify a bit your program to get a
> near linear speedup, then it looks great. But in practice it is rather
> having to rethink completely your algorithm,

Sure. The free lunch is over. However, the solution usually consists either of 
spawning independent computations or parallelizing outer loops, both of which 
can be made very easy by the language implementor.

> to eventually get a speedup bounded by bandwidth,

For some applications under certain circumstances, yes.

> and starting from a point lower than the original single thread program.


> There are applications for that (ray tracing is one), but this is not the
> kind of needs most people have. 

Not the kind of needs the remaining OCaml programmers have, perhaps. Outside 
the OCaml world, a lot of people are now programming for multicores.

> By the way, I was discussing with numerical computation people working
> on BLAS the other day, and their answer was clear: if you need high 
> performance, better use a grid than SMP, since bandwidth is 
> paramount. 

That is a false dichotomy. Grids are inevitably composed of multicores so you 
will still lose out if you fail to leverage SMP when programming for a grid.

> ...And you have to write in C or FORTRAN (or asm), because the timing of
> instructions matter. 

I have written linear algebra code in F# that outperforms Intel's vendor tuned 
Fortran (the MKL) by a substantial margin on Intel hardware. Moreover, their 
code only works on certain types whereas mine is generic.

OCaml is an excellent language for this kind of work but it requires an 
implementation with a performance profile that is very different from 

> The funniest part was that those people were working on integer
> computations, but had to stick to floating point, because timing on integers
> is unpredictable, making synchronization harder.   


Dr Jon Harrop, Flying Frog Consultancy Ltd.