English version
Accueil     À propos     Téléchargement     Ressources     Contactez-nous    
Browse thread
Raising an old issue : true concurrency in OCaml [Xavier, Damien, any]
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Yoann Fabre <Yoann.Fabre@w...>
Subject: Raising an old issue : true concurrency in OCaml [Xavier, Damien, any]

Well... shame on me!
I'm afraid I'm going to restart the "annual discussion on threads". I'm very
sorry about that Xavier, but I do believe that the recent trend in generic
CPU design calls for a new discussion. I can only hope all of you will
forgive my too quickly written English... and of course the fact that I'm
definitely "just a nameless student" :-)

Since it seems to be fashionable here nowadays, I introduce myself quickly
in the .sig. I also invite you to read the last (full) Xavier's reply about
this topic (link and quotes included below).

My point is the following (let's be blatant): I'm afraid any pure-OCaml
program will be limited to at most 50% of the CPU power on about 50% of the
machines in the next five years ; and probably 25/30% on the remaining 40%
of medium to high-end machines.

Why? Because current CPU design is going toward true parallelism with
multi-core and hyper-threading like techs.
This is /not/ yet another marketing hype... there're profound physical
reason being such a move. It seems to be impossible to scale up in frequency
any more due to leaking current etc. See articles on
http://www.anandtech.com/cpuchipsets/ for a short introduction and more
links. (I'm ashamed of such a lake of true reference, but I'm too lazy to
find them among my papers right now...)

So Intel, AMD, IBM/Motorola and Sun are all releasing multi-core/HT CPU
right now. SMP systems will not be the exception any more, but the rule!

See:
Intel Pentium 4C (1 HT core ~ 1.7 cores)
Intel Pentium 4D (2 cores)
Intel Pentium EE 840 (2 HT cores ~ 3.4 cores)
New Sun micro-arch, a kind of Intel HT
New (incoming) multi-core AMD64
New multi-core PowerPC
IBM Cell (at least 1+6 cores)
etc...

And of course, we cannot execute more than one Caml thread at a time...

So the questions are:
- Do we need to do something about it?
- If yes, what can be done in the short term?
- And in the long term?

IMHO, I think the Caml team need to do something... I do /love/ Ocaml but I
also do run hungry code (Non-ODE solving, temporal series discovery...) and
it seems that a lot of that code can be parallelized quite easily. I'm
already tired not to be able to go above about 65% CPU usage on my P4C 3.2
(Intel VTune, not Task-Manager figures). I've run the FPU part of some
constraint solving based on interval arithmetic in a C thread while the high
level constraint management was still in OCaml (tricky business). The FPU
ASM was nearly the same between ocamlopt and GCC 3.2 but I've nearly doubled
the perf! A simple and efficient usage of hyper-threading between strongly
FPU and ALU oriented threads (and maybe a better cache management provided
by the use of the two context entries). For some project I've seriously
considered using some language with a concurrent GC (IBM's java VM ?, C# ?).
What an awful thought, isn't it? Their god, help me! Avoid me C++ in any
way... (See also my short reply to Xavier at the end of this mail.)

So what can we do?
Here's my two cents proposal for the short/long term:

Phase 1 - Maybe provide some module in the stdlib to:
- allow easy management of multiple Caml /processes/
- allow easy communication with message passing (MPI like + Marshal)
- allow easy synchronisations between these pseudo-threads
Phase 2 - Add another module to:
- provide a standard interface to memory sharing
- allow some awful dirty hack to GC-allocate some special blocks into that
region? Some "custom2" with reference counting (ouch!) between the GCs
Phase 3 - write a concurrent GC (CGC) (re-ouch!)

Well, I've studied a bit the CGC field and read Damien's Phd... It's
frightening! But, to recast the issue in a somewhat broader context, can we
really still pretend that a "modern" generic language can live without true
concurrency in 2005?

All in all Xavier's recognized the importance of threading for OCaml, since
he's written the first linux pthread library for that very purpose...
Granted it was not aimed at performances, but it's now an open question. The
old dream of "never two times slower than C" is (was?) a reality. I use
OCaml partly for that kind of perf... Is "never height times slower than C"
(4 cores) still an acceptable tradeoffs? I don't think so and I'm worried.

Only hoping to trigger a constructive discussion,
Cordially,
Yoann.Fabre@lip6.fr

I code in Caml since 1995 (grateful victim of the ENS push on Prepa
program). I've read this list nearly daily for six years. I've written more
the 200.000 lines of OCaml code (wc -l) ranging from 3D system to ML like
byte code compiler and type-system, going through C binding of things like
CrystalSpace... I've also done dirty unsatisfactory hack in the runtime
system to implements a ultra-lightweight concurrent language (a la
Peyton-Jones feather-weight continuations) and some true interval arithmetic
etc...
Some poor fellow 	were also victim of my hard Ocaml evangelism -- Hi
Nadji, got that you're now in PhD with Francois, how is it going? Hi SebC !
Well I do not want to be conceited... I've only tried to establish that I've
got some XP in OCaml, since I know from practice that it does help in a
debate with some of the world-class coders who are wandering here...

-----------------------------------------------------------------------

Xavier Leroy 
Re: [Caml-list] Why systhreads?
2002-11-25 (10:01)              
http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2ca
cb73338fb15.en.html

What about parallelism on SMP machines?  The main issue here is that the
runtime system, and in particular the garbage collector and memory manager,
must be MP-safe. [...] a concurrent GC avoids this problem, but adds
tremendous complexity.
(Of course, all this SMP support stuff slows down the runtime system even if
there is only one processor, which is the case for almost all our users...)
[...]
Why was Concurrent Caml Light abandoned?
- Too complex; too hard to debug
- Dubious practical interest. Shared-memory multiprocessors have never
really "taken off", at least in the general public
[...]

>>> True, until now.

[...]
Even if you have a 4-processor SMP machine, it isn't clear whether you
should write your program using shared memory or using message passing
[...]

>>> OK if you only need to copy large linear block of memory. But seriously,
do you really want to marshal ~50Mo of complex Caml data structure? This
/has/ a non negligible cost and I don't want to think about maintaining
sharing (which may be mandatory). It's, what, 1.5 million accesses to the
hash table in marshal.c (with a mean of 8*4 bytes by block).

[...]
What about hyperthreading?  Well, I believe it's the last convulsive
movement of SMP's corpse :-)  We'll see how it goes market-wise.
[...] 

>>> All current CPU designs seem to disagree.

[...]
At any rate, the speedups announced for hyperthreading in the Pentium 4 are
below a factor of 1.5; probably not enough to offset the overhead of making
the OCaml runtime system thread-safe.  
[...]

>>> The ratio is more about 1.7. I've used HT for more than a year, and I
think it's a truly effective tech. Maybe the most effective tech. (from a
complexity/perf ratio viewpoint) introduced by intel since the P2.

[...]
In summary: there is no SMP support in OCaml, and it is very very unlikely
that there will ever be.

>>> Ouch!