From: Gerd Stolpmann <gerd@g...>
Subject: Re: [Caml-list] Re: OCaml is broken
> The following web page describes a commercial machine sold by Azul Systems 
> that has up to 16 54-core CPUs (=864 cores) and 768 GB of memory in a flat 
> SMP configuration:
> 
>   http://www.azulsystems.com/products/compute_appliance.htm
> 
> As you can see, a GC with shared memory can already scale across dozens of 
> cores and memory access is no more heterogeneous than it was 20 years ago. 
> Also, note that homogeneous memory access is a red herring in this context 
> because it does not undermine the utility of a shared heap on a multicore.

The benchmarks they mention can all be parallelized easily - that kind
of workload you can also handle with multi-processing. The interesting
case would be an inherently parallel algorithm where the same memory
region is accessed by multiple threads, or at least a numeric program
(your examples seem to come mostly from that area).
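
Even that pattern is not out of reach for multi-processing: here is a
sketch of two processes writing into the same memory region through a
shared mapping (this needs Unix.map_file, i.e. OCaml 4.06 or later;
the file name and array size are placeholders of mine):

  (* Two processes updating one shared memory region via mmap. *)
  let () =
    let fd =
      Unix.openfile "/tmp/shared.dat" [Unix.O_RDWR; Unix.O_CREAT] 0o600 in
    Unix.ftruncate fd (1024 * 8);          (* room for 1024 float64s *)
    let a =
      Bigarray.array1_of_genarray
        (Unix.map_file fd Bigarray.float64 Bigarray.c_layout true [| 1024 |])
    in
    match Unix.fork () with
    | 0 ->   (* child fills the lower half of the shared region *)
        for i = 0 to 511 do a.{i} <- float_of_int i done;
        exit 0
    | pid -> (* parent fills the upper half, then waits for the child *)
        for i = 512 to 1023 do a.{i} <- float_of_int i done;
        ignore (Unix.waitpid [] pid);
        Printf.printf "a.(0)=%g a.(1023)=%g\n" a.{0} a.{1023}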

> > - Have you considered that many Ocaml users prefer a GC that offers maximum
> > single core performance, 
> 
> OCaml's GC is nowhere near offering maximum single core performance. Its 
> uniform data representation renders OCaml many times slower than its 
> competitors for many tasks. For example, filling a 10M float->float hash 
> table is over 18x slower with OCaml than with F#. FFT with a complex number 
> type is 5.5x slower with OCaml than F#. Fibonacci with floats is 3.3x slower 
> with OCaml than my own HLVM project (!).

Sure, but such micro benchmarks are, first, seldom correct, and second,
say little about real-world programs.

For example, an important parameter of such benchmarks is how
frequently the GC runs. OCaml runs the GC very often - good for
latencies, but bad for micro benchmarks, because other runtimes simply
delay the GC until some limits are exceeded, so they often have not run
the GC even once within the short time the benchmark takes.
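
To illustrate, here is a small sketch (the workload and the iteration
count are placeholders of mine, not the benchmarks under discussion)
that counts collections around a run, and shows how enlarging the minor
heap delays them:

  (* Count GC runs around a workload that churns short-lived values. *)
  let () =
    let churn n =
      for i = 1 to n do ignore (Sys.opaque_identity (i, i)) done
    in
    let s0 = Gc.quick_stat () in
    churn 10_000_000;
    let s1 = Gc.quick_stat () in
    Printf.printf "minor collections: %d, major collections: %d\n"
      (s1.Gc.minor_collections - s0.Gc.minor_collections)
      (s1.Gc.major_collections - s0.Gc.major_collections);
    (* Enlarging the minor heap (size in words) delays collections,
       which is effectively what other runtimes do by default. *)
    Gc.set { (Gc.get ()) with Gc.minor_heap_size = 8 * 1024 * 1024 }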

It is simply a fact that the OCaml developers had certain preferences.
E.g. allocating and freeing short-lived values is extremely fast (often
<10ns). This is very good when you do symbolic computations, or have
lots of small strings, but irrelevant for numeric stuff, or for
programs where the lifetime of allocated memory is bound to server
sessions. The minor GC is very fast, but, as you observe, the uniform
representation has costs elsewhere.
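
To make that trade-off concrete, a small sketch of my own (not one of
the quoted benchmarks): every float stored in a polymorphic container
such as Hashtbl is boxed in its own heap block, whereas a float array
is special-cased by the runtime and stored unboxed:

  (* Each float key and value added here is boxed: one heap block
     per number, plus the hash-table bucket itself. *)
  let boxed_floats n =
    let h = Hashtbl.create n in
    for i = 1 to n do
      Hashtbl.add h (float_of_int i) (float_of_int i *. 2.0)
    done;
    h

  (* A [float array] is an unboxed representation: numeric loops
     over it allocate nothing per element. *)
  let unboxed_floats n =
    let a = Array.make n 0.0 in
    for i = 0 to n - 1 do
      a.(i) <- float_of_int i *. 2.0
    done;
    a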

> >   because their application is parallelised via multiple processes
> >   communicating via message passing? 
> 
> A circular argument based upon the self-selected group of remaining OCaml 
> users. Today's OCaml users use OCaml despite its shortcomings. If you want to 
> see the impact of OCaml's multicore unfriendliness, consider why the OCaml 
> community has haemorrhaged 50% of its users in only 2 years.

I don't see that. That's just speculation - maybe some Win32 OCaml
users switched to F#, but there are certainly reasons other than
multicore support, e.g. GUIs and better Windows integration. By the
way, where do you get your numbers from?

There are many, many users for whom multicore is just useless hype.
Either the algorithms are inherently difficult to parallelize (and that
is the vast majority), or they are so easy to parallelize (like all
client/server stuff) that multi-processing is sufficient. You can
consider multicore a marketing trick of the chip industry to make the
ordinary desktop user pay for a feature that is mostly interesting for
datacenters.
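
For the client/server case, here is a minimal sketch of what I mean by
multi-processing being sufficient (the port number and the echo
behaviour are arbitrary choices of mine):

  (* Fork-per-connection echo server: each client is served by its
     own process, so all cores are used without threads or a shared
     heap. *)
  let () =
    (* reap children automatically to avoid zombies *)
    Sys.set_signal Sys.sigchld Sys.Signal_ignore;
    let sock = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
    Unix.setsockopt sock Unix.SO_REUSEADDR true;
    Unix.bind sock (Unix.ADDR_INET (Unix.inet_addr_any, 9000));
    Unix.listen sock 64;
    while true do
      let (client, _) = Unix.accept sock in
      match Unix.fork () with
      | 0 ->
          (* child: serve this client, then exit *)
          Unix.close sock;
          let buf = Bytes.create 4096 in
          let rec loop () =
            let n = Unix.read client buf 0 (Bytes.length buf) in
            if n > 0 then (ignore (Unix.write client buf 0 n); loop ())
          in
          loop ();
          Unix.close client;
          exit 0
      | _ ->
          (* parent: the child owns the connection now *)
          Unix.close client
    done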

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------