Version française
Home     About     Download     Resources     Contact us    
Browse thread
Re: Why OCaml sucks
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Gerd Stolpmann <info@g...>
Subject: Re: [Caml-list] Re: Why OCaml rocks

Am Freitag, den 09.05.2008, 06:09 +0100 schrieb Jon Harrop:
> On Friday 09 May 2008 05:45:53 you wrote:
> > On Thu, May 8, 2008 at 5:39 PM, Jon Harrop <jon@ffconsultancy.com> wrote:
> > > 1. Lack of Parallelism: Yes, this is already a complete show stopper.
> > > Exploiting multicores requires a scalable concurrent GC and message
> > > passing (like JoCaml) is not a substitute. Unfortunately, this is now
> > > true of all functional languages available for Linux, which is why we
> > > have now migrated entirely to Windows and F#. I find it particularly
> > > ironic that the Haskell community keep hyping the multicore capabilities
> > > of pure code when the rudimentary GC in Haskell's only usable
> > > implementation already stopped scaling.
> >
> > Fork?  For something like a raytracer, I do not see how threads would be
> > any more useful than fork.

I think the parallelism capabilities are already excellent. We have been
able to implement the application backend of Wink's people search in
O'Caml, and it is of course a highly parallel system of programs. This
is not the same class raytracers or desktop parallelism fall into - this
is highly professional supercomputing. I'm talking about a cluster of
~20 computers with something like 60 CPUs.

Of course, we did not use multithreading very much. We are relying on
multi-processing (both "fork"ed style and separately started programs),
and multiplexing (i.e. application-driven micro-threading). I especially
like the latter: Doing multiplexing in O'Caml is fun, and a substitute
for most applications of multithreading. For example, you want to query
multiple remote servers in parallel: Very easy with multiplexing,
whereas the multithreaded counterpart would quickly run into scalability
problems (threads are heavy-weight, and need a lot of resources).

> There are two problems with that:
> 
> . You go back to manual memory management between parallel threads/processes.

I guess you refer to explicit references between processes. This is a
kind of problem, and best handled by avoiding it. We have some cases
where we have to keep remote state. The solution was to have a timer,
and delete it after some time of not accessing it. 

After all, most state is only temporary, and if it is lost, it can be
created again (at some cost, of course).

> . Parallelism is for performance and performance requires mutable data 
> structures.

In our case, the mutable data structures that count are on disk.
Everything else is only temporary state.

I admit that it is a challenge to structure programs in a way such that
parallel programs not sharing memory profit from mutable state. Note
that it is also a challenge to debug locks in a multithreaded program so
that they run 24/7. Parallelism is not easy after all.

> Then you almost always end up copying data unnecessarily because you cannot 
> collect it otherwise, which increases memory consumption and massively 
> degrades performance that, in turn, completely undermines the original point 
> of parallelism. 

Ok, I understand. We are complete fools. :-)

I think that the cost of copying data is totally overrated. We are doing
this often, and even over the network, and hey, we are breaking every
speed limit.

> The cost of interthread communication is then so high in 
> OCaml that you will rarely be able to obtain any performance improvement for 
> the number of cores desktop machines are going to see over the next ten 
> years, by which time OCaml will be 10-100x slower than the competition.

This is a quite theoretical statement. We will rather see that most
application programmers will not learn parallelism at all, and that
consumers will start question the sense of multicores, and the chip
industry will search for alternatives.

And _if_ application programmers learn parallelism, then rather in the
multi-processing/multiplexing setup we use, and not the multithreading
style you propagate. And on servers (where parallelism ought to happen),
the poor support of Windows for it (lacking "fork" and other cool
features) is no problem.

Gerd


> 
> > When was the last time you heard of a cool new windows app anyway?
> 
> The last time we released a product. :-)
> 
> > > . No 16Mb limit.
> >
> > What do you mean by 16mb limit?
> 
> OCaml's strings and arrays are limited to 16Mb in 32-bit.
> 
> > > . Inlining.
> >
> > isn't it best for the compiler to handle that?  I wouldn't mind hearing
> > another perspective on this, but I thought that compilers were smarter
> > these days.
> 
> Definitely not. Compilers uniformly suck at inlining. For example, agressive 
> inlining is often beneficial in numerical code and often damaging in symbolic 
> code. Compilers cannot tell the difference.
> 
> This is very similar to "unboxed data structures are always better", which 
> also isn't generally true.
> 
> I've got more gripes to add:
> 
> . Missing types, like float32 and int16.
> . DLLs.
> 
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------