Version française
Home     About     Download     Resources     Contact us    
Browse thread
OCaml Summer Project decisions are in
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Xavier Leroy <Xavier.Leroy@i...>
Subject: Re: [Caml-list] OCaml Summer Project decisions are in
Berke Durak wrote:

> - How "stoppy" would a stop-the-world parallel GC be in practice?
> The more parallelism you have, the more work is done, the higher the
> frequency of a major collection.

By Amdahl's law, a stop-the-world GC cannot scale well, but it might
be good enough for 2 to 4 cores.  For instance, assuming a sequential
program that spends 25% of its time in garbage collection and
parallelizes with low overhead, you would get a speedup of 2 on
a 4-core machine.  A related question is how long it takes to stop a
large number of threads.  But only experimentation can tell.

> - I'm afraid true concurrency will introduce an awful lot of bugs in
> native bindings.  Thread-unsafe libraries will have to be replaced
> (Str, etc.)

Libraries with stateful APIs (such as Str, but also Hashtbl, etc)
already need user-level locking to be safely used in concurrent programs,
even with today's concurrent-but-not-parallel threading model for Caml.
The only additional problem I can see is with C bindings that expose a
functional API but have global state in their implementation, but I'm
not sure there are many. So, I wouldn't worry too much about this at
the moment.  But there are more acute problems with global C state in
the Caml run-time system itself (see below).

> Also what would be the CPU and memory costs?  Don't concurrent GCs
> require extra colors?

Not the ones I know.

Benjamin Canou wrote:

> So our proposal is to let this project be more "a first reusable step
> toward parallelism in OCaml" than "a parallel OCaml".
> More practically, we propose the following subtasks:
>   1. To strip down the current run-time library, rewriting some parts
> which are too much dependent on the current GC
>   2. To clean the (small) parts of the compiler preventing us from
> changing the allocator (for example, OCaml inlines some allocations by
> directly emitting code which modifies the heap pointer).
>   3. To define a clean and documented interface for adding new GCs,
> ideally adding a run-time switch to choose the GC.
>   4. To to reinject the current GC, or a simpler sequential GC we
> already wrote for another work, using this interface to validate the
> approach.
>   5. To design a first parallel GC, simple enough for us to be able to
> test and benchmark it before the end of the project and to implement it
> within our interface.

This sounds globally reasonable for a first step, although you should
be careful not to completely shift your objectives from parallelism to
plug-in GCs.

There are two point that I'd like to emphasize. The first step towards
any form of true parallelism in Caml (including message-passing
between threads having their own heaps and sequential GCs) is to clean
up the numerous global C variables that the run-time system uses.
But maybe this was implicit in your point 1.

Another crucial point is the ability to stop all threads and obtain
their roots, which is not obvious at all and a prerequisite for any
form of concurrent garbage collection.

At any rate, it's probably better not to ask too many questions at
this point and let you concentrate on the task at hand.  Feel free to
contact me and Damien Doligez for specific questions or general
advice.

- Xavier Leroy