How to re-implement the GC?
Date: 2010-09-13 (21:26)
From: Jon Harrop <jonathandeanharrop@g...>
Subject: RE: [Caml-list] Re: How to re-implement the GC?
Hi Eray,

Retrofitting a new multicore-friendly GC onto OCaml is difficult for two
main reasons:

1. You want plug-and-play GCs but the OCaml compiler is tightly coupled to
the old GC (although OC4MC has decoupled them!).

2. Recovering similar performance whilst reusing the same data
representation is extremely difficult because the current design relies so
heavily on lightweight allocation. You really want to change the data
representation to avoid unnecessary boxing (e.g. never box or tag ints,
floats or tuples) in order to reduce the allocation rate and, consequently,
the stress on the garbage collector. However, OCaml cannot express value
types, and its ability to represent polymorphic recursion makes this
extremely difficult to implement.
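
To make the boxing and tagging cost concrete, here is a minimal C sketch of
the kind of uniform representation being discussed: an integer is stored
shifted left with its low bit set so that the GC can tell it apart from a
pointer, and a float lives in a heap block behind a pointer. The names
(value, tag_int, untag_int, add_tagged, box_float, unbox_float) are
hypothetical and malloc stands in for the runtime's allocator, so this
illustrates the idea and is not OCaml's actual runtime code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical uniform word: a tagged integer or a pointer to a heap block. */
typedef intptr_t value;

/* Low bit 1 marks an immediate integer; aligned pointers have low bit 0.
   One bit of integer range is lost to the tag. */
value tag_int(intptr_t n) { return (value)(((uintptr_t)n << 1) | 1u); }

/* Assumes >> on a negative signed value is an arithmetic shift, which is
   implementation-defined in C but holds on mainstream compilers. */
intptr_t untag_int(value v) { return v >> 1; }

int is_int(value v) { return (int)(v & 1); }

/* (2x+1) + (2y+1) - 1 = 2(x+y) + 1: the extra subtraction is the cost of
   keeping exactly one tag bit in the result. */
value add_tagged(value a, value b) { return a + b - 1; }

/* A boxed float: every float result costs one heap allocation. */
value box_float(double d) {
    double *block = malloc(sizeof *block);  /* stand-in for the GC allocator */
    assert(block != NULL);
    *block = d;
    return (value)block;
}

double unbox_float(value v) { return *(double *)v; }
```

With value types, the float could stay in a register or inside its
enclosing structure, and the allocation in box_float would disappear
altogether.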

As Sylvain has said, OC4MC is your best bet if you want to try to write a
new GC for OCaml. However, making more extensive changes has the potential
to address many more problems (e.g. convey run-time type information for
generic printing) so you might consider alternatives like trying to compile
OCaml's bytecode into HLVM's IR for JIT compilation because HLVM already has
a multicore-friendly GC and is much easier to develop.

> Ah, that's interesting. I wonder if it provides any real speedup on new
> architectures compared to storing the pointer in RAM.

For a multicore GC, using a register to refer to thread-local data is a huge
win because accessing thread-local data is very slow. I made that mistake in
HLVM's first multicore-capable GC and it was basically useless as a
consequence because all of the time was spent looking up thread-local data.
When I started passing the thread-local data around as an extra argument to
every function (not necessarily consuming a register all the time because
LLVM is free to spill it), performance improved enormously.
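
The two approaches can be sketched in C as follows. This is an illustration
under stated assumptions, not HLVM's code: thread_state, current_state,
alloc_via_tls, alloc_explicit and make_pair are hypothetical names, the
allocator is a bare bump pointer with no alignment handling or collection,
and C11 _Thread_local stands in for whatever thread-local mechanism a real
runtime would use. How expensive the lookup is depends on the platform and
on how thread-local storage is implemented there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread GC state: a bump-allocation region owned by
   exactly one thread. */
typedef struct {
    char  *next;    /* next free byte */
    char  *limit;   /* one past the end of the region */
    size_t allocs;  /* number of allocations served */
} thread_state;

/* Variant A: the state is found through thread-local storage each time. */
_Thread_local thread_state *current_state = NULL;

void *alloc_via_tls(size_t bytes) {
    thread_state *ts = current_state;  /* lookup repeated per allocation */
    if (ts == NULL || (size_t)(ts->limit - ts->next) < bytes) return NULL;
    void *p = ts->next;
    ts->next += bytes;
    ts->allocs++;
    return p;
}

/* Variant B: the caller passes the state in as an extra argument, so no
   lookup is needed inside the allocator. */
void *alloc_explicit(thread_state *ts, size_t bytes) {
    assert(ts != NULL);
    if ((size_t)(ts->limit - ts->next) < bytes) return NULL;
    void *p = ts->next;
    ts->next += bytes;
    ts->allocs++;
    return p;
}

/* Every function that may allocate takes the pointer and forwards it. */
void *make_pair(thread_state *ts) {
    return alloc_explicit(ts, 2 * sizeof(void *));
}
```

In variant B the pointer is an ordinary argument, so the compiler can keep
it in a register across calls or spill it to the stack as it sees fit.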