Version française
Home     About     Download     Resources     Contact us    
Browse thread
Re: [Caml-list] How does OCaml update references when values are moved by the GC?
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Michael Ekstrand <michael@e...>
Subject: Re: [Caml-list] How does OCaml update references when values are moved by the GC?
On 10/30/2010 12:15 PM, Jon Harrop wrote:
> I was hoping for a little more detail, of course. :-)
> 
> How is the mapping from old to new pointers stored? Does the GC rewrite all
> of the thread-local stacks in series before allowing any of them to
> continue?

I imagine so.

> Does the write barrier record pointers written into the major heap
> so only those specific locations are rewritten at minor heap collections but
> the entire major heap is rewritten upon compaction?

Yes.  The runtime maintains a 'remembered set', a list of pointers from
the major heap back to the minor heap.  Maintaining this set is why
mutable data can be expensive in OCaml - any time you store a pointer
into a mutable field, the runtime must check whether the new link is
from the major to the minor heap and update the refs list accordingly.
Richard WM Jones has details here:

http://rwmj.wordpress.com/2009/08/08/ocaml-internals-part-5-garbage-collection/

> Can the GC distinguish
> between an array of ints and an array of pointers at run-time in order to
> avoid traversing all of the ints when trying to rewrite pointers?

Not that I know of.  The tag block does not have a documented reserved
value to indicate that - there are values to indicate an unboxed float
array, a string, and an array of opaque values, but not an integer array
(unless the opaque value flag is set for integer arrays).

> Also, any idea what the maximum proportion of the running time of a program
> is spent doing this rewriting? For example, if you fill a float->float hash
> table with values does updating the pointers in the major heap account for a
> significant proportion of the total running time of the program?

In my data analysis jobs (which wind up allocating quite large heaps),
the compactor almost never (detectably) runs.  Minor cycles and major
slices are a much larger concern in my experience.  I work around that
by increasing the minor heap size to decrease minor heap thrashing (my
general rule of thumb is that I want each "work unit", whatever that may
be, to fit in the minor heap).

It could well be that other applications will have different
characteristics that trigger more compactions.  I cannot speak for those
applications.  Further, when I have huge floating-point data structures,
I'm usually using bigarrays (not because I choose them over arrays,
typically, but because such code in my work frequently has to interact
with BLAS or SPARSKIT at some point).

- Michael