Re: Final_tag and garbage collection

Damien Doligez (Damien.Doligez@inria.fr)
Wed, 8 Nov 95 14:56:10 +0100

Date: Wed, 8 Nov 95 14:56:10 +0100
From: Damien.Doligez@inria.fr (Damien Doligez)
Message-Id: <9511081356.AA14159@couchey.inria.fr>
To: caml-list@pauillac.inria.fr
Subject: Re: Final_tag and garbage collection

This is what I replied to Robbert; I forgot to Cc the list. It may
be of interest to a few other people.

-- Damien

-----
From doligez Tue Nov 7 14:25:42 1995
To: rvr@cs.cornell.edu
Subject: Re: Final_tag and garbage collection

You're pointing to a lack of documentation here. It's perfectly
normal that objects allocated in the minor heap don't get finalized:
the minor GC is a copying GC, which only visits live objects, so it
never examines dead ones.

The runtime of Caml Light version 0.7 has a function "alloc_final"
that you should use to allocate finalized objects. It calls
alloc_shr, which allocates directly in the major heap.

The reasoning behind this design is that it would be very costly to
have the minor GC examine all dead objects to find the finalized
ones. Moreover, calling the finalization function is itself moderately
costly, so the additional cost of garbage collection (caused by
bypassing the minor heap) is small by comparison.

Please tell us if this is not the case in your application.

-- Damien
-----

[Robbert then told me that the GC is not finalizing his objects fast
enough, and asked what the parameters "mem" and "max" are used for in
alloc_final]

-----
From doligez Wed Nov 8 14:49:10 1995
To: rvr@CS.Cornell.EDU
Subject: Re: Final_tag and garbage collection

>Is it that alloc_shr blocks are allocated
>in the major heap, and it will just take a long time before I see the
>finalization of the alloc()'ed objects?

Yes. The problem here is that the major GC speed (relative to the
program) is set by measuring the allocation rate of the program. This
means that if your finalized objects are small, there may be a lot of
them in the state of "floating garbage" (unreferenced objects that the
GC has not yet collected), and together these objects may hold on to
huge amounts of out-of-heap resources.

There is a workaround.

>I found alloc_final, but besides the size and a finalization function it
>also takes two arguments mem and max. What are those for?

This is the workaround. When you allocate a finalized object, you
specify the amount of out-of-heap resources it consumes, so the GC
knows it must work faster to reclaim these resources, instead of
pacing itself only on the in-heap size of the object. These resources
may be malloc'ed memory, file descriptors, or whatever. You give the
amount of resources allocated to this object as parameter "mem", and
the maximum amount that you want to see in floating garbage as
parameter "max". The units are whatever you want (they are only used
relative to each other by the GC).
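
To make the role of "mem" and "max" concrete, here is a toy model in C.
It is not the runtime's code: the type gc_model and both functions are
hypothetical names, and the rule "add mem/max to a counter and ask for
extra GC work once it reaches 1" is only an assumption about how such
accounting could work. It does show why the units cancel out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only, NOT the Caml Light runtime. All names are hypothetical. */
typedef struct {
    double pressure;      /* sum of mem/max ratios since the last forced step */
    unsigned long forced; /* how many times the model asked for extra GC work */
} gc_model;

/* Record one finalized allocation that holds `mem` units of an out-of-heap
   resource, with at most `max` units tolerated in floating garbage.
   Returns 1 when the accumulated ratio reaches the budget, 0 otherwise. */
static int model_alloc_final(gc_model *gc, size_t mem, size_t max)
{
    assert(max > 0);
    gc->pressure += (double)mem / (double)max;
    if (gc->pressure >= 1.0) {
        gc->pressure = 0.0; /* assume the extra GC work frees the resources */
        gc->forced++;
        return 1;
    }
    return 0;
}

/* Count how many identical allocations it takes to exhaust the budget. */
static unsigned long allocs_until_forced(size_t mem, size_t max)
{
    gc_model gc = {0.0, 0};
    unsigned long n = 0;
    assert(mem > 0);
    do {
        n++;
    } while (!model_alloc_final(&gc, mem, max));
    return n;
}
```

In this model allocs_until_forced(1, 4) and allocs_until_forced(1024, 4096)
both return 4: only the ratio of mem to max matters, not the unit.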

If you allocate several kinds of out-of-heap resources, you can use
different units in different calls to alloc_final. This will make the
GC work faster than necessary (using more CPU time and less memory);
hopefully this will never become a problem.
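
A small worked example of why mixed units make the GC work faster than
necessary. This is a sketch under an assumption, not runtime code: it
supposes that the mem/max fractions of all floating objects add up in a
single counter, whatever their unit. The names resource_kind and
combined_pressure are hypothetical.

```c
#include <assert.h>

/* Hypothetical bookkeeping for illustration, not part of the runtime. */
typedef struct {
    double mem; /* resources held by one object, in this kind's own unit */
    double max; /* tolerated floating garbage, in the same unit */
} resource_kind;

/* Sum the mem/max fractions contributed by counts[i] unreferenced objects
   of each kind. A result of 1.0 means the combined budget is used up. */
static double combined_pressure(const resource_kind *kinds,
                                const unsigned *counts, int n)
{
    double p = 0.0;
    for (int i = 0; i < n; i++) {
        assert(kinds[i].max > 0.0);
        p += counts[i] * kinds[i].mem / kinds[i].max;
    }
    return p;
}
```

With one kind counted in file descriptors (mem 1, max 64) and another in
bytes (mem 4096, max 1048576), 32 floating objects of the first kind plus
128 of the second already give a combined value of 1.0, although each
resource taken alone is only at half of its own "max".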

I really need to document this. Tell me if the above description is
clear enough.

-- Damien

P.S. It may be the case that I'm off by a factor of two and the
maximum amount of resources in floating garbage is twice "max".
The calculations are tricky and I don't remember well.