Version française
Home     About     Download     Resources     Contact us    
Browse thread
The GC is not collecting... my mistake?
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Markus Mottl <markus.mottl@g...>
Subject: Re: [Caml-list] The GC is not collecting... my mistake?
On 11/6/07, Loup Vaillant <loup.vaillant@gmail.com> wrote:
> I thought the GC could collect the first values of my streams when the
> program don't need them any more, but it doesn't seem to be the case.
> Unfortunately, I was unable to reduce my problem to a proper minimum
> example, so I send it all.

Funny, my colleagues and I are also currently investigating a space
leak in OCaml.  Here is a short example:

file: foo.ml
---------------------------------------------------------------------------------------------
let alloc () = String.create 1, String.create 2
let alloc_loop () = for i = 1 to 1_000_000 do ignore (alloc ()) done
let print_len str = Printf.printf "%d\n%!" (String.length str)
let finaliser str = Printf.printf "finalized\n%!"

let main1 () =
  let a, b = alloc () in
  Gc.finalise finaliser a;
  print_len a;
  alloc_loop ();
  print_len b

let main2 () =
  let a, b = alloc () in
  Gc.finalise finaliser a;
  print_len a;
  let b_ref = ref b in
  alloc_loop ();
  print_len !b_ref

let () = if Sys.argv.(1) = "1" then main1 () else main2 ()
---------------------------------------------------------------------------------------------

If you compile this to native code, running "foo 1" will print "1"
followed by "2".  If you run "foo 2" it will print "1", then
"finalized", then "2".  Byte code will only print "1" and "2" in any
case - hm, weird.

Obviously, OCaml does not reclaim the tuple during the allocation loop
even though it could (and IMHO should).  This can introduce
substantial space leaks as happened to us.

It seems that OCaml keeps tuples around as long as you can still
access a binding that was created by matching the tuple.  Though
deferring access to tuple elements might slightly improve performance,
because there is no need to push things on the stack, etc., it can
make reasoning about space usage quite hard.

My colleagues and I agree that the current implementation violates
user expectations.  People will generally assume that heap objects
which are only referenced by a binding will go away as soon as they
cannot be accessed anymore.  The workaround as shown above in main2
(using intermediate references) is cumbersome and obviously doesn't
work with byte code anyway.

The remaining question is how the correct behavior could be
efficiently implemented.    I have no idea how the code generator and
runtime interact, but my guess is that some simple heuristics could be
used whether to defer access to tuple elements or not:

Tuple access can always be deferred as long as the tuple as a whole
could still be referenced in the same function (e.g. in the case of a
"let x, y, z as tpl" binding, where "tpl" may still be used).

If there is no function call (after inlining) or loop between the
creation of the bindings and their last use, then the access to the
tuple fields can also be deferred.  This would be beneficial e.g. in
the presence of branches that only use some of the bound variables,
but the user wants to create the match in one place only for
convenience.

Otherwise the tuple could be copied into the stack (or even
registers), and we use this transient object for access.  As soon as
some element cannot be accessed anymore, we simply overwrite the
corresponding slot on the stack (or the register) with an atomic value
(e.g. Val_unit) to destroy the reference to the object so that the GC
can reclaim it.

I hope this provides some food for thought.  Are there any plans to
fix this problem?

Best regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com