Memory leak in ocaml runtime with backtraces+threads #7220

vicuna · 2016-04-11T12:47:11Z

Original bug ID: 7220
Reporter: robhoes
Assigned to: @gasche
Status: closed (set by @xavierleroy on 2017-09-24T15:32:22Z)
Resolution: fixed
Priority: normal
Severity: minor
Platform: amd64
OS: Linux, Mac OS X
Version: 4.02.3
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: runtime system and C interface
Related to: #4702 #5188
Monitored by: lindig braibant @hcarty @dbuenzli

Bug description

An OCaml program with backtrace recording enabled leaks memory each time a new thread is created and completes.

Steps to reproduce

The following example program, compiled with the -g option, will very quickly eat up all your memory:

let _ =
Printexc.record_backtrace true;
let rec loop () =
let t = Thread.create (fun () ->
try
raise Not_found
with Not_found ->
()
) () in
Thread.join t;
loop ()
in
loop ()

Additional information

I’ve tried this on various compilers (4.02.3, 4.01.0 and 3.12.1), on Linux and Mac, native and bytecode, and see the problem on all of them.

Valgrind reports that the allocation of caml_backtrace_buffer in the function caml_stash_backtrace in asmrun/backtrace_prim.c is the culprit (https://github.com/ocaml/ocaml/blob/trunk/asmrun/backtrace_prim.c#L98):

if (caml_backtrace_buffer == NULL) {
Assert(caml_backtrace_pos == 0);
caml_backtrace_buffer = malloc(BACKTRACE_BUFFER_SIZE
* sizeof(backtrace_slot));
if (caml_backtrace_buffer == NULL) return;
}

The caml_backtrace_buffer variable is a global variable initialised to NULL in byterun/backtrace.c. However, it seems to be local to a thread (how does it become thread local?), is set to NULL whenever a thread is created (where?), and a new malloc happens when an exception is raised. It is not freed when the thread completes.

File attachments

PR7220-fix-backtrace-memory-leak-in-systhreads.patch

The text was updated successfully, but these errors were encountered:

vicuna · 2016-04-11T12:59:34Z

Comment author: @gasche

I'm tentatively setting the target for 4.03; we're very late in the release cycle, so if the bugfix turns out to be invasive we may have to delay it for the next release, but I hope there is a simple and safe solution.

vicuna · 2016-04-11T14:29:24Z

Comment author: bobot

how does it become thread local?
You can take a look at otherlibs/thread/scheduler.c function schedule_thread.

It is not free when the thread completes
It should be in the function thread_kill:

  if (th->backtrace_buffer != NULL) {
    free(th->backtrace_buffer);
    th->backtrace_buffer = NULL;
  }

The problem seems more complicated.

vicuna · 2016-04-11T14:35:49Z

Comment author: robhoes

Thanks, I was grepping for caml_backtrace_buffer and didn't notice the #define backtrace_buffer.

I suppose that the thread_kill function corresponds to the OCaml function Thread.kill, and is not called when the thread exits otherwise.

vicuna · 2016-04-11T14:46:24Z

Comment author: @gasche

I looked at the issue a bit as well. I don't know anything about threads, but my understanding is that the code in
otherlibs/threads/scheduler.c corresponds to vmthreads.cm(x)a
otherlibs/systhreads/st_stubs.c corresponds to threads.cm(x)a
in particular, the problem that is being observed here is in the interaction between backtrace_prim.c (in either byterun or asmrun) and systhreads/st_stubs.c, rather than scheduler.c -- but we should also fix scheduler.c.

vicuna · 2016-04-11T19:26:23Z

Comment author: @gasche

Here is my understanding of what is happening right now.

caml_thread_{enter,leave}_blocking_section follow a bracketing discipline where

thread_enter_blocking_section will save global resources into
a thread-specific storage space (the caml_thread_struct),
including caml_backtrace_buffer and related backtrace-related globals
thread_leave_blocking_section will restore the global resources
from the thread-specific storage

st_stubs.c also has a "master runtime lock" that protects the global
data of the scheduler (list of current threads, etc.). This lock must
be held at thread creation, deletion, scheduling time. It is also held
as long as a caml thread is still running, that is outside the
blocking section: it is acquired by
caml_thread_leave_blocking_section, and released by
caml_thread_enter_blocking_section.

I will call "thread-specific backtrace buffer" the value of the
thread-specific stored backtrace buffer, and "global backtrace buffer"
the global caml_backtrace_buffer value.

Upon creating a new thread, the thread-specific backtrace buffer is
set to NULL. Upon freeing a thread, it is freed if not NULL
anymore.

The memory leak happens as the following events happen in order
in your reproduction case, when a thread is created and then joined
from the "main thread":

thread creation: the new thread is created with its thread-specific
buffer set to NULL
thread join (into the scheduler): the main thread releases control, enters the
blocking section, storing the current global backtrace buffer as
its thread-specific buffer.
thread join (out of the scheduler): the new thread gains control,
leaves the blocking section, restoring its current thread-specific
buffer (NULL) into the global backtrace buffer.
An exception is raised with backtrace enabled; the runtime tries to
collect the backtrace into the global backtrace buffer, notices that it is NULL,
and malloc() a new backtrace buffer in it. The thread-specific buffer value is
not changed, it is still NULL.
When a thread terminate, caml_thread_stop() is called (see
caml_thread_start declaration for how this is
implemented). caml_thread_stop calls caml_thread_remove_info which
cleanups the thread-local structure; but the thread-local backtrace
buffer of the new thread is still NULL, so nothing happens. Control
is given back to the scheduler by releasing the master runtime
lock, without calling caml_thread_enter_blocking_section (this
would be invalid as the thread-specific storage does not exist, and
also we must leave in the same state we were when we started the
thread, and that is just after calling
caml_thread_leave_blocking_section from the main thread).

The memory leak happens between steps (3) and (4): a new backtrace
buffer has been allocated by the backtrace code, but the thread-local
structure does not know about it so it is not properly cleaned-up when
stopping the thread.

(Possible ideas on how to fix the leak in a separate message.)

vicuna · 2016-04-11T19:36:39Z

Comment author: @gasche

Possible fixes:

At the time of stopping a thread, store the current global
backtrace buffer in the thread-local buffer, so that it gets
cleaned up correctly by caml_thread_remove_info.

I'm not fond of this approach because this is a hardcoded fix for
a specific problem that doesn't feel right in the code, we may be
forgetting about other similar issues or other instances of the
problem.
Add a hook to the runtime so that when a new backtrace is
allocated, the thread-local storage is updated correspondingly.

This is an invasive change and it risks degrading the performance
of the single-threaded code, which we don't want.
Clarify the invariant between the thread-local storage and the
global runtime data, and restore it more often during thread
scheduling operations.

This seems more general than (1) but also possibly a source of
inefficiency for threaded code.
Always assume that threaded code may enable backtrace collection
and allocate an array of the right size instead of NULL at thread
creation time.

This makes thread creation more costly. It is also a rather
special-case change, similar to (1), so we may be missing other
issues.
Split caml_thread_{enter,leave}_blocking_section in the
local-store/global-restore logic and the runtime lock control
logic. At thread stopping time, call the local-store logic.

I think this is the most principled solution, and also a good one
in terms of performance (thread stopping becomes marginally slower,
but I don't think this is an issue).

vicuna · 2016-04-11T19:56:18Z

Comment author: @gasche

I just uploaded a patch implementing approach (5), that fixes the leak on the provided reproduction code on my machine. Interestingly, this allows to remove some lines that would save curr_thread->stack_low at the very same point of code for the exact same reason -- save the full thread-local state would have avoided the present bug:

https://github.com/ocaml/ocaml/blob/5401ce8/otherlibs/systhreads/st_stubs.c#L462-L466

vicuna · 2016-04-11T20:17:45Z

Comment author: @gasche

I think that the proposed patch fixes the issue as reported, but I wonder whether there aren't similar bug in other parts of the codebase:

there is a way to register and unregister threads from C code instead of the Thread module, does it suffer from the same leak?
does the -vmthread-compiled code (running thread/scheduler.c) suffer from the same leak?

For (1) I think that no change is required: when you unregister a thread from the C code, you must already be inside the blocking section, which means that enter_blocking_section should have been called since the last time OCaml code was running, so the thread-local data should be up-to-date with respect to the global runtime state.

For (2) I also think that no change is required. The thread_kill is the one that does cleanup the thread-local backtrace buffer; if the thread being killed is the current thread, it first activates another thread
https://github.com/ocaml/ocaml/blob/5401ce8/otherlibs/threads/scheduler.c#L741-L747
which means that the save-the-global-state logic is being run before the cleanup logic. (If the thread being killed is not the current thread, it has been interrupted first and so the global state was saved then.)

vicuna · 2016-04-11T20:25:38Z

Comment author: bobot

@gasche as usual well investigated and well explained!

Remarks on the patch:

nice refactoring
/* Restore the stack-related global variables */ this copy pasted comment seems out of date, or at least limiting.
+ curr_thread->backtrace_buffer = caml_backtrace_buffer; seems to be an investigation left over, it is already done in caml_thread_save_thread_state();., no?

FWIW I think it is small enough for 4.03.

vicuna · 2016-04-11T20:27:44Z

Comment author: braibant

Looking at

https://github.com/ocaml/ocaml/blob/5401ce8/otherlibs/systhreads/st_stubs.c#L362

don't you need to perform the cleaning of the other threads there too? That is, why isn't line 372 a call to caml_thread_remove_info?

vicuna · 2016-04-11T20:38:38Z

Comment author: @gasche

bobot: oops, indeed this is a leftover, I'll remove this line, thanks!

braibant: my understanding is that this function is only called after fork(), only in the newly forked (child) process. This means that there is no need (and it would be wrong) to reclaim and cleanup here: the threads may still be used in the parent process, and will be cleaned up there.

vicuna · 2016-04-12T09:30:26Z

Comment author: robhoes

Fantastic, thanks for the quick investigation and patch, gasche! This seems to work for me on my example program.

This issue turned out to be quite a big deal for XenServer. We are currently using OCaml 4.02.2, and your patch applies there as well. I am currently testing it to see if our memory leaks go away.

vicuna · 2016-04-12T14:39:12Z

Comment author: @damiendoligez

OK for 4.03.

vicuna · 2016-04-12T17:10:50Z

Comment author: robhoes

My testing of today suggests that this patch fixes the leak in the xenopsd daemon of XenServer as well. Thanks again!

vicuna · 2016-04-12T20:54:04Z

Comment author: @gasche

I committed the fix in 4.03:
bf775ed

(bobot: in the comment I renamed "stack-local variables" into "runtime state", and accordingly caml_thread_{save,restore}thread_state are now caml_thread{save,restore}_runtime_state)

Thinking more about braibant's remark, I'm not sure I understand myself why freeing the thread structure at the beginning of the forked child's life suffices: if the backtrace buffers are copied at fork time, then we should free the thread structure in depth (also free the backtrace buffers reachable from them) instead of shallowly as is done currently. However, I have tried to produce a reproduction case for a leak and I was not able to: the code at the end of this comment will leak without the uploaded patch, but does not leak anymore with the proposed patch.

I suspect that the reason why just a shallow free of the thread structure suffices for the child may be the copy-on-write implementation of fork on my system (and, I would assume, most fork-supporting systems): the thread-specific backtrace buffers would only actually be copied if they were modified in either the parent or the child's copy of the memory. This is impossible in the child, as they are discarded at child-birth time; in theory we could have a race where the parent process mutates its own copies of the backtrace buffer during the fork, turning the child's buffers into unreclaimed copy, but I don't know how to reproduce that.

Here is the code I tried (assuming that the child's copy of the parent's thread-local backtrace buffers would not be freed) and which does not leak on my system. If someone knows how to find a system where fork() performs a deep copy instead of copy-on-write, it would be interesting to try there.

let create_backtrace () = try raise Not_found with Not_found -> ()

let _ =
Printexc.record_backtrace true;
create_backtrace ();
let rec loop n =
if n = 0 then ()
else begin
Unix.sleepf 0.001;
match Unix.fork () with
| 0 ->
let threads =
Array.init 10_000
(fun _ -> Thread.create create_backtrace ()) in
Array.iter Thread.join threads;
loop (n - 1)
| pid -> exit 0
| exception _ -> exit 0
end
in
loop (try int_of_string (Sys.argv.(1)) with _ -> max_int)

vicuna · 2016-04-20T00:35:39Z

Comment author: junsli

Just curious: What does the "st" mean in the interface file? It mentioned "st interface." poSix Threads? Standard Threads?

vicuna · 2016-04-20T01:24:35Z

Comment author: @gasche

I would guess "system threads", by opposition with the green threads implementation where the scheduling is implemented by the OCaml runtime.

vicuna closed this as completed Sep 24, 2017

vicuna added the stdlib label Mar 14, 2019

vicuna added this to the 4.03.0 milestone Mar 14, 2019

vicuna assigned gasche Mar 14, 2019

This was referenced Apr 11, 2016

C-threads and callbacks #4702

Closed

double free corruption with bytecode system threads and stack reallocation #5188

Closed

vicuna added the bug label Mar 20, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory leak in ocaml runtime with backtraces+threads #7220

Memory leak in ocaml runtime with backtraces+threads #7220

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 12, 2016

vicuna commented Apr 12, 2016

vicuna commented Apr 12, 2016

vicuna commented Apr 12, 2016

vicuna commented Apr 20, 2016

vicuna commented Apr 20, 2016

Memory leak in ocaml runtime with backtraces+threads #7220

Memory leak in ocaml runtime with backtraces+threads #7220

Comments

vicuna commented Apr 11, 2016

Bug description

Steps to reproduce

Additional information

File attachments

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 11, 2016

vicuna commented Apr 12, 2016

vicuna commented Apr 12, 2016

vicuna commented Apr 12, 2016

vicuna commented Apr 12, 2016

vicuna commented Apr 20, 2016

vicuna commented Apr 20, 2016