Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0007220OCamlruntime system and C interfacepublic2016-04-11 14:472017-09-24 17:32
Reporterrobhoes 
Assigned Togasche 
PrioritynormalSeverityminorReproducibilityalways
StatusclosedResolutionfixed 
Platformamd64OSLinux, Mac OS XOS Version
Product Version4.02.3 
Target Version4.03.0+dev / +beta1Fixed in Version4.03.0+dev / +beta1 
Summary0007220: Memory leak in ocaml runtime with backtraces+threads
DescriptionAn OCaml program with backtrace recording enabled leaks memory each time a new thread is created and completes.
Steps To ReproduceThe following example program, compiled with the `-g` option, will very quickly eat up all your memory:

let _ =
 Printexc.record_backtrace true;
 let rec loop () =
   let t = Thread.create (fun () ->
     try
       raise Not_found
     with Not_found ->
       ()
   ) () in
   Thread.join t;
   loop ()
 in
 loop ()
Additional InformationI’ve tried this on various compilers (4.02.3, 4.01.0 and 3.12.1), on Linux and Mac, native and bytecode, and see the problem on all of them.

Valgrind reports that the allocation of `caml_backtrace_buffer` in the function `caml_stash_backtrace` in asmrun/backtrace_prim.c is the culprit (https://github.com/ocaml/ocaml/blob/trunk/asmrun/backtrace_prim.c#L98 [^]):

 if (caml_backtrace_buffer == NULL) {
   Assert(caml_backtrace_pos == 0);
   caml_backtrace_buffer = malloc(BACKTRACE_BUFFER_SIZE
                                  * sizeof(backtrace_slot));
   if (caml_backtrace_buffer == NULL) return;
 }

The `caml_backtrace_buffer` variable is a global variable initialised to NULL in byterun/backtrace.c. However, it seems to be local to a thread (how does it become thread local?), is set to NULL whenever a thread is created (where?), and a new malloc happens when an exception is raised. It is not freed when the thread completes.
TagsNo tags attached.
Attached Filespatch file icon PR7220-fix-backtrace-memory-leak-in-systhreads.patch [^] (7,810 bytes) 2016-04-11 22:42 [Show Content]

- Relationships
related to 0005188closedxleroy double free corruption with bytecode system threads and stack reallocation 
related to 0004702closedxleroy C-threads and callbacks 

-  Notes
(0015704)
gasche (administrator)
2016-04-11 14:59

I'm tentatively setting the target for 4.03; we're very late in the release cycle, so if the bugfix turns out to be invasive we may have to delay it for the next release, but I hope there is a simple and safe solution.
(0015705)
bobot (reporter)
2016-04-11 16:29

> how does it become thread local?
You can take a look at otherlibs/thread/scheduler.c function schedule_thread.

> It is not free when the thread completes
It should be in the function thread_kill:
```
  if (th->backtrace_buffer != NULL) {
    free(th->backtrace_buffer);
    th->backtrace_buffer = NULL;
  }
```

The problem seems more complicated.
(0015706)
robhoes (reporter)
2016-04-11 16:35

Thanks, I was grepping for `caml_backtrace_buffer` and didn't notice the `#define backtrace_buffer`.

I suppose that the `thread_kill` function corresponds to the OCaml function `Thread.kill`, and is not called when the thread exits otherwise.
(0015707)
gasche (administrator)
2016-04-11 16:46

I looked at the issue a bit as well. I don't know anything about threads, but my understanding is that the code in
  otherlibs/threads/scheduler.c corresponds to vmthreads.cm(x)a
  otherlibs/systhreads/st_stubs.c corresponds to threads.cm(x)a
in particular, the problem that is being observed here is in the interaction between backtrace_prim.c (in either byterun or asmrun) and systhreads/st_stubs.c, rather than scheduler.c -- but we should also fix scheduler.c.
(0015709)
gasche (administrator)
2016-04-11 21:26

Here is my understanding of what is happening right now.

caml_thread_{enter,leave}_blocking_section follow a bracketing discipline where
- thread_enter_blocking_section will save global resources into
  a thread-specific storage space (the caml_thread_struct),
  including caml_backtrace_buffer and related backtrace-related globals
- thread_leave_blocking_section will restore the global resources
  from the thread-specific storage

st_stubs.c also has a "master runtime lock" that protects the global
data of the scheduler (list of current threads, etc.). This lock must
be held at thread creation, deletion, scheduling time. It is also held
as long as a caml thread is still running, that is outside the
blocking section: it is acquired by
caml_thread_leave_blocking_section, and released by
caml_thread_enter_blocking_section.

I will call "thread-specific backtrace buffer" the value of the
thread-specific stored backtrace buffer, and "global backtrace buffer"
the global caml_backtrace_buffer value.

Upon creating a new thread, the thread-specific backtrace buffer is
set to NULL. Upon freeing a thread, it is freed if not NULL
anymore.

The memory leak happens as the following events happen in order
in your reproduction case, when a thread is created and then joined
from the "main thread":

0. thread creation: the new thread is created with its thread-specific
   buffer set to NULL

1. thread join (into the scheduler): the main thread releases control, enters the
   blocking section, storing the current global backtrace buffer as
   its thread-specific buffer.

2. thread join (out of the scheduler): the new thread gains control,
   leaves the blocking section, restoring its current thread-specific
   buffer (NULL) into the global backtrace buffer.

3. An exception is raised with backtrace enabled; the runtime tries to
   collect the backtrace into the global backtrace buffer, notices that it is NULL,
   and malloc() a new backtrace buffer in it. The thread-specific buffer value is
   *not* changed, it is still NULL.

4. When a thread terminate, caml_thread_stop() is called (see
   caml_thread_start declaration for how this is
   implemented). caml_thread_stop calls caml_thread_remove_info which
   cleanups the thread-local structure; but the thread-local backtrace
   buffer of the new thread is still NULL, so nothing happens. Control
   is given back to the scheduler by releasing the master runtime
   lock, *without* calling caml_thread_enter_blocking_section (this
   would be invalid as the thread-specific storage does not exist, and
   also we must leave in the same state we were when we started the
   thread, and that is just after calling
   caml_thread_leave_blocking_section from the main thread).

The memory leak happens between steps (3) and (4): a new backtrace
buffer has been allocated by the backtrace code, but the thread-local
structure does not know about it so it is not properly cleaned-up when
stopping the thread.

(Possible ideas on how to fix the leak in a separate message.)
(0015710)
gasche (administrator)
2016-04-11 21:36

Possible fixes:

1. At the time of stopping a thread, store the current global
   backtrace buffer in the thread-local buffer, so that it gets
   cleaned up correctly by caml_thread_remove_info.

   I'm not fond of this approach because this is a hardcoded fix for
   a specific problem that doesn't feel right in the code, we may be
   forgetting about other similar issues or other instances of the
   problem.

2. Add a hook to the runtime so that when a new backtrace is
   allocated, the thread-local storage is updated correspondingly.

   This is an invasive change and it risks degrading the performance
   of the single-threaded code, which we don't want.

3. Clarify the invariant between the thread-local storage and the
   global runtime data, and restore it more often during thread
   scheduling operations.
   
   This seems more general than (1) but also possibly a source of
   inefficiency for threaded code.

4. Always assume that threaded code may enable backtrace collection
   and allocate an array of the right size instead of NULL at thread
   creation time.

   This makes thread creation more costly. It is also a rather
   special-case change, similar to (1), so we may be missing other
   issues.

5. Split caml_thread_{enter,leave}_blocking_section in the
   local-store/global-restore logic and the runtime lock control
   logic. At thread stopping time, call the local-store logic.

   I think this is the most principled solution, and also a good one
   in terms of performance (thread stopping becomes marginally slower,
   but I don't think this is an issue).
(0015711)
gasche (administrator)
2016-04-11 21:56

I just uploaded a patch implementing approach (5), that fixes the leak on the provided reproduction code on my machine. Interestingly, this allows to remove some lines that would save curr_thread->stack_low at the very same point of code for the exact same reason -- save the full thread-local state would have avoided the present bug:

  https://github.com/ocaml/ocaml/blob/5401ce8/otherlibs/systhreads/st_stubs.c#L462-L466 [^]
(0015712)
gasche (administrator)
2016-04-11 22:17

I think that the proposed patch fixes the issue as reported, but I wonder whether there aren't similar bug in other parts of the codebase:

1. there is a way to register and unregister threads from C code instead of the Thread module, does it suffer from the same leak?

2. does the -vmthread-compiled code (running thread/scheduler.c) suffer from the same leak?

For (1) I think that no change is required: when you unregister a thread from the C code, you must already be inside the blocking section, which means that enter_blocking_section should have been called since the last time OCaml code was running, so the thread-local data should be up-to-date with respect to the global runtime state.

For (2) I also think that no change is required. The thread_kill is the one that does cleanup the thread-local backtrace buffer; if the thread being killed is the current thread, it first activates another thread
  https://github.com/ocaml/ocaml/blob/5401ce8/otherlibs/threads/scheduler.c#L741-L747 [^]
which means that the save-the-global-state logic is being run before the cleanup logic. (If the thread being killed is not the current thread, it has been interrupted first and so the global state was saved then.)
(0015713)
bobot (reporter)
2016-04-11 22:25

@gasche as usual well investigated and well explained!

Remarks on the patch:
 * nice refactoring
 * `/* Restore the stack-related global variables */` this copy pasted comment seems out of date, or at least limiting.
 * `+ curr_thread->backtrace_buffer = caml_backtrace_buffer;` seems to be an investigation left over, it is already done in `caml_thread_save_thread_state();`., no?

FWIW I think it is small enough for 4.03.
(0015714)
braibant (reporter)
2016-04-11 22:27
edited on: 2016-04-12 16:37

Looking at

https://github.com/ocaml/ocaml/blob/5401ce8/otherlibs/systhreads/st_stubs.c#L362 [^]

don't you need to perform the cleaning of the other threads there too? That is, why isn't line 372 a call to caml_thread_remove_info?

(0015718)
gasche (administrator)
2016-04-11 22:38
edited on: 2016-04-12 01:34

bobot: oops, indeed this is a leftover, I'll remove this line, thanks!

braibant: my understanding is that this function is only called after fork(), only in the newly forked (child) process. This means that there is no need (and it would be wrong) to reclaim and cleanup here: the threads may still be used in the parent process, and will be cleaned up there.

(0015719)
robhoes (reporter)
2016-04-12 11:30

Fantastic, thanks for the quick investigation and patch, gasche! This seems to work for me on my example program.

This issue turned out to be quite a big deal for XenServer. We are currently using OCaml 4.02.2, and your patch applies there as well. I am currently testing it to see if our memory leaks go away.
(0015721)
doligez (administrator)
2016-04-12 16:39

OK for 4.03.
(0015723)
robhoes (reporter)
2016-04-12 19:10

My testing of today suggests that this patch fixes the leak in the xenopsd daemon of XenServer as well. Thanks again!
(0015725)
gasche (administrator)
2016-04-12 22:54

I committed the fix in 4.03:
  https://github.com/ocaml/ocaml/commit/bf775edeea322de7b52e6c5142e58a7ddb64e585 [^]

(bobot: in the comment I renamed "stack-local variables" into "runtime state", and accordingly caml_thread_{save,restore}_thread_state are now caml_thread_{save,restore}_runtime_state)

Thinking more about braibant's remark, I'm not sure I understand myself why freeing the thread structure at the beginning of the forked child's life suffices: if the backtrace buffers are copied at fork time, then we should free the thread structure in depth (also free the backtrace buffers reachable from them) instead of shallowly as is done currently. However, I have tried to produce a reproduction case for a leak and I was not able to: the code at the end of this comment will leak without the uploaded patch, but does not leak anymore with the proposed patch.

I suspect that the reason why just a shallow free of the thread structure suffices for the child may be the copy-on-write implementation of fork on my system (and, I would assume, most fork-supporting systems): the thread-specific backtrace buffers would only actually be copied if they were modified in either the parent or the child's copy of the memory. This is impossible in the child, as they are discarded at child-birth time; in theory we could have a race where the parent process mutates its own copies of the backtrace buffer during the fork, turning the child's buffers into unreclaimed copy, but I don't know how to reproduce that.

Here is the code I tried (assuming that the child's copy of the parent's thread-local backtrace buffers would not be freed) and which does not leak on my system. If someone knows how to find a system where fork() performs a deep copy instead of copy-on-write, it would be interesting to try there.

  let create_backtrace () = try raise Not_found with Not_found -> ()
  
  let _ =
   Printexc.record_backtrace true;
   create_backtrace ();
   let rec loop n =
     if n = 0 then ()
     else begin
         Unix.sleepf 0.001;
         match Unix.fork () with
         | 0 ->
            let threads =
              Array.init 10_000
                (fun _ -> Thread.create create_backtrace ()) in
            Array.iter Thread.join threads;
            loop (n - 1)
         | pid -> exit 0
         | exception _ -> exit 0
     end
   in
   loop (try int_of_string (Sys.argv.(1)) with _ -> max_int)
(0015821)
junsli (reporter)
2016-04-20 02:35

Just curious: What does the "st" mean in the interface file? It mentioned "st interface." poSix Threads? Standard Threads?
(0015822)
gasche (administrator)
2016-04-20 03:24

I would guess "system threads", by opposition with the green threads implementation where the scheduling is implemented by the OCaml runtime.

- Issue History
Date Modified Username Field Change
2016-04-11 14:47 robhoes New Issue
2016-04-11 14:58 gasche Priority high => normal
2016-04-11 14:58 gasche Severity major => minor
2016-04-11 14:58 gasche Status new => acknowledged
2016-04-11 14:58 gasche Target Version => 4.03.0+dev / +beta1
2016-04-11 14:59 gasche Note Added: 0015704
2016-04-11 16:29 bobot Note Added: 0015705
2016-04-11 16:35 robhoes Note Added: 0015706
2016-04-11 16:46 gasche Note Added: 0015707
2016-04-11 21:26 gasche Note Added: 0015709
2016-04-11 21:36 gasche Note Added: 0015710
2016-04-11 21:54 gasche File Added: PR7220-fix-backtrace-memory-leak-in-systhreads.patch
2016-04-11 21:56 gasche Note Added: 0015711
2016-04-11 21:57 gasche Relationship added related to 0005188
2016-04-11 22:17 gasche Note Added: 0015712
2016-04-11 22:18 gasche Relationship added related to 0004702
2016-04-11 22:25 bobot Note Added: 0015713
2016-04-11 22:27 braibant Note Added: 0015714
2016-04-11 22:38 gasche Note Added: 0015718
2016-04-11 22:40 gasche File Deleted: PR7220-fix-backtrace-memory-leak-in-systhreads.patch
2016-04-11 22:40 gasche File Added: PR7220-fix-backtrace-memory-leak-in-systhreads.patch
2016-04-11 22:41 gasche File Deleted: PR7220-fix-backtrace-memory-leak-in-systhreads.patch
2016-04-11 22:41 gasche File Added: PR7220-fix-backtrace-memory-leak-in-systhreads.patch
2016-04-11 22:41 gasche File Deleted: PR7220-fix-backtrace-memory-leak-in-systhreads.patch
2016-04-11 22:42 gasche File Added: PR7220-fix-backtrace-memory-leak-in-systhreads.patch
2016-04-12 01:34 gasche Note Edited: 0015718 View Revisions
2016-04-12 11:30 robhoes Note Added: 0015719
2016-04-12 16:37 doligez Note Edited: 0015714 View Revisions
2016-04-12 16:39 doligez Note Added: 0015721
2016-04-12 19:10 robhoes Note Added: 0015723
2016-04-12 22:54 gasche Note Added: 0015725
2016-04-12 23:41 gasche Status acknowledged => resolved
2016-04-12 23:41 gasche Fixed in Version => 4.03.0+dev / +beta1
2016-04-12 23:41 gasche Resolution open => fixed
2016-04-12 23:41 gasche Assigned To => gasche
2016-04-20 02:35 junsli Note Added: 0015821
2016-04-20 03:24 gasche Note Added: 0015822
2017-02-23 16:43 doligez Category OCaml runtime system => runtime system
2017-03-03 17:45 doligez Category runtime system => runtime system and C interface
2017-09-24 17:32 xleroy Status resolved => closed


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker