|Anonymous | Login | Signup for a new account||2019-02-24 02:46 CET|
|Main | My View | View Issues | Change Log | Roadmap|
|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0006776||OCaml||runtime system and C interface||public||2015-02-10 12:25||2017-02-16 15:14|
|Target Version||4.03.0+dev / +beta1||Fixed in Version||4.03.0+dev / +beta1|
|Summary||0006776: The runtime does not expose a way to reliably kill the "tick" thread, leading to segfault after exiting the runtime|
|Description||In the wake of http://caml.inria.fr/mantis/view.php?id=6764 [^] I am looking at embedding the OCaml runtime in a shared library. This shared library might be loaded and unloaded several time by an application that makes heavy use of threads.|
The code hosted here https://github.com/braibant/ocaml-pr6764 [^] (which was originally written by Mark Shinwell during the discussion of the aforementioned related issue) illustrates what happens when the shared object is loaded and unloaded several time in a row.
Debugging the issue with this example, it appear that the "tick" thread might still be alive after the dlclose function returns. The tick thread should have been cleaned up by the `caml_thread_cleanup` function, which is registered in the `at_exit` calls. However, the implementation of `st_thread_kill` in `otherlibs/systhreads/st_posix.c` is
static void st_thread_kill(st_thread_id thr)
From http://man7.org/linux/man-pages/man3/pthread_cancel.3.html [^] the thread cancellation happens asynchronously, and "the return status of pthread_cancel() merely informs the caller whether the cancellation request was successfully queued."
A suitable fix would be to use `pthread_join` after the `pthread_cancel`. This would still not have the same behavior as the win32 implementation (which uses `TerminateThread` and thus cannot execute any user-mode code).
Another fix would be to replace the `st_thread_kill` in `caml_thread_cleanup` with something a bit more agressive about termination (a new `st_thread_real_kill`?)
|Steps To Reproduce||git clone email@example.com:braibant/ocaml-pr6764.git|
|Tags||No tags attached.|
|Attached Files||systhread-tick-exit.diff [^] (3,930 bytes) 2015-03-31 22:48 [Show Content]|
I do not think st_thread_kill is a proper solution. For instance, some libc's (e.g. Bionic) do not implement any way to kill a thread. Either way, using pthread_kill is a known antipattern.
Instead, the caml_thread_cleanup should set a flag to tell the tick thread to terminate and then join it.
@whitequark: Do we agree that there are two different issue? 1) whether `st_thread_kill` is a proper solution for killing the tick thread 2) whether the `st_thread_kill` function is valid implementation for killing a thread in general.
The current implementation does not use `pthread_kill`, but good to know that it is a known anti-pattern.
Do we agree that something like the following would work?
pthread_cancel has the exact same issues as pthread_kill. (I could not recall which exactly did st_thread_kill use).
From the Bionic docs:
>pthread_cancel() will not be supported in Bionic, because doing this would involve making the C library significantly bigger for very little benefit. [...] All of this is contrary to the Bionic design goals. If your code depends on thread cancellation, please consider alternatives.
So, no, that would not work even for tick thread.
|Actually, why does the tick thread even exist? Its sole purpose is to send SIGVTALRM. Can't you use setitimer for that?|
> Actually, why does the tick thread even exist?
Probably because Windows.
I agree that the proper way is to set a flag and then join. The only downside is that you might have to wait for the duration of a whole tick.
|I think that I am okay with waiting a whole tick, if the alternative is a segfault :).|
@braibant I'll do a fix soon, but I'd like to know if you would push for it to be included in 4.02.2 or if you're OK with waiting for 4.03.
It's not a completely trivial fix and it'll have to be tested on all architectures, so there is a risk here.
|I will not push for it to be included in 4.02.2 per se, but would be *very* happy to test it as soon as possible.|
|Here is a patch. I applied it in trunk (rev 15975). I can't test your program just now because it doesn't work on 64-bit Linux.|
|In the absence of new information, let's consider this problem resolved by commit 15975.|
|2015-02-10 12:25||braibant||New Issue|
|2015-02-15 17:57||whitequark||Note Added: 0013276|
|2015-02-16 09:46||braibant||Note Added: 0013292|
|2015-02-16 10:32||whitequark||Note Added: 0013293|
|2015-02-16 12:01||whitequark||Note Added: 0013294|
|2015-02-18 23:52||doligez||Status||new => acknowledged|
|2015-02-18 23:52||doligez||Target Version||=> 4.02.2+dev / +rc1|
|2015-03-23 16:28||doligez||Note Added: 0013527|
|2015-03-23 16:30||braibant||Note Added: 0013528|
|2015-03-30 19:22||doligez||Note Added: 0013591|
|2015-03-31 17:33||braibant||Note Added: 0013606|
|2015-03-31 22:47||doligez||Note Added: 0013614|
|2015-03-31 22:48||doligez||File Added: systhread-tick-exit.diff|
|2015-03-31 22:49||doligez||Status||acknowledged => feedback|
|2015-03-31 22:49||doligez||Resolution||open => fixed|
|2015-03-31 22:49||doligez||Fixed in Version||=> 4.03.0+dev / +beta1|
|2015-04-28 17:36||doligez||Target Version||4.02.2+dev / +rc1 => 4.03.0+dev / +beta1|
|2015-05-06 17:31||shinwell||Relationship added||related to 0006764|
|2015-10-27 16:28||xleroy||Note Added: 0014625|
|2015-10-27 16:28||xleroy||Status||feedback => resolved|
|2017-02-16 15:14||xleroy||Status||resolved => closed|
|2017-02-23 16:43||doligez||Category||OCaml runtime system => runtime system|
|2017-03-03 17:45||doligez||Category||runtime system => runtime system and C interface|
|Copyright © 2000 - 2011 MantisBT Group|