Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006776OCamlruntime system and C interfacepublic2015-02-10 12:252017-02-16 15:14
Reporterbraibant 
Assigned To 
PriorityhighSeveritycrashReproducibilityalways
StatusclosedResolutionfixed 
PlatformOSOS Version
Product Version 
Target Version4.03.0+dev / +beta1Fixed in Version4.03.0+dev / +beta1 
Summary0006776: The runtime does not expose a way to reliably kill the "tick" thread, leading to segfault after exiting the runtime
DescriptionIn the wake of http://caml.inria.fr/mantis/view.php?id=6764 [^] I am looking at embedding the OCaml runtime in a shared library. This shared library might be loaded and unloaded several time by an application that makes heavy use of threads.

The code hosted here https://github.com/braibant/ocaml-pr6764 [^] (which was originally written by Mark Shinwell during the discussion of the aforementioned related issue) illustrates what happens when the shared object is loaded and unloaded several time in a row.

Debugging the issue with this example, it appear that the "tick" thread might still be alive after the dlclose function returns. The tick thread should have been cleaned up by the `caml_thread_cleanup` function, which is registered in the `at_exit` calls. However, the implementation of `st_thread_kill` in `otherlibs/systhreads/st_posix.c` is

```
static void st_thread_kill(st_thread_id thr)
{
  pthread_cancel(thr);
}
```

From http://man7.org/linux/man-pages/man3/pthread_cancel.3.html [^] the thread cancellation happens asynchronously, and "the return status of pthread_cancel() merely informs the caller whether the cancellation request was successfully queued."

A suitable fix would be to use `pthread_join` after the `pthread_cancel`. This would still not have the same behavior as the win32 implementation (which uses `TerminateThread` and thus cannot execute any user-mode code).

Another fix would be to replace the `st_thread_kill` in `caml_thread_cleanup` with something a bit more agressive about termination (a new `st_thread_real_kill`?)
Steps To Reproducegit clone git@github.com:braibant/ocaml-pr6764.git
cd ocaml-pr6764
make
make run
TagsNo tags attached.
Attached Filesdiff file icon systhread-tick-exit.diff [^] (3,930 bytes) 2015-03-31 22:48 [Show Content]

- Relationships
related to 0006764closedshinwell Registering threads created from C 

-  Notes
(0013276)
whitequark (developer)
2015-02-15 17:57

I do not think st_thread_kill is a proper solution. For instance, some libc's (e.g. Bionic) do not implement any way to kill a thread. Either way, using pthread_kill is a known antipattern.

Instead, the caml_thread_cleanup should set a flag to tell the tick thread to terminate and then join it.
(0013292)
braibant (reporter)
2015-02-16 09:46

@whitequark: Do we agree that there are two different issue? 1) whether `st_thread_kill` is a proper solution for killing the tick thread 2) whether the `st_thread_kill` function is valid implementation for killing a thread in general.

The current implementation does not use `pthread_kill`, but good to know that it is a known anti-pattern.

Do we agree that something like the following would work?

```
pthread_cancel(thr);
pthread_join(thr);
```
(0013293)
whitequark (developer)
2015-02-16 10:32

pthread_cancel has the exact same issues as pthread_kill. (I could not recall which exactly did st_thread_kill use).

From the Bionic docs:
>pthread_cancel() will not be supported in Bionic, because doing this would involve making the C library significantly bigger for very little benefit. [...] All of this is contrary to the Bionic design goals. If your code depends on thread cancellation, please consider alternatives.

So, no, that would not work even for tick thread.
(0013294)
whitequark (developer)
2015-02-16 12:01

Actually, why does the tick thread even exist? Its sole purpose is to send SIGVTALRM. Can't you use setitimer for that?
(0013527)
doligez (administrator)
2015-03-23 16:28

> Actually, why does the tick thread even exist?

Probably because Windows.

I agree that the proper way is to set a flag and then join. The only downside is that you might have to wait for the duration of a whole tick.
(0013528)
braibant (reporter)
2015-03-23 16:30

I think that I am okay with waiting a whole tick, if the alternative is a segfault :).
(0013591)
doligez (administrator)
2015-03-30 19:22

@braibant I'll do a fix soon, but I'd like to know if you would push for it to be included in 4.02.2 or if you're OK with waiting for 4.03.

It's not a completely trivial fix and it'll have to be tested on all architectures, so there is a risk here.
(0013606)
braibant (reporter)
2015-03-31 17:33

I will not push for it to be included in 4.02.2 per se, but would be *very* happy to test it as soon as possible.
(0013614)
doligez (administrator)
2015-03-31 22:47

Here is a patch. I applied it in trunk (rev 15975). I can't test your program just now because it doesn't work on 64-bit Linux.
(0014625)
xleroy (administrator)
2015-10-27 16:28

In the absence of new information, let's consider this problem resolved by commit 15975.

- Issue History
Date Modified Username Field Change
2015-02-10 12:25 braibant New Issue
2015-02-15 17:57 whitequark Note Added: 0013276
2015-02-16 09:46 braibant Note Added: 0013292
2015-02-16 10:32 whitequark Note Added: 0013293
2015-02-16 12:01 whitequark Note Added: 0013294
2015-02-18 23:52 doligez Status new => acknowledged
2015-02-18 23:52 doligez Target Version => 4.02.2+dev / +rc1
2015-03-23 16:28 doligez Note Added: 0013527
2015-03-23 16:30 braibant Note Added: 0013528
2015-03-30 19:22 doligez Note Added: 0013591
2015-03-31 17:33 braibant Note Added: 0013606
2015-03-31 22:47 doligez Note Added: 0013614
2015-03-31 22:48 doligez File Added: systhread-tick-exit.diff
2015-03-31 22:49 doligez Status acknowledged => feedback
2015-03-31 22:49 doligez Resolution open => fixed
2015-03-31 22:49 doligez Fixed in Version => 4.03.0+dev / +beta1
2015-04-28 17:36 doligez Target Version 4.02.2+dev / +rc1 => 4.03.0+dev / +beta1
2015-05-06 17:31 shinwell Relationship added related to 0006764
2015-10-27 16:28 xleroy Note Added: 0014625
2015-10-27 16:28 xleroy Status feedback => resolved
2017-02-16 15:14 xleroy Status resolved => closed
2017-02-23 16:43 doligez Category OCaml runtime system => runtime system
2017-03-03 17:45 doligez Category runtime system => runtime system and C interface


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker