Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The runtime does not expose a way to reliably kill the "tick" thread, leading to segfault after exiting the runtime #6776

Closed
vicuna opened this issue Feb 10, 2015 · 10 comments

Comments

@vicuna
Copy link

vicuna commented Feb 10, 2015

Original bug ID: 6776
Reporter: braibant
Status: closed (set by @xavierleroy on 2017-02-16T14:14:44Z)
Resolution: fixed
Priority: high
Severity: crash
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: runtime system and C interface
Related to: #6764
Monitored by: @whitequark

Bug description

In the wake of #6764 I am looking at embedding the OCaml runtime in a shared library. This shared library might be loaded and unloaded several time by an application that makes heavy use of threads.

The code hosted here https://github.com/braibant/ocaml-pr6764 (which was originally written by Mark Shinwell during the discussion of the aforementioned related issue) illustrates what happens when the shared object is loaded and unloaded several time in a row.

Debugging the issue with this example, it appear that the "tick" thread might still be alive after the dlclose function returns. The tick thread should have been cleaned up by the caml_thread_cleanup function, which is registered in the at_exit calls. However, the implementation of st_thread_kill in otherlibs/systhreads/st_posix.c is

static void st_thread_kill(st_thread_id thr)
{
  pthread_cancel(thr);
}

From http://man7.org/linux/man-pages/man3/pthread_cancel.3.html the thread cancellation happens asynchronously, and "the return status of pthread_cancel() merely informs the caller whether the cancellation request was successfully queued."

A suitable fix would be to use pthread_join after the pthread_cancel. This would still not have the same behavior as the win32 implementation (which uses TerminateThread and thus cannot execute any user-mode code).

Another fix would be to replace the st_thread_kill in caml_thread_cleanup with something a bit more agressive about termination (a new st_thread_real_kill?)

Steps to reproduce

git clone git@github.com:braibant/ocaml-pr6764.git
cd ocaml-pr6764
make
make run

File attachments

@vicuna
Copy link
Author

vicuna commented Feb 15, 2015

Comment author: @whitequark

I do not think st_thread_kill is a proper solution. For instance, some libc's (e.g. Bionic) do not implement any way to kill a thread. Either way, using pthread_kill is a known antipattern.

Instead, the caml_thread_cleanup should set a flag to tell the tick thread to terminate and then join it.

@vicuna
Copy link
Author

vicuna commented Feb 16, 2015

Comment author: braibant

@whitequark: Do we agree that there are two different issue? 1) whether st_thread_kill is a proper solution for killing the tick thread 2) whether the st_thread_kill function is valid implementation for killing a thread in general.

The current implementation does not use pthread_kill, but good to know that it is a known anti-pattern.

Do we agree that something like the following would work?

pthread_cancel(thr);
pthread_join(thr);

@vicuna
Copy link
Author

vicuna commented Feb 16, 2015

Comment author: @whitequark

pthread_cancel has the exact same issues as pthread_kill. (I could not recall which exactly did st_thread_kill use).

From the Bionic docs:

pthread_cancel() will not be supported in Bionic, because doing this would involve making the C library significantly bigger for very little benefit. [...] All of this is contrary to the Bionic design goals. If your code depends on thread cancellation, please consider alternatives.

So, no, that would not work even for tick thread.

@vicuna
Copy link
Author

vicuna commented Feb 16, 2015

Comment author: @whitequark

Actually, why does the tick thread even exist? Its sole purpose is to send SIGVTALRM. Can't you use setitimer for that?

@vicuna
Copy link
Author

vicuna commented Mar 23, 2015

Comment author: @damiendoligez

Actually, why does the tick thread even exist?

Probably because Windows.

I agree that the proper way is to set a flag and then join. The only downside is that you might have to wait for the duration of a whole tick.

@vicuna
Copy link
Author

vicuna commented Mar 23, 2015

Comment author: braibant

I think that I am okay with waiting a whole tick, if the alternative is a segfault :).

@vicuna
Copy link
Author

vicuna commented Mar 30, 2015

Comment author: @damiendoligez

@braibant I'll do a fix soon, but I'd like to know if you would push for it to be included in 4.02.2 or if you're OK with waiting for 4.03.

It's not a completely trivial fix and it'll have to be tested on all architectures, so there is a risk here.

@vicuna
Copy link
Author

vicuna commented Mar 31, 2015

Comment author: braibant

I will not push for it to be included in 4.02.2 per se, but would be very happy to test it as soon as possible.

@vicuna
Copy link
Author

vicuna commented Mar 31, 2015

Comment author: @damiendoligez

Here is a patch. I applied it in trunk (rev 15975). I can't test your program just now because it doesn't work on 64-bit Linux.

@vicuna
Copy link
Author

vicuna commented Oct 27, 2015

Comment author: @xavierleroy

In the absence of new information, let's consider this problem resolved by commit 15975.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant