trouble with pthread_atfork on openBSD #7132
Comments
Comment author: @mshinwell I have asked someone with access to an OpenBSD system to look at this.
Comment author: @mshinwell This one was a lot harder than expected.

There is a lock in the OpenBSD C runtime library which protects access to the list of handlers for pthread_atfork. This lock appears to be held in the child process after a fork. Since the test case ends up calling "exit" rather than "_exit" in the child process, the C runtime finalisation code is run: first for the statically-linked portion and then for the stublibs/dllthreads.so library. Since the latter registered pthread_atfork handlers, it tries to take the lock to unregister them, but this deadlocks. (It uses the lock because threads have been created in the child process after the fork; it avoids the lock if no threads have been created.)

The result is that the main program exits, leaving the child process behind. For some reason that I don't understand, when the whole testsuite is run (not just "make one DIR=..."), this also causes make not to terminate.

I think this is arguably a bug in OpenBSD and will report it as such, although it could be argued (as it says on the manual page for "fork") that we should use "_exit" rather than plain "exit". "_exit" does not do pthread_atfork unregistration and also has the good side-effect of avoiding e.g. double execution of C atexit handlers (once in the parent and once in the child). We would need to expose "_exit" somehow at the OCaml level though.

The following testcase suffices to reproduce the problem: it should exit but doesn't. The child process gets stuck in _spinlock:

#0 _dl_find_symbol (name=0xc87fb10ec18 "sched_yield", this=0x7f7ffffbd498, flags=48, ref_sym=0xc87fb10e8b0, req_obj=0xc87ac29a200, pobj=0x7f7ffffbd490)

Makefile

all: example
example_dso.so: example_dso.c
example: example_dso.so example.c
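For readers unfamiliar with the handlers discussed above, here is a minimal sketch of how pthread_atfork handlers behave around a fork. It is a single-file illustration only: it does not reproduce the OpenBSD deadlock (that needs handlers registered from a shared library and threads created in the child), and the helper name `fork_with_atfork_handlers` and the counters are hypothetical, not taken from the original testcase.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Counters bumped by the handlers. After fork() each process has its own copy. */
static int prepare_calls = 0;
static int parent_calls = 0;
static int child_calls = 0;

static void on_prepare(void) { prepare_calls++; } /* runs in the parent, before fork */
static void on_parent(void)  { parent_calls++; }  /* runs in the parent, after fork  */
static void on_child(void)   { child_calls++; }   /* runs in the child, after fork   */

/* Hypothetical helper: registers the handlers once, forks, and checks which
 * handlers ran in which process. Returns 0 if both processes saw the expected
 * calls, -1 otherwise. The child leaves via _exit, as fork(2) recommends. */
int fork_with_atfork_handlers(void)
{
    static int registered = 0;
    if (!registered) {
        if (pthread_atfork(on_prepare, on_parent, on_child) != 0)
            return -1;
        registered = 1;
    }
    prepare_calls = parent_calls = child_calls = 0;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: inherits the prepare count, ran the child handler,
         * never ran the parent handler. */
        int ok = (prepare_calls == 1 && child_calls == 1 && parent_calls == 0);
        _exit(ok ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* Parent: ran prepare and parent handlers, never the child handler. */
    return (prepare_calls == 1 && parent_calls == 1 && child_calls == 0) ? 0 : -1;
}
```

Because the child here leaves through `_exit`, the C runtime finalisation code is skipped in the child, which is the path the analysis above identifies as avoiding the lock.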
Comment author: @mshinwell Arguments about whether this is a bug or not probably centre on whether functions run in a child process that was forked "in the presence of threads" are required to be async-signal-safe. This example uses functions that are not async-signal-safe in the child (e.g. pthread_create, exit). I have not managed to find (even in the POSIX standard) a precise definition of what forking in the presence of threads means. It seems clear that if you fork while there is more than one thread for which pthread_join has not returned, then the child should use only async-signal-safe functions. But what about the case where all previously-created threads have been joined? That is the case here, and it isn't clear to me why a child process should be required to use only async-signal-safe functions in that scenario.
Comment author: @avsm _exit is specified in POSIX.1-2001, so binding it as Unix._exit and not calling the atexit handlers twice seems like a portable solution to this problem.
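The difference between "exit" and "_exit" in a forked child can be seen at the C level with a small sketch. The helper name `atexit_runs_in_child` and the pipe-based reporting are assumptions made for illustration; this is not the proposed Unix._exit binding itself, only the C behaviour such a binding would rely on.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the reporting pipe; stays -1 in the parent so that the
 * handler does nothing when the parent itself eventually exits. */
static int report_fd = -1;

/* atexit handler: reports that it ran by writing one byte to the pipe. */
static void report_handler(void)
{
    if (report_fd >= 0) {
        char byte = 'x';
        if (write(report_fd, &byte, 1) != 1) {
            /* Nothing sensible to do during process exit. */
        }
    }
}

/* Hypothetical helper: forks a child that terminates with exit() when
 * use_plain_exit is non-zero and with _exit() otherwise. Returns how many
 * times the atexit handler ran in the child, or -1 on error. */
int atexit_runs_in_child(int use_plain_exit)
{
    static int registered = 0;
    int fds[2];

    if (!registered) {
        if (atexit(report_handler) != 0)
            return -1;
        registered = 1;
    }
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        close(fds[0]);
        report_fd = fds[1];
        if (use_plain_exit)
            exit(0);   /* runs atexit handlers inherited from the parent */
        _exit(0);      /* skips atexit handlers and other finalisation */
    }

    /* Parent: count the bytes the child's handler wrote before it died. */
    close(fds[1]);
    int count = 0;
    char byte;
    while (read(fds[0], &byte, 1) == 1)
        count++;
    close(fds[0]);

    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    return count;
}
```

With plain "exit" the handler registered before the fork runs in the child as well as, later, in the parent; with "_exit" it runs only in the parent.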
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.
Looks like the issue is still there in OpenBSD 6.6, the latest at the time of this writing. That's not an OCaml issue, just an OpenBSD oddity. No action is needed on our side.
Original bug ID: 7132
Reporter: @damiendoligez
Assigned to: @mshinwell
Status: assigned (set by @mshinwell on 2016-12-08T09:49:26Z)
Resolution: open
Priority: normal
Severity: minor
OS: openBSD
Version: 4.03.0+dev / +beta1
Category: runtime system and C interface
Bug description
The testsuite hangs on OpenBSD. The culprit is tests/lib-systhreads/testfork.ml.
Preliminary investigations seem to indicate that the process is looping within the C runtime's "exit" function, with a stack trace that contains frames related to pthread_atfork.