trouble with pthread_atfork on openBSD #7132

Closed
vicuna opened this issue Jan 29, 2016 · 6 comments

vicuna commented Jan 29, 2016

Original bug ID: 7132
Reporter: @damiendoligez
Assigned to: @mshinwell
Status: assigned (set by @mshinwell on 2016-12-08T09:49:26Z)
Resolution: open
Priority: normal
Severity: minor
OS: openBSD
Version: 4.03.0+dev / +beta1
Category: runtime system and C interface

Bug description

The testsuite hangs on OpenBSD. The culprit is tests/lib-systhreads/testfork.ml.

Preliminary investigation suggests that the process is looping within the C runtime's "exit" function, with a stack trace containing frames related to pthread_atfork.


vicuna commented Dec 8, 2016

Comment author: @mshinwell

I have asked someone with access to an OpenBSD system to look at this.


vicuna commented Mar 15, 2017

Comment author: @mshinwell

This one was a lot harder than expected.

There is a lock in the OpenBSD C runtime library that protects access to the list of handlers for pthread_atfork. This lock appears to be held in the child process after a fork. Since the test case ends up calling "exit" rather than "_exit" in the child process, the C runtime finalisation code is run: first for the statically-linked portion and then for the stublibs/dllthreads.so library. Since the latter registered pthread_atfork handlers, the finalisation code tries to take the lock to unregister them, but this deadlocks. (It takes the lock only because threads have been created in the child process after the fork; it avoids the lock if no threads have been created.) The result is that the main program exits, leaving the child process behind. For some reason that I don't understand, when the whole testsuite is run (not just "make one DIR=..."), this also causes make not to terminate.
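
As a rough illustration of how a lock can end up held in a child process, here is a generic sketch. It is not OpenBSD's libc code, and the names demo_lock and child_finds_lock_held are made up for this example. fork() copies the parent's memory, so a mutex that is locked at the moment of the fork is still locked in the child, and stays that way unless an atfork handler or the child itself deals with it.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock standing in for any library-internal lock. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Fork while demo_lock is held and ask the child whether it can take it.
   Returns 1 if the child found the lock held, 0 if the child could take
   it, and -1 on error. */
int child_finds_lock_held(void)
{
    int status;
    pid_t pid;

    if (pthread_mutex_lock(&demo_lock) != 0)
        return -1;

    pid = fork();
    if (pid == 0) {
        /* The child gets a copy of the locked mutex.  trylock reports
           the state without blocking; a plain lock would hang here. */
        _exit(pthread_mutex_trylock(&demo_lock) == 0 ? 0 : 1);
    }

    /* Parent: release the lock.  This does not affect the child's copy. */
    pthread_mutex_unlock(&demo_lock);

    if (pid == -1 || waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

On the systems where I would expect this to run (it assumes a default, non-recursive mutex), the function returns 1: the child cannot take the lock even though the parent released its own copy straight after the fork.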

I think this is arguably a bug in OpenBSD and will report it as such, although it could be argued (as the manual page for "fork" says) that we should use "_exit" rather than plain "exit" in the child. "_exit" does not do pthread_atfork unregistration and also has the good side effect of avoiding e.g. double execution of C atexit handlers (once in the parent and once in the child). We would need to expose "_exit" somehow at the OCaml level though.
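
To make the difference between "exit" and "_exit" in a forked child concrete, here is a minimal sketch. The names report_fd, report_handler and handler_runs_in_child are invented for this illustration. It registers an atexit handler, forks, and counts how many times the handler runs in the child, using a pipe to report back to the parent.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the reporting pipe; only ever set in the child. */
static int report_fd = -1;

/* atexit handler: writes one byte per invocation, in the child only. */
static void report_handler(void)
{
    if (report_fd >= 0) {
        ssize_t ignored = write(report_fd, "x", 1);
        (void) ignored;
    }
}

/* Fork a child that terminates with exit() (use_plain_exit != 0) or with
   _exit() (use_plain_exit == 0).  Returns how many times the atexit
   handler ran in the child, or -1 on error. */
int handler_runs_in_child(int use_plain_exit)
{
    static int registered = 0;
    int fds[2];
    int count = 0;
    char buf[8];
    ssize_t n;
    pid_t pid;

    if (!registered) {
        if (atexit(report_handler) != 0)
            return -1;
        registered = 1;
    }
    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        report_fd = fds[1];
        if (use_plain_exit)
            exit(0);    /* runs atexit handlers inherited from the parent */
        else
            _exit(0);   /* terminates without running them */
    }

    close(fds[1]);
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        count += (int) n;   /* one byte per handler invocation */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return count;
}
```

handler_runs_in_child(1) should report one run of the handler in the child, and handler_runs_in_child(0) none. The parent will run the same handler again at its own exit, which is the double execution mentioned above.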

The following testcase suffices to reproduce the problem: it should exit but doesn't. The child process gets stuck in _spinlock:

#0 _dl_find_symbol (name=0xc87fb10ec18 "sched_yield", this=0x7f7ffffbd498, flags=48, ref_sym=0xc87fb10e8b0, req_obj=0xc87ac29a200, pobj=0x7f7ffffbd490)
at /usr/src/libexec/ld.so/resolve.c:599
#1 0x00000c87a4b04a43 in _dl_bind (object=0xc87ac29a200, index=Variable "index" is not available.
) at /usr/src/libexec/ld.so/amd64/rtld_machine.c:363
#2 0x00000c87a4b00995 in _dl_bind_start () at /usr/src/libexec/ld.so/amd64/ldasm.S:152
#3 0x00000c87fb110a05 in _spinlock (lock=0xc87fb51743c) at /usr/src/lib/librthread/rthread.c:94
#4 0x00000c87a8f4daeb in *_libc___cxa_finalize (dso=0xc87b443dc78) at /usr/src/lib/libc/stdlib/atexit.c:190
#5 0x00000c87b403d895 in ?? () from ./example_dso.so
#6 0x00000c87b403d919 in _fini () from ./example_dso.so
#7 0x00000c85979008b0 in ?? () from /usr/local/ocaml/testsuite/tests/lib-systhreads/example
#8 0x00000c87a4b01067 in _dl_run_all_dtors () at /usr/src/libexec/ld.so/loader.c:134
#9 0x00000c87a4b064ae in _dl_unload_dlopen () at /usr/src/libexec/ld.so/library_subr.c:478
#10 0x00000c87a4b01ede in _dl_dtors () at /usr/src/libexec/ld.so/loader.c:156
#11 0x00000c87a8f4d997 in *_libc___cxa_finalize (dso=0x0) at /usr/src/lib/libc/stdlib/atexit.c:159
#12 0x00000c87a8ef597e in *_libc_exit (status=0) at /usr/src/lib/libc/stdlib/exit.c:57
#13 0x00000c8597900c7c in main () from /usr/local/ocaml/testsuite/tests/lib-systhreads/example

Makefile

all: example

example_dso.so: example_dso.c
	gcc -c -fPIC example_dso.c
	gcc -Wl,-soname,example_dso.so -o example_dso.so -shared example_dso.o

example: example_dso.so example.c
	gcc -o example -lpthread example.c

example_dso.c

#include <pthread.h>
#include <stdio.h>

static void at_fork(void)
{
}

void in_dso(void)
{
    pthread_atfork(NULL, NULL, &at_fork);
}

example.c

#include <sys/types.h>
#include <sys/wait.h>   /* for wait() */
#include <dlfcn.h>
#include <unistd.h>
#include <assert.h>
#include <stdlib.h>
#include <pthread.h>

static void* in_thread(void* arg)
{
    return arg;
}

int main(void)
{
    void* dso;
    void (*in_dso)(void);
    pthread_t thread;

    /* Load the DSO and let it register a pthread_atfork handler. */
    dso = dlopen("./example_dso.so", RTLD_NOW);
    assert(dso != NULL);

    in_dso = dlsym(dso, "in_dso");
    assert(in_dso != NULL);

    in_dso();

    /* Create and join a thread before forking. */
    if (pthread_create(&thread, NULL, &in_thread, NULL) != 0) {
        assert(0);
    }

    if (pthread_join(thread, NULL) != 0) {
        assert(0);
    }

    switch (fork()) {
    case 0:
        /* Child: create a thread after the fork, then exit normally. */
        if (pthread_create(&thread, NULL, &in_thread, NULL) != 0) {
            assert(0);
        }
        /* This "exit" triggers the problem */
        exit(0);

    case -1:
        assert(0);

    default:
        wait(NULL);  /* won't return */
    }

    return 0;
}

Just run "make" and "./example".


vicuna commented Mar 15, 2017

Comment author: @mshinwell

Arguments about whether this is a bug or not probably centre on whether there is a requirement for functions run in a child process that was forked "in the presence of threads" to be async signal safe. This example uses functions that are not async signal safe in the child (e.g. pthread_create, exit).

I have not managed to find (even in the POSIX standard) a precise definition of what forking in the presence of threads means. It seems clear that if you fork whilst there exists more than one thread for which pthread_join has not returned, then the child should use only async signal safe functions. However, what about the case where all previously-created threads have been joined? That is the case here, and it isn't clear to me why there should be any requirement for a child process to only use async signal safe functions in that scenario.
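
For comparison, here is a sketch of the uncontroversial case: forking while a second thread is definitely still alive, with the child restricting itself to write and _exit, both of which POSIX lists as async signal safe. The names block_on_pipe and fork_with_live_thread, and the exit code 42, are arbitrary choices for this illustration.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Worker that stays alive until the write end of its pipe is closed. */
static void *block_on_pipe(void *arg)
{
    int fd = *(int *) arg;
    char c;

    while (read(fd, &c, 1) > 0) {
        /* discard; we only wait for end-of-file */
    }
    return NULL;
}

/* Fork while a second thread is running.  The child calls only write()
   and _exit().  Returns the child's exit code, or -1 on error. */
int fork_with_live_thread(void)
{
    static const char msg[] = "child: alive\n";
    int gate[2];
    int status = 0;
    int waited = 0;
    pthread_t worker;
    pid_t pid;

    if (pipe(gate) != 0)
        return -1;
    if (pthread_create(&worker, NULL, block_on_pipe, &gate[0]) != 0) {
        close(gate[0]);
        close(gate[1]);
        return -1;
    }

    pid = fork();
    if (pid == 0) {
        /* Child: no malloc, no stdio, no pthread calls, no exit(). */
        ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
        (void) ignored;
        _exit(42);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid)
        waited = 1;

    /* Closing the write end gives the worker end-of-file so it returns. */
    close(gate[1]);
    pthread_join(worker, NULL);
    close(gate[0]);

    if (!waited || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The test case in this report differs in that no other thread exists at the time of the fork, which is exactly where the requirement becomes unclear.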


vicuna commented Mar 15, 2017

Comment author: @avsm

_exit is specified in POSIX.1-2001, so binding it as Unix._exit and not calling the atexit handlers twice seems like a portable solution to this problem.


github-actions bot commented May 9, 2020

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 9, 2020
@xavierleroy

Looks like the issue is still there in OpenBSD 6.6, the latest at the time of this writing.

That's not an OCaml issue, just an OpenBSD oddity. No action is needed on our side.
