Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Registering threads created from C #6764

Closed
vicuna opened this issue Jan 29, 2015 · 21 comments
Closed

Registering threads created from C #6764

vicuna opened this issue Jan 29, 2015 · 21 comments
Assignees
Milestone

Comments

@vicuna
Copy link

vicuna commented Jan 29, 2015

Original bug ID: 6764
Reporter: braibant
Assigned to: @mshinwell
Status: closed (set by @mshinwell on 2015-05-06T15:31:20Z)
Resolution: not a bug
Priority: normal
Severity: text
Version: 4.02.1
Target version: 4.02.2+dev / +rc1
Category: documentation
Related to: #4702 #6776
Monitored by: @gasche @yallop

Bug description

Looking at section 19.10.1 "Registering threads created from C" in the manual, I am slightly puzzled by the documentation. I am not quite sure if that means what it means in the context of an OCaml library that's called from a third party C program using dynamic linking.

Obviously, there is no way to register all the threads that this third party application is going to create. Does that mean that it is not possible to use this kind of mix of OCaml/C? This does not seem to be the case reading the next paragraph in the manual, so I am a bit lost here.

@vicuna
Copy link
Author

vicuna commented Jan 29, 2015

Comment author: @mshinwell

I think it is the case that you only need to use [caml_c_thread_register] from a particular thread if:

(a) the OCaml code is compiled to use threads; and
(b) the C code causes OCaml code to be called, or the OCaml runtime to be called (without actually running any OCaml code), from that thread.

If the OCaml code isn't compiled to use threads then you obviously also need to ensure that your C program does not attempt to execute OCaml code in a re-entrant or parallel manner.

I presume you have access to the third-party application's code, since you can't directly call an arbitrary OCaml function from a C program, owing to the difference in calling conventions. As such, I presume there is either a modification to the third-party program or some intervening veneer; in either case, you could call the thread registration function before the OCaml callback invocation, I suspect. In case it is useful: you can call [caml_c_thread_register] more than once from the same thread without trouble, and I think there's little performance penalty to doing so.

@vicuna
Copy link
Author

vicuna commented Jan 29, 2015

Comment author: braibant

Just to clarify: I do not have access to the third party application. I am building a shared library that implements a given interface. The app loads the dynamic library and perform lookup for known symbols. There is indeed a shim around the OCaml code, implemented using inverted stubs from Ctypes.

I will hack the ctypes generated code to add the correct calls to the thread registration function before the OCaml callback invocation. (Btwm I indeed noticed that the [caml_c_thread_register] function can be called more than once, but was a bit afraid of the runtime cost. Since the need for that function was not obvious the way I understood the documentation, I argued against adding these calls everywhere.)

@vicuna
Copy link
Author

vicuna commented Jan 29, 2015

Comment author: @mshinwell

Yeah, that solution sounds correct. I wouldn't worry about the runtime cost of [caml_c_thread_register], at least in the first instance.

@vicuna
Copy link
Author

vicuna commented Feb 2, 2015

Comment author: braibant

Ok, I think I need to increase the severity of this issue:

I have patched ctypes to wrap calls to [register] and [unregister] around the calls to OCaml callbacks. The code that is generated is similar to the following gist.

https://gist.github.com/braibant/827b3b2158f94f461847

Running the third party application with this code yields a segfault. The backtrace is the following one

(gdb) bt
#0  0x00007fffeda42f62 in caml_thread_remove_info (th=th@entry=0xbc7650) at st_stubs.c:350
#1  0x00007fffeda43c60 in caml_c_thread_unregister () at st_stubs.c:595
#2  0x00007fffeda497cc in C_GetFunctionList (x102=0x7fffffffcfe0) at pkcs11-rev/pkcs11_rev_stubs.c:195

The content of th is as follows

(gdb) frame 0
#0  0x00007fffeda42f62 in caml_thread_remove_info (th=th@entry=0xbc7650) at st_stubs.c:350
350	  th->prev->next = th->next;

(gdb) print th
$3 = (caml_thread_t) 0xbc7650
(gdb) print th->next
$4 = (struct caml_thread_struct *) 0xbc7a50
(gdb) print th->prev
$5 = (struct caml_thread_struct *) 0x0

I think this issue is somewhat similar to the one described in #4702

Namely:

  1. A C process creates a separate thread, loads Ocaml DLL and initializes Ocaml runtime.
  2. The C process performs a call to C_GetFunctionList which is a C function that wraps a call to an OCaml callback with [register] and [unregister].

@vicuna
Copy link
Author

vicuna commented Feb 3, 2015

Comment author: @mshinwell

Ack

@vicuna
Copy link
Author

vicuna commented Feb 3, 2015

Comment author: @mshinwell

I think the problem is that you need to call [caml_thread_initialize] before [caml_c_thread_register]. It may be worth trying to fix the runtime so this requirement can be removed.

For posterity, I managed to reproduce the bug using the following nasty little test case. (Revised version here now that actually works...)

==> build.sh <==
#!/bin/bash

set -x -u -e

OCAML=/path/to/ocaml/installation
INCLUDE=$OCAML/lib/ocaml

gcc -fPIC -g -c -I$INCLUDE shared_lib_c.c
$OCAML/bin/ocamlopt.opt -thread unix.cmxa threads/threads.cmxa -g -output-obj -o shared_lib_proto.so shared_lib.ml shared_lib_c.o
gcc -g -shared -o shared_lib.so shared_lib_proto.so -L$OCAML/lib/ocaml -L$OCAML/lib/ocaml/threads -L$OCAML/lib/ocaml/unix -lthreads -lunix -lasmrun
gcc -g -o main -I$INCLUDE -ldl -lpthread main.c -lm

==> main.c <==
#include <dlfcn.h>
#include <caml/mlvalues.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

static volatile int num_threads = 0;

static void* thread_body(void* lib)
{
pid_t tid;
useconds_t sleep_before, sleep_after;
void* fn;
tid = syscall(SYS_gettid);
sleep_before = rand() % 1000000; /* 1s /
sleep_after = rand() % 1000000; /
1s /
printf("thread_body tid %d, sleep before %ld us, sleep after %ld us, ",
tid, sleep_before, sleep_after);
printf("num active threads %d\n", ++num_threads);
fflush(stdout);
usleep(sleep_before);
fn = dlsym(lib, "call_ocaml_in_shared_lib");
if (!fn) abort();
if (((value (
)(void)) fn)() != Val_unit) abort();
usleep(sleep_after);
printf("tid %d exiting\n", tid);
fflush(stdout);
--num_threads;
return NULL;
}

int main(int argc, char* argv[])
{
void* lib;
void* fn;

lib = dlopen("shared_lib.so", RTLD_NOW);
if (!lib) {
printf("%s\n", dlerror());
fflush(stdout);
abort();
}

fn = dlsym(lib, "call_ocaml_init");
if (!fn) abort();
((void (*)(char**)) fn)(argv);

printf("starting thread creation loop\n");
fflush(stdout);
while (1) {
pthread_t handle;
pthread_create(&handle, NULL, &thread_body, lib);
usleep(100000); /* 100ms */
}

return 0;
}

==> shared_lib_c.c <==
#include <caml/callback.h>
#include <caml/mlvalues.h>
#include <caml/threads.h>
#include <stdio.h>

void call_ocaml_init(char** argv)
{
caml_startup(argv); (* this will run the module initializers )
caml_release_runtime_system(); (
drop lock acquired by [caml_startup] *)
}

value call_ocaml_in_shared_lib(void)
{
value r;
value* cb;
caml_c_thread_register();
caml_acquire_runtime_system();
r = caml_callback(*caml_named_value("ocaml_in_shared_lib"), Val_int(42));
caml_release_runtime_system();
caml_c_thread_unregister();
return r;
}

==> shared_lib.ml <==
(* The next line makes sure that caml_thread_initialize is called.
(Otherwise, using caml_c_thread_register etc will cause a segfault,
as in PR 6764.)
)
let (_ : Thread.t) = Thread.self () (
ensure Thread is linked *)

let ocaml_in_shared_lib x =
Printf.printf "ocaml in shared lib %d%!" x

let () =
Callback.register "ocaml_in_shared_lib" ocaml_in_shared_lib

Build and run (on Linux, after editing the path to your OCaml installation in build.sh) with:
./build.sh
LD_LIBRARY_PATH=$(pwd) ./main

@vicuna
Copy link
Author

vicuna commented Feb 3, 2015

Comment author: braibant

I can reprodue a segfault with your example but I cannot reproduce a working behavior: when building and running your example, num active threads keeps increasing, and I never see a call to the ocaml_in_shared_lib function.

Are you seeing a different behavior?

@vicuna
Copy link
Author

vicuna commented Feb 3, 2015

Comment author: braibant

I copied your code here: https://github.com/braibant/ocaml-pr6764

@vicuna
Copy link
Author

vicuna commented Feb 3, 2015

Comment author: @mshinwell

Sorry, I've fixed the test case properly now.
Note the "Thread.self" line in the .ml file!

@vicuna
Copy link
Author

vicuna commented Feb 3, 2015

Comment author: @mshinwell

As a follow-up: maybe it should be ensured by the compiler that the [Thread] module's initializer is always run whenever -thread is specified. Or something.

Also: you can't just call [caml_thread_initialize] from C directly; it needs to be called from an OCaml context, I think. It's safest just to make sure the [Thread] module is initialized.

@vicuna
Copy link
Author

vicuna commented Feb 3, 2015

Comment author: braibant

Another follow up: if I remove the call to Callback.register, I get another segfault. It is not clear from the manual that this call to Callback.register is mandatory. Unfortunately, the way Ctypes uses callback, they are not always registered, and it's hard to register them all...

@vicuna
Copy link
Author

vicuna commented Feb 3, 2015

Comment author: @mshinwell

I think you must always use [Callback.register] if you are going to invoke the function from C, otherwise [caml_named_value] will return a null pointer, dereferencing of which will cause a segfault.

@vicuna
Copy link
Author

vicuna commented Feb 3, 2015

Comment author: braibant

Ok, I mixed up things and my comment was not founded.

Just to be crystal clear: one need to use [Callback.register] if and only if the callback is called from C using the [caml_named_value], right?

@vicuna
Copy link
Author

vicuna commented Feb 3, 2015

Comment author: @mshinwell

I believe that is correct, yes.

@vicuna
Copy link
Author

vicuna commented Feb 4, 2015

Comment author: @mshinwell

Does your program now work?

@vicuna
Copy link
Author

vicuna commented Feb 4, 2015

Comment author: braibant

I am still looking for the cause of a segfault, but it looks like it's coming from ctypes. I will keep you informed (and probably write a summary of this whole experiment)

@vicuna
Copy link
Author

vicuna commented Feb 4, 2015

Comment author: @mshinwell

In case it helps: in the case of a delayed segfault due to heap corruption, you can sometimes narrow it down by calling [Gc.compact] in an attempt to trigger it earlier. (For example, you could do this before releasing the runtime lock in each callback.)

@vicuna
Copy link
Author

vicuna commented Feb 9, 2015

Comment author: braibant

I am still working on a segfault in our dll. I have nailed down an issue in the code we call when the shared object is unloaded: the OCaml documentation is a bit scarce about what is required to cleanup the runtime. Any idea about how to do that in the multithreaded context of your above example?

(edit to link a related github pull request #71 )

@vicuna
Copy link
Author

vicuna commented Feb 9, 2015

Comment author: @mshinwell

Not completely sure, but if you're unloading the runtime anyway as part of the DLL, I'm not sure why it matters. Can you give any more information as to the location of the segfault?

@vicuna
Copy link
Author

vicuna commented Feb 9, 2015

Comment author: braibant

I have updated the code here https://github.com/braibant/ocaml-pr6764 to illustrate the issue.

@vicuna
Copy link
Author

vicuna commented May 6, 2015

Comment author: @mshinwell

I'm going to close this issue, since it appears #6776 (killing the ticker thread) is the only fix required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants