Threads + signals: runtime hangs #3659
Comments
Comment author: administrator Dear Gerd,
Thank you for this interesting bug report and for the repro case. Damien Doligez and I partially fixed the bug in the CVS trunk (should be in […]
The fix is partial in that it assumes that the POSIX thread function […]
Given the way OCaml processes signals and the limitations of POSIX […]
Yes, and it calls many other functions that should not be called from […]
Best wishes,
Comment author: administrator On Friday, 29.07.2005, at 15:48 +0200, Xavier Leroy wrote:
I think one of the problems is that OCaml supports asynchronous signals, […]
Interestingly, the multi-threading tick signal isn't affected at all, […]
I think we need a reliable coexistence of threads and signals for many […]
So my suggestion is to provide such a switch in 3.09, and allow people […]
Gerd
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
Comment author: administrator Dear Gerd,
Interesting suggestion. Actually, my first attempt at signal […]
Note that we could get the best of both worlds (?) as follows: use […]
There are no problems with Windows: the only signal-like mechanism is […]
No, you're correct. Actually, the systhreads implementation (over […]
I still believe you're taking serious risks by mixing signals and […]
Best wishes,
Comment author: administrator See also #3680. Fixed in 3.09 by XL and DD, 2005-07-29.
Original bug ID: 3659
Reporter: administrator
Status: closed
Resolution: fixed
Priority: normal
Severity: minor
Category: ~DO NOT USE (was: OCaml general)
Bug description
Full_Name: Gerd Stolpmann
Version: 3.08.3
OS: Linux, kernel 2.6
Submission from: p54a79e6c.dip0.t-ipconnect.de (84.167.158.108)
Hello,
I recently got a bug report for one of my libraries (equeue) that did not work
in a multi-threaded program. Actually, the program wasn't multi-threaded, but it
was compiled with -thread and threads.cma (because of another library), so the
multi-threading machinery was initialized. My library is an enhanced version of
"system", i.e. it does fork + exec, and sometimes SIGCHLD signals are emitted. I
never expected it to work in a multi-threaded program, for other reasons
(problems with fork, no access to the thread-specific signal mask). However, it
turns out that the problems are much more fundamental: they can hang the O'Caml
runtime at any time (although this is very unlikely if the program doesn't use
signals for application purposes).
Here is a short program that almost always hangs the O'Caml runtime. It sends a
flood of SIGUSR1 signals to a process that blocks in select from time to time:
let rec microsleep t =
  (* a negative timeout would make Unix.select block indefinitely *)
  if t > 0.0 then begin
    let t0 = Unix.gettimeofday () in
    try
      ignore (Unix.select [] [] [] t)
    with
      Unix.Unix_error (Unix.EINTR, _, _) ->
        (* interrupted by a signal: sleep for the remaining time *)
        microsleep (t -. (Unix.gettimeofday () -. t0))
  end
;;

let pid = Unix.getpid () ;;

let generate_signals () =
  match Unix.fork () with
  | 0 ->
      (* child: bombard the parent with SIGUSR1 *)
      while true do
        Unix.kill pid Sys.sigusr1
      done;
      exit 0
  | _ -> ()  (* parent: carry on *)
;;

let () =
  let _ = Thread.create in  (* Ensure mt machinery is enabled *)
  let n = ref 0 in
  Sys.set_signal Sys.sigusr1 (Sys.Signal_handle (fun _ -> incr n));
  generate_signals ();
  let s = ref 0 in
  for k = 1 to 1000 do
    s := !s + k;
    microsleep 0.0001  (* block for a short moment *)
  done;
  prerr_endline "Done!";
  prerr_endline ("Number of signals: " ^ string_of_int !n)
;;
Compiled with:
ocamlopt -o signals unix.cmxa threads.cmxa -thread signals.ml
After a short time, the parent process freezes. strace shows that the process
hangs in a futex system call. gdb shows more:
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x4002dfae in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/tls/i686/cmov/libpthread.so.0
#2 0x08057b7b in caml_thread_leave_blocking_section ()
#3 0x0805ebaa in caml_leave_blocking_section ()
#4 0x0805ebd2 in handle_signal ()
#5  <signal handler called>
#6 0x0804a96c in ?? ()
#7 0x08057bb8 in caml_thread_leave_blocking_section ()
#8 0x0805ebaa in caml_leave_blocking_section ()
#9 0x0805afa5 in unix_select ()
(rest stripped)
Evidently, the signal arrives in the middle of caml_leave_blocking_section, just
after the thread has acquired the master lock. Because the runtime is still in
asynchronous signal mode, the signal handler tries to acquire the master lock
again, and the thread deadlocks waiting for a lock it already holds.
This backtrace is typical of most of the freezes, but not all of them. There
seem to be other problems as well.
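To make the re-entrancy problem concrete, here is a toy, single-threaded OCaml model of the sequence in the backtrace. Everything in it (`master_lock`, `acquire`, `handle_signal`, `would_deadlock`, and the `signal_arrives` switch that stands in for a real asynchronous signal) is hypothetical and only mimics the roles of the runtime functions; where the real thread would block forever, the model raises `Would_deadlock` instead.

```ocaml
(* Toy, single-threaded model of the deadlock. Hypothetical names only;
   this is not the code of the OCaml runtime. *)

exception Would_deadlock

(* The master lock is not reentrant: a thread that asks for it while
   already holding it waits forever. The model raises instead. *)
type master_lock = { mutable held : bool }

let master = { held = false }

let acquire lock =
  if lock.held then raise Would_deadlock;
  lock.held <- true

let release lock = lock.held <- false

(* True while the thread is in a blocking section: signal handlers then
   run immediately (asynchronous mode) instead of at a safe point. *)
let async_mode = ref false

(* Signal handler in asynchronous mode: it needs the master lock before
   it may run OCaml code (frames #2 to #4 of the backtrace). *)
let handle_signal () =
  if !async_mode then begin
    acquire master;
    (* the OCaml-level handler would run here *)
    release master
  end

let enter_blocking_section () =
  async_mode := true;
  release master

(* The signal, if any, arrives after the lock is taken but before
   asynchronous mode is switched off: the window seen in gdb. *)
let leave_blocking_section ~signal_arrives =
  acquire master;
  if signal_arrives then handle_signal ();
  async_mode := false

(* Runs one blocking call (such as the select in microsleep) and
   reports whether the model hit the deadlock. *)
let would_deadlock ~signal_arrives =
  master.held <- false;
  async_mode := false;
  acquire master;                 (* running OCaml code *)
  enter_blocking_section ();
  match leave_blocking_section ~signal_arrives with
  | () -> false
  | exception Would_deadlock -> true

let () =
  Printf.printf "quiet: %b, signal in the window: %b\n"
    (would_deadlock ~signal_arrives:false)
    (would_deadlock ~signal_arrives:true)
```

Running it prints `quiet: false, signal in the window: true`: only a signal landing between the lock acquisition and the end of asynchronous mode triggers the re-entrant acquire.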
I have not tested it, but I have the impression that masking all signals during
caml_leave_blocking_section could help (until the runtime has left asynchronous
signal mode).
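The masking idea can be sketched in the same toy style, under the same assumptions (hypothetical names, the non-reentrant master lock modelled as a flag, a `signal_arrives` switch standing in for a real signal). A signal that arrives while masked is only counted as pending and is handled once asynchronous mode is over, when the thread already owns the lock, so it never tries to re-acquire it. In real C code the blocking would presumably be done with `pthread_sigmask`; the sketch does not model that call, only its effect.

```ocaml
(* Toy model of masking signals during leave_blocking_section.
   Hypothetical names; not the code of the OCaml runtime. *)

exception Would_deadlock

let lock_held = ref false      (* the non-reentrant master lock *)
let async_mode = ref false     (* true inside a blocking section *)
let masked = ref false         (* true while signals are blocked *)
let pending = ref 0            (* signals that arrived while masked *)
let handled = ref 0            (* OCaml-level handler invocations *)

let acquire () =
  if !lock_held then raise Would_deadlock;
  lock_held := true

(* Delivery of one signal to the thread. *)
let deliver () =
  if !masked then incr pending               (* blocked: kept pending *)
  else if !async_mode then begin
    acquire ();                              (* must take the lock first *)
    incr handled;
    lock_held := false
  end else
    incr handled                             (* lock already held *)

let enter_blocking_section () =
  async_mode := true;
  lock_held := false

let leave_blocking_section ~signal_arrives =
  masked := true;                            (* block all signals *)
  acquire ();
  if signal_arrives then deliver ();         (* same bad timing as before *)
  async_mode := false;
  masked := false;                           (* unblock, then handle *)
  while !pending > 0 do                      (* what arrived meanwhile *)
    decr pending;
    deliver ()
  done

(* Runs one blocking call and returns how many handlers ran. *)
let run ~signal_arrives =
  lock_held := false; async_mode := false; masked := false;
  pending := 0; handled := 0;
  acquire ();
  enter_blocking_section ();
  leave_blocking_section ~signal_arrives;
  !handled

let () =
  let n = run ~signal_arrives:true in
  Printf.printf "handlers run: %d, lock held: %b\n" n !lock_held
```

This prints `handlers run: 1, lock held: true`: in the model the signal is not lost, and the badly timed arrival no longer leads to a second acquire.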
Another observation: POSIX allows only async-signal-safe functions to be called
from a signal handler, and the pthread mutex and condition-variable functions are
not among them. The O'Caml runtime does, however, call pthread functions from its
signal handler (in caml_thread_enter/leave_blocking_section). Evidently, this
works most of the time, but there may be race conditions in libpthread that can
be triggered under these circumstances. A well-known author writes in
pthread_cond_signal(3): "In particular, calling pthread_cond_signal or
pthread_cond_broadcast from a signal handler may deadlock the calling thread."
(Linux man page, signed by XL).
Admittedly, that man page dates from the older LinuxThreads implementation,
whereas the above test ran on a 2.6 kernel with the new threading library
(NPTL), so the code base has changed.
Gerd