
Bad locking in io.c #7503

Closed
vicuna opened this issue Mar 14, 2017 · 4 comments · Fixed by #9722
vicuna commented Mar 14, 2017

Original bug ID: 7503
Reporter: @stedolan
Status: confirmed (set by @mshinwell on 2017-03-15T16:25:51Z)
Resolution: open
Priority: normal
Severity: minor
Category: runtime system and C interface
Related to: #5141
Monitored by: @stedolan @gasche

Bug description

With ocamlopt, the locking in io.c is not sufficient in the presence of asynchronous operations like signal handlers and finalisers. Here's an example program badlock.ml:

let () =
  Sys.set_signal Sys.sigint (Sys.Signal_handle (fun _ ->
    print_endline "signalled!"));
  while true; do
    print_endline "hello"
  done

The exact behaviour varies depending on whether systhreads is enabled. With trunk and 4.05.0+beta2,

ocamlopt badlock.ml -o badlock && ./badlock
^C

gets stuck in an infinite loop in caml_flush

ocamlopt -runtime-variant d badlock.ml -o badlock && ./badlock
^C

crashes with an assertion failure (the same issue)

ocamlopt -thread unix.cmxa threads.cmxa badlock.ml -o badlock && ./badlock
^C

deadlocks in pthread_mutex_lock

Version 4.04 and earlier do not initialise systhreads when a program is merely linked against threads.cmxa without using any thread features, so those versions behave as in the first case even when compiled with -thread.

I think I understand what's going on in the deadlock case, since I came across this issue when trying to fix a similar problem in the multicore branch (until KC pointed out that the same problem likely exists on trunk). Here's a backtrace from the deadlock in 4.05.0+beta2:

#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f2cd0d32bc5 in __GI___pthread_mutex_lock (mutex=0x563e7210ea60) at ../nptl/pthread_mutex_lock.c:80
#2 0x0000563e70e55a6e in caml_io_mutex_lock ()
#3 0x0000563e70e6920d in caml_ml_output_bytes (vchannel=, buff=, start=, length=) at io.c:660
#4 0x0000563e70e2e0da in camlPervasives__output_string_1213 () at pervasives.ml:349
#5 0x0000563e70e2e97a in camlPervasives__print_endline_1310 () at pervasives.ml:472
#6 0x0000563e70e786f4 in caml_start_program ()
#7 0x0000563e70e5db09 in caml_execute_signal (signal_number=signal_number@entry=2, in_signal_handler=in_signal_handler@entry=0) at signals.c:176
#8 0x0000563e70e5dbda in caml_process_pending_signals () at signals.c:58
#9 0x0000563e70e5dc87 in caml_process_pending_signals () at signals.c:53
#10 caml_leave_blocking_section () at signals.c:131
#11 0x0000563e70e71d2b in caml_write_fd (fd=1, flags=, buf=buf@entry=0x563e720ee7a0, n=n@entry=6) at unix.c:94
#12 0x0000563e70e68480 in caml_flush_partial (channel=0x563e720ee750) at io.c:168
#13 0x0000563e70e68f48 in caml_flush (channel=) at io.c:182
#14 caml_ml_flush (vchannel=) at io.c:612
#15 0x0000563e70e2e9bf in camlPervasives__print_endline_1310 () at pervasives.ml:472
#16 0x0000563e70e24c47 in camlBadlock__entry ()
#17 0x0000563e70e22369 in caml_program ()
#18 0x0000563e70e786f4 in caml_start_program ()
#19 0x0000563e70e5c9b5 in caml_main (argv=0x7ffff6bb5a18) at startup.c:145
#20 0x0000563e70e2203c in main (argc=, argv=) at main.c:37

After performing the write, caml_write_fd calls caml_leave_blocking_section, which runs any pending signal handlers. However, the channel's lock is still held because it was taken in caml_flush, so when the signal handler tries to re-acquire the lock it deadlocks. (Without -thread, I think execution continues and corrupts the channel state, although I'm not sure of the details.)

I think that it is unsafe in general to hold mutexes while interacting with the OCaml heap. Because of finalisers and signal handlers, several heap operations can run arbitrary user code (e.g. caml_alloc, caml_enter_blocking_section, and more in multicore), and arbitrary user code must never run inside a critical section.

vicuna commented Mar 14, 2017

Comment author: @stedolan

I think that most of the issue can be remedied by avoiding holding channel mutexes outside of blocking sections, which means moving the entire I/O operation into a single blocking section. (At the moment there are two blocking sections: one to acquire the channel mutex, and a separate one to do the I/O; the channel mutex is released outside any blocking section.)

The remaining issue is that signals can be handled inside blocking sections. If a signal arrives during a blocking section, I think the code in signals_asm.c will run arbitrary OCaml code from a signal handler context. This seems very unsafe, not just because of this sort of locking issue.

vicuna commented Mar 30, 2017

Comment author: @stedolan

See #1128.

github-actions bot commented May 8, 2020

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

github-actions bot added the Stale label May 8, 2020
lpw25 commented May 8, 2020

Still being worked on in #1128
