Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0007503OCamlruntime system and C interfacepublic2017-03-14 15:512018-01-13 16:57
Assigned To 
PlatformOSOS Version
Product Version 
Target VersionFixed in Version 
Summary0007503: Bad locking in io.c
DescriptionWith ocamlopt, the locking in io.c is not sufficient in the presence of asynchronous operations like signal handlers and finalisers. Here's an example program

    let () =
      Sys.set_signal Sys.sigint (Sys.Signal_handle (fun _ ->
        print_endline "signalled!"));
      while true; do
        print_endline "hello"

The exact behaviour varies depending on whether systhreads is enabled. With trunk and 4.05.0+beta2,

    ocamlopt -o badlock && ./badlock

gets stuck in an infinite loop in caml_flush

    ocamlopt -runtime-variant d -o badlock && ./badlock

crashes with an assertion failure (the same issue)

    ocamlopt -thread unix.cmxa threads.cmxa -o badlock && ./badlock

deadlocks in pthread_mutex_lock

Version 4.04 and before do not initialise systhreads when merely linked with threads.cmxa if the program does not use any thread features, so they behave as the first case even when compiled with -thread.

I think I understand what's going on in the deadlock case, since I came across this issue when trying to fix a similar problem in the multicore branch (until KC pointed out that the same problem likely exists on trunk). Here's a backtrace from the deadlock in 4.05.0+beta2:

#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f2cd0d32bc5 in __GI___pthread_mutex_lock (mutex=0x563e7210ea60) at ../nptl/pthread_mutex_lock.c:80
#2 0x0000563e70e55a6e in caml_io_mutex_lock ()
0000003 0x0000563e70e6920d in caml_ml_output_bytes (vchannel=<optimized out>, buff=<optimized out>, start=<optimized out>,
    length=<optimized out>) at io.c:660
0000004 0x0000563e70e2e0da in camlPervasives__output_string_1213 () at
0000005 0x0000563e70e2e97a in camlPervasives__print_endline_1310 () at
0000006 0x0000563e70e786f4 in caml_start_program ()
0000007 0x0000563e70e5db09 in caml_execute_signal (signal_number=signal_number@entry=2, in_signal_handler=in_signal_handler@entry=0)
    at signals.c:176
0000008 0x0000563e70e5dbda in caml_process_pending_signals () at signals.c:58
0000009 0x0000563e70e5dc87 in caml_process_pending_signals () at signals.c:53
0000010 caml_leave_blocking_section () at signals.c:131
0000011 0x0000563e70e71d2b in caml_write_fd (fd=1, flags=<optimized out>, buf=buf@entry=0x563e720ee7a0, n=n@entry=6) at unix.c:94
0000012 0x0000563e70e68480 in caml_flush_partial (channel=0x563e720ee750) at io.c:168
0000013 0x0000563e70e68f48 in caml_flush (channel=<optimized out>) at io.c:182
0000014 caml_ml_flush (vchannel=<optimized out>) at io.c:612
0000015 0x0000563e70e2e9bf in camlPervasives__print_endline_1310 () at
0000016 0x0000563e70e24c47 in camlBadlock__entry ()
0000017 0x0000563e70e22369 in caml_program ()
0000018 0x0000563e70e786f4 in caml_start_program ()
0000019 0x0000563e70e5c9b5 in caml_main (argv=0x7ffff6bb5a18) at startup.c:145
0000020 0x0000563e70e2203c in main (argc=<optimized out>, argv=<optimized out>) at main.c:37

After performing the write, caml_write_fd calls leave_blocking_section which runs any pending signal handlers. However, the channel's lock is held because it was taken in caml_flush, so when the signal handler tries to re-acquire the lock it deadlocks. (Without -threads, I think it continues and corrupts the channel state, although I'm not sure of the details).

I think that it is unsafe in general to hold mutexes while interacting with the OCaml heap. Due to finalisers and signal handlers, several heap operations can run arbitrary user code (e.g. caml_alloc, caml_enter_blocking_section, and more in multicore), which is to be avoided in a critical section.
TagsNo tags attached.
Attached Files

- Relationships
related to 0005141resolved IO mutex and threads 

-  Notes
stedolan (developer)
2017-03-14 15:55

I think that most of the issue can be remedied by avoiding holding channel mutexes outside of blocking sections, which means moving the entire I/O operation into a single blocking section (At the moment, there are two blocking sections: one to acquire the channel mutex, and then a seperate one to do the I/O. The channel mutex is released outside a blocking section).

The remaining issue is that signals can be handled inside blocking sections. If a signal arrives during a blocking section, I think the code in signals_asm.c will run arbitrary OCaml code from a signal handler context. This seems very unsafe, not just because of this sort of locking issue.
stedolan (developer)
2017-03-30 11:26

See GPR#1128.

- Issue History
Date Modified Username Field Change
2017-03-14 15:51 stedolan New Issue
2017-03-14 15:55 stedolan Note Added: 0017653
2017-03-15 17:25 shinwell Status new => confirmed
2017-03-30 11:26 stedolan Note Added: 0017711
2018-01-13 16:57 xleroy Relationship added related to 0005141

Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker