debug mode for native otherlibs/systhreads #6204

Closed
vicuna opened this issue Oct 6, 2013 · 1 comment
vicuna commented Oct 6, 2013

Original bug ID: 6204
Reporter: @edwintorok
Status: acknowledged (set by @damiendoligez on 2014-06-04T20:11:46Z)
Resolution: open
Priority: normal
Severity: feature
Platform: x86_64
OS: Linux
Version: 4.01.0
Category: otherlibs
Tags: patch
Monitored by: @gasche @ygrek

Bug description

It would be useful to use the facilities that pthreads provides for detecting mutex misuse
(EDEADLK/EPERM).
There is already a "-runtime-variant d" linker flag that enables more checks in the (GC) runtime. I propose that the same flag also enable more checks in st_stubs.c.

The attached patch implements this:

  • -with-debug-runtime causes a $(LIBDIR)/threadsd/libthreadnat.a to be installed (st_stubs built with -g -DDEBUG)
  • use PTHREAD_MUTEX_ERRORCHECK mutex types (exceptions are raised when Mutex.lock/unlock is misused)
  • check the masterlock and give a fatal error message when a double release (EPERM) or double acquire (EDEADLK) is detected
  • check the return code of more pthread functions
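
To illustrate the error-checking mutex type mentioned above, here is a minimal standalone sketch in plain POSIX C. It is not taken from the patch; the helper names (errorcheck_mutex_init, relock_result, unlock_unowned_result) are hypothetical and exist only for this example. It shows the two error codes the proposal relies on: EDEADLK when the owner locks again, and EPERM when a thread unlocks a mutex it does not hold.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initialise *m as an error-checking mutex.
   Returns 0 on success or the error code of the failing call. */
int errorcheck_mutex_init(pthread_mutex_t *m)
{
  pthread_mutexattr_t attr;
  int rc = pthread_mutexattr_init(&attr);
  if (rc != 0) return rc;
  rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  if (rc == 0) rc = pthread_mutex_init(m, &attr);
  pthread_mutexattr_destroy(&attr);
  return rc;
}

/* Lock twice from the same thread; return the result of the second lock. */
int relock_result(void)
{
  pthread_mutex_t m;
  int rc = errorcheck_mutex_init(&m);
  assert(rc == 0);
  rc = pthread_mutex_lock(&m);
  assert(rc == 0);
  rc = pthread_mutex_lock(&m);    /* reported as EDEADLK instead of hanging */
  pthread_mutex_unlock(&m);       /* undo the first, successful lock */
  pthread_mutex_destroy(&m);
  return rc;
}

/* Unlock a mutex that the calling thread never locked. */
int unlock_unowned_result(void)
{
  pthread_mutex_t m;
  int rc = errorcheck_mutex_init(&m);
  assert(rc == 0);
  rc = pthread_mutex_unlock(&m);  /* reported as EPERM */
  pthread_mutex_destroy(&m);
  return rc;
}
```

With a default mutex the second lock would typically deadlock and the stray unlock would be undefined behaviour, which is why the checks are only worth paying for in a debug build.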

Caveats:

  • only implemented for POSIX threads (st_posix.h)
  • only implemented for native code (ocamlopt), bytecode keeps using the non-debug runtime
  • if a C stub calls only caml_enter_blocking_section()
    and returns without raising an exception, then the missing caml_leave_blocking_section() is detected only at the next caml_enter_blocking_section()/caml_raise* call.
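
The last caveat can be illustrated with a small model of an owner-tracking master lock. This is a sketch under assumptions, not the code from the patch: the type dbg_masterlock and every function below are hypothetical, and sketch_enter_blocking/sketch_leave_blocking merely stand in for caml_enter_blocking_section()/caml_leave_blocking_section() as "release the lock" and "reacquire the lock".

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical debug master lock: mutex + condition variable + busy flag,
   plus the identity of the thread currently holding it. */
typedef struct {
  pthread_mutex_t lock;    /* protects busy and owner */
  pthread_cond_t is_free;  /* signalled when the lock is released */
  int busy;
  pthread_t owner;         /* meaningful only while busy is nonzero */
} dbg_masterlock;

void dbg_masterlock_init(dbg_masterlock *m)
{
  pthread_mutex_init(&m->lock, NULL);
  pthread_cond_init(&m->is_free, NULL);
  m->busy = 0;
}

/* Returns 0, or EDEADLK if the caller already holds the lock. */
int dbg_masterlock_acquire(dbg_masterlock *m)
{
  int rc = 0;
  pthread_mutex_lock(&m->lock);
  if (m->busy && pthread_equal(m->owner, pthread_self())) {
    rc = EDEADLK;
  } else {
    while (m->busy) pthread_cond_wait(&m->is_free, &m->lock);
    m->busy = 1;
    m->owner = pthread_self();
  }
  pthread_mutex_unlock(&m->lock);
  return rc;
}

/* Returns 0, or EPERM if the caller does not hold the lock. */
int dbg_masterlock_release(dbg_masterlock *m)
{
  int rc = 0;
  pthread_mutex_lock(&m->lock);
  if (!m->busy || !pthread_equal(m->owner, pthread_self())) {
    rc = EPERM;
  } else {
    m->busy = 0;
    pthread_cond_signal(&m->is_free);
  }
  pthread_mutex_unlock(&m->lock);
  return rc;
}

/* Stand-ins for the runtime's enter/leave blocking section calls. */
int sketch_enter_blocking(dbg_masterlock *m) { return dbg_masterlock_release(m); }
int sketch_leave_blocking(dbg_masterlock *m) { return dbg_masterlock_acquire(m); }

/* Buggy stub: releases the lock and forgets to take it back. */
int buggy_stub(dbg_masterlock *m) { return sketch_enter_blocking(m); }

/* Correct stub: enter and leave are paired. */
int correct_stub(dbg_masterlock *m)
{
  int rc = sketch_enter_blocking(m);
  if (rc != 0) return rc;
  /* blocking work would go here */
  return sketch_leave_blocking(m);
}

/* The buggy stub itself reports nothing; the error only shows up
   when the next stub tries to release a lock the thread no longer holds. */
int late_detection_demo(void)
{
  dbg_masterlock m;
  int rc;
  dbg_masterlock_init(&m);
  rc = dbg_masterlock_acquire(&m);  /* thread is "running OCaml code" */
  assert(rc == 0);
  rc = buggy_stub(&m);              /* succeeds: nothing is noticed yet */
  assert(rc == 0);
  return correct_stub(&m);          /* EPERM surfaces here */
}
```

In this model the failure is reported in the innocent stub that runs second, so a backtrace taken at the point of detection narrows the search but does not by itself name the faulty stub.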

The patch is just a draft; more checks could be implemented later.
AFAIK some more changes are planned for systhreads, and I don't know how this patch would conflict with them:
#5373
https://github.com/lucasaiu/ocaml

Suggestions welcome.

Steps to reproduce

Apply the patch with either 'patch -p1 <combined.patch' or 'git am combined.patch' on top of the latest trunk code:
git-svn-id: http://caml.inria.fr/svn/ocaml/trunk@14214 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02

Additional information

Example usage to debug the issue in OCamlnet described here:
https://sympa.inria.fr/sympa/arc/caml-list/2013-09/msg00342.html

With my patch the bug is pinpointed to "equeue_ssl_single_shutdown", and a somewhat helpful fatal error is printed by OCaml:
Fatal error: cannot release OCaml master lock
The OCaml master lock is owned by another thread!
(Did you forget a caml_leave_blocking_section() call?)

$ ocamlbuild -use-ocamlfind ./http_mt.native ./http_mt.byte -tag debug -lflags "-runtime-variant d"
$ gdb -batch -ex "b exit" -ex "r" -ex "bt" -ex "quit" ./http_mt.native
Breakpoint 1 at 0x406a40
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

OCaml runtime: debug mode

Initial minor heap size: 2048k bytes
Initial major heap size: 992k bytes
Initial space overhead: 80%
Initial max overhead: 500%
Initial heap increment: 992k bytes
Initial allocation policy: 0

OCaml systhreads: debug mode

<>Starting new major GC cycle

OCaml runtime: heap check

![New Thread 0x7ffff7bfd700 (LWP 7312)]
[New Thread 0x7ffff73fc700 (LWP 7313)]
[New Thread 0x7ffff6bfb700 (LWP 7314)]
[New Thread 0x7ffff63fa700 (LWP 7315)]
[New Thread 0x7ffff5bf9700 (LWP 7316)]
[New Thread 0x7ffff53f8700 (LWP 7317)]
[New Thread 0x7ffff4bf7700 (LWP 7318)]
[New Thread 0x7ffff43f6700 (LWP 7319)]
[New Thread 0x7ffff3bf5700 (LWP 7320)]
[New Thread 0x7ffff33f4700 (LWP 7321)]
[New Thread 0x7ffff2bf3700 (LWP 7322)]
<>$Growing heap to 1984k bytes
Growing page table to 2048 entries
<>Starting new major GC cycle

OCaml runtime: heap check

!Growing heap to 2976k bytes
Growing heap to 3968k bytes
Fatal error: cannot release OCaml master lock
The OCaml master lock is owned by another thread!
(Did you forget a caml_leave_blocking_section() call?)
[Switching to Thread 0x7ffff3bf5700 (LWP 7320)]

Breakpoint 1, __GI_exit (status=2) at exit.c:99
99 exit.c: No such file or directory.
#0 __GI_exit (status=2) at exit.c:99
#1 0x000000000054da3d in caml_fatal_error (msg=0x573c50 "Fatal error: cannot release OCaml master lock\nThe OCaml master lock is owned by another thread!\n(Did you forget a caml_leave_blocking_section() call?)\n") at misc.d.c:53
#2 0x0000000000545041 in st_masterlock_release (m=0x8a6a80 <caml_master_lock>) at st_posix.h:196
#3 0x0000000000545a13 in caml_thread_enter_blocking_section () at st_stubs.c:178
#4 0x0000000000545c41 in caml_io_mutex_unlock_exn () at st_stubs.c:262
#5 0x000000000054b5d1 in caml_raise (v=140737352117712) at fail.d.c:57
#6 0x000000000054b7e7 in caml_raise_with_arg (tag=140737351355792, arg=11) at fail.d.c:93
#7 0x0000000000538f2d in equeue_ssl_single_shutdown ()
#8 0x000000000040f2a6 in camlUq_ssl__fun_3959 () at uq_ssl.ml:626
#9 0x000000000040e957 in camlUq_ssl__fun_4167 () at uq_ssl.ml:805
#10 0x0000000000514161 in camlList__map_1040 () at list.ml:55
#11 0x0000000000411c3f in camlUq_ssl__fun_4127 () at uq_ssl.ml:798
#12 0x0000000000444c7e in camlUnixqueue_pollset__forward_event_to_1571 () at unixqueue_pollset.ml:768
#13 0x0000000000441848 in camlEqueue__fun_1257 () at equeue.ml:166
#14 0x000000000051cc59 in camlQueue__iter_1050 () at queue.ml:134
#15 0x00000000004422d2 in camlEqueue__run_1072 () at equeue.ml:159
#16 0x00000000004466f5 in camlUnixqueue_pollset__fun_3318 () at unixqueue_pollset.ml:999
#17 0x000000000040d37f in camlHttp_mt__f_1037 () at http_mt.ml:34
#18 0x0000000000506c99 in camlThread__fun_1081 () at thread.ml:37
#19 0x00000000005699de in caml_start_program ()
#20 0x0000000000909160 in ?? ()
#21 0x0000000000000000 in ?? ()
A debugging session is active.

Inferior 1 [process 7308] will be killed.

Quit anyway? (y or n) [answered Y; input not from terminal]

File attachments

@github-actions

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.
