[Caml-list] In need of serious help regarding threading
| From: | Xavier Leroy <Xavier.Leroy@i...> |
| Subject: | Re: [Caml-list] In need of serious help regarding threading |
> It appears (with the help of a friend good with a debugger) that
> there's something going wrong with my code, and how it's interacting
> with the GC.

Please don't get offended by what I'm going to say, but I have the feeling that you're attacking extremely hard problems without adequate debugging tools and without enough understanding of the OCaml runtime system. I'll try to provide some explanations nonetheless, but please don't bombard this list with too many cries for help.

What the debugging session shows is a problem with return address determination during the stack scanning performed by the GC. To find heap pointers contained in the stack, the GC scans it one frame at a time, using compiler-generated frame descriptors to locate the pointers. The frame descriptors are keyed to the return address in the Caml code through a hash table (variable frame_descriptors, hash table lookup at lines 135-141 and 249-255 in file asmrun/roots.c).

Your run appears to be looping in the hash table lookup, indicating that
1- the return address being looked up (variable retaddr) is not in the table (this should never happen in normal operation), and
2- your environment lets you dereference the NULL pointer without crashing (bad idea!).

A good way to debug this is to print the value of the "retaddr" local variable at lines 134 and 249 in asmrun/roots.c and correlate it with your disassembly. It should always refer to code addresses immediately following a "call camlModule__function" or a "call caml_call_gc" instruction. While you're at it, print also the "sp" variable: it should stay within the stack of a thread.

The problem is likely to come from wrong values of the bottom_of_stack and last_return_address starting points for the stack walk. Your second test (Gc.full_major() in the main thread) further suggests that the problem does not occur if the main thread is the one calling the GC. Try to put Gc.full_major() in another thread to see what happens. That could narrow the problem to the saving and restoring of caml_bottom_of_stack and caml_last_return_address globals during context switches.

Finally, notice that your stacks are tiny (4096 words???). Unless they are protected by guard pages, expect a lot of trouble when they overflow (they will).

- Xavier Leroy
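
Editorial illustration (not part of the original message, and not the code in asmrun/roots.c): the sketch below shows, under simplifying assumptions, how a frame-descriptor table keyed by return address might be probed, and why a return address that is absent from the table is a problem. The real lookup assumes the key is always present; this version stops at an empty slot and returns NULL instead. All names (frame_descr, TBL_SIZE, hash_retaddr, insert_descr, find_descr, report_frame) and the hash function are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified frame descriptor: only the key and a size. */
typedef struct {
  uintptr_t retaddr;          /* return address this descriptor describes */
  unsigned short frame_size;  /* size of the stack frame, in bytes */
} frame_descr;

#define TBL_SIZE 8            /* must be a power of two */
static frame_descr *table[TBL_SIZE];
static unsigned table_count = 0;

/* Assumed hash function; the real runtime's hash may differ. */
static unsigned hash_retaddr(uintptr_t addr) {
  return (unsigned)(addr >> 3) & (TBL_SIZE - 1);
}

/* Insert with linear probing. One slot is always left empty so that
   an unsuccessful lookup is guaranteed to terminate. */
static void insert_descr(frame_descr *d) {
  assert(table_count < TBL_SIZE - 1);
  unsigned h = hash_retaddr(d->retaddr);
  while (table[h] != NULL) h = (h + 1) & (TBL_SIZE - 1);
  table[h] = d;
  table_count++;
}

/* Defensive lookup: returns NULL when retaddr is not in the table,
   instead of dereferencing an empty slot and probing forever. */
static frame_descr *find_descr(uintptr_t retaddr) {
  unsigned h = hash_retaddr(retaddr);
  for (unsigned probes = 0; probes < TBL_SIZE; probes++) {
    frame_descr *d = table[h];
    if (d == NULL) return NULL;
    if (d->retaddr == retaddr) return d;
    h = (h + 1) & (TBL_SIZE - 1);
  }
  return NULL;
}

/* Print retaddr and sp for one frame, and report whether the frame
   looks sane: retaddr is known and sp lies in [stack_low, stack_high).
   Returns 1 if both checks pass, 0 otherwise. */
static int report_frame(uintptr_t retaddr, uintptr_t sp,
                        uintptr_t stack_low, uintptr_t stack_high) {
  int known = find_descr(retaddr) != NULL;
  int in_stack = sp >= stack_low && sp < stack_high;
  fprintf(stderr, "retaddr=0x%" PRIxPTR " sp=0x%" PRIxPTR "%s%s\n",
          retaddr, sp,
          known ? "" : " [no frame descriptor]",
          in_stack ? "" : " [sp outside thread stack]");
  return known && in_stack;
}
```

Calling something like report_frame once per frame during the stack walk gives the retaddr/sp trace suggested above; the first frame flagged as unknown or out of range is where the walk has gone wrong.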