Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0005166OCamlOCaml runtime systempublic2010-10-17 02:282013-10-10 17:51
Reporteryziquel 
Assigned Todoligez 
PrioritynormalSeveritymajorReproducibilityalways
StatusfeedbackResolutionopen 
PlatformOSOS Version
Product Version3.12.0 
Target Version4.01.1+devFixed in Version 
Summary0005166: Segfaults in bytecode callbacks
DescriptionI've been experimenting a few things with OCaml callbacks from C++ back into OCaml. And been experiencing segfaults.

I've been storing functions in an OCaml array, then registering it with caml_register_generational_global_roots. These functions are callbacks that are called by a C++ instance. They are typically called with caml_callback3.

In native code, everything seems fine, callbacks work rather seamlessly (although I'm still waiting for the next segfault behind the corner). In bytecode, however, I get segfaults. The issue is the following: I've got a (value *) ml_error pointing to an OCaml value in my array. This value (*ml_error) points to some closure (**ml_error). The first word of this closure points to some OCaml bytecode (***ml_error).

However, when passing the value (*ml_error) to caml_callback3, I get segfaults. Following with gdb caml_callback3 into caml_callbackN_exn and then into caml_interprete, I found that the 'jmpq *%rax' instruction that dispatches opcodes into the bytecode interpreter doesn't interpret correctly the closure: After the opcodes ACC 6 and APPLY, %rax gets set to (**ml_error) and not to (***ml_error), then the bytecode interpreter jumps to a wrong address and segfaults.

Things work fine in native code. So I suspect that this is because the callbacks API would need more work for OCaml bytecode in order to be able to pass closures directly to caml_callbackN_exn.

I have in fact been experiencing this behaviour after wrapping

let callback1 = function x y z -> ...

into

let callback2 = function x y z -> callback1 x y (do_something z)

It worked with callback1, and started segfaulting when wrapping callback1 into callback2 and registering callback2 instead of callback1. So I am wondering if the extra layer of indirection comes from this wrapping, and if that is responsible for the segfault.

Moreover, I am wondering something about the LOCAL_CALLBACK_BYTECODE macro. By default, it does not seem set. However, I do not see the point of not making it the default. When debugging, seeing a global variable like this makes me worry about all kinds of crazy threading issues. Why not make it the default? I do not think that speed is really an issue for people who wish to use bytecode callbacks, so what is the point there?
Tagssuggest_closing
Attached Files

- Relationships

-  Notes
(0005684)
yziquel (reporter)
2010-10-18 10:19

I have been trying to reproduce the error in a small code snippet, but have been unable to do so. Moreover, naming the arguments in the callback in the callback made the segfault disappear. I'll try to reproduce the error in a small code snippet. In the meantime, please feel free to ask privately for the crashing code.
(0005717)
phillyg (reporter)
2010-11-24 19:19

Hi, are you working on Linux? If so, I might be able to help you debug if you're still able to reproduce this same segfault on your machine. Please email me at philip@pgbovine.net if you'd like more details.
(0005916)
doligez (administrator)
2011-05-20 14:51

I'm worried about this:

  I've got a (value *) ml_error pointing to an OCaml value in my array.

If your array is an OCaml array, it is supposed to contain values, not pointer to values.

I'll probably ask for the crashing code after the 3.12.1 release.
(0005926)
yziquel (reporter)
2011-05-22 23:24

Nope. I've got a C++ class with members such as:

private:
  value callbacks_gc_root;
  value * ml_error;

The constructor is a C++ initialisation list

public:
  myclass (value callbacks_array):
  callbacks_gc_root (callbacks_array),
  ml_error ((reinterpret_cast<value *> (callbacks_array)) + OFFSET)
  {
    caml_register_generational_global_root(&callbacks_gc_root);
  }

and an error member function:

void error (const int id, const int code, const std::string msg) {
  caml_leave_blocking_section();
  CAMLparam0();
  CAMLlocal2(ml_id, ml_code);
  ml_id = caml_copy_int32(id);
  ml_code = caml_copy_int32(code);
  value ml_msg = Val_owned_obj<std::string>
    (new (std::nothrow) std::string(msg));
  caml_callback3(*ml_error, ml_id, ml_code, ml_msg);
  caml_local_roots = caml__frame;
  caml_enter_blocking_section();
  return;
}

Val_owned_obj<> is some custom templatised wrapper for C++ objects. Not the issue here.

ml_error is value * that is pointing to an OCaml value in the array. The array doesn't contain a value *, but a value. In fact, ml_error simply is callbacks_array + OFFSET using pointer arithmetics. So ml_error points into the array. The array is an array of closures.

  type closure
  external closure : ('a -> 'b) -> closure = "%identity"

  extern "C" CAMLprim value myclass_ctor (value ml_callbacks) {
    CAMLparam1(ml_callbacks);
    CAMLreturn (Val_owned_obj<myclass>
      (new (std::nothrow) myclass (ml_callbacks)));
  }

  external myclass_ctor : closure array -> pointer = "myclass_ctor"
  let myclass_ctor callbacks =
    let module C = (val callbacks : CALLBACKS) in
    myclass_ctor [|
      ...
      closure C.error;
      ...
    |]

Please ask for the full code when you need it. I have also followed step by step the execution of the bytecode interpreter to the point where it crashes and will gladly hand that to you.

Native code works fine (or rather worked, as this codebase has been dropped).
(0006907)
doligez (administrator)
2012-02-09 19:56

> ml_error is value * that is pointing to an OCaml value in the array.

I don't see how it could cause your problem, but you are not allowed to point in the middle of an OCaml array. If the GC moves it, you're doomed.

Do you still have the code?
(0006908)
yziquel (reporter)
2012-02-09 21:26

I still have the code, and somewhere on my hard drive, I still have details of my investigation following step by step the bytecode interprete (caml_interprete() and what follows). However, it's not really in any presentable state, and I've somehow lost interest in it. If you have an interested in these incomplete and perhaps out of date code samples and trace of the bytecode interpreter, I'm willing to dig all this up.

I do not believe that this a GC issue for the following reason: systematic failure in bytecode mode, and systemic success in native code. Moreover, to my understanding, using caml_register_generational_global_root(&callbacks_gc_root) should fix the array properly in memory.

As far as I've been able to debug the bytecode, it seems to me that it's a problem with the calling convention for bytecode callbacks. If I remember well, the bytecode pushes things in some accumulator, and at one point, it pops it into rax, and you have then have a jump to the address pointed to by rax (or to an offset). This jump is the moment when the bytecode interpreter shifts interpretation to the next bytecode instruction.

The issue was that there was garbage in %rax.

But that is described in the initial bug report.

May I send the code to your (i.e. Damien's) address and not put it on the bug tracker?
(0007047)
doligez (administrator)
2012-03-12 16:27

GC or not, this still looks like a bug in OCaml. Please send me the code and I'll see if I can ferret it out.
(0010071)
yallop (developer)
2013-08-01 21:44

I think this report might involve a misunderstanding about roots.

It seems like the mental model you (yziquel) have is as follows: in general it's unsafe to keep pointers into OCaml objects, since the GC can move them around, but registering a root "fixes" the corresponding value in memory so that it won't be moved by the GC. According to this model, your code is correct: you register callbacks_gc_root (an alias for callbacks_array) as a root, so callbacks_array shouldn't be moved by the collector, and ml_error (calculated as an offset from callbacks_array) should reliably point to a member of the immovable array.

The problem is that roots don't work that way. When you register a root the message to the GC is not "please don't move this OCaml value"; rather, it's "please update this pointer whenever the OCaml value is moved". (Registering a root has other effects: it prevents the value being collected, for one thing, but that's not really what concerns us here.) The GC's responsibility to update roots explains why you have to pass the address of the value rather than the plain value to the caml_register_generational_global_root function.

The easiest way to fix the C code, then, is to change ml_error from a 'value *' to a plain 'value', and register ml_error itself as a root. Since roots are updated when the corresponding values move, you can then be sure that ml_error always points to the correct location.
(0010452)
frisch (developer)
2013-10-10 17:51

yziquel: can you confirm yallop's interpretation?

- Issue History
Date Modified Username Field Change
2010-10-17 02:28 yziquel New Issue
2010-10-18 10:19 yziquel Note Added: 0005684
2010-11-24 19:19 phillyg Note Added: 0005717
2011-05-20 14:50 doligez Status new => assigned
2011-05-20 14:50 doligez Assigned To => doligez
2011-05-20 14:51 doligez Note Added: 0005916
2011-05-22 23:24 yziquel Note Added: 0005926
2012-02-09 19:56 doligez Note Added: 0006907
2012-02-09 19:56 doligez Status assigned => feedback
2012-02-09 19:56 doligez Category OCaml general => OCaml runtime system
2012-02-09 21:26 yziquel Note Added: 0006908
2012-02-09 21:26 yziquel Status feedback => assigned
2012-03-12 16:27 doligez Note Added: 0007047
2012-03-12 16:27 doligez Status assigned => feedback
2012-07-10 17:35 doligez Target Version => 4.01.0+dev
2012-07-31 13:36 doligez Target Version 4.01.0+dev => 4.00.1+dev
2012-09-19 14:15 doligez Target Version 4.00.1+dev => 4.00.2+dev
2013-06-14 14:19 frisch Severity minor => major
2013-06-14 14:19 frisch Target Version 4.00.2+dev => 4.01.0+dev
2013-08-01 21:44 yallop Note Added: 0010071
2013-08-02 16:27 yallop Tag Attached: suggest_closing
2013-08-02 16:35 doligez Target Version 4.01.0+dev => 4.01.1+dev
2013-10-10 17:51 frisch Note Added: 0010452


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker