Segfault in ARM EABI for program compiled with ocamlopt 3.12.0
- rixed@h...
Date: 2010-11-24 (00:22)
From: rixed@h...
Subject: Segfault in ARM EABI for program compiled with ocamlopt 3.12.0
For some time now I've been chasing a bug that hits a program of mine when it is compiled on ARM with ocaml 3.12.0. I initially thought my own C code was misbehaving, but the program keeps crashing, although not as early, if I comment out all calls to the C functions.

The segfaults happen most frequently during the GC, in caml_oldify_one or caml_oldify_mopup, but also in a few other places such as camlList__rev_append or caml__apply2. In caml_oldify_one, for instance, the segfault always happens at the same location: the assertion that sz is not 0 (and of course, when you read the code, it's pretty clear that sz=0 corresponds to the "already forwarded" case that's handled at the beginning of the function).

The pattern, then, is that a register (usually r0, r2 or r5) is restored from the stack after a call to a function that might call the GC (or after a call to the GC itself), then dereferenced. Inspecting the stack with gdb makes it obvious that this very word was changed during the call, and that a value like 0, 3 or 1024 is read back into the register instead of an mlvalue.

I haven't managed (yet) to reduce the program to a small test case, and I am under the impression that all of these components are required for the bug to happen 'fast enough':

- threads
- floats
- calls to C functions (these greatly reduce the time to wait before the crash)

I am also under the impression that the bug is affected by the new stack alignment requirement. In one occurrence, adding or removing a call to a function that does nothing, from within a function hit by the bug, drastically reduced the probability of the bug. The major difference I saw was that in one version of the function the stack size was 16 bytes and in the other 24 bytes (16+4, apparently for the address of a "module" structure, aligned up to 24 bytes). I thus manually checked the generated framesets, but they were all right as far as I understand them.
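The frame-size arithmetic above (16+4 rounded up to 24) can be sketched in C as follows. This is only an illustration of the rounding, assuming the 8-byte stack alignment that the ARM EABI requires at public interfaces; the helper names align_up and aligned_frame_size are hypothetical and are not taken from the ocamlopt code generator.

```c
#include <stddef.h>

/* Assumption: the ARM EABI requires the stack pointer to be 8-byte
   aligned at public interfaces, so frame sizes are rounded up to a
   multiple of 8. */
#define EABI_STACK_ALIGN 8

/* Round size up to the next multiple of align.
   align must be a power of two. */
static size_t align_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical helper: size of a stack frame that holds `bytes` bytes
   of spilled values once the EABI alignment is applied. */
static size_t aligned_frame_size(size_t bytes)
{
    return align_up(bytes, EABI_STACK_ALIGN);
}
```

With these definitions, aligned_frame_size(16) stays at 16, while aligned_frame_size(16 + 4) becomes 24, which matches the two frame sizes observed.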
Now I'm a little desperate, since each recompile+test cycle takes about 20 minutes and the bug is so erratic. So if someone here is familiar with the ARM architecture, and in particular with the differences between the old and new ABI, please suggest what I should check, or give any hint whatsoever. I'd be very grateful, as this consumes a lot of my spare time. Also, I'm compiling ocaml with gcc 4.2.1; do you think there may be a problem with gcc not following the very same ABI? I've also run the testsuite, but it did not reveal anything.