| From: | skaller <skaller@u...> |
| Subject: | Help finding possible bug in 3.09/ia64 |
I am wondering if anyone has access to an ia64 and who is willing to help track down a possible bug in Ocaml 3.09 for that platform. Details and background follow. I would prefer confirmation before reporting a bug, also kind of hard to report a bug on an architecture I don't have access to .. :)

Mike Furr has found a problem with one of the Felix regression tests on the ia64 platform. It appears to me this is most likely a bug in the ia64 runtime, and next most likely a bug in the native code generator for ia64. The code in question works fine on i386, amd64, ppc and some other architectures.

The program contains no uses of any C bindings, no use of the Obj module, and no unsafe array accesses.

The error manifests as:

    PATH=bin:"$PATH" LD_LIBRARY_PATH=rtl:"$LD_LIBRARY_PATH" bin/flxg -Ilib tut/examples/mac126
    .. ERROR CODE 0xb
    TESTFILE -- ERROR! tut/examples/mac126

during 'make test'. A segfault results from what appears to be a runaway loop in the garbage collector:

    (gdb) bt
    #0  0x40000000002a04e0 in caml_oldify_local_roots ()
    #1  0x40000000002a5100 in caml_empty_minor_heap ()
    #2  0x40000000002a5360 in caml_minor_collection ()
    #3  0x40000000002a1b50 in caml_garbage_collection ()
    #4  0x40000000002c5ca0 in caml_call_gc ()
    #5  0x40000000002a5100 in caml_empty_minor_heap ()
    #6  0x40000000002c5ca0 in caml_call_gc ()
    #7  0x40000000002a5100 in caml_empty_minor_heap ()
    #8  0x40000000002c5ca0 in caml_call_gc ()
    #9  0x40000000002a5100 in caml_empty_minor_heap ()
    #10 0x40000000002c5ca0 in caml_call_gc ()
    #11 0x40000000002a5100 in caml_empty_minor_heap ()

I don't have access to an ia64, so I am unable to do much about this.

The fault occurs in a (not uploaded) Debian packaging for Felix 1.1.1, the original tarball is located here:

    http://felix.sourceforge.net/flx_1.1.0_src.tgz

It should build on Unix (or Windows XP64 if Ocaml supports that, though I haven't tried it).

Yes, it IS possible there is a bug in the source algorithm -- in fact, there definitely used to be an unchecked overrun -- however the test is deterministic, so it should fail on all architectures with the same word size at least --- it works fine on amd64. [The algorithm DOES contain a potentially infinite recursion which is supposed to be limited]

The actual algorithm is probably part of the flx_macro module, since the test is exercising the macro processor.

It is (just) possible a deep recursion is overflowing the stack, corrupting memory, and causing the gc to get stuck. Exactly how this could happen I don't know (since it doesn't on other platforms). The test has been around for a long time (over a year I think).

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
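
For readers unfamiliar with what "a potentially infinite recursion which is supposed to be limited" might look like, here is a minimal sketch of a depth-limited macro expander. This is NOT the flx_macro code: the `term` type, the `expand` function, the `Expansion_too_deep` exception and the `max_depth` value are all hypothetical names chosen for illustration. It only shows the general pattern of threading a depth counter through a recursive expansion so that a self-referential macro raises an exception rather than recursing until the stack is exhausted.

```ocaml
(* Hypothetical illustration only; not taken from the Felix sources. *)
type term =
  | Lit of string          (* literal text, expands to itself *)
  | Call of string         (* reference to a named macro *)
  | Seq of term list       (* sequence of terms, expanded in order *)

exception Expansion_too_deep of int

(* Assumed limit; a real macro processor would choose its own value. *)
let max_depth = 1000

(* Look up a macro body by name, returning None if it is undefined. *)
let lookup env name =
  try Some (List.assoc name env) with Not_found -> None

(* Expand a term, raising Expansion_too_deep once the recursion
   depth exceeds max_depth. *)
let rec expand env depth t =
  if depth > max_depth then raise (Expansion_too_deep depth);
  match t with
  | Lit s -> s
  | Call name ->
    (match lookup env name with
     | Some body -> expand env (depth + 1) body
     | None -> name)       (* undefined macros are left as their name *)
  | Seq ts -> String.concat "" (List.map (expand env (depth + 1)) ts)

let () =
  let env =
    [ "greet", Seq [Lit "hello "; Call "who"];
      "who", Lit "world";
      "loop", Call "loop" ]  (* a macro that refers to itself *)
  in
  print_endline (expand env 0 (Call "greet"));
  (try print_endline (expand env 0 (Call "loop"))
   with Expansion_too_deep d -> Printf.printf "gave up at depth %d\n" d)
```

With a check like this in place, a runaway expansion fails in the same way on every platform. Without it (or with a limit set higher than the stack can accommodate), how far the recursion gets before something breaks depends on the stack size and frame layout of the particular architecture.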