From: skaller <skaller@u...>
Subject: Help finding possible bug in 3.09/ia64
I am wondering whether anyone has access to an ia64 machine and would be
willing to help track down a possible bug in Ocaml 3.09
on that platform. Details and background follow.
I would prefer confirmation before reporting a bug; it is
also kind of hard to report a bug on an architecture
I don't have access to .. :)

Mike Furr has found a problem with one of the Felix regression
tests on the ia64 platform. It appears to me that this is
most likely a bug in the ia64 runtime, and next most likely
a bug in the ia64 native code generator.

The code in question works fine on i386, amd64, ppc and
some other architectures. The program uses no C bindings,
does not use the Obj module, and makes no unsafe
array accesses.

The error manifests during 'make test' as:

PATH=bin:"$PATH" LD_LIBRARY_PATH=rtl:"$LD_LIBRARY_PATH" bin/flxg -Ilib tut/examples/mac126
  .. ERROR CODE 0xb
TESTFILE -- ERROR! tut/examples/mac126

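Error code 0xb (11) matches SIGSEGV. The segfault results from
what appears to be runaway re-entry into the garbage collector
(note the repeating caml_call_gc / caml_empty_minor_heap frames):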

(gdb) bt
#0  0x40000000002a04e0 in caml_oldify_local_roots ()
#1  0x40000000002a5100 in caml_empty_minor_heap ()
#2  0x40000000002a5360 in caml_minor_collection ()
#3  0x40000000002a1b50 in caml_garbage_collection ()
#4  0x40000000002c5ca0 in caml_call_gc ()
#5  0x40000000002a5100 in caml_empty_minor_heap ()
#6  0x40000000002c5ca0 in caml_call_gc ()
#7  0x40000000002a5100 in caml_empty_minor_heap ()
#8  0x40000000002c5ca0 in caml_call_gc ()
#9  0x40000000002a5100 in caml_empty_minor_heap ()
#10 0x40000000002c5ca0 in caml_call_gc ()
#11 0x40000000002a5100 in caml_empty_minor_heap ()
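
For anyone with an ia64 box who wants to poke at the GC without
building all of Felix, here is a small stand-alone sketch. It is
hypothetical and not taken from the Felix sources. The backtrace dies
in caml_oldify_local_roots, which during a minor collection walks the
OCaml stack looking for live pointers into the minor heap. So the
sketch builds a deep stack in which every frame keeps a freshly
allocated block live across a non-tail call, then forces a minor
collection at the deepest point. The names deep and check and the
depth of 50_000 are arbitrary choices, and it avoids newer stdlib
functions so it should also be usable with 3.09.

```ocaml
(* Hypothetical GC stress sketch, not Felix code.
   Each frame of [deep] holds a boxed tuple that stays live across the
   recursive call, so a minor GC must find it by scanning the stack. *)
let rec deep n =
  if n = 0 then begin
    Gc.minor ();                          (* force a minor GC at max depth *)
    []
  end else begin
    let cell = (n, string_of_int n) in    (* allocated in the minor heap *)
    let rest = deep (n - 1) in            (* non-tail call: frame stays *)
    cell :: rest
  end

(* Every cell must survive the collections intact. *)
let check depth =
  let l = deep depth in
  List.length l = depth
  && List.for_all (fun (k, s) -> string_of_int k = s) l

let () =
  let depth = 50_000 in
  Printf.printf "depth %d: %s\n" depth
    (if check depth then "ok" else "CORRUPTED")
```

Compiled with ocamlopt, a healthy runtime should print
"depth 50000: ok"; a segfault or "CORRUPTED" on ia64 would suggest
the runtime rather than Felix.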

I don't have access to an ia64, so I am unable to
do much about this.

The fault occurs in a (not uploaded) Debian package
of Felix 1.1.1; the original tarball is located here:

http://felix.sourceforge.net/flx_1.1.0_src.tgz

It should build on Unix (or Windows XP64 if Ocaml
supports that, though I haven't tried it).

Yes, it IS possible there is a bug in the source
algorithm (in fact, there definitely used to be
an unchecked overrun). However, the test is deterministic,
so such a bug should show up on all architectures with the
same word size, at least; yet the test works fine on amd64,
which is 64-bit just like ia64.

[The algorithm DOES contain a potentially infinite
recursion, which is supposed to be limited.]
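
To illustrate what a potentially infinite but limited recursion
looks like, here is a toy depth-limited expander. It is hypothetical
and is NOT the flx_macro code: the expr type, expand,
Expansion_too_deep and the limit of 1000 are all invented for
illustration. A macro may expand to something containing itself, and
an explicit depth counter turns that into a clean exception.

```ocaml
(* Hypothetical toy expander, not Felix's flx_macro. *)
type expr =
  | Name of string
  | App of expr * expr

exception Expansion_too_deep of string

let max_depth = 1000   (* assumed limit, chosen for illustration *)

(* [env] maps macro names to their bodies; [depth] counts how many
   macro expansions are nested along the current path. *)
let rec expand env depth e =
  if depth > max_depth then
    raise (Expansion_too_deep (match e with Name s -> s | App _ -> "<app>"))
  else
    match e with
    | Name s ->
        (match (try Some (List.assoc s env) with Not_found -> None) with
         | Some body -> expand env (depth + 1) body   (* one more level *)
         | None -> e)                                 (* not a macro *)
    | App (f, a) -> App (expand env depth f, expand env depth a)
```

Without the depth check, a self-referential macro such as
x = App (x, y) would recurse until the stack ran out.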

The algorithm in question is probably part of the
flx_macro module, since the test exercises
the macro processor.

It is (just) possible that a deep recursion is overflowing
the stack, corrupting memory, and causing the GC to
get stuck. Exactly how this could happen I don't know
(since it doesn't happen on other platforms). The test
has been around for a long time (over a year, I think).
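
A related quick check on ia64 is whether the native runtime there
turns a stack overflow into the Stack_overflow exception at all, or
simply segfaults; in native code this depends on the platform and
Ocaml version. The probe below is a hypothetical sketch (sum_to and
probe are made-up names): it does a non-tail recursion to a depth
given on the command line and reports what happened.

```ocaml
(* Hypothetical stack probe: [sum_to] is deliberately not
   tail-recursive, so each level consumes a stack frame. *)
let rec sum_to n = if n = 0 then 0 else n + sum_to (n - 1)

let probe depth =
  try Printf.sprintf "depth %d ok (sum %d)" depth (sum_to depth)
  with Stack_overflow ->
    Printf.sprintf "depth %d: Stack_overflow raised" depth

let () =
  let depth =
    if Array.length Sys.argv > 1 then int_of_string Sys.argv.(1)
    else 100_000 in
  print_endline (probe depth)
```

Built with ocamlopt, running it with increasing depths (say 100000,
then 10000000) shows whether an overflow arrives as the exception or
as a crash.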


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net