core dump on Linux #3022

Closed
vicuna opened this issue Nov 7, 2001 · 4 comments

vicuna commented Nov 7, 2001

Original bug ID: 615
Reporter: administrator
Status: closed
Resolution: fixed
Priority: normal
Severity: minor
Category: ~DO NOT USE (was: OCaml general)

Bug description

Hi,

I got a core dump when running the Unison 2.7.7 program compiled with
OCaml 3.02.

The stack trace is as follows:

Core was generated by `unison work1'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
#0 0x4008abce in __libc_free (mem=0x409b4640) at malloc.c:3012
3012 malloc.c: No such file or directory.
(gdb) where
#0 0x4008abce in __libc_free (mem=0x409b4640) at malloc.c:3012
#1 0x809f51a in stat_free (blk=0x409b4640) at memory.c:335
#2 0x80a25c1 in finalize_channel (vchan=1083917876) at io.c:376
#3 0x809e57e in sweep_slice (work=96814) at major_gc.c:232
#4 0x809e7a6 in major_collection_slice () at major_gc.c:312
#5 0x809ed77 in minor_collection () at minor_gc.c:163
#6 0x809edca in check_urgent_gc (extra_root=1083917876) at minor_gc.c:174
#7 0x80a7be4 in alloc_custom (ops=0x80f7778, size=4, mem=1, max=1000) at custom.c:39
#8 0x80a2615 in alloc_channel (chan=0x816c4d8) at io.c:397
#9 0x80a263e in caml_open_descriptor (fd=7) at io.c:405
#10 0x80823ab in Pervasives_open_in_gen_218 ()
Cannot access memory at address 0x21.

Thanks,
Rok


vicuna commented Nov 8, 2001

Comment author: administrator

Hi Xavier,

thank you for your advice. I am keen to try your suggestion.
However, it might take me a few days. I will send you results
as soon as possible.

Cheers,
Rok


vicuna commented Nov 8, 2001

Comment author: administrator

The stack trace is as follows:

Core was generated by `unison work1'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
#0 0x4008abce in __libc_free (mem=0x409b4640) at malloc.c:3012
3012 malloc.c: No such file or directory.
(gdb) where
#0 0x4008abce in __libc_free (mem=0x409b4640) at malloc.c:3012
#1 0x809f51a in stat_free (blk=0x409b4640) at memory.c:335
#2 0x80a25c1 in finalize_channel (vchan=1083917876) at io.c:376
#3 0x809e57e in sweep_slice (work=96814) at major_gc.c:232
#4 0x809e7a6 in major_collection_slice () at major_gc.c:312
#5 0x809ed77 in minor_collection () at minor_gc.c:163
#6 0x809edca in check_urgent_gc (extra_root=1083917876) at minor_gc.c:174
#7 0x80a7be4 in alloc_custom (ops=0x80f7778, size=4, mem=1, max=1000) at custom.c:39
#8 0x80a2615 in alloc_channel (chan=0x816c4d8) at io.c:397
#9 0x80a263e in caml_open_descriptor (fd=7) at io.c:405
#10 0x80823ab in Pervasives_open_in_gen_218 ()
Cannot access memory at address 0x21.

It's usually impossible to debug a GC-related problem with just a
backtrace. However, in this particular instance the backtrace is really
telling, and I think I found what goes wrong.

If you're willing to recompile OCaml, could you please change
byterun/custom.c as follows:

CAMLextern value alloc_custom(struct custom_operations * ops,
                              unsigned long size,
                              mlsize_t mem,
                              mlsize_t max)
{
  mlsize_t wosize;
  value result;

  wosize = 1 + (size + sizeof(value) - 1) / sizeof(value);
  if (ops->finalize == NULL && wosize <= Max_young_wosize) {
    result = alloc_small(wosize, Custom_tag);
    Custom_ops_val(result) = ops;
  } else {
    result = alloc_shr(wosize, Custom_tag);
    Custom_ops_val(result) = ops;
    adjust_gc_speed(mem, max);
    /* result = check_urgent_gc(result); <==== COMMENT OUT THIS LINE */
  }
  return result;
}

and tell us if the problem goes away?

Thanks,

  • Xavier Leroy
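
One way to read the backtrace is that `finalize_channel` ran from a collection triggered inside `alloc_custom`, before `alloc_channel` had stored the channel pointer in the new block. The thread does not confirm this diagnosis, and a later comment reports that the suggested change did not prevent the crash. The C sketch below is only a toy model of that class of hazard: `toy_block`, `toy_alloc`, `toy_gc_point`, and `toy_finalize` are hypothetical names for illustration, not OCaml runtime functions. It shows the defensive pattern of giving the payload a well-defined value before any point where a finalizer could run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model; none of these names exist in the OCaml runtime. */
typedef struct toy_block {
    void (*finalize)(struct toy_block *); /* plays the role of ops->finalize */
    void *payload;                        /* plays the role of the channel pointer */
} toy_block;

static int toy_finalizer_runs = 0;

/* Finalizer that tolerates a block whose payload was never filled in. */
static void toy_finalize(toy_block *b)
{
    free(b->payload);   /* free(NULL) is defined to do nothing */
    b->payload = NULL;
    toy_finalizer_runs++;
}

/* Stand-in for a point where a collection may run and finalize a block. */
static void toy_gc_point(toy_block *b)
{
    if (b->finalize != NULL)
        b->finalize(b);
}

/* Allocate a block and make its payload well defined BEFORE the GC point. */
static toy_block *toy_alloc(int collect_during_alloc)
{
    toy_block *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->finalize = toy_finalize;
    b->payload = NULL;        /* the step that keeps the finalizer safe */
    if (collect_during_alloc)
        toy_gc_point(b);      /* worst case: finalized before the caller fills it in */
    return b;
}
```

If the `b->payload = NULL;` line were missing, the finalizer in this toy model would call `free` on an indeterminate pointer, which is undefined behavior and could crash inside `free`. Whether that matches the failure reported here is an assumption.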


vicuna commented Nov 9, 2001

Comment author: administrator

Hi Xavier,

I was able to reproduce a core dump again. It is hard to do it
deterministically, since a large number of files need to be
synchronized with Unison, over 130,000, and the problem patterns
are not yet obvious. I have encountered problems at several
different stages of execution, resulting either in an infinite
loop or core dump. Two instances were reported on this mailing list.

After another core dump, I tried your suggestion. It delayed the
core dump by a few directories, but did not prevent it.
The call stack was completely corrupted in gdb.

I have also tried Jerome's suggestion of upgrading to 3.03-alpha.
Everything worked fine with 3.03. I have also tried several other
scenarios with 3.03 that very likely caused problems with 3.02
before. So far, there are no problems. It seems that 3.03 is the
way to go. If I find any further problems, I will send you a report.

Thanks a lot to everyone. Your help has been fantastic. Keep up
the good work.

Cheers,
Rok


vicuna commented Nov 23, 2001

Comment author: administrator

Seems fixed in 3.03 alpha

@vicuna vicuna closed this as completed Nov 23, 2001
@vicuna vicuna added the bug label Mar 19, 2019