Mantis Bug Tracker

ID: 0006919
Project: OCaml
Category: runtime system and C interface
View Status: public
Date Submitted: 2015-06-27 04:17
Last Update: 2015-07-10 16:09
Reporter: ygrek
Assigned To:
Priority: urgent
Severity: crash
Reproducibility: always
Status: closed
Resolution: fixed
Platform:
OS:
OS Version:
Product Version: 4.02.2
Target Version: 4.02.3+dev
Fixed in Version: 4.02.3+dev
Summary: 0006919: corrupted final_table
Description: We are experiencing strange crashes in the GC after switching from 4.02.1 to 4.02.2. I don't have a small repro case yet (so I cannot rule out misbehaving C bindings etc., although the code is stable with 4.02.1); maybe you have a quick idea based on the symptoms.
 My investigation led me to the following changeset:

 https://github.com/ocaml/ocaml/commit/444d6c2eb3e82a003c40c10b0b608909fa1f9a78#diff-ff9cb580dcca5bf97a4e407aba803b81R260

 AFAIU it changes the behaviour so that final_table offsets are no longer updated after every minor collection. I do not know whether
 that is an important invariant.
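To make the invariant concrete, here is a toy model in C. It is not the runtime's code: struct final_model, model_register, model_empty_young and model_invariant_holds are hypothetical names. The sketch assumes, consistent with the debug assertion old == young in byterun/finalise.c, that entries registered since the last minor collection occupy the range [old, young) and that caml_final_empty_young folds them into the old part by setting old = young.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_TABLE_SIZE 16

/* Toy model of a finaliser table (hypothetical; not the OCaml runtime).
   Entries [0, old) were registered before the last minor collection,
   entries [old, young) were registered since then. */
struct final_model {
  const void *val[MODEL_TABLE_SIZE]; /* finalised values */
  size_t old;                        /* end of the old part */
  size_t young;                      /* end of the young part */
};

/* Registering a finaliser appends to the young part. */
static void model_register(struct final_model *t, const void *v) {
  assert(t->young < MODEL_TABLE_SIZE); /* no table growth in this sketch */
  t->val[t->young++] = v;
}

/* Stand-in for caml_final_empty_young: once a minor collection is over,
   every entry counts as old, so the two indices meet again. */
static void model_empty_young(struct final_model *t) {
  t->old = t->young;
}

/* The property the debug runtime asserts before scanning the table. */
static int model_invariant_holds(const struct final_model *t) {
  return t->old == t->young;
}
```

Registering one finaliser leaves old = 0 and young = 1, so the check fails until model_empty_young runs; the question is whether some path can skip that step.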

 Here are the details of my issue, in case they are of any use:

 It crashes when calling functions from final_table. In my case the entry is a Gc alarm registered by ocamlnet, but that alarm
 just sets one mutable variable, so it is not a suspect.

 At the start of the program, final_table looks all right, with one entry like this:

(gdb) ml_dump/r final_table 4
*0x1c3c300: Closure( camlGc__call_alarm_1056 , 0x3 )
*0x1c3c308: ( ( 1 ) , Closure( camlNetsys_win32__fun_2219 , 0x3 ) )
*0x1c3c310: NULL

*0x1c3c318: NULL

 but at crash time it is obviously wrong:

(gdb) ml_dump final_table 4
*0x227e300: Closure( camlGc__call_alarm_1056 , 0x3 )
*0x227e308: u'Private_Dirty: 12 kB'
*0x227e310: NULL

*0x227e318: NULL

Instead of the "Private_Dirty" string, it can be any OCaml value.

 The stack trace looks like this:

(gdb) bt
#0 0x00000000005d8467 in camlGc__call_alarm_1056 () at gc.ml:87
#1 0x00000000006633ba in caml_start_program ()
#2 0x000000000065f3db in caml_gc_compaction ()
#3 0x00000000004acb71 in camlMemory__reclaim_s_1540 () at memory.ml:77
#4 0x00000000004acdc5 in camlMemory__reclaim_1555 () at memory.ml:92

 When run with the debug runtime, it fails the assertion at line 163 of byterun/finalise.c:

void caml_final_do_strong_roots (scanning_action f)
{
  uintnat i;
  struct to_do *todo;

  Assert (old == young);
  /* ... */
}

 I would be very grateful for any pointers on how to debug this or on what further information to provide.
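The crash dump (a closure that still looks right, next to a value field pointing at an unrelated string) can be illustrated with a second toy model. This is one plausible reading, not the runtime's algorithm: it assumes that, as the Assert (old == young) at the top of caml_final_do_strong_roots suggests, the pass that fixes up table entries during compaction only visits [0, old). The names compact_model, model_compact_move and model_slot_value are hypothetical, and table slots hold heap indices rather than raw pointers to keep the sketch safe.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_HEAP_CELLS 4
#define MODEL_SLOTS 4

/* Toy model (hypothetical names, not OCaml's data structures): the
   "heap" is an array of string labels, and each finaliser-table slot
   records the heap index of its finalised value. */
struct compact_model {
  const char *heap[MODEL_HEAP_CELLS];
  size_t slot[MODEL_SLOTS];
  size_t old, young; /* slots [0, old) are old, [old, young) are young */
};

/* Moves the object at `from` to `to`, then lets an unrelated object
   (`newcomer`) occupy the vacated cell. Like a scan guarded by
   Assert (old == young), the slot fix-up only visits [0, old).
   Returns 1 if the debug check would pass, 0 if it would fire. */
static int model_compact_move(struct compact_model *m, size_t from,
                              size_t to, const char *newcomer) {
  int check_ok = (m->old == m->young);
  assert(from < MODEL_HEAP_CELLS && to < MODEL_HEAP_CELLS);
  m->heap[to] = m->heap[from];
  m->heap[from] = newcomer;
  for (size_t i = 0; i < m->old; i++)
    if (m->slot[i] == from)
      m->slot[i] = to;
  return check_ok;
}

/* What a finaliser would receive as the value of slot `i`. */
static const char *model_slot_value(const struct compact_model *m, size_t i) {
  return m->heap[m->slot[i]];
}
```

With old = 0 and young = 1, slot 0 keeps pointing at the vacated cell and the finaliser sees the newcomer, much like the u'Private_Dirty: 12 kB' value in the dump; with old == young the slot follows the moved object. A release build has no assertion, so the stale slot goes unnoticed until the finaliser runs.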
Steps To Reproduce: None for now, but I can reproduce it locally in less than 5 minutes.
Tags: patch
Attached Files:


Notes
(0014165)
ygrek (reporter)
2015-06-27 05:22

This patch seems to fix it for me:

diff --git a/byterun/minor_gc.c b/byterun/minor_gc.c
index 4aaec96..4db3f33 100644
--- a/byterun/minor_gc.c
+++ b/byterun/minor_gc.c
@@ -260,6 +260,10 @@ void caml_empty_minor_heap (void)
     caml_final_empty_young ();
     if (caml_minor_gc_end_hook != NULL) (*caml_minor_gc_end_hook) ();
   }
+  else
+  {
+    caml_final_empty_young ();
+  }
 #ifdef DEBUG
   {
     value *p;
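The patch makes caml_final_empty_young run on both branches of the if at the end of caml_empty_minor_heap. Below is a reduced model of just that bookkeeping. The names are hypothetical, and the condition is an assumption: the diff does not show the if itself, so this sketch takes it to be skipped when the minor heap is empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the finaliser-table indices only. */
struct final_idx {
  size_t old, young;
};

/* Stand-in for caml_final_empty_young. */
static void model_final_empty_young(struct final_idx *t) {
  t->old = t->young;
}

/* 4.02.2 shape: the young part is emptied only inside the if. */
static void model_empty_minor_heap_buggy(struct final_idx *t,
                                         bool minor_heap_empty) {
  if (!minor_heap_empty) {
    /* ... promote live young objects ... */
    model_final_empty_young(t);
  }
}

/* Patched shape: mirrors the added else branch, so the young part
   is emptied on every call. */
static void model_empty_minor_heap_fixed(struct final_idx *t,
                                         bool minor_heap_empty) {
  if (!minor_heap_empty) {
    /* ... promote live young objects ... */
    model_final_empty_young(t);
  } else {
    model_final_empty_young(t);
  }
}
```

With one pending registration (old = 0, young = 1) and an empty minor heap, the buggy shape leaves old != young, the state the debug assertion rejects; the patched shape restores old == young either way.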
(0014173)
doligez (administrator)
2015-06-29 17:58

Thanks for the report. I think you've nailed it, so you shouldn't spend time on a repro case.
(0014181)
edwin (reporter)
2015-07-01 18:28

FWIW I just ran into this too, with various symptoms: the application crashing in the pthread_cancel unwinder on exit, a segfault after fork when Lwt is built with libev (but not when built without it), and a segfault after fork when using OpenSSL from Lwt even without libev. See https://github.com/ocsigen/lwt/issues/168

I created the test case below before finding this bug (the pattern indeed comes from OCamlnet's Netsys_pollset_win32.ml), and I confirm that the patch fixes both the test case and the segfaults in my application:

let x = ref false
let _ = Gc.create_alarm (fun () -> x := true)
let () =
  Gc.compact ();
  Gc.compact ()

(* ocamlc x.ml -runtime-variant d -o x && ./x
 ...
 file finalise.c; line 163 ### Assertion failed: old == young *)
(0014197)
doligez (administrator)
2015-07-10 16:09

Thanks for the report, the fix, and the test case.

Fixed in 4.02 branch (rev 16197).

Issue History
Date Modified Username Field Change
2015-06-27 04:17 ygrek New Issue
2015-06-27 05:22 ygrek Note Added: 0014165
2015-06-29 17:51 doligez Tag Attached: patch
2015-06-29 17:51 doligez Priority normal => urgent
2015-06-29 17:51 doligez Status new => acknowledged
2015-06-29 17:58 doligez Note Added: 0014173
2015-07-01 18:28 edwin Note Added: 0014181
2015-07-10 16:09 doligez Note Added: 0014197
2015-07-10 16:09 doligez Status acknowledged => closed
2015-07-10 16:09 doligez Resolution open => fixed
2015-07-10 16:09 doligez Fixed in Version => 4.02.3+dev
2015-07-10 16:09 doligez Target Version => 4.02.3+dev
2017-02-23 16:43 doligez Category OCaml runtime system => runtime system
2017-03-03 17:45 doligez Category runtime system => runtime system and C interface

