Allocating custom blocks become expensive with large major heaps #7676

vicuna · 2017-11-22T15:00:35Z

Original bug ID: 7676
Reporter: @alainfrisch
Assigned to: @damiendoligez
Status: resolved (set by @damiendoligez on 2018-11-13T12:31:33Z)
Resolution: fixed
Priority: normal
Severity: minor
Fixed in version: 4.08.0+dev/beta1/beta2
Category: runtime system and C interface
Related to: #4108 #6133 #6294 #6962
Monitored by: @ygrek @dbuenzli @yakobowski

Bug description

Custom blocks allocated in or moved to the major heap add some "pressure" to trigger a major GC slice. The current criterion is such that a GC slice is triggered more often if the size of the major heap grows. This can result is noticeable slowdown.

The function implementing this strategy is:

CAMLexport void caml_adjust_gc_speed (mlsize_t res, mlsize_t max)
{
  if (max == 0) max = 1;
  if (res > max) res = max;
  caml_extra_heap_resources += (double) res / (double) max;
  if (caml_extra_heap_resources > 1.0){
    CAML_INSTR_INT ("request_major/adjust_gc_speed_1@", 1);
    caml_extra_heap_resources = 1.0;
    caml_request_major_slice ();
  }
  if (caml_extra_heap_resources
           > (double) caml_minor_heap_wsz / 2.0
             / (double) caml_stat_heap_wsz) {
    CAML_INSTR_INT ("request_major/adjust_gc_speed_2@", 1);
    caml_request_major_slice ();
  }
}

The second criterion becomes easier and easier to satisfy as the size of the major heap grows. It is not clear to me why the ratio of the size of the minor and major heaps is taken into account, nor why the constant 2 was chosen.

See below for a program that illustrates the behavior. It runs two computations starting from different heap sizes. The first one allocates a large block (in the major heap), then does many small allocations (to trigger a minor GC). The second one create an in_channel (which is a custom block, with a "pressure ratio" of 1/1000, i.e. one asks to force a full GC cycle every 1000 allocations). The timings are:

n =     2000 => 0.22s / 0.23s
n =     4000 => 0.22s / 0.35s
n =     6000 => 0.23s / 0.47s
n =     8000 => 0.22s / 0.73s
n =    10000 => 0.26s / 0.88s
n =    12000 => 0.23s / 1.06s
n =    14000 => 0.23s / 1.02s
n =    16000 => 0.24s / 1.19s
n =    18000 => 0.23s / 1.35s
n =    20000 => 0.25s / 1.30s
n =    22000 => 0.28s / 1.55s
n =    24000 => 0.25s / 1.83s
n =    26000 => 0.26s / 1.60s
n =    28000 => 0.23s / 1.88s
n =    30000 => 0.23s / 1.82s
n =    32000 => 0.25s / 2.79s
n =    34000 => 0.25s / 2.12s
n =    36000 => 0.23s / 2.66s
n =    38000 => 0.21s / 2.52s
n =    40000 => 0.25s / 2.38s

n is proportional to the size of the initial heap. One sees clearly that the speed of the first computation does not depend on the heap size, while the speed of the second one is directly impacted.

The marginal cost of opening a channel raises to about 2ms and would continue to grow with larger major heaps.

This specific problem with in_channel could be addressed in part by reducing the ratio 1/1000, but the real issue here might be with the criterion above, which impacts all kinds of custom objects (that add "pressure").

Additional information

let a = ref [||]

let test n =
  (* Force the major heap to grow *)
  Gc.compact ();
  a := Array.init n (fun _ -> Array.make 10000 0);
  Gc.compact ();
  let t0 = Sys.time () in
  (* First, try with allocating an array -> the speed of
     this is independent of the initial size of the major heap. *)
  for i = 1 to 1000 do
    let a = Array.make 4096 0 in
    (* Enough allocation to trigger a minor GC *)
    for _j = 1 to 1024 do ignore (Array.make 256 0) done;
    a.(0) <- i
  done;
  let t1 = Sys.time () in
  Gc.compact ();
  let t2 = Sys.time () in
  (* Second, try with allocating a custom (here, an in_channel) ->
     this becomes slower as the heap grows. *)
  for i = 1 to 1000 do
    let ic = open_in "foobar" in
    (* Enough allocation to trigger a minor GC *)
    for _j = 1 to 1024 do ignore (Array.make 256 0) done;
    close_in ic;
  done;
  let t3 = Sys.time () in
  Printf.printf "n = % 8i => %.02fs / %.02fs\n%!" n (t1 -. t0) (t3 -. t2)


let () =
  let oc = open_out "foobar" in close_out oc;
  for i = 1 to 20 do
    test (i * 2000)
  done

vicuna · 2017-12-20T17:06:18Z

Comment author: @xavierleroy

Is this another incarnation of the familiar "allocation of custom objects mess up the speed of the major GC" problem? Or a distinct, new problem?

vicuna · 2017-12-21T07:59:06Z

Comment author: @alainfrisch

Is this another incarnation of the familiar "allocation of custom objects mess up the speed of the major GC" problem? Or a distinct, new problem?

Of course it's in the same family of problems, but I believe it's distinct from other reports. What I find specifically weird here is the second condition:

(caml_extra_heap_resources > (double) caml_minor_heap_wsz / 2.0 / (double) caml_stat_heap_wsz)

which can ultimately lead to a major GC for every allocation of a custom block. I don't understand the rationale behind computing the ratio of the size of the two heaps and comparing it with "caml_extra_heap_resources".

The first condition, OTOH, seems well understood (including its problems), and corresponds to the documented behavior.

Other tickets are about:

tracking custom blocks in the minor heap
allowing to specify the pressure on the GC as a ratio of the major heap size
making it easy for libraries to give control to their client code about the ratio to be used for its custom blocks

vicuna · 2018-11-09T13:20:58Z

Comment author: @alainfrisch

The sample program now exihibits a much nicer behavior thanks to #1738 . That said, I believe it would still be worth revisiting the condition

if (caml_extra_heap_resources
> (double) caml_minor_heap_wsz / 2.0
/ (double) caml_stat_heap_wsz)

which looks super weird to me, so I'm leaving this ticket open.

vicuna · 2018-11-13T12:30:40Z

Comment author: @damiendoligez

#2144

vicuna closed this as completed Nov 13, 2018

vicuna added the stdlib label Mar 14, 2019

vicuna assigned damiendoligez Mar 14, 2019

This was referenced Dec 20, 2017

Wish: make max size of custom space runtime-configurable #6133

Closed

Memory-mapping of bigarrays may exhaust address space #4108

Closed

Poor tracking of extra heap resources #6294

Closed

bigarray lacks a free-like call. #6962

Closed

vicuna added the bug label Mar 20, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allocating custom blocks become expensive with large major heaps #7676

Allocating custom blocks become expensive with large major heaps #7676

vicuna commented Nov 22, 2017

vicuna commented Dec 20, 2017

vicuna commented Dec 21, 2017

vicuna commented Nov 9, 2018

vicuna commented Nov 13, 2018

Allocating custom blocks become expensive with large major heaps #7676

Allocating custom blocks become expensive with large major heaps #7676

Comments

vicuna commented Nov 22, 2017

Bug description

Additional information

vicuna commented Dec 20, 2017

vicuna commented Dec 21, 2017

vicuna commented Nov 9, 2018

vicuna commented Nov 13, 2018