Original bug ID: 7750
Reporter: @stijn-devriendt
Assigned to: @damiendoligez
Status: resolved (set by @alainfrisch on 2018-11-09T13:27:26Z)
Resolution: fixed
Priority: high
Severity: major
Platform: Linux
OS: Ubuntu
OS Version: 14.04
Version: 4.05.0
Target version: 4.08.0+dev/beta1/beta2
Category: runtime system and C interface
Related to: #7158
Parent of: #7180
Monitored by: @nojb @gasche @hcarty @dbuenzli @alainfrisch
Bug description
In a long-running application we make heavy use of Bigarrays (easily pushing 250MiB/s per application). With 4 of these applications running on a server with 128GiB of RAM, the application stalled with Out_of_memory exceptions. Apparently each of them was eating 30GiB of memory with mostly stale Bigarrays.
After triggering compaction on one of the applications through RPC, memory use dropped to a few MiB (see additional info). Memory use as observed by the OS also fell steeply as all the stale bigarrays were disposed of. (This also proved that it wasn't a memory leak in other parts of our code.)
In the output one can see that the ratio of minor/major collections is heavily skewed in favor of minor collections (36000/137). If many bigarrays live on the major heap, this means they will linger for a long time before getting cleaned up. Also, because the (default) space_overhead of 80% wasn't hit (1.3GiB free out of 1.8GiB), the major collection wasn't sped up either.
I created a small test case that I believe sufficiently represents the issue at hand. The provided test application easily takes 11GiB of memory, where only about 4-6GiB (?) would be expected. Setting the GC parameters space_overhead or max_overhead does not help. I believe it might help our normal application a bit, since a more typical memory allocation pattern is expected there, but that remains to be tested.
The test application essentially does the following:
- create a very large major heap (about 2GiB), causing the major slice to do less work
- repeatedly create 800MiB of bigarrays, then flush the minor heap so they get promoted to the major heap
- keep track of the bigarrays for 3 iterations, so one would expect on the order of 2GiB + (3*800MiB) = 4.4GiB of memory
As for the trigger of the bad behavior, I believe the fix for #7158 caused many fewer major garbage collections in our case (we also make use of Lwt_preemptive); this issue started occurring after we upgraded the compiler from 4.03.0 to 4.05.0.
Secondly, I believe that the last question posed in #7180 may be related here:
> if p is capped at 0.3, why do we reset caml_extra_heap_resources to 0.0?
> Shouldn't it be decreased by the actual amount of work that was done to cope
> with the steady state assumption?
In the test case, one can see that immediately after creating the bigarrays 'p' is capped to 0.3 and the remaining slices are rather small. In essence, it means that the garbage collector forgets about the excess and slows down significantly.
One potential fix would be to leave that excess in extra_heap_resources for the next iteration. Another improvement would be to not only look at the free/total ratio, but also to add either a time-based or an absolute-value-based trigger (e.g. whenever promoted_words crosses a configurable boundary).
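A rough OCaml sketch of the absolute-value trigger suggested above, implemented in user code rather than in the runtime. The boundary value and both names (`promoted_boundary`, `maybe_speed_up_gc`) are illustrative assumptions, not part of any existing API:

```ocaml
(* Hypothetical sketch of an absolute-value trigger: after each batch of
   allocations, compare the promoted-word count against a configurable
   boundary and request extra major-GC work when it is crossed. *)
let promoted_boundary = 16 * 1024 * 1024  (* in words; tune per workload *)

let last_promoted = ref 0.

let maybe_speed_up_gc () =
  let s = Gc.quick_stat () in
  if s.Gc.promoted_words -. !last_promoted > float_of_int promoted_boundary
  then begin
    (* Gc.major_slice n does a minor collection plus enough major-GC
       work to free (on average) n words. *)
    ignore (Gc.major_slice promoted_boundary);
    last_promoted := s.Gc.promoted_words
  end
```

Calling such a helper from the application's allocation hot path would approximate the proposed trigger without any runtime changes.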
Steps to reproduce
ocamlbuild -lbigarray test.native && ./test.native
BEWARE: easily eats around 11-12GiB of memory
open Bigarray
let minor_heap_size = Gc.((get ()).minor_heap_size)
let () = Printf.printf "Minor heap size = %dB\n%!" (minor_heap_size*8)
let display_stat state =
  let open Gc in
  let s = stat () in
  Printf.printf
    "%s: minor_collections(%d), major_collections(%d), compactions(%d), heap_words(%d), live_words(%d), free_words(%d)\n%!"
    state
    s.minor_collections
    s.major_collections
    s.compactions
    s.heap_words
    s.live_words
    s.free_words
(* Create a bigarray and fill it to ensure memory is actually faulted in *)
let loop_inner count =
  let rec aux count acc =
    if count > 0
    then
      let a = Array1.create Char c_layout (8*1024*1024) in
      let () = Array1.fill a '0' in
      aux (count - 1) (a :: acc)
    else
      acc
  in
  aux count []
(* Flush the minor heap 'count' times;
   returns the newly created data *)
let flush_minor count =
  let rec flush_once remaining acc =
    if remaining < 0
    then acc
    else flush_once (remaining - 1) (remaining :: acc)
  in
  let rec flush remaining acc =
    if remaining = 0
    then acc
    else flush (remaining - 1) (flush_once (minor_heap_size/2) acc)
  in
  flush count []
(* Eat about 2MiB minor * 1000 = 2G of heap space *)
let large_heap = flush_minor 1000
let () =
  let gc = Gc.get () in
  let () = gc.Gc.verbose <- 0x2C7 in
  (* The following does not help *)
  (* let () = gc.Gc.space_overhead <- 20 in
     let () = gc.Gc.max_overhead <- 300 in *)
  Gc.set gc
(* Repeatedly create 'inner_count' bigarrays, then flush the minor heap
   a couple of times to ensure the bigarrays are promoted to the major
   heap. The min1/min2/_min3 arguments control how many iterations each
   set of bigarrays lives; this is used to ensure they get promoted to
   the major heap. *)
let rec loop_outer inner_count min1 min2 _min3 =
  let array_list = loop_inner inner_count in
  let () = display_stat "before minor" in
  let _ = flush_minor 2 in
  let () = display_stat "after minor" in
  loop_outer inner_count array_list min1 min2

let () = loop_outer 100 [] [] []
An update is due here. After further investigation it appears we found an issue with our code, where we passed a pre-allocated buffer into caml_ba_alloc.
Unfortunately, when doing this, the size variable (https://github.com/ocaml/ocaml/blob/trunk/byterun/bigarray.c#L93) remains zero and the call to caml_alloc_custom receives that zero as extra size. Because of this, the GC is not sped up.
After fixing this issue, memory usage drops significantly (to about 8GiB per process for the same workload).
I think all that remains is to assess whether it's expected that the provided example eats a significant amount of memory and whether limiting the major GC to p=0.3 doesn't risk throttling it too much. We did find that caml_alloc_custom calls caml_check_urgent_gc, which might reduce the risk of letting memory allocation get out of control.
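For code that must keep wrapping pre-allocated buffers, the missing accounting described above can be approximated from the OCaml side by explicitly requesting major-GC work proportional to the externally held memory. A minimal sketch under that assumption; the helper name `account_external_bytes` is made up for illustration and this is not the runtime's fix:

```ocaml
(* When a bigarray wraps memory the runtime did not allocate, the GC is
   told its extra size is zero and will not speed up. As a workaround,
   explicitly request major-GC work proportional to the external bytes. *)
let account_external_bytes nbytes =
  let bytes_per_word = Sys.word_size / 8 in
  (* Gc.major_slice expects an amount of work expressed in words. *)
  ignore (Gc.major_slice (nbytes / bytes_per_word))
```

For example, after attaching an 8MiB external buffer one would call `account_external_bytes (8 * 1024 * 1024)` so the collector paces itself as if it had been told about the allocation.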
Additional information
BEFORE:
{
'minor_words': 9435447783.000000,
'promoted_words': 778137986.000000,
'major_words': 2416836642.000000,
'minor_collections': 36080,
'major_collections': 137,
'heap_words': 225660928, = 225M * 8B = ~1.8GiB
'heap_chunks': 27,
'live_words': 62402914, = 62M * 8B = ~480MiB
'live_blocks': 13320280,
'free_words': 163239041, = ~1.3GiB
'free_blocks': 3332791,
'largest_free': 14304445,
'fragments': 18973, = 1MiB
'compactions': 5,
'top_heap_words': 427078656,
'stack_size': 393}
AFTER:
{
'minor_words': 9435449457.000000,
'promoted_words': 778139391.000000,
'major_words': 2416838047.000000,
'minor_collections': 36082,
'major_collections': 139,
'heap_words': 7190016, = 56MiB
'heap_chunks': 10,
'live_words': 3778914, = 24MiB
'live_blocks': 959892,
'free_words': 3411102, = 27MiB
'free_blocks': 3,
'largest_free': 1352704,
'fragments': 0,
'compactions': 6,
'top_heap_words': 427078656,
'stack_size': 393}