Mantis Bug Tracker

ID: 0007100
Project: OCaml
Category: otherlibs
View Status: public
Date Submitted: 2015-12-18 11:24
Last Update: 2017-11-13 16:15
Reporter: talex
Assigned To: doligez
Priority: normal
Severity: major
Reproducibility: always
Status: assigned
Resolution: open
Platform: (none)
OS: Linux and MirageOS
OS Version: (none)
Product Version: 4.02.3
Target Version: (none)
Fixed in Version: (none)
Summary: 0007100: Bigarray's caml_ba_alloc doesn't try GC if malloc fails

Description: If there happens to be no memory available when allocating a bigarray because a GC is due, then it raises Out_of_memory, even if memory would be available after a GC.

Steps To Reproduce: This program crashes with "Fatal error: exception Out_of_memory" if run in an environment with limited memory (so that malloc may return NULL; tested with "ulimit -Sv 52000"):

open Bigarray

let () =
  let rec loop () =
    let x = Array1.create Char c_layout 102400 in
    ignore x;
    loop () in
  loop ()

However, it works with an explicit call to the GC:

open Bigarray

let () =
  let rec loop () =
    let x =
      try Array1.create Char c_layout 102400
      with Out_of_memory ->
        print_endline "GC!";
        Gc.full_major ();
        Array1.create Char c_layout 102400 in
    ignore x;
    loop () in
  loop ()

Additional Information: MirageOS uses bigarrays extensively (via Cstruct), and this causes MirageOS unikernels to crash from time to time.

Tags: No tags attached.
Attached Files: (none)

Relationships
related to 0007198 (assigned, doligez): caml_alloc_custom/caml_alloc_final API easily leads to GC performance issues
related to 0007180 (assigned, doligez): Change of behavior in Gc.major_slice in 4.03
related to 0007158 (resolved): Event.sync forces a full major GC cycle every 5000 calls at most
has duplicate 0007670 (resolved, frisch): Strange Out of memory when using Bigarray
related to 0007671 (new): Another Strange Out of memory using Bigarray

Notes
(0015266)
doligez (administrator)
2016-01-22 17:25

The problem with "triggering a GC" is that you can easily get into a state where every allocation triggers a GC and the program gets bogged down to the speed of a snail, which is worse than crashing.
(0015268)
talex (reporter)
2016-01-23 18:28

Isn't that just how GC works? You run out of memory and then run a GC. Not running a GC because it *might* not free memory makes no sense to me (crashing afterwards if it fails might be OK though).

If OCaml doesn't run the GC when it runs out of memory, then applications have to do so instead. For example, we currently have:

https://github.com/talex5/qubes-mirage-firewall/blob/26adeee1da5aa6f7d468f0ada7341b1756575a4c/memory_pressure.ml#L39

Each time we get a network packet, we check the memory situation. If less than 10% is free, we call Gc.full_major. Compared to having OCaml do it, this means:

1. We become slow at close to 90% used, rather than close to 100%.
2. Sometimes we still crash (more margin => less chance of crash, but more RAM wasted).
3. Every input event (incoming packet, user commands, etc) needs to run the check.
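
A minimal sketch of the kind of check described above, not the code at the linked URL. The mem_stats record, the get_stats callback and the check_memory name are hypothetical; a real unikernel would read these figures from its platform's allocator.

```ocaml
(* Hypothetical memory statistics. How they are obtained is platform
   specific and is not shown here. *)
type mem_stats = { free_pages : int; total_pages : int }

(* Fraction of memory that must remain free; 10% as in the note above. *)
let min_free_fraction = 0.10

(* True when less than [min_free_fraction] of memory is free. *)
let under_pressure { free_pages; total_pages } =
  float_of_int free_pages < min_free_fraction *. float_of_int total_pages

(* Meant to be called on every input event. [get_stats] is supplied by
   the caller. Returns [true] if a full major collection was run. *)
let check_memory ~get_stats =
  if under_pressure (get_stats ()) then begin
    Gc.full_major ();
    true
  end else
    false
```

The three drawbacks listed above apply to any check of this shape: the threshold wastes the last 10% of memory, an allocation burst between two checks can still fail, and every event source has to remember to call check_memory.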
(0016616)
rixed (reporter)
2016-11-30 21:27

I think the problem with MirageOS's usage of bigarrays is more the value of CAML_BA_MAX_MEMORY (1 GB), which is very far from the average amount of memory one wants to spend on bigarrays in a microkernel (which would often run with 256 MB of RAM or even less).
This is made worse by the terrible page allocator of minios, which can only allocate a power-of-two number of pages for large allocations.
Therefore, MirageOS is going to malloc, say, 32 KiB for a 20 KiB cstruct, and tell the GC that "unless I have allocated 50000 such blocks there is no need to run garbage collection".
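
The figures in this note can be checked with a few lines of arithmetic. This sketch takes the note's numbers at face value (a 1 GiB CAML_BA_MAX_MEMORY, 20 KiB reported to the GC, 32 KiB actually consumed per block). The variable names are made up for illustration, and sizes are kept in KiB so the arithmetic also fits in 31-bit integers.

```ocaml
(* All sizes are in KiB. *)
let ba_max_memory_kib = 1024 * 1024 (* 1 GiB, the value cited above *)
let reported_kib = 20               (* size the GC is told about *)
let actual_kib = 32                 (* size consumed after rounding up *)

(* Number of blocks whose reported sizes add up to the limit. *)
let blocks_before_limit = ba_max_memory_kib / reported_kib

(* Memory actually consumed by that many blocks, in MiB. *)
let actual_mib = blocks_before_limit * actual_kib / 1024

let () =
  Printf.printf "%d blocks, about %d MiB actually allocated\n"
    blocks_before_limit actual_mib
```

Under these assumptions the limit corresponds to 52428 blocks (the "50000" in the note) and roughly 1638 MiB of real memory, several times the 256 MB such a system might have.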
(0017277)
xleroy (administrator)
2017-02-16 11:03

I think we or the Mirage people need to do something to address this issue, it's just unclear to me what needs to be done. @doligez could you please restart the discussion?
(0018488)
doligez (administrator)
2017-10-05 17:55

@talex, OCaml's GC doesn't work like that because it's incremental: it tries to do enough work, as the program is running, to make sure it won't ever run out of memory. When the program does run out of memory, we assume it means the program is allocating faster than it is dropping objects, which means its memory needs are increasing, so we increase the heap size.

For the CAML_BA_MAX_MEMORY problem, I have a possible solution: instead of using a constant, use a proportion of the heap size, set by the user or by the program. For example, if you set it at 100%, it means you are allocating half your memory to the heap, and the other half to bigarrays (along with other external data, if you use other libraries with custom objects).

Would that be a workable solution?
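
One possible reading of this proposal, as an illustrative sketch only: no such setting exists in the runtime discussed here, and the function names and the percent parameter are hypothetical.

```ocaml
(* Hypothetical: budget for external (out-of-heap) memory, expressed as
   a percentage of the current heap size instead of a fixed constant. *)
let external_budget ~heap_bytes ~percent = heap_bytes * percent / 100

(* Share of total memory (heap plus external budget) that goes to the
   heap for a given setting. *)
let heap_share ~percent = 100. /. (100. +. float_of_int percent)

let () =
  (* At 100%, a 64 MiB heap allows 64 MiB of external data,
     so the heap gets half of the total. *)
  let heap_bytes = 64 * 1024 * 1024 in
  Printf.printf "budget = %d bytes, heap share = %.2f\n"
    (external_budget ~heap_bytes ~percent:100)
    (heap_share ~percent:100)
```

Because the budget scales with the heap, a small unikernel would get a proportionally small limit rather than the fixed 1 GB discussed earlier in the thread.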
(0018495)
talex (reporter)
2017-10-06 16:47

doligez: yes, I was confused when I wrote this. I was imagining that OCaml's GC worked like Java's.

rixed pointed out in https://github.com/mirage/io-page/issues/38 that Mirage's io-page does not tell the GC how much memory could be freed by a collection, so that could be a big part of the problem. We should probably fix that and reopen this issue if that doesn't fix it.
(0018654)
frisch (developer)
2017-11-13 15:10

I think there is a real problem here: caml_alloc_custom does not have any specific logic to trigger a minor GC when too much "external" memory is used by custom blocks in the minor heap. One can thus easily allocate a lot of memory with e.g. bigarrays, and reach an OOM, before the GC even triggers. This does not even depend on the value of CAML_BA_MAX_MEMORY.

It seems one would need some logic to keep track of the "size" of external memory used by custom blocks in the minor heap (i.e. the mem/max arguments to caml_alloc_custom) and force a minor GC when a given threshold is reached.
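
The bookkeeping described here can be modelled in a few lines of user-level OCaml. This is a sketch of the idea only, not runtime code and not the change in any pull request. The names pending, forced_minor_gcs and note_custom_alloc are hypothetical, and the threshold of 1.0 is an assumption.

```ocaml
(* Accumulated mem/max ratio of custom blocks allocated since the last
   minor collection. *)
let pending = ref 0.0

(* Assumed threshold: collect once the ratios add up to 1.0. *)
let threshold = 1.0

(* Number of minor collections forced by this logic, for observation. *)
let forced_minor_gcs = ref 0

(* Record one custom-block allocation with the given mem/max arguments
   and force a minor GC when the threshold is reached. *)
let note_custom_alloc ~mem ~max =
  pending := !pending +. (float_of_int mem /. float_of_int max);
  if !pending >= threshold then begin
    Gc.minor ();
    pending := 0.0;
    incr forced_minor_gcs
  end
```

For example, with max = 4 and mem = 1, every fourth call to note_custom_alloc forces a minor collection, regardless of how little of the minor heap itself has been used.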
(0018655)
frisch (developer)
2017-11-13 15:12

Alternatively, one could put a limit on the "external size" of custom blocks allocated in the minor heap. For instance, it makes sense to allocate "small float bigarrays" in the minor heap, but for large ones, the benefit is less clear.
(0018656)
frisch (developer)
2017-11-13 15:52

https://github.com/ocaml/ocaml/pull/1476

Issue History
Date Modified Username Field Change
2015-12-18 11:24 talex New Issue
2016-01-22 17:25 doligez Note Added: 0015266
2016-01-22 17:25 doligez Severity crash => major
2016-01-22 17:25 doligez Target Version => 4.03.1+dev
2016-01-23 18:28 talex Note Added: 0015268
2016-11-30 21:27 rixed Note Added: 0016616
2017-02-16 11:03 xleroy Note Added: 0017277
2017-02-16 11:03 xleroy Status new => acknowledged
2017-02-16 11:03 xleroy Target Version 4.03.1+dev => 4.06.0 +dev/beta1/beta2/rc1
2017-02-23 16:42 doligez Category OCaml otherlibs => otherlibs
2017-03-10 11:22 shinwell Assigned To => doligez
2017-03-10 11:22 shinwell Status acknowledged => assigned
2017-10-05 17:55 doligez Note Added: 0018488
2017-10-05 17:55 doligez Target Version 4.06.0 +dev/beta1/beta2/rc1 =>
2017-10-05 18:07 doligez Relationship added related to 0007198
2017-10-05 18:07 doligez Relationship added related to 0007180
2017-10-05 18:09 doligez Relationship added related to 0007158
2017-10-06 16:47 talex Note Added: 0018495
2017-11-13 14:52 yallop Relationship added has duplicate 0007670
2017-11-13 15:10 frisch Note Added: 0018654
2017-11-13 15:12 frisch Note Added: 0018655
2017-11-13 15:52 frisch Note Added: 0018656
2017-11-13 16:15 frisch Relationship added related to 0007671

