Change of behavior in Gc.major_slice in 4.03 #7180
Comments
Comment author: braibant Looking at the changes introduced by the aforementioned commit, I understand that the return type of Gc.major_slice has changed. If the other change (the fact that a valid program in 4.02.3 runs out of memory in 4.03) is considered a bug, I think I understand enough to propose a patch, but I would like confirmation that this is indeed a bug before spending time on it.
Comment author: @damiendoligez Yes, the change to Gc.major_slice is intentional, and yes, the running out of memory is very probably a bug. I'm looking into it.
Comment author: @damiendoligez Here's the bug: Gc.major_slice triggers a minor collection before the major slice. This was needed by the GC invariants up to 4.02.3 but not for 4.03.0, and I forgot to remove it. Triggering the minor collection like that means (in this program) that the minor heap is never half-full, so we never get an automatic-mode major slice, and the slice size computation is all wrong.

When I fixed this bug, I got another problem: your program does so many major slices between minor collections that it would do more major cycles than minor collections. But we need an empty minor heap to start the major GC, so a lot of these slices just end up doing nothing. I solved this second problem by requesting a minor GC as soon as the major cycle is finished, so the next cycle can start right away. Both changes are in commit 84b86b5 and they resolve the problem in your example program.

For the return value, I wanted to return something in the same unit as the input, but that would involve a backward calculation from the internal p parameter, and I don't like that. Do you really need this output? I'm not sure the change has enough breaking potential to warrant a mark in the changelog.
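The interaction described here (a minor collection being forced before each slice) can be observed from OCaml code. The probe below is an illustration added for this write-up, not part of the thread: it compares the minor_collections counter from Gc.quick_stat before and after a Gc.major_slice call, and the counts it prints are version-dependent for exactly the reasons discussed.

```ocaml
(* Hypothetical probe, not from the original report: count how many
   minor collections a single Gc.major_slice call triggers. *)
let () =
  let before = (Gc.quick_stat ()).Gc.minor_collections in
  let ret = Gc.major_slice 0 in
  let after = (Gc.quick_stat ()).Gc.minor_collections in
  Printf.printf "minor collections triggered: %d, slice return: %d\n"
    (after - before) ret
```

On releases that still force the pre-slice minor collection, the first number printed should be at least 1; after the fix it can be 0.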
Comment author: @gasche My approach to breaking-star marks in the changelog is that the entry containing them should be clear enough for users to be able to know whether they are affected just by reading it. If you follow this policy, the fatigue cost of an additional mark can remain modest. I would rather make more people read an irrelevant entry than have some careful people uninformed about code breakage despite their carefulness.
Comment author: braibant Thanks for the commit. I think it would still be nice to update the documentation to reflect the fact that the return value is now always 0. I am not sure to what extent this is a breaking change in itself. I looked at the fix, and I am still confused by the output of the Gc log for the newly attached test program. It seems to me that (for this program) extra_heap_resources is always 0 in 4.03, while it takes the expected value (1) in 4.02.3. Running that test program works "fine" in 4.02.3, but runs out of memory in 4.03. It seems that commit 8851e6b
Comment author: braibant Indeed, setting
to make sure that these custom blocks are allocated on the major heap makes extra_heap_resources non-zero in 4.03, and makes the whole test case not run out of memory in 4.03.
Comment author: @damiendoligez Thanks. This is a nasty bug in the commit you referenced. [edit: in fact the bug was already present before, but this commit made it much more likely to trigger] About updating the documentation: I'll do that if you don't need the return value; otherwise I will implement what the documentation says. @gasche, I'll make the entry more explicit and add the mark.
Comment author: @damiendoligez I think I fixed it in commit c95ccf7, but I can't reproduce the bug with your second program. Can you check the fix (and if you can review the patch, that would be great)?
Comment author: braibant I will have a look either today or Tuesday, and will come back to you. Just to be sure: when you say that you cannot reproduce with my test program, is it because it also runs out of memory in 4.02.3? (I have quite a bit of RAM, and the apparent steady state was quite high, so that might be the reason for the issue.)
Comment author: braibant I reuploaded the test for the GC minor slice. On my system, the program seems to have a steady-state memory consumption of around 16 GB in 4.02.3, and a steady state of ~22 GB in the latest 4.03, but I think this increase is irrelevant. To some extent, this is a really bad test case, because it's really not precise, and changing a few constants would change the behavior quite a bit. On the other hand, it's quite hard to come up with an example that makes precise claims about how much memory should be used in 4.02.3 versus how much is used in 4.03. Now, the numbers for extra_heap_resources seem to be correct, and that's probably what we should look at. I have made some comments on your commit on GitHub. My main question regarding the current patch is whether or not one should address the issue of counting the extra resources on the minor heap and triggering the GC automatically when they reach some level.
Comment author: @damiendoligez We'll do that for 4.04. As for the test, I meant that it didn't run out of memory on my machine with any OCaml version.
Comment author: @alainfrisch This is marked as major with target version 4.04. Does anything remain to be done for 4.04?
Comment author: braibant Nothing has been done to count the amount of extra resources on the minor heap. I would argue for changing the target version to succ(4.04).
Comment author: @damiendoligez This PR is still open because I got a report that jenga's memory behavior changed quite a lot because of this issue with extra_resources on the minor heap.
Comment author: @alainfrisch Attempt here:
Comment author: @alainfrisch It would be good to know whether #1476 and #1738 improved the behavior for the reported case. Thomas?
No reply for 18 months, so closing.
Original bug ID: 7180
Reporter: braibant
Assigned to: @damiendoligez
Status: feedback (set by @alainfrisch on 2018-11-09T13:29:31Z)
Resolution: reopened
Priority: normal
Severity: minor
Category: runtime system and C interface
Related to: #7100
Child of: #7750
Monitored by: bgrundmann @gasche @diml bobot @hcarty @yakobowski @alainfrisch
Bug description
I am trying to understand the changes to the Gc heuristics in the upcoming 4.03 w.r.t. custom blocks. More specifically, I looked at how the value of caml_extra_heap_resources in major_gc.c contributes to making the GC run major collections more often.
I am puzzled by the output of the attached test program. It's a loop that allocates k custom blocks with used = 1, max = 1000. The payload of each block is a pointer to a (big) heap-allocated region (to make issues easier to detect). The main loop calls Gc.major_slice i every so often (where i is a small int).

In 4.02.3, the program reaches a steady state: the custom blocks are deallocated as expected, and Gc.major_slice returns a non-zero integer.

In today's 4.03 (or 4.03.0+beta1), depending on the values of k and i, it is possible to reach a steady state, or to eat all the available memory. If Gc.major_slice is called with i = 0 or a small int, its return value is 0. If Gc.major_slice is called with i = 10, the return value is still 0, and the program quickly eats all available memory.
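For readers without the original attachment, here is a minimal, self-contained sketch of such a loop. It is a reconstruction, not the attached program: it uses bigarrays as a stand-in for the custom blocks (bigarrays are custom blocks whose payload lives outside the OCaml heap, though their used/max accounting may differ from the used = 1, max = 1000 described above), and the constants k and i are illustrative.

```ocaml
(* Hypothetical reconstruction of the test loop: allocate k custom
   blocks (here: bigarrays) per iteration, then call Gc.major_slice i.
   The constants are illustrative, not those of the attached program. *)
let k = 1000  (* custom blocks allocated per iteration *)
let i = 10    (* argument passed to Gc.major_slice *)

let () =
  for _iter = 1 to 100 do
    for _j = 1 to k do
      (* each bigarray is a custom block with an out-of-heap payload *)
      ignore (Bigarray.Array1.create Bigarray.char Bigarray.c_layout 4096)
    done;
    let ret = Gc.major_slice i in
    (* reported behavior: non-zero in 4.02.3, always 0 in 4.03 *)
    ignore ret
  done;
  print_endline "done"
```

With OCaml >= 4.07, Bigarray is part of the standard library; on earlier versions the bigarray library must be linked in.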
Returning 0 seems to violate the 4.02.3 documentation: "In all cases, the result is the computed slice size." (It would also be nice to document the unit of the size here.)
The fact that the program eats all memory seems to violate this part of the manual (which has not changed between 4.02.3 and 4.03.0+beta1):
https://github.com/ocaml/ocaml/blob/4.03/manual/manual/cmds/intf-c.etex#L1804-L1811
The way I read it is that I should expect one full major collection every N allocations (when used/max = 1 / N). It seems that giving "bad" slice sizes to Gc.major_slice confuses the speed heuristic.
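To make that reading concrete (this arithmetic is added for illustration, not taken from the report): with used = 1 and max = 1000, each allocation contributes 1/1000 of a full cycle's worth of extra heap resources, so a full major collection should be expected roughly every 1000 allocations.

```ocaml
(* Back-of-the-envelope check of the expected collection frequency,
   using the used/max values of the test program described above. *)
let () =
  let used = 1.0 and max_res = 1000.0 in
  let extra_per_alloc = used /. max_res in
  (* a full major cycle is expected once the accumulated extra
     resources reach 1.0 *)
  let allocs_per_cycle = 1.0 /. extra_per_alloc in
  Printf.printf "expected: one full major collection every %.0f allocations\n"
    allocs_per_cycle
```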
I tried to look at the GC code to better understand what was going on, and I am confused by the code that was added by commit 0225ca0:
https://github.com/ocaml/ocaml/blob/4.03/byterun/major_gc.c#L679-L680
https://github.com/ocaml/ocaml/blob/4.03/byterun/major_gc.c#L803
If p is capped at 0.3, why do we reset caml_extra_heap_resources to 0.0? Shouldn't it be decreased by the actual amount of work that was done, to cope with the steady-state assumption?
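To spell out the alternative being suggested, here is a toy model of the accounting (the names and structure are illustrative, not the runtime's actual code): a reset policy discards whatever accumulated resources exceed the 0.3 cap, while a decrement policy carries the excess over to the next slice.

```ocaml
(* Toy model of the extra_heap_resources accounting questioned above.
   The 0.3 cap comes from the discussion; everything else is a sketch,
   not the runtime's actual code. *)
let p_cap = 0.3

(* Reset policy: any accumulated resources beyond the cap are dropped. *)
let slice_reset extra =
  let p = min extra p_cap in
  (p, 0.0)

(* Decrement policy: subtract only the work actually scheduled, so the
   excess carries over to the next slice. *)
let slice_decrement extra =
  let p = min extra p_cap in
  (p, extra -. p)

let () =
  let p1, left1 = slice_reset 0.5 in
  let p2, left2 = slice_decrement 0.5 in
  Printf.printf "reset:     p = %.1f, leftover = %.1f\n" p1 left1;
  Printf.printf "decrement: p = %.1f, leftover = %.1f\n" p2 left2
```

Under a steady state that accumulates 0.5 per slice, the reset policy silently drops 0.2 worth of work each time, which is the concern raised in the question above.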
Steps to reproduce
opam switch 4.03.0+beta1
make clean
make test     # build the test binary
make success  # calls Gc.major_slice 0; reaches a steady state
make failure  # calls Gc.major_slice 10; runs out of memory
opam switch 4.02.3
make clean
make test     # build the test binary
make success  # calls Gc.major_slice 0; reaches a steady state
make failure  # calls Gc.major_slice 10; reaches a steady state