
Change of behavior in Gc.major_slice in 4.03 #7180

Closed
vicuna opened this issue Mar 14, 2016 · 18 comments

vicuna commented Mar 14, 2016

Original bug ID: 7180
Reporter: braibant
Assigned to: @damiendoligez
Status: feedback (set by @alainfrisch on 2018-11-09T13:29:31Z)
Resolution: reopened
Priority: normal
Severity: minor
Category: runtime system and C interface
Related to: #7100
Child of: #7750
Monitored by: bgrundmann @gasche @diml bobot @hcarty @yakobowski @alainfrisch

Bug description

I am trying to understand the changes in the GC heuristics in the upcoming 4.03 with respect to custom blocks. More specifically, I looked at how the value of caml_extra_heap_resources in major_gc.c contributes to making the GC run the major collector more often.

I am puzzled by the output of the attached test program. It is a loop that allocates k custom blocks per iteration, with used = 1 and max = 1000. The payload of each block is a pointer to a (big) heap-allocated region, to make leaks easy to detect. The main loop calls Gc.major_slice i every so often (where i is a small int).
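
The attachment is not reproduced here. For illustration, a minimal sketch of the kind of C stub such a test relies on, using the standard custom-block API; the names test_ops, alloc_test_block and PAYLOAD_BYTES are hypothetical:

#include <stdlib.h>
#include <caml/mlvalues.h>
#include <caml/alloc.h>
#include <caml/custom.h>

#define PAYLOAD_BYTES (1 << 20)  /* big out-of-heap payload, to make leaks visible */

static void test_finalize(value v)
{
  free(*((void **) Data_custom_val(v)));
}

static struct custom_operations test_ops = {
  "test.custom_block",
  test_finalize,
  custom_compare_default,
  custom_hash_default,
  custom_serialize_default,
  custom_deserialize_default,
  custom_compare_ext_default
};

CAMLprim value alloc_test_block(value unit)
{
  /* used = 1, max = 1000: each block is declared to hold 1/1000th of
     the tolerable amount of this external resource */
  value v = caml_alloc_custom(&test_ops, sizeof(void *), 1, 1000);
  *((void **) Data_custom_val(v)) = malloc(PAYLOAD_BYTES);
  return v;
}

The OCaml driver (not shown) would bind alloc_test_block as an external, allocate k such blocks per iteration, and call Gc.major_slice i every so often.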

In 4.02.3, the program reaches a steady state: the custom blocks are deallocated as expected, and Gc.major_slice returns a non-zero integer.

In today's 4.03 (or 4.03.0+beta1), depending on the values of k and i, the program either reaches a steady state or eats all the available memory. If Gc.major_slice is called with i = 0 or another small int, its return value is 0. If it is called with i = 10, the return value is still 0, and the program quickly eats all available memory.

Returning 0 seems to violate the documentation for 4.02.3: "In all cases, the result is the computed slice size." (It would also be nice to document the unit of the size here.)

The fact that the program eats all memory seems to violate this part of the manual (which has not changed between 4.02.3 and 4.03.0+beta1):
https://github.com/ocaml/ocaml/blob/4.03/manual/manual/cmds/intf-c.etex#L1804-L1811

The way I read it, I should expect one full major collection every N allocations when used/max = 1/N; with used = 1 and max = 1000, that is one full major collection per 1000 custom-block allocations. It seems that giving "bad" slice sizes to Gc.major_slice confuses the speed heuristic.

I tried to have a look at the GC code to understand better what was going on, and I am confused by the code that was added by commit 0225ca0:

https://github.com/ocaml/ocaml/blob/4.03/byterun/major_gc.c#L679-L680
https://github.com/ocaml/ocaml/blob/4.03/byterun/major_gc.c#L803

If p is capped at 0.3, why do we reset caml_extra_heap_resources to 0.0? Shouldn't it be decreased by the actual amount of work that was done, to cope with the steady-state assumption?
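
For concreteness, a sketch of the alternative accounting being suggested (hypothetical names; this is not the actual major_gc.c code):

#define P_CAP 0.3

static double extra_heap_resources;  /* stands in for caml_extra_heap_resources */

static double compute_slice(double p_requested)
{
  if (p_requested > P_CAP) {
    /* Only P_CAP / p_requested of the pending extra work gets done,
       so keep the unpaid remainder for the next slice instead of
       resetting the counter to 0.0. */
    extra_heap_resources *= 1.0 - P_CAP / p_requested;
    return P_CAP;
  } else {
    extra_heap_resources = 0.0;  /* fully paid for by this slice */
    return p_requested;
  }
}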

Steps to reproduce

opam switch 4.03.0+beta1
make clean
make test     # build the test binary
make success  # calls Gc.major_slice 0, reaches a steady state
make failure  # calls Gc.major_slice 10, runs out of memory

opam switch 4.02.3
make clean
make test     # build the test binary
make success  # calls Gc.major_slice 0, reaches a steady state
make failure  # calls Gc.major_slice 10, reaches a steady state

File attachments

vicuna commented Mar 21, 2016

Comment author: braibant

Looking at the changes introduced by the aforementioned commit, the return type of caml_major_collection_slice was changed to void, and Gc.major_slice has been explicitly changed to return 0. If this change is intentional, I would be happy to edit the relevant changelog entry to mark it as potentially breaking, and to update the documentation in gc.mli to explain the change.

If the other change (the fact that a program valid in 4.02.3 runs out of memory in 4.03) is considered a bug, I think I understand enough to propose a patch, but I would like confirmation that this is indeed a bug before spending time on it.

vicuna commented Mar 21, 2016

Comment author: @damiendoligez

Yes, the change to Gc.major_slice is intentional, and yes the running out of memory is very probably a bug. I'm looking into it.

vicuna commented Mar 22, 2016

Comment author: @damiendoligez

Here's the bug: Gc.major_slice triggers a minor collection before the major slice. This was needed by the GC invariants up to 4.02.3 but not for 4.03.0, and I forgot to remove it.

Triggering the minor collection like that means (in this program) that the minor heap is never half-full, so we never get an automatic-mode major slice, and the slice size computation is all wrong.

When I fixed this bug, I got another problem: your program does so many major slices between minor collections that it would do more major cycles than minor collections. But we need an empty minor heap to start the major GC, so a lot of these slices just end up doing nothing.

I solved this second problem by requesting a minor GC as soon as the major cycle is finished, so the next cycle can start right away.

Both changes are in commit 84b86b5 and they resolve the problem in your example program.

For the return value, I wanted to return something in the same unit as the input, but that would involve a backward calculation from the internal p parameter and I don't like that. Do you really need this output? I'm not sure the change has enough breaking potential to warrant a mark in the changelog.

vicuna commented Mar 22, 2016

Comment author: @gasche

My approach to breaking-star marks in the changelog is that the entry containing them should be clear enough for users to be able to know whether they are affected just by reading it. If you follow this policy, the fatigue cost of an additional mark can remain modest. I would rather make more people read an irrelevant entry than have some careful people uninformed about code breakage despite their carefulness.

vicuna commented Mar 22, 2016

Comment author: braibant

Thanks for the commit. I think it would still be nice to update the documentation to reflect the fact that the return value is now always 0. I am not sure to what extent this is a breaking change in itself.

I looked at the fix, and I am still confused by the output of the GC log for the newly attached test program. It seems to me that (for this program) extra_heap_resources is always 0 in 4.03, while it takes the expected value (1) in 4.02.3. Running said test program works "fine" in 4.02.3, but runs out of memory in 4.03.

It seems that commit 8851e6b changed the code for caml_alloc_custom, but the branch that allocates the custom block on the minor heap does not adjust the GC speed (and hence does not change the value of extra_heap_resources). Is this expected?
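
For reference, a paraphrased sketch of the caml_alloc_custom shape being described (simplified from the 4.03-era byterun/alloc.c; the real code differs in details):

value caml_alloc_custom(struct custom_operations * ops,
                        uintnat bsz, mlsize_t mem, mlsize_t max)
{
  mlsize_t wosize = 1 + (bsz + sizeof(value) - 1) / sizeof(value);
  value result;

  if (wosize <= Max_young_wosize) {
    result = caml_alloc_small(wosize, Custom_tag);
    Custom_ops_val(result) = ops;
    /* no caml_adjust_gc_speed call on this branch, so young custom
       blocks are never credited to extra_heap_resources */
  } else {
    result = caml_alloc_shr(wosize, Custom_tag);
    Custom_ops_val(result) = ops;
    caml_adjust_gc_speed(mem, max);  /* credits mem/max to the GC pacing */
    result = caml_check_urgent_gc(result);
  }
  return result;
}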

vicuna commented Mar 22, 2016

Comment author: braibant

Indeed, changing the allocation to

result = caml_alloc_custom(&test_ops, sizeof(value) * Max_young_wosize, 1, 1000);

so that the custom blocks are allocated on the major heap (the requested size now exceeds Max_young_wosize words) makes extra_heap_resources non-zero in 4.03, and the test case no longer runs out of memory there.

vicuna commented Mar 24, 2016

Comment author: @damiendoligez

Thanks. This is a nasty bug in the commit you referenced. [edit: in fact the bug was already present before, but this commit made it much more likely to trigger]

About updating the documentation, I'll do that if you don't need the return value. Otherwise I will implement what the documentation says.

@gasche, I'll make the entry more explicit and add the mark.

vicuna commented Mar 25, 2016

Comment author: @damiendoligez

I think I fixed it in commit c95ccf7, but I can't reproduce the bug with your second program. Can you check the fix (and if you can review the patch, that would be great)?

vicuna commented Mar 25, 2016

Comment author: braibant

I will have a look either today or Tuesday, and will come back to you. Just to be sure: when you say you cannot reproduce with my test program, is it because it also runs out of memory in 4.02.3? (I have quite a bit of RAM, and the apparent steady state was quite high, so that might be the reason for the issue.)

vicuna commented Mar 29, 2016

Comment author: braibant

I re-uploaded the test for the GC minor slice. On my system, the program seems to have a steady-state memory consumption of around 16 GB in 4.02.3, and a steady state of ~22 GB in the latest 4.03, but I think this increase is irrelevant.

To some extent, this is a really bad test case: it is not precise, and changing a few constants would change the behavior quite a bit. On the other hand, it is quite hard to come up with an example that makes precise claims about how much memory should be used in 4.02.3 and how much is used in 4.03. The numbers for extra_heap_resources seem to be correct, and that is probably what we should look at.

I have made some comments on your commit c95ccf7 on GitHub.

My main question regarding the current patch is whether or not one should address the issue of counting the extra_resources on the minor heap and triggering the GC automatically when it reaches some level.

vicuna commented Apr 4, 2016

Comment author: @damiendoligez

My main question regarding the current patch is whether or not one should address the issue of counting the extra_resources on the minor heap and triggering the GC automatically when it reaches some level.

We'll do that for 4.04.

For the test, I meant it didn't run out of memory on my machine with any OCaml version.
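
A hypothetical sketch of the minor-heap accounting being proposed (illustrative names only; not the code that was eventually merged):

#include <caml/mlvalues.h>
#include <caml/signals.h>

static double extra_minor_resources = 0.0;

/* To be called from the minor-heap branch of caml_alloc_custom:
   accumulate used/max and request a minor collection once the total
   crosses a threshold, so that promotion lets the major GC account
   for the external resources. */
static void note_minor_custom_alloc(mlsize_t mem, mlsize_t max)
{
  if (max == 0) max = 1;
  extra_minor_resources += (double) mem / (double) max;
  if (extra_minor_resources >= 1.0) {
    caml_request_minor_gc();
    extra_minor_resources = 0.0;
  }
}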

vicuna commented Aug 30, 2016

Comment author: @alainfrisch

This is marked as major, with target version 4.04. Does anything remain to be done for 4.04?

vicuna commented Sep 1, 2016

Comment author: braibant

Nothing has been done to count the amount of extra_resources on the minor heap. I would argue for changing the target version to succ(4.04).

vicuna commented Feb 21, 2017

Comment author: @damiendoligez

This issue is still open because I got a report that jenga's memory behavior changed quite a lot because of this problem with extra_resources on the minor heap.

vicuna commented Nov 14, 2017

Comment author: @alainfrisch

Nothing has been done to count the amount of extra_resources on the minor heap.

Attempt here:

#1476

vicuna commented Nov 9, 2018

Comment author: @alainfrisch

It would be good to know whether #1476 and #1738 improved the behavior for the reported case. Thomas?

@mshinwell

No reply for 18 months, so closing.

gasche commented Apr 17, 2020

@braibant it would still be interesting to know whether the new custom-allocated-resources tracking added in 4.08 (#1738) solves the issues you were observing.
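
For reference, #1738 (in 4.08) added caml_alloc_custom_mem, which takes the out-of-heap byte count directly and paces both the minor and major collectors from it. The hypothetical allocator sketched earlier in this thread would become:

/* Reuses the hypothetical test_ops and PAYLOAD_BYTES from the earlier
   sketch; caml_alloc_custom_mem is the API added in 4.08. */
CAMLprim value alloc_test_block_408(value unit)
{
  value v = caml_alloc_custom_mem(&test_ops, sizeof(void *), PAYLOAD_BYTES);
  *((void **) Data_custom_val(v)) = malloc(PAYLOAD_BYTES);
  return v;
}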
