Change of behavior in Gc.major_slice in 4.03 #7180
Comments
Comment author: braibant Looking at the changes introduced by the aforementioned commit, I understand that the return type of Gc.major_slice has changed. If the other change (the fact that a valid program in 4.02.3 runs out of memory in 4.03) is considered a bug, I think I understand enough to propose a patch, but I would like confirmation that this is indeed a bug before spending time on it.
Comment author: @damiendoligez Yes, the change to Gc.major_slice is intentional, and yes, the running out of memory is very probably a bug. I'm looking into it.
Comment author: @damiendoligez Here's the bug: Gc.major_slice triggers a minor collection before the major slice. This was needed by the GC invariants up to 4.02.3 but not for 4.03.0, and I forgot to remove it. Triggering the minor collection like that means (in this program) that the minor heap is never half-full, so we never get an automatic-mode major slice, and the slice size computation is all wrong.

When I fixed this bug, I got another problem: your program does so many major slices between minor collections that it would do more major cycles than minor collections. But we need an empty minor heap to start the major GC, so a lot of these slices just end up doing nothing. I solved this second problem by requesting a minor GC as soon as the major cycle is finished, so the next cycle can start right away. Both changes are in commit 84b86b5 and they resolve the problem in your example program.

For the return value, I wanted to return something in the same unit as the input, but that would involve a backward calculation from the internal p parameter, and I don't like that. Do you really need this output? I'm not sure the change has enough breaking potential to warrant a mark in the changelog.
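The interaction described here (a minor collection being forced before each slice) can be observed from OCaml code. The probe below is an illustration added for this write-up, not part of the thread: it compares the minor_collections counter from Gc.quick_stat before and after a Gc.major_slice call, and the counts it prints are version-dependent for exactly the reasons discussed.

```ocaml
(* Hypothetical probe, not from the original report: count how many
   minor collections a single Gc.major_slice call triggers. *)
let () =
  let before = (Gc.quick_stat ()).Gc.minor_collections in
  let ret = Gc.major_slice 0 in
  let after = (Gc.quick_stat ()).Gc.minor_collections in
  Printf.printf "minor collections triggered: %d, slice return: %d\n"
    (after - before) ret
```

On releases that still force the pre-slice minor collection, the first number printed should be at least 1; after the fix it can be 0.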
Comment author: @gasche My approach to breaking-star marks in the changelog is that the entry containing them should be clear enough for users to be able to know whether they are affected just by reading it. If you follow this policy, the fatigue cost of an additional mark can remain modest. I would rather make more people read an irrelevant entry than have some careful people uninformed about code breakage despite their carefulness.
Comment author: braibant Thanks for the commit. I think it would still be nice to update the documentation to reflect the fact that the return value is now always 0. I am not sure to what extent this is a breaking change in itself. I looked at the fix, and I am still confused by the output of the Gc log for the newly attached test program. It seems to me that (for this program) extra_heap_resources is always 0 in 4.03, while it takes the expected value (1) in 4.02.3. Running that test program works "fine" in 4.02.3, but runs out of memory in 4.03. It seems that commit 8851e6b
Comment author: braibant Indeed, setting
to make sure that these custom blocks are allocated on the major heap makes extra_heap_resources non-zero in 4.03, and makes the whole test case not run out of memory in 4.03.
Comment author: @damiendoligez Thanks. This is a nasty bug in the commit you referenced. [edit: in fact the bug was already present before, but this commit made it much more likely to trigger] About updating the documentation: I'll do that if you don't need the return value; otherwise I will implement what the documentation says. @gasche, I'll make the entry more explicit and add the mark.
Comment author: @damiendoligez I think I fixed it in commit c95ccf7, but I can't reproduce the bug with your second program. Can you check the fix (and if you can review the patch, that would be great)?
Comment author: braibant I will have a look either today or Tuesday, and will come back to you. Just to be sure: when you say that you cannot reproduce with my test program, is it because it also runs out of memory in 4.02.3? (I have quite a bit of RAM, and the apparent steady state was quite high, so that might be the reason for the issue.)
Comment author: braibant I reuploaded the test for the GC minor slice. On my system, the program seems to have a steady-state memory consumption of around 16 GB in 4.02.3, and a steady state of ~22 GB in the latest 4.03, but I think this increase is irrelevant. To some extent, this is a really bad test case, because it's really not precise, and changing a few constants would change the behavior quite a bit. On the other hand, it's quite hard to come up with an example that makes precise claims about how much memory should be used in 4.02.3 versus how much is used in 4.03. Now, the numbers for extra_heap_resources seem to be correct, and that's probably what we should look at. I have made some comments on your commit on GitHub. My main question regarding the current patch is whether or not one should address the issue of counting the extra resources on the minor heap and triggering the GC automatically when they reach some level.
Comment author: @damiendoligez We'll do that for 4.04. As for the test, I meant that it didn't run out of memory on my machine with any OCaml version.
Comment author: @alainfrisch This is marked as major with target version 4.04. Does anything remain to be done for 4.04?
Comment author: braibant Nothing has been done to count the amount of extra resources on the minor heap. I would argue for changing the target version to succ(4.04).
Comment author: @damiendoligez This PR is still open because I got a report that jenga's memory behavior changed quite a lot because of this issue with extra_resources on the minor heap.
Comment author: @alainfrisch Attempt here:
Comment author: @alainfrisch It would be good to know whether #1476 and #1738 improved the behavior for the reported case. Thomas?
No reply for 18 months, so closing.
Original bug ID: 7180
Reporter: braibant
Assigned to: @damiendoligez
Status: feedback (set by @alainfrisch on 2018-11-09T13:29:31Z)
Resolution: reopened
Priority: normal
Severity: minor
Category: runtime system and C interface
Related to: #7100
Child of: #7750
Monitored by: bgrundmann @gasche @diml bobot @hcarty @yakobowski @alainfrisch
Bug description
I am trying to understand the changes to the Gc heuristics in the upcoming 4.03 w.r.t. custom blocks. More specifically, I looked at how the value of caml_extra_heap_resources in major_gc.c contributes to making the GC run major collections more often.
I am puzzled by the output of the attached test program. It's a loop that allocates k custom blocks with used = 1, max = 1000. The payload of each block is a pointer to a (big) heap-allocated region (to make issues easier to detect). The main loop calls Gc.major_slice i every so often (where i is a small int).

In 4.02.3, the program reaches a steady state: the custom blocks are deallocated as expected, and Gc.major_slice returns a non-zero integer.

In today's 4.03 (or 4.03.0+beta1), depending on the values of k and i, it is possible to reach a steady state, or to eat all the available memory. If Gc.major_slice is called with i = 0 or a small int, its return value is 0. If Gc.major_slice is called with i = 10, the return value is still 0, and the program quickly eats all available memory.
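For readers without the original attachment, here is a minimal, self-contained sketch of such a loop. It is a reconstruction, not the attached program: it uses bigarrays as a stand-in for the custom blocks (bigarrays are custom blocks whose payload lives outside the OCaml heap, though their used/max accounting may differ from the used = 1, max = 1000 described above), and the constants k and i are illustrative.

```ocaml
(* Hypothetical reconstruction of the test loop: allocate k custom
   blocks (here: bigarrays) per iteration, then call Gc.major_slice i.
   The constants are illustrative, not those of the attached program. *)
let k = 1000  (* custom blocks allocated per iteration *)
let i = 10    (* argument passed to Gc.major_slice *)

let () =
  for _iter = 1 to 100 do
    for _j = 1 to k do
      (* each bigarray is a custom block with an out-of-heap payload *)
      ignore (Bigarray.Array1.create Bigarray.char Bigarray.c_layout 4096)
    done;
    let ret = Gc.major_slice i in
    (* reported behavior: non-zero in 4.02.3, always 0 in 4.03 *)
    ignore ret
  done;
  print_endline "done"
```

With OCaml >= 4.07, Bigarray is part of the standard library; on earlier versions the bigarray library must be linked in.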
Returning 0 seems to violate the 4.02.3 documentation: "In all cases, the result is the computed slice size." (It would also be nice to document the unit of the size here.)
The fact that the program eats all memory seems to violate this part of the manual (which has not changed between 4.02.3 and 4.03.0+beta1):
https://github.com/ocaml/ocaml/blob/4.03/manual/manual/cmds/intf-c.etex#L1804-L1811
The way I read it is that I should expect one full major collection every N allocations (when used/max = 1 / N). It seems that giving "bad" slice sizes to Gc.major_slice confuses the speed heuristic.
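To make that reading concrete (this arithmetic is added for illustration, not taken from the report): with used = 1 and max = 1000, each allocation contributes 1/1000 of a full cycle's worth of extra heap resources, so a full major collection should be expected roughly every 1000 allocations.

```ocaml
(* Back-of-the-envelope check of the expected collection frequency,
   using the used/max values of the test program described above. *)
let () =
  let used = 1.0 and max_res = 1000.0 in
  let extra_per_alloc = used /. max_res in
  (* a full major cycle is expected once the accumulated extra
     resources reach 1.0 *)
  let allocs_per_cycle = 1.0 /. extra_per_alloc in
  Printf.printf "expected: one full major collection every %.0f allocations\n"
    allocs_per_cycle
```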
I tried to look at the GC code to better understand what was going on, and I am confused by the code that was added by commit 0225ca0:
https://github.com/ocaml/ocaml/blob/4.03/byterun/major_gc.c#L679-L680
https://github.com/ocaml/ocaml/blob/4.03/byterun/major_gc.c#L803
If p is capped at 0.3, why do we reset caml_extra_heap_resources to 0.0? Shouldn't it be decreased by the actual amount of work that was done, to cope with the steady-state assumption?
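To spell out the alternative being suggested, here is a toy model of the accounting (the names and structure are illustrative, not the runtime's actual code): a reset policy discards whatever accumulated resources exceed the 0.3 cap, while a decrement policy carries the excess over to the next slice.

```ocaml
(* Toy model of the extra_heap_resources accounting questioned above.
   The 0.3 cap comes from the discussion; everything else is a sketch,
   not the runtime's actual code. *)
let p_cap = 0.3

(* Reset policy: any accumulated resources beyond the cap are dropped. *)
let slice_reset extra =
  let p = min extra p_cap in
  (p, 0.0)

(* Decrement policy: subtract only the work actually scheduled, so the
   excess carries over to the next slice. *)
let slice_decrement extra =
  let p = min extra p_cap in
  (p, extra -. p)

let () =
  let p1, left1 = slice_reset 0.5 in
  let p2, left2 = slice_decrement 0.5 in
  Printf.printf "reset:     p = %.1f, leftover = %.1f\n" p1 left1;
  Printf.printf "decrement: p = %.1f, leftover = %.1f\n" p2 left2
```

Under a steady state that accumulates 0.5 per slice, the reset policy silently drops 0.2 worth of work each time, which is the concern raised in the question above.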
Steps to reproduce
opam switch 4.03.0+beta1
make clean
make test     # build the test binary
make success  # calls Gc.major_slice 0; reaches a steady state
make failure  # calls Gc.major_slice 10; runs out of memory
opam switch 4.02.3
make clean
make test     # build the test binary
make success  # calls Gc.major_slice 0; reaches a steady state
make failure  # calls Gc.major_slice 10; reaches a steady state