Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0007180OCamlruntime system and C interfacepublic2016-03-14 21:532017-11-14 11:38
Reporterbraibant 
Assigned Todoligez 
PrioritynormalSeverityminorReproducibilityalways
StatusassignedResolutionreopened 
PlatformOSOS Version
Product Version 
Target VersionFixed in Version 
Summary0007180: Change of behavior in Gc.major_slice in 4.03
DescriptionI am trying to understand the changes in the Gc heuristics in the upcoming 4.03 w.r.t custom blocks. More specifically, I looked at how the value of caml_extra_heap_resources in major_gc.c contributes to make the Gc run major collection more often.
 
I am puzzled by the output of the attached test program. It's a loop that allocates k custom blocks with used = 1, max = 1000. The payload of the blocks is a pointer to (big) heap allocated region (to detect issues). The main loop calls `Gc.major_slice i` every so often (where i is a small int).

In 4.02.3, the program reaches a steady state: the custom blocks are deallocated as expected, Gc.major_slice returns a non-zero integer.

In todays's 4.03 (or 4.03.0+beta1), depending on the value of k and i, it's possible to get a steady state, or to eat all the available memory. If Gc.major_slice is called with i = 0 or a small int, its return value is 0. If Gc.major_slice is called with i = 10, the return value of Gc.major_slice is still 0, and the program quickly eats all available memory.

Returning 0 seems to violate the documentation for 4.02.3 "In all cases, the result is the computed slice size." (It would be nice to document the unit of the size here.)

The fact that the program eats all memory seems to violate this part of the manual (which has not changed between 4.02.3 and 4.03.0+beta1)
https://github.com/ocaml/ocaml/blob/4.03/manual/manual/cmds/intf-c.etex#L1804-L1811 [^]

The way I read it is that I should expect one full major collection every N allocations (when used/max = 1 / N). It seems that giving "bad" slice sizes to Gc.major_slice confuses the speed heuristic.

I tried to have a look at the Gc code to understand better what was going on, and I am confused by the code that was added by this commit https://github.com/ocaml/ocaml/commit/0225ca01e39289ce1801fb09cd011cdbfb542b8d [^]

https://github.com/ocaml/ocaml/blob/4.03/byterun/major_gc.c#L679-L680 [^]
https://github.com/ocaml/ocaml/blob/4.03/byterun/major_gc.c#L803 [^]

if p is capped at 0.3, why do we reset caml_extra_heap_resources to 0.0? Shouldn't it be decreased by the actual amount of work that was done to copy with the steady state assumption?
Steps To Reproduceopam switch 4.03.0+beta1
make clean
make test # build the test binary
make success # call Gc.major_slice 0, reach a steady state.
make failure # call Gc.major_slice 10, runs out of memory

opam switch 4.02.3
make clean
make test # build the test binary
make success # call Gc.major_slice 0, reach a steady state
make failure # call Gc.major_slice 10, reach a steady state
TagsNo tags attached.
Attached Filesgz file icon test-gc-major-slice.tar.gz [^] (994 bytes) 2016-03-14 21:53
? file icon test-gc-minor-slice-4-03.ml [^] (581 bytes) 2016-03-22 17:35 [Show Content]
gz file icon test-gc-minor-slice.tar.gz [^] (1,053 bytes) 2016-03-29 10:24

- Relationships
related to 0007100assigneddoligez Bigarray's caml_ba_alloc doesn't try GC if malloc fails 

-  Notes
(0015561)
braibant (reporter)
2016-03-21 11:38

Looking at the changes introduced by the aforementioned commit, the return type of
caml_major_collection_slice was changed to void, and Gc.major_slice has been explicitly changed to return 0. If this change is intentional, I would be happy to edit the relevant entry to mark it as potentially breaking in the changelog and update the documentation in gc.mli to explain the change.

If the other change (the fact that a valid program in 4.02.3 runs out of memory in 4.03) is considered a bug, I think I understand enough to propose a patch, but I would like a confirmation that this is indeed a bug before spending time on this.
(0015562)
doligez (administrator)
2016-03-21 15:22

Yes, the change to Gc.major_slice is intentional, and yes the running out of memory is very probably a bug. I'm looking into it.
(0015572)
doligez (administrator)
2016-03-22 15:43

Here's the bug: Gc.major_slice triggers a minor collection before the major slice. This was needed by the GC invariants up to 4.02.3 but not for 4.03.0, and I forgot to remove it.

Triggering the minor collection like that means (in this program) that the minor heap is never half-full, so we never get an automatic-mode major slice, and the slice size computation is all wrong.

When I fixed this bug, I got another problem: your program does so many major slices between minor collections that it would do more major cycles than minor collections. But we need an empty minor heap to start the major GC, so a lot of these slices just end up doing nothing.

I solved this second problem by requesting a minor GC as soon as the major cycle is finished, so the next cycle can start right away.

Both changes are in commit 84b86b5a163e397f2aaa1590e09b14b183219331 and they resolve the problem in your example program.

For the return value, I wanted to return something in the same unit as the input, but that would involve a backward calculation from the internal p parameter and I don't like that. Do you really need this output? I'm not sure the change has enough breaking potential to warrant a mark in the changelog.
(0015573)
gasche (developer)
2016-03-22 16:06

My approach to breaking-star marks in the changelog is that the entry containing them should be clear enough for users to be able to know whether they are affected just by reading it. If you follow this policy, the fatigue cost of an additional mark can remain modest. I would rather make more people read an irrelevant entry than have some careful people uninformed about code breakage despite their carefulness.
(0015576)
braibant (reporter)
2016-03-22 17:47

Thanks for the commit. I think it would still be nice to update the documentation to reflect the fact that the return value is now always 0. I am not sure to what extent this is a breaking change in itself.

I looked at the fix, and I am still confused by the output of the Gc log for the new attached test program. It seems to me that (for this program) extra_heap_resources is always 0 for 4.03, while it takes the expected value (1) in 4.02.3. Running the said test program works "fine" in 4.02.3, but runs out of memory in 4.03.

It seems that https://github.com/ocaml/ocaml/commit/8851e6b7d5850ff5d7dab6971f3a1bc26455b1f4 [^]
changed the code for caml_alloc_custom but the branch that allocates the custom block on the minor heap does not adjust the Gc speed (and hence, does not change the value of extra_heap_resources). Is this expected?
(0015578)
braibant (reporter)
2016-03-22 18:01

Indeed, setting
```
result = caml_alloc_custom(&test_ops, sizeof(value) * Max_young_wosize, 1, 1000);
```
to make sure that the said custom blocks are allocated on the major heap makes extra_heap_resources non zero in 4.03, and makes the whole test case not run out of memory in 4.03.
(0015597)
doligez (administrator)
2016-03-24 15:17
edited on: 2016-03-25 12:31

Thanks. This is a nasty bug in the commit you referenced. [edit: in fact the bug was already present before, but this commit made it much more likely to trigger]

About updating the documentation, I'll do that if you don't need the return value. Otherwise I will implement what the documentation says.

@gasche, I'll make the entry more explicit and add the mark.

(0015616)
doligez (administrator)
2016-03-25 14:47

I think I fixed it in commit c95ccf7e9b3df76fb159113c36312a4e9914f627 but I can't reproduce the bug with your second program. Can you check the fix (and if you can review the patch that would be great) ?
(0015617)
braibant (reporter)
2016-03-25 15:09

I will have a look either today or tuesday, and will come back to you. Just to be sure, when you say that you cannot reproduce with my test program, is it because it fails out of memory also in 4.02.3 ? (I have quite a bit of RAM, and the apparent steady state was quite high, so it might the reason for the issue.)
(0015640)
braibant (reporter)
2016-03-29 11:02

I reuploaded the test for the gc minor slice. On my system, the program seems to have a steady state memory consumption at around 16gb in 4.02.3, and a steady state at ~22gb in latest 4.03, but I think this increase is irrelevant.

To some extent, this is a really bad test case, because it's really not precise and changing a few constants would change the behavior quite a bit. On the other hand, it's quite hard to come up with an example with precise claims about how much memory should be used in 4.02.3, and how much it uses in 4.03. Now, the numbers of "extra_heap_resources" seem to be correct, and that's probably what we should look at.

I have made some comments on your commit on github
https://github.com/ocaml/ocaml/commit/c95ccf7e9b3df76fb159113c36312a4e9914f627 [^]

My main question regarding the current patch is whether or not one should address the issue of counting the extra_resources on the minor heap and triggering the gc automatically when it reaches some level.
(0015672)
doligez (administrator)
2016-04-04 16:31

>My main question regarding the current patch is whether or not one should address the issue of counting the extra_resources on the minor heap and triggering the gc automatically when it reaches some level.

We'll do that for 4.04.

For the test, I meant it didn't run out of memory on my machine with any OCaml version.
(0016251)
frisch (developer)
2016-08-30 11:05

This is marked as major with target version = 4.04. Does there remain something to be done for 4.04?
(0016260)
braibant (reporter)
2016-09-01 21:26

Nothing has been done to count the amount of extra_resources on the minor heap. I would argue to change the target version for succ(4.04)
(0017391)
doligez (administrator)
2017-02-21 14:22

This PR is still open because I got a report that jenga's memory behavior changed quite a lot because of this issue with extra_resources on the minor heap.
(0018661)
frisch (developer)
2017-11-14 11:38

> Nothing has been done to count the amount of extra_resources on the minor heap.

Attempt here:

https://github.com/ocaml/ocaml/pull/1476 [^]

- Issue History
Date Modified Username Field Change
2016-03-14 21:53 braibant New Issue
2016-03-14 21:53 braibant File Added: test-gc-major-slice.tar.gz
2016-03-21 11:38 braibant Note Added: 0015561
2016-03-21 15:22 doligez Note Added: 0015562
2016-03-21 15:23 doligez Status new => acknowledged
2016-03-21 15:23 doligez Target Version => 4.03.0+dev / +beta1
2016-03-21 15:23 doligez Assigned To => doligez
2016-03-21 15:23 doligez Status acknowledged => assigned
2016-03-22 15:43 doligez Note Added: 0015572
2016-03-22 15:43 doligez Status assigned => resolved
2016-03-22 15:43 doligez Resolution open => fixed
2016-03-22 15:43 doligez Fixed in Version => 4.03.0+dev / +beta1
2016-03-22 16:06 gasche Note Added: 0015573
2016-03-22 17:35 braibant File Added: test-gc-minor-slice-4-03.ml
2016-03-22 17:47 braibant Note Added: 0015576
2016-03-22 18:01 braibant Note Added: 0015578
2016-03-22 18:07 braibant Note Added: 0015579
2016-03-22 18:07 braibant Status resolved => feedback
2016-03-22 18:07 braibant Resolution fixed => reopened
2016-03-23 00:13 braibant Note Deleted: 0015579
2016-03-24 15:17 doligez Note Added: 0015597
2016-03-25 12:31 doligez Note Edited: 0015597 View Revisions
2016-03-25 14:47 doligez Note Added: 0015616
2016-03-25 15:09 braibant Note Added: 0015617
2016-03-25 15:09 braibant Status feedback => assigned
2016-03-29 10:24 braibant File Added: test-gc-minor-slice.tar.gz
2016-03-29 11:02 braibant Note Added: 0015640
2016-04-04 16:31 doligez Note Added: 0015672
2016-04-04 16:31 doligez Target Version 4.03.0+dev / +beta1 => 4.03.1+dev
2016-07-13 10:17 frisch Fixed in Version 4.03.0+dev / +beta1 =>
2016-07-13 10:17 frisch Target Version 4.03.1+dev => 4.04.0 +dev / +beta1 / +beta2
2016-08-30 11:05 frisch Note Added: 0016251
2016-09-01 21:26 braibant Note Added: 0016260
2016-09-02 17:39 doligez Target Version 4.04.0 +dev / +beta1 / +beta2 => 4.05.0 +dev/beta1/beta2/beta3/rc1
2017-02-21 14:22 doligez Note Added: 0017391
2017-02-22 15:12 xleroy Priority high => normal
2017-02-22 15:12 xleroy Severity major => minor
2017-02-22 15:12 xleroy Target Version 4.05.0 +dev/beta1/beta2/beta3/rc1 => 4.06.0 +dev/beta1/beta2/rc1
2017-02-23 16:43 doligez Category OCaml runtime system => runtime system
2017-03-03 17:45 doligez Category runtime system => runtime system and C interface
2017-10-05 18:07 doligez Relationship added related to 0007100
2017-10-05 18:08 doligez Target Version 4.06.0 +dev/beta1/beta2/rc1 =>
2017-11-14 11:38 frisch Note Added: 0018661


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker