|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0007158||OCaml||otherlibs||public||2016-02-28 23:44||2017-02-16 14:12|
|Target Version||4.05.0 +dev/beta1/beta2/beta3/rc1||Fixed in Version||4.05.0 +dev/beta1/beta2/beta3/rc1|
|Summary||0007158: Event.sync forces a full major GC cycle every 5000 calls at most|
|Description||Event.sync uses condition variables, which are represented with custom blocks. |
The parameters to alloc_custom are used=1, max=Max_condition_number=5000 (raised in 2010 from the original 1000 set back in 1996).
The end result is that a full major GC cycle is completed after at most 5000 calls to Event.sync, which can represent a considerable GC load. This is triggered for example by Lwt_preemptive.
|Additional Information||References: |
|Tags||No tags attached.|
Some further considerations:
This is not the first time I have run into this kind of problem (caml_alloc_custom parameters triggering overly frequent GC): until recently it also happened with Pcre regexps (at most 500(!) unreclaimed at any time).
I'd say there's an underlying API issue with caml_alloc_custom here: the used/max limits are application-dependent and not future-proof, so any library not using used=0,max=n risks causing performance trouble for its users and/or becoming comically outdated as the "acceptable" limits rise exponentially.
When the custom block does not represent scarce resources (like file descriptors or things with attached kernel structures), but only out-of-(OCaml-)heap memory, it would be preferable to have the GC adjust its speed based on the memory footprint relative to the current heap size.
caml_alloc_custom currently increases an internal value by used/max, and a full GC cycle is completed by the time it exceeds 1 or 0.5 * minor / major.
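To make the arithmetic concrete, here is a minimal sketch in plain C of the used/max accounting described above. It is a simplified model, not the runtime's code: the `gc_model` struct, the function names, and the "reset to zero after a forced cycle" behaviour are all assumptions made for illustration, and the minor/major refinement is ignored.

```c
#include <stddef.h>

/* Simplified model of the accounting (hypothetical names, NOT runtime code).
   Each allocation adds used/max to a counter; once the counter exceeds 1.0
   we count one forced major cycle and reset the counter. */
typedef struct {
    double extra_resources;      /* stand-in for the runtime's internal value */
    unsigned long forced_cycles; /* full major cycles forced by custom blocks */
} gc_model;

void model_alloc_custom(gc_model *gc, unsigned long used, unsigned long max)
{
    if (used == 0) return;       /* used=0: the block never speeds up the GC */
    if (max == 0) max = 1;       /* guard against dividing by zero */
    gc->extra_resources += (double) used / (double) max;
    if (gc->extra_resources > 1.0) {
        gc->forced_cycles++;
        gc->extra_resources = 0.0;
    }
}

/* Number of forced cycles after `calls` allocations with a given used/max. */
unsigned long forced_cycles_after(unsigned long calls,
                                  unsigned long used, unsigned long max)
{
    gc_model gc = { 0.0, 0 };
    for (unsigned long i = 0; i < calls; i++)
        model_alloc_custom(&gc, used, max);
    return gc.forced_cycles;
}
```

In this model, `forced_cycles_after(n, 1, 5000)` grows by roughly one cycle per 5000 calls regardless of how little memory each block holds, while `forced_cycles_after(n, 0, 1)` stays at zero for any `n`.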
Would it be possible to have a new caml_alloc_custom-like function for "(extra-heap) memory only" structures that increases the internal value by something proportional to, say, resource_size / major, or to piggy-back on the GC's speed control system to the same effect?
Many C libraries using custom blocks could use such a function, becoming both future-proof and usable across very different applications.
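As a rough illustration of the proposal, the sketch below charges each block by its size relative to the current major heap size instead of a fixed used/max ratio. Everything here is hypothetical: no such function exists in the runtime, and the names `mem_gc_model`, `model_alloc_custom_mem`, and `mem_forced_cycles_after`, as well as the reset-to-zero behaviour, are assumptions for the sake of the example.

```c
#include <stddef.h>

/* Hypothetical "memory only" accounting: the cost of a block is its
   out-of-heap size divided by the current major heap size. */
typedef struct {
    size_t major_heap_bytes;     /* current major heap size (model input) */
    double extra_resources;      /* accumulated relative footprint */
    unsigned long forced_cycles; /* full major cycles forced so far */
} mem_gc_model;

void model_alloc_custom_mem(mem_gc_model *gc, size_t resource_bytes)
{
    if (gc->major_heap_bytes == 0) return;   /* avoid dividing by zero */
    gc->extra_resources +=
        (double) resource_bytes / (double) gc->major_heap_bytes;
    if (gc->extra_resources > 1.0) {
        gc->forced_cycles++;
        gc->extra_resources = 0.0;
    }
}

/* Forced cycles after `calls` allocations of `resource_bytes` each,
   with a heap of constant size `heap_bytes`. */
unsigned long mem_forced_cycles_after(size_t heap_bytes,
                                      size_t resource_bytes,
                                      unsigned long calls)
{
    mem_gc_model gc = { heap_bytes, 0.0, 0 };
    for (unsigned long i = 0; i < calls; i++)
        model_alloc_custom_mem(&gc, resource_bytes);
    return gc.forced_cycles;
}
```

The point of the design is that the same library code scales with the application: a program with a small heap is pushed to collect sooner, while one with a large heap tolerates many more small out-of-heap blocks before any extra GC work is requested.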
Having custom blocks that represent scarce resources is arguably a bad idea (or more precisely, leaving their disposal up to the GC is), but there's indeed some value in having a safety net like the one offered by caml_alloc_custom at present. It would be nice to have a way to expose and make more visible all those "runtime parameters", so that different applications can manipulate them without patching all their dependencies, and to make it easier to locate and increase them in the future.
This could be as simple as a registry of (mutable) "build-time constants" in the runtime, along with a trivial module in the stdlib, providing 3 operations: (1) register a value associated to a unique name (like the custom_ops identifier), (2) find a value and (3) list all values.
(3) would be useful to future developers to get a comprehensive list of build-time constants they might want to tweak for their specific applications or review as the resources become more abundant.
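The three operations could look something like the following C sketch. This is only an illustration of the idea under stated assumptions: the fixed-capacity table, the `long` value type, and the names `param_register`, `param_find`, and `param_list` are all hypothetical, and nothing like this exists in the runtime today.

```c
#include <stddef.h>
#include <string.h>

#define PARAM_REGISTRY_CAPACITY 64   /* arbitrary limit for this sketch */

typedef struct {
    const char *name;   /* unique key, e.g. a custom_ops identifier */
    long value;         /* mutable "build-time constant" */
} runtime_param;

static runtime_param registry[PARAM_REGISTRY_CAPACITY];
static size_t registry_len = 0;

/* (2) find: pointer to the stored (mutable) value, or NULL if unknown. */
long *param_find(const char *name)
{
    for (size_t i = 0; i < registry_len; i++)
        if (strcmp(registry[i].name, name) == 0)
            return &registry[i].value;
    return NULL;
}

/* (1) register: 0 on success, -1 if the name is taken or the table is full.
   The caller must keep the `name` string alive. */
int param_register(const char *name, long value)
{
    if (param_find(name) != NULL || registry_len == PARAM_REGISTRY_CAPACITY)
        return -1;
    registry[registry_len].name = name;
    registry[registry_len].value = value;
    registry_len++;
    return 0;
}

/* (3) list: calls `visit` (if non-NULL) on every entry in registration
   order and returns the number of entries. */
size_t param_list(void (*visit)(const char *name, long value, void *ctx),
                  void *ctx)
{
    for (size_t i = 0; i < registry_len; i++)
        if (visit != NULL)
            visit(registry[i].name, registry[i].value, ctx);
    return registry_len;
}
```

A library would call `param_register` once at initialisation with its default limit, and an application wanting a different limit would overwrite the value through the pointer returned by `param_find`. Returning a pointer keeps the sketch short; a real interface would more likely expose explicit get/set functions.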
(BTW, Lwt_preemptive didn't need first-class sync communication, so I proposed replacing Event with a trivial mutex + CV combo: https://github.com/ocsigen/lwt/pull/219. I wonder how many actual users of Event's full capabilities there are.)
edited on: 2016-03-27 16:11
I have located yet another instance of the hardcoded limit issue: sqlite3-ocaml's database and statement handles (both with used=1, max=100), resulting in the GC taking more than 70% of the CPU time. I suspect a systematic search for more caml_alloc_custom/caml_alloc_final uses would yield several results.
Edit: should I open a new PR for the caml_alloc_custom/caml_alloc_final API issue?
> should I open a new PR for the caml_alloc_custom/caml_alloc_final API issue?
Do as you feel is best. Damien's triaging of the issue indicates that he considers it a major issue, but that he probably won't be working on it before the 4.03 release -- I guess there is not enough time left to design and test more GC control mechanisms, as those tend to need a lot of testing on production workloads.
Indeed, it's way too late to fiddle with the GC and introduce a new C API. There's more than enough GC work going on with the ephemerons and the low-latency stuff :)
On further reflection, the Event.sync performance bug is an instance of the broader API issue, so the latter definitely deserves a PR of its own. I'm posting it in a minute.
After some thought and discussion, it appears that good C implementations of mutexes and condition variables do not consume kernel resources, so we can allocate as many as will fit in memory. Hence we now call caml_alloc_custom with cost 0/1 instead of 1/N. This will be in release 4.05.
Commits: [trunk 84be1bc] and [4.05 16ade59]
|2016-02-28 23:44||mfp||New Issue|
|2016-02-29 13:52||mfp||Note Added: 0015410|
|2016-02-29 16:21||doligez||Severity||minor => major|
|2016-02-29 16:21||doligez||Status||new => confirmed|
|2016-02-29 16:21||doligez||Target Version||=> 4.03.1+dev|
|2016-03-27 16:11||mfp||Note Added: 0015621|
|2016-03-27 16:11||mfp||Note Edited: 0015621||View Revisions|
|2016-03-27 16:16||gasche||Note Added: 0015623|
|2016-03-27 16:49||mfp||Note Added: 0015625|
|2016-04-06 13:07||doligez||Relationship added||related to 0007198|
|2017-02-16 14:00||doligez||Target Version||4.03.1+dev => undecided|
|2017-02-16 14:12||xleroy||Note Added: 0017283|
|2017-02-16 14:12||xleroy||Status||confirmed => resolved|
|2017-02-16 14:12||xleroy||Resolution||open => fixed|
|2017-02-16 14:12||xleroy||Fixed in Version||=> 4.05.0 +dev/beta1/beta2/beta3/rc1|
|2017-02-16 14:12||xleroy||Target Version||undecided => 4.05.0 +dev/beta1/beta2/beta3/rc1|
|2017-02-23 16:42||doligez||Category||OCaml otherlibs => otherlibs|