
Event.sync forces a full major GC cycle every 5000 calls at most #7158

Closed
vicuna opened this issue Feb 28, 2016 · 6 comments

vicuna commented Feb 28, 2016

Original bug ID: 7158
Reporter: mfp
Status: resolved (set by @xavierleroy on 2017-02-16T13:12:43Z)
Resolution: fixed
Priority: normal
Severity: major
Version: 4.02.3
Target version: 4.05.0 +dev/beta1/beta2/beta3/rc1
Fixed in version: 4.05.0 +dev/beta1/beta2/beta3/rc1
Category: otherlibs
Related to: #7100 #7198 #7750
Monitored by: braibant @gasche @diml @hcarty

Bug description

Event.sync uses condition variables, which are represented with custom blocks.

The parameters to caml_alloc_custom are used=1, max=Max_condition_number=5000 (raised in 2010 from the original 1000 set back in 1996).

The end result is that a full major GC cycle is completed after at most 5000 calls to Event.sync, which can represent a considerable GC load. This is triggered, for example, by Lwt_preemptive.
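
For illustration, here is a minimal sketch (not the actual runtime source; all names and details are simplified) of how a condition variable ends up in a custom block with these parameters:

```c
/* Simplified sketch of a condvar wrapped in a custom block.
   used=1, max=5000 mirrors the parameters described above;
   the names here are illustrative, not the runtime's own. */
#include <caml/alloc.h>
#include <caml/custom.h>
#include <caml/mlvalues.h>
#include <pthread.h>

#define Max_condition_number 5000

static void condvar_finalize(value v)
{
  pthread_cond_destroy((pthread_cond_t *) Data_custom_val(v));
}

static struct custom_operations condvar_ops = {
  "illustrative.condvar",
  condvar_finalize,
  custom_compare_default,
  custom_hash_default,
  custom_serialize_default,
  custom_deserialize_default
};

value example_condition_new(value unit)
{
  /* Each allocation charges 1/5000 of a full major GC cycle,
     so at most 5000 allocations force a complete cycle. */
  value v = caml_alloc_custom(&condvar_ops, sizeof(pthread_cond_t),
                              1, Max_condition_number);
  pthread_cond_init((pthread_cond_t *) Data_custom_val(v), NULL);
  return v;
}
```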

Additional information

References:
ocsigen/lwt#218
mfp/ocaml-sqlexpr#13


vicuna commented Feb 29, 2016

Comment author: mfp

Some further considerations:

This is not the first time I have run into this kind of problem (caml_alloc_custom parameters triggering overly frequent GC): until recently it also affected Pcre regexps (at most 500 (!) unreclaimed at any time).

I'd say there is an underlying API issue with caml_alloc_custom here: the used/max limits are application-dependent and not future-proof, so any library not using used=0, max=n exposes itself to causing performance trouble for its users and/or being rendered comically outdated as the "acceptable" limits rise exponentially.

When the custom block does not represent scarce resources (like file descriptors or things with attached kernel structures), but only out-of-(OCaml-)heap memory, it would be preferable to have the GC adjust its speed based on the memory footprint relative to the current heap size.

caml_alloc_custom currently increases an internal counter by used/max, and a full major GC cycle is completed by the time that counter exceeds 1 or 0.5 * minor_heap_size / major_heap_size.
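
For reference, a simplified paraphrase of that accounting (modeled on the runtime's caml_adjust_gc_speed, which caml_alloc_custom calls; this is a sketch, not the exact source):

```c
/* Simplified paraphrase of the GC speed accounting performed by
   caml_adjust_gc_speed(used, max); details and names differ from
   the real runtime source. */
static double extra_heap_resources = 0.0;

void adjust_gc_speed_sketch(unsigned long used, unsigned long max)
{
  if (max == 0) max = 1;
  extra_heap_resources += (double) used / (double) max;
  if (extra_heap_resources > 1.0) extra_heap_resources = 1.0;
  /* Once the accumulated ratio crosses a threshold derived from the
     minor/major heap size ratio, a major slice is urged.  With
     used=1, max=5000, at most 5000 allocations reach that point. */
}
```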

Would it be possible to have a new caml_alloc_custom-like function for "(extra-heap) memory only" structures, which would increase the internal counter by something proportional to, say, resource_size / major_heap_size, or piggy-back on the GC's speed-control system to the same effect?

Many C libraries using custom blocks could use such a function, becoming both future-proof and usable across very different applications.
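
A rough sketch of what such a function might look like, using the existing caml_alloc_dependent_memory hook to charge the GC proportionally to the out-of-heap footprint (the function name and the overall approach are hypothetical):

```c
/* Hypothetical sketch of the proposed "memory-only" allocator:
   charge GC pressure proportional to the block's out-of-heap
   footprint rather than a fixed 1/max quota.
   caml_alloc_custom and caml_alloc_dependent_memory are real
   runtime functions; alloc_custom_mem_sketch is not. */
#include <caml/alloc.h>
#include <caml/custom.h>
#include <caml/memory.h>
#include <caml/mlvalues.h>

value alloc_custom_mem_sketch(struct custom_operations *ops,
                              uintnat size, mlsize_t mem)
{
  value v = caml_alloc_custom(ops, size, 0, 1);  /* no fixed quota */
  /* Credit `mem` bytes of out-of-heap memory to the GC's speed
     control; the block's finalizer should balance this with
     caml_free_dependent_memory for the same amount. */
  caml_alloc_dependent_memory(mem);
  return v;
}
```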

Having custom blocks represent scarce resources is arguably a bad idea (or, more precisely, leaving their disposal up to the GC is), but there is indeed some value in a safety net like the one caml_alloc_custom currently offers. It would be nice to have a way to expose all those "runtime parameters" and make them more visible, so that different applications can adjust them without patching all their dependencies, and so that it is easier to locate and raise them in the future.

This could be as simple as a registry of (mutable) "build-time constants" in the runtime, along with a trivial module in the stdlib, providing 3 operations: (1) register a value associated with a unique name (like the custom_ops identifier), (2) find a value, and (3) list all values.

(3) would be useful for future developers to get a comprehensive list of build-time constants they might want to tweak for their specific applications, or to review as resources become more abundant.
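
As a purely hypothetical illustration of the runtime side of such a registry (none of these names exist; a real design would also need the OCaml-level stdlib module mentioned above):

```c
/* Hypothetical registry of mutable "build-time constants" in the
   runtime; all names here are invented for illustration. */
#include <stdio.h>
#include <string.h>

#define REGISTRY_CAPACITY 64

struct tunable { const char *name; long value; };

static struct tunable registry[REGISTRY_CAPACITY];
static int registry_len = 0;

/* (1) register a value under a unique name (e.g. a custom_ops id) */
int tunable_register(const char *name, long value)
{
  if (registry_len >= REGISTRY_CAPACITY) return -1;
  registry[registry_len].name = name;
  registry[registry_len].value = value;
  return registry_len++;
}

/* (2) find a value by name, falling back to a default */
long tunable_find(const char *name, long dflt)
{
  for (int i = 0; i < registry_len; i++)
    if (strcmp(registry[i].name, name) == 0) return registry[i].value;
  return dflt;
}

/* (3) list all registered values */
void tunable_list(void)
{
  for (int i = 0; i < registry_len; i++)
    printf("%s = %ld\n", registry[i].name, registry[i].value);
}
```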

(BTW, Lwt_preemptive didn't need first-class synchronous communication, so I proposed replacing Event with a trivial mutex + condition variable combo: ocsigen/lwt#219. I wonder how many actual users of Event's full capabilities there are.)


vicuna commented Mar 27, 2016

Comment author: mfp

I have located yet another instance of the hardcoded-limit issue: sqlite3-ocaml's database and statement handles (both with used=1, max=100), resulting in the GC taking >70% of the CPU time. I suspect a systematic search for more caml_alloc_custom/caml_alloc_final uses would yield several more results.

Edit: should I open a new PR for the caml_alloc_custom/caml_alloc_final API issue?


vicuna commented Mar 27, 2016

Comment author: @gasche

should I open a new PR for the caml_alloc_custom/caml_alloc_final API issue?

Do as you feel is best. Damien's triaging of the issue indicates that he considers it a major issue, but that he probably won't be working on it before the 4.03 release -- I guess there is not enough time left to design and test more GC control mechanisms, as those tend to need a lot of testing on production workloads.


vicuna commented Mar 27, 2016

Comment author: mfp

Indeed, it's way too late to fiddle with the GC and introduce a new C API. There's more than enough GC work going on with the ephemerons and the low-latency stuff :)

On further reflection, the Event.sync performance bug is an instance of the broader API issue, so the latter definitely deserves a PR of its own; I'm posting it in a minute.


vicuna commented Feb 16, 2017

Comment author: @xavierleroy

After some thought and discussion, it appears that good C implementations of mutexes and condition variables do not consume kernel resources, so we can allocate as many as will fit in memory. Hence we now call caml_alloc_custom with cost 0/1 instead of 1/N. This will be in release 4.05.

Commits: [trunk 84be1bc] and [4.05 16ade59]
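
Relative to the allocation sketch in the bug description above, the fix amounts to changing the cost arguments of caml_alloc_custom (schematic before/after, not the literal commit):

```c
/* Before (4.04 and earlier): each condvar charges 1/5000 of a
   full major GC cycle. */
v = caml_alloc_custom(&condvar_ops, sizeof(pthread_cond_t), 1, 5000);

/* After (4.05): cost 0/1, so these blocks add no GC pressure and
   are reclaimed whenever the GC runs for other reasons. */
v = caml_alloc_custom(&condvar_ops, sizeof(pthread_cond_t), 0, 1);
```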

@vicuna vicuna closed this as completed Feb 16, 2017

vicuna commented Nov 9, 2018

Comment author: @alainfrisch

Would it be possible to have a new caml_alloc_custom-like function for "(extra-heap) memory only" structures, which would increase the internal counter by something proportional to, say, resource_size / major_heap_size, or piggy-back on the GC's speed-control system to the same effect?

Cf. #1738
