Performance: Buffer.add_char is not inlined #5872

vicuna · 2013-01-04T15:39:46Z

Original bug ID: 5872
Reporter: gerd
Status: closed (set by @xavierleroy on 2015-12-11T18:20:58Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 4.00.1
Target version: 4.01.0+dev
Fixed in version: 4.01.0+dev
Category: standard library
Related to: #5873
Monitored by: @gasche @lefessan @yakobowski @alainfrisch

Bug description

According to ocamlobjinfo, Buffer.add_char is not marked for being inlined. IMHO, this is a performance bug, because this function makes only sense if it can be inlined.

Proposed fix: change [let resize] into [let rec resize], i.e. make this function artificially recursive, as this prevents [resize] from being inlined into [add_char] (which is probably the cause for the problem).

vicuna · 2013-01-08T12:51:43Z

Comment author: @alainfrisch

because this function makes only sense if it can be inlined.

While I can imagine that inlining could make a difference here, I don't understand this statement. Can you elaborate on why Buffer.add_char is useless if it is not inlined?

vicuna · 2013-01-08T14:51:28Z

Comment author: gerd

Maybe not totally useless, but my understanding is that algorithms are sometimes easier to write when you treat every char separately - finally caling Buffer.add_char when you are done with a character. The alternative is a different style where you try to treat several characters at once (e.g. find first a substring, then add all at once to the buffer). Well, of course my statement makes only sense when this alternative exists. You can then view add_char as the function that enables the simpler style hopefully without impact on the performance.

Regarding the performance, at least, I recently had to rewrite an algorithm into the alternate style because Buffer.add_char was too slow (or better: it was important for me to maximize the performance, and the choice of style made a difference). My guess is that this would not have been necessary if add_char was inlined (in which case it compiles to around 15 assembly instructions in the "fast path").

Btw, if I compile buffer.ml with -inline 3, add_char is marked as inline function. It is just slightly too large. (Maybe some hint that the default inlining limit is too low.) I just checked again: A simple loop adding 100M chars to a buffer is 10% faster if add_char is inlined (for amd64).

vicuna · 2013-01-08T14:53:42Z

Comment author: @alainfrisch

So an alternative fix would be to compile the standard library with a larger -inline factor, right?

vicuna · 2013-01-08T14:55:36Z

Comment author: gerd

Yes.

vicuna · 2013-01-08T15:42:36Z

Comment author: @alainfrisch

So bumping -inline for the standard library seems more robust to me than trying to work around the inlining heuristic by adding a spurious "rec". We would need to assess the impact of this change, though. There is an ongoing project to collect realistic piece of code in order to create a serious benchmarking suite for OCaml code (in order to study optimizations to the compiler, runtime system, standard library). Cf: http://gallium.inria.fr/~scherer/gagallium/we-need-a-representative-benchmark-suite/

Hopefully such a suite will make it easier to make a decision like changing the default inlining factor (or for the stdlib only).

vicuna · 2013-01-08T16:40:00Z

Comment author: gerd

Just for this case, increasing -inline is ok.

But thinking a bit more general, one of the problems we face here is that the inlining heuristics is very imprecise. So far I understand, the metrics is on the size of an intermediate code representation of the inlining candidate. What is totally ignored is the cost of the function call (if inlining is not done), which is (also) a function of the number of registers the caller needs to save to the stack. Illustrated at the example of Buffer.add_char, this is not negligible: if you e.g. need to save 5 registers, you need 10 more instructions, and this is a lot when the called function consists only of 15 instructions. I have no good idea how you can guess the calling costs based on the intermediate code, but maybe one can use something like the number of variables in the innermost scope as estimate. My thinking is that this matters more when CPUs get more registers. But anyway, "calibrating" something like this (or even inlining in general) is probably difficult.

vicuna · 2013-03-24T14:42:40Z

Comment author: @xavierleroy

Even if Gerd's suggestion (let -> let rec) is a bit of a hack, I like it because it would cause the fast path of add_char to be inlined, while keeping the slow path "offline", and that's what's best wrt performance.

vicuna · 2013-07-11T21:49:46Z

Comment author: @damiendoligez

Even if Gerd's suggestion (let -> let rec) is a bit of a hack, I like it because...

Except that I've just tried and it doesn't work: add_char is still not inlined.

OTOH, compiling buffer.ml with -inline 3 does work, and it doesn't inline resize, so the slow path stays offline.

I have added the -inline 3 to Compflags (rev 13888 in branch 4.01).

vicuna closed this as completed Dec 11, 2015

vicuna added the stdlib label Mar 14, 2019

vicuna added this to the 4.01.0 milestone Mar 14, 2019

This was referenced Mar 14, 2019

Feature: control inlining better #5873

Closed

Mark functions in the stdlib that should be inlined #7126

Closed

vicuna added the bug label Mar 20, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance: Buffer.add_char is not inlined #5872

Performance: Buffer.add_char is not inlined #5872

vicuna commented Jan 4, 2013

vicuna commented Jan 8, 2013

vicuna commented Jan 8, 2013

vicuna commented Jan 8, 2013

vicuna commented Jan 8, 2013

vicuna commented Jan 8, 2013

vicuna commented Jan 8, 2013

vicuna commented Mar 24, 2013

vicuna commented Jul 11, 2013

Performance: Buffer.add_char is not inlined #5872

Performance: Buffer.add_char is not inlined #5872

Comments

vicuna commented Jan 4, 2013

Bug description

vicuna commented Jan 8, 2013

vicuna commented Jan 8, 2013

vicuna commented Jan 8, 2013

vicuna commented Jan 8, 2013

vicuna commented Jan 8, 2013

vicuna commented Jan 8, 2013

vicuna commented Mar 24, 2013

vicuna commented Jul 11, 2013