|Anonymous | Login | Signup for a new account||2017-02-23 19:27 CET|
|Main | My View | View Issues | Change Log | Roadmap|
|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0006224||OCaml||back end (clambda to assembly)||public||2013-11-04 18:05||2017-02-16 14:01|
|Priority||normal||Severity||tweak||Reproducibility||have not tried|
|Target Version||undecided||Fixed in Version|
|Summary||0006224: introduce cmovcc instruction to speed up certain simple "if" blocks|
|Description||(on behalf of pdenys who worked on this)|
Using cmovcc instruction is a way to avoid branch mispredictions. This change translates "if" statements of the form [if cond then var1 else var2] and similar to use "cmovcc".
|Steps To Reproduce||Do [if cond then var1 else var2] with [cond] hard to predict. The penalty of misprediction is about 24 cycles. On the other hand [cmovcc] takes just 2 cycles. In the case of 50% mispredictions, the speedup should be 6 times.|
|Additional Information||The effect is visible in [Map]. For instance [Map.find] has [if cond then l else r] to select left and right subtree. The benchmark shows that [Map.find] got 1.6 times faster.|
The data structures where this should be noticeable are [Map], [Set], various implementations of heap and others which use ordered types.
|Attached Files|| cmovcc.2.diff [^] (31,266 bytes) 2013-11-04 18:05 [Show Content]
bench_map_find.ml [^] (559 bytes) 2013-11-04 18:06 [Show Content]
|For example, on ARM (even in Thumb-2 mode) almost every instruction can be conditional. It would be nice to have a slightly more general approach where the backend can choose to support conditions on arbitrary Iop's.|
|Coq devs love it when Map becomes faster. I notified Pierre-Marie Pédrot to ask him for large-scale performance feedback.|
So I tested it on Coq trunk. The global gain is neglectable and cannot be distinguished from random noise. The overall speedup is indeed around 0.8%.
Yet, on time-consuming files, we get remarkable speed-ups, some reaching up to around 5%. Hence it may be a good idea after all.
|I think we should do this optimization during linearization. Let's say we have an Iifthenelse with two branches, each of which contains only Iop's. Then we'd check for each of these Iop's, whether the architecture supports a conditional version of this Iop, up to a given number of Iop's. If this is holds, then we combine them into a conditional block avoiding the branches.|
I tried a bit to see what this would look like at the linearization pass. And it does not seems like the right place for it. First to detect that correctly, you need to add some kind of machinery. The case where it would apply are almost always like
-> mov x y
-> mov x y (same x and y)
do something else or nothing
So we first need a way to push those mov out of the branches. There is also the same problem at the exit point of the branches. (I did a small pass to try those)
But the main problem is that linearisation is after register allocation. When generating something like
if x then 1 else 2
if this is done before register allocation we can do:
y <- 2
if x then y <- 1
But after, it becomes quite difficult and we would end up doing 2 cmov instead.
Benedikt, why don't you think that converting to arm conditionnal instruction can't be done during selection ? By the way I think that this could be made a bit more generic to share some parts with other architectures.
|2013-11-04 18:05||vbrankov||New Issue|
|2013-11-04 18:05||vbrankov||File Added: cmovcc.2.diff|
|2013-11-04 18:06||vbrankov||File Added: bench_map_find.ml|
|2013-11-04 20:53||meurer||Note Added: 0010585|
|2013-11-05 13:47||gasche||Note Added: 0010590|
|2013-11-05 17:45||ppedrot||Note Added: 0010593|
|2013-11-08 08:01||meurer||Note Added: 0010602|
|2014-01-30 15:54||chambart||Note Added: 0010867|
|2014-02-19 16:39||doligez||Tag Attached: patch|
|2014-07-16 13:55||doligez||Status||new => acknowledged|
|2014-07-16 13:55||doligez||Target Version||=> 4.03.0+dev / +beta1|
|2016-04-14 17:02||doligez||Target Version||4.03.0+dev / +beta1 => 4.03.1+dev|
|2017-02-16 14:01||doligez||Target Version||4.03.1+dev => undecided|
|2017-02-23 16:35||doligez||Category||OCaml backend (code generation) => Back end (clambda to assembly)|
|2017-02-23 16:44||doligez||Category||Back end (clambda to assembly) => back end (clambda to assembly)|
|Copyright © 2000 - 2011 MantisBT Group|