Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce cmovcc instruction to speed up certain simple "if" blocks #6224

Closed
vicuna opened this issue Nov 4, 2013 · 6 comments
Closed

introduce cmovcc instruction to speed up certain simple "if" blocks #6224

vicuna opened this issue Nov 4, 2013 · 6 comments

Comments

@vicuna
Copy link

vicuna commented Nov 4, 2013

Original bug ID: 6224
Reporter: vbrankov
Status: acknowledged (set by @damiendoligez on 2014-07-16T11:55:31Z)
Resolution: open
Priority: normal
Severity: feature
Version: 4.01.0
Target version: undecided
Category: back end (clambda to assembly)
Tags: patch
Monitored by: @gasche meurer @diml @chambart @hcarty @yakobowski

Bug description

(on behalf of pdenys who worked on this)

Using cmovcc instruction is a way to avoid branch mispredictions. This change translates "if" statements of the form [if cond then var1 else var2] and similar to use "cmovcc".

Steps to reproduce

Do [if cond then var1 else var2] with [cond] hard to predict. The penalty of misprediction is about 24 cycles. On the other hand [cmovcc] takes just 2 cycles. In the case of 50% mispredictions, the speedup should be 6 times.

Additional information

The effect is visible in [Map]. For instance [Map.find] has [if cond then l else r] to select left and right subtree. The benchmark shows that [Map.find] got 1.6 times faster.

The data structures where this should be noticeable are [Map], [Set], various implementations of heap and others which use ordered types.

File attachments

@vicuna
Copy link
Author

vicuna commented Nov 4, 2013

Comment author: meurer

For example, on ARM (even in Thumb-2 mode) almost every instruction can be conditional. It would be nice to have a slightly more general approach where the backend can choose to support conditions on arbitrary Iop's.

@vicuna
Copy link
Author

vicuna commented Nov 5, 2013

Comment author: @gasche

Coq devs love it when Map becomes faster. I notified Pierre-Marie Pédrot to ask him for large-scale performance feedback.

@vicuna
Copy link
Author

vicuna commented Nov 5, 2013

Comment author: @ppedrot

So I tested it on Coq trunk. The global gain is neglectable and cannot be distinguished from random noise. The overall speedup is indeed around 0.8%.

Yet, on time-consuming files, we get remarkable speed-ups, some reaching up to around 5%. Hence it may be a good idea after all.

@vicuna
Copy link
Author

vicuna commented Nov 8, 2013

Comment author: meurer

I think we should do this optimization during linearization. Let's say we have an Iifthenelse with two branches, each of which contains only Iop's. Then we'd check for each of these Iop's, whether the architecture supports a conditional version of this Iop, up to a given number of Iop's. If this is holds, then we combine them into a conditional block avoiding the branches.

@vicuna
Copy link
Author

vicuna commented Jan 30, 2014

Comment author: @chambart

I tried a bit to see what this would look like at the linearization pass. And it does not seems like the right place for it. First to detect that correctly, you need to add some kind of machinery. The case where it would apply are almost always like

cmp *
j*
-> mov x y
do something
-> mov x y (same x and y)
do something else or nothing

So we first need a way to push those mov out of the branches. There is also the same problem at the exit point of the branches. (I did a small pass to try those)

But the main problem is that linearisation is after register allocation. When generating something like

if x then 1 else 2

if this is done before register allocation we can do:

y <- 2
if x then y <- 1

But after, it becomes quite difficult and we would end up doing 2 cmov instead.

Benedikt, why don't you think that converting to arm conditionnal instruction can't be done during selection ? By the way I think that this could be made a bit more generic to share some parts with other architectures.

@github-actions
Copy link

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant