New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
introduce cmovcc instruction to speed up certain simple "if" blocks #6224
Comments
Comment author: meurer For example, on ARM (even in Thumb-2 mode) almost every instruction can be conditional. It would be nice to have a slightly more general approach where the backend can choose to support conditions on arbitrary Iop's. |
Comment author: @gasche Coq devs love it when Map becomes faster. I notified Pierre-Marie Pédrot to ask him for large-scale performance feedback. |
Comment author: @ppedrot So I tested it on Coq trunk. The global gain is neglectable and cannot be distinguished from random noise. The overall speedup is indeed around 0.8%. Yet, on time-consuming files, we get remarkable speed-ups, some reaching up to around 5%. Hence it may be a good idea after all. |
Comment author: meurer I think we should do this optimization during linearization. Let's say we have an Iifthenelse with two branches, each of which contains only Iop's. Then we'd check for each of these Iop's, whether the architecture supports a conditional version of this Iop, up to a given number of Iop's. If this is holds, then we combine them into a conditional block avoiding the branches. |
Comment author: @chambart I tried a bit to see what this would look like at the linearization pass. And it does not seems like the right place for it. First to detect that correctly, you need to add some kind of machinery. The case where it would apply are almost always like cmp * So we first need a way to push those mov out of the branches. There is also the same problem at the exit point of the branches. (I did a small pass to try those) But the main problem is that linearisation is after register allocation. When generating something like if x then 1 else 2 if this is done before register allocation we can do: y <- 2 But after, it becomes quite difficult and we would end up doing 2 cmov instead. Benedikt, why don't you think that converting to arm conditionnal instruction can't be done during selection ? By the way I think that this could be made a bit more generic to share some parts with other architectures. |
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc. |
Original bug ID: 6224
Reporter: vbrankov
Status: acknowledged (set by @damiendoligez on 2014-07-16T11:55:31Z)
Resolution: open
Priority: normal
Severity: feature
Version: 4.01.0
Target version: undecided
Category: back end (clambda to assembly)
Tags: patch
Monitored by: @gasche meurer @diml @chambart @hcarty @yakobowski
Bug description
(on behalf of pdenys who worked on this)
Using cmovcc instruction is a way to avoid branch mispredictions. This change translates "if" statements of the form [if cond then var1 else var2] and similar to use "cmovcc".
Steps to reproduce
Do [if cond then var1 else var2] with [cond] hard to predict. The penalty of misprediction is about 24 cycles. On the other hand [cmovcc] takes just 2 cycles. In the case of 50% mispredictions, the speedup should be 6 times.
Additional information
The effect is visible in [Map]. For instance [Map.find] has [if cond then l else r] to select left and right subtree. The benchmark shows that [Map.find] got 1.6 times faster.
The data structures where this should be noticeable are [Map], [Set], various implementations of heap and others which use ordered types.
File attachments
The text was updated successfully, but these errors were encountered: