Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006224OCamlback end (clambda to assembly)public2013-11-04 18:052017-03-07 13:25
Assigned To 
PrioritynormalSeverityfeatureReproducibilityhave not tried
PlatformOSOS Version
Product Version4.01.0 
Target VersionundecidedFixed in Version 
Summary0006224: introduce cmovcc instruction to speed up certain simple "if" blocks
Description(on behalf of pdenys who worked on this)

Using cmovcc instruction is a way to avoid branch mispredictions. This change translates "if" statements of the form [if cond then var1 else var2] and similar to use "cmovcc".
Steps To ReproduceDo [if cond then var1 else var2] with [cond] hard to predict. The penalty of misprediction is about 24 cycles. On the other hand [cmovcc] takes just 2 cycles. In the case of 50% mispredictions, the speedup should be 6 times.
Additional InformationThe effect is visible in [Map]. For instance [Map.find] has [if cond then l else r] to select left and right subtree. The benchmark shows that [Map.find] got 1.6 times faster.

The data structures where this should be noticeable are [Map], [Set], various implementations of heap and others which use ordered types.
Attached Filesdiff file icon cmovcc.2.diff [^] (31,266 bytes) 2013-11-04 18:05 [Show Content]
? file icon [^] (559 bytes) 2013-11-04 18:06 [Show Content]

- Relationships

-  Notes
meurer (developer)
2013-11-04 20:53

For example, on ARM (even in Thumb-2 mode) almost every instruction can be conditional. It would be nice to have a slightly more general approach where the backend can choose to support conditions on arbitrary Iop's.
gasche (developer)
2013-11-05 13:47

Coq devs love it when Map becomes faster. I notified Pierre-Marie P├ędrot to ask him for large-scale performance feedback.
ppedrot (reporter)
2013-11-05 17:45

So I tested it on Coq trunk. The global gain is neglectable and cannot be distinguished from random noise. The overall speedup is indeed around 0.8%.

Yet, on time-consuming files, we get remarkable speed-ups, some reaching up to around 5%. Hence it may be a good idea after all.
meurer (developer)
2013-11-08 08:01

I think we should do this optimization during linearization. Let's say we have an Iifthenelse with two branches, each of which contains only Iop's. Then we'd check for each of these Iop's, whether the architecture supports a conditional version of this Iop, up to a given number of Iop's. If this is holds, then we combine them into a conditional block avoiding the branches.
chambart (developer)
2014-01-30 15:54

I tried a bit to see what this would look like at the linearization pass. And it does not seems like the right place for it. First to detect that correctly, you need to add some kind of machinery. The case where it would apply are almost always like

cmp *
-> mov x y
   do something
-> mov x y (same x and y)
   do something else or nothing

So we first need a way to push those mov out of the branches. There is also the same problem at the exit point of the branches. (I did a small pass to try those)

But the main problem is that linearisation is after register allocation. When generating something like

if x then 1 else 2

if this is done before register allocation we can do:

y <- 2
if x then y <- 1

But after, it becomes quite difficult and we would end up doing 2 cmov instead.

Benedikt, why don't you think that converting to arm conditionnal instruction can't be done during selection ? By the way I think that this could be made a bit more generic to share some parts with other architectures.

- Issue History
Date Modified Username Field Change
2013-11-04 18:05 vbrankov New Issue
2013-11-04 18:05 vbrankov File Added: cmovcc.2.diff
2013-11-04 18:06 vbrankov File Added:
2013-11-04 20:53 meurer Note Added: 0010585
2013-11-05 13:47 gasche Note Added: 0010590
2013-11-05 17:45 ppedrot Note Added: 0010593
2013-11-08 08:01 meurer Note Added: 0010602
2014-01-30 15:54 chambart Note Added: 0010867
2014-02-19 16:39 doligez Tag Attached: patch
2014-07-16 13:55 doligez Status new => acknowledged
2014-07-16 13:55 doligez Target Version => 4.03.0+dev / +beta1
2016-04-14 17:02 doligez Target Version 4.03.0+dev / +beta1 => 4.03.1+dev
2017-02-16 14:01 doligez Target Version 4.03.1+dev => undecided
2017-02-23 16:35 doligez Category OCaml backend (code generation) => Back end (clambda to assembly)
2017-02-23 16:44 doligez Category Back end (clambda to assembly) => back end (clambda to assembly)
2017-03-07 13:25 shinwell Severity tweak => feature

Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker