New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"for" loop not entirely optimal? #5999
Comments
Comment author: @alainfrisch "svn annotate" is your friend: http://caml.inria.fr/cgi-bin/viewvc.cgi?view=revision&revision=5277 |
Comment author: vbrankov Ups, sorry. |
Comment author: @xavierleroy See #415. I suspect your patch reintroduces this bug, namely
fails to terminate. Could you check whether it is the case? |
Comment author: vbrankov Yes, it does. How about this? However, with this patch, although I think it generates simpler cmm code, I cannot reproduce the speedup. I'm checking whether I'm doing anything wrong. Index: asmcomp/cmmgen.ml--- asmcomp/cmmgen.ml (revision 13585)
@@ -1255,13 +1254,11 @@
|
Comment author: vbrankov Or we can change the "if" condition to Cne so that Cexit is in the second leg. I'm not sure whether that would have any benefits. |
Comment author: @xavierleroy If I read your patch #2 correctly, you end up generating a loop with a conditional test in the middle and a jump at the end: Lhead: instead of a loop with the conditional test at the bottom: Lhead: This doesn't look good to me. What I had in mind was, rather, to precompute "max loop value plus 1 / minus 1" and compare the loop index after incrementation / decrementation against this precomputed value. I doubt this would make much of a performance difference wrt the code currently generated, though. Concerning 3% speedups, see the recent discussion on caml-list: minute changes in code placement alone can result in much bigger slowdowns or speedups. |
Comment author: @alainfrisch
Gabriel Scherer collects a set of realistic benchmarks, which will hopefully make it easier to assess the benefits of such optimizations. While I agree that a 3% speedup on one example does not give any useful information, if we can observe a 3% speedup on a reasonable number of benchmarks and no degradation on the others, it is definitely a good idea to consider the optimization as a good candidate. Do you agree? (Then other criteria apply, of course, such as complexity of the compiler, possible interactions with other optimizations, etc) |
Comment author: vbrankov I apologize for my somewhat messy and hastily written input. I agree with Xavier that I reversed the "if" conditions, and what I wondered in my previous comment is why after fixing it the code was not faster. I think I understand the reason and my new change produced a measurable speedup. I validated the following change against this code: let () = I also validated the loop going from min_int to max_int and vice versa. The change follows. I agree that it's not too much simpler and I understand the complications arising around max_int and min_int. It might have been simpler if "for" loop didn't include the last value :) Index: asmcomp/cmmgen.ml--- asmcomp/cmmgen.ml (revision 13585)
@@ -1252,16 +1252,17 @@
I tried these simple benchmarks and the results show a speedup. You can decide whether it's worth trying on your benchmarks which I'm sure are much better. for1.ml: let () = for2.ml: let () = binomial.ml: let print x = let binomial =
let () = Sandy Bridge E5-2678W: $ ./for1o.exe Results on Nehalem W5580: $ ./for1n.exe |
Comment author: vbrankov I'm sorry, I see that this issue stalled. Do I need to do something about it? |
Comment author: @xavierleroy
No, thanks for asking. We're in code freeze for release 4.01, so this will have to wait for later. |
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc. |
Original bug ID: 5999
Reporter: vbrankov
Status: acknowledged (set by @xavierleroy on 2013-06-14T12:00:34Z)
Resolution: open
Priority: normal
Severity: feature
Version: 4.00.1
Target version: undecided
Category: back end (clambda to assembly)
Tags: patch
Child of: #2819
Monitored by: @gasche
Bug description
Why does the "for" loop create extra variable in cmmgen.ml? I simplified it and got a measurable speedup. I used binomial options pricing as benchmark and the speedup was between 0.3 and 3% depending on the version of Intel Xeon that I've tested it on.
Here's the patch:
Index: asmcomp/cmmgen.ml
--- asmcomp/cmmgen.ml (revision 13585)
+++ asmcomp/cmmgen.ml (working copy)
@@ -1243,7 +1243,6 @@
let tst = match dir with Upto -> Cgt | Downto -> Clt in
let inc = match dir with Upto -> Caddi | Downto -> Csubi in
let raise_num = next_raise_count () in
@@ -1255,13 +1254,12 @@
Cloop
(Csequence
(remove_unit(transl body),
return_unit(Cassign(id, transl exp))
The text was updated successfully, but these errors were encountered: