Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0005999OCamlOCaml backend (code generation)public2013-04-24 20:542014-07-16 20:43
Reportervbrankov 
Assigned To 
PrioritynormalSeveritytweakReproducibilityN/A
StatusacknowledgedResolutionopen 
PlatformOSOS Version
Product Version4.00.1 
Target Versionafter-4.02.0Fixed in Version 
Summary0005999: "for" loop not entirely optimal?
DescriptionWhy does the "for" loop create extra variable in cmmgen.ml? I simplified it and got a measurable speedup. I used binomial options pricing as benchmark and the speedup was between 0.3 and 3% depending on the version of Intel Xeon that I've tested it on.

Here's the patch:

Index: asmcomp/cmmgen.ml
===================================================================
--- asmcomp/cmmgen.ml (revision 13585)
+++ asmcomp/cmmgen.ml (working copy)
@@ -1243,7 +1243,6 @@
       let tst = match dir with Upto -> Cgt | Downto -> Clt in
       let inc = match dir with Upto -> Caddi | Downto -> Csubi in
       let raise_num = next_raise_count () in
- let id_prev = Ident.rename id in
       return_unit
         (Clet
            (id, transl low,
@@ -1255,13 +1254,12 @@
                     Cloop
                       (Csequence
                          (remove_unit(transl body),
- Clet(id_prev, Cvar id,
                           Csequence
                             (Cassign(id,
                                Cop(inc, [Cvar id; Cconst_int 2])),
                              Cifthenelse
- (Cop(Ccmpi Ceq, [Cvar id_prev; high]),
- Cexit (raise_num,[]), Ctuple [])))))),
+ (Cop(Ccmpi tst, [Cvar id; high]),
+ Cexit (raise_num,[]), Ctuple []))))),
                  Ctuple []))))
   | Uassign(id, exp) ->
       return_unit(Cassign(id, transl exp))
Tagspatch
Attached Files

- Relationships
child of 0000415closed for i = 0 to max_int do () done loop ? 

-  Notes
(0009206)
frisch (developer)
2013-04-25 16:27

"svn annotate" is your friend:

http://caml.inria.fr/cgi-bin/viewvc.cgi?view=revision&revision=5277 [^]
(0009207)
vbrankov (reporter)
2013-04-25 16:31

Ups, sorry.
(0009209)
xleroy (administrator)
2013-04-25 16:58
edited on: 2014-07-16 20:40

See 0000415. I suspect your patch reintroduces this bug, namely

    for i = ... to max_int do ... done

fails to terminate. Could you check whether it is the case?

(0009211)
vbrankov (reporter)
2013-04-25 17:30

Yes, it does. How about this?

However, with this patch, although I think it generates simpler cmm code, I cannot reproduce the speedup. I'm checking whether I'm doing anything wrong.

Index: asmcomp/cmmgen.ml
===================================================================
--- asmcomp/cmmgen.ml (revision 13585)
+++ asmcomp/cmmgen.ml (working copy)
@@ -1243,7 +1243,6 @@
       let tst = match dir with Upto -> Cgt | Downto -> Clt in
       let inc = match dir with Upto -> Caddi | Downto -> Csubi in
       let raise_num = next_raise_count () in
- let id_prev = Ident.rename id in
       return_unit
         (Clet
            (id, transl low,
@@ -1255,13 +1254,11 @@
                     Cloop
                       (Csequence
                          (remove_unit(transl body),
- Clet(id_prev, Cvar id,
- Csequence
- (Cassign(id,
- Cop(inc, [Cvar id; Cconst_int 2])),
- Cifthenelse
- (Cop(Ccmpi Ceq, [Cvar id_prev; high]),
- Cexit (raise_num,[]), Ctuple [])))))),
+ Cifthenelse
+ (Cop(Ccmpi Ceq, [Cvar id; high]),
+ Cexit (raise_num,[]),
+ Cassign(id,
+ Cop(inc, [Cvar id; Cconst_int 2])))))),
                  Ctuple []))))
   | Uassign(id, exp) ->
       return_unit(Cassign(id, transl exp))
(0009212)
vbrankov (reporter)
2013-04-25 17:52

Or we can change the "if" condition to Cne so that Cexit is in the second leg. I'm not sure whether that would have any benefits.
(0009221)
xleroy (administrator)
2013-04-26 18:39

If I read your patch #2 correctly, you end up generating a loop with a conditional test in the middle and a jump at the end:

Lhead:
   ...
   branch-if-finished Lafter
   ...
   branch Lhead
Lafter:

instead of a loop with the conditional test at the bottom:

Lhead:
   ...
   branch-if-not-finished Lhead

This doesn't look good to me.

What I had in mind was, rather, to precompute "max loop value plus 1 / minus 1" and compare the loop index *after incrementation / decrementation* against this precomputed value. I doubt this would make much of a performance difference wrt the code currently generated, though.

Concerning 3% speedups, see the recent discussion on caml-list: minute changes in code placement alone can result in much bigger slowdowns or speedups.
(0009222)
frisch (developer)
2013-04-26 19:35

> Concerning 3% speedups, see the recent discussion on caml-list: minute changes in code placement alone can result in much bigger slowdowns or speedups.

Gabriel Scherer collects a set of realistic benchmarks, which will hopefully make it easier to assess the benefits of such optimizations. While I agree that a 3% speedup on one example does not give any useful information, if we can observe a 3% speedup on a reasonable number of benchmarks and no degradation on the others, it is definitely a good idea to consider the optimization as a good candidate. Do you agree? (Then other criteria apply, of course, such as complexity of the compiler, possible interactions with other optimizations, etc)
(0009234)
vbrankov (reporter)
2013-04-30 00:06

I apologize for my somewhat messy and hastily written input. I agree with Xavier that I reversed the "if" conditions, and what I wondered in my previous comment is why after fixing it the code was not faster. I think I understand the reason and my new change produced a measurable speedup.

I validated the following change against this code:

let () =
  for x = max_int - 3 to max_int do
    Printf.printf "%d\n" x
  done;
  for x = min_int to min_int + 3 do
    Printf.printf "%d\n" x
  done;
  for x = max_int downto max_int - 3 do
    Printf.printf "%d\n" x
  done;
  for x = min_int + 3 downto min_int do
    Printf.printf "%d\n" x
  done;

I also validated the loop going from min_int to max_int and vice versa.

The change follows. I agree that it's not too much simpler and I understand the complications arising around max_int and min_int. It might have been simpler if "for" loop didn't include the last value :)

Index: asmcomp/cmmgen.ml
===================================================================
--- asmcomp/cmmgen.ml (revision 13585)
+++ asmcomp/cmmgen.ml (working copy)
@@ -1242,8 +1242,8 @@
   | Ufor(id, low, high, dir, body) ->
       let tst = match dir with Upto -> Cgt | Downto -> Clt in
       let inc = match dir with Upto -> Caddi | Downto -> Csubi in
+ let dec = match dir with Upto -> Csubi | Downto -> Caddi in
       let raise_num = next_raise_count () in
- let id_prev = Ident.rename id in
       return_unit
         (Clet
            (id, transl low,
@@ -1252,16 +1252,17 @@
                 (raise_num, [],
                  Cifthenelse
                    (Cop(Ccmpi tst, [Cvar id; high]), Cexit (raise_num, []),
- Cloop
- (Csequence
- (remove_unit(transl body),
- Clet(id_prev, Cvar id,
- Csequence
+ Csequence
+ (Cassign(id, Cop(dec, [Cvar id; Cconst_int 2])),
+ Cloop
+ (Csequence
                             (Cassign(id,
                                Cop(inc, [Cvar id; Cconst_int 2])),
- Cifthenelse
- (Cop(Ccmpi Ceq, [Cvar id_prev; high]),
- Cexit (raise_num,[]), Ctuple [])))))),
+ Csequence
+ (remove_unit(transl body),
+ Cifthenelse
+ (Cop(Ccmpi Ceq, [Cvar id; high]),
+ Cexit (raise_num,[]), Ctuple [])))))),
                  Ctuple []))))
   | Uassign(id, exp) ->
       return_unit(Cassign(id, transl exp))

I tried these simple benchmarks and the results show a speedup. You can decide whether it's worth trying on your benchmarks which I'm sure are much better.

for1.ml:

let () =
  for j = 1 to 16 do
    let t = Unix.gettimeofday () in
    let s = ref 0 in
    for i = 0 to 256 * 1024 * 1024 do
      s := !s + i
    done;
    Printf.printf "%f\n%!" (Unix.gettimeofday () -. t)
  done

for2.ml:

let () =
  for j = 1 to 16 do
    let t = Unix.gettimeofday () in
    for i = 0 to 256 * 1024 * 1024 do
      let _ = () in ()
    done;
    Printf.printf "%f\n%!" (Unix.gettimeofday () -. t)
  done

binomial.ml:

let print x =
  Printf.printf "%f\n" x

let binomial =
  let n = 31 in
  let up_pow = Array.make (2 * n + 1) 0. in
  let p = Array.make (n + 1) 0. in
  fun t s k r sigma q ->
    let delta_t = t /. (float_of_int n) in
    let up = exp (sigma *. (sqrt delta_t)) in
    let e = exp ((q -. r) *. delta_t) in
    let p0 = (up -. e) /. (up *. up -. 1.) in
    let p1 = e -. p0 in

    let upp = ref 1. in
    for i = 0 to n do
      up_pow.(n - i) <- 1. /. !upp;
      up_pow.(n + i) <- !upp;
      upp := !upp *. up
    done;

    for i = 0 to n do
      p.(i) <- max (k -. s *. up_pow.(2 * i)) 0.
    done;

    for j = n - 1 downto 0 do
      for i = 0 to j do
        p.(i) <- max (p0 *. p.(i + 1) +. p1 *. p.(i))
                     (k -. s *. up_pow.(2 * i - j + n))
      done
    done;

    p.(0)

let () =
  for j = 1 to 16 do
    let t = Unix.gettimeofday () in
    for i = 1 to 16 * 1024 do
      let _ = binomial 2. 51. 52. 0.05 0.30 0. in ()
    done;
    Printf.printf "%f\n%!" (Unix.gettimeofday () -. t)
  done

Sandy Bridge E5-2678W:

$ ./for1o.exe
0.260819
0.260948
0.260801
0.260866
0.260827
0.260850
0.260839
0.260838
0.260805
0.260932
0.260837
0.260867
0.260853
0.260815
0.260880
0.260835
$ ./for1n.exe
0.260984
0.260826
0.260842
0.260806
0.260820
0.260875
0.260818
0.260833
0.260931
0.260856
0.260883
0.260831
0.260840
0.260823
0.260818
0.260916
$ ./for2o.exe
0.086952
0.086954
0.087016
0.086974
0.086926
0.086920
0.086960
0.086930
0.086975
0.086945
0.086949
0.086933
0.086921
0.086993
0.087036
0.086943
$ ./for2n.exe
0.163828
0.163704
0.163343
0.163945
0.163630
0.163942
0.163502
0.163694
0.163925
0.163866
0.163601
0.163771
0.164038
0.163980
0.163916
0.163751
$ ./binomialo.exe
0.207450
0.206657
0.206742
0.206671
0.206671
0.206698
0.206661
0.206738
0.206699
0.206717
0.206709
0.206715
0.206755
0.206704
0.206725
0.206675
$ ./binomialn.exe
0.213871
0.213428
0.213373
0.213341
0.213416
0.213362
0.213369
0.213388
0.213402
0.213394
0.213379
0.213426
0.213356
0.213453
0.213383
0.213424

Results on Nehalem W5580:

$ ./for1n.exe
0.174947
0.174892
0.174715
0.174037
0.174908
0.174869
0.175077
0.174716
0.174957
0.174934
0.173908
0.174368
0.174649
0.173923
0.174939
0.174617
$ ./for1o.exe
0.155822
0.156458
0.155934
0.155775
0.155727
0.155996
0.155818
0.155730
0.155991
0.155727
0.155741
0.155731
0.155782
0.155853
0.155822
0.156264
$ ./for2n.exe
0.156385
0.155981
0.156028
0.156326
0.156335
0.159161
0.155775
0.155884
0.155790
0.155858
0.155726
0.155789
0.155730
0.155765
0.155715
0.156170
$ ./for2o.exe
0.156091
0.156449
0.155923
0.155901
0.155962
0.155920
0.155927
0.155922
0.156145
0.156003
0.155985
0.155826
0.155922
0.155877
0.156098
0.155893
$ ./binomialn.exe
0.195871
0.197264
0.197959
0.198080
0.197767
0.197309
0.197652
0.197516
0.197584
0.197424
0.197679
0.197659
0.198057
0.197310
0.197322
0.197594
$ ./binomialo.exe
0.197047
0.194971
0.195267
0.197578
0.197261
0.197287
0.197569
0.197292
0.197625
0.197437
0.197564
0.197575
0.198512
0.197781
0.197244
0.197192
(0009483)
vbrankov (reporter)
2013-06-13 18:49

I'm sorry, I see that this issue stalled. Do I need to do something about it?
(0009492)
xleroy (administrator)
2013-06-14 14:00

> Do I need to do something about it?

No, thanks for asking. We're in code freeze for release 4.01, so this will have to wait for later.

- Issue History
Date Modified Username Field Change
2013-04-24 20:54 vbrankov New Issue
2013-04-25 16:27 frisch Note Added: 0009206
2013-04-25 16:31 vbrankov Note Added: 0009207
2013-04-25 16:56 xleroy Relationship added child of 0000415
2013-04-25 16:58 xleroy Note Added: 0009209
2013-04-25 16:58 xleroy Status new => feedback
2013-04-25 16:58 xleroy Note View State: 0009209: public
2013-04-25 17:30 vbrankov Note Added: 0009211
2013-04-25 17:30 vbrankov Status feedback => new
2013-04-25 17:52 vbrankov Note Added: 0009212
2013-04-26 18:39 xleroy Note Added: 0009221
2013-04-26 19:35 frisch Note Added: 0009222
2013-04-30 00:06 vbrankov Note Added: 0009234
2013-06-13 18:49 vbrankov Note Added: 0009483
2013-06-14 14:00 xleroy Note Added: 0009492
2013-06-14 14:00 xleroy Severity minor => tweak
2013-06-14 14:00 xleroy Status new => acknowledged
2013-06-14 14:00 xleroy Target Version => 4.02.0+dev
2013-07-12 18:15 doligez Target Version 4.02.0+dev => 4.01.1+dev
2013-12-16 15:53 doligez Tag Attached: patch
2014-05-25 20:20 doligez Target Version 4.01.1+dev => 4.02.0+dev
2014-07-16 20:40 doligez Note Edited: 0009209 View Revisions
2014-07-16 20:43 doligez Target Version 4.02.0+dev => after-4.02.0


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker