You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Original bug ID: 5205 Reporter:@alainfrisch Assigned to:@xavierleroy Status: closed (set by @xavierleroy on 2012-09-25T18:38:37Z) Resolution: fixed Priority: high Severity: minor Fixed in version: 3.13.0+dev Category: ~DO NOT USE (was: OCaml general) Has duplicate:#5288 Monitored by:@ygrek till "Julien Signoles" @hcarty@Chris00@alainfrisch
Bug description
ocamlopt inlines some local bindings that are pure and used once. However, the notion of "used once" is syntactic, which allows some non-trivial expressions (typically field projection) to be pushed within a loop body instead of being evaluated once outside the loop.
Here is an example:
=================================================================
type t1 = {x:int}
type t2 = {y:t1}
let f {y = {x = x1}} {y = {x = x2}} {y = {x = x3}} {y = {x = x4}} =
(*
let x1 = abs x1 in
let x2 = abs x2 in
let x3 = abs x3 in
let x4 = abs x4 in
*)
let r = ref 0 in
for i = 1 to 1000000 do
for j = 1 to 1000 do
r := x1 + x2 + x3 + x4;
done
done;
!r
let () =
let r = {y = { x = 2 }} in
Printf.printf "%i\n%!" (f r r r r)
On a Linux x64 (Intel Core i7, 2.80 GHz), this program takes 2.34 seconds
and only 1.18 seconds after uncommenting the first lines in f. The effect of these lines is to force the field accesses to be done outside the loop. (On simpler loops, the processor does an amazing job of removing the indirection overhead.)
The same problem appear because of pushing field accessed inside lambdas, e.g.:
=================================================================
let rec f () = r := x1 + x2 + x3 + x4 in
for i = 1 to 1000000 do
for j = 1 to 1000 do
f ()
done
done;
(Timing goes from 5s to 1.5s by enabling the 'let xi = abs xi' lines.)
The faulty optimization is implemented in Simplif.simplify_lets. I suggest to keep track of the depth where let variables are bound (bumping the depth when entering a lambda or loop body) and used, and not inline aliases which are used at a deeper level than their definition.
As an extra benefit, the patch avoids keeping references to useless parts of data structures.
I'm wondering whether inlining let-aliases used only once is really a good heuristic. Isn't it generally better to load from memory early in order to improve pipelining? Has someone done any benchmark to see if the heuristic is useful?
Original bug ID: 5205
Reporter: @alainfrisch
Assigned to: @xavierleroy
Status: closed (set by @xavierleroy on 2012-09-25T18:38:37Z)
Resolution: fixed
Priority: high
Severity: minor
Fixed in version: 3.13.0+dev
Category: ~DO NOT USE (was: OCaml general)
Has duplicate: #5288
Monitored by: @ygrek till "Julien Signoles" @hcarty @Chris00 @alainfrisch
Bug description
ocamlopt inlines some local bindings that are pure and used once. However, the notion of "used once" is syntactic, which allows some non-trivial expressions (typically field projection) to be pushed within a loop body instead of being evaluated once outside the loop.
Here is an example:
=================================================================
type t1 = {x:int}
type t2 = {y:t1}
let f {y = {x = x1}} {y = {x = x2}} {y = {x = x3}} {y = {x = x4}} =
(*
let x1 = abs x1 in
let x2 = abs x2 in
let x3 = abs x3 in
let x4 = abs x4 in
*)
let r = ref 0 in
for i = 1 to 1000000 do
for j = 1 to 1000 do
r := x1 + x2 + x3 + x4;
done
done;
!r
let () =
let r = {y = { x = 2 }} in
Printf.printf "%i\n%!" (f r r r r)
On a Linux x64 (Intel Core i7, 2.80 GHz), this program takes 2.34 seconds
and only 1.18 seconds after uncommenting the first lines in f. The effect of these lines is to force the field accesses to be done outside the loop. (On simpler loops, the processor does an amazing job of removing the indirection overhead.)
The same problem appear because of pushing field accessed inside lambdas, e.g.:
=================================================================
let rec f () = r := x1 + x2 + x3 + x4 in
for i = 1 to 1000000 do
for j = 1 to 1000 do
f ()
done
done;
(Timing goes from 5s to 1.5s by enabling the 'let xi = abs xi' lines.)
File attachments
The text was updated successfully, but these errors were encountered: