You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Original bug ID: 6511 Reporter: drichman Status: acknowledged (set by @xavierleroy on 2014-09-01T13:21:40Z) Resolution: open Priority: normal Severity: feature Version: 4.02.0+beta1 / +rc1 Category: back end (clambda to assembly) Monitored by:@gasche@diml@yakobowski
Bug description
I believe OCaml 4.02 has (compared to 4.01) improved its ability to detect re-use of a global variable and avoid loading it into a register several times.
However, there are still a few cases where it does not. I've noticed that adding
let some_global = some_global in
to the top of some of my functions produces different (better) assembly and speeds them up; my feature request is that OCaml spot this without my help.
I've put two examples below that I believe produce sub-optimal output with OCaml trunk; the first is a toy example and the latter is the one I care about.
Thanks,
Daniel
Steps to reproduce
Toy example: global refs
This change to the source:
--- global_ref.ml 2014-08-04 09:37:04.634534095 +0100
+++ global_ref2.ml 2014-08-04 09:39:15.978539072 +0100
@@ -2,6 +2,8 @@
let global = int_of_string "1234"
let aref = ref 0
let () =
+ let global = global in
+ let aref = aref in
aref := global;
aref := global;
aref := global
produces this change in assembly:
--- global_ref.s/ocaml_4.02.0+beta1.s 2014-08-04 09:40:03.963544100 +0100
+++ global_ref2.s/ocaml_4.02.0+beta1.s 2014-08-04 09:40:20.043546093 +0100
@@ -43,15 +43,11 @@
; Before here, rbx is loaded with the address of camlTest
movq $1024, -8(%rax)
movq $1, (%rax) ; tagged zero
movq %rax, 8(%rbx) ; store initial zero
- movq 8(%rbx), %rax ; load addr of aref
- movq (%rbx), %rdi ; load global
- movq %rdi, (%rax) ; store global
- movq 8(%rbx), %rax ; load both _again_
- movq (%rbx), %rdi
- movq %rdi, (%rax) ; store
- movq 8(%rbx), %rax ; load
- movq (%rbx), %rbx ;
- movq %rbx, (%rax) ; store
+ movq (%rbx), %rax ; load global
+ movq 8(%rbx), %rbx ; load addr of aref
+ movq %rax, (%rbx) ; store
+ movq %rax, (%rbx) ; store
+ movq %rax, (%rbx) ; store
movq $1, %rax
addq $8, %rsp
.cfi_adjust_cfa_offset -8
Example: Core.Iobuf
--- iobuf.ml 2014-08-04 11:15:00.210635072 +0100
+++ iobuf2.ml 2014-08-04 11:15:04.746636077 +0100
@@ -8,6 +8,7 @@
open Bigarray
type t = (char, int8_unsigned_elt, c_layout) Array1.t
(* After inlining, etc. Iobuf.Unsafe.Poke.int63 boils down to this: *)
external int64_of_int : int -> int64 = "%int64_of_int"
external unsafe_set_64 : t -> int -> int64 -> unit = "%caml_bigstring_set64u"
let unsafe_write_int64_int t ~pos x = unsafe_set_64 t pos (int64_of_int x)
let global_buf = Array1.create char c_layout 1024
let () =
+ let global_buf = global_buf in
unsafe_write_int64_int global_buf ~pos:0 123;
unsafe_write_int64_int global_buf ~pos:8 456;
unsafe_write_int64_int global_buf ~pos:16 789
Adding the "let global_buf = global_buf in" line produces this change in the assembly output:
B: get bigarray;
C: get pointer to buffer in C heap
--- iobuf.s/ocaml_4.02.0+beta1.s 2014-08-04 11:15:19.571638074 +0100
+++ iobuf2.s/ocaml_4.02.0+beta1.s 2014-08-04 11:15:14.667637075 +0100
@@ -70,12 +70,10 @@
call camlBigarray__create_1118@PLT
.L102:
movq camlTest@GOTPCREL(%rip), %rbx ; A
movq %rax, 8(%rbx) ; storing the result of BA_create
movq 8(%rbx), %rax ; B
- movq 8(%rax), %rax ; C
- movq $123, (%rax) ; store
- movq 8(%rbx), %rax ; B
- movq 8(%rax), %rax ; C
- movq $456, 8(%rax) ; store
- movq 8(%rbx), %rax ; B
- movq 8(%rax), %rax ; C
- movq $789, 16(%rax) ; store
+ movq 8(%rax), %rbx ; C
+ movq $123, (%rbx) ; store
+ movq 8(%rax), %rbx ; C
+ movq $456, 8(%rbx) ; store
+ movq 8(%rax), %rax ; C
+ movq $789, 16(%rax) ; store
movq $1, %rax
I don't know how feasible it is to eliminate the duplicate "C" loads too.
The text was updated successfully, but these errors were encountered:
Yes, CSE optimization can factor out repeated loads from the same address, but it needs to be prudent in the face of memory stores (such as ":="), which can invalidate previously-loaded data. In your example:
aref := global;
aref := global;
CSE doesn't know that "global" resides in memory at an address different from the reference cell "aref" which is being updated. Hence, the second load from "global" (line 2) remains. To eliminate it, an alias analysis is needed so that CSE knows the two addresses involved are different.
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.
Original bug ID: 6511
Reporter: drichman
Status: acknowledged (set by @xavierleroy on 2014-09-01T13:21:40Z)
Resolution: open
Priority: normal
Severity: feature
Version: 4.02.0+beta1 / +rc1
Category: back end (clambda to assembly)
Monitored by: @gasche @diml @yakobowski
Bug description
I believe OCaml 4.02 has (compared to 4.01) improved its ability to detect re-use of a global variable and avoid loading it into a register several times.
However, there are still a few cases where it does not. I've noticed that adding
to the top of some of my functions produces different (better) assembly and speeds them up; my feature request is that OCaml spot this without my help.
I've put two examples below that I believe produce sub-optimal output with OCaml trunk; the first is a toy example and the latter is the one I care about.
Thanks,
Daniel
Steps to reproduce
Toy example: global refs
This change to the source:
produces this change in assembly:
Example: Core.Iobuf
Adding the "let global_buf = global_buf in" line produces this change in the assembly output:
B: get bigarray;
C: get pointer to buffer in C heap
I don't know how feasible it is to eliminate the duplicate "C" loads too.
The text was updated successfully, but these errors were encountered: