Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid loading global variables more than necessary #6511

Closed
vicuna opened this issue Aug 11, 2014 · 2 comments
Closed

avoid loading global variables more than necessary #6511

vicuna opened this issue Aug 11, 2014 · 2 comments

Comments

@vicuna
Copy link

vicuna commented Aug 11, 2014

Original bug ID: 6511
Reporter: drichman
Status: acknowledged (set by @xavierleroy on 2014-09-01T13:21:40Z)
Resolution: open
Priority: normal
Severity: feature
Version: 4.02.0+beta1 / +rc1
Category: back end (clambda to assembly)
Monitored by: @gasche @diml @yakobowski

Bug description

I believe OCaml 4.02 has (compared to 4.01) improved its ability to detect re-use of a global variable and avoid loading it into a register several times.

However, there are still a few cases where it does not. I've noticed that adding

let some_global = some_global in

to the top of some of my functions produces different (better) assembly and speeds them up; my feature request is that OCaml spot this without my help.

I've put two examples below that I believe produce sub-optimal output with OCaml trunk; the first is a toy example and the latter is the one I care about.

Thanks,
Daniel

Steps to reproduce

Toy example: global refs

This change to the source:

--- global_ref.ml	2014-08-04 09:37:04.634534095 +0100
+++ global_ref2.ml	2014-08-04 09:39:15.978539072 +0100
@@ -2,6 +2,8 @@
 let global = int_of_string "1234"
 let aref = ref 0
 
 let () =
+  let global = global in
+  let aref = aref in
   aref := global;
   aref := global;
   aref := global

produces this change in assembly:

--- global_ref.s/ocaml_4.02.0+beta1.s	2014-08-04 09:40:03.963544100 +0100
+++ global_ref2.s/ocaml_4.02.0+beta1.s	2014-08-04 09:40:20.043546093 +0100
@@ -43,15 +43,11 @@

  ; Before here, rbx is loaded with the address of camlTest
 	movq	$1024, -8(%rax)
 	movq	$1, (%rax)              ; tagged zero
 	movq	%rax, 8(%rbx)           ; store initial zero
-	movq	8(%rbx), %rax           ; load addr of aref
-	movq	(%rbx), %rdi            ; load global
-	movq	%rdi, (%rax)            ; store global
-	movq	8(%rbx), %rax           ; load both _again_
-	movq	(%rbx), %rdi
-	movq	%rdi, (%rax)            ; store
-	movq	8(%rbx), %rax           ; load
-	movq	(%rbx), %rbx            ; 
-	movq	%rbx, (%rax)            ; store
+	movq	(%rbx), %rax            ; load global
+	movq	8(%rbx), %rbx           ; load addr of aref
+	movq	%rax, (%rbx)            ; store
+	movq	%rax, (%rbx)            ; store
+	movq	%rax, (%rbx)            ; store
 	movq	$1, %rax
 	addq	$8, %rsp
 	.cfi_adjust_cfa_offset	-8

Example: Core.Iobuf

--- iobuf.ml	2014-08-04 11:15:00.210635072 +0100
+++ iobuf2.ml	2014-08-04 11:15:04.746636077 +0100
@@ -8,6 +8,7 @@
 open Bigarray

 type t = (char, int8_unsigned_elt, c_layout) Array1.t

 (* After inlining, etc. Iobuf.Unsafe.Poke.int63 boils down to this: *)
 external int64_of_int : int -> int64 = "%int64_of_int"
 external unsafe_set_64 : t -> int -> int64 -> unit = "%caml_bigstring_set64u"
 let unsafe_write_int64_int t ~pos x = unsafe_set_64 t pos (int64_of_int x)

 let global_buf = Array1.create char c_layout 1024
 
 let () =
+  let global_buf = global_buf in
   unsafe_write_int64_int global_buf ~pos:0  123;
   unsafe_write_int64_int global_buf ~pos:8  456;
   unsafe_write_int64_int global_buf ~pos:16 789

Adding the "let global_buf = global_buf in" line produces this change in the assembly output:

B: get bigarray;
C: get pointer to buffer in C heap

--- iobuf.s/ocaml_4.02.0+beta1.s	2014-08-04 11:15:19.571638074 +0100
+++ iobuf2.s/ocaml_4.02.0+beta1.s	2014-08-04 11:15:14.667637075 +0100
@@ -70,12 +70,10 @@
	call  camlBigarray__create_1118@PLT
.L102:
 	movq	camlTest@GOTPCREL(%rip), %rbx   ; A
 	movq	%rax, 8(%rbx)       ; storing the result of BA_create
 	movq	8(%rbx), %rax       ; B
-	movq	8(%rax), %rax       ; C
-	movq	$123, (%rax)        ; store
-	movq	8(%rbx), %rax       ; B
-	movq	8(%rax), %rax       ; C
-	movq	$456, 8(%rax)       ; store
-	movq	8(%rbx), %rax       ; B
-	movq	8(%rax), %rax       ; C
-	movq	$789, 16(%rax)      ; store
+	movq	8(%rax), %rbx       ; C
+	movq	$123, (%rbx)        ; store
+	movq	8(%rax), %rbx       ; C
+	movq	$456, 8(%rbx)       ; store
+	movq	8(%rax), %rax       ; C
+	movq	$789, 16(%rax)      ; store
 	movq	$1, %rax

I don't know how feasible it is to eliminate the duplicate "C" loads too.

@vicuna
Copy link
Author

vicuna commented Sep 1, 2014

Comment author: @xavierleroy

Yes, CSE optimization can factor out repeated loads from the same address, but it needs to be prudent in the face of memory stores (such as ":="), which can invalidate previously-loaded data. In your example:

aref := global;
aref := global;

CSE doesn't know that "global" resides in memory at an address different from the reference cell "aref" which is being updated. Hence, the second load from "global" (line 2) remains. To eliminate it, an alias analysis is needed so that CSE knows the two addresses involved are different.

@github-actions
Copy link

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant