Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006511OCamlOcaml optimizationpublic2014-08-11 17:032014-09-04 00:25
Reporterdrichman 
Assigned To 
PrioritynormalSeverityfeatureReproducibilityalways
StatusacknowledgedResolutionopen 
PlatformOSOS Version
Product Version4.02.0+beta1 / +rc1 
Target VersionundecidedFixed in Version 
Summary0006511: avoid loading global variables more than necessary
DescriptionI believe OCaml 4.02 has (compared to 4.01) improved its ability to detect re-use of a global variable and avoid loading it into a register several times.

However, there are still a few cases where it does not. I've noticed that adding

let some_global = some_global in


to the top of some of my functions produces different (better) assembly and speeds them up; my feature request is that OCaml spot this without my help.

I've put two examples below that I believe produce sub-optimal output with OCaml trunk; the first is a toy example and the latter is the one I care about.

Thanks,
Daniel
Steps To ReproduceToy example: global refs

This change to the source:

--- global_ref.ml	2014-08-04 09:37:04.634534095 +0100
+++ global_ref2.ml	2014-08-04 09:39:15.978539072 +0100
@@ -2,6 +2,8 @@
 let global = int_of_string "1234"
 let aref = ref 0
 
 let () =
+  let global = global in
+  let aref = aref in
   aref := global;
   aref := global;
   aref := global


produces this change in assembly:

--- global_ref.s/ocaml_4.02.0+beta1.s	2014-08-04 09:40:03.963544100 +0100
+++ global_ref2.s/ocaml_4.02.0+beta1.s	2014-08-04 09:40:20.043546093 +0100
@@ -43,15 +43,11 @@

  ; Before here, rbx is loaded with the address of camlTest
     movq	$1024, -8(%rax)
     movq	$1, (%rax)              ; tagged zero
     movq	%rax, 8(%rbx)           ; store initial zero
-	movq	8(%rbx), %rax           ; load addr of aref
-	movq	(%rbx), %rdi            ; load global
-	movq	%rdi, (%rax)            ; store global
-	movq	8(%rbx), %rax           ; load both _again_
-	movq	(%rbx), %rdi
-	movq	%rdi, (%rax)            ; store
-	movq	8(%rbx), %rax           ; load
-	movq	(%rbx), %rbx            ; 
-	movq	%rbx, (%rax)            ; store
+	movq	(%rbx), %rax            ; load global
+	movq	8(%rbx), %rbx           ; load addr of aref
+	movq	%rax, (%rbx)            ; store
+	movq	%rax, (%rbx)            ; store
+	movq	%rax, (%rbx)            ; store
     movq	$1, %rax
     addq	$8, %rsp
     .cfi_adjust_cfa_offset	-8


Example: Core.Iobuf

--- iobuf.ml	2014-08-04 11:15:00.210635072 +0100
+++ iobuf2.ml	2014-08-04 11:15:04.746636077 +0100
@@ -8,6 +8,7 @@
 open Bigarray

 type t = (char, int8_unsigned_elt, c_layout) Array1.t

 (* After inlining, etc. Iobuf.Unsafe.Poke.int63 boils down to this: *)
 external int64_of_int : int -> int64 = "%int64_of_int"
 external unsafe_set_64 : t -> int -> int64 -> unit = "%caml_bigstring_set64u"
 let unsafe_write_int64_int t ~pos x = unsafe_set_64 t pos (int64_of_int x)

 let global_buf = Array1.create char c_layout 1024
 
 let () =
+  let global_buf = global_buf in
   unsafe_write_int64_int global_buf ~pos:0  123;
   unsafe_write_int64_int global_buf ~pos:8  456;
   unsafe_write_int64_int global_buf ~pos:16 789


Adding the "let global_buf = global_buf in" line produces this change in the assembly output:

B: get bigarray;
C: get pointer to buffer in C heap

--- iobuf.s/ocaml_4.02.0+beta1.s	2014-08-04 11:15:19.571638074 +0100
+++ iobuf2.s/ocaml_4.02.0+beta1.s	2014-08-04 11:15:14.667637075 +0100
@@ -70,12 +70,10 @@
    call  camlBigarray__create_1118@PLT
.L102:
     movq	camlTest@GOTPCREL(%rip), %rbx   ; A
     movq	%rax, 8(%rbx)       ; storing the result of BA_create
     movq	8(%rbx), %rax       ; B
-	movq	8(%rax), %rax       ; C
-	movq	$123, (%rax)        ; store
-	movq	8(%rbx), %rax       ; B
-	movq	8(%rax), %rax       ; C
-	movq	$456, 8(%rax)       ; store
-	movq	8(%rbx), %rax       ; B
-	movq	8(%rax), %rax       ; C
-	movq	$789, 16(%rax)      ; store
+	movq	8(%rax), %rbx       ; C
+	movq	$123, (%rbx)        ; store
+	movq	8(%rax), %rbx       ; C
+	movq	$456, 8(%rbx)       ; store
+	movq	8(%rax), %rax       ; C
+	movq	$789, 16(%rax)      ; store
     movq	$1, %rax


I don't know how feasible it is to eliminate the duplicate "C" loads too.
TagsNo tags attached.
Attached Files

- Relationships

-  Notes
(0012070)
xleroy (administrator)
2014-09-01 15:21

Yes, CSE optimization can factor out repeated loads from the same address, but it needs to be prudent in the face of memory stores (suc as ":="), which can invalidate previously-loaded data. In your example:

aref := global;
aref := global;

CSE doesn't know that "global" resides in memory at an address different from the reference cell "aref" which is being updated. Hence, the second load from "global" (line 2) remains. To eliminate it, an alias analysis is needed so that CSE knows the two addresses involved are different.


- Issue History
Date Modified Username Field Change
2014-08-11 17:03 drichman New Issue
2014-08-29 14:30 doligez Severity feature => minor
2014-08-29 14:30 doligez Category OCaml backend (code generation) => Ocaml optimization
2014-08-29 14:30 doligez Target Version => 4.02.1+dev
2014-09-01 15:21 xleroy Note Added: 0012070
2014-09-01 15:21 xleroy Severity minor => feature
2014-09-01 15:21 xleroy Status new => acknowledged
2014-09-04 00:25 doligez Target Version 4.02.1+dev => undecided


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker