Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0005360OCamlOCaml generalpublic2011-09-25 14:562012-09-25 20:07
ReporterPascal Cuoq 
Assigned To 
PrioritynormalSeveritytweakReproducibilityalways
StatusclosedResolutionfixed 
PlatformOSOS Version
Product Version3.12.1 
Target VersionFixed in Version 
Summary0005360: For 64-bit targets, array bound checking assembly code could be more compact
DescriptionConsider the following module:

type t

let f (a: t array) =
  a.(7)

Here is a diff between the code generated by OCaml 3.12.1 and the code that could be expected for bounds checking, when values have the same size as double-precision floating-point numbers :

--- t.orig.s 2011-09-25 14:43:16.000000000 +0200
+++ t.s 2011-09-25 14:44:05.000000000 +0200
@@ -28,20 +28,17 @@
     movq %rax, %rdi
     movq -8(%rdi), %rax
     movq %rax, %rbx
- andq $255, %rbx
- cmpq $254, %rbx
- je .L100
- shrq $9, %rax
- cmpq $15, %rax
+ shrq $9, %rbx
+ cmpq $15, %rbx
     jbe .L102
+ andq $255, %rax
+ cmpq $254, %rax
+ je .L100
     movq 56(%rdi), %rax
     addq $8, %rsp
     ret
     .align 4
 .L100:
- shrq $9, %rax
- cmpq $15, %rax
- jbe .L102
 .L103: subq $16, %r15
     movq caml_young_limit@GOTPCREL(%rip), %rax
     cmpq (%rax), %r15

The minimally tested attached patch implements the new version.
TagsNo tags attached.
Attached Filespatch file icon factor_array_length.patch [^] (1,085 bytes) 2011-09-25 14:56 [Show Content]
patch file icon factor_array_length_2.patch [^] (1,950 bytes) 2011-12-22 09:47 [Show Content]

- Relationships
related to 0005345closed Optimize bounds checking 

-  Notes
(0006430)
gasche (developer)
2011-12-21 11:24

Mandatory hard-facts question -- I hope xleroy didn't patent it: do you have an example of real application (Frama-C?) where this induced a noticeable performance improvement?
(0006480)
pascal_cuoq (reporter)
2011-12-22 01:30

This is not a performance optimization, it is a code size optimization. In realistic function f above, the same three-instruction sequence appears at the beginning of each of two branches, making the code of this function three instructions longer than necessary. This is stupid and can be fixed with the same "if wordsize_shift = numfloat_shift" test that's already used elsewhere in cmmgen.ml.

I have no doubt that this is entirely absorbed by other inefficiencies in the compiler, if that's what you are getting at. The only hard facts I have at hand are that the Frama-C core (excluding most functionality) compiles to a 10MB binary and that compiling with -compact makes it smaller *and* faster, hinting that code bloat does influence performance.
(0006485)
gasche (developer)
2011-12-22 09:53
edited on: 2011-12-22 10:05

I compiled frama-C without -compact on my trunk, and here are the size of toplevel.opt:
- 10476K (10697637) without any patch
- 10472K (10691109) with your patch (reduces Parrayrefs)
- 10468K (10689669) with the patch I just uploaded, that reduces both Parrayrefs and Parraysets (I haven't checked it carefully though)

The difference is rather small.

That said, in the process I discovered that this optimization is already performed in the Parraylength case, so applying it in the Parrayref and Parrayset cases does not seem preposterous.

PS: Oh, and Frama-C doesn't compile as is against trunk, because you have some Datatype.Hashtbl functor with input signature Hashtbl.S, to which you pass your Inthash module, but the signature of stdlib's Hashtbl changed in trunk (a new "stats" function is available). It is not future-proof to depend contravariantly on a stdlib interface.

(0006488)
xleroy (administrator)
2011-12-22 10:20

Patch #2 applied in SVN trunk (commit 11932).

I agreee more compact code is better, and optimizations specific to 64-bit targets are welcome since these targets are getting dominant.

For Parraylength, the 64-bit case saves one run-time test, while for Parrayrefs and Parraysets we have the same number of tests executed (2), but produce one fewer. So, the benefits will be much smaller.
(0006508)
pascal_cuoq (reporter)
2011-12-22 18:13
edited on: 2011-12-22 18:32

Performance:

All together, the patches here and from entry 0005345 (including the assembly-level patch 0002 that was rejected) applied to OCaml 3.12.1 appear to make the sha1.c benchmark from http://blog.frama-c.com/index.php?post/2011/08/29/CompCert-gets-a-safe-interpreter-mode [^] 1.5% faster (same protocol as in the blog post, sha1.c modified to run the same computations 5 times). This is pretty cool even if it does not nearly bridge the gap with CompCert's interpreter.

Other Frama-C use cases would use a smaller proportion of array operations, but yet other use cases would use more. I think the sha1.c benchmark is average in this respect.


- Issue History
Date Modified Username Field Change
2011-09-25 14:56 Pascal Cuoq New Issue
2011-09-25 14:56 Pascal Cuoq File Added: factor_array_length.patch
2011-12-21 11:24 gasche Note Added: 0006430
2011-12-21 11:24 gasche Status new => feedback
2011-12-22 01:30 pascal_cuoq Note Added: 0006480
2011-12-22 09:40 protz Relationship added related to 0005345
2011-12-22 09:47 gasche File Added: factor_array_length_2.patch
2011-12-22 09:53 gasche Note Added: 0006485
2011-12-22 09:58 gasche Note Edited: 0006485 View Revisions
2011-12-22 10:04 gasche Assigned To => gasche
2011-12-22 10:04 gasche Status feedback => new
2011-12-22 10:04 gasche Assigned To gasche =>
2011-12-22 10:05 gasche Note Edited: 0006485 View Revisions
2011-12-22 10:20 xleroy Note Added: 0006488
2011-12-22 10:20 xleroy Status new => resolved
2011-12-22 10:20 xleroy Resolution open => fixed
2011-12-22 18:13 pascal_cuoq Note Added: 0006508
2011-12-22 18:32 gasche Note Edited: 0006508 View Revisions
2012-09-25 20:07 xleroy Status resolved => closed


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker