Tiny optim to avoid some spilling of floats in x87 #6924

vicuna · 2015-07-09T07:45:50Z

Original bug ID: 6924
Reporter: @alainfrisch
Assigned to: @gasche
Status: closed (set by @xavierleroy on 2017-02-16T14:15:02Z)
Resolution: fixed
Priority: low
Severity: tweak
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: back end (clambda to assembly)
Monitored by: @gasche @jmeber

Bug description

The code generated for

let f (m:float array) i x = Array.set m i x

on x86 (32-bit) unboxes the float argument, spills it, does the checkbound, and loads the float back from the stack before putting it in the array:

	sub	esp, 8
L100:
	fld	REAL8 PTR [ecx]
	fstp	REAL8 PTR [esp]
	mov	ecx, DWORD PTR [eax-4]
	shr	ecx, 10
	cmp	ecx, ebx
	jbe	L101
	fld	REAL8 PTR [esp]
	fstp	REAL8 PTR [ebx*4+eax-4]
	mov	eax, 1
	add	esp, 8
	ret

I guess that the current strategy for the x87 fp stack is to use it only within a basic block, and it would probably be quite difficult to change that in general. Fragments of numerical code which have branches only because of check bounds could probably benefit a little bit from optimizing cases such as above. The spill above can be avoided by the attached patch, which doesn't bind the rhs of the assignment (i.e. the unboxing code) before the checkbound. The patch is quite restrictive, it should be ok to do so as long as the rhs cannot raise an exception.

The code generated after the patch is:

	sub	esp, 8
L100:
	mov	edx, DWORD PTR [eax-4]
	shr	edx, 10
	cmp	edx, ebx
	jbe	L101
	fld	REAL8 PTR [ecx]
	fstp	REAL8 PTR [ebx*4+eax-4]
	mov	eax, 1
	add	esp, 8
	ret

A very rough micro-benchmark on the code below:

let f a (x : float) =
  for i = 1 to 1000 do
    for j = 0 to Array.length a - 1 do
      a.(j) <- x;
      a.(j) <- x;
      a.(j) <- x;
      a.(j) <- x
    done
  done

let () =
  let a = Array.make 1024 0. in
  for i = 1 to 1000 do
    f a (float i)
  done

gives the following results:

  Before patch,  -unsafe mode:  1.8s
  Before patch,     safe mode:  3.2s
  After patch,   -unsafe mode:  1.8s
  After patch,      safe mode:  2.1s

which shows that most of the overhead for bound checks comes from the spilling overhead.

File attachments

no_spill.diff

vicuna · 2015-07-09T23:06:19Z

Comment author: @jmeber

I strongly advocate inclusion of this patch as it gives big speedups (on x86) of a very common and frequent operation in numeric code (in-place modification of float array with "simple" values) when you still want to benefit from bounds checking.
We noted this behavior during a more general initiative in our company to try to implement some numerical kernel routines "back" in OCaml (instead of having them as external C routines) but without loosing too much speed (and yes, we have to support x86 platform for now).

vicuna · 2015-07-25T20:45:31Z

Comment author: @gasche

I merged the patch in trunk.

vicuna · 2015-07-27T11:49:39Z

Comment author: @alainfrisch

Thanks for merging. That said, the patch only covers a very specific case and the problem might be more general. I wonder whether we should keep the ticket open for this reason.

vicuna closed this as completed Feb 16, 2017

vicuna added the back-end label Mar 14, 2019

vicuna added this to the 4.03.0 milestone Mar 14, 2019

vicuna assigned gasche Mar 14, 2019

vicuna added the bug label Mar 20, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tiny optim to avoid some spilling of floats in x87 #6924

Tiny optim to avoid some spilling of floats in x87 #6924

vicuna commented Jul 9, 2015 •

edited by nojb

vicuna commented Jul 9, 2015 •

edited by nojb

vicuna commented Jul 25, 2015

vicuna commented Jul 27, 2015

Tiny optim to avoid some spilling of floats in x87 #6924

Tiny optim to avoid some spilling of floats in x87 #6924

Comments

vicuna commented Jul 9, 2015 • edited by nojb

Bug description

File attachments

vicuna commented Jul 9, 2015 • edited by nojb

vicuna commented Jul 25, 2015

vicuna commented Jul 27, 2015

vicuna commented Jul 9, 2015 •

edited by nojb

vicuna commented Jul 9, 2015 •

edited by nojb