Re: Upgrade from OCaml 2.01 to OCaml 2.02 made things _slower_!

From: Alexey Nogin (nogin@cs.cornell.edu)
Date: Fri Mar 12 1999 - 00:59:49 MET


Date: Thu, 11 Mar 1999 18:59:49 -0500
From: Alexey Nogin <nogin@cs.cornell.edu>
To: Xavier Leroy <Xavier.Leroy@inria.fr>
Subject: Re: Upgrade from OCaml 2.01 to OCaml 2.02 made things _slower_!

Xavier Leroy wrote:

> In the past, I've observed speed variations by at least +/- 5% caused
> exclusively by minor variations in code placement (such as adding or
> deleting instructions that are never executed). Almost any
> modification in the code generator affects code placement. If only
> for this reason, speed variations of less than 5% are essentially
> meaningless: there's no way to attribute them to a particular
> optimization or to good/bad luck in code placement. (Makes you very
> suspicious of those PLDI papers where they report 1% speedups...)
>
> > Also, I was doing some performance measurements (using P6 performance
> > counter support patches for Linux by Erik Hendriks -
> > http://beowulf.gsfc.nasa.gov/software/ ) when I upgraded, so I have some
> > information (and can get more of it) on the performance counters for my
> > program under both 2.01 and 2.02. In particular, the number of requests
> > from the processor to the L1 data cache became 2%-3% bigger.
>
> That's more meaningful. The two new optimizations in 2.02 (closed
> toplevel functions and allocation coalescing) should reduce the number
> of memory accesses. Allocation coalescing might increase register
> pressure locally, causing other stuff to be spilled on the stack,
> though.

Well, in this case I should probably try removing the allocation coalescing
to see what happens. Am I right in assuming that, to do that, I have to
revert the changes made in revisions 1.8 -> 1.9 and 1.10 -> 1.11 of
asmcomp/selectgen.ml?

> Is there any way you could get a per-function profile of
> memory requests? (like on the Alpha with the Digital Unix tools).

I am not sure. I could probably write something gprof-like that would record
the values of the performance counters at each function call, but I am afraid
that would be a lot of work. I could probably also get access to an Alpha, but
I do not think I would see the same slowdown there as I see on x86, so the
Alpha memory access numbers would not help much.
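
To make the gprof-like idea concrete, here is a minimal sketch of how
per-function counter accounting could look in OCaml. Everything in it is an
assumption for illustration: read_counter is a hypothetical stand-in backed by
a simulated counter (a real version would need a C stub calling into the
performance-counter patches), and profiled, record and load_table are made-up
names. The recorded deltas are inclusive, so they also cover whatever the
wrapped function calls.

```ocaml
(* Hypothetical stand-in for a hardware counter read. A real version would
   call into the performance-counter support through a C stub; here the
   counter is simulated so the sketch runs on its own. *)
let simulated_counter = ref 0
let read_counter () = !simulated_counter

(* Per-function totals: name -> (number of calls, accumulated counter delta). *)
let profile : (string, int * int) Hashtbl.t = Hashtbl.create 16

let record name delta =
  let calls, total =
    try Hashtbl.find profile name with Not_found -> (0, 0) in
  Hashtbl.replace profile name (calls + 1, total + delta)

(* Wrap [f] so that every call adds the counter delta observed between entry
   and exit to [name]'s entry. The delta is recorded even if [f] raises. *)
let profiled name f x =
  let before = read_counter () in
  let finish () = record name (read_counter () - before) in
  let result = try f x with e -> finish (); raise e in
  finish ();
  result

(* Usage: [touch] pretends that [n] counted events happened. *)
let touch n = simulated_counter := !simulated_counter + n
let load_table = profiled "load_table" (fun n -> touch n; n)

let () =
  ignore (load_table 3);
  ignore (load_table 4);
  Hashtbl.iter
    (fun name (calls, total) ->
      Printf.printf "%s: %d calls, %d counted events\n" name calls total)
    profile
```

Wrapping every function by hand like this is where most of the work would go;
the bookkeeping itself is small.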

Alexey
--------------------------------------------------------------
Home Page: http://www.cs.cornell.edu/nogin/
E-Mail: nogin@cs.cornell.edu (office), ayn2@cornell.edu (home)
Office: Upson 4139, tel: 1-607-255-4934
ICQ #: 24708107 (office), 24678341 (home)