Objective Caml 2.02
From: Alexey Nogin <nogin@c...>
Subject: Re: Upgrade from OCaml 2.01 to OCaml 2.02 made things _slower_!
Xavier Leroy wrote:

> In the past, I've observed speed variations by at least +/- 5% caused
> exclusively by minor variations in code placement (such as adding or
> deleting instructions that are never executed).  Almost any
> modification in the code generator affects code placement.  If only
> for this reason, speed variations of less than 5% are essentially
> meaningless: there's no way to attribute them to a particular
> optimization or to good/bad luck in code placement.  (Makes you very
> suspicious of those PLDI papers where they report 1% speedups...)
>
> > Also, I was doing some performance measurements (using P6 performance
> > counter support patches for Linux by Erik Hendriks -
> > http://beowulf.gsfc.nasa.gov/software/ ) when I upgraded, so I have some
> > information (and can get more of it) on the performance counters for my
> > program under both 2.01 and 2.02. In particular, the number of requests
> > from the processor to the L1 data cache became 2%-3% bigger.
>
> That's more meaningful.  The two new optimizations in 2.02 (closed
> toplevel functions and allocation coalescing) should reduce the number
> of memory accesses.  Allocation coalescing might increase register
> pressure locally, causing other stuff to be spilled on the stack,
> though.
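
To make the term concrete, here is a small illustration of the kind of code that allocation coalescing applies to. It is only a sketch: the function name make_pair_list is made up for this example, and what the native-code compiler actually emits depends on the compiler version and target.

```ocaml
(* Hypothetical example function, not taken from the compiler or from the
   program discussed in this thread.  Its body is straight-line code that
   allocates four small heap blocks in a row. *)
let make_pair_list x y =
  let p = (x, y) in   (* allocation 1: a two-field tuple *)
  let q = (y, x) in   (* allocation 2: a second tuple *)
  [p; q]              (* allocations 3 and 4: two list cells *)

(* Without coalescing, each allocation is its own bump of the
   minor-heap pointer followed by its own limit check.  With coalescing,
   the compiler can reserve the space for all four blocks with a single
   bump and a single check, then initialize the blocks in place.  The
   pointers to the blocks under construction stay live at the same time,
   which is where the extra register pressure can come from. *)
let () =
  match make_pair_list 1 2 with
  | [(a, b); (c, d)] -> Printf.printf "(%d, %d) (%d, %d)\n" a b c d
  | _ -> ()
```

The total number of words allocated is the same either way; only the number of heap pointer updates and limit checks changes.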

Well, in that case I should probably try removing the allocation coalescing
and see what happens. Am I right in assuming that, to do that, I have to
revert the changes between versions 1.8 -> 1.9 and 1.10 -> 1.11 of
asmcomp/selectgen.ml?


> Is there any way you could get a per-function profile of
> memory requests? (like on the Alpha with the Digital Unix tools).

I am not sure. I could probably write something gprof-like that would record
the values of the performance counters at each function call, but I am afraid
that would be a lot of work. I could probably also get access to an Alpha, but
I do not think I would see the same slowdown there as I see on x86, so the
Alpha memory access numbers would not help much.
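
A very rough sketch of the gprof-like idea, done at the source level rather than in the compiler. Everything here is an assumption made for illustration: profiled, report and table are made-up names, and read_counter stands for whatever function would read the hardware counter (no such function is provided by OCaml itself). The example passes Gc.minor_words from the standard library as a stand-in so that the code runs as written.

```ocaml
(* Per-function accumulator: name -> (number of calls, total counter delta). *)
let table : (string, int * float) Hashtbl.t = Hashtbl.create 16

(* [profiled read_counter name f x] runs [f x] and charges the change in
   the counter to [name].  [read_counter] is a placeholder for a function
   that reads the performance counter; it is an assumption, not a real API. *)
let profiled read_counter name f x =
  let before = read_counter () in
  let record () =
    let delta = read_counter () -. before in
    let calls, total =
      try Hashtbl.find table name with Not_found -> (0, 0.0) in
    Hashtbl.replace table name (calls + 1, total +. delta)
  in
  match f x with
  | result -> record (); result
  | exception e -> record (); raise e   (* still charge calls that raise *)

(* Print one line per profiled function. *)
let report () =
  Hashtbl.iter
    (fun name (calls, total) ->
      Printf.printf "%-20s %8d calls %14.0f events\n" name calls total)
    table

(* Stand-in usage: count minor-heap words instead of cache requests. *)
let () =
  let squares = profiled Gc.minor_words "List.map" (List.map (fun i -> i * i)) in
  ignore (squares [1; 2; 3; 4]);
  report ()
```

Counts recorded this way are inclusive (a function is also charged for whatever its callees do), and the wrapper itself perturbs the counters, so small differences between functions would not mean much.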

Alexey
--------------------------------------------------------------
Home Page: http://www.cs.cornell.edu/nogin/
E-Mail: nogin@cs.cornell.edu (office), ayn2@cornell.edu (home)
Office: Upson 4139, tel: 1-607-255-4934
ICQ #: 24708107 (office), 24678341 (home)