[Caml-list] partial eval question
Date: -- (:)
From: Andrew Lenharth <andrewl@d...>
Subject: Re: [Caml-list] partial eval question
On Mon, Oct 27, 2003 at 07:12:59PM +0000, William Chesters wrote:
> What I wrote, the "obvious thing", is
>    -- easy to write, and hard to get wrong

Easy to write, maybe, if you find recursion hard.  Harder for compilers to
optimize.  Tests later.

>    -- gives much less confusing error messages if you get it slightly
> wrong
>    -- easy to read

Which is less important in high performance situations.

>    -- uses a smaller subset of the language, so is especially easier
> for non-C++ experts
>    -- more general, in that it doesn't blow up and use silly amounts
> of space (and probably more time too, given cache churn) if N is not
> tiny

When I dump the code, I see 2 * N bytes of code in the unrolled loop.
When I dump your code, I see no loop unrolling, hence no point in
even trying to specialize the case.

Anytime you are doing code generation, you must be careful to avoid
blow up.  Such is life.  As you alluded to in the cache churn comment,
ideal unrolling requires measurement.  However, with the template code
we are explicitly using code generation; in the C code, we are simply
hoping the compiler will do something we want.
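
To make the contrast concrete, here is a minimal sketch of the two styles
under discussion.  It is a reconstruction, not the code from earlier in the
thread: the namespace `sketch` and the names `pow_loop` and `check` are
assumptions made for illustration.  The template version spells out the
unrolling through compile-time recursion, while the loop version leaves
unrolling and specialisation to the optimizer.

```cpp
#include <cassert>

namespace sketch {

// Template version: pow<N>(x) recurses at compile time, so after inlining
// the call becomes a straight chain of multiplications with no loop.
template <unsigned N>
inline double pow(double x) {
    return x * pow<N - 1>(x);
}

// Base case that terminates the compile-time recursion.
template <>
inline double pow<0>(double) {
    return 1.0;
}

// C-style version: a plain loop.  Whether it gets unrolled or specialised
// for a constant n is entirely up to the compiler and its flags.
inline double pow_loop(double x, unsigned n) {
    double result = 1.0;
    for (unsigned i = 0; i < n; ++i) {
        result *= x;
    }
    return result;
}

// Usage: both forms compute the same value.
inline void check() {
    assert(pow<3>(2.0) == 8.0);
    assert(pow_loop(2.0, 3) == 8.0);
}

} // namespace sketch
```

The template form trades code size for predictability: each distinct N
instantiates its own function, so large N values grow the binary.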

>    -- more general also in that the same code does both the
> general-purpose (n known only at runtime) and the special-purpose
> job

Yes, and properly written, the template code will support that.
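
One way the template code could cover both jobs is to overload a single
name: a run-time version taking n as an argument, and a compile-time
version taking N as a template parameter.  This is only a sketch under
that assumption; the namespace `pow_both` and the helper `Unrolled` are
hypothetical names, not taken from the thread.

```cpp
#include <cassert>

namespace pow_both {

// General-purpose job: n is known only at run time.
inline double pow(double x, unsigned n) {
    double result = 1.0;
    for (unsigned i = 0; i < n; ++i) {
        result *= x;
    }
    return result;
}

// Helper for the special-purpose job: unrolls through compile-time recursion.
template <unsigned N>
struct Unrolled {
    static double apply(double x) { return x * Unrolled<N - 1>::apply(x); }
};

// Base case that terminates the recursion.
template <>
struct Unrolled<0> {
    static double apply(double) { return 1.0; }
};

// Special-purpose job: N is a compile-time constant, same name as above.
template <unsigned N>
inline double pow(double x) {
    return Unrolled<N>::apply(x);
}

// Usage: the caller picks the variant by how the exponent is supplied.
inline void check() {
    assert(pow<4>(3.0) == 81.0);   // exponent fixed at compile time
    assert(pow(3.0, 4) == 81.0);   // exponent supplied at run time
}

} // namespace pow_both
```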
 
>  > The C example relies on a fairly smart compiler to 
>  > do interprocedural analysis.
> 
> Depends what you mean by fairly smart.  This is standard stuff: gcc is
> really not the best optimising compiler around.

This is true.  Therefore I tried this on both gcc and icc (Intel's compiler).
icc did worse on the C code than gcc did (interesting), especially since icc
was trying to do interprocedural analysis.

The C++ versions were very similar at the assembly level in both compilers.
The body of the loop was just a series of fmul instructions in both compiler
outputs.

Quick measurement with gettimeofday:

compiler     N=3    N=30
gcc         5181   89866
icc,c      18437  106522
icc,c++     1336    1296
g++         1297    1297

for 1M iterations of assignment of pow into a volatile double
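
A timing loop of the kind described could look roughly like the sketch
below.  The actual harness was not posted, so everything here is an
assumption: the namespace `bench_sketch`, the helpers `now_usec` and
`time_pow`, and the choice to read the input through a volatile so the
compiler cannot hoist the computation out of the loop.  It relies on the
POSIX gettimeofday call from <sys/time.h>.

```cpp
#include <cassert>
#include <sys/time.h>

namespace bench_sketch {

// Compile-time unrolled power, as in the template approach.
template <unsigned N>
inline double pow(double x) {
    return x * pow<N - 1>(x);
}

template <>
inline double pow<0>(double) {
    return 1.0;
}

// Wall-clock time in microseconds since the epoch.
inline long long now_usec() {
    timeval tv;
    gettimeofday(&tv, 0);
    return static_cast<long long>(tv.tv_sec) * 1000000LL + tv.tv_usec;
}

// Times `iterations` assignments of pow<N>(x) into a volatile double and
// returns the elapsed wall-clock time in microseconds.
template <unsigned N>
long long time_pow(double x, long iterations) {
    volatile double input = x;   // forces a fresh load on every iteration
    volatile double sink = 0.0;  // forces the result to be stored each time
    const long long start = now_usec();
    for (long i = 0; i < iterations; ++i) {
        sink = pow<N>(input);
    }
    const long long stop = now_usec();
    return stop - start;
}

// Usage: time 1M iterations for N=3; the value depends on the machine.
inline long long example() {
    return time_pow<3>(1.0001, 1000000L);
}

} // namespace bench_sketch
```

Absolute numbers from a loop like this vary with hardware, compiler and
flags, so only relative comparisons on the same machine are meaningful.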

These are very dependent on compiler flags in the C case.  Having
icc be more aggressive with loop unrolling made matters worse, since
the code no longer fit in L1.  The template approach needed much less
tuning of compiler options; it consistently gave the best code.

I would venture to say that for performance, there is still a need for
a programmer-visible code generation engine.

>  > The C++ example only requires the inline keyword be honored, and
>  > you don't need explicit pow3 pow2, you have pow<3> pow<2> pow<any
>  > constant>.
>  > 
>  > Gives a bit more control over code generation.
> 
> It does.  I personally feel (actually have learned the hard way) that
> using C++ templates for multi-stage programming is mostly much more
> trouble than it is worth---especially when you realise what careful
> exploitation of C-level specialisation by inlining can do.

Relying on the compiler is risky business.  Doing code generation and
hoping the compiler will optimize something a certain way are really
different things.

Further, for really interesting code generation, C-level specialisation
by inlining is utterly inadequate.  Some tasks, such as data marshalling
engines, can benefit greatly from straightforward template-style code
generation (both in shorter development time and smaller code base),
whereas there is no equivalent framework to do that in C.
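
As an illustration of what template-style generation for marshalling can
mean, here is a minimal sketch in which the compiler selects the encoding
for each type through template specialisation.  It is not any particular
engine: the namespace `marshal_sketch`, the `Marshal` trait and the wire
format (raw host-order bytes, unsigned int length prefixes) are all
assumptions chosen to keep the example short.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace marshal_sketch {

typedef std::vector<unsigned char> Buffer;

// Primary template: copy the raw bytes of a plain arithmetic value.
// Assumes both ends share byte order and type sizes.
template <typename T>
struct Marshal {
    static void put(Buffer& out, const T& value) {
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
        out.insert(out.end(), p, p + sizeof(T));
    }
};

// Strings: length prefix followed by the characters.
template <>
struct Marshal<std::string> {
    static void put(Buffer& out, const std::string& s) {
        Marshal<unsigned int>::put(out, static_cast<unsigned int>(s.size()));
        out.insert(out.end(), s.begin(), s.end());
    }
};

// Vectors: length prefix, then each element encoded by its own rule.
template <typename T>
struct Marshal<std::vector<T> > {
    static void put(Buffer& out, const std::vector<T>& v) {
        Marshal<unsigned int>::put(out, static_cast<unsigned int>(v.size()));
        for (typename std::vector<T>::const_iterator it = v.begin();
             it != v.end(); ++it) {
            Marshal<T>::put(out, *it);
        }
    }
};

// Entry point: the element type picks the generated code path.
template <typename T>
inline void marshal(Buffer& out, const T& value) {
    Marshal<T>::put(out, value);
}

// Usage: nested containers need no extra code.
inline void check() {
    Buffer out;
    std::vector<std::string> names(2, std::string("ab"));
    marshal(out, names);
    // One vector length prefix plus two (length prefix + 2 bytes) strings.
    assert(out.size() == 3 * sizeof(unsigned int) + 4);
}

} // namespace marshal_sketch
```

Supporting a new type means adding one specialisation; composite types
built from it are then handled without further code.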

> If you really want more control over code generation (not forgetting
> that just writing out what you want by hand is often the simplest
> option in practice!) then I think C++ templates are a dead end---far
> better to make the object language the same as the target language,
> as in MetaOcaml and similar.

Indeed, a unified approach is easier to use, though for performance the
code generation needs to be tuned at runtime, a task C++ templates 
certainly cannot pretend to approach.

Andrew Lenharth
