Version française
Home     About     Download     Resources     Contact us    
Browse thread
Performance questions, -inline, ...
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Kuba Ober <ober.14@o...>
Subject: Re: [Caml-list] Performance questions, -inline, ...
On Monday 07 January 2008, Jon Harrop wrote:
> On Monday 07 January 2008 15:22:40 Kuba Ober wrote:
> > On Monday 07 January 2008, Jon Harrop wrote:
> > > You mean it might be possible to recover the performance of C from
> > > numerical code with high-level abstractions? Yes. Indeed, I would like
> > > to see this done. However, I've never heard of an implementation of any
> > > language that can do this.
> >
> > g++ does that reasonably well. Heck, I have a severely hacked in-house
> > LISP system which had grown out of an assembler-with-LISP-macros, and it
> > does it all the time. It's a LISP with static types at runtime, but
> > general LISPiness at compile time (macros just run in gcl). In fact I'm
> > looking to port it to OCaml just because type inference and commonplace
> > LISP syntax for type declarations don't mix too well - the code starts
> > looking really ugly. Maybe I stuck too much with LISP's usual
> > declaim-this-and-that approach, but that's what you get when you reuse
> > most of underlying LISP's implementation.
>
> Yes. If you're willing to do that kind of hacking then you could get a long
> way with camlp4 without too much trouble.
>
> > > I assume you didn't mimic run-time polymorphism, currying, closures and
> > > higher-order functions in the C though?
> >
> > No. But while those are very useful features to have, the code is messy
> > if you've got two languages to work with (hi-perf stuff in C, everything
> > else in OCaml).
>
> On 64-bit there is no performance benefit in dropping to C.
>
> > > Yes. Your "add1" function is entirely dead code and, therefore, is
> > > trivially reducible. This is not true of real programs and, therefore,
> > > your results will not be representative of real programs. Moreover,
> > > there is no sense in trying to optimize a program that does nothing
> > > anyway.
> >
> > If you code such a thing in gcc, it correctly becomes a no-op. OCaml
> > isn't all that clever ;)
> >
> > You're 100% academically-correct of course.
>
> As Xavier says, "the OCaml compiler is designed to compile good code".
>
> > > The simplest route to recovering C performance here is:
> > >
> > > . Inline "( +. )".
> > > . Inline "op1".
> > > . Type-specialize "op1".
> > > . Hoist bounds checks.
> >
> > You mean "the programmer has to do this"? OCaml compiler is *really* bad,
> > then.
>
> You're 100% academically-correct of course.
>
> In practice, the combination of OCaml's awesome native-code generation on
> AMD64 and the ease with which you can work around those missing
> optimizations mean that OCaml is *really* good compared to all other
> language
> implementations with comparable qualities. This is precisely why it is so
> popular.
>
> > > There are some trade-offs here though. Stalin-compiled Scheme,
> > > MLton-compiled SML and GHC-compiled Haskell do a lot more than OCaml in
> > > theory but, in practice, they are much less useful for high-performance
> > > numerics because their performance is so unpredictable.
> >
> > Good to know.
>
> OCaml's code generator is also much better than MLton's and GHC's (even
> 6.8).
>
> > > I've already spent a long time meticulously contructing benchmarks in
> > > various different languages along these lines and the most important
> > > optimizations missing from OCaml are:
> > >
> > > . Hoisting bounds checks.
> > > . Common subexpression elimination.
> > > . Type specialization.
> > > . More aggressive inlining.
> > > . Static optimization of div and mod.
> > >
> > > The nice thing about OCaml is that it has a superb 64-bit code
> > > generator so, once you've done those optimizations on the hot paths
> > > manually, OCaml lets you run code as quickly as C but with all of the
> > > high-level benefits everywhere else.
> > >
> > > A perfect language implementation would certainly do these
> > > optimizations for you and I wish OCaml did but I still haven't found
> > > anything better.
> >
> > After I gain some more experience I will hopefully add an OCaml-esque
> > front end to my hackish compiler. It's completely unportable, only works
> > on 12-bit-PIC-like architecture (SX from Parallax) and on Z8 Encore!, but
> > hey, maybe it can be a good starting point to something bigger and
> > better. The backend still has to be ported from Lisp, and I'm too time-
> > and
> > knowledge-constrained to do it just yet.
> > ...
>
> I see. If you're not using OCaml as a general purpose language then you
> don't need its other benefits (e.g. GUI libraries).
>
> I'm in a similar situation. I'm toying with the idea of building a better
> FPL implementation for high-performance numerics that is commerce friendly
> using LLVM. If you take the path of least resistance then I believe that is
> tractable but you lose nice things like LablGTK2...

If you can do some code generation, it shouldn't be a big deal to implement 
even some very complex ABIs, say interfacing to C++ libraries. As long as you 
can cleanly run your language at compile time (like lisp does), and as long 
as the compiler provides a documented way to pop assembly into the code, you 
can have a nice language that can, in "natively compiled" output, interface 
say with Qt. IMHO, the latter is now a few years ahead of GTK, and is only 
gaining the advantage as time passes. Languages such as OCaml sorely lack in 
nontrivial ABI department - there's no reason to have to write (or even 
generate) C-style binding code for say Qt just to use it in OCaml. Both 
bytecode compiler and native code compiler should have a pluggable 
ABI-generator scheme where they can interface with C++ (at least). Another 
way to go about it would be to machine-translate C++ libraries to OCaml. This 
should be doable for Qt, but I don't think I'm up for that one just yet, even 
though I think about it.

If you want to generate code for a VM, the backend should allow selectable 
VMs - at least CLR and JVM, if the thing is to be of any relevance.

The language should, IMHO, also be flexible enough that features could be 
reasonably added to have it support resource-constrained architectures.

I'd expect the language to have a core feature set which doesn't do automated 
memory management beyond what can be inferred for fixed-lifetime data. This 
would be akin to working with good old C++, and should be easily portable to 
tiny targets. Then you have "level 2" feature set which is supposed to use 
garbage collecting virtual machines. I'd even add something like "level 1.5" 
which uses say C++ ABI but also links-in a garbage collector. So that you get 
automated memory management but no VM per se. Heck, for all I care you could 
have that thing emit code in C++ ABI by default - it's a reasonable-enough 
choice.

I find it annoying that writing good "PC" code requires a wholly different 
toolset from writing embedded code, the latter usually limited to really 
broken vendor C (or sometimes C++).

Cheers, Kuba