English version
Accueil     À propos     Téléchargement     Ressources     Contactez-nous    

Ce site est rarement mis à jour. Pour les informations les plus récentes, rendez-vous sur le nouveau site OCaml à l'adresse ocaml.org.

Browse thread
[Caml-list] ocamlopt and Windows DLL
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2001-05-27 (08:47)
From: Xavier Leroy <Xavier.Leroy@i...>
Subject: Re: [Caml-list] Obsessed by speed
> 1. If I define Array.iter + a function that uses it,
>    will ocamlopt combine these two functions two one?
>    (I looked in the assembler code, but it seemed as
>     ocamlopt didn't combine them)

You're right, the function inlining pass in ocamlopt is rather
conservative and doesn't inline and beta-reduce function arguments to
higher-order functions.  Inlining is a delicate trade-off between too
little and too much (e.g. code size and compile time explosion), and
OCaml errs on the conservative side.

> 2. Do we need special Pentium-4 adaptions to utilize
>    its very good float performance?

I'm not familiar with the P4 micro-architecture, so I don't know.
ocamlopt uses the standard IA32 stack-based model for floating-point
code.  Apparently, the P4 can now do 64-bit float arithmetic between
true registers (the SSE2 model), and ocamlopt could generate better
(and simpler!) floating-point code for this model.  Don't know how
much of a performance difference that would make, though.  

At any rate, the ocamlopt port for AMD's x86-64 architecture will use
the SSE2 model.

> 3. Would ocamlopt benefit from a peephole optimizer of
>    the assembler code? Or is the assembler code already
>    optimal?

No assembly code is ever optimal, especially if machine-generated :-)
A peephole optimizer could remove some cosmetic inefficiencies, but I
doubt this would make a significant speed difference.  Today's
processors have enough instruction-level parallelism and dynamic
instruction scheduling that a few redundant integer operations here
and there don't really hurt.  

Other higher-level optimizations not currently performed could improve
performance more, e.g. common subexpression elimination on memory loads.

> 4. What is unboxed and what isn't?
>    I have noticed that there is a
>    continuos work to unbox more.

Very briefly:

Always unboxed:
  int, char, bool, constant constructors of datatypes.
Locally unboxed (in expressions and let...in): 
  float, int32, nativeint (and int64 on 64-bit processors)
Unboxed inside arrays:
Unboxed inside big arrays:
  all numerical types
Always boxed:
  everything else (records, non-constant datatype constructors,
  tuples, arrays, etc)

> 5. How do you make sense of gprof-output? Any suggestions?

The "info" docs for gprof contain a tutorial.
The function names that appear are of the form Modulename_functionname_NNN
where NNN is a unique integer.
Be careful with the call graph reported by gprof: it is totally
inaccurate for higher-order functions.

Hope this answers your questions.

- Xavier Leroy
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr