Version française
Home     About     Download     Resources     Contact us    

This site is updated infrequently. For up-to-date information, please visit the new OCaml website at

Browse thread
[Caml-list] Does Caml have slow arithmetics ?
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2004-07-08 (21:06)
From: Brandon J. Van Every <vanevery@i...>
Subject: [Caml-list] tail recursion and register poor Intel architecture
Markus Mottl wrote:
> On Thu, 08 Jul 2004, Alex Baretta wrote:
> > Luc Maranget wrote:
> > >  + ocamlopt does it less often. Namely, calls in tail position
> > >  become real tail calls when all their arguments are
> > >  passed in registers.
> > >  (This does not apply to self-tail calls which are always
> > >  optimized)
> >
> >
> > What?! Is this true? This effectively means that I cannot count on
> > tail-call elimination in general?
> No, you can't.  If you happen to use a register-starved
> architecture like
> IA-32 and call an OCaml-function with more than six
> parameters, then you
> are out of luck, because you'll have to throw things on the
> stack.  AFAIK,
> the OCaml compiler generates closures on the heap to work around this,
> but only in cases where the recursion is obvious (self-tail calls).
> Hm, shouldn't be too difficult to extend this to mutual recursion?
> Or does OCaml already do this?
> I usually avoid recursive functions with more than six parameters for
> performance reasons.  I wrap them in closures, which bind constant
> parameters, or, if there are too many volatile parameters, I
> modify the
> volatile parameters in references outside of the function.
> Another (less
> efficient) solution would be to use tuples to pass volatile
> parameters.
> The ultimate solution is, of course: buy a better architecture :-)

Or implement more of what Intel's crufty architecture actually offers.
For instance, supporting SSE would provide 8 additional 32-bit FPU
registers.  One doesn't have to use use the XMM registers for vectors,
they could be used as scalars, and that's the more straightforward
benefit of SSE architecture.  SSE2 allows for 64-bit FPU registers and
also integer registers, of size 32-bit and 64-bit IIRC (but I haven't
much cared about integer code).  So, if you were willing to compile for
a Pentium III for SSE, or a Pentium4 for SSE2, the optimization problem
would be less painful.  Of course, you'd have to write the painful
support for SSE/SSE2 in the first place.  :-)

For the record, I hate Intel's architectures.  They were talking about
Merced when I was writing real code on DEC Alpha.  Alpha is dead for
marketing reasons, not technical ones.  Meanwhile, Itanium is still

AMD also offers more registers on their newest chips.  I'm too fazed to
remember the details right now; I do recall twice as many FPU registers.

Brand*n Van Every               S*attle, WA

Praise Be to the caml-list Bayesian filter! It blesseth
my postings, it is evil crap!  evil crap!  evil crap!

To unsubscribe, mail Archives:
Bug reports: FAQ:
Beginner's list: