Odd performance result with HLVM
Date: -- (:)
From: Kuba Ober <ober.14@o...>
Subject: Re: [Caml-list] Odd performance result with HLVM

On Mar 4, 2009, at 11:17 AM, Mikkel Fahnøe Jørgensen wrote:

> When looking at the benchmark game and other benchmarks I have seen, I
> noticed that Haskell is almost as fast as OCaml and sometimes faster.
> Some Lisp implementations are also pretty fast.
>
> However, when you look at memory consumption OCaml uses considerably
> less memory, except for languages in the C family.
>
> I suspect that many real world performance scenarios, such as heavily
> loaded web servers and complex simulations, depend very much on memory
> consumption. This is both because of GC overhead and because of the
> slower memory pipeline the more cache levels are involved.
>
> So in case of a new JIT solution for OCaml, I believe it is important
> to observe this aspect as well.

I believe it is also important not to dynamically allocate memory for no
good reason.

All of my realtime numerical code uses statically allocated memory, with
overlaying based on the execution flow of basic blocks. That has zero runtime
overhead: the produced machine code has fixed addresses for data
(not all data, of course).
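
To make the overlaying idea concrete, here is a minimal hand-written sketch in C. It is not my actual code: the names (`overlay`, `smooth_sum`, `count_above`) and the two "phases" are made up for illustration. It assumes the two phases are never live at the same time and that neither can re-enter the other, so their scratch buffers can share one fixed static region. A compiler doing this automatically would derive the same layout from the flow of basic blocks.

```c
#include <stddef.h>

/* Hypothetical example: two processing phases whose scratch data is
   never live at the same time, so both can share the same fixed
   addresses. N and all names below are illustrative assumptions. */
enum { N = 8 };

static union {
    struct { int window[N]; } phase_a;  /* live only inside smooth_sum()  */
    struct { int flags[N];  } phase_b;  /* live only inside count_above() */
} overlay;  /* one static region, sized for the larger member */

/* Phase A: scale the input into scratch storage, then sum it. */
int smooth_sum(const int *in) {
    int sum = 0;
    for (size_t i = 0; i < N; i++)
        overlay.phase_a.window[i] = in[i] * 2;
    for (size_t i = 0; i < N; i++)
        sum += overlay.phase_a.window[i];
    return sum;
}

/* Phase B: reuses the very same bytes for a different purpose. */
int count_above(const int *in, int threshold) {
    int count = 0;
    for (size_t i = 0; i < N; i++)
        overlay.phase_b.flags[i] = in[i] > threshold;
    for (size_t i = 0; i < N; i++)
        count += overlay.phase_b.flags[i];
    return count;
}

/* RAM actually reserved: the maximum of the two phases, not the sum. */
size_t overlay_bytes(void) {
    return sizeof overlay;
}
```

Calling `smooth_sum` and then `count_above` on the same input works because each phase fully rewrites the shared region before reading it; no allocation happens at run time.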

It reduces to whether a "basic block" can be re-entered from its future
(downstream) or not. If it can, you have to use the stack or the heap. If it
can't, you can do static allocation. The potential cost arises when a given
function is entered from many points: you can then get some overhead, since
the overlaying has to take into account all possible ways the code is
reached. This can be mitigated by generating more than one copy of the
function. That makes sense when you have some free code ROM, but your RAM
is almost full.

This of course can only be done when you do whole-project compilation. If
you compile "modules" separately, you have to fall back on doing it in the
linker, where all you have is the function call graph, so the available
granularity is much worse and the RAM overhead is bigger. The code ROM
overhead is then none, since the linker can hardly generate copies of
functions; at the point where you copy functions you may as well do other
optimizations, and the linker is way too late to do that efficiently.
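
The coarse, call-graph-only variant can be sketched roughly as follows. This is an illustrative assumption of how such a pass might look, not a description of any particular linker: the `CallGraph` type and `assign_overlays` are hypothetical names, and the sketch assumes functions are numbered so that every caller comes before its callees (which also rules out recursion, i.e. re-entry). Each function's static data is placed just past the furthest-reaching of its callers, so functions that are never active together end up sharing addresses.

```c
#include <stddef.h>

enum { MAX_FUNCS = 16 };

/* Hypothetical call-graph description. calls[i][j] != 0 means that
   function i calls function j. Assumption: callers are numbered before
   their callees, so the graph is acyclic (no re-entry). */
typedef struct {
    size_t n;                                  /* number of functions    */
    size_t frame_size[MAX_FUNCS];              /* static bytes needed    */
    unsigned char calls[MAX_FUNCS][MAX_FUNCS]; /* adjacency matrix       */
    size_t offset[MAX_FUNCS];                  /* output: fixed offsets  */
} CallGraph;

/* Assigns a fixed offset to every function's data and returns the total
   RAM needed, or (size_t)-1 if an edge points backwards (possible
   recursion, where static overlaying is not safe). */
size_t assign_overlays(CallGraph *g) {
    size_t total = 0;
    for (size_t i = 0; i < g->n; i++)
        g->offset[i] = 0;
    for (size_t i = 0; i < g->n; i++) {
        /* All callers of i were already visited, so offset[i] is final. */
        size_t end = g->offset[i] + g->frame_size[i];
        if (end > total)
            total = end;
        for (size_t j = 0; j < g->n; j++) {
            if (!g->calls[i][j])
                continue;
            if (j <= i)
                return (size_t)-1;
            /* The callee must start past every caller that may be live. */
            if (g->offset[j] < end)
                g->offset[j] = end;
        }
    }
    return total;
}
```

For a graph where `main` (4 bytes) calls `a` (10 bytes) and `b` (6 bytes), and both call `c` (2 bytes), `a` and `b` share offset 4 and the total is 16 bytes instead of the 22 a non-overlaid layout would need. Note how `c`, being reachable from several places, has to be placed for the worst case; that is the overhead that duplicating the function would avoid.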

There's no reason not to use those techniques in code that runs on "large"
platforms. It'd at least artificially boost some benchmark results ;)

Cheers, Kuba