Odd performance result with HLVM
| From: | Kuba Ober <ober.14@o...> |
| Subject: | Re: [Caml-list] Odd performance result with HLVM |
On Mar 4, 2009, at 11:17 AM, Mikkel Fahnøe Jørgensen wrote:

> When looking at the benchmark game and other benchmarks I have seen, I
> noticed that Haskell is almost as fast as OCaml and sometimes faster.
> Some Lisp implementations are also pretty fast.
>
> However, when you look at memory consumption OCaml uses considerably
> less memory, except for languages in the C family.
>
> I suspect that many real world performance scenarios, such as heavily
> loaded web servers and complex simulations, depend very much on memory
> consumption. This is both because of GC overhead and because of the
> slower memory pipeline the more cache levels are involved.
>
> So in case of a new JIT solution for OCaml, I believe it is important
> to observe this aspect as well.

I believe it is also important not to dynamically allocate memory for no good reason. All of my realtime numerical code uses statically allocated memory, with overlaying based on the execution flow of basic blocks. That has zero runtime overhead: the produced machine code has fixed addresses for data (not all data, of course).

It reduces to whether a "basic block" can be re-entered from its future (downstream) or not. If it can, you have to use the stack or the heap. If it can't, then you can do static allocation.

The potential cost arises when a given function is entered from many points. Then you can get some overhead, since the overlaying has to take into account all possible ways the code is reached. This can be mitigated by generating more than one copy of the function. That makes sense when you have some free code ROM but your RAM is almost full.

This of course can only be done when you do whole-project compilation. If you compile "modules" separately, you have to fall back on doing it in the linker, where all you have is the function call graph; the available granularity is much worse, and the RAM overhead is bigger.
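The coarser, call-graph-level variant of the overlaying described above can be sketched roughly as follows. This is only an illustration, not the scheme actually used in the code mentioned in this message: it works on whole functions rather than basic blocks, it ignores interrupts, and the names `overlay_offsets`, `calls` and `frame_size` are hypothetical. Each function's locals get a fixed offset past the end of every caller's locals, so functions that can never be active at the same time share RAM; a function that can be re-entered is rejected, since it would need a stack or heap.

```python
def overlay_offsets(calls, frame_size):
    """Assign a fixed RAM offset to each function's locals.

    calls:      function name -> list of functions it calls (assumed call graph)
    frame_size: function name -> bytes of locals it needs
    Returns (offsets, total_ram); raises ValueError if a function can be
    re-entered (recursion), because then static allocation is not possible.
    """
    # Invert the call graph: for each function, who can call it?
    callers = {f: [] for f in frame_size}
    for f, callees in calls.items():
        for g in callees:
            callers[g].append(f)

    offsets = {}
    in_progress = set()

    def offset(f):
        if f in offsets:
            return offsets[f]
        if f in in_progress:
            raise ValueError(f"{f} can be re-entered; it needs a stack or heap")
        in_progress.add(f)
        # Start past the end of every caller's frame. By induction this also
        # clears every indirect caller, i.e. everything that may still be live.
        off = max((offset(c) + frame_size[c] for c in callers[f]), default=0)
        in_progress.discard(f)
        offsets[f] = off
        return off

    for f in frame_size:
        offset(f)
    total = max((offsets[f] + frame_size[f] for f in frame_size), default=0)
    return offsets, total


if __name__ == "__main__":
    # Made-up example: main calls filter and log; filter calls fft.
    calls = {"main": ["filter", "log"], "filter": ["fft"]}
    sizes = {"main": 8, "filter": 16, "fft": 64, "log": 32}
    offsets, total = overlay_offsets(calls, sizes)
    print(offsets)  # filter and log both start at offset 8: they overlay
    print(total, "bytes with overlaying vs", sum(sizes.values()), "without")
```

In this made-up example `filter` and `log` are never active together, so they share the same starting address. A function reached from several callers has to be placed past the deepest of them, which is the overhead from multiple entry points mentioned above.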
When the overlaying is done in the linker, the code ROM overhead is nil, since a linker can hardly generate copies of functions; by the point where you copy functions you may as well do other optimizations, and the linker is way too late to do that efficiently.

There's no reason not to use those techniques in code that runs on "large" platforms. It'd at least artificially boost some benchmark results ;)

Cheers,
Kuba