Date: 2008-12-24 (16:43)
From: Jon Harrop <jon@f...>
Subject: Re: [Caml-list] More cores
On Wednesday 24 December 2008 13:12:58 Mikkel Fahnøe Jørgensen wrote:
> 2008/12/23 Jon Harrop <>:
> > symbolics. I am guessing the performance of allocation will be degraded
> > 10-100x but many allocations can be removed. This leaves me wondering how
> > much slowdown is acceptable without deterring a lot of users?
> I think this would be major problem. A great advantage of OCaml is
> that you can do things you would not normally do in C/C++ for
> performance reasons, like mapping a list.

Yes. The counter argument is that people wanting performance should not be 
using long linked lists in the first place. I think that is actually quite a 
compelling argument: it is more important to remove barriers to potential 
performance than it is to make everything fast.
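
To make the point concrete, here is a minimal sketch (modern OCaml, nothing 
from the runtime itself) contrasting the two representations: the list 
version allocates one cons cell per element, while the array is a single 
contiguous block, so the GC has far less work to do.

```ocaml
(* Summing a large collection as a list vs. as an array. *)
let n = 1_000_000

let sum_list () =
  let xs = List.init n (fun i -> i) in   (* n cons cells on the heap *)
  List.fold_left ( + ) 0 xs

let sum_array () =
  let xs = Array.init n (fun i -> i) in  (* one flat heap block *)
  Array.fold_left ( + ) 0 xs

let () =
  assert (sum_list () = sum_array ());
  Printf.printf "sum = %d\n" (sum_list ())
```

Both compute the same result; the difference is purely in how much pressure 
each puts on the allocator and collector.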

Indeed, that is one of the strongest reasons to choose OCaml over the 
alternatives: you can write very fast code in OCaml without having to drop to 
C. In contrast, Haskell compilers perform a lot of generic optimizations, such 
as deforestation, that help in unimportant cases but place a low ceiling on 
the performance of fast code, so you are forced to drop to another language 
for speed.

> I believe it is desirable to split allocation into pools dedicated to
> each physical thread.

Absolutely. However, that is also extremely complicated and I just cannot see 
myself implementing it any time soon. A parallel GC using a shadow stack 
seems feasible and should still be a worthwhile improvement. Symbolic code 
will really suffer but numeric code will really gain.

> Memory barriers are really expensive, such as used for atomic increment, not
> to mention locking. 

Particularly when mutating pointers to reference types in the heap. However, 
reading and writing value types in the heap should still be fast, e.g. 
parallel numerical methods.
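
A rough sketch of the distinction, using OCaml 5's Atomic module (which did 
not exist in 2008, so this is purely illustrative): the atomic increment 
implies a full fence, the plain ref increment is a couple of machine moves, 
and writes of unboxed floats into a float array skip the GC write barrier 
entirely.

```ocaml
(* Fenced vs. unfenced counters (OCaml >= 4.12 for Atomic). *)
let atomic_counter = Atomic.make 0
let plain_counter = ref 0

let bump_atomic () = ignore (Atomic.fetch_and_add atomic_counter 1)
let bump_plain () = incr plain_counter

(* Unboxed value-type writes: no write barrier, no allocation. *)
let scale (a : float array) k =
  for i = 0 to Array.length a - 1 do
    a.(i) <- a.(i) *. k
  done

let () =
  for _ = 1 to 1000 do bump_atomic (); bump_plain () done;
  assert (Atomic.get atomic_counter = !plain_counter)
```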

> Small block 
> allocation could be done in relatively small heap segments where each
> physical thread locks a larger heap only when it needs a new segment,
> but not while allocating within a segment. This can further be
> improved by adding NUMA support and scope based allocation (arena). If
> full segments are marked non-mutable where possible, it is probably
> possible to reduce thread blocking during collection. Garbage
> collection could be limited to processing full segments. Probably not
> trivial though.

Indeed. There are also a variety of type-directed approaches that may be 
useful given the information available to the JIT. Not to mention that I'm 
expecting to JIT the GC code for each type as well... :-)
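
For what it's worth, the segment scheme quoted above can be modelled in a few 
lines (all names here are invented for illustration, and Mutex is taken from 
the modern stdlib): a global heap hands out fixed-size segments under a lock, 
and each thread then bump-allocates inside its own segment with no locking at 
all, taking the lock again only when the segment is exhausted.

```ocaml
(* A toy model of per-thread segment allocation, not the real runtime. *)
let segment_size = 4096

type segment = { buf : Bytes.t; mutable used : int }
type heap = { lock : Mutex.t; mutable segments : segment list }

let global_heap = { lock = Mutex.create (); segments = [] }

(* Slow, locked path: runs once per segment_size bytes of allocation. *)
let fresh_segment heap =
  Mutex.lock heap.lock;
  let s = { buf = Bytes.create segment_size; used = 0 } in
  heap.segments <- s :: heap.segments;
  Mutex.unlock heap.lock;
  s

(* Lock-free fast path: bump-pointer allocation within the thread's own
   segment; returns the offset of the new block, or None when full. *)
let alloc seg n =
  if seg.used + n > segment_size then None
  else begin
    let off = seg.used in
    seg.used <- seg.used + n;
    Some off
  end
```

A thread would hold on to its current segment and, on None, call 
fresh_segment and retry, so contention on the global lock stays proportional 
to segment turnover rather than to allocation rate.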

Incidentally, the availability of fences as intrinsics in LLVM makes it sound 
ideal for implementing GCs using wait-free methods. If I can get the basics 
up and running then it may be best for me to focus on making this VM an easy 
target for GC researchers to implement new algorithms.

> How do you think of adding an Erlang style threading model?

That is effectively what F#'s asynchronous workflows aim to achieve. However, 
I currently have no use for it so I only desire that its existence does not 
limit the capabilities that I am interested in (parallelism).

> Speaking of JIT compiler, would eval make any sense?

Yes, no reason why not. Perhaps the same infrastructure used to type 
specialize definitions as they are JIT compiled can be used for other kinds 
of partial specialization, like low-dimensional vector/matrix operations.

Dr Jon Harrop, Flying Frog Consultancy Ltd.