Browse thread
More cores
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Jon Harrop <jon@f...> |
| Subject: | Re: [Caml-list] More Caml |
On Wednesday 24 December 2008 13:12:58 Mikkel Fahnøe Jørgensen wrote: > 2008/12/23 Jon Harrop <jon@ffconsultancy.com>: > > symbolics. I am guessing the performance of allocation will be degraded > > 10-100x but many allocations can be removed. This leaves me wondering how > > much slowdown is acceptable without deterring a lot of users? > > I think this would be major problem. A great advantage of OCaml is > that you can do things you would not normally do in C/C++ for > performance reasons, like mapping a list. Yes. The counter argument is that people wanting performance should not be using long linked lists in the first place. I think that is actually quite a compelling argument: it is more important to remove barriers to potential performance than it is to make everything fast. Indeed, that is one of the strongest reasons to choose OCaml over the alternatives: you can write very fast code in OCaml without having to drop to C. In contrast, Haskell compilers do a lot of generic optimizations like deforesting that help in unimportant cases but place a low performance ceiling on fast code so you are forced to drop to another language for speed. > I believe it is desirable to split allocation into pools dedicated to > each physical thread. Absolutely. However, that is also extremely complicated and I just cannot see myself implementing that any time soon. A parallel GC using a shadow stack seems feasible and should still be a productive improvement. Symbolic code will really suffer but numeric code will really gain. > Memory barriers are really expensive, such as used for atomic increment, not > to mention locking. Particularly when mutating pointers to reference types in the heap. However, reading and writing value types in the heap should still be fast, e.g. parallel numerical methods. > Small block > allocation could be done in relatively small heap segments where each > physical thread locks a larger heap only when it needs a new segment, > but not while allocating within a segment. This can further be > improved by adding NUMA support and scope based allocation (arena). If > full segments are marked non-mutable where possible, it is probably > possible to reduce thread blocking during collection. Garbage > collection could be limited to processing full segments. Probably not > trivial though. Indeed. There are also a variety of type-directed approaches that may be useful given the information available to the JIT. Not to mention that I'm expecting to JIT the GC code for each type as well... :-) Incidentally, the availability of fences as intrinsics in LLVM makes it sound ideal for implementing GCs using wait-free methods. If I can get the basics up and running then it may be best for me to focus on making this VM an easy target for GC researchers to implement new algorithms. > How do you think of adding an Erlang style threading model? That is effectively what F#'s asynchronous workflows aim to achieve. However, I currently have no use for it so I only desire that its existence does not limit the capabilities that I am interested in (parallelism). > Speaking of JIT compiler, would eval make any sense? Yes, no reason why not. Perhaps the same infrastructure used to type specialize definitions as they are JIT compiled can be used for other kinds of partial specialization, like low-dimensional vector/matrix operations. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e