Re: [Caml-list] How to write a CUDA kernel in ocaml?
| From: | Philippe Wang <philippe.wang.lists@g...> |
| Subject: | Re: [Caml-list] How to write a CUDA kernel in ocaml? |
On Wed, Dec 16, 2009 at 2:47 PM, Eray Ozkural <examachine@gmail.com> wrote:
> One trivial and low-performance solution that comes to mind is: make
> an ocaml bytecode interpreter into a CUDA kernel and then pass the
> bytecode to it, and then voila, at least we have some 512-way
> parallelism on the GT300. How does that sound? We'd be losing some
> performance but massive parallelism will cover up for some of that.

With parallel processors, the performance bottleneck moves very quickly from the processor(s) to memory bandwidth, such that
- it's hell to program, because you have to manage concurrency, and that has a real cost;
- it's useful only for very specific programs that make very few memory accesses compared to processor computations (such as some compression algorithms; a more specific and very easy to write example is matrix multiplication).

Imagine you have 3000MHz of memory bandwidth, which is extremely good today (I think). And imagine you have 100 processors sharing this memory bandwidth. If they all want to access memory at the same time, then even if you ignore the cost of concurrency management, you get 3000MHz / 100 processors = 30MHz per processor, which is very, very low. So think about 10 processors instead of 100 to be more realistic: that's still only 300MHz per processor, which looks like what we had about a decade ago...

(IMHO) A not-too-bad-but-still-realistic way to take advantage of GPUs today with OCaml (or any high-level language) is to write the computation functions in C (possibly with some assembly) and the composition functions in OCaml. Or (less realistic in a short amount of time) maybe to write a compiler that does the job for you, but that's not easy...

Good luck,

-- 
Philippe Wang
   mail@philippewang.info
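
The bandwidth estimate in the message above, and the remark that matrix multiplication does few memory accesses relative to its computations, can be sketched in a few lines of C. This is only an illustration: the function names `per_processor_share` and `matmul_ops_per_element` are hypothetical and do not come from the original message, and the second function assumes an idealized case in which each matrix element crosses the memory bus exactly once.

```c
#include <assert.h>

/* Even split of a shared memory budget among processors that all
   access memory at the same time. Contention and concurrency
   management costs are ignored, as in the estimate in the message.
   Units follow the message (MHz). */
double per_processor_share(double total_mhz, int processors)
{
    assert(processors > 0);
    return total_mhz / processors;
}

/* Arithmetic operations per matrix element moved for an n x n
   matrix multiplication. Assumption (idealized): each element of
   A, B and C is transferred exactly once, so the ratio is
   2n^3 operations over 3n^2 elements. */
double matmul_ops_per_element(int n)
{
    assert(n > 0);
    double ops = 2.0 * n * n * n;   /* n^3 multiplies + n^3 adds */
    double elements = 3.0 * n * n;  /* A, B and C */
    return ops / elements;
}
```

Calling `per_processor_share(3000.0, 100)` gives 30 and `per_processor_share(3000.0, 10)` gives 300, matching the figures in the message. Under the stated assumption, `matmul_ops_per_element` grows linearly with `n`, which is one way to see why matrix multiplication is less limited by shared memory bandwidth than most programs.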