Re: [Caml-list] How to write a CUDA kernel in ocaml?
Date: -- (:)
From: Philippe Wang <philippe.wang.lists@g...>
Subject: Re: [Caml-list] How to write a CUDA kernel in ocaml?
On Wed, Dec 16, 2009 at 2:47 PM, Eray Ozkural <examachine@gmail.com> wrote:

> One trivial and low-performance solution that comes to mind is: make
> an ocaml bytecode interpreter into a CUDA kernel and then pass the
> bytecode to it, and then voila, at least we have some 512-way
> parallelism on the GT300. How does that sound? We'd be losing some
> performance but massive parallelism will cover up for some of that.


With parallel processors, you very quickly move the performance
bottleneck from the processor(s) to memory bandwidth, so that
- it's hell to program, because you have to manage concurrency and
that has a real cost
- it's only useful for very specific programs that make very few memory
accesses relative to the amount of computation (some compression
algorithms, for instance; a more specific and very easy to write
example is matrix multiplication).
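
To put a rough number on "few memory accesses compared to
computation", here is a small OCaml sketch. It assumes the textbook
n x n matrix product (about 2n^3 floating-point operations) and the
idealised case where each element of the three matrices crosses the
memory bus exactly once. The function names are made up for
illustration, and vector addition is included only as a contrast.

```ocaml
(* Floating-point operations per matrix element moved, for the naive
   n x n product C = A * B. Assumes every element of A, B and C is
   transferred exactly once (an idealised best case). *)
let matmul_ops_per_element n =
  let n = float_of_int n in
  let flops = 2.0 *. n *. n *. n in   (* n^3 multiplies + n^3 adds *)
  let elements = 3.0 *. n *. n in     (* A, B and C *)
  flops /. elements

(* Same ratio for z = x + y on vectors of length n:
   n additions for 3n elements moved. *)
let vector_add_ops_per_element n =
  let n = float_of_int n in
  n /. (3.0 *. n)

let () =
  List.iter
    (fun n ->
      Printf.printf "n = %4d: matmul %.1f ops/element, vector add %.2f\n"
        n (matmul_ops_per_element n) (vector_add_ops_per_element n))
    [ 3; 300; 3000 ]
```

Under these assumptions the ratio for the matrix product grows
linearly with n, while for vector addition it stays fixed at 1/3, so
only the former has a chance of keeping many processors busy on a
shared memory bus.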

Imagine you have 3000MHz of memory bandwidth, which is extremely good
today (I think). And imagine you have 100 processors sharing this
memory bandwidth. If they all want to access memory at the same time,
then even ignoring the cost of managing concurrency, each one gets
3000MHz/100 = 30MHz, which is very, very low. So think of 10
processors instead of 100 to be more realistic: that's still only
300MHz per processor, which looks like what we had about a decade
ago...
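
The division above, spelled out as a tiny OCaml sketch. The 3000MHz
figure and the processor counts are the rough numbers from this
message, not measurements, and `per_processor_mhz` is a name invented
here. It also assumes the bandwidth is split evenly with no
contention overhead, which is optimistic.

```ocaml
(* Even share of a fixed memory bandwidth among [processors] units
   that all access memory at the same time. Contention and
   synchronisation costs are ignored. *)
let per_processor_mhz ~total_mhz ~processors =
  total_mhz /. float_of_int processors

let () =
  List.iter
    (fun n ->
      Printf.printf "%3d processors: %.0f MHz each\n" n
        (per_processor_mhz ~total_mhz:3000.0 ~processors:n))
    [ 1; 10; 100 ]
```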

(IMHO) A not-too-bad-but-still-realistic way to take advantage of
GPUs today with OCaml (or any high-level language) is to write the
computation functions in C (possibly with some assembly) and the
composition functions in OCaml. Or (less realistic in a short amount
of time) maybe write a compiler that does the job for you, but
that's not exactly easy...
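
A minimal sketch of that split between computation and composition.
Everything here is hypothetical: `saxpy`, `dot` and `residual_norm2`
are names chosen for illustration, and the two kernels are written in
plain OCaml only so that the example runs on its own. In a real
setup they would be C (or CUDA host) functions bound with `external`,
most likely taking Bigarrays rather than float arrays so that the C
side sees a flat buffer.

```ocaml
(* --- Computation layer -------------------------------------------
   Stand-ins for kernels that would really live in C and be declared
   with [external]. Pure OCaml here to keep the sketch runnable. *)

(* a * x + y, element by element *)
let saxpy a x y = Array.map2 (fun xi yi -> a *. xi +. yi) x y

(* dot product of two vectors of equal length *)
let dot x y =
  let acc = ref 0.0 in
  Array.iteri (fun i xi -> acc := !acc +. xi *. y.(i)) x;
  !acc

(* --- Composition layer -------------------------------------------
   Ordinary OCaml that only glues kernels together and never touches
   individual elements itself. *)
let residual_norm2 a x y =
  let r = saxpy a x y in
  dot r r

let () =
  let x = [| 1.0; 2.0 |] and y = [| 3.0; 4.0 |] in
  Printf.printf "squared norm of 2x + y = %.1f\n" (residual_norm2 2.0 x y)
```

The point of the design is that the OCaml side stays small and
high-level, while each kernel does enough work per call to be worth
the cost of crossing into C.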

Good luck,

-- 
Philippe Wang
   mail@philippewang.info