From: Eray Ozkural <examachine@g...>
Subject: Re: [Caml-list] How to write a CUDA kernel in ocaml?
On Wed, Dec 16, 2009 at 3:41 PM, Mattias Engdegård <mattias@virtutech.se> wrote:
>>And trampolines to eliminate tail calls that cannot be eliminated using goto.
>>However, trampolines are ~10x slower than TCO done by the code generator.
>
> With some care, gcc's sibcall mechanism can be exploited. For example,
> by having one standard signature for all generated C functions, and
> taking care not to pass pointers to variables in the caller's stack
> frame. This should give fairly good performance (better than
> trampolines anyway), at the cost of portability (but gcc is good at
> that). It would give full TCO, even across compilation units. It
> should work well with a Cheney-on-the-MTA-style GC, too.
>
> How suitable it is depends on the reason why compilation to C is done in
> the first place. It might be one of:
>
> 1) portability to odd platforms with semi-decent performance (i.e.,
>   better than interpreted bytecode)
> 2) a simple target for maintaining bootstrapping capability for the
>   compiler (but bytecode works well for this too)
> 3) simpler (?) interfacing to libraries in C etc
> 4) flat-out maximum performance by exploiting the optimisations that
>   modern C compilers are capable of
>
> Of course, these days we have LLVM, which has a lot going for it.
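
Just to make the trampoline-versus-sibcall contrast concrete, here is a
rough sketch of the two schemes in hand-written C (nothing to do with any
real compiler's output; all the names are made up). The trampoline hands
the next step back to a driver loop, while the uniform-signature version
makes direct calls in tail position and relies on gcc's sibling-call
optimization (-foptimize-sibling-calls, enabled at -O2) to turn them into
jumps:

    struct vm_state { long acc; };                 /* assumed runtime state */

    /* Trampoline style: each step returns the next step as data. */
    struct step;
    typedef struct step (*step_fn)(struct vm_state *);
    struct step { step_fn next; };                 /* next == NULL means "done" */

    static struct step countdown_tramp(struct vm_state *st)
    {
        struct step s = { 0 };
        if (st->acc == 0)
            return s;                              /* finished */
        st->acc--;
        s.next = countdown_tramp;                  /* "tail call" handed back as data */
        return s;
    }

    static void run(step_fn f, struct vm_state *st)
    {
        struct step s = { f };
        while (s.next)                             /* one real call+return per step */
            s = s.next(st);
    }

    /* Sibcall style: one signature for all generated functions,
       direct calls in tail position. */
    static void countdown_done(struct vm_state *st) { (void)st; }

    static void countdown(struct vm_state *st)
    {
        if (st->acc == 0) {
            countdown_done(st);                    /* tail call -> compiled as a jump */
            return;
        }
        st->acc--;
        countdown(st);                             /* self tail call -> a jump, too */
    }

    /* usage: run(countdown_tramp, &st)  vs.  countdown(&st) */

The trampoline pays a full call and return through the driver for every
step, which is presumably where the ~10x figure comes from; the sibcall
version never grows the stack and never bounces back to a driver, but it
only works when the compiler actually performs the optimization.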

Well, the original question was whether the CUDA or OpenCL compiler
could be used on that generated C code.

Possible or impossible? :)

One trivial, low-performance solution that comes to mind: turn an
OCaml bytecode interpreter into a CUDA kernel and pass the bytecode to
it. Voila, at least we get some 512-way parallelism on the GT300. How
does that sound? We'd be losing some per-thread performance, but the
massive parallelism would make up for some of that.
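
To make that concrete, a toy sketch (this is not ocamlrun; the
three-opcode ISA and all the names are invented) might put a tiny
interpreter loop inside one kernel, with the bytecode shared by all
threads and each thread acting as its own little VM on its own input
element:

    enum { OP_PUSH_ARG, OP_ADD_CONST, OP_HALT };   /* made-up mini ISA */

    __global__ void interp_kernel(const int *code, int code_len,
                                  const int *input, int *output, int n)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        if (tid >= n)
            return;

        int acc = 0;                               /* one VM instance per thread */
        for (int pc = 0; pc < code_len; ) {
            switch (code[pc]) {
            case OP_PUSH_ARG:  acc  = input[tid];   pc += 1; break;
            case OP_ADD_CONST: acc += code[pc + 1]; pc += 2; break;
            case OP_HALT:      output[tid] = acc;   return;
            default:           return;             /* bad opcode: give up */
            }
        }
        output[tid] = acc;
    }

    /* host side (sketch): cudaMemcpy the bytecode and the inputs to the
       device, then launch something like
       interp_kernel<<<(n + 255) / 256, 256>>>(d_code, code_len, d_in, d_out, n); */

Branch divergence on the opcode switch and the interpretation overhead
would hurt every thread, but that is exactly the trade-off above: lose
per-thread speed, win some of it back through parallelism. Whether the
real ocamlrun loop, with its heap, exceptions and GC, can be squeezed
into a kernel is of course the hard part.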

Best,

-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct