Re: [Caml-list] How to write a CUDA kernel in ocaml?
| Date: | -- (:) |
| From: | Eray Ozkural <examachine@g...> |
| Subject: | Re: [Caml-list] How to write a CUDA kernel in ocaml? |
On Wed, Dec 16, 2009 at 3:41 PM, Mattias Engdegård <mattias@virtutech.se> wrote:
>>And trampolines to eliminate tail calls that cannot be eliminated using goto.
>>However, trampolines are ~10x slower than TCO in the code gen.
>
> With some care, gcc's sibcall mechanism can be exploited. For example,
> by having one standard signature for all generated C functions, and
> taking care not to pass pointers to variables in the caller's stack
> frame. This should give fairly good performance (better than
> trampolines anyway), at the cost of portability (but gcc is good at
> that). It would give full TCO, even across compilation units. It
> should work well with a Cheney-on-the-MTA-style GC, too.
>
> How suitable it is depends on the reason why compilation to C is done in
> the first place. It might be one of:
>
> 1) portability to odd platforms with semi-decent performance (ie,
>    better than interpreted bytecode)
> 2) a simple target for maintaining bootstrapping capability for the
>    compiler (but bytecode works well for this too)
> 3) simpler (?) interfacing to libraries in C etc
> 4) flat-out maximum performance by exploiting the optimisations that
>    modern C compilers are capable of
>
> Of course, these days we have llvm which has a lot going for it.

Well, the original question was to be able to use the CUDA or OpenCL
compiler on that generated C code. Possible or impossible? :)

One trivial and low-performance solution that comes to mind is:
make an ocaml bytecode interpreter into a CUDA kernel and then pass
the bytecode to it, and then voila, at least we have some 512-way
parallelism on the GT300.

How does that sound? We'd be losing some performance but massive
parallelism will cover up for some of that.

Best,

--
Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil
http://myspace.com/malfunct
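
To make the interpreter-as-kernel idea above concrete, here is a minimal sketch in plain C. Everything in it is an assumption for illustration: the five-opcode stack machine is a hypothetical toy, not the real OCaml bytecode, and the names run_lane and run_all are made up. The sequential loop in run_all only stands in for a GPU launch; in an actual CUDA port the body of run_lane would presumably become the kernel, with the lane number derived from the block and thread indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction set, NOT the real OCaml bytecode. */
enum { OP_PUSH, OP_LANE, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 16

/* Interpret the program once for a single lane.
   Every lane runs the same bytecode; only the value pushed by OP_LANE
   differs, which is what would make the work data-parallel on a GPU. */
static int32_t run_lane(const int32_t *code, size_t len, int32_t lane)
{
    int32_t stack[STACK_MAX];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next instruction word */

    while (pc < len) {
        switch (code[pc++]) {
        case OP_PUSH:                 /* push the immediate operand */
            assert(pc < len && sp < STACK_MAX);
            stack[sp++] = code[pc++];
            break;
        case OP_LANE:                 /* push this lane's index */
            assert(sp < STACK_MAX);
            stack[sp++] = lane;
            break;
        case OP_ADD:                  /* pop two values, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:                  /* pop two values, push their product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:                 /* result is the top of the stack */
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(0 && "unknown opcode");
            return 0;
        }
    }
    /* Program ended without OP_HALT: still return the top of the stack. */
    assert(sp >= 1);
    return stack[sp - 1];
}

/* Sequential stand-in for launching the interpreter over n lanes. */
static void run_all(const int32_t *code, size_t len, int32_t *out, int32_t n)
{
    for (int32_t lane = 0; lane < n; lane++)
        out[lane] = run_lane(code, len, lane);
}
```

For example, the program {OP_LANE, OP_LANE, OP_MUL, OP_PUSH, 1, OP_ADD, OP_HALT} computes lane*lane + 1, so lane 3 yields 10. The sketch leaves out what would make the real thing hard: the OCaml heap and GC, closures, and the cost of divergent branches when lanes take different paths through the dispatch switch.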