Version française
Home     About     Download     Resources     Contact us    
Browse thread
MetaOcaml and high-performance [was: AST versus Ocaml]
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: oleg@o...
Subject: MetaOcaml and high-performance [was: AST versus Ocaml]

Jon Harrop wrote:
> I did some experiments with MetaOCaml. Firstly, it is x86 only and not x64
> which means poor floating-point performance, many times slower than HLVM's.
> The language itself is also very restrictive, e.g. you cannot generate
> pattern matches dynamically so you cannot leverage the pattern match
> compiler, which is a huge loss. In essence, effective use of MetaOCaml
> requires writing in continuation passing style which OCaml and, therefore,
> MetaOCaml do not handle well (closure invocations are slow in OCaml and CPS
> is not optimized back into anything sane). So I do not consider MetaOCaml to
> be a viable option for performance users.

A few clarifications seems to be in order. First of all, the original
poster asked about _offshoring_ with MetaOCaml. When the generated
code is run with offshoring, a C of Fortran file is created, which can
be compiled with your favorite compiler and dynamically linked back
into the running OCaml program. Alternatively, you can use the
generated C/Fortran code as it is, as a part of a C/Fortran project. We
did exactly the latter in our FFT project: we used MetaOCaml to create C
files for FFT kernels, and plugged the files into the FFTW benchmarking
framework, which is pure C. It worked as expected.

Because offshoring produces a portable C or Fortran code file, you can
use the code on 32 or 64-bit platform. The reason the native MetaOCaml
without offshoring does not work on amd64 is because at that time
OCaml didn't emit PIC code for amd64. So, dynamic linking was
impossible. That problem has long been fixed in later versions of
OCaml. Offshoring is a good way to get around it, thanks Jan Kybic.

Offshoring could just as well produce Verilog or LLVM code. Alas,
we didn't get around to exploring that idea.

Regarding continuation-passing style (CPS): if your code generatOR
needs let-insertion, then the generatOR _may_ need to be encoded in
CPS. The generatED code is _not_ in CPS; it is in the `conventional',
so-called `direct style'. Even if we assume that CPS code is
difficult to compile efficiently (which is not certain: in CPS, all
`important' function calls are tail-calls, and OCaml is very good with
tail-calls), that difficulty affects only the code generator rather
than the generated code. Most of the time it is the generated code
that is performance-critical.

One may say that writing the generator in CPS is a bother. I have
heard such objections. Please see our PEPM2009 paper about a way to
address it
	http://okmij.org/ftp/Computation/Generative.html#circle-shift
and write generators in the conventional style. Please see the example
of writing a flexible Gaussian Elimination code, paying no penalty for
abstractions. 

Fortunately, some people have considered MetaOCaml to be a viable
option for performance users and have reported good results. For
example,

	Tuning MetaOCaml Programs for High Performance
	Diploma Thesis of Tobias Langhammer.
	http://www.infosun.fmi.uni-passau.de/cl/arbeiten/Langhammer.pdf

Here is a good quotation from the Introduction:

``This thesis proposes MetaOCaml for enriching the domain of high-performance
computing by multi-staged programming. MetaOCaml extends the OCaml
language.
...
    Benchmarks for all presented implementations confirm that the
execution time can be reduced significantly by high-level
optimizations. Some MetaOCaml programs even run as fast as respective
C implementations. Furthermore, in situations where optimizations in
pure MetaOCaml are limited, computation hotspots can be explicitly or
implicitly exported to C. This combination of high-level and low-level
techniques allows optimizations which cannot be obtained in pure C
without enormous effort.''