Version française
Home     About     Download     Resources     Contact us    
Browse thread
OCamlJIT2 vs. OCamlJIT
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Jon Harrop <jonathandeanharrop@g...>
Subject: RE: ocamlopt LLVM support (Was: [Caml-list] OCamlJIT2 vs. OCamlJIT)
Benedikt wrote:
> On Dec 3, 2010, at 21:07 , Jon Harrop wrote:
> > Benedikt wrote:
> >> So, let's take that LLVM thing to a somewhat more serious level
> (which
> >> means no more arguing with toy projects that simply ignore the
> >> difficult stuff).
> >
> > You can learn a lot from prototypes.
> 
> Indeed, but you need stay on topic. HLVM does not prototype the use of
> LLVM within OCaml, because you're using a quite different data
> representation.

My point was that HLVM's data representation is far more flexible than
ocamlopt's, so you can use it to measure the effect of changing data
representations on the efficiency of the code LLVM generates more easily
than with ocamlopt.

In particular, I am saying that (from my own measurements) LLVM does not
cope with data representations like ocamlopt's at all well. Specifically,
when you box and cast away type information. My guess is that retargeting
ocamlopt to LLVM IR (which is a considerable undertaking due to the amount
of hacking required on LLVM itself) would result in more performance
degradation that improvement. Moreover, that degradation is most likely to
occur on code like Coq because ocamlopt has been so heavily optimized for
that one application domain and, consequently, I doubt your hard work would
ever be accepted into mainstream OCaml.

Ultimately, LLVM was built specifically to exploit a typed intermediate
representation whereas ocamlopt removes type information very early.

> Using a different data representation within OCaml would indeed be
> possible, but one has to ask what it's worth? You'd get better floating
> point performance (most likely)

And faster tuples, ints, chars, complex numbers, low-dimensional
vectors/matrices, hash tables and so on. More types (e.g. int16 and
float32). Even arbitrary-precision arithmetic can benefit by keeping small
numbers unboxed when possible. Bigarrays disappear. The FFI gets simpler,
easier to use and more powerful (e.g. you can generate AoS layouts for
OpenGL). The benefits are enormous in the general case but that is beside
the point here. Moreover, this would never be accepted either because it
would degrade the performance of applications like Coq. If it is done at
all, such work must be kept separate from the OCaml compilers.

> and possibly also gain from a few LLVM optimization passes.

I found that LLVM's optimization passes are of little value in this context.
You want to do the optimization passes on the OCaml side (where it is vastly
easier, if nothing else) and focus on emitting high quality LLVM IR for
LLVM's code gen to work with.

> But on the other hand:
> 
> 1. You will have to rewrite not only the compiler, the standard library
> and the bytecode interpreter (already a massive amount of work), but
> you also loose compatibility with almost every existing OCaml library
> binding, since the native function interface will be different. That's
> a hell of a lot of work for many people.

Yes. That would be revolution and not evolution. I cannot see any easy way
around that and also advise against it.

> 2. The current data representation is already close to optimal for
> symbolic computation, so there's is not much to gain here. The most
> complex applications use OCaml for symbolic computation (i.e. Coq), so
> they will not benefit (if at all, see 3.).

Yes.

> 3. The current garbage collector is mostly straight-forward because of
> the data representation. No need to carry around/load type information,

Note that conveying run-time type information also facilitates useful
features like generic printing.

> you just need the following bits of information: Is it a block in the
> heap? Does it contain pointers? If so how many? All this information is
> immediately available with the current data representation (also
> improves cache locality of the GC loops).

That information is also immediately available with HLVM's design. Each
value contains a pointer to its run-time type. Each run-time type contains a
pointer to a "visit" function for the GC that returns the references in any
value of that type.

> Even if the current GC is
> replaced with something more recent (which I'm sure is worth it), the
> new GC will also be easy to implement because of the data
> representation.

I'm not sure that would be any easier than HLVM's design.

> 4. Marshalling/unmarshalling will also be touched (and probably become
> more complex), loosing even more compatibility (less of an issue than
> the FFI, but nevertheless something to take care of).

Yes. I'd be amazed if anyone cared about that though.

> So, it is really worth to spend years on a new data representation
> (including fixing all C bindings, etc.)? Just to get better floating
> point performance? Integer performance will be almost the same,
> avoiding the shift/tag bit just replaces an "addq r1, r2; subq $1, r2"
> with "addq r1, r2"; even doing this thousands of times will not cause a
> noticeable difference.

On the contrary, there are many example of huge performance gains other than
floating point. The int-based random number generator from the SciMark2
benchmark is 6x faster with LLVM than with OCaml. Generic hash tables are
17x faster in F# than Java because of value types. And so on.

> I already mentioned a few simple ways to improve floating point
> performance w/o the need to throw away most of the code base. I.e. just
> compile (tail-)recursive floating point functions in a way that avoids
> the boxing for the (tail-)calls. If this seems too complex, you could
> use an even simpler trick: Just compile every floating point function
> twice, one "generic version", which accepts "value"s (for use in
> closures, on parameter position, etc.), and one "float version" which
> takes non-boxed doubles. Calling the appropriate version is trivial,
> just pass the type information down to the Cmm level. That way
> everything will continue to work, no need to mess up everything. And
> you can also apply your LLVM optimizations.

Yes.

> >> 3. A better LLVM. If we do it right, other's implementing functional
> >> languages using LLVM will not be faced with the same problems again.
> >> Dunno if that's possible.
> >
> > What exactly are you thinking of that goes beyond what the other
> functional
> > languages that already target LLVM have already had to implement?
> 
> Two main areas for now: The GC interface and the exception handling.
> LLVM's exception support is really limited; the GC support is better
> and more generic. I don't know how to implement the OCaml exception
> model within LLVM w/o adding a lot of special case stuff to LLVM itself
> (mentioned in my original post); if there would be a way to easily
> extend LLVM with special exception models, other projects could plug in
> their models.
> 
> For the GC interface. From what I know right now, the "ocaml" GC plugin
> generates the frametables and the special symbols. But several details
> are left to LLVM, which don't seem to be handled properly: For example,
> there does not seem to be a way to tell LLVM how to spill/reload; in
> case of OCaml the stack cells marked as gcroots must not contain
> "untagged integers". I'm still trying to figure out what are the exact
> lowering mechanisms involved, so it may already be possible - we shall
> see.
> 
> The other thing I noticed so far: LLVM does not seem to eliminate
> stores to gcroots (probably required for Java, dunno). In case of
> ocamlopt, this would generate a lot of unnecessary stores. ocamlopt
> avoids this because it has knowledge not available to LLVM, i.e. when
> calling a native C floating point primitive (sin, cos, ...), ocamlopt
> knows that i.e. %r12 is preserved and the primitive does not allocate
> or raise so no need to spill the value to the stack cell, LLVM does not
> know this (and there's no way to tell, yet) so it has to spill and
> reload all (possible) gcroots.
> 
> You can think of various similar examples, where ocamlopt generates
> better code, simply because it "knows" OCaml (and even tho it lacks
> complex data/control analysis), whereas LLVM is generic (not
> necessarily a bad thing) but lacks ways to "learn" OCaml (or other
> language specific stuff). Adding more ways to tell LLVM about certain
> constraints will also be beneficial to other projects (not only
> functional programming languages...).

Yes. I suspect the effort required to bring such features up-to-par with
ocamlopt will be pretty huge as well though.

Cheers,
Jon.