Google summer of Code proposal
Date: 2009-03-23 (19:32)
From: Jon Harrop <jon@f...>
Subject: Re: [Caml-list] Google summer of Code proposal
On Monday 23 March 2009 14:19:00 Xavier Leroy wrote:
> 3- A language implementation like OCaml breaks down in four big parts:
>         1- Front-end compiler
>         2- Back-end compiler and code emitter
>         3- Run-time system
>         4- OS interface
> Of these four, the back-end is not the biggest part nor the most
> complicated part.  LLVM gives you part 2, but you still need to
> consider the other 3 parts.  Saying "I'll do 1, 3 and 4 from scratch",
> Harrop-style, means a 5-year project.

On the contrary, my "style" was to provide the features that I value 
(multicore & FFI) in a usable form (stop-the-world) with the shortest 
possible development time (i.e. <<6 months to create something useful). 

1- Front-end compiler: use camlp4 to provide an embedded DSL for 
high-performance parallel numerics and/or reuse front-ends from existing 
compilers like OCaml, PolyML, MosML, NekoML to build a completely new language.

2- Back-end compiler and code emitter: reuse LLVM.

3- Run-time system: write the simplest possible precise GC and use 
stop-the-world to apply it to threads that may then run in parallel.

4- OS interface: make it as easy as possible to call C directly.

HLVM had solved (2), (3) and (4) after only 3 months of part-time work, which 
I think vindicates my style!
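
To make point (3) concrete, here is a minimal sketch of what a "simplest 
possible" precise mark-and-sweep collector can look like. It is a toy model 
written for illustration, not HLVM's actual code: the `cell` type, the 
integer-indexed heap and the function names are all assumptions. "Precise" 
here means each cell records exactly which other cells it references, and 
"stop-the-world" means `collect` is assumed to run only while every mutator 
thread is paused.

```ocaml
(* Toy heap: cells are addressed by array index. Each cell lists exactly
   the cells it points to, so tracing never has to guess what is a pointer. *)
type cell = {
  mutable fields : int list;  (* indices of the cells this one references *)
  mutable marked : bool;      (* set during the mark phase *)
  mutable live : bool;        (* false once the cell has been swept *)
}

let make_heap n =
  Array.init n (fun _ -> { fields = []; marked = false; live = true })

(* Mark phase: visit everything reachable from the roots, using an
   explicit worklist. Already-marked cells are skipped, so cycles terminate. *)
let mark heap roots =
  let rec go = function
    | [] -> ()
    | i :: rest ->
      let c = heap.(i) in
      if c.marked then go rest
      else begin
        c.marked <- true;
        go (c.fields @ rest)
      end
  in
  go roots

(* Sweep phase: free every live cell that was not marked, and clear the
   marks on the survivors. Returns the number of cells freed. *)
let sweep heap =
  Array.fold_left
    (fun freed c ->
       if c.marked then (c.marked <- false; freed)
       else if c.live then (c.live <- false; c.fields <- []; freed + 1)
       else freed)
    0 heap

(* A full collection. In a stop-the-world design the caller must first
   pause all mutator threads; that part is not modelled here. *)
let collect heap roots =
  mark heap roots;
  sweep heap
```

With cells 0 -> 1 -> 2 reachable from root 0 and cells 3 -> 4 unreachable, 
`collect heap [0]` frees cells 3 and 4 and leaves the other three live.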

> 7- To finish, I'll preventively deflect some likely reactions by Jon
> Harrop:
> "But you'll be tied to OCaml's data representation strategy!"
>    Right, but 1- implementing your own data representation strategy is
>    a lot of work, especially if it is type-based, and

Actually I found that easy, not least because I wanted a user-friendly FFI so 
I just used C's data representation whenever possible.

>    2- OCaml's strategy is close to optimal for symbolic computing.

Is MLton not several times faster than OCaml for symbolic computing?

> "But LLVM assembly is typed, so you must have types!"
>    Just use int32 or int64 as your universal type and cast to
>    appropriate pointer types in loads or stores.

That is entirely possible and could be useful as an incremental improvement to 
OCaml's existing bytecode interpreter but it is not a step toward my goals.

> "But your code will be tainted by OCaml's evil license!"
>    It is trivial to make ocamlopt dump Cmm code in a file or pipe.
>    (The -dcmm debug option already does this.)  Then, you can have a
>    separate, untainted program that reads the Cmm code and transforms it.

Again, that is another technically feasible step away from my goals because 
OCaml's Cmm has already been mangled for its data representation (e.g. 31-bit 
ints, boxed floats).
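
The representation in question can be observed from within OCaml itself. The 
sketch below uses the unsafe, implementation-specific `Obj` module purely to 
inspect values; the helper names `is_immediate` and `is_boxed_float` are 
illustrative, and the results apply to the stock native and bytecode 
compilers.

```ocaml
(* One bit of every machine word is reserved as a tag, so ints are
   31 bits wide on 32-bit targets and 63 bits wide on 64-bit targets. *)
let tag_bits_lost = Sys.word_size - Sys.int_size

(* True if the value is stored directly in the word (a tagged int). *)
let is_immediate v = Obj.is_int (Obj.repr v)

(* True if the float is a pointer to a heap block tagged as a double. *)
let is_boxed_float (x : float) =
  let r = Obj.repr x in
  Obj.is_block r && Obj.tag r = Obj.double_tag

let () =
  Printf.printf "int width: %d of %d bits\n" Sys.int_size Sys.word_size;
  Printf.printf "42 is immediate: %b\n" (is_immediate 42);
  Printf.printf "3.14 is boxed: %b\n" (is_boxed_float 3.14)
```

By the time code reaches Cmm, the tagging arithmetic and the allocations for 
boxed floats are already explicit, which is why a Cmm-to-LLVM translator 
inherits both.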

> "But shadow stacks are the only way to go for GC interface!"
>    No, it's probably the worst approach performance-wise; even a
>    conservative GC should work better.

Building a state-of-the-art optimized concurrent GC Leroy-style means an 
infinity-year project. =:-p

Seriously though, I think it is essential to get a first working version of a 
GC that permits parallel threads. HLVM will be useful to a lot of people long 
before its GC gets optimized.
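
For readers unfamiliar with the shadow-stack approach mentioned above, the 
idea can be sketched as follows: the program maintains its own side stack of 
live references, pushing on entry to a scope and popping on exit, and the 
collector reads that stack as its root set instead of parsing the machine 
stack. The names `shadow_stack`, `with_root` and `current_roots` are 
hypothetical and the roots are plain integers standing in for heap 
references; this is not HLVM's code. The extra push and pop on every scope is 
the run-time cost that the performance objection refers to.

```ocaml
(* Side stack of roots, maintained by the program rather than discovered
   by scanning the machine stack. *)
let shadow_stack : int Stack.t = Stack.create ()

(* Register [root] for the duration of [body], and unregister it afterwards
   even if [body] raises an exception. *)
let with_root root body =
  Stack.push root shadow_stack;
  Fun.protect ~finally:(fun () -> ignore (Stack.pop shadow_stack)) body

(* The root set a collector would see right now, innermost scope first. *)
let current_roots () =
  List.rev (Stack.fold (fun acc r -> r :: acc) [] shadow_stack)
```

For example, `with_root 1 (fun () -> with_root 2 current_roots)` evaluates to 
`[2; 1]`, and the shadow stack is empty again once both scopes have exited.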

Dr Jon Harrop, Flying Frog Consultancy Ltd.