Comparison of OCaml and MLton for numerics
From: Jon Harrop <jon@f...>
Subject: Re: [Caml-list] Comparison of OCaml and MLton for numerics
On Friday 01 June 2007 17:14:36 Markus Mottl wrote:
> Absolutely!  E.g. we had to specialize hash tables for integer and
> string keys, because the generic implementation calls a function for
> each key comparison rather than generating specialized code for e.g.
> integer comparisons.  This has a noticeable impact in production
> systems.

Sets are another example. In that case, the function call for each trivial 
comparison is a side effect of using functors.
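
A minimal sketch of the contrast, assuming a compiler that does not inline 
across functor application. The names int_tree, add_int and mem_int are 
hypothetical, and the hand-written tree is left unbalanced to keep the example 
short; only the comparison cost is being illustrated.

```ocaml
(* Generic route: Set.Make receives `compare` as an ordinary function
   value, so each key comparison inside the set is a function call. *)
module IntSet = Set.Make (struct
  type t = int
  let compare (a : int) (b : int) = compare a b
end)

(* Hand-specialized route (hypothetical, unbalanced for brevity): the
   int annotations let the compiler emit direct integer comparisons. *)
type int_tree = Empty | Node of int_tree * int * int_tree

let rec add_int (x : int) = function
  | Empty -> Node (Empty, x, Empty)
  | Node (l, v, r) as t ->
      if x = v then t
      else if x < v then Node (add_int x l, v, r)
      else Node (l, v, add_int x r)

let rec mem_int (x : int) = function
  | Empty -> false
  | Node (l, v, r) ->
      if x = v then true
      else if x < v then mem_int x l
      else mem_int x r
```

Both versions answer the same membership queries; the difference is only in 
how the comparison is reached.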

> ...
> I'd surely be happy to see the addition of some (optional)
> higher-level code transformations to OCaml.  Not just inlining, maybe
> some partial evaluation of the resulting code, which could also reduce
> code size if the compiler can prove that certain branches will not be
> taken.

The stdlib has a lot of scope for optimization. The Set implementation 
currently does not specialize 1-element nodes. Doing this improves 
performance by ~30%, partly by relieving the GC and partly by avoiding 
branches in common cases. I believe some compilers automate this optimization 
(trimming the bottom layer from trees).
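
As an illustration of what specializing 1-element nodes might look like (a 
sketch only, not the stdlib's code; the constructor Leaf and the helper node 
are assumptions made for this example):

```ocaml
type 'a tree =
  | Empty
  | Leaf of 'a                              (* specialized 1-element node *)
  | Node of 'a tree * 'a * 'a tree * int    (* left, value, right, height *)

let height = function
  | Empty -> 0
  | Leaf _ -> 1
  | Node (_, _, _, h) -> h

(* Smart constructor: a node with two empty children becomes a Leaf,
   which stores one field instead of four. *)
let node l v r =
  match l, r with
  | Empty, Empty -> Leaf v
  | _ -> Node (l, v, r, 1 + max (height l) (height r))

(* Lookup handles Leaf without inspecting any children. *)
let rec mem cmp x = function
  | Empty -> false
  | Leaf v -> cmp x v = 0
  | Node (l, v, r, _) ->
      let c = cmp x v in
      c = 0 || mem cmp x (if c < 0 then l else r)
```

In a balanced tree roughly half the elements sit in the bottom layer, which is 
why a smaller leaf representation can reduce allocation noticeably.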

Non tail-recursive functions in the stdlib are another example. You can easily 
get quadratic instead of linear behaviour without realising. Hash tables can 
have big hidden performance costs, especially for soft real-time work.
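
One way the quadratic case can arise (an assumed example, not necessarily the 
one meant above) is appending to the end of a list inside a loop: each `@` 
copies its whole left argument and is not tail recursive. The function names 
below are hypothetical.

```ocaml
(* Quadratic: every step copies the accumulator built so far. *)
let range_quadratic n =
  let rec go acc i =
    if i >= n then acc else go (acc @ [i]) (i + 1)
  in
  go [] 0

(* Linear: cons onto the front, then reverse once at the end. *)
let range_linear n =
  let rec go acc i =
    if i >= n then List.rev acc else go (i :: acc) (i + 1)
  in
  go [] 0
```

Both return the same list; only the second stays linear as n grows.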

Actually, while we're here: I've long thought that the stdlib should provide 
abstract implementations of concrete data structures like RB trees and AVL 
trees, and functorize the Set and Map modules over the tree type they use. 
This would let people add new abstract data structures (I like purely 
functional sequences based on AVL trees) built upon solid concrete data 
structures from the stdlib rather than cutting and pasting code (one of the 
more embarrassing OCaml FAQs).
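
A rough sketch of what functorizing a set over its tree type could look like. 
The signatures TREE and ORDERED, the functor MakeSet and the module Bst are 
all hypothetical names; Bst is an unbalanced stand-in where a real AVL or RB 
implementation would go.

```ocaml
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

(* Hypothetical interface a concrete tree would expose. *)
module type TREE = sig
  type 'a t
  val empty : 'a t
  val insert : ('a -> 'a -> int) -> 'a -> 'a t -> 'a t
  val mem : ('a -> 'a -> int) -> 'a -> 'a t -> bool
  val fold : ('a -> 'b -> 'b) -> 'a t -> 'b -> 'b
end

(* Stand-in concrete tree: an unbalanced binary search tree. *)
module Bst : TREE = struct
  type 'a t = E | N of 'a t * 'a * 'a t
  let empty = E
  let rec insert cmp x = function
    | E -> N (E, x, E)
    | N (l, v, r) as t ->
        let c = cmp x v in
        if c = 0 then t
        else if c < 0 then N (insert cmp x l, v, r)
        else N (l, v, insert cmp x r)
  let rec mem cmp x = function
    | E -> false
    | N (l, v, r) ->
        let c = cmp x v in
        c = 0 || mem cmp x (if c < 0 then l else r)
  (* In-order fold: left subtree, then the value, then the right subtree. *)
  let rec fold f t acc =
    match t with
    | E -> acc
    | N (l, v, r) -> fold f r (f v (fold f l acc))
end

(* The set is written once against TREE and reused for any tree. *)
module MakeSet (T : TREE) (O : ORDERED) = struct
  type elt = O.t
  type t = O.t T.t
  let empty : t = T.empty
  let add x (s : t) : t = T.insert O.compare x s
  let mem x (s : t) = T.mem O.compare x s
  let elements (s : t) = List.rev (T.fold (fun x acc -> x :: acc) s [])
end

module IntSet = MakeSet (Bst) (struct type t = int let compare = compare end)
```

Swapping Bst for another module matching TREE would change the balancing 
strategy without touching MakeSet.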

Making this feasible by optimizing away the abstractions requires more than 
just defunctorizing though. You need to partially specialize by type, as 
Markus says. You also need to do whole-program transformations to flatten 
data structures. For example, a set would require:

  Node of 'a t * 'a * 'a t * height

and a sequence would require:

  Node of 'a t * 'a * 'a t * height * size

So a generic OCaml solution would add an indirection to the metadata that 
could be flattened out.
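
To make the indirection concrete, here is a sketch under the assumption that 
the generic tree takes its metadata as a type parameter. All type and function 
names are hypothetical.

```ocaml
(* Generic node: per-node metadata is a type parameter. *)
type ('a, 'm) gtree =
  | GEmpty
  | GNode of ('a, 'm) gtree * 'a * ('a, 'm) gtree * 'm

(* For a set the metadata is just the height, an unboxed int. *)
type 'a set_tree = ('a, int) gtree

(* For a sequence the metadata is a record, which is a separately
   allocated block: reaching `size` costs one extra pointer hop. *)
type meta = { height : int; size : int }
type 'a seq_tree = ('a, meta) gtree

let size = function
  | GEmpty -> 0
  | GNode (_, _, _, m) -> m.size

(* The flattened form a specializing compiler would ideally produce. *)
type 'a flat_seq =
  | FEmpty
  | FNode of 'a flat_seq * 'a * 'a flat_seq * int * int

let flat_size = function
  | FEmpty -> 0
  | FNode (_, _, _, _, s) -> s

(* Manual flattening, doing by hand what the transformation would do. *)
let rec flatten (t : 'a seq_tree) : 'a flat_seq =
  match t with
  | GEmpty -> FEmpty
  | GNode (l, v, r, m) -> FNode (flatten l, v, flatten r, m.height, m.size)
```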

Not being hardcore enough to tinker with the OCaml compiler itself, I'd write 
an OCaml program that generated OCaml implementations of data structures with 
the necessary specialization. Indeed, I've already done this for 
low-dimensional vectors and matrices, but doing it for trees would be more 

Just out of curiosity, how many of the optimizations being discussed can be 
done with camlp4?

Anyway, we should try to build a coherent list of optimizations we'd like and 
then try to prioritize them.

Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists