Version française
Home     About     Download     Resources     Contact us    

This site is updated infrequently. For up-to-date information, please visit the new OCaml website at

Browse thread
Comparison of OCaml and MLton for numerics
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2007-06-04 (14:04)
From: Mike Furr <furr@c...>
Subject: Re: [Caml-list] Comparison of OCaml and MLton for numerics
Jon Harrop wrote:
> Actually, while we're here. I've long thought that the stdlib should provide 
> abstract implementations of concrete data structures like RB trees and AVL 
> trees, and functorize the Set and Map modules over the tree type they use. 
> This would let people add new abstract data structures (I like purely 
> functional sequences based on AVL trees) built upon solid concrete data 
> structures from the stdlib rather than cutting and pasting code (one of the 
> more embarassing OCaml FAQs).

My OCaml summer project to build an improved data structure library will 
(hopefully) address many of the issues you have raised.  Although I'm 
just getting started, I have already started using your suggestion of 
including a 1-element constructor for my trees as it does indeed seem to 
give a noticeable speedup.

> Making this feasible by optimizing away the abstractions requires more than 
> just defunctorizing though. You need to partially specialize by type, as 
> Markus says. You also need to do whole-program transformations to flatten 
> data structures. For example, a set would require:
>   Node of 'a t * 'a * 'a t * height
> and a sequence would require:
>   Node of 'a t * 'a * 'a t * height * size
> So a generic OCaml solution would add an indirection to the metadata that 
> could be flattened out.

Indeed this would be preferable.  Right now, my tree implementations 
include a generic tree mixin with the type constructor:

    Node of ('data,'inv) t * 'data * ('data,'inv) t * 'inv

forcing all invariant information into the last cell.  Implementations 
that require more than a single value here must use an extra level of 
indirection (so in your example, 'inv = height * size).  I don't think 
this will be too bad for performance since these values only need to be 
retrieved for re-balancing leaving read-only operations unaffected. 
However, this has the huge benefit of allowing me to only write *one* 
version of find, min/max, fold, iter, etc. for all of the trees I 
implement, which in and of itself is a big win. :)