English version
Accueil     À propos     Téléchargement     Ressources     Contactez-nous    

Ce site est rarement mis à jour. Pour les informations les plus récentes, rendez-vous sur le nouveau site OCaml à l'adresse ocaml.org.

Browse thread
The OCaml Summer Project
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2007-01-26 (10:52)
From: Jon Harrop <jon@f...>
Subject: Re: [Caml-list] The OCaml Summer Project
On Thursday 25 January 2007 16:52, Yaron M. Minsky wrote:
> I am pleased to announce the OCaml Summer Project.

Sounds like an excellent idea and the projects all look fascinating. However, 
I do have some comments on the "Binary tree library" project:

OCaml currently has two separate implementations of AVL trees in Map and Set 
functors. Set already has fast union and split operations.

Having two separate implementations is wasteful but more efficient. The 
underlying tree code could be factored out into another functor but this is 
costly in terms of performance. Also, the OCaml stdlib has used an odd choice 
of optimisations: inlining height calculation (which is quite a small benefit 
in the context of functors and polymorphism) but not amortising single-child 
trees into a separate constructor (which can remove up to 50% of the GC's 
effort). So the code can be made shorter and faster.

I've already implemented my own AVL set using the node-specialisation trick. 
Performance is ~30% faster, IIRC. I've also wanted to write a functional 
array based on AVL trees (O(log n) lookup but fast sub, append, insert, 
delete etc.) and a camlp4 extension to support pattern matching over this 
type. Lists and arrays are rather priviledged containers in OCaml, having 
pattern matching and literals, but trees are better in many respects and 
would make an excellent general-purpose container.

Finally, having to use functors does obfuscate OCaml code that deal with Sets 
and Maps in many cases, particularly because there are no built-in Int and 
Float modules so you must write your own. I often find that this superfluous 
code is as long as all of the code using the Sets/Maps. Although it would 
be "dangerous", Sets and Maps implemented without functors are much easier to 
use. After all, Hashtbl is typically used in that way.

It is also worth noting that several people (Diego, Jean-Christophe) have 
written other tree libraries using various data structures (RB, trie, splay 
etc.). As far as I can tell, AVL trees are a good all-rounder.

Best of luck with the projects!

Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists