Version franēaise
Home     About     Download     Resources     Contact us    
Browse thread
Help with simple ocaml memoization problem
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Jon Harrop <jon@f...>
Subject: Re: [Caml-list] Help with simple ocaml memoization problem
On Thursday 29 November 2007 08:58, Jean-Christophe Filliātre wrote:
> Jon Harrop wrote:
> > The Map implementation in the OCaml stdlib is also quite inefficient. I
> > did a little benchmark once and discovered that Maps actually waste more
> > space than Hashtbls.
>
> I find it unfair to compare an imperative and a persistence data
> structure for performances.

I agree.

> Of course you are going to use some extra 
> space if you need to keep old versions of the data stuctures valid.
> But you are sharing *a lot* among the various versions. So if you are
> manipulating several sets/maps with common ancestors at the same time,
> you are saving memory w.r.t. other data structures such as hash tables.

True, my benchmark was a drop-in replacement with no sharing.

> Of course, if you are using a single data structure, in a linear way,
> then yes a hash table is probably more efficient (provided you have a
> good hash function, which is not always easy to achieve).
>
> Regarding implementation of ocaml maps, I wouldn't say that it is
> inefficient: I did my own benchmarls (on sets, but this is the same
> code) and found that ocaml AVLs are really efficient, on the contrary.
> It usually beats other implementations (e.g. red-black trees from the
> SML stdlib), or even specialized structures such as Patricia trees (when
> keys are integers) on some operations.

I found that by manually unrolling with a Node1 constructor for single-element 
nodes you can reduce GC load and increase performance by ~30%.

Perhaps "badly optimized" would have been a better phrase. For example, the 
Map implementation in the OCaml stdlib manually inlines the height function 
even thought it makes relatively little difference to performance: GC load is 
the real killer for most immutable data structures.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e