Version française
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] Suggestion about balanced trees in stdlib
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Gerd Stolpmann <info@g...>
Subject: Re: [Caml-list] Suggestion about balanced trees in stdlib

On 2002.05.11 00:31 Stefano Zacchiroli wrote:
> On Fri, May 10, 2002 at 06:59:54PM +0200, Gerd Stolpmann wrote:
> > Of course, Set and Map would recur to that more special module.
> > 
> > Balanced_tree would have the representation of Map, i.e. the elements are
> > pairs of keys and attached values. To emulate Set, the value () is used.
> 
> I don't know how the compiler treat unit values wrt optimization or
> such, but using an unit in each node to emulate Set seems to me
> inefficient for a module of the standard library.
> 
> I think, but I haven't thought a lot about it, that the problem can be
> solved using an implementation polymorphic in the leaf type hidden to
> the use that access Set and Map through functors as usual.

type 'a map = Empty | Node of 'a map * key * 'a * 'a map * int

type set = Empty | Node of set * elt * set * int

The difference is that every map node has 6 words, and every set node
consists of 5 words. My suggestion is to prefer the map representation,
and define set = unit map, wasting one word per node, and making sets
20% larger.

The other possibility would be to prefer the set representation, e.g.

type 'a baltree = Empty | Node of 'a baltree * 'a elt * 'a baltree * int

and to reduce map and set with some functor tricks to this representation.
Now sets are not larger, but maps require 8 words of memory, because
you have to use an additional pair 'a elt = ('a, map_elt), i.e. maps are
33% larger. I think this is the worse choice.

BTW, if memory consumption matters, the current representation can be still
optimized by removing the balance factor, and encoding it in the variant
tag:

type set = Empty | Node_0 of set * elt * set | Node_l of <same> | Node_r of <same>

Of course, the readability of the code would suffer, but one can argue that 
the overall performance is more important than code beauty for the standard 
library. However, the point is that it is a matter of discussion what is
acceptable in the stdlib, because there are a lot of views to this central
library. I would prefer to pay an increased memory consumption of 20% for
sets, and to get the possibility to use balanced trees directly for that
costs. Maybe other people have different views on this.

> Anyway the idea seems really good.
> 
> I also suggest to add different kind of visit (i.e. not only iter) like
> {pre,post,in}visit via different functions or specifing the desired
> behaviour when using the functor or both.

Why not export functions that access the subtrees directly:
left_subtree, right_subtree, top_element.

Gerd
-- 
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 45             
64293 Darmstadt     EMail:   gerd@gerd-stolpmann.de
Germany                     
----------------------------------------------------------------------------
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners