Re: sets
 Xavier Leroy
[
Home
]
[ Index:
by date

by threads
]
[ Message by date: previous  next ] [ Message in thread: previous  next ] [ Thread: previous  next ]
[ Message by date: previous  next ] [ Message in thread: previous  next ] [ Thread: previous  next ]
Date:   (:) 
From:  Xavier Leroy <xavier@T...> 
Subject:  Re: sets 
In reply to Pierre Weis' message: > As Xavier pointed out, a naive implementation based on lists is very > easy to implement. Its drawback is evidently efficiency (O(n^2) for > basic operations union and intersection). But as far as I know, it is > the only one which is absolutely compatible with ML polymorphism: you > get a truly polymorphic empty set and a truly polymorphic function to > add an element to a set. I disagree with this point. The listbased implementation of sets assumes that you're either satisfied with the existing polymorphic equality primitive, or able to define a reasonable polymorphic equality function incorporating userdefined equality functions at some types. In this case, it is easy to extend the polymorphic equality function to a polymorphic comparison function (just do some kind of lexicographic ordering over records and sums). Then, we can have balanced tree implementations of sets, with operations in O(log n) or O(n log n) instead of O(n) and O(n^2). This implementation would be as "truly polymorphic" as the one based on lists. The problem with both lists and trees is: we don't know how to build a good polymorphic equality/ordering function. > If we agree that a library implementing sets must be very efficient or > useless, then the problem becomes very hard. The reason is that there > exists many clever ways to represent sets, each one adapted to a particular > usage of sets and to different underlying type. You set out to do something very ambitious here: define a polymorphic data structure with nonuniform representations. Sets are not the only data structure that could benefit from this technique: for instance, arrays of characters and arrays of floatingpoint numbers could be represented more compactly than generic arrays. But I still need to be convinced we need to go that far. Generic arrays are a little bit wasteful, but this seems tolerable in practice. Similarly, I think generic sets implemented as balanced trees would also be tolerable in many situations. Granted, the union of two binary trees is never going to be as fast as an "or" operations between two integers. Do we really need this efficiency? In SETL, the answer is certainly "yes", because sets are the main data structure. But this is not so in ML.  Xavier