Re: new library modules (sets and maps)

Xavier Leroy (xavier@Theory.Stanford.EDU)
Wed, 5 May 1993 17:37:01 -0700 (PDT)

From: Xavier Leroy <xavier@Theory.Stanford.EDU>
Message-Id: <9305060037.AA03305@Tamuz.Stanford.EDU>
Subject: Re: new library modules (sets and maps)
To: remy@research.att.com (Didier Remy)
Date: Wed, 5 May 1993 17:37:01 -0700 (PDT)
In-Reply-To: <m0nqlad-000CmwC@hunny.research.att.com> from "Didier Remy" at May 5, 93 11:40:00 am

Didier Remy writes:

> Of course, the typechecker cannot tell whether you are using two different
> orderings on the same map.

Yes, it can. With a decent type abstraction mechanism, as in Quest or SML.
Come on, let's be honest and tell the truth. This is the one example
where I really miss a more powerful module system than Caml Light's.

> Then you could dynamically check whether two sets sets are
> originated from the same package. Except that you cannot generate stamps in
> a purely applicative style!

We're programming in ML. Who cares about purely applicative style?
Anyway, dynamic checks are evil, generally speaking, and in the
particular case of sets and maps, speed is absolutely critical.
Otherwise, we could just as well stick to the naive implementation of
sets as lists without duplicates.

> Using an object oriented style, you could consider the functions on sets as
> methods, and pack them with each set (instead of the stamp).

Please, no object-oriented nonsense. This is a matter of type
abstraction. I strongly doubt objects will help clarify anything here.

More to the point, the solution to which you finally arrive (storing
the ordering function in each set, and give it as argument to
set__empty), which is also the solution proposed by Damien Doligez,
solves the problem for functions that take only one set argument, but
fail on functions such as "union", which take two. In general you
can't decide whether the two orderings are the same, though a == test
would probably work fine in practice. Also, storing the ordering at
the top of the set makes all set operations slightly more expensive
(all set constructions involve an extra "cons"). Storing the ordering
in the closures of the operations is cheaper.

> Of course,
> there is an ambiguity for functions working on several sets. The functions
> could be written to work with two orderings, one for each set

I take this situation to be a serious programming error. Programmers
who dare to order one type in two different ways, then mix sets
inconsistently deserve to burn in hell. Besides, there is no easy way
to handle this situation with balanced tree implementations, you
basically have to sort and rebuild entirely one of the trees.

- Xavier