Re: Structure sharing

Xavier Leroy (xavier@Theory.Stanford.EDU)
Wed, 16 Mar 1994 12:26:25 -0800 (PST)

From: Xavier Leroy <xavier@Theory.Stanford.EDU>
Message-Id: <9403162026.AA22491@Tamuz.Stanford.EDU>
Subject: Re: Structure sharing
To: John.Harrison@cl.cam.ac.uk (John Harrison)
Date: Wed, 16 Mar 1994 12:26:25 -0800 (PST)
In-Reply-To: <"swan.cl.cam.:120340:940316170214"@cl.cam.ac.uk> from "John Harrison" at Mar 16, 94 05:01:12 pm

To complement Pierre's answer:

> But [hash-consing] might make it much more usable on really small
> systems, which is consonant with the CAML Light ideology.

Generally speaking, the design of Caml Light goes a long way to get
compact code, but does not try very hard to represent data compactly.
For instance, cons cells are 3 words instead of 2. In practice, the
only system where more compact data representations would make a big
difference is the 8086 PC, with its 640k limit; as soon as, say, 2M
are available (which is the case on all Macintoshes and PCs nowadays),
the current representations seems to be unproblematic.

> Alternatively, how well would it work if I used the "hashtbl" library
> and did everything myself for the datatypes I'm interested in? Am I
> making things non GC-able by putting them in a hash table?

Yes. By programming directly in C and using special features of the
current garbage collector (finalized objects and the fact that objects
allocated in the old generation are never relocated), you could get a
more efficient implementation of hash-consing that does not prevent
garbage collection. That's definitely non-trivial.

> How do the hash tables in "hashtbl" work exactly?

The only bit of magic is the hashing primitive, which walks the data
structure representing any Caml Light value (in the same way as
generic structural equality) and computes a hash value based on the
nodes encountered. Apart from this, it's standard dynamic hashing.

- Xavier Leroy