Estimating the size of the ocaml community
Date: 2005-02-04 (12:51)
From: Gerd Stolpmann <info@g...>
Subject: Re: [Caml-list] Estimating the size of the ocaml community

Richard Jones said:
> I've been thinking about this a bit more, and I'm not sure I
> understand why the tag needs to be so large.  If you look at the "tag
> space" now, it's something like this:
> 0	used for tuples, arrays, records
> 1-251	used for constructors (eg. Some, None)
> 252	marks strings
> 253 	marks floats
> 254	marks float arrays
> 255	marks structures with custom ops (lots of stuff, like Int32.t)
> It's not clear to me why so much "tag space" is used for constructed
> values, at the same time limiting you to around 250 different
> constructors in a type definition.  Couldn't the constructor number be
> encoded in the first field in the value (obviously shifting all the
> subsequent fields along one, and making constructed values 4 bytes
> larger)?  Then the tag could be reduced to a few bits, making strings a
> few orders larger.

Sure, this is an option, but I think the price is quite high. A number of
frequently used data structures would need a lot more memory, and would
become slower. Lists are one example: currently, every list element needs
three words (header + contents + reference to next element). If the
variant tag is not part of the header, another word is needed, i.e. 33 %
more space. There are similar effects for many basic data
structures, including options, trees and hash tables.
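
The list arithmetic can be checked with a small sketch. It uses the Obj
module only to look at the runtime representation; the "proposed" layout
(one extra field holding the constructor number) is an assumption taken
from the quoted suggestion, not anything the runtime implements, and the
names are made up for the illustration.

```ocaml
(* Inspect the runtime representation of a list cell with the Obj module.
   Obj is used for illustration only; ordinary programs should avoid it. *)
let cell = Obj.repr [1; 2; 3]

(* A cons cell has two fields: the element and the pointer to the tail. *)
let fields = Obj.size cell

(* Today: one header word (which also holds the tag) plus the fields. *)
let words_now = 1 + fields

(* Hypothetical layout from the quoted proposal: the constructor number
   moves out of the header into an additional first field. *)
let words_with_tag_field = words_now + 1

(* Relative growth per list cell, in percent (integer division). *)
let extra_percent = 100 * (words_with_tag_field - words_now) / words_now

let () =
  Printf.printf "now: %d words, proposed: %d words, +%d %%\n"
    words_now words_with_tag_field extra_percent
```

This reports 3 words per cell today and 4 under the assumed layout, which
is the 33 % mentioned above.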

From a practical point of view, the non-polymorphic variants are still
part of the language because they are more efficient than the polymorphic
counterparts (which could fully replace the former, and for which there
is no limit for the number of constructors).
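
One visible part of that efficiency difference is the block layout. The
following is a minimal sketch, again using Obj purely for inspection; the
type and constructor names are invented for the example. An ordinary
constructor with an argument keeps its number in the header tag, while a
polymorphic variant with an argument stores a hash of its name in an
additional field.

```ocaml
(* Hypothetical example type; the names are for illustration only. *)
type shape = Circle of int | Square of int

(* Ordinary variant: the constructor number lives in the header tag,
   so the block only needs one field for the argument. *)
let ordinary = Obj.repr (Square 7)

(* Polymorphic variant with an argument: the block has tag 0 and two
   fields, the hash of the constructor name and the argument. *)
let polymorphic = Obj.repr (`Square 7)

let () =
  Printf.printf "ordinary:    tag %d, %d field(s)\n"
    (Obj.tag ordinary) (Obj.size ordinary);
  Printf.printf "polymorphic: tag %d, %d field(s)\n"
    (Obj.tag polymorphic) (Obj.size polymorphic)
```

Counting the header, that is two words for the ordinary value and three
for the polymorphic one.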

If one really wanted to improve the situation, it would be better to change
the representation of strings, and to encode the string length in the
first word after the header. This wastes much less memory (except in the
special case when you have many small strings), and is probably neutral
for the execution speed.
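
A rough sketch of the memory cost, under stated assumptions: a string of
n bytes currently occupies one header word plus n / w + 1 data words
(w being the number of bytes per word), and the changed representation
would simply add one word for the length. The second formula is a guess
at what such a layout would cost, not a worked-out design, and the
function names are hypothetical.

```ocaml
(* Bytes per machine word: 4 on 32-bit systems, 8 on 64-bit systems. *)
let bytes_per_word = Sys.word_size / 8

(* Current layout: header word plus the padded string data. *)
let current_words n = 1 + (n / bytes_per_word + 1)

(* Assumed alternative layout: one additional word for an explicit length. *)
let proposed_words n = current_words n + 1

(* Relative growth in percent (integer division). *)
let overhead_percent n =
  100 * (proposed_words n - current_words n) / current_words n

let () =
  List.iter
    (fun n ->
      Printf.printf "%8d bytes: %d -> %d words (+%d %%)\n"
        n (current_words n) (proposed_words n) (overhead_percent n))
    [0; 10; 1_000; 1_000_000]
```

The extra word is a large fraction only for very short strings and
becomes negligible as strings grow, which is the "many small strings"
exception mentioned above.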

Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany