Estimating the size of the ocaml community
Date: 2005-02-04 (12:46)
From: Marcin 'Qrczak' Kowalczyk <qrczak@k...>
Subject: Re: [Caml-list] Estimating the size of the ocaml community
Richard Jones writes:

> 0      used for tuples, arrays, records
> 1-251  used for constructors (eg. Some, None)
> 252    marks strings
> 253    marks floats
> 254    marks float arrays
> 255    marks structures with custom ops (lots of stuff, like Int32.t)
>
> It's not clear to me why so much "tag space" is used for constructed
> values, at the same time limiting you to around 250 different
> constructors in a type definition.  Couldn't the constructor number be
> encoded in the first field in the value (obviously shifting all the
> subsequent fields along one, and making constructed values 4 bytes
> larger)?  Then the tag could be reduced to a few bits, making strings
> a few orders larger.

If all constructed values were one word larger, the overhead would be
noticeable. For example, a list of ints would take 4/3 as much space
as it does now: each cons cell would grow from three words (header,
head, tail) to four.
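
The 4/3 figure comes from counting words per cons cell. A small OCaml
sketch of that arithmetic, assuming one header word per block; the
names header_words, cons_fields, current_words and proposed_words are
made up for illustration:

```ocaml
(* Words per cons cell: one header word plus the fields. *)
let header_words = 1

(* A cons cell has two fields, head and tail; Obj.size reads the
   field count from the block header. *)
let cons_fields = Obj.size (Obj.repr [1; 2; 3])

let current_words = header_words + cons_fields

(* Hypothetical layout from the quoted proposal: one extra field
   holding the constructor number. *)
let proposed_words = header_words + 1 + cons_fields

let () =
  Printf.printf "current: %d words, proposed: %d words (ratio %d/%d)\n"
    current_words proposed_words proposed_words current_words
```

On a 32-bit platform a word is 4 bytes, which matches the "4 bytes
larger" in the quoted text.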

IMHO a better design would distinguish fixed-size blocks from
variable-length blocks and enlarge only the latter. For example:
- fixed-size blocks: 15 bits for the tag, 15 bits for the size
  (assuming a 32-bit platform)
- variable-length blocks (arrays and strings): a special value in the
  header, with the actual length stored in the next field
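
A minimal sketch of how such a header could be packed and unpacked,
written in OCaml for illustration only. The field positions, the
choice of the largest size value as the "special value", and every
name here (make_header, tag_of, size_of, is_variable, variable_marker)
are assumptions, not part of any real runtime. Only 30 of the 32
header bits are modelled; the remaining two are assumed to stay
reserved for the garbage collector.

```ocaml
(* Hypothetical header packing: [ size:15 | tag:15 ] in the low 30
   bits of a word. Not the layout of any actual OCaml runtime. *)
let field_bits = 15
let field_mask = (1 lsl field_bits) - 1      (* 0x7FFF *)

(* Assumed marker: the largest size value means "variable-length
   block, the real length is stored in the first field". *)
let variable_marker = field_mask

let make_header ~tag ~size =
  if tag < 0 || tag > field_mask then invalid_arg "make_header: tag";
  if size < 0 || size > field_mask then invalid_arg "make_header: size";
  (size lsl field_bits) lor tag

let tag_of header = header land field_mask
let size_of header = (header lsr field_bits) land field_mask
let is_variable header = size_of header = variable_marker

let () =
  let fixed = make_header ~tag:1000 ~size:3 in
  let var = make_header ~tag:0 ~size:variable_marker in
  Printf.printf "fixed: tag=%d size=%d variable=%b\n"
    (tag_of fixed) (size_of fixed) (is_variable fixed);
  Printf.printf "array: variable=%b\n" (is_variable var)
```

Under these assumptions a type could have tens of thousands of
constructors instead of around 250, and the length of an array or
string would no longer be limited by the bits left over in the header.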

As for unifying strings with char arrays, it would make all array
accesses slower. Array accesses already take a performance hit because
of float arrays; in particular, almost all functions in the Array
module make a conditional jump for float arrays in their inner loops.
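
The conditional amounts to a test on the block's tag. A rough
OCaml-level illustration using the Obj module; the real test is
generated by the compiler or performed in the C runtime, and
is_flat_float_array is a name made up here:

```ocaml
(* True when the array is stored as a flat block of unboxed doubles,
   i.e. its header carries the float-array tag (254 in the quoted
   table). *)
let is_flat_float_array (a : 'a array) : bool =
  Obj.tag (Obj.repr a) = Obj.double_array_tag

let () =
  Printf.printf "float array is flat: %b\n"
    (is_flat_float_array [| 1.0; 2.0 |]);
  Printf.printf "int array is flat:   %b\n"
    (is_flat_float_array [| 1; 2 |])
```

Code that is polymorphic in the element type cannot know the answer at
compile time, so it has to branch on this tag on every access.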

IMHO it would be better if float arrays were a separate type, and the
type system somehow allowed writing code that is generic with respect
to the type of the sequence. Then more kinds of arrays could be added without
increasing the overhead for plain arrays, e.g. bool arrays as packed

   __("<         Marcin Kowalczyk