Version française
Home     About     Download     Resources     Contact us    
Browse thread
How to do this properly with OCaml?
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: skaller <skaller@u...>
Subject: Re: [Caml-list] How to do this properly with OCaml?
On Sun, 2005-07-24 at 16:23 -0700, Stephane Glondu wrote:
> On Sunday 24 July 2005 14:55, skaller wrote:
> > If you have to initialise the store, it is expensive,
> 
> I understand your point now. However, if you want your array to hold 
> allocated values, you will always have to initialise the array. 

Not if the collector is modified by INRIA to support variable
length arrays: this is some kind of tagged block with
a mutable length count which limits how far along the block
the collector looks for boxes: the length count is less
than or equal to the actual number of allocated slots.

> Moreover, 
> I think the overhead of this initialisation is insignificant compared to 
> the GC overhead.

I am not so sure: to initialise an array causes a store
in each slot, which forces the whole array to scroll
through your cache: if the array is big this means
disk paging/swapping activity.

Why do all that when the values will never be read?
We do not even know in advance how much of the
array will actually be used. Consider an array
of length 10,000 elements, where only 100 are used.
We allocate space for 10,000 because we're conservative
and want the program to run on many sized data sets.

BTW: this is a *real* issue an Ocaml user had.

Ok, so my argument is flawed as follows: the whole
point of a variable length array is to keep the size
of the array roughly the size of the data it actually
holds: when extending, we're already initialising
the new array with data from the old one, and the
overhead initialising the rest is only going
to double the time taken.. unless you happen
to hit a cache size boundary, when it could
take 10-100 times longer than necessary.

So -- you could indeed be correct, but it isn't
entirely clear that gratuitous writes are not
going to impact performance.


> > and if you have to provide a dummy value it is hard
> > to use.
> 
> Maybe hard is not the appropriate word. I would just say annoying. You can 
> always define easily a dummy value when you define a (non-empty) type.

Think of a class type, whose constructor function itself 
takes other class values as arguments .. it can be quite 
complicated to set up such values, and, worse, 
Ocaml being imperative doing so may have side-effects.

This isn't far fetched at all -- it happened to me :)

That is why I implemented a variable length array,
to get around this obstacle. I used the 'one element
is taken as a dummy value' technique (no Obj.magic!).

However that turned what in C is a trivial set of
library functions into a complex unreliable mess
and left me wondering whether the encoding was
properly transparent.

> I didn't mean to change the dummy value all the time! Just take the first 
> value as the dummy value, and keep it even if the first entry in the array 
> changes. This will work fine for small values. 

But if you do that you have a reference to an object
that the client does not know about. So the client
will be confused if the object isn't finalised,
and that in turn could cause many other objects
to remain alive unexpectedly. Even if there are
no side-effects on finalisation, the extra memory
use could be a problem.

> If you are planning to put 
> huge values in the array,

Not just huge values: values that contain references
to other values .. such as in a graph .. occupy
a lot of memory. Much of that data structure may be
shared, but we normally expect unreferenced parts
of the graph to die, and release memory.

> > That won't happen in Buffer because you can't modify it,
> > but it must be allowed in more general variable length
> > mutable array.
> 
> What do you mean?

The current implementation makes buffers extensible
but immutable. However, for a general variable length
array you would like to not only modify elements,
but insert and delete as well.

Without this additional requirement, using the
'first' element as a dummy value would work fine:
that is, you are correct 'a array option idea
for Buffer would be make efficient elimination
of the 'dummy value' requirement possible (although
not eliminating the need to initialise the array).

But for a true variable length array, and without
gratuitously hanging on to an unused value,
you'd need each store to the first element to
also store in every unused slot.

-- 
John Skaller <skaller at users dot sourceforge dot net>