Re: Data structures in ocaml

From: Markus Mottl (mottl@miss.wu-wien.ac.at)
Date: Sun Oct 10 1999 - 01:17:37 MET DST


From: Markus Mottl <mottl@miss.wu-wien.ac.at>
Message-Id: <199910092217.AAA27378@miss.wu-wien.ac.at>
Subject: Re: Data structures in ocaml
To: skaller@maxtal.com.au (skaller)
Date: Sun, 10 Oct 1999 00:17:37 +0100 (MET DST)
In-Reply-To: <37FE327A.ED47D428@maxtal.com.au> from "skaller" at Oct 9, 99 04:05:46 am

> You see, it is NOT possible to implement the
> 'give_room' function. Instead, you must provide special
> code in EVERY function of the module to do the
> reallocation, and you must put dummy values at the end
> of the array, which will not be collected. This code
> must be special purpose for each function, to find
> a suitable dummy value: if there is no such value,
> the array can be set to zero length: it CAN be done,
> but it is a mess, it is inefficent, and it can
> waste a lot of storage, particularly if,
> as in my implementation, I chose to leave 'old' values
> lying around in the unused part instead of resetting
> them all to a single (shared) dummy value.

My solution in the (soon to be released) resizable array module is as
follows (I hope I haven't overlooked any problem with this approach):

Elements of arrays which are not int- or float-arrays are always boxed.
If we do not want to waste space in the "free" part of the array, we
will have to initialize it with a special value. We could have the user
pass one to us, but he would have to create it and this can cost memory
if this value is very space-hungry and it is also not very elegant.

Thus, we would like to have a default value for all kinds of types which
does not need much space and needs not be known to the user.

If I have understood the garbage collector correctly, it does not know
anything about things going on in the type system, but it knows how
to recognize the basic types (atoms). If we take a very "small" atom
(e.g. an integer) and "cast" it to the required type (using Obj.magic),
we can cheat and make the compiler believe that we have a correct value
of the type of the array elements.

The garbage collector on the other hand does not care about the "true"
type of the array elements. Because they are boxed (otherwise the GC
wouldn't scan through the array), it will believe that there are boxed
integers in the "free" slots (more precisely, it's always the same
integer which is "contained" in the box).

With this trick we can initialize the array and (very important) have
removed elements be reclaimed when we overwrite their "box" so that it
points to the integer.

> Yes, and this requires getting a dummy value to initialise
> the whole array with, followed by immediately overwriting
> that dummy value with the actual data, and thus is twice
> as slow as C.

In the case of unboxed arrays (the same can be said about strings), we
need not do this "overwriting", because the values will not be reclaimed,
anyway (wouldn't make much sense). Thus, for these kinds of "contiguous
memory" it is ok to simply adjust the index to the last element.

For strings, there is a create-function that doesn't initialize the
memory. If there were a similar function for int- and float-arrays, I
could greatly improve the speed of array creation in my module and (very
important) for reallocation if the array is being resized. Maybe I will
supply a C-function that allocates such arrays without initialization. But
I will rather focus on functionality first...

Regards,
Markus Mottl

-- 
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl



This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:26 MET