Re: Stdlib regularity

From: skaller (skaller@maxtal.com.au)
Date: Sun Oct 10 1999 - 07:38:24 MET DST


Date: Sun, 10 Oct 1999 15:38:24 +1000
From: skaller <skaller@maxtal.com.au>
To: Francois Rouaix <frouaix@mail.liquidmarket.com>
Subject: Re: Stdlib regularity

Francois Rouaix wrote:
>
> <skaller@maxtal.com.au> writes:
>
> skaller> There are several cases where the core language prevents
> skaller> this, because it lacks functionality available in C++: the
> skaller> ability to create uninitialised values, and the ability to
> skaller> destroy them are two that I've become aware of trying to build
> skaller> a variable length array module.
>
> Uninitialized values are easily implemented with the 'a option type.
> Of course, the code is then ugly, because you have to match your
> values everywhere to None | Some x. In C/C++ terms, this forces you
> to check for NULL pointers systematically, which is a Really Good Thing.

        There is a difference: Some/None must be matched EVERY use.
Null pointers only need to be checked where they can occur.
In the case of an array of variable length, with extra uninitialised
slots, the Some/None typing of each element is extra overhead that
isn't required if the accessing codes are correct, it is only
necessary to check if the index is in range.

> Adding uninitialised values is a major source of bugs, and it's kind
> of natural to pay the price for it in the readability of the source,
> if you want your code to be robust.

        No. This isn't necesarily the case. There are other
solutions. In C++, the philosophy is to provide as much protection
as possible, without costing at run time, and with protection
that does cost at run time being optional.

        For example, the designer of a class can determine
whether or not use of a variable of the class requires initialisation,
whether a default value makes sense, etc. [in this case, the client
of the class cannot override the designers decisions].
 
        To put this another way: static typing, and other safety guarrantees,
are intrinsically limited in what can be expressed: it cannot prevent
all run time
errors, and it may _exclude_ correct cases as well. So it is always a
compromise.
It makes sense to improve these systems to be more expressive, and
provide
even better guarrantees, but there will (almost always) be a need to
escape
the system when the programmer knows better.

        In C++, such escapes are provided, for example with
new style casts, but in a way which discourages use, rather than
outright prevention. In ocaml, Obj.magic does some of the same
kind of things, I believe, and there is always the ability to
write C, or even patch the compiler.

        In general, I really think it is necessary to provide
a 'low level unsafe' interface in a language, which at least
will not be used by accident, and where deliberate uses are at least
easy to find. These low level features are not needed where reasonable
and complete safe solutions are available: for example, it has been
proved that 'goto' is not required in a language with a few sensible
looping constructions, and so one can argue this feature is not
required.

        I think you can argue that a powerful enough functional
language does _not_ require uninitialised values (that is,
there is always an efficient way to initialise variables),
but that isn't the case in a procedural language. Because
referential transparency is lost, compilers cannot reason as
easily about program structure, and hence cannot optimise away
useless dummy initialisations, so efficiency is lost.

        
> Builtin arrays require you to provide a initialization value, even for
> zero-length array. Why don't you carry the same requirement to your
> extensible arrays, and simply use a polymorphic type:
>
> type 'a earray = {
> mutable current : 'a array;
> mutable used : int;
> }
>
> let create n i = { current = Array.create n i; used = n }

        Sure, but that only works for the 'create' method.
In other cases, like, concatenation, the solution requires
testing if the two arrays to be concatenated are zero length,
if so creating a zero length array using [| |], otherwise
finding an object from one of the two arrays that contains one,
to initialise the result array.

        It is always possible to do this, but it is messy,
and it _requires_ that the physical length of an array used to
hold no elements be zero in cases when there is no available
'dummy' value to fill the unused slots.
 
> And then, if you want to have the equivalent of NULL pointers, use None,
> and option types everywhere.

        Ocaml arrays are already slower than the C ones Python uses,
and I am trying to match Python performance as closely as possible.

> It seems to me that you are trying to force the language to do something
> it has been purposely designed against. I'm not sure you can win this fight.

        I am not trying to do this per se, since I believe, more or less,
that the design principles are good. Rather, I would seek a 'proper'
solution,
in terms of these principles, believing one should exist, since the
principles
surely do not deliberately try to make inefficient code. This is why I
proposed
a categorical 'Initial' value, since this should fit well into the
category
theoretic type system framework. By having a special initial value,
which can be used to initialise anything, and testing for it in the
'safe'
versions of the generated code (and eliding the test in the unsafe
ones),
there may be a well typed solution, and a possibility the compiler can
reason about the code enough to optimise even the safe code.

        I happen to 'know' that this is the state of the art in case
of array bounds checking: in Modula code, apparently, 70% of the
mandatory
array bounds checks can be elided by adding suitable reasoning
algorithms
to the code.

        I do not make a numerical claim about the case of uninitialised
values, although I note that some C compilers including gcc can issue
warnings in some cases when it appears an uninitialised value is used.

        I think what I am saying, is that this kind of solution
probably _is_ within the spirit of ocaml. My particular suggestion
may be no good. Perhaps there is a better one?

        For example, a standard variable length array module would
solve the problem, but only in that special case. Is there a more
general, if not complete, solution, that is not as risky as the one I
propose?

-- 
John Skaller, mailto:skaller@maxtal.com.au
1/10 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
downloads: http://www.triode.net.au/~skaller



This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:26 MET