Re: OCaml et l'évaluation paresseuse

From: Pierre Weis (Pierre.Weis@inria.fr)
Date: Mon Oct 04 1999 - 00:53:12 MET DST


From: Pierre Weis <Pierre.Weis@inria.fr>
Message-Id: <199910032253.AAA19376@pauillac.inria.fr>
Subject: Re: OCaml et l'évaluation paresseuse
To: williamc@dai.ed.ac.uk (William Chesters)
Date: Mon, 4 Oct 1999 00:53:12 +0200 (MET DST)
In-Reply-To: <14326.22713.12780.605064@beertje.william.bogus> from "William Chesters" at Oct 2, 99 09:38:05 pm

 In this long message I explain that lazy evaluation is appealing but
has also some drawbacks. Those drawbacks are so severe that we cannot
use lazy evaluation as the default evaluation regime in Caml.

> David Jemmi writes:
> > Pourquoi OCaml (ou caml) n'utilise pas l'évaluation paresseuse
> par défaut ??

 As William Chesters wrote, lazy evaluation is not at all easy to
implement efficiently, and in some case it is not easy to implement
correctly (for instance the lazy pattern matching algorithm is so
strange and difficult that they are seldomly (if ever) implemented in
lazy compilers). In most cases, lazy languages are slower than their
strict equivalent and their memory footprints are much larger (due to
suspensions for optimising lazy compilers or due to graph reduction
for other kind of implementation).

Semantics advantages of lazy evaluation:
----------------------------------------

 Lazy evaluation has many theoretical and pratical advantages: the
programmer has not to worry about the order of evaluation, he also has
not not worry about what computation are useless and can be removed
from the program: if an expression is useless to compute the final
result it will not be evaluated at all, whereas in a strict context
the same program may loop for ever or fail during the computation of a
useless expression. For the same reason, in a lazy language the
recursive definition of values is sound and easy. Ocaml definitions of
recursive values are much more restrictive, since no function calls may
appear in the right-hand-side of a recursive value definition.

 Also noticeable in lazy languages, the reasoning about programs is
somewhat easier, since the model is more mathematical, one can base
the proofs on general laws of operators, and thus programs are
amenable to simpler proofs than loops invariants in imperative programs.

Practical disadvantages of lazy evaluation:
-------------------------------------------

 The complexity of algorithms is difficult to study and difficult to
prove in lazy languages, since it is not easy to predict what will be
evaluated. Not only the complexity is difficult, but also the design
of lazy algorithms having the same complexity as their strict
imperative counterparts is often extremely hard and still an active
research field.

 The most important drawback of lazy evaluation appears when adding
imperative features to the language (side-effects, exception handling,
I/Os): since the order of evaluation is basically impredictable, it is
extremely difficult to perform side-effecting commands in a lazy
language.

 Since imperative side-effects are mandatory to interact with the
external world, the lazy languages implementors worked hard to add
side-effect in lazy contexts (hence the ``monades'' theory that is
discussed elsewhere in this mailing list). Unfortunately, side-effects
are not smoothly integrated with lazy evaluation: for instance, adding
printing orders in a program is not as simple as in Caml (since the
insertion of printing orders can change the semantic of the lazy
program, forcing some useless evaluations to output the arguments of
the printing command); also the debugging of lazy programs is
absolutely non-trivial, since even the notion of debugger for lazy
programs is not clearly defined (it is still an active research area).

The choice for Caml:
--------------------

 In Caml, the strict evaluation regime preveals for a lot of practical
reasons. With strict evaluation, we can provide
 -- a true replay debugger and debugging printf command insertion in programs
 -- the easy implementation of classical algorithms, directly from the
algorithmic hand-books
 -- a good support for the imperative programming paradigm that is easy to
understand and easy to learn (or supposed to be)

In addition, strict evaluation permits a somewhat more standard and
simpler compiler technology (no need for strictness analysis for instance).

Furthermore, if we adopt strict evaluation, we can easily implement
some kind of lazy evaluation on top of it (hence the Lazy module of
the standard library of objecti ve Caml). (The call-by-need evaluation
strategy just requires a few more assignments). Compared to lazy
languages, lazy evaluation in Caml is still a bit more difficult: lazy
program parts are not perfectly integrated to the rest of the
language; there is no automatic forcing of thunks during the access
nor automatic handling of lazy force during pattern matching; lazy
recursive definitions of values is possible but cumbersome.

The Caml's design hypothesis is thus that lazy evaluation is not very
often necessary, unless when infinite data structures are
required. (That's why the lazy thunks in Caml are created inside data
structures, not as part of a function call.)
Hence the default strict evaluation regime.

In contrast, lazy languages made the opposite choices and the status
of imperative features is upside-down: strict evaluation is supposed
to be exceptional, so in those rare cases the imperative commands are
encapsulated into monades that ensure strict evaluation.

 We claim that the Caml compromise is the most practical available
today, since it is the only one that can support a debugger and the
easy interaction with the outside world, and those two features are
simply mandatory for a general purpose language. Admittedly, the ideal
language would be a perfect integration of lazy and strict features,
but this is out of reach for the time being (in the current state of
the computer science knowledge). We try to approach this ideal
language by providing some support for lazy evaluation in our strict
language. In he future, the support for lazy evaluation in Caml could
be enforced (for instance, the force orders can be automatically
inserted during pattern matching by the compiler, or the definitions
of recursive values can be compiled using lazy evaluation).

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://cristal.inria.fr/~weis/

PS: William Chesters wrote:

> The basic point is that a strict language like ocaml is just a
> fancy version of assembler or C---a function call really is a CALL
> instruction (and a higher order call is a "call *%ecx"). Since the
> computational model presented to the programmer is essentially just
> that of the underlying CPU, it's easy to write fast code. The only
> concession made to "abstraction" is the garbage collector.

No, Caml is NOT ``just a fancy version of assembler or C''. Caml
provides much more abstraction than C (pattern matching, complex data
type definitions and automatic allocation, closures, first class
citizen functions, exceptions, ...). No, ``a function call really is''
NOT ``a CALL instruction (and a higher order call is a "call
*%ecx")''. That's what happens when the function is trivial and when
the compiler does a good job, but very often a function call generates
much more stuff than a single assembler instruction.

Yes, the computational model in Caml is the strict evaluation, and you
can freely use the imperative style (loops and assignment) if you
want or need to; in this sense Caml can be used as a ``fancy C'', since C
programs get a simple Caml translation. But in Caml, you can also use a
completely different programming style, namely the functional style,
based on computations in the mathematical sense, and using functions
and functional data structures in a way that is just out of reach of a
C program.



This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:25 MET