Re: convincing management to switch to Ocaml

From: Pierre Weis (Pierre.Weis@inria.fr)
Date: Tue Aug 31 1999 - 19:10:34 MET DST


From: Pierre Weis <Pierre.Weis@inria.fr>
Message-Id: <199908311710.TAA18381@pauillac.inria.fr>
Subject: Re: convincing management to switch to Ocaml
To: skaller@maxtal.com.au (John Skaller)
Date: Tue, 31 Aug 1999 19:10:34 +0200 (MET DST)
In-Reply-To: <3.0.6.32.19990831053557.0096ea40@mail.triode.net.au> from "John Skaller" at Aug 31, 99 05:35:57 am

> Sure, I will try. First note that my background is
> in procedural languages. So I am used to calling functions like:
>
> f(x,y)
>
> In ocaml, if I used curried functions, then
>
> f x+1 y
>
> is an error: I have to put brackets around the individual arguments
> instead of the argument list. Secondly, I have a math background,

So, we can consistently consider the mathematical notations as natural, and
use them in Caml :)

The problem in this example has nothing to do with currying, since you would the
same problem writing
        f x+1
instead of
        f (x+1)

Then, compare this expression with the mathematical meaning of
        sin x+1

versus
        sin (x+1)

As in Caml, sin x+1 means (sin x) + 1.

> where forward polish is right associative:
>
> f g x = f (g (x))
>
> but with curried functions,
>
> f g x = (f g) x
>
> and the _other_ style I've used in math is reverse polish, which
> is similar to the usual OO notation:
>
> x g f = x.g().f()
>
> and now ocaml is yet another convention.

Yes, and once more the Caml convention is the mathematical one.

If you read a succession of functions separated by spaces in a
mathematical book, you certainly will interpret it as an error,
instead of trying to interpret it as some strange stack notation!

For instance

        sin cos tan 0

will be seen as a meaningless sequence of symbols, unless you're
studying the mathematical theory of functions (namely the
lambda-calculus), where f g h means f (g h). Then, since Caml is a
functional language, it uses the mathematical notation for function
application borrowed from the mathematical theory of functions.

> As a usage example,
> I often have to write:
>
> print_endline (x ^ (string_of_int (f y)))
>
> whereas in a procedural language it would be
>
> print_endline(x ^ string_of_int(f(y)))

As a mathematician, and knowing the mathematical natural conventions
of precedences between function applications and operations, you write

       sin x + cos x

and not

       (sin (x)) + (cos (x))

(even if the second expression, overparenthesized, is mathematically accepted).

Hence, as you simply write

       x + sin (cos y)

instead of

       x + (sin (cos (y)))

you write in Caml

       x ^ string_of_int (f y)

instead of

       x ^ (string_of_int (f (y)))

(even if the second expression, overparenthesized, is camlly accepted).

> In other words, I'm used to function calls being syntactically
> atomic: the arguments are bracketed and can be 'ignored' visually,
> 'as if they were subscripts'

As in mathematics you can use this technique, and add parens to any
function calls if you want. As in

        sin (x)

instead of

        sin x

>
> print_endline (...)
> x ^ string_of_int (...)
> f (..)
> x
>
> In ocaml, the function being called is _inside_ the brackets,
> first on the left, instead of labelling the 'node'. So it
> has the same visual status as its arguments.

Normal for a functional programming language where functions can be
arguments of other functions, at any nested level. Completely normal
when functions can be stored into data structures as any other
values. In fact, we do want the functions to be regular objects of the
langage, not just ``things'' with special status that can only be
applied to something. So, you're perfectly right ``the function as the
same'' blabla ``as its argument'' is just what we want in Caml!
(including when ``blabla'' is visual status).

> When I write ocaml, I often do the following to get brackets right:
> I put the brackets in first:
>
> print_endline ()
> print_endline (x ^ ())
> print_endline (x ^ (string_of_int x))
>
> because I cannot count :-). This is a pain, because I have to
> keep backspacing. Even worse is: I have a string,
> and want to put an integer inside it:
>
> "xxxx"
> "xx""xx"
> "xx"^^"xx"
> "xx"^()^"xx"
> "xx"^(string_of_int x)^"xx"
>
> which is four editing steps, requiring 6 punctuators,
> and three backspaces. If the
> string was the argument to a print_endline function, I _also_
> have to put brackets around the whole string expression
> where none were previously needed:
>
> ("xx"^(string_of_int x)^"xx")
>
> This is particularly bad, since an insertion reqires
> non-local editing.

Once more this is a problem you already had in mathematics when you
change
         sin 0.5
into
         sin (x + 0.5)

And, as in maths, where you write

         x + sin x + x instead of x + (sin x) + x

you can write
         "xx" ^ string_of_int x ^ "xx" instead of "xx"^(string_of_int x)^"xx"

And, as in maths, you must add parens around the whole
expression to pass it as argument to a function. You write

        atan (x + sin x + x)

and not

        atan x + sin x + x

[...]
> This problem is exacerbated by the fact that a lot of
> ocaml is an expression, so you can do a lot of nesting.

As in maths ?

> The second problem I have is with the constructions that
> are like 'prefix/infix' operators, for example
>
> if .. then .. else ...
> try .. with ...
> match .. with | .. -> .. | .. -> ..
>
> Sudies were done in the 70's showing this is not as readable as
> constructions bounded lexically at both ends.
>
> Both Pascal and C have this fault, but only in a few constructions.
> These faults were corrected in Modula: IF .. THEN .. ELSE .. ENDIF
> It is a welcome relief to see while .. do .. done in ocaml:
> it is not necessary to remember precedence rules that span
> many lines of code.

You're right. You just have to learn the rules for these constructs
(more exactly for parts of them):

1) Any sequence in a arm of a conditional MUST be bracketed.
2) Any match or try in to a pattern matching clause MUST be bracketed.

> Similarly, using separators like pascal's ";" is not as good
> as terminators like C's ";". Clearly, these things are much harder
> to edit (i.e, cut and paste doesn't work with separators, especially
> ones that are visually terminators like ";" -- you can write
>
> a
> ;
> b
>
> but it looks horrid compared to:
>
> a;
> b
>
> and then
>
> a;
> b;
> is more 'symmetrical' visually.

You're right, that's why YOU CAN use the ; as terminator in Caml.

> For example, an expression like
>
> if x then y else z
>
> cannot be augmented to
>
> if x then y; y' else z
>
> instead, nonlocal editing is required:
>
> if x then begin y; y' end else z

Same problem as for function arguments: if you want to be able to add
stuff to a function argument, you just have to systematically add
parens to any function argument. If you want to do the same for
conditional, add preventive begin and end keywords to any arms of any
conditional.

> Similarly, nested matches/try expressions always require begin/end:
> unless the nesting is right at the end.
>
> A general rule for a good syntax would be that augmentation
> of some term of a construction only required local editing.
> This means:
>
> terminators, not separators
> mandatory begin/end keywords

This cannot be done elegantly with function application, except if we
use a lisp-like syntax.

> Of course, that would lead to a grossly OVER verbose syntax,
> so it is certainly necessary to have _some_ infix operators
> with precedences: a reasonable set is:
>
> unary +-
> binary */
> binary +-
> binary < > <= >= <> !=
> unary not
> binary and
> binary or
>
> plus at most two or three more. (the listed set is
> _already_ 7 levels of precedence!)
>
> Anything more than that will conflict with more or less universal
> conventions. For example, Python and C and Ocaml all have bitshift operations,
> with _different_ precedences which lead to a fault in
> a routine I had doing UTF8 conversions, it took an hour
> to figure out that I needed more brackets.

Bitwise operations in Caml are not operators with special precedences,
which I consider as a problem, and that why you had a problem, since
relative priority of land lor lsl and asr are all the same, which is
confusing!

> A good technique is probably to have a lot of non-associating
> operators, so that there is never any 'mental' ambiguity.
> For example, the power operator should be unassociative:
>
> a ** b ** c
>
> should be an error.

Difficult not to generalize to other operators ...

> I think there is a point here that, because I know a lot
> of programming languages, I get confused with the various
> precedence rules: I JUST DON'T WANT TO KNOW. So I program
> defensively: I put the brackets in to be sure of the semantics.
> This is useful for readers who are also not experts in the
> language as well.

This is legal to add parens almost everywhere in Caml. If you want to
program with a lot of parens, you can, and that's a safe style,
although I will not recommend it. As in maths, I will recommend to
know what you're writing when you are writing it.

As a mathematician, you are accustomed to more baroque and complex
conventions than those of Caml: prefix operators (primitive and
integral of functions, and usual functions application), postfix
operators (factorial, derivation), superscript (exponentiation),
prefix-postfix operator (2 numbers between big parens to express
binomial coefficients), division (one argument over, one argument
under a small horizontal line), infix operators (+, -, ...). The more
interesting are f' that is completely ambiguous or the invisible
operator * in xy ...

When designing Caml, we tried not to be that ambiguous, and to use the
conventions you already have learned in mathematics; hence the use of
parens in expression, the relative precedences of + and *, etc...

[...]
> and the same applies in ocaml. Indeed, [] is used in ocaml
> for lists as well:
>
> b [ a.[1] ]
>
> and here only the . tells that .[] is an infix/suffix operator,
> whereas b [] is a function with a list argument, so the
> [ in the argument is a prefix/suffix operator. :-)

In the same vein, would you complain about r.lab compared to r.(lab)
or r(lab) or even r lab ? or also module.f compared to Module.f ?

More generally, do you mean that if we change a single character in a
program the meaning should not change ? Or that any notation should
not be a prefix or a suffix of another one ?

There are many constructs to express, so we need many notations. We
try to reuse traditional notations if available, but from time to time
we have to invent a new one. In this case the users need to learn it
if they want to feel easy when using the language !

> Another visual problem I have is that I think symbols
> (i.e. punctuation marks) should bind more strongly
> than words: it looks ugly to put brackets around words:
>
> (fun x -> print_endline ..)
>
> is ugly compared to:
>
> begin fun x -> print_endline .. end

You can use this form if you want it.

> but the ideal would be
>
> fun x is print_endline .. efun

This is not light enough for a language that uses so many anonymous functions.
Once more generalizing it to efor, ewhile, elet, etry, ematch would
lead to a painful language.

[...]

> Of course, there are worse languages: the lambda calculus
> being a case that is only marginally better than C declarations
> in the competition for all time most unreadable languages :-)
> [The prize winner, however, is certainly category theory :-]

Wao! Too bad for us to have chosen the function application convention
from the lambda-calculus!
[Moreover, don't forget that the Caml effort has been started after
studies in category theory, witnessed in the name of the language
that is an acronyme for Categorical Abstract Machine Language!]

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://cristal.inria.fr/~weis/



This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:25 MET