Re: convincing management to switch to Ocaml

From: John Skaller (
Date: Mon Aug 30 1999 - 21:35:57 MET DST

Message-Id: <>
Date: Tue, 31 Aug 1999 05:35:57 +1000
To: Pierre Weis <>
From: John Skaller <>
Subject: Re: convincing management to switch to Ocaml
In-Reply-To: <>

At 10:02 30/08/99 +0200, Pierre Weis wrote:
>As part as a discussion O'Caml versus C++, John Skaller wrote:
>> OTOH, I find the ocaml precedence rules are a
>> real annoyance -- I can't remember them, and I find all the brackets
>> not only make code hard to read, they make it hard to write (for me).
>I don't want to start a flamewar on syntax,

        Of course, neither do I (want to start any kind of flamewar) ..

        So i should say that while I find the syntax annoying,
especially after using the nice clean Python syntax, it is a small
price for the powerful semantics.

>but as a computer science
>teacher and Caml implementor and designer, I'm interested at those
>facts you mentioned about the syntax of the language, since I just
>think the opposite way: I find the Caml precedence rules pretty
>convenient, easy to teach, and fairly easy to remember since
>absolutely intuitive and natural (provided they have been explained to
>you and you have understood the design ideas).
>So, there is something interesting to understand here, could you
>elaborate a bit on your difficulties on precedences and especially
>about ``all the brackets'' that make the code hard to read and hard to
>write ?

        Sure, I will try. First note that my background is
in procedural languages. So I am used to calling functions like:


In ocaml, if I used curried functions, then

        f x+1 y

is an error: I have to put brackets around the individual arguments
instead of the argument list. Secondly, I have a math background,
where forward polish is right associative:

        f g x = f (g (x))

but with curried functions,

        f g x = (f g) x

and the _other_ style I've used in math is reverse polish, which
is similar to the usual OO notation:

        x g f = x.g().f()

and now ocaml is yet another convention. As a usage example,
I often have to write:

        print_endline (x ^ (string_of_int (f y)))

whereas in a procedural language it would be

        print_endline(x ^ string_of_int(f(y)))

In other words, I'm used to function calls being syntactically
atomic: the arguments are bracketed and can be 'ignored' visually,
'as if they were subscripts'

        print_endline (...)
                x ^ string_of_int (...)
                        f (..)

In ocaml, the function being called is _inside_ the brackets,
first on the left, instead of labelling the 'node'. So it
has the same visual status as its arguments.

When I write ocaml, I often do the following to get brackets right:
I put the brackets in first:

        print_endline ()
        print_endline (x ^ ())
        print_endline (x ^ (string_of_int x))

because I cannot count :-) This is a pain, because I have to
keep backspacing. Even worse is: I have a string,
and want to put an integer inside it:

        "xx"^(string_of_int x)^"xx"

which is four editing steps, requiring 6 punctuators,
and three backspaces. If the
string was the argument to a print_endline function, I _also_
have to put brackets around the whole string expression
where none were previously needed:

        ("xx"^(string_of_int x)^"xx")

This is particularly bad, since an insertion reqires
non-local editing.

The other technique I use is to leave all the brackets out,
and then add them later -- get the semantics down first,
before I forget, then add the lexical stuff afterwards.

This problem is exacerbated by the fact that a lot of
ocaml is an expression, so you can do a lot of nesting.

The second problem I have is with the constructions that
are like 'prefix/infix' operators, for example

        if .. then .. else ...
        try .. with ...
        match .. with | .. -> .. | .. -> ..

Sudies were done in the 70's showing this is not as readable as
constructions bounded lexically at both ends.

Both Pascal and C have this fault, but only in a few constructions.
These faults were corrected in Modula: IF .. THEN .. ELSE .. ENDIF
It is a welcome relief to see while .. do .. done in ocaml:
it is not necessary to remember precedence rules that span
many lines of code.

Similarly, using separators like pascal's ";" is not as good
as terminators like C's ";". Clearly, these things are much harder
to edit (i.e, cut and paste doesn't work with separators, especially
ones that are visually terminators like ";" -- you can write


but it looks horrid compared to:


and then


is more 'symmetrical' visually.
For example, an expression like

        if x then y else z

cannot be augmented to

        if x then y; y' else z

instead, nonlocal editing is required:

        if x then begin y; y' end else z

Similarly, nested matches/try expressions always require begin/end:
unless the nesting is right at the end.

A general rule for a good syntax would be that augmentation
of some term of a construction only required local editing.
This means:

        terminators, not separators
        mandatory begin/end keywords

Of course, that would lead to a grossly OVER verbose syntax,
so it is certainly necessary to have _some_ infix operators
with precedences: a reasonable set is:

        unary +-
        binary */
        binary +-
        binary < > <= >= <> !=
        unary not
        binary and
        binary or

plus at most two or three more. (the listed set is
_already_ 7 levels of precedence!)

Anything more than that will conflict with more or less universal
conventions. For example, Python and C and Ocaml all have bitshift operations,
with _different_ precedences which lead to a fault in
a routine I had doing UTF8 conversions, it took an hour
to figure out that I needed more brackets.

A good technique is probably to have a lot of non-associating
operators, so that there is never any 'mental' ambiguity.
For example, the power operator should be unassociative:

        a ** b ** c

should be an error.

I think there is a point here that, because I know a lot
of programming languages, I get confused with the various
precedence rules: I JUST DON'T WANT TO KNOW. So I program
defensively: I put the brackets in to be sure of the semantics.
This is useful for readers who are also not experts in the
language as well.

It is even more confusing when there are prefix, infix,
and postfix operators: in C


is visually ambiguous: which is applied first, the * or the [1]?
I can never remember exactly, so I often add brackets:


and the same applies in ocaml. Indeed, [] is used in ocaml
for lists as well:

        b [ a.[1] ]

and here only the . tells that .[] is an infix/suffix operator,
whereas b [] is a function with a list argument, so the
[ in the argument is a prefix/suffix operator. :-)

Another visual problem I have is that I think symbols
(i.e. punctuation marks) should bind more strongly
than words: it looks ugly to put brackets around words:

        (fun x -> print_endline ..)

is ugly compared to:

        begin fun x -> print_endline .. end

but the ideal would be

        fun x is print_endline .. efun

because I can type it from left to write,
and augment it without non-local changes.

The point is if you see a complex expression with
lots of brackets and dots, it is hard to pick out what is
operating on what. In the above waffling, I have probably
contradicted myself several times: and that is probably
the point.

Of course, there are worse languages: the lambda calculus
being a case that is only marginally better than C declarations
in the competition for all time most unreadable languages :-)
[The prize winner, however, is certainly category theory :-]

John Skaller email:
                phone: 61-2-96600850
                snail: 10/1 Toxteth Rd, Glebe NSW 2037, Australia

This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:25 MET