Previous Up Next
Chapter 5 The revised syntax
The revised syntax is an alternative syntax for OCaml. Its purposes are 1/ fix some problems of the normal syntax (unclosed constructions sometimes introducing ambiguities, constructors arity, end of top level phrases and structure items, etc) 2/ avoid unjustified double constructions (":=" vs ``<-'', ``fun'' vs ``function'', ``begin..end'' vs parentheses) or concepts (types and types declarations) 3/ bring some ideas (lists, types). In a word, propose a syntax which be more logical, simpler, more consistent and easier to parse and to pretty print.

The revised syntax, being few used, is less constrained by the history than the normal one, and can try to answer the question: ``how things should be done'' instead of ``how to remain compatible with old versions''.

Other motivations are: 1/ show that syntax is just a ``shell'' of the language: you can change it without modifying the background 2/ experiment right to the end the ability of Camlp4 of doing syntax extensions.

It is a syntax of the complete language, therefore it can be used for all OCaml programs: by the way, Camlp4 is itself completely written in that syntax. Notice that it is not a constraint: it is always possible to convert from and to the normal syntax, using the pretty print facilities of Camlp4.

Remark: syntax in programming languages is much a question of personal taste. This syntax represents mine, with some ideas taken here and there. Some choices may seem arbitrary (other solutions are possible), but I tried to keep some consistency, and without being too far from the normal syntax: I guess that it is possible to understand a program written in revised syntax even without having read this chapter.

Most of the constructions in revised syntax are therefore the same than in the normal syntax. This chapter presents only the differences, and the motivations of them.

The quotations for OCaml syntax trees, which we shall see in next chapter, use the revised syntax.

5.1 Practical points

To compile the file foo.ml written in revised syntax, use:
            $ ocamlc -pp camlp4r foo.ml
To use the revised syntax in the toplevel, do:
            $ ocaml
            #load "camlp4r.cma";;
5.2 Phrases
Motivation of the simple semicolon

The double semicolon in OCaml exists for historical reasons: the first parsers were driven by the tokens, not by the rules: all constructions needed to have a specific token.

But because of the introduction of modules in OCaml, the double semicolon, which was mandatory in Caml Light to end sentences, became optional: the reason is that in OCaml, a ``phrase'' and a ``structure item'' are actually the same notion. The problem is that the double semicolon is associated with the idea of ``terminating'' something: for a phrase, it is exact, but not for a structure item inside a structure, since other structure items and the keyword ``end'' follow.

That choice of letting the double semicolon be optional in normal syntax has introduced several problems:
My opinion is that the structure items should end with a token in a context where there is never need to read another token. This ensures a correct behavior in the interactive toplevel. The fact that the sequence is closed, in the revised syntax, frees the simple semicolon. And a simple semicolon is perfectly acceptable inside structures and objects, to end their item, the same way they close a record item. In the revised syntax, this ending semicolon is mandatory.

It is easier to treat a language whose all phrases end with a token: at end of the sentences, the characters and the tokens streams are synchronized (no need to read an extra token to be sure that the phrase is ended). This property can bring simplifications in other treatments (extraction of comments or code for documentation, indentation, editors modes, interactive tools).

Motivation of ``value''

The choice of having a different keyword
value instead of let, for a toplevel value definition, is to mark the difference with the let..in construct. At toplevel, to see if it is a let or or let..in, we have to look at the end of the let binding.

In the abstract syntax tree,
let and let...in are very different: they do not even have the same type: let is a structure item, while let...in is an expression. This deserves to be more visible in the concrete syntax.

Why not
val instead of value? It is to be coherent with the other declarations type and exception, which are not abbreviations: we don't write typ for type declarations, nor exc for exception declarations.

5.3 Imperative constructions
Motivation of ``do'' and braces

First, the sequence needed to be closed. For the reason of the previous section (toplevel phrases), but also because there are too many ambiguities with other constructions. For example in the list:

     [ a; b; c ]
We know that it is the list of a, b and c. But it could be interpreted as a list one element, the sequence "a; b; c". In the grammar, it supposes that list items are not ``top'' expressions (expressions of the first level of the ``expr'' grammar entry): it is mandatory to use things like ``expression-1'' or ``simple expression'' in the grammar.

In revised syntax, this case never occurs: when a rule needs an expression, it always uses the top level of the ``expr'' entry. The grammar is then simpler and easier to read and understand.

The choice of
"do" followed by braces has something arbitrary. However, the keyword "do" let us easily think of something imperative (not functional). And the braces remind the sequence in the C language.

Why not
do..done? Question of taste. It could have been do..done. The idea is to remain relatively discrete. And the proposed construction saves a keyword.

Note that a
let...in in the sequence applies up to the end of the sequence, like in normal syntax. However, in normal syntax, because of the fact that the sequence is an opened construction, you can obtain strange results. In the example:

     if condition then
        a-simple-statement;
     statement-2;
     statement-3;
Let us suppose that you need to add a let binding for the ``simple statement'': if you just add it, this is what you see:

     if condition then
        let v = expr in
        a-simple-statement;
     statement-2;
     statement-3;
But what you get is actually:

     if condition then
        let v = expr in
        a-simple-statement;
        statement-2;
        statement-3;
The let has ``absorbed'' the rest of the sequence, which is now included in the if condition. To be correct, you need to add an enclosing begin..end or parentheses.

5.4 Tuples and lists
Motivation to close the tuples by parentheses

In mathematics, tuples are always between parentheses.

Moreover, it is in a general policy of the revised syntax: close more constructions: it is easier to read and don't need to learn certain subtle precedences levels.

Motivation for the syntax of lists

In revised syntax, the lists are always closed. Be a ``cons''
[a :: b] or an enumeration of all items [a; b; c], we always know syntactically where a list starts and when it ends.

This syntax have something similar of the lists in Lisp: the brackets are like the parentheses, the semicolons are like the spaces and the double colon is like the dot.

Moreover, the syntax:

     [ x; y; z :: t ]
is more understandable and more logical than the equivalent in normal syntax:

     x :: y :: z :: t
Indeed, reading it in normal syntax, the types are not clear: x, y and z are not of same type than t, we have to remember that this double colon is right associative, which is generally not natural. In revised syntax, x, y, and z are at the same level (separated by semicolons), different from the one of t (separated from the rest by the double colon).

In revised syntax, it is clear that
x, y and z are the first items of the list, because the syntax is identical when the list is ended by a ``cons'' and when it is not, what is not the case in normal syntax.

5.5 Irrefutable patterns

There is a notion of ``irrefutable patterns'' used by some syntactic constructions (next sections). Matching against these patterns never fails. An ``irrefutable pattern'' is either: Note that the term ``irrefutable'' does not apply to all patterns which never fail: constructors alone in their type definition, except ``()'', are not said ``irrefutable'' (the fact that they be alone or not cannot be determined at parsing time).

5.6 Constructions with matching
Motivation for one alone keyword ``fun''

The presence of
fun and function is somewhat strange, since they have the same semantics.

In revised syntax, by adding this notion of ``irrefutable patterns'', there is no ambiguity: a list not being an irrefutable pattern, the construction with brackets is not a parsing problem. When using an irrefutable pattern, there must be only one case, and therefore no close construction is necessary, allowing us to keep the simple frequent form:
fun x -> x.

Motivation to close the constructions

It is to avoid the problem of the ``dangling bar'' (the same than the ``dangling else'' in the ``if'' construct). In normal syntax, this program:

     match ... with
       case1 ->
         match ... with
           case11 -> ...
         | case12 -> ...
     | case2 -> ...
is wrongly interpreted: to obtain what you want, you need to use parentheses or begin..end to close the internal match construct. There is a same problem with the if construct, because of the optional else (see further).

I admit that the fact that all cases do not start with the same token (the first starting with a left brace, the other ones with a vertical bar) is not practical in editing programs: it is indeed complicated to exchange the first case and the other ones. However readability and absence of ambiguity are more important than easiness to use and absence of verbosity: when it is easy to edit but risk to introduce bugs or irregularities, it is not sure that it be better.

Why not close the construction by a keyword,
end for example, like the Ada language does? It is because an ending keyword gives an idea of something imperative, it does not make think that something is returned, which is however the case in the match construct, like most of OCaml constructs.

Motivation for the empty forms

The empty function is useful for initial cases of iterations or initial references values. It is not absolutely essential since it is possible to write:

     fun _ -> assert False
The empty match existed before the introduction of the assert construction in OCaml. Like the assert, it indicates the position of the error in the file.

These constructions are there because they are the limit when the number of the matching cases reach zero.

Motivation for irrefutable patterns in ``let''

In normal syntax, if you use a ``let'' binding with a non irrefutable pattern, you get a typing message ``pattern matching is not exhaustive''. If you want to be clean and add the missing cases, you have to torture your sources. Indeed, for example, the

     let x :: y = a in b
must be changed into:

     match a with x :: y -> b | ...
In revised syntax, since it is forbidden, you are never in this situation.

Motivation for the ``where'' construct

This construction existed in the old ``Caml'' V3.1 (whose development was stopped by the beginning of the 90ies) and I liked it much. There was a problem in this construct, because it was possible to add several bindings separated with ``and'', which sometimes could enter in conflict (another ``dangling'' case) with a possible ``and'' in an enclosing ``let'':

          let a =
             b where c = d
          and e = f in ...
In this situation, the ``where'' construct used to ``absorb'' the ``and'' of the ``let'' binding. The program was interpreted as:

          let a =
             b where c = d and e = f
          in ...
Because of that, in Caml Light and OCaml, the ``where'' construction were removed. But a ``where'' with only one binding could works. Anyway, having several bindings is not interesting nor useful nor readable, in this construction.

I personally use this construction in the case when the ``let'' binding is a function definition and the expression a call to this function. I generally prefer to write:

             loop 0 where rec loop i = ...
than the equivalent form:

             let loop i = ... in loop 0
I consider the form with where more readable in this situation.

5.7 Mutables and assignment
Motivation

Having two constructions for the assignment is abnormal. In normal syntax, the
":=", specific to the ref type, is an old rest of the time when references where implemented with a constructor (there were mutable constructors, then), and the codes to extract a reference value and to change it were complicated:

            match x with Ref x -> x
            match x with Ref x -> x <- y
It was then justified to have specific constructions "!x" and "x := y" for these cases. Now, references are implemented with a record type, and these constructions can be written:

            x.contents
            x.contents <- y
In normal syntax, there are 2 ways to access and assign references, although the method using the label ``contents'' is rarely used. In revised syntax, it is the only method. However, I consider ``contents'' as a too long identifier, it is why I changed it into ``val''. It is actually not a change in the definition of ref (since Camlp4 does only syntax), it is changed in the syntax trees, the real name of the field remaining ``contents''.

As
":=" is no more necessary with the semantics of assigning a reference value, it can be used in the place of "<-", a token less natural and introducing confusions (when we read it) with the "->" of the functions and pattern matchings.

The construction
!x is no more necessary either since we can write x.val. We then save two tokens which were used only for the reference type.

5.8 Types
Motivation for the applying order of type constructors

The order is to look like the constructors values: you can then read value in the same order than their types. The syntax with currification style is used also for value constructors.

Motivation for the abstract types syntax

It was to look like existential types, because abstract types are actually some kind of existential types. This may have a meaning if existential types are included one day in
OCaml.

Motivation for the parentheses around tuple types

Close more constructions. Closed like tuples are. Moreover it is more visible in constructor declarations to differentiate the case of two parameters and one parameter being a tuple.

Motivation for the constructor declaration type

The revised syntax have tried to be the most general possible, to plan the possible future extensions of the language.

Record types are closed by braces (no change). Symmetrically, the sum types (declaring constructors) are closed by brackets. This is also a way to consider them just as ``types''. We could imagine that they be authorized one day outside type declarations. For example like this:

            fun (x : [ A | B ]) -> ...
            type t = { lab : [ A | B ] }
            type u = [ C of { lab : ...} ]
The form of the last line is, by the way, the method used in the language SML, where record types are always anonymous.

In
Camlp4 abstract syntax, there is no notion of ``type declaration'': a type declaration is just a type. The fact that sum types and record types are accepted only in type declarations is done when converting into the abstract syntax which ocamlc uses.

Motivation for the empty type

As the type constructor definition is closed, it is possible to imagine the empty type. Not very useful, but we have it without any cost: a type inhabited by nothing (empty set).

Motivation for the currified syntax for constructors

This reflects the actual semantics. There are indeed two cases, and the values in the two cases are implemented differently. The arity of constructors are more clear.

In normal syntax, it is difficult to understand (and to explain) why if C is a constructor with two parameters, this is accepted:

            fun C (x, y) -> (x, y)
but not that:

            fun C x -> x
In revised syntax you have to write:

            fun [ C x y -> (x, y) ]
The revised syntax reflects the fact that the two parameters of the constructor C cannot be considered as a tuple.

This does not mean that the ``partial evaluation'' of constructors is accepted: accept it or not is a semantic issue, treated at
OCaml typing time.

Motivation for the uppercase for True and False

In normal syntax,
true and false are the only constructors which start with a lowercase letter. It is due to historical reasons: in Caml Light, no constructors (of any type) need to be capitalized. When OCaml was created, this was changed, but strangely, true and false escaped to this rule. They are even now considered as keywords, what they should not be, since they are not syntactic constructs or part of syntactic constructs.

In revised syntax, they must be written
True and False and are not keywords.

Motivation for mutable syntax in records

It is just to read: ``the label x is a mutable integer'' instead of ``the mutable label x is an integer'', which is less clear.


5.9 Modules

Modules application uses the currified form:
OCamlRevised
type t = Set.Make(M).t;;type t = (Set.Make M).t;


Motivation

Currification syntax is more natural in functional languages. There is no reason to have two different syntaxes for applications (whatever we apply): one with parentheses, one with currification.


5.10 Classes and objects

The classes and objects also have a revised syntax. To see it, the simplest way is to write examples in normal syntax and to convert them into revised syntax using the command:
     camlp4o pr_r.cmo file.ml
(documentation to be updated)

5.11 Miscellaneous
Motivation for the ``else''

The
else is mandatory to avoid the ``dangling else'' problem. In normal syntax, you can write:

    if a then
      if b then c
    else d
In the above program, the ``else d'' will actually corresponds to the ``if b'' not to the ``if a''. In revised syntax, the ``else'' being mandatory, the problem does not exist.

OCaml being a functional language, it is normal that the ``else'' case be mandatory: indeed if the condition is false, what is returned by the statement is not clear in normal syntax.

All these ``dangling'' problems cause also problems in pretty printing: it is not easy to know if the constructions have to be parenthesized or not. In revised syntax, there are no dangling problems and no problem in pretty printing. To pretty print in normal syntax, a solution had to be used, using an extra parameter transmitted in all functions.

We remark that in revised syntax, the
if construct is not closed, it does not need to be.

Motivation for the ``or'' and ``and'' operators

There is no reason to accept two syntaxes for the ``or'' operator and two for the ``and'' operator. The syntaxes
or and & are actually old constructions, kept for an old backward compatibility.

Motivation for the suppression of begin..end

In normal syntax, the construction with
begin and end is actually the same than the parentheses: often a question of personal taste. In normal syntax, when parenthesis is necessary, some programmers prefer "begin match...end", other "(match...)".

In revised syntax, the cases when such a parenthesization is necessary is much less frequent, since most constructions are already parenthesized. Two constructions for that are not necessary.

Motivation for syntax for alone operators

To avoid the case of the
* operator which must be specifically written with spaces around it, since (*) in lexically interpreted as a beginning of a comment.

Motivation for the fact that there are no automatic infixes

Since we are under Camlp4, we can use Camlp4 features.

Motivation for the ``declare'' construction

Essential when a syntax extension in
OCaml structure item generates several structure items. For example, if you make a syntax change in order that a type declaration generates 1/ the type declaration itself and 2/ functions to be applied to this type.

When converted into
OCaml normal syntax tree, this construct is inlined.

5.12 Streams and parsers
Motivation for the keyword "parser", rather than "parse"

Actually, it is not different from the choice of the normal syntax, since the same keyword is used.

The keyword ``parser'' is like ``function'', not like ``match''. The ``match'' and ``try'' statements are direct actions, with their immediate parameters. On the other hand, the parsers and functions are just ``concepts'': they are not immediately applied with their parameters. One must read: ``this is a parser'' just like ``this is a function''.

The word ``parse'' might have been used if the construction was ``parse xxx with''. This is written ``match xxx with parser'' in order to save a keyword.

Motivation for
[: instead of [<

It is a question of readability, because of the presence of quotations in our extended language, whose syntax use many ``less'' and ``greater'' characters. And it is a problem for a list of quoted things:

          [<:expr< xx >>; <:expr< yy >>]
Motivation for quotes and backquotes

Actually, this should have been done in
OCaml normal syntax, since from Caml Light to OCaml, the character used to enclose characters changed from backquote into right quote. It would have been then normal to invert that for the streams terminals, but it was forgotten.

In normal syntax, this creates sometimes problems in characters streams:

          parser [< '('a' | 'b') >] -> ...
The lexer interprets the first parenthesis as a character, which causes thus parsing error. You must add a space before the left parenthesis:

          parser [< ' ('a' | 'b') >] -> ...
In revised syntax, which backquotes, this problem does not appear.

Motivation for closing the syntax of parsers

To resolve the same problem of ``dangling bar'' than for functions, matches and tries. This syntax is closed the same way.

Motivation for the empty parser

Useful in initial cases in iterations or initial references values.




Previous Up Next