camlp5/revised syntax questions
Date: 2009-10-10 (12:31)
From: blue storm <bluestorm.dylc@g...>
Subject: Re: [Caml-list] camlp5/revised syntax questions
On Thu, Oct 8, 2009 at 4:39 PM, Aaron Bohannon wrote:
> Thanks for your detailed reply.  I had a suspicion I would have to
> read the source code to get all of the necessary documentation.

It is actually possible to pretty-print the grammar rules during
camlp* execution. For example, here is the code I gave to the toplevel
(using ocamlfind/findlib) to print the default "expr" grammar and
levels :

  #use "topfind";;
  #camlp4o;;  (* topfind directive that loads camlp4; assumed needed here *)
  open Camlp4.PreCast;;
  Gram.Entry.print Format.std_formatter Syntax.expr;;

This is probably camlp4-specific, but the printing routine is
documented for camlp5, so equivalent code should work. I however prefer
to read the source code, which is easier to browse and contains more
information (pretty-printing shows the parsing rules, but not the parse
actions).

> However, I'm still missing some basic point here.
> On Wed, Oct 7, 2009 at 4:16 PM, blue storm wrote:
>> The different "level names" are not absolute among all camlp4 grammars
>> : they're a property of each grammar rule of each grammar. If you want
>> to "modify" a specific grammar (that is, EXTEND it), you must check
>> the different levels available in the definition.
> Yes, I understand that.  But how do you specify which grammar your
> file is extending?  My file is structured like this:
> #load "pa_extend.cmo";
> #load "q_MLast.cmo";
> open Pcaml;
> EXTEND
>  GLOBAL: expr;
>  ...
> END;
> So where did I specify whether I was extending the original syntax or
> the revised syntax (or some other grammar entirely)?  I suppose I must
> have implicitly chosen the original syntax because my code works fine
> on that.

The syntax extension mechanism is imperative in nature: the EXTEND
statement works on an existing grammar and adds, changes, or deletes
rules (see the camlp5 documentation). More precisely, the EXTEND syntax
is itself a camlp4 extension, which gets desugared to a plain OCaml
expression that modifies the given Grammar.Entry.t values (in an
imperative way).

The revised and classical syntaxes are designed as syntax extensions
that extend an empty grammar, which already contains some (empty)
grammar entries. They first clear every entry of that grammar (probably
to make sure it's really empty), then add by extension every syntactic
construct of the OCaml language. They get compiled to pa_o.cmo and
pa_r.cmo, which you can pass to camlp4 to choose one of the two
syntaxes:
  camlp4 pa_o.cmo my_extension.cmo ...

What happens here is that:
  - camlp4 starts with an empty OCaml grammar
  - you link it to pa_o.cmo, which gets executed and sets up the
classical syntax (by mutation of the (empty) grammar entries)
  - you then add your own extension, which makes additional mutations

In essence, the effect of your extension depends on the side effects
that were done before. If pa_o.cmo or pa_r.cmo was passed as a
parameter, you build upon their syntax rules, but it can be the case
that an additional syntax extension was added before yours, and thus
you're actually working upon slightly modified syntax rules.
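The order dependence described above can be sketched with a toy model.
This is not the camlp4 API: the record type and the three load_*
functions are hypothetical stand-ins for a grammar entry, pa_o.cmo, and
two extensions, each of which simply mutates shared state when it runs.

```ocaml
(* Toy model only: this is NOT the camlp4 API. A grammar entry is a
   mutable list of rule names, and each "module" mutates it when run. *)
type entry = { mutable rules : string list }

(* The initially empty entry, standing in for camlp4's empty "expr". *)
let expr = { rules = [] }

(* Stand-in for pa_o.cmo: clear the entry, then install the base rules. *)
let load_pa_o () =
  expr.rules <- [];
  expr.rules <- [ "let"; "if"; "apply" ]

(* A hypothetical extension that happens to be loaded before ours. *)
let load_other_extension () = expr.rules <- expr.rules @ [ "try-finally" ]

(* Our own hypothetical extension: it builds on whatever is there so far. *)
let load_my_extension () = expr.rules <- expr.rules @ [ "my-rule" ]

let () =
  load_pa_o ();
  load_other_extension ();
  load_my_extension ();
  print_endline (String.concat ", " expr.rules)
```

Changing the order of the three calls changes what load_my_extension
sees, which is the situation with modules passed to camlp4 on the
command line.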

camlp4o and camlp4r are just packaged versions of camlp4, with
"pa_o.cmo" and "pa_r.cmo" respectively implicitly linked.

In general, reasonably local syntax extensions tend to work on both
the classical and the revised syntax (because their syntax rules are
quite similar). If your extension depends on one of the syntaxes, you
should specify it. If your extension tries to delete a rule which is
not present in the syntax you're extending, you will get a runtime
error (for example, trying to delete the "where"-related rule in the
classical syntax).
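A minimal sketch of why such a deletion fails only at run time, again
as a toy model rather than real camlp4 code. The delete_rule function
and the two rule sets are hypothetical; the only assumption taken from
the text is that a "where" rule exists in one syntax and not the other.

```ocaml
(* Toy model only, not the camlp4 API: an entry is a mutable rule list. *)
type entry = { mutable rules : string list }

(* Deleting an absent rule fails when the code runs, not when it compiles. *)
let delete_rule entry rule =
  if List.mem rule entry.rules then
    entry.rules <- List.filter (fun r -> r <> rule) entry.rules
  else failwith ("delete_rule: no such rule: " ^ rule)

(* Hypothetical rule sets: only the revised one has a "where" rule. *)
let revised = { rules = [ "let"; "where" ] }
let classical = { rules = [ "let" ] }

(* Run the deletion and report the outcome as a string. *)
let try_delete entry =
  match delete_rule entry "where" with
  | () -> "ok"
  | exception Failure msg -> "error: " ^ msg

let () =
  print_endline (try_delete revised);
  print_endline (try_delete classical)
```

The same extension code type-checks in both cases; whether it works
depends on the state of the entry at the moment it is executed.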

> 1) In the parsing rule for the simple dot notation...
>      | e1 = SELF; "."; e2 = SELF -> <:expr< $e1$ . $e2$ >> ]
> ...why is the field label an "expr"?  This does not agree with the
> OCaml manual, which has a separate syntactic category for "field",
> nor with my intuition about the meaning of the code.

I suppose this presentation was chosen to make the grammar rules simpler.

Camlp4 parsers are not tied to the documented OCaml grammar. Camlp4
grammars for OCaml (you can use camlp4 to parse other languages,
without necessarily starting from the OCaml grammar) use a
camlp4-specific OCaml AST, which then gets translated to the specific
AST the OCaml compiler expects (when no camlp4 preprocessing is needed,
the OCaml compiler uses its own yacc parser, which directly produces the
compiler's AST).
There are actually subtle differences in parsing (for example "let id
x = x in id fun _ -> ()" gets rejected by the non-camlp4 parser but
parses fine under camlp4 and camlp5), and I don't think any of them is
"right": they are all tied to implementation-specific parsing
strategies (weird recursive descent for camlp{4,5} and yacc), and I'm
not sure even the yacc version rigorously respects the documented BNF.

> 2) Furthermore, as one can see from the ":=" entry above, the entire
> left side of a record update is parsed as its own subexpression.  So
> this means, that in the context of a record update, that subexpression
> has to be interpreted as a reference, but in other contexts, the very
> same expression must be interpreted as a value.  I don't necessarily
> care what kind of magic makes this possible on the back end, but I am
> wondering whether this has any implications for modifying the record
> syntax.

I'm not sure what you mean here, but I'm under the impression that
you're confusing the syntactic representation of the expression with
its runtime/compile-time semantics. Camlp* knows nothing of the meaning
of the code it produces; the output is an AST, which has no idea of what
"reference" and "value" mean. The semantics of the given code depend on
the deeper passes of the compiler (for example typing), which probably
have an internal language of their own, and surely distinguish between
lvalues and rvalues.
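To illustrate the separation, here is a small sketch with a toy AST
(hypothetical, much simpler than camlp4's or the compiler's). The same
syntactic node appears both as the target of an update and as a plain
value; only a later pass, here a trivial classifier named roles, gives
each occurrence a meaning based on its position in the tree.

```ocaml
(* Toy AST, not camlp4's: the parser's output has no notion of
   "reference" versus "value". *)
type expr =
  | Var of string
  | Int of int
  | Field of expr * expr   (* e1 . e2, with the label parsed as an expr *)
  | Assign of expr * expr  (* e1 <- e2 *)

(* One syntactic node, used in two positions. *)
let r_x = Field (Var "r", Var "x")
let update = Assign (r_x, Int 1)  (* r.x <- 1 : r_x is the target *)
let read = r_x                    (* r.x      : r_x is read as a value *)

(* A later pass decides how each occurrence is used. *)
let rec roles e =
  match e with
  | Assign (lhs, rhs) -> ("target", lhs) :: roles rhs
  | Var _ | Int _ | Field _ -> [ ("value", e) ]

let () =
  List.iter (fun (role, _) -> print_endline role) (roles update);
  List.iter (fun (role, _) -> print_endline role) (roles read)
```

The parser builds r_x identically in both cases; nothing in the node
itself records whether it will be written to or read from.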