Chapter 4 Quotations
Quotations are expressions or patterns enclosed by special parentheses: <:id< and >> (where id is the quotation identifier). They can also be enclosed simply by << and >>.

Examples of quotations:
         <:expr< let a = b in c >>
         << [x](x y) >>
         <:myquot< quotations can be any text >>
The contents of quotations are not lexed: quotations themselves are tokens, exactly like strings. Therefore like the contents of strings, their contents do not have to respect any special lexing rule.

4.1 OCaml syntax extensions

In the previous chapter, we saw the grammar system of Camlp4. This was a first step on the way to writing syntax extensions in OCaml.

The second step is: how do we build OCaml syntax tree nodes? The immediate answer is: use the module defining them. Indeed, this module exists: its name is MLast. You can then try to understand it or... you can use quotations.

Let us suppose we want to generate the syntax tree of the OCaml expression: "let a = b in c". The version using the module MLast directly is:
          MLast.ExLet
            (loc, false, [MLast.PaLid (loc, "a"), MLast.ExLid (loc, "b")],
             MLast.ExLid (loc, "c"))
Not so complicated, perhaps. But you need directions for use detailing all tree nodes, their parameters, their usages. If you are courageous, you may try to look inside the code of Camlp4 to see how they are used.

But the quotation system of Camlp4 provides a handy way to represent these trees. In this system, if the right file is loaded, you are able to write the above example as:
           <:expr< let a = b in c >>
Simpler, isn't it? Everything inside is processed at compile time by Camlp4, which generates exactly the same code as the above ``long'' version.

Let us look in detail at the Camlp4 quotation system, then. It can be used for these tree nodes of OCaml syntax, but actually also for any other type. You can define your own quotations, using any syntax you choose.

4.2 Camlp4 quotation system

The contents of quotations are not lexed, as said above, but they are nevertheless processed at parse time. More precisely, they are not lexed by the OCaml lexer (of Camlp4), but they are analyzed by other functions, which are called ``quotation expanders''. These expanders just take a string as parameter (the contents of the quotation) and return a piece of OCaml program.

There are two kinds of quotation expanders, which do the same thing: one is easy to use but not general, and the other is general but requires some knowledge of... OCaml syntax tree quotations (them again).

Well... this second version needs the knowledge of quotations to learn quotations... and the OCaml syntax tree quotations are explained in chapter 6.

Ok, since we are stuck in a mutually recursive documentation, let us rather start with the ``simple'' quotations, the ones that are ``easy to use''. This will show you how quotation expanders work. In a second step, we explain the general quotations, which need MLast quotations to be defined.

4.3 Example: defining constants

The simple version of quotation expanders just returns strings. These returned strings are pieces of code in concrete syntax, like source code. And they *are* source code, in the sense that they need to be parsed after the quotation expansion.

Let us take a small example. We want to define constants by their name (this example is almost the equivalent of the #define in C for the case when a simple constant is defined).

We are going to create our examples in the toplevel. First load Camlp4 by typing (under the toplevel):
     #load "camlp4o.cma";;
Now, we can type this:
     # let expand _ s =
         match s with
           "PI" -> "3.14159"
         | "goban" -> "19*19"
         | "chess" -> "8*8"
         | "ZERO" -> "0"
         | "ONE" -> "1"
         | _ -> "\"" ^ s ^ "\""
       ;;
Let us call the quotation ``foo''. We can associate the quotation ``foo'' with the above expander ``expand'' by typing:
     # Quotation.add "foo" (Quotation.ExStr expand);;
We can experiment with the new quotation immediately:
     # <:foo<PI>>;;
     - : float = 3.14159
     # <:foo< hello, world >>;;
     - : string = " hello, world "
     # <:foo<ONE>> + <:foo<ONE>>;;           
     - : int = 2
     # let rec fact x =
         if x = <:foo<ZERO>> then <:foo<ONE>> else x * fact (x - 1)
       ;;
     val fact : int -> int = <fun>
And so on. But quotations can also be used as patterns:
     # let rec fib =
         function
           <:foo<ZERO>> | <:foo<ONE>> -> 1
         | n -> fib (n - 1) + fib (n - 2)
       ;;
     val fib : int -> int = <fun>
Notice that a given quotation does not have a fixed type: its type depends on what the quotation generates. In this case, it can be a float, an integer or a string. Notice also that the spaces inside quotations are significant: here the expander does not strip them, and therefore:
     # <:foo< PI >>;;
does not return the value of PI, because " PI " with a space on each side is not matched by "PI" without spaces. It returns instead:
     - : string = " PI "
If we wanted both cases to return the value of PI, we would have to write a more subtle quotation expander. A quotation expander can use any parsing technology: string pattern matching (as in our example), stream parsers, ocamllex, ocamlyacc, Camlp4 grammars... What is important is that it takes a string as parameter and returns a piece of program.
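As an illustration of such a ``more subtle'' expander, here is a minimal sketch in plain OCaml that ignores the blanks around the name before matching, so that <:foo<PI>> and <:foo< PI >> would expand to the same text. It assumes that String.trim from the standard library is available; everything else is copied from the expander above.

```ocaml
(* Sketch only: same constants as above, but blanks around the name
   are ignored when matching. *)
let expand _ s =
  match String.trim s with
    "PI" -> "3.14159"
  | "goban" -> "19*19"
  | "chess" -> "8*8"
  | "ZERO" -> "0"
  | "ONE" -> "1"
  | _ -> "\"" ^ s ^ "\""  (* unknown text still becomes a string literal, spaces kept *)
```

This function would be registered with Quotation.add exactly like the previous one.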

4.4 Quotations and the compiler

We just saw an example of quotation in the toplevel, but in this context compilation level and program level are mixed. When using the compiler ocamlc, the quotation system requires that the two levels be separated: the ``compiler part'' must be compiled beforehand. To test the above example, we have to copy the text of the function expand and the call to Quotation.add into a file, e.g. foo.ml, which must be compiled like this:
       ocamlc -I +camlp4 -c foo.ml
This creates an object file named foo.cmo. In Camlp4, all syntax extensions are done through OCaml object files. The preprocessor camlp4o takes a list of object files as its first arguments and loads them. Let us write a file fib.ml:
     (* file fib.ml *)
     let rec fib =
       function
         <:foo<ZERO>> | <:foo<ONE>> -> 1
       | n -> fib (n - 1) + fib (n - 2)
     ;;
As a first remark, we can see that the normal OCaml compiler does not know about quotations:
     $ ocamlc -c fib.ml
     File "fib.ml", line 4, characters 4-6:
     Syntax error
But Camlp4 does...
     $ ocamlc -pp camlp4o -c fib.ml
     File "fib.ml", line 4, characters 4-16:
     While expanding quotation "foo":
     Uncaught exception: Not_found
     Preprocessing error
... provided that the quotation expander object file is given as parameter (it must be written ./foo.cmo because camlp4 does not have the current directory in its default search path):
     $ ocamlc -pp "camlp4o ./foo.cmo" -c fib.ml
4.5 Pretty printing the result

How can we be sure that the quotations are correctly expanded? If you are perfecting a quotation expander, or if you already have one and want to see the results, you can ask camlp4 to pretty print the result.

For that, use camlp4o as a command with the predefined printing kit, named "pr_o.cmo":
     $ camlp4o ./foo.cmo pr_o.cmo fib.ml
     (* file fib.ml *)
     let rec fib =
       function
         0 | 1 -> 1
       | n -> fib (n - 1) + fib (n - 2)
     ;;
The quotations have been replaced by their value.

4.6 Quotations returning syntax trees

Our quotation expander, the function ``expand'', returns strings. Internally, when camlp4 encounters a quotation ``foo'' in the program text, it calls this function and gets the resulting string. This string is then parsed with the grammar entry ``expr'' (expressions) or ``patt'' (patterns).

But this has the following drawbacks:

1/ It needs a new parsing phase (which takes time; not much, but it is a pity).

2/ If the expander is badly written, the resulting string may be syntactically incorrect, and this is difficult to debug (however, see the option -QD of camlp4).

3/ It depends on the enclosing syntax: the same expander may work e.g. in revised syntax but not in normal syntax.

To illustrate point 2/, try typing this:
         # <:foo< to"to >>;;
The result is this strange message:
         # <:foo< to"to >>;;
           ^^^^^^^^^^^^^^^
         While parsing result of quotation "foo":
         (consider setting variable Pcaml.quotation_dump_file)
         Parse error: end of input expected after [expr] (in [expression])
This is because our quotation expander was too simple: it created a string containing a double quote, the contents of the quotation and another double quote, i.e.:
          " to"to "
The parser then fails on this input. Debugging this can sometimes be complicated, especially if the expander does not pretty print its results, or adds a lot of redundant parentheses, etc. Here, the solution would have been to use "String.escaped s" instead of "s" in the expander.
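To see the difference concretely, the following sketch in plain OCaml compares the two ways of building the string literal. The names quote_naive and quote_escaped are hypothetical helpers introduced only for this illustration; String.escaped is the standard library function mentioned above.

```ocaml
(* Sketch only: the two helper names are hypothetical. *)

(* What our expander does in its default case. *)
let quote_naive s = "\"" ^ s ^ "\""

(* The same, but special characters of s are escaped first. *)
let quote_escaped s = "\"" ^ String.escaped s ^ "\""

let () =
  print_endline (quote_naive " to\"to ");    (* prints " to"to "   *)
  print_endline (quote_escaped " to\"to ")   (* prints " to\"to "  *)
```

The first result is not a valid OCaml string literal, which is why the parser complains; the second one is.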

To avoid that, and all the other drawbacks, there is the other quotation system, the one where expanders return abstract syntax trees. In ``expand'', instead of returning the string "3.14159", we may want to say ``the syntax tree representation of the float number 3.14159''. In this case, no other parsing phase is needed, there is no risk of parse error, and the expander is independent of the enclosing syntax.

The way to create OCaml syntax trees is explained in chapter 6. They can be written in quotations, using the syntax extension kit named q_MLast.cmo.

The same quotation expander in our file foo.ml could be written:
     (* file foo.ml *)
     let loc = (0, 0);;
     let expand_expr s =
       match s with
         "PI" -> <:expr< 3.14159 >>
       | "goban" -> <:expr< 19 * 19 >>
       | "chess" -> <:expr< 8 * 8 >>
       | "ZERO" -> <:expr< 0 >>
       | "ONE" -> <:expr< 1 >>
       | _ -> <:expr< $str:s$ >>
     ;;
     let expand_patt s =
       match s with
         "PI" -> <:patt< 3.14159 >>
       | "ZERO" -> <:patt< 0 >>
       | "ONE" -> <:patt< 1 >>
       | _ -> <:patt< $str:s$ >>
     ;;
     Quotation.add "foo" (Quotation.ExAst (expand_expr, expand_patt))
This time we used ExAst instead of ExStr. This constructor needs two expanders: one for quotations in expression position and one for quotations in pattern position. Notice that the cases "goban" and "chess" are not in the pattern version, since 19*19 and 8*8 are not valid patterns.

The compilation of foo.ml needs the quotation expander kit q_MLast.cmo:
     $ ocamlc -pp "camlp4o q_MLast.cmo" -I +camlp4 -c foo.ml
This creates an object file "foo.cmo" which can be used to compile "fib.ml".

Just out of curiosity, you can pretty print the expander itself, using the pretty printing kit pr_o.cmo. Type:
     $ camlp4o q_MLast.cmo pr_o.cmo foo.ml
4.7 Example: lambda terms

We can now take a bigger example, bigger than just creating constants. We want to manipulate lambda terms. A lambda term can be defined by the following type:
     type term =
         Var of string
       | Func of string * term
       | Appl of term * term
     ;;
The first case, Var, represents variables.

The second case, Func, represents functions. Its first parameter is the function parameter and its second parameter is the function body. We write that in concrete syntax as [parameter]body.

The third case, Appl, represents the application of one lambda term to another. We write that in concrete syntax as (term1 term2).

But, for the moment, we have just defined a type term, and we can only write these terms using the constructors. Here is an example:
     let id = Func ("x", Var "x")
     let k = Func ("x", Func ("y", Var "x"))
     let s =
       Func ("x", Func ("y", Func ("z",
         Appl (Appl (Var "x", Var "y"), Appl (Var "x", Var "z")))))
     let delta = Func ("x", Appl (Var "x", Var "x"))
     let omega = Appl (delta, delta)
A nice quotation expander would allow us to use concrete syntax. The same piece of program could look like this, which is more readable:
     let id = << [x]x >>
     let k = << [x][y]x >>
     let s = << [x][y][z]((x y) (x z)) >>
     let delta = << [x](x x) >>
     let omega = << (^delta ^delta) >>
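To make the correspondence between the constructors and the concrete syntax tangible, here is a minimal sketch in plain OCaml of a printer going in the opposite direction, from a term value to its concrete syntax. The function name string_of_term is an assumption made for this illustration; it is not part of Camlp4.

```ocaml
type term =
    Var of string
  | Func of string * term
  | Appl of term * term

(* Sketch only: print a term in the concrete syntax described above,
   [parameter]body for functions and (term1 term2) for applications. *)
let rec string_of_term = function
    Var x -> x
  | Func (x, body) -> "[" ^ x ^ "]" ^ string_of_term body
  | Appl (t1, t2) -> "(" ^ string_of_term t1 ^ " " ^ string_of_term t2 ^ ")"

let delta = Func ("x", Appl (Var "x", Var "x"))
let omega = Appl (delta, delta)

let () = print_endline (string_of_term omega)
(* prints ([x](x x) [x](x x)) *)
```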
Let us write the corresponding quotation expander, then.

Here, the contents of our quotations are too complicated to be parsed just by string pattern matching. We could use a stream parser, but the simplest way is to use grammars.
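For comparison, here is a sketch of what a hand-written recursive-descent parser for this concrete syntax could look like in plain OCaml, without Camlp4. Note that it is not a quotation expander: it builds term values at run time, whereas an expander must return OCaml syntax trees at preprocessing time. The names parse_term and Parse_error, and the exact set of characters accepted in identifiers, are assumptions made for this illustration.

```ocaml
type term =
    Var of string
  | Func of string * term
  | Appl of term * term

(* Hypothetical exception: position in the string and message. *)
exception Parse_error of int * string

let is_lower c = (c >= 'a' && c <= 'z') || c = '_'
let is_ident_char c =
  is_lower c || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c = '\''

(* Sketch only: parse a whole string as a lambda term. *)
let parse_term (s : string) : term =
  let n = String.length s in
  (* Skip blanks starting at position i. *)
  let rec skip i =
    if i < n && (s.[i] = ' ' || s.[i] = '\t' || s.[i] = '\n') then skip (i + 1)
    else i
  in
  (* Check that character c comes next; return the position after it. *)
  let expect c i =
    let i = skip i in
    if i < n && s.[i] = c then i + 1
    else raise (Parse_error (i, Printf.sprintf "'%c' expected" c))
  in
  (* Read a lowercase identifier; return it with the position after it. *)
  let ident i =
    let i = skip i in
    if i < n && is_lower s.[i] then begin
      let j = ref (i + 1) in
      while !j < n && is_ident_char s.[!j] do incr j done;
      (String.sub s i (!j - i), !j)
    end
    else raise (Parse_error (i, "identifier expected"))
  in
  (* One case per syntactic form: [x]t, (t1 t2) and x. *)
  let rec term i =
    let i = skip i in
    if i >= n then raise (Parse_error (i, "term expected"))
    else
      match s.[i] with
        '[' ->
          let (x, i) = ident (i + 1) in
          let i = expect ']' i in
          let (t, i) = term i in
          (Func (x, t), i)
      | '(' ->
          let (t1, i) = term (i + 1) in
          let (t2, i) = term i in
          let i = expect ')' i in
          (Appl (t1, t2), i)
      | _ ->
          let (x, i) = ident i in
          (Var x, i)
  in
  let (t, i) = term 0 in
  (* Nothing but blanks may remain after the term. *)
  if skip i = n then t else raise (Parse_error (skip i, "end of input expected"))
```

Most of this code deals with positions and characters by hand, which is the part that a grammar and a ready-made lexer take care of.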

No need to write a lexer: the default lexer Plexer provided in the Camlp4 library fits. Using our knowledge of Camlp4 grammars (previous chapter), here is a quotation expander for the lambda terms (file named q_term.ml):
     let gram = Grammar.gcreate (Plexer.gmake ());;
     let term_eoi = Grammar.Entry.create gram "term";;
     let term = Grammar.Entry.create gram "term";;
     EXTEND
        term_eoi: [ [ x = term; EOI -> x ] ];
        term:
           [ [ "["; x = LIDENT; "]"; t = term -> <:expr< Func $str:x$ $t$ >>
             | "("; t1 = term; t2 = term; ")" -> <:expr< Appl $t1$ $t2$ >>
             | x = LIDENT -> <:expr< Var $str:x$ >> ] ]
        ;
     END;;
     let term_exp s = Grammar.Entry.parse term_eoi (Stream.of_string s);;
     let term_pat s = failwith "not implemented term_pat";;
     Quotation.add "term" (Quotation.ExAst (term_exp, term_pat));;
     Quotation.default := "term";;
A remark about this quotation expander: to compile q_term.ml, we need:

1/ the pa_extend.cmo syntax extension, for the EXTEND statement

2/ the q_MLast.cmo quotation extension, for the OCaml syntax trees quotations.

The compilation must then be done by the command:
     $ ocamlc -pp "camlp4o q_MLast.cmo pa_extend.cmo" -I +camlp4 \
          -c q_term.ml
Now we can use the lambda term quotation q_term.cmo we just created. Under the toplevel, you can load it (after having loaded camlp4o.cma) and use the term quotation directly. But, at this level, we need to have defined the type term, otherwise:
     # let id = << [x]x >>;;
                ^^^^^^^^^^
     Unbound constructor Func
Ok, enter the definition of the type term in the toplevel. Then:
     # let id = << [x]x >>;;
     val id : term = Func ("x", Var "x")

     # let k = << [x][y]x >>;;
     val k : term = Func ("x", Func ("y", Var "x"))

     # let s = << [x][y][z]((x y) (x z)) >>;;
     val s : term =
       Func
        ("x",
         Func
          ("y",
           Func
            ("z", Appl (Appl (Var "x", Var "y"), Appl (Var "x", Var "z")))))

     # let delta = << [x](x x) >>;;
     val delta : term = Func ("x", Appl (Var "x", Var "x"))
The definition of omega given in the initial example is a special case that we are going to see in the next section. For the moment, it just answers:
     # let omega = << (^delta ^delta) >>;;
                       ^
     While expanding quotation "term":
     Parse error: [term] expected after '(' (in [term])
Notice that the location of the syntax error is correct. This is due to the grammar system: in case of syntax error, the exception is wrapped in the exception Exc_located, which carries the error location. Receiving this error, the quotation expansion machinery just has to add the location of the quotation to be able to display the error location correctly in the input text.

Let us now look at the definition of the variable omega. The problem can be resolved with antiquotations.

4.8 Antiquotations

Antiquotation is a way to insert code inside quotations. Unlike quotations, antiquotations are not a predefined notion of Camlp4: it is just a programming technique.

In our example, we would like omega to be ``the application of delta to itself''. But when we say ``delta'', we don't mean ``a variable delta'' in the context of a lambda term (which would be Var "delta"), but ``the value of the variable delta'' previously defined. We want to insert its value (twice, in this example) to create the new lambda term.

In the very initial version, we had written:
     let omega = Appl (delta, delta)
Ok, we could use that, it is still correct, but as we have a system of quotations, we would like to represent that as concrete syntax, with an application (the two terms between parentheses).

In our concrete syntax, we need to add a specific case to specify ``a value of the enclosing environment''. Here we chose the caret sign ^ followed by an identifier.

We then have to add a grammar rule which says: ``if there is a caret sign followed by an identifier, return the syntax tree of the identifier itself considered as a variable''. The rule can be written inside the EXTEND statement:
       "^"; x = LIDENT -> <:expr< $lid:x$ >>
The "expr" quotation on the right-hand side represents the OCaml syntax tree of a variable whose name is the string held in x. See chapter 6. After adding this rule to the quotation expander and recompiling it, we can now test in the toplevel:
     # let delta = << [x](x x) >>;;                                        
     val delta : term = Func ("x", Appl (Var "x", Var "x"))

     # let omega = << (^delta ^delta) >>;;                         
     val omega : term =
       Appl
         (Func ("x", Appl (Var "x", Var "x")),
          Func ("x", Appl (Var "x", Var "x")))
4.9 Locations in antiquotations

Warning: this section is a little subtle and addresses a specific problem. You may skip it on a first reading if you find it too complicated or not interesting.

It is about the location of possible semantic errors. By default the whole quotation is underlined:
     # let omega = << (^delta ^xxx) >>;;
                   ^^^^^^^^^^^^^^^^^^^
     Unbound value xxx
However, the variable xxx has a location in the quotation. And the grammar system is supposed to take care of locations, via a variable named loc passed from the grammar rule to the action part, which is used by the quotations of OCaml syntax trees. But the precise locations have been lost. Why?

It is because the Camlp4 quotation machinery does not know whether the syntax trees you built have correct locations. It just receives a syntax tree, but does not know which technique you used [1]. The quotation expander might have inserted eccentric locations: in that case, if a semantic error occurs, the error location could be anywhere in the input text: inside the quotation but at the wrong place, outside the quotation, or possibly even outside the text.

To avoid this problem, the Camlp4 quotation machinery wisely scans the resulting syntax tree and puts the location of the whole quotation in all nodes (erasing the old ones): it is better to be sure that all semantic errors underline the whole quotation than to risk error messages pointing anywhere in the input text, or at the wrong parts of the quotation.

By default, thus, the quotation machinery does not trust the programmer of the quotation expander. There is however a way to tell it that the location is correct: you can specify in the resulting tree that some specific part contains correct locations.

This can be done by creating an ``antiquotation'' node. It can be written, for an expression e or a pattern p:
       <:expr< $anti:e$ >>
       <:patt< $anti:p$ >>
Here, in the rule:
       "^"; x = LIDENT -> <:expr< $lid:x$ >>
the right part is just the tree node for the identifier x. It contains the location of the identifier (including the caret, actually). But as it is not enclosed by an ``antiquotation'' node, the quotation machinery will erase the location and replace it by the location of the whole quotation.
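The behaviour just described can be pictured with a toy model in plain OCaml. This is only a sketch under stated assumptions: the types loc and expr and the functions shift and relocate are hypothetical, they are not the MLast types nor the actual Camlp4 code. The model overwrites every location with the location of the whole quotation, except under an antiquotation node, where the locations are trusted and simply shifted to their absolute position.

```ocaml
(* Toy model only: hypothetical types, not Camlp4's MLast. *)
type loc = int * int              (* (begin, end) character offsets *)

type expr =
    Lid of loc * string           (* identifier *)
  | App of loc * expr * expr      (* application *)
  | Anti of loc * expr            (* antiquotation node: inner locations are trusted *)

(* Shift all locations of a trusted subtree by sh characters. *)
let rec shift sh = function
    Lid ((bp, ep), x) -> Lid ((bp + sh, ep + sh), x)
  | App ((bp, ep), e1, e2) -> App ((bp + sh, ep + sh), shift sh e1, shift sh e2)
  | Anti ((bp, _), e) -> shift (sh + bp) e

(* qloc is the location of the quotation contents in the input text.
   Untrusted nodes get qloc; under Anti, locations are relative to the
   beginning of the antiquotation, itself relative to the quotation. *)
let rec relocate (qloc : loc) = function
    Lid (_, x) -> Lid (qloc, x)
  | App (_, e1, e2) -> App (qloc, relocate qloc e1, relocate qloc e2)
  | Anti ((bp, _), e) -> shift (fst qloc + bp) e

(* The contents " (^delta ^xxx) ", assumed to start at offset 15. *)
let tree =
  App ((1, 14),
       Anti ((3, 8), Lid ((0, 5), "delta")),
       Anti ((10, 13), Lid ((0, 3), "xxx")))

let relocated = relocate (15, 30) tree
(* App ((15, 30), Lid ((18, 23), "delta"), Lid ((25, 28), "xxx")) *)
```

In this model, an error on xxx would be reported at characters 25-28 only, while an error on the application node would still underline the whole quotation.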

We are therefore going to use such a node. But we first have to notice that the location of the rule is not exactly what we want, since it includes the caret sign and the possible spaces between it and the identifier. To be fully correct, we should use another grammar entry, e.g. antiquot, holding just the single case LIDENT, and create the antiquotation node there, where we are sure that the "loc" variable represents the location of the LIDENT.

The subtree under the antiquotation node is the tree representing the variable x. We cannot use <:expr< $lid:x$ >> directly, because it would take loc, which is the location of the antiquotation itself, whereas we need the location of the variable relative to the beginning of the antiquotation. Its location is therefore:
     (0, String.length x)
With all these remarks, the line for the antiquotation must then be:
       "^"; x = antiquot -> x
and the antiquot entry:
     antiquot:
       [ [ x = LIDENT ->
            let ast =
              let loc = (0, String.length x) in
              <:expr< $lid:x$ >>
            in
            <:expr< $anti:ast$ >> ] ]
     ;
The syntax of your antiquotations is whatever you want, provided you can isolate it from the rest of your quotation. In this example, the antiquotation is introduced by "^" and consists of just an identifier.

But you may want your antiquotations to be enclosed between some kind of ``parentheses''. In this case, you need a way to build the syntax tree of the antiquotation. For that, use:
     Grammar.Entry.parse Pcaml.expr_eoi (Stream.of_string s)
where s is the antiquotation string: this will create your antiquotation subtree (if your quotation is in a pattern, instead of an expression, use patt_eoi instead of expr_eoi).

For example, our predefined antiquotations building syntax trees (see chapter 6) are between two ``dollar'' signs, allowing users to write things like this:
    <:sig_item< value $x ^ string_of_int n$ : unit -> unit >>
To build the syntax tree of this antiquotation, our quotation ``sig_item'' applies the above call to the antiquotation string, where s is the contents of the antiquotation, i.e.:
     "x ^ string_of_int n"
Notice that the antiquotation subtree will, by default, inherit the location of the whole quotation. If you want it to have its own location (which is very useful in case of typing errors), don't forget to enclose it with the antiquotation node. You also need to enclose this call with a try...with, in case of syntax error in the antiquotation string, and recompute the error location:
     let ast =
       try Grammar.Entry.parse Pcaml.expr_eoi (Stream.of_string s) with
         Stdpp.Exc_located ((bp, ep), exc) ->
           Stdpp.raise_with_loc (fst loc + bp, fst loc + ep) exc
     in
     <:expr< $anti:ast$ >>
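The location arithmetic used in the handler above can be isolated in a small function. The sketch below is plain OCaml and assumes, as in the rest of this chapter, that a location is a pair of character offsets; the name shift_loc is hypothetical.

```ocaml
(* Sketch only: locations are (begin, end) character offsets.
   loc is the location of the antiquotation, (bp, ep) the location of
   an error relative to the beginning of the antiquotation string. *)
let shift_loc (loc : int * int) ((bp, ep) : int * int) : int * int =
  (fst loc + bp, fst loc + ep)

(* An error at characters 2-5 of an antiquotation starting at offset 20
   is reported at characters 22-25. *)
let example = shift_loc (20, 39) (2, 5)
```

Only the beginning of loc is used: the error position is counted from the start of the antiquotation, whatever its length.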
4.10 Example: lambda terms, finished

Here is now our quotation expander for lambda terms, including the antiquotation location system of the previous section and the pattern version:
    let gram = Grammar.gcreate (Plexer.gmake ());;
    let term_exp_eoi = Grammar.Entry.create gram "term";;
    let term_pat_eoi = Grammar.Entry.create gram "term";;
    EXTEND
      GLOBAL: term_exp_eoi term_pat_eoi;
      term_exp_eoi: [ [ x = term_exp; EOI -> x ] ];
      term_exp:
        [ [ "["; x = LIDENT; "]"; t = term_exp -> <:expr< Func $str:x$ $t$ >>
          | "("; t1 = term_exp; t2 = term_exp; ")" -> <:expr< Appl $t1$ $t2$ >>
          | x = LIDENT -> <:expr< Var $str:x$ >>
          | "^"; x = exp_antiquot -> x ] ]
      ;
      exp_antiquot:
        [ [ x = LIDENT ->
             let ast = let loc = (0, String.length x) in <:expr< $lid:x$ >> in
             <:expr< $anti:ast$ >> ] ]
      ;
      term_pat_eoi: [ [ x = term_pat; EOI -> x ] ];
      term_pat:
        [ [ "["; x = LIDENT; "]"; t = term_pat -> <:patt< Func $str:x$ $t$ >>
          | "("; t1 = term_pat; t2 = term_pat; ")" -> <:patt< Appl $t1$ $t2$ >>
          | x = LIDENT -> <:patt< Var $str:x$ >>
          | "^"; x = pat_antiquot -> x ] ]
      ;
      pat_antiquot:
        [ [ x = LIDENT ->
             let ast = let loc = (0, String.length x) in <:patt< $lid:x$ >> in
             <:patt< $anti:ast$ >> ] ]
      ;
    END;;
    let term_exp s = Grammar.Entry.parse term_exp_eoi (Stream.of_string s);;
    let term_pat s = Grammar.Entry.parse term_pat_eoi (Stream.of_string s);;
    Quotation.add "term" (Quotation.ExAst (term_exp, term_pat));;
    Quotation.default := "term";;
After compilation of this file, q_term.ml, some experiments in the toplevel (after having loaded camlp4o.cma and q_term.cmo and defined the type term):
     # let omega = << (^delta ^xxx) >>;;
                        ^^^^^
     Unbound value delta

     # let delta = << [x](x x) >>;;                                          
     val delta : term = Func ("x", Appl (Var "x", Var "x"))

     # let omega = << (^delta ^delta) >>;;
     val omega : term =
       Appl
        (Func ("x", Appl (Var "x", Var "x")),
         Func ("x", Appl (Var "x", Var "x")))

     # match omega with << (^a ^b) >> -> a | x -> x;;
     - : term = Func ("x", Appl (Var "x", Var "x"))
Many improvements could be made: for example, accepting the wildcard pattern "_" (any), in order to be able to write this last example as:
       match omega with << (^a ^_) >> -> a | x -> x;;
4.11 Conclusion

You now know the main points about quotations. On the road to writing syntax extensions in OCaml, we have reached the second step: the general quotation system of Camlp4. Now, we could look in detail at the specific predefined quotations for OCaml syntax trees. We already used some of them. However, as they use the ``revised syntax'', we must introduce it first.


[1] The quotation machinery receives the quotation expander via the function Quotation.add: for it, the expander is just a function; it does not know how it is implemented.

