Quotations are expressions or patterns enclosed by special
parentheses: <:id< and >> (id is a quotation identifier). They can
also be enclosed by << and >>.
Examples of quotations:
<:expr< let a = b in c >>
<< [x](x y) >>
<:myquot< quotations can be any text >>
The contents of quotations are not lexed: quotations themselves are
tokens, exactly like strings. Therefore like the contents of strings,
their contents do not have to respect any special lexing rule.
4.1 OCaml syntax extensions
In the previous chapter, we saw the grammar system of Camlp4. This was
a first step on the way to writing syntax extensions in OCaml.
The second step is: how do we build OCaml syntax tree nodes? The
immediate answer is: use the module defining them. Indeed, this
module exists: its name is MLast. You can then try to
understand it or... you can use quotations.
Let us suppose we want to generate the syntax tree of the OCaml
expression: "let a = b in c". The version using the module MLast
directly is:
MLast.ExLet
  (loc, false, [MLast.PaLid (loc, "a"), MLast.ExLid (loc, "b")],
   MLast.ExLid (loc, "c"))
Not so complicated, perhaps. But you need documentation detailing
all tree nodes, their parameters, and their usage. If you are
courageous, you may try to look inside the code of Camlp4 to see how
they are used.
But the quotation system of Camlp4 provides a handy way to represent
these trees. In this system, if the right file is loaded, you are able
to write the above example as:
<:expr< let a = b in c >>
Simpler, isn't it? Everything inside is processed at compile time by
Camlp4, which generates exactly the same code as the above ``long''
version.
Let us look in detail at the Camlp4 quotation system, then. It can be
used for these tree nodes of OCaml syntax, but actually for any other
type as well. You can define your own quotations, using any syntax you
choose.
4.2 Camlp4 quotation system
The contents of quotations are not lexed, as said above, but they are
nevertheless processed at parse time. More precisely, they are not
lexed by the OCaml lexer (of Camlp4), but they are analyzed by other
functions, which are called ``quotation expanders''. These expanders
just take a string as parameter (the contents of the quotation) and
return a piece of OCaml program.
There are two kinds of quotation expanders, which do the same thing:
one is easy to use but not general, and the other is general but
requires knowledge of... OCaml syntax tree quotations (them again).
Well... this second version needs the knowledge of quotations to learn
quotations... and the OCaml syntax tree quotations are explained
later, in chapter 6.
Ok, since we are stuck in a mutually recursive documentation, let us
rather start with the ``simple'' quotations, the ones ``easy to
use''. This will explain how quotation expanders work. In a second
step, we explain the general quotations, which need MLast quotations
to be defined.
4.3 Example: defining constants
The simple version of quotation expanders just returns strings. These
returned strings are pieces of code, in concrete syntax, like source
code. And they *are* source code, in the sense that they need to be
parsed after the quotation expansion.
Let us take a small example. We want to define constants by their name
(this example is almost the equivalent of #define in C for
the case when a simple constant is defined).
We are going to create our examples in the toplevel, after typing
(under the toplevel):
#load "camlp4o.cma";;
Now, we can type this:
# let expand _ s =
    match s with
      "PI" -> "3.14159"
    | "goban" -> "19*19"
    | "chess" -> "8*8"
    | "ZERO" -> "0"
    | "ONE" -> "1"
    | _ -> "\"" ^ s ^ "\""
  ;;
Let us call the quotation ``foo''. We can associate the quotation
``foo'' with the above expander ``expand'' by typing:
# Quotation.add "foo" (Quotation.ExStr expand);;
We can experiment with the new quotation immediately:
# <:foo<PI>>;;
- : float = 3.14159
# <:foo< hello, world >>;;
- : string = " hello, world "
# <:foo<ONE>> + <:foo<ONE>>;;
- : int = 2
# let rec fact x =
    if x = <:foo<ZERO>> then <:foo<ONE>> else x * fact (x - 1)
  ;;
val fact : int -> int = <fun>
And so on. But quotations can also be used as patterns:
# let rec fib =
    function
      <:foo<ZERO>> | <:foo<ONE>> -> 1
    | n -> fib (n - 1) + fib (n - 2)
  ;;
val fib : int -> int = <fun>
Note that a given quotation does not have a fixed type: its type
depends on what the quotation generates. In this case, it can be
a float, an integer, or a string. Notice also that the spaces
inside quotations are significant. Here the expander does not strip
them, and therefore, since " PI " with two spaces around is not
matched by "PI" without spaces, the following does not return the
value of PI:
# <:foo< PI >>;;
- : string = " PI "
If we wanted both cases to return the value of PI, we would have to
write a more subtle quotation expander. A quotation expander can
use any parsing technology: string pattern matching (as in our
example), stream parsers, ocamllex, ocamlyacc, Camlp4 grammars... What
is important is that it takes a string as parameter and returns a
piece of program.
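As a minimal sketch of such a more subtle expander, the variant below
trims the quotation contents before matching them. It relies on
String.trim from the OCaml standard library (present in OCaml 4.00 and
later, which is an assumption about the compiler in use) and, as a
choice of this sketch, keeps the untrimmed text in the fallback case:

```ocaml
(* Same expander as above, but spaces around the name are ignored. *)
let expand _ s =
  match String.trim s with
  | "PI" -> "3.14159"
  | "goban" -> "19*19"
  | "chess" -> "8*8"
  | "ZERO" -> "0"
  | "ONE" -> "1"
  | _ -> "\"" ^ s ^ "\""   (* fallback: the original text as a string literal *)
```

With this version, both <:foo<PI>> and <:foo< PI >> would expand to
the text 3.14159.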
4.4 Quotations and the compiler
We just saw an example of quotation in the toplevel, but in this
context, compilation level and program level are mixed. When using the
compiler ocamlc, the quotation system requires that the two
levels be separated:
- The quotation expander and the statement Quotation.add are
  directives to the compiler.
- The usages of quotations are in the user program.
The ``compiler part'' must be compiled beforehand. To test the above
example, we have to copy the text of the function expand and
the call to Quotation.add into a file, e.g. foo.ml, which
must be compiled like this:
$ ocamlc -I +camlp4 -c foo.ml
This creates an object file named foo.cmo. In Camlp4, all
syntax extensions are done through OCaml object files. The
preprocessor camlp4o takes a list of object files as first
arguments and loads them. Let us write a file fib.ml:
(* file fib.ml *)
let rec fib =
  function
    <:foo<ZERO>> | <:foo<ONE>> -> 1
  | n -> fib (n - 1) + fib (n - 2)
;;
As a first remark, we can see that the normal OCaml compiler does not
know about quotations:
$ ocamlc -c fib.ml
File "fib.ml", line 4, characters 4-6:
Syntax error
But Camlp4 does...
$ ocamlc -pp camlp4o -c fib.ml
File "fib.ml", line 4, characters 4-16:
While expanding quotation "foo":
Uncaught exception: Not_found
Preprocessing error
... provided the quotation expander object file is given as parameter
(it must be written ./foo.cmo because camlp4 does not
have the current directory in its default search path):
$ ocamlc -pp "camlp4o ./foo.cmo" -c fib.ml
4.5 Pretty printing the result
How can we be sure that the quotations are correctly expanded? If you
are developing a quotation expander, or if you already have one and
want to see its results, you can ask camlp4 to pretty print the
result.
For that, use camlp4o as a command with the predefined printing
kit, named "pr_o.cmo":
$ camlp4o ./foo.cmo pr_o.cmo fib.ml
(* file fib.ml *)
let rec fib =
  function
    0 | 1 -> 1
  | n -> fib (n - 1) + fib (n - 2)
;;
The quotations have been replaced by their value.
4.6 Quotations returning syntax trees
Our quotation expander, the function ``expand'', returns
strings. Internally, when camlp4 encounters a quotation ``foo''
in the program text, it calls this function and gets the resulting
string. This string is parsed with the grammar entry ``expr''
(expressions) or ``patt'' (patterns).
But this has the following drawbacks:
1/ It needs a new parsing phase (which takes time: not much, but it is
a pity).
2/ If the expander is badly written, the resulting string may be
syntactically incorrect, and this is difficult to debug (however, see
the option -QD of camlp4).
3/ It depends on the enclosing syntax: the same expander may
work e.g. in revised syntax but not in normal syntax.
To illustrate point 2/, try to type this:
# <:foo< to"to >>;;
The result is this strange message:
# <:foo< to"to >>;;
  ^^^^^^^^^^^^^^^
While parsing result of quotation "foo":
(consider setting variable Pcaml.quotation_dump_file)
Parse error: end of input expected after [expr] (in [expression])
This is because our quotation expander was too simple: it created a
string containing a double quote, the contents of the quotation, and
another double quote, i.e.:
" to"to "
The parser then fails with this input. Debugging this can sometimes be
complicated, especially if the expander does not pretty print its
results, or adds a lot of redundant parentheses, etc. Here, the
solution would have been to use "String.escaped s" instead of
"s" in the expander.
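A sketch of that fix, assuming the same string-returning expander as
before: only the fallback case changes, so that the quotation contents
are escaped before being wrapped in double quotes.

```ocaml
let expand _ s =
  match s with
  | "PI" -> "3.14159"
  | "goban" -> "19*19"
  | "chess" -> "8*8"
  | "ZERO" -> "0"
  | "ONE" -> "1"
  | _ -> "\"" ^ String.escaped s ^ "\""   (* escape quotes and backslashes *)
```

With this version, the quotation <:foo< to"to >> expands to the text
" to\"to ", which is a valid string literal.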
To avoid that, and all the other drawbacks, there is the other
quotation system, the one where expanders return abstract syntax
trees. In ``expand'', instead of returning the string
"3.14159", we may want to say ``the syntax tree representation
of the float number 3.14159''. In this case, there is no need for
another parsing phase, no risk of parse error, and the result is
independent of the enclosing syntax.
The way to create OCaml syntax trees is explained in chapter
6. They can be written in quotations, using the syntax
extension kit named q_MLast.cmo.
The same quotation expander in our file foo.ml could be written:
(* file foo.ml *)
let loc = (0, 0);;
let expand_expr s =
  match s with
    "PI" -> <:expr< 3.14159 >>
  | "goban" -> <:expr< 19 * 19 >>
  | "chess" -> <:expr< 8 * 8 >>
  | "ZERO" -> <:expr< 0 >>
  | "ONE" -> <:expr< 1 >>
  | _ -> <:expr< $str:s$ >>
;;
let expand_patt s =
  match s with
    "PI" -> <:patt< 3.14159 >>
  | "ZERO" -> <:patt< 0 >>
  | "ONE" -> <:patt< 1 >>
  | _ -> <:patt< $str:s$ >>
;;
Quotation.add "foo" (Quotation.ExAst (expand_expr, expand_patt))
This time we used ExAst instead of ExStr. This
constructor needs two expanders: one for the quotations in expression
position and one for the quotations in pattern position. Notice
that the cases "goban" and "chess" are not in the
pattern version since 19*19 and 8*8 are not correct patterns.
The compilation of foo.ml needs the quotation expander kit
q_MLast.cmo:
$ ocamlc -pp "camlp4o q_MLast.cmo" -I +camlp4 -c foo.ml
This creates an object file "foo.cmo" which can be used to
compile "fib.ml".
Just out of curiosity, you can pretty print the expander itself, using
the pretty printing kit pr_o.cmo. Type:
$ camlp4o q_MLast.cmo pr_o.cmo foo.ml
4.7 Example: lambda terms
We can now take a bigger example, bigger than just creating
constants. We want to manipulate lambda terms. A lambda term can be
defined by the following type:
type term =
    Var of string
  | Func of string * term
  | Appl of term * term
;;
The first case, Var, represents variables.
The second case, Func, represents functions. Its first
parameter is the function parameter and its second parameter the
function body. We write that in concrete syntax as [parameter]body.
The third case, Appl, represents an application of two lambda
terms. We write that in concrete syntax as (term1 term2).
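To make the correspondence between the constructors and this concrete
syntax explicit, here is a small sketch of a printer going from the
type term to the notation just described. The function name
string_of_term is chosen for illustration only; it is not part of
Camlp4.

```ocaml
type term =
  | Var of string
  | Func of string * term
  | Appl of term * term

(* Print a term in the concrete syntax: [parameter]body and (term1 term2). *)
let rec string_of_term = function
  | Var x -> x
  | Func (x, body) -> "[" ^ x ^ "]" ^ string_of_term body
  | Appl (t1, t2) -> "(" ^ string_of_term t1 ^ " " ^ string_of_term t2 ^ ")"

let () =
  let self_app = Func ("x", Appl (Var "x", Var "x")) in
  print_endline (string_of_term self_app)   (* prints [x](x x) *)
```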
But, for the moment, we have just defined a type term, and we can
only write these terms using the constructors. Here is an example:
let id = Func ("x", Var "x")
let k = Func ("x", Func ("y", Var "x"))
let s =
  Func ("x", Func ("y", Func ("z",
    Appl (Appl (Var "x", Var "y"), Appl (Var "x", Var "z")))))
let delta = Func ("x", Appl (Var "x", Var "x"))
let omega = Appl (delta, delta)
A nice quotation expander would allow us to use concrete syntax. The
same piece of program could look like this, which is more readable:
let id = << [x]x >>
let k = << [x][y]x >>
let s = << [x][y][z]((x y) (x z)) >>
let delta = << [x](x x) >>
let omega = << (^delta ^delta) >>
Let us write the corresponding quotation expander, then.
Here, the contents of our quotations are too complicated to be parsed
just by string pattern matching. We could use a stream parser, but the
simplest way is to use grammars.
There is no need to write a lexer: the default lexer Plexer provided
in the Camlp4 library fits. Using our knowledge (previous chapter)
about Camlp4 grammars, here is a quotation expander for the lambda
terms (file named q_term.ml):
let gram = Grammar.gcreate (Plexer.gmake ());;
let term_eoi = Grammar.Entry.create gram "term";;
let term = Grammar.Entry.create gram "term";;
EXTEND
  term_eoi: [ [ x = term; EOI -> x ] ];
  term:
    [ [ "["; x = LIDENT; "]"; t = term -> <:expr< Func $str:x$ $t$ >>
      | "("; t1 = term; t2 = term; ")" -> <:expr< Appl $t1$ $t2$ >>
      | x = LIDENT -> <:expr< Var $str:x$ >> ] ]
  ;
END;;
let term_exp s = Grammar.Entry.parse term_eoi (Stream.of_string s);;
let term_pat s = failwith "not implemented term_pat";;
Quotation.add "term" (Quotation.ExAst (term_exp, term_pat));;
Quotation.default := "term";;
Several remarks about the text of this quotation expander:
- Inside the expr quotations (action parts in the entry term), you
  see things between dollar signs: these are antiquotations.
  Antiquotations allow inserting pieces of expressions (or patterns)
  inside quotations. We shall see that very soon.
- Still in the quotations, the constructors Func and Appl are applied
  without parentheses. This is because inside the predefined
  quotations for OCaml syntax trees, the ``revised syntax'' is used.
  This syntax is close to the normal OCaml syntax but has some small
  differences, and constructor application is one of them: in revised
  syntax, you have to write Func x y instead of Func (x, y). See
  chapter 5.
- The final statement Quotation.default := "term" tells camlp4
  that "term" is the default quotation, allowing us to use it
  between << and >>.
- Notice that you don't need to define the type term in this
  file: the constructors are used only in the syntax trees, which
  internally represent them as strings.
- The pattern version is not written; we shall add it in
  the final version.
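For readers who want to check their reading of the three grammar rules
without Camlp4, the sketch below mirrors them as a hand-written
recursive-descent parser in plain OCaml. It is an illustration under
simplifying assumptions, not how the expander works: it builds term
values at run time instead of syntax trees at preprocessing time, it
only accepts identifiers made of lowercase letters, digits and
underscores, and all function names (skip, ident, expect, term, parse)
are hypothetical.

```ocaml
type term =
  | Var of string
  | Func of string * term
  | Appl of term * term

(* Skip spaces starting at position i. *)
let rec skip s i =
  if i < String.length s && s.[i] = ' ' then skip s (i + 1) else i

(* Read an identifier at position i; return it with the next position. *)
let ident s i =
  let n = String.length s in
  let is_ident_char c =
    match c with 'a' .. 'z' | '0' .. '9' | '_' -> true | _ -> false in
  let rec stop j = if j < n && is_ident_char s.[j] then stop (j + 1) else j in
  let j = stop i in
  if j = i then failwith "identifier expected";
  (String.sub s i (j - i), j)

(* Check that character c comes next; return the position after it. *)
let expect c s i =
  let i = skip s i in
  if i < String.length s && s.[i] = c then i + 1
  else failwith (Printf.sprintf "'%c' expected" c)

(* One case per grammar rule: [x]t, (t1 t2), and a plain identifier. *)
let rec term s i =
  let i = skip s i in
  if i >= String.length s then failwith "term expected"
  else
    match s.[i] with
    | '[' ->
        let (x, i) = ident s (skip s (i + 1)) in
        let i = expect ']' s i in
        let (t, i) = term s i in
        (Func (x, t), i)
    | '(' ->
        let (t1, i) = term s (i + 1) in
        let (t2, i) = term s i in
        (Appl (t1, t2), expect ')' s i)
    | _ ->
        let (x, i) = ident s i in
        (Var x, i)

(* Counterpart of term_eoi: a term followed by the end of the input. *)
let parse s =
  let (t, i) = term s 0 in
  if skip s i <> String.length s then failwith "end of input expected";
  t
```

For instance, parse "[x](x x)" returns the value
Func ("x", Appl (Var "x", Var "x")).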
To compile q_term.ml, we need:
1/ the pa_extend.cmo syntax extension, for the EXTEND statement
2/ the q_MLast.cmo quotation extension, for the OCaml syntax
tree quotations.
The compilation must then be done by the command:
$ ocamlc -pp "camlp4o q_MLast.cmo pa_extend.cmo" -I +camlp4 \
    -c q_term.ml
Now we can use the lambda term quotation q_term.cmo we just
created. Under the toplevel, you can load it (after having loaded
camlp4o.cma) and use the term quotation directly. But, at this
level, we need to have defined the type term, otherwise:
# let id = << [x]x >>;;
           ^^^^^^^^^^
Unbound constructor Func
Ok, enter the definition of the type term in the toplevel. Then:
# let id = << [x]x >>;;
val id : term = Func ("x", Var "x")
# let k = << [x][y]x >>;;
val k : term = Func ("x", Func ("y", Var "x"))
# let s = << [x][y][z]((x y) (x z)) >>;;
val s : term =
  Func
    ("x",
     Func
       ("y",
        Func
          ("z", Appl (Appl (Var "x", Var "y"), Appl (Var "x", Var "z")))))
# let delta = << [x](x x) >>;;
val delta : term = Func ("x", Appl (Var "x", Var "x"))
The definition of omega given in the initial example is a
special case that we are going to see in the next section. For the
moment, it just answers:
# let omega = << (^delta ^delta) >>;;
                  ^
While expanding quotation "term":
Parse error: [term] expected after '(' (in [term])
Note that the location of the syntax error is correct. This
is due to the grammar system: in case of syntax error, the error
exception is wrapped in the exception Exc_located, which
carries the error location. Receiving this error, the quotation
expansion machinery just has to add the location of the quotation to
be able to display the error location correctly in the input text.
4.8 Antiquotations
Let us now look at this definition of the variable omega. It can be
handled with antiquotations.
Antiquotation is a way to insert code inside quotations. Unlike
quotations, antiquotations are not a predefined notion of Camlp4: they
are just a programming technique.
In our example, we would like omega to be ``the application of
delta to itself''. But when we say ``delta'', we don't mean ``a
variable delta'' in the context of a lambda term (which would be
Var "delta"), but ``the value of the variable delta''
previously defined. We want to insert its value (twice, in this
example) to create the new lambda term.
In the very initial version, we had written:
let omega = Appl (delta, delta)
Ok, we could use that, it is still correct, but as we have a system of
quotations, we would like to represent that in concrete syntax, with
an application (the two terms between parentheses).
In our concrete syntax, we need to add a specific case to denote ``a
value of the enclosing environment''. Here we chose the caret sign
^ followed by an identifier.
We then have to add a grammar rule which says: ``if there is a caret
sign followed by an identifier, return the syntax tree of the
identifier itself considered as a variable''. The rule can be written
inside the EXTEND statement:
"^"; x = LIDENT -> <:expr< $lid:x$ >>
The "expr" quotation on the right-hand side represents the OCaml
syntax tree of a variable whose name is the string held in x. See
chapter 6.
Adding this rule in the quotation expander, recompiling it, we can now
test in the toplevel:
# let delta = << [x](x x) >>;;
val delta : term = Func ("x", Appl (Var "x", Var "x"))
# let omega = << (^delta ^delta) >>;;
val omega : term =
  Appl
    (Func ("x", Appl (Var "x", Var "x")),
     Func ("x", Appl (Var "x", Var "x")))
4.9 Locations in antiquotations
Warning: this section is a little bit subtle and addresses a specific
problem. You may skip it if you find it too complicated or not
interesting on a first reading.
It is about the location of possible semantic errors. By default the
whole quotation is underlined:
# let omega = << (^delta ^xxx) >>;;
              ^^^^^^^^^^^^^^^^^^^
Unbound value xxx
However the variable xxx has a location in the quotation. And
the grammar system is supposed to take care of locations, via a
variable named loc transmitted from the grammar rule to the
action part, which is used by the quotations of OCaml syntax
trees. But the precise locations have been lost. Why?
It is because the Camlp4 quotation machinery does not know whether the
syntax trees you built have correct locations. It just receives a
syntax tree, but does not know which technique you used (1).
The quotation expander might have inserted eccentric locations: in
this case, in case of semantic error, the error location could be
anywhere in the input text, inside the quotation but at a wrong place,
or outside the quotation, even possibly outside the text.
To avoid this possible problem, the Camlp4 quotation machinery wisely
scans the resulting syntax tree and puts the location of the whole
quotation in all nodes (erasing the old ones): it is better to be sure
that all semantic errors underline the whole quotation than to risk
that the error messages point anywhere in the input text, or refer to
the wrong parts of the quotation.
By default, thus, the quotation machinery does not trust the
programmer of the quotation expander. There is however a way to tell
it that the location is correct: you can specify in the resulting tree
that some specific part contains correct locations.
This can be done by creating an ``antiquotation'' node. It can be
written, for an expression e or a pattern p:
<:expr< $anti:e$ >>
<:patt< $anti:p$ >>
Here, in the rule:
"^"; x = LIDENT -> <:expr< $lid:x$ >>
the right part is just the tree node for the identifier x. It contains
the location of the identifier (including the caret, actually). But as
it is not enclosed by an ``antiquotation'' node, the quotation
machinery will erase the location and replace it by the location of
the whole quotation.
We are then going to use such a node. But we have to remark first
that the location of the rule is not exactly what we want, since it
includes the caret sign and the possible spaces between it and the
identifier. To be fully correct, we should use another grammar entry,
e.g. antiquot, holding just one case LIDENT, and create
the antiquotation there, where we are sure that the "loc"
variable represents the location of the LIDENT.
The subtree under the antiquotation node is the tree representing the
variable x. We cannot use <:expr< $lid:x$ >> directly,
because it would take loc, which is the location of the
antiquotation itself, and we need the location of the variable
relative to the beginning of the antiquotation. Its location is then:
(0, String.length x)
With all these remarks, the line for the antiquotation must then be:
"^"; x = antiquot -> x
and the antiquot entry:
antiquot:
  [ [ x = LIDENT ->
        let ast =
          let loc = (0, String.length x) in
          <:expr< $lid:x$ >>
        in
        <:expr< $anti:ast$ >> ] ]
;
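The relocation behaviour described in this section can be pictured
with the toy model below. It is only a sketch in plain OCaml: the
types loc and expr and the functions shift, trust and relocate are
hypothetical and are not Camlp4's real data structures. It also
assumes, for simplicity, that the location of an antiquotation node is
counted from the start of the quotation, and that the node itself is
dropped once its contents have been relocated.

```ocaml
(* Toy model, not Camlp4's real syntax tree. *)
type loc = int * int

type expr =
  | Lid of loc * string        (* identifier *)
  | App of loc * expr * expr   (* application *)
  | Anti of loc * expr         (* antiquotation: locations inside are trusted *)

(* Turn a relative location into an absolute one. *)
let shift off (bp, ep) = (off + bp, off + ep)

(* Inside an antiquotation: keep the locations, shifted by the offset. *)
let rec trust off = function
  | Lid (l, x) -> Lid (shift off l, x)
  | App (l, f, a) -> App (shift off l, trust off f, trust off a)
  | Anti (_, e) -> trust off e

(* Elsewhere: overwrite every location with the quotation location qloc. *)
let rec relocate qloc = function
  | Lid (_, x) -> Lid (qloc, x)
  | App (_, f, a) -> App (qloc, relocate qloc f, relocate qloc a)
  | Anti (l, e) -> trust (fst qloc + fst l) e
```

In this model, an identifier under an antiquotation node keeps a
precise location, while every other node ends up with the location of
the whole quotation.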
The syntax of your antiquotations is whatever you want, provided you
can isolate it from the rest of your quotation. In this example, the
antiquotation is introduced by "^" and the antiquotation is just
an identifier.
But you may want your antiquotations to be between some kind of
``parentheses''. In this case, you need a way to build the syntax
tree of the antiquotation. For that, use:
Grammar.Entry.parse Pcaml.expr_eoi (Stream.of_string s)
where s is the antiquotation string: this will create your
antiquotation subtree (if your quotation is in a pattern, instead
of an expression, use patt_eoi instead of expr_eoi).
For example, our predefined antiquotations building syntax trees (see
chapter 6) are between two ``dollar'' signs, allowing users
to write things like this:
<:sig_item< value $x ^ string_of_int n$ : unit -> unit >>
To build the syntax tree of this antiquotation, our quotation
``sig_item'' applies the above call to the antiquotation
string, where s is the contents of the antiquotation, i.e.:
"x ^ string_of_int n"
Notice that the antiquotation subtree will, by default, inherit the
location of the whole quotation. If you want it to have its own
location (which is very useful in case of typing errors), don't
forget to enclose it with the antiquotation node; you also need to
enclose this call with a try...with in case of syntax error in
the antiquotation string, and recompute the error location:
let ast =
  try Grammar.Entry.parse Pcaml.expr_eoi (Stream.of_string s) with
    Stdpp.Exc_located ((bp, ep), exc) ->
      Stdpp.raise_with_loc (fst loc + bp, fst loc + ep) exc
in
<:expr< $anti:ast$ >>
4.10 Example: lambda terms, finished
Here is now our quotation expander for lambda terms, including the
antiquotation location system of the previous section and the pattern
version:
let gram = Grammar.gcreate (Plexer.gmake ());;
let term_exp_eoi = Grammar.Entry.create gram "term";;
let term_pat_eoi = Grammar.Entry.create gram "term";;
EXTEND
  GLOBAL: term_exp_eoi term_pat_eoi;
  term_exp_eoi: [ [ x = term_exp; EOI -> x ] ];
  term_exp:
    [ [ "["; x = LIDENT; "]"; t = term_exp -> <:expr< Func $str:x$ $t$ >>
      | "("; t1 = term_exp; t2 = term_exp; ")" -> <:expr< Appl $t1$ $t2$ >>
      | x = LIDENT -> <:expr< Var $str:x$ >>
      | "^"; x = exp_antiquot -> x ] ]
  ;
  exp_antiquot:
    [ [ x = LIDENT ->
          let ast = let loc = (0, String.length x) in <:expr< $lid:x$ >> in
          <:expr< $anti:ast$ >> ] ]
  ;
  term_pat_eoi: [ [ x = term_pat; EOI -> x ] ];
  term_pat:
    [ [ "["; x = LIDENT; "]"; t = term_pat -> <:patt< Func $str:x$ $t$ >>
      | "("; t1 = term_pat; t2 = term_pat; ")" -> <:patt< Appl $t1$ $t2$ >>
      | x = LIDENT -> <:patt< Var $str:x$ >>
      | "^"; x = pat_antiquot -> x ] ]
  ;
  pat_antiquot:
    [ [ x = LIDENT ->
          let ast = let loc = (0, String.length x) in <:patt< $lid:x$ >> in
          <:patt< $anti:ast$ >> ] ]
  ;
END;;
let term_exp s = Grammar.Entry.parse term_exp_eoi (Stream.of_string s);;
let term_pat s = Grammar.Entry.parse term_pat_eoi (Stream.of_string s);;
Quotation.add "term" (Quotation.ExAst (term_exp, term_pat));;
Quotation.default := "term";;
After compilation of this file, q_term.ml, here are some experiments
in the toplevel (after having loaded camlp4o.cma and q_term.cmo and
defined the type term):
# let omega = << (^delta ^xxx) >>;;
                   ^^^^^
Unbound value delta
# let delta = << [x](x x) >>;;
val delta : term = Func ("x", Appl (Var "x", Var "x"))
# let omega = << (^delta ^delta) >>;;
val omega : term =
  Appl
    (Func ("x", Appl (Var "x", Var "x")),
     Func ("x", Appl (Var "x", Var "x")))
# match omega with << (^a ^b) >> -> a | x -> x;;
- : term = Func ("x", Appl (Var "x", Var "x"))
Many improvements could be made, for example accepting the wildcard
pattern "_" (any) in order to be able to write this last
example:
match omega with << (^a ^_) >> -> a | x -> x;;
You now know the main points about quotations. On the road to writing
syntax extensions in OCaml, we have reached the second step:
the general quotation system of Camlp4. Now, we could look in detail
at the specific predefined quotations for OCaml syntax trees. We
already used some of them. However, as they use the ``revised
syntax'', we must introduce it first.
(1) The quotation machinery receives the quotation expander via the
function Quotation.add: for it, the expander is just a function; it
does not know how it is implemented.