Chapter 1 Introduction

Camlp4 is a preprocessor for OCaml. With it, you can write syntax extensions for your OCaml programs. But Camlp4 also provides some other features: Camlp4 is syntax, syntax, syntax. It uses its own syntax system to implement its own syntax extensions: it is highly bootstrapped. Camlp4 stops at the syntax level: it knows nothing about semantics, typing, or code generation (for Camlp4, a type definition is just a syntactic construct that starts with ``type'').

The ``p4'' in the name ``camlp4'' stands for the four ``p''s of ``Pre-Processor-Pretty-Printer''.

1.1 Extending the syntax of OCaml

To start at the beginning, let us see how to make simple syntax extensions to OCaml. If you know the C language, you have probably used the #define construct, which is very easy to use:
     #define FOO xyzzy
and all occurrences of FOO in the rest of the program are replaced by xyzzy.

In Camlp4, it is not so simple. A syntax extension is not just text replacement: it extends an entry of the grammar of the language, and you need to create syntax trees.

It is therefore necessary to know, first, the grammar system provided by Camlp4 and, second, how to create syntax trees. That is what this tutorial covers. Once these two points have been described, we will have the tools to write syntax extensions of the language.
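To make the contrast with text replacement concrete, here is a minimal sketch in plain OCaml of what ``creating a syntax tree'' means. The expr type and the repeat_until function are hypothetical names chosen for this illustration: they are a toy tree, not the tree types that Camlp4 actually uses. The sketch assumes that ``repeat body until cond'' should mean ``run body once, then run it again while cond is false''.

```ocaml
(* Toy expression tree: a hypothetical stand-in, NOT Camlp4's real syntax tree. *)
type expr =
  | Var of string              (* an identifier *)
  | Not of expr                (* boolean negation *)
  | Seq of expr * expr         (* e1; e2 *)
  | While of expr * expr       (* while cond do body done *)

(* Build the tree for "repeat body until cond" as
   body; while not cond do body done *)
let repeat_until body cond =
  Seq (body, While (Not cond, body))

(* Print a tree back as OCaml-like text, to inspect what was built. *)
let rec to_string = function
  | Var x -> x
  | Not e -> "not (" ^ to_string e ^ ")"
  | Seq (a, b) -> to_string a ^ "; " ^ to_string b
  | While (c, b) -> "while " ^ to_string c ^ " do " ^ to_string b ^ " done"
```

For instance, to_string (repeat_until (Var "step") (Var "finished")) gives "step; while not (finished) do step done". The point is that the extension produces a structured value, which later stages can inspect, rather than a modified string of characters.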

If you are impatient and want to create your syntax extension within the next quarter of an hour without learning all of this, consider taking the text of an existing syntax extension and changing it for your own needs. A syntax extension is not necessarily a long program (for example, adding the repeat..until construction of Pascal takes 6 lines), and you can guess ``how it works'' and ask the wizards...

Examples are given in chapter 7.

However, if you are reading this manual, you may be interested in learning the original grammar system that Camlp4 provides. It can be used for purposes other than extending the OCaml language: for your own grammars. This grammar system is an alternative to yacc: the approach is different, but you can describe your language in a similar way.

First, the practical matters (what do I type to experiment?).

1.2 Using Camlp4 as a command and in the toplevel

You must first know that camlp4 is a command. This chapter does not explain all the details of the command and its options: they are covered later (chapter 8; you can also read the man page by typing "man camlp4" in your shell).

For the moment, here is a magic incantation to compile a file named foo.ml:
     ocamlc -pp "camlp4o pa_extend.cmo" -I +camlp4 -c foo.ml
This command compiles foo.ml as a normal OCaml file, except that the parsing is done by camlp4. The first examples in this documentation (grammars in Camlp4) can be compiled using this command. The other examples are given with the correct command to use to compile the files.

Another (recommended) way is to use the OCaml toplevel. In the toplevel, type:
            #load "camlp4o.cma";;
            #load "pa_extend.cmo";;
You can type the examples of this documentation in the toplevel. You can also type them in files and use the directive #use to include them.

All the examples in this documentation are written in the normal syntax of OCaml. If you know and prefer the revised syntax provided by Camlp4, change camlp4o into camlp4r in the ocamlc command, or load "camlp4r.cma" instead of "camlp4o.cma" in the toplevel.

1.3 Linking applications using Camlp4 libraries

Many examples in this tutorial use some specific Camlp4 libraries. In the toplevel, you don't need to load them because they are in the file camlp4o.cma.

To link a standalone application, you need to add the library named gramlib.cma from the Camlp4 library directory. The command is:
            ocamlc -I +camlp4 gramlib.cma <the_files_you_link>

1.4 Differences in parsing behavior

Even if you use the normal syntax, there are some small differences in the parsing behavior between the normal ocamlc parser (bottom up, LALR parsing) and the camlp4 parser (top down, recursive descent parsing). These differences appear notably when giving erroneous input. As a trivial example, suppose that you wanted to type
   (* correct intended input *)
   type t = Buf of Buffer.t
          | Str of string
Instead, you forgot the second occurrence of the of keyword and typed
   (* file wrongsyntax.ml : wrong input - missing keyword *)
   type t = Buf of Buffer.t
          | Str (*missing "of"*) string
The ocamlc compiler [1] (invoked as ocamlc -c wrongsyntax.ml) finds a syntax error on the string word; it parses the whole file as a single type declaration and finds a syntax error inside it.

The camlp4 parser (with ordinary syntax), invoked as ocamlc -c -pp camlp4o wrongsyntax.ml, does not find any (shallow) syntax error, but parses the above input as two items:
type t = Buf of Buffer.t
       | Str
which is a correct type declaration (different from the one intended by the author), followed by a simple expression
            string
which is understood as let _ = string, and produces the following message: Unbound value string
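The way a top-down parser can end up splitting the input like this can be sketched with a toy recursive descent parser in plain OCaml. Everything here is an assumption made for illustration: the token type, the constr record, and the two functions are hypothetical and far simpler than the real camlp4 lexer and grammar (for instance, an argument type is a single lowercase name rather than a path such as Buffer.t). The sketch only shows the general mechanism: each rule consumes what it can, and the tokens it cannot use are left for the next item.

```ocaml
(* Toy tokens: hypothetical, much simpler than the real OCaml lexer. *)
type token =
  | Uident of string   (* capitalized identifier, e.g. Str *)
  | Lident of string   (* lowercase identifier, e.g. string *)
  | Kw_of              (* the "of" keyword *)
  | Bar                (* the "|" separator *)

(* A constructor declaration: a name and an optional argument type name. *)
type constr = { name : string; arg : string option }

(* Parse one constructor; return it together with the unconsumed tokens. *)
let parse_constr = function
  | Uident c :: Kw_of :: Lident ty :: rest ->
      Some ({ name = c; arg = Some ty }, rest)
  | Uident c :: rest ->
      (* No "of" follows: accept a constant constructor. *)
      Some ({ name = c; arg = None }, rest)
  | _ -> None

(* Parse constructors separated by bars. The rule stops, without reporting
   an error, at the first token that cannot continue the declaration. *)
let rec parse_constrs tokens =
  match parse_constr tokens with
  | None -> ([], tokens)
  | Some (c, Bar :: rest) ->
      let (cs, rest') = parse_constrs rest in
      (c :: cs, rest')
  | Some (c, rest) -> ([c], rest)
```

Given the tokens for ``Buf of buffer | Str string'', parse_constrs returns the two constructors Buf (with an argument) and Str (constant), and leaves the token for string unconsumed. In this toy model, that leftover token is what the next rule would then try to read as a separate item.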


[1] The interactive toplevel ocaml has the same behavior, unless you load a different parser.

