
Steps of Compilation

An executable file is obtained by translating and linking as described in figure 7.1.

  Source program  
preprocessing  
  Source program  
compiling  
  Assembly program  
assembling  
  Machine instructions  
linking  
  Executable code  

Figure 7.1: Steps in the production of an executable.

First, preprocessing replaces certain pieces of text with other text according to a system of macros. Next, compilation translates the source program into assembly instructions, which are then assembled into machine instructions. Finally, linking connects the program to the operating system primitives it uses. This includes adding the runtime library, which mainly consists of memory management routines.

The Objective CAML Compilers

The code generation phases of the Objective CAML compiler are detailed in figure 7.2. The internal representation of the code generated by the compiler is called an intermediate language (IL).

  Sequence of characters  
lexical analysis  
  Sequence of lexical elements  
parsing  
  Syntax tree  
semantic analysis  
  Annotated syntax tree  
generation of intermediate code  
  Sequence of IL  
optimization of intermediate code  
  Sequence of IL  
generation of pseudo code  
  Assembly program  

Figure 7.2: Compilation stages.

The lexical analysis stage transforms a sequence of characters into a sequence of lexical elements. These lexical elements correspond principally to integers, floating point numbers, characters, strings of characters and identifiers. The message Illegal character may be produced by this stage.
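As an illustration of this stage, here is a minimal sketch of a hand-written lexer for a tiny language of integers, identifiers and a few symbols. The token type, the exception Illegal_character and the function tokenize are hypothetical names chosen for this example; they are not taken from the Objective CAML compiler, whose lexer handles many more lexical elements.

```ocaml
(* Hypothetical token type for a toy language; the real compiler
   distinguishes many more kinds of lexical elements. *)
type token =
  | INT of int
  | IDENT of string
  | PLUS
  | STAR
  | LPAREN
  | RPAREN

(* Raised when a character cannot start any token. *)
exception Illegal_character of char

let is_digit c = c >= '0' && c <= '9'
let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_'

(* Turn a string into a list of tokens, skipping blanks. *)
let tokenize (s : string) : token list =
  let n = String.length s in
  (* First index >= i at which predicate p no longer holds. *)
  let rec span p i = if i < n && p s.[i] then span p (i + 1) else i in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '+' -> go (i + 1) (PLUS :: acc)
      | '*' -> go (i + 1) (STAR :: acc)
      | '(' -> go (i + 1) (LPAREN :: acc)
      | ')' -> go (i + 1) (RPAREN :: acc)
      | c when is_digit c ->
          let j = span is_digit i in
          go j (INT (int_of_string (String.sub s i (j - i))) :: acc)
      | c when is_letter c ->
          let j = span (fun c -> is_letter c || is_digit c) i in
          go j (IDENT (String.sub s i (j - i)) :: acc)
      | c -> raise (Illegal_character c)
  in
  go 0 []
```

With these definitions, tokenize "x1 + 42" yields the three tokens IDENT "x1", PLUS and INT 42, while an input containing a character such as # raises Illegal_character.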

The parsing stage constructs a syntax tree and verifies that the sequence of lexical elements is correct with respect to the grammar of the language. The message Syntax error indicates that the phrase analyzed does not follow the grammar of the language.
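The following sketch shows how a syntax tree might be built from a sequence of lexical elements for a toy grammar of sums and products. It uses recursive descent, which is a simplification chosen for this example and not necessarily the technique used by the Objective CAML compiler. The types token and expr, the exception Syntax_error and the parse functions are hypothetical names.

```ocaml
(* Hypothetical lexical elements and syntax tree for a toy grammar:
     expr ::= term { "+" term }
     term ::= atom { "*" atom }
     atom ::= INT | "(" expr ")"                                     *)
type token = INT of int | PLUS | STAR | LPAREN | RPAREN

type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr

(* Raised when the tokens do not follow the grammar. *)
exception Syntax_error

(* Each function returns the tree it built and the remaining tokens. *)
let rec parse_expr tokens =
  let lhs, rest = parse_term tokens in
  parse_expr_tail lhs rest

and parse_expr_tail lhs = function
  | PLUS :: rest ->
      let rhs, rest' = parse_term rest in
      parse_expr_tail (Add (lhs, rhs)) rest'
  | rest -> (lhs, rest)

and parse_term tokens =
  let lhs, rest = parse_atom tokens in
  parse_term_tail lhs rest

and parse_term_tail lhs = function
  | STAR :: rest ->
      let rhs, rest' = parse_atom rest in
      parse_term_tail (Mul (lhs, rhs)) rest'
  | rest -> (lhs, rest)

and parse_atom = function
  | INT n :: rest -> (Num n, rest)
  | LPAREN :: rest -> (
      match parse_expr rest with
      | e, RPAREN :: rest' -> (e, rest')
      | _ -> raise Syntax_error)
  | _ -> raise Syntax_error

(* A phrase is correct only if every token is consumed. *)
let parse tokens =
  match parse_expr tokens with
  | e, [] -> e
  | _ -> raise Syntax_error
```

In this sketch the grammar itself encodes the precedence of * over +, so the tokens for 1 + 2 * 3 produce Add (Num 1, Mul (Num 2, Num 3)). An incomplete phrase such as 1 + raises Syntax_error.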

The semantic analysis stage traverses the syntax tree, checking another aspect of program correctness. The analysis consists principally of type inference, which, if successful, produces the most general type of an expression or declaration. Type error messages may occur during this phase. This stage also detects whether any expression in a sequence, other than the last, is not of type unit. Other warnings may result, including those from pattern matching analysis (e.g., pattern matching is not exhaustive, part of pattern matching will not be used).
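The short program below illustrates these checks from the programmer's side. It is a sketch written so that it compiles cleanly; the comments point out the variants that would be rejected or would typically draw a warning. All names (id, count_and_report, first_or_zero) are chosen for this example.

```ocaml
(* Type inference gives id its most general type: 'a -> 'a. *)
let id x = x

(* Because the type is general, id can be applied at several types. *)
let n = id 3
let s = id "three"

(* A declaration such as
     let bad = 1 + "one"
   would be rejected with a type error, since + expects two integers. *)

(* In a sequence e1; e2 the expression e1 is expected to have type unit.
   Writing List.length xs; ... directly would typically draw a warning,
   so the value is discarded explicitly with ignore. *)
let count_and_report xs =
  ignore (List.length xs);
  print_endline "done"

(* Both list constructors are covered. Omitting the [] case would
   typically draw a "pattern matching is not exhaustive" warning. *)
let first_or_zero = function
  | [] -> 0
  | x :: _ -> x
```

Removing the call to ignore, or deleting the [] case, leaves a program that still compiles but is expected to produce the corresponding warning.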

The generation and optimization of intermediate code do not produce error or warning messages.
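To give an idea of what an optimization over intermediate code can look like, here is a minimal sketch of constant folding on a toy intermediate language. Both the il type and the choice of constant folding are assumptions made for this example; the text does not say which optimizations the Objective CAML compilers perform, nor what their IL looks like.

```ocaml
(* Hypothetical intermediate language for arithmetic expressions. *)
type il =
  | Const of int
  | Var of string
  | Add of il * il
  | Mul of il * il

(* Constant folding: evaluate at compile time every operation whose
   operands are both known constants. The function is total, so this
   pass never reports an error. *)
let rec fold = function
  | Const n -> Const n
  | Var x -> Var x
  | Add (a, b) -> (
      match fold a, fold b with
      | Const x, Const y -> Const (x + y)
      | a', b' -> Add (a', b'))
  | Mul (a, b) -> (
      match fold a, fold b with
      | Const x, Const y -> Const (x * y)
      | a', b' -> Mul (a', b'))
```

Applied to Add (Var "x", Mul (Const 2, Const 3)), this pass returns Add (Var "x", Const 6): the input and the output are both sequences of IL, as in figure 7.2.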

The final step in the compilation process is the generation of a program binary. Details differ from compiler to compiler.

Description of the Bytecode Compiler

The Objective CAML virtual machine is called Zinc ("Zinc Is Not Caml"). Originally created by Xavier Leroy, Zinc is described in [Ler90]. Zinc's name was chosen to indicate its difference from the first implementation of Caml on the virtual machine CAM (Categorical Abstract Machine, see [CCM87]).

Figure 7.3 depicts the bytecode compiler. The first part of this figure shows the Zinc machine interpreter, linked to the runtime library. The second part corresponds to the Objective CAML bytecode compiler which produces instructions for the Zinc machine. The third part contains the set of libraries that come with the compiler. They will be described in Chapter 8.
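The division of labor between a bytecode compiler and a virtual machine interpreter can be sketched with a toy stack machine. This is only an illustration: the instruction set below (PUSH, ADD, MUL) is hypothetical and far simpler than that of the Zinc machine, and the functions compile and run are names chosen for this example.

```ocaml
(* Source language: arithmetic expressions. *)
type expr = Num of int | Add of expr * expr | Mul of expr * expr

(* Hypothetical instruction set of a toy stack machine (not Zinc's). *)
type instr = PUSH of int | ADD | MUL

(* The "compiler": a post-order traversal that emits the code for the
   operands first, then the instruction that combines them. *)
let rec compile = function
  | Num n -> [PUSH n]
  | Add (a, b) -> compile a @ compile b @ [ADD]
  | Mul (a, b) -> compile a @ compile b @ [MUL]

(* The "virtual machine": executes the instructions one by one on a
   stack represented as a list, with the top of the stack at the head. *)
let run code =
  let step stack instr =
    match instr, stack with
    | PUSH n, _ -> n :: stack
    | ADD, y :: x :: rest -> (x + y) :: rest
    | MUL, y :: x :: rest -> (x * y) :: rest
    | _ -> failwith "stack underflow"
  in
  match List.fold_left step [] code with
  | [result] -> result
  | _ -> failwith "ill-formed program"
```

Here compile (Add (Num 1, Mul (Num 2, Num 3))) produces the instruction list PUSH 1, PUSH 2, PUSH 3, MUL, ADD, and run evaluates it to 7. The instruction list is independent of the host processor; only the interpreter has to be ported.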

Figure 7.3: Virtual machine.

Standard compiler graphical notation is used for describing the components in figure 7.3. A simple box represents a file written in the language indicated in the box. A double box represents the interpretation of a language by a program written in another language. A triple box indicates that a source language is compiled to a machine language by using a compiler written in a third language. Figure 7.4 gives the legend of each box.

Figure 7.4: Graphical notation for interpreters and compilers.

The legend of figure 7.3 is as follows:


The majority of the Objective CAML compiler is written in Objective CAML. The second part of figure 7.3 shows how to pass from version v1 of a compiler to version v2.
