What are the basic control structures in Caml ?

Created in June 1996.

Caml offers the following control structures:

simple alternative tests if ... then ... else
complexes tests using pattern matching match ... with ...
sequence of instructions ;
loops for and while
escapes raise ... and corresponding error handling try ... with ....

If none of these control structures can fit your needs, you must define your own structures using functions (recursive or not).

Table of contents:

if ... then ... else
if ... then ...
Pattern matching
Partial or unused matches
Intricate matching
Pattern matching with conditional
Sequence
Sequence and pattern matching
Sequence and if ... then ... else
Loops
Premature end of loops (break, exit, abort)
Escapes
Error handling
Defining exceptions
Predefined exceptions

`if ... then ... else`

For tests we classically use an if condition then expression1 else expression2 construct, that returns expression1 or expression2, according to the condition boolean value.

Boolean arithmetic comparisons are =, <, >, <=, >=, <>, and boolean operators are &&, ||, not.

For instance:

let absolute_value x = if x >= 0 then x else - x;;
absolute_value : int -> int = <fun>

Type checking: the condition must have type bool, and both branches must have the same type (which is the type of the whole construct).

`if ... then`

A if construct without the corresponding else part is automatically completed with an extra else () clause. This kind of construct is useful inside sequences.

Type checking: as a regular alternative, hence the condition must have type bool and both branches must have the same type (unit).

Pattern matching

Pattern matching or shape analysis is used to perform complex tests involving the shape of the argument applied to the construct.

Syntactically, a pattern matching is a list of clauses. Each clause is a construct of the form | pattern -> expression. The pattern matching mechanism is an analysis of the shape of a given value to compare it with each of the patterns or shapes of the list of clauses. The analysis succeeds as soon as the value can be considered as an instance of the pattern of the clause. A value is an instance of a pattern if it is a special case of this pattern (in other words, if the pattern is a generalized form of the value or else the pattern is more general than the value).

A pattern is made of constants (basic constants or constructors), variables, and applications of functional constructors to argument patterns. Consider for instance the clauses | 0 -> 1 (integer constant pattern 0) or | n -> 1 (the pattern is just a single variable n). The only value that matches the pattern 0 is obviously the integer 0, when the n pattern agrees with every value whatever it could be. Finally, a pattern such as C f where C is a constructor and f another pattern, matches all the values built using the constructor C and an argument value v that matches pattern f.

Patterns of clauses are tried in turn following the order of the text of the program, until one succeeds.

If none matches the value at hand, an run time error is raised (see for more information).

The value returned by a pattern matching is the one returned by the corresponding expression of the clause that succeeded.

The basic syntactic construct to perform pattern matching is match expression with clauses: it evaluates expression, then matches the resulting value with each clause of the list of clauses given in the clauses part of the matching.

For instance, succ 0 being 1, the following pattern matching returns the expression corresponding to the clause | 1 -> "unit":

#match succ 0 with
| 0 -> "null"
| 1 -> "unit"
| _ -> "several";;
- : string = "unit"

Note the last clause | _ -> "several", that involves an underscore in the pattern part: the _ pattern means ``anything else'' and matches any value. It is very often used as the last pattern to catch all the remaining cases.

Pattern matching a value against a variable has another effect: during the computation of the expression on the right of the corresponding clause, the variable evaluates to the value ( pattern matching performs bindings of variables to values). For instance, succ 1 being 2, the following pattern matching returns the expression of the clause | n -> string_of_int n, thus string_of_int n with n evaluating to 2 (we say that n is bound to 2):

#match succ 1 with
| 0 -> "null"
| 1 -> "unit"
| n -> string_of_int n;;
- : string = "2"

Note that the if ... then ... else ... construct is just a special case of pattern matching on booleans: if cond then e1 else e2 corresponds to

match cond with
| true -> e1
| _ -> e2

Type checking: all clauses of a pattern matching must have the same type: patterns must have the same type and the expressions must have a common type as well. In the case of the match e with matching construct, the type that patterns have in common must be the type of e, and the type of all the expressions of the clauses of the pattern matching is the type of the whole construct.

Partial matches or unused match cases

Whenever possible, pattern matchings must be exhaustive (every possible case gets a corresponding clause) and must not contain useless clauses (presumably because there are two distinct clauses for the same case). The Caml compiler will probably emit a warning when faced with such a bad pattern matching. Whenever the pattern matching mechanism fails at runtime because the value is not an instance of one of the patterns, a failure occurs indicating where in the source code the pattern matching failed.

Intricate pattern matching

For syntactic reasons, a pattern matching that occurs inside another one must be parenthesized (using parentheses or begin and end keywords).

The reason is clearly evident, if you consider the pattern matching of the expression e2, inside the pattern matching of e1:

match e1 with
| f1 ->
   match e2 with
   | f2 -> ...
   | f3 -> ...
| f4 -> e2

As it is written here, one understands that f2 and f3 belongs to the pattern matching of e2, and this is correct. But, a human being will probably believe that the indentation is significant, and will consider that the pattern f4 belongs to the surrounding pattern matching, the one of e1. Still f4 simply follows f3, which we agree belongs to the pattern matching of e2. That's why, although the indentation is here error prone, f4 syntactically belongs to the internal pattern matching (the one of e2). Presumably, the indented meaning of the program was:

match e1 with
| f1 ->
   begin match e2 with
   | f2 -> ...
   | f3 -> ... end
| f4 -> e2

To avoid such errors, you must always use parens when writing pattern matchings inside other pattern matchings.

Pattern matching with conditional

If you have some condition to test during pattern matching, you may use a guard, that is a boolean condition that will monitor the selection of the pattern matching clause. A guard is introduced by when, followed by the expression to test. If this expression evaluates to true the clause is selected (when its pattern part matches), otherwise the next pattern is checked. For instance:

(* power i j computes i ^ j, provided j is positive. *)
let rec power i j =
 match j with
 | 0 when i != 0 -> 1
 | 1 -> i
 | j when j mod 2 = 1 -> let p = power i (j / 2) in i * p * p
 | j when i = 0 -> invalid_arg "power"
 | _ -> let p = power i (j / 2) in p * p;;
power : int -> int = <fun>

Another typical example is to test the equality of two variables of the pattern. The following predicate tests if a list has two successive elements that are identical:

let rec two_successive_equal = function
| [] -> false
| [_] -> false
| x :: y :: l when x = y -> true
| x :: l -> two_successive_equal l;;
two_equal : 'a list -> bool = <fun>

Sequence of expressions

The sequence e1; e2 means successive evaluation of the two expressions e1 then e2. The construct returns the value of e2. This construct is used to perform effects, for instance printing effects:

#print_string "Hello ";
 print_string "World!";;
Hello World!- : unit = ()

Type checking: a sequence has the type of its last expression.

Sequence and pattern matching

In Caml, there is no need to parenthesize a sequence appearing in a pattern matching clause: ... -> e1; e2 is equivalent to ... -> begin e1; e2 end.

Sequence and `if ... then ... else ...`

A sequence appearing in a branch of an alternative must be parenthesized. The common habit is to use begin and end. One writes

if cond
 then begin e1; e2 end
 else begin e3; e4 end

On the other hand, an alternative inside a sequence does not need any parens. So

if cond1 then e1;
if cond2 then e2 else begin e3; e4 end;;

means:

if cond1 then e1 else ();
if cond2 then e2 else begin e3; e4 end;;

If you forgot to parenthesized a sequence corresponding to one of the branch of an alternative, this sequence belongs to the surrounding expression. For instance, in the expression:

if cond1 then e2 else e3; e4;;

the expression e4 is always evaluated, whatever the value of the cond1 condition could be. That's because the whole expression is equivalent to:

(if cond1 then e2 else e3);
e4;;

Loops

There are two kinds of loops, for and while loops. They are used to perform effects and always evaluate to the value () or ``nothing''.

While loop

One writes while condition do body done, to mean repetitive evaluation of body, while condition evaluates to true. In particular, if condition always evaluates to false body is never evaluated.

Type checking: the condition must have type bool, and a loop always have type unit.

For loop

One writes for ident = initial to final do body done, to mean repetitive evaluation of body, with ident successively bound to initial, initial + 1, ..., including final. In particular, if initial is strictly greater than final, body is never evaluated.

Note that ident is defined by the for loop, there is no need to declare or define it in advance. Notice also that you cannot modify the value of ident when the loop is running (since ident is not bound to a mutable reference).

#for i = 0 to 10 do print_int i; print_char ` ` done;;
0 1 2 3 4 5 6 7 8 9 10 - : unit = ()

Type checking: loop index has type int, as well as expressions initial and final. The loop gets type unit.

Premature exit from loops

To exit from a while or for loop, you must use the exception mechanism. Escape from the loop by raising the exception devoted to this case, the Exit predefined exception (see below).

Escapes

The raise primitive is used to thraw exceptions (signal errors) when computation cannot go on. These exceptions ought to be caught by a surrounding ``try ... with'' construct to treat the error (either by aborting the program after some error message, or going on another way). If there is no handler the failure propagates until the entire evaluation process finally aborts. For instance:

#print_string "Hello ";
#raise (Failure "division by zero");
#print_string " world!\n";;
HelloUncaught exception: Failure "division by zero"

Type checking: if exception has type exn, then raise exception has type 'a.

Catching errors

The try expression with matching construct is used to handle errors: expression is evaluated while catching errors that may happen during this evaluation.

Errors that may appear during the evaluation are carried out via exceptions that tell the kind of errors at hand. These errors may be as numerous as necessary, so that we list them in the ``pattern matching'' part of the try ... with: if an exception of that kind occurs during the evaluation of expression, it is matched against the clauses of this pattern matching of the try ... with. If the match succeeds, the exception is stopped (that is, ``the error is recovered''): the expression corresponding to the matching clause is then returned, and the evaluation goes on (that is, the program does not abort).

If no clause matches the exception, this exception is propagated (that is, the exception is thrown again, and the program continue abortion).

Type checking: the type checking rule for try expression with matching is analogous to the type checking rule for match expression with matching: the patterns must have the type of exceptional values, i.e. exn, and all the expressions of the right hand side of clauses must have the same type as the expression that the handler protects, expression. This is the type of the whole construct.

Escaping from loops

Using exception handling, you can easily escape from any kind of loops (for or while, or even loops encoded into a recursive function): this is equivalent to the C break instruction, or the Ada exit instruction. Premature end of loop is obtained by raising the predefined exception Exit from within the body of the loop, while handling this exception from outside the loop.

For instance, we define a predicate that test if a vector contains a zero: if we encounter a 0, we escape from the loop and return true; otherwise the loop goes on until its normal end, and we return false.

let has_zero v =
 try
  for i = 0 to vect_length v - 1 do
   if v.(i) = 0 then raise Exit;
  done;
  false
 with Exit -> true;;

Defining exceptions

The user defines his own exceptions using exception name;; for a constant exception, or exception name of typ;; for an exception having an argument. These exceptions are generative, that is to say: two successive definitions of an exception with the same name lead to the definition of two distinct exceptions that will not be confused by Caml programs (handlers for the first exception will not handle the second exception).

Type checking: a constant exception has type exn, an exception defined by exception name of typ;; has type typ -> exn.

Predefined exceptions

Some exceptions are predefined in Caml to signal some frequent errors or problems:

Failure of string: report an error that occurred in some user's code. The reason of the failure or the name of the function where the error occurred is mentioned in the string argument.
Not_found: signal that a searching function has failed (e.g. assoc).
Exit: used to jump out of loops or functions.
Break: interruption (the user has send an explicit interrupt to the program).
Invalid_argument of string: the argument provided to the function cannot be handled properly.
Sys_error of string: the operating system host has raised an error (this exception is defined in the sys library module).
Match_failure of string * int * int: signal a pattern matching failure in the file mentioned, at the given location (in characters).
End_of_file: an end of file occurred while reading some data (this exception is defined in the io library module).

Contact the author Pierre.Weis@inria.fr