A C++/Java programmer's introduction to Objective Caml

by Stephan Houben

Introduction

Objective Caml is a programming language developed by INRIA, the French National Institute for Research in Computer Science and Control. Why would anyone want to use it instead of C++ or Java? Well, there are basically three reasons:
  1. It's convenient.
    O'Caml has lots of modern features, like garbage collection (as in Java), closures (lacking in both Java and C++), parametrized modules (like C++ templates on steroids), automatic type inference (you don't have to write typing information by hand, the compiler can figure it out). This makes O'Caml very convenient to write programs in. Moreover, it comes with an interactive interpreter, so you can easily try out O'Caml statements and debug your code.
  2. It's safe.
    O'Caml is fully statically typed, no casts needed as in C++ or Java. There are also no pointers, so your program can never dump core because it tried to dereference a null pointer. All arrays must be initialized before use, and by default, all array access is boundary-checked. (But this can be turned off if you really want to squeeze out that last 1% of performance.)
  3. It's fast.
    O'Caml comes with a native-code compiler AND a byte-code compiler. You can use the byte-code compiler for portability and fast compilation during development, and when you're done, you use the native-code compiler to produce a program that runs at near-C speed. Actually, O'Caml sometimes beats C: read this Usenet post if you don't believe me, and find out for yourself...
Well, so you have decided to give O'Caml a go. Great! The O'Caml distribution can be downloaded from here. You will have to download and install it before continuing with the rest of this tutorial.

Your first O'Caml program

When you have finished installing O'Caml, it's time for the all-time classic: "Hello, world" in O'Caml! Start the O'Caml interpreter by typing ocaml at the shell prompt. You then see something like this:
        Objective Caml version 2.02

# 
After the # prompt, you can type O'Caml commands, terminated by a double semicolon (;;). So let's try something: type the following (and press enter afterwards):
print_string "Hello, world\n";;
This will produce the following result:
Hello, world
- : unit = ()
The first line is the output of this one-line "program", and the second line tells you that the function print_string returned a value () of type unit. The type unit is what is called void in C++ and Java: it means that print_string didn't return any interesting value at all.
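
To make the role of unit a bit more concrete, here is a small sketch. The names result and is_unit are my own, and the ": unit" and ": bool" annotations are optional; I only wrote them out so you can see the types.

```ocaml
(* print_string has type string -> unit: it takes a string and returns (). *)
let result : unit = print_string "Hello, world\n"

(* () is the one and only value of type unit, so this comparison is true. *)
let is_unit : bool = (result = ())
```

So unlike void in C++ and Java, unit is a real type with a real (if boring) value that you can bind to a name and compare.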

Now let's see if we can make this one-liner into a standalone executable. To that end, first leave the O'Caml interpreter by pressing Control-D, and then create a file hello.ml using your favourite text editor. Put exactly the same line in it as shown above. Save the file, and compile it with the following command (type it at the shell prompt, NOT at the O'Caml # prompt):

ocamlc hello.ml -o hello
This instructs the O'Caml bytecode compiler ocamlc to compile your program hello.ml into an executable hello. You can run it with:
./hello
If you installed the native-code compiler, you can also make a native executable. Do this with the following command:
ocamlopt hello.ml -o hello.opt
This will create an executable hello.opt. Run it with:
./hello.opt
Note the speed difference? Yes? Well, I didn't... at least not for this example.

Using O'Caml as a fancy calculator

OK, now that we know that we can create standalone executables using O'Caml, let's continue to use the interpreter. Start the O'Caml interpreter ocaml again, and now try the following:
1 + 1;;
The result is:
- : int = 2     
O'Caml tells us that the result is of type int (an INTegral number) and that its value is 2. And that is correct! I checked the result with pen-and-paper myself, just to be sure...

You can use all the common mathematical operators: +, -, *, and / (the latter is the integer division, just like in C++ and Java: 1/2 = 0). It is also possible to do floating-point arithmetic:

3.0 *. 4.0;;
results in:
- : float = 12
The type float is the same as double in C++ and Java; there is no single-precision floating-point type in O'Caml like C++'s float. Also note that the floating-point multiplication is spelled differently than the integer multiplication. In general, the floating-point analogs of +, -, *, and / are +., -., *., and /.. Unlike C++ and Java, there is no "automatic conversion" between floats and ints. You have to do the conversion explicitly using the functions float_of_int and int_of_float. As an example, try:
3.0 *. float_of_int 4;;
This again gives the answer 12.
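
A few more lines you could type at the prompt to see the difference between the integer and the floating-point operators, and what the two conversion functions do. The particular numbers are just ones I picked; the comments give the values you should get.

```ocaml
7 / 2;;                     (* integer division: 3 *)
7.0 /. 2.0;;                (* floating-point division: 3.5 *)
float_of_int 7 /. 2.0;;     (* convert the int first: 3.5 *)
int_of_float 3.9;;          (* the fraction is cut off: 3 *)
```

Note that float_of_int 7 /. 2.0 is read as (float_of_int 7) /. 2.0: applying a function binds more tightly than any operator.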

Let-bindings and functions

Consider the following O'Caml statement:
let x = 3;;
The O'Caml interpreter responds with:
val x : int = 3   
This means that the value 3, of type int, is now bound to the identifier x. This is very much like a const declaration in C++: you can use x as a placeholder for 3 now. So if you enter:
x + 3;;
O'Caml will tell you that the result is 6. Note that you don't have to write the type explicitly: the system figures out itself that x has to be of type int.

As said before, x is like a const declaration in C++. This means that you cannot change the value of x by assigning to it. But how can you declare a variable, like in C++ or Java? Well, the answer is... with let, YOU CAN'T! There are only let-bindings like x, and they are immutable, meaning that, once defined, they cannot be changed. So the value of x will remain 3, now and for all eternity. In contrast, variables in C++ and Java are called mutable. (For completeness: O'Caml does have separate mutable references, but they are rarely needed and this tutorial does without them.)
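
Here is a small sketch of what immutability means in practice. You are allowed to write let x = ... a second time, but that creates a new binding that happens to have the same name; it does not modify the old one. The example uses a one-line function, add_x (a name I made up), which looks just like the function definitions explained a little further on.

```ocaml
let x = 3

(* add_x refers to the binding of x that exists right now. *)
let add_x n = n + x

(* This is not an assignment: it introduces a second, separate x. *)
let x = 10

let old_result = add_x 1   (* 4: add_x still sees the x that is 3 *)
let new_x = x              (* 10: code written from here on sees the new x *)
```

In C++ or Java, assigning 10 to x would have changed what add_x computes. Here it cannot.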

You might wonder how anyone can ever write a program without something as basic as assigning to a variable. Well, as it turns out, you really don't need mutable variables at all. But before I explain that, let's see how function definitions look in O'Caml:

let add_one n = n + 1;;
This defines a function add_one. If you enter this definition in the interpreter, it responds with the following:
val add_one : int -> int = <fun>
This means that add_one is of type int -> int, that is, a function accepting one argument of type int and returning an int. Again, O'Caml figures out the types itself. I think it is obvious what this function is supposed to do, but you can try it out for yourself:
add_one 5;;
Does this give the result you expected?
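
To see the type inference at work, here is a sketch with three one-line functions. Only add_one comes from the text above; halve and shout are my own examples, and ^ is the string concatenation operator. In each case the comment shows the type O'Caml infers, and which operator gave it the clue.

```ocaml
let add_one n = n + 1      (* int -> int, because + works on ints *)
let halve x = x /. 2.0     (* float -> float, because /. works on floats *)
let shout s = s ^ "!"      (* string -> string, because ^ joins strings *)
```

This is also why the integer and floating-point operators are spelled differently: the operator alone tells the compiler which type is meant.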

OK, now let us try to write a function sum_until, that given an integer n gives us the sum of the integers from 0 to n. Well that's easy, isn't it: just write a loop from 0 to n, and add the loop variable to a variable sum on every iteration... but wait, we cannot change a variable! How can we possibly do this without assignment? The answer is simple: instead of using a loop, use recursion.

let rec sum_until n = 
  if n = 0 
  then 0 
  else n + sum_until (n - 1);;
That's all, folks! Note that instead of using let, let rec was used, to indicate that it is a possibly recursive function. Perhaps now it becomes clear how one can write programs without assignment: the only place where assignment is really needed is in a loop, and every loop can be replaced by recursion. As opposed to C++ and Java, recursion in O'Caml is cheap, and a recursive call that is the very last thing a function does is just as efficient as an iteration of a loop. So we don't need assignments, and we actually also don't need loops. Wow, that surely simplifies things a lot!
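
One detail worth knowing: sum_until still has an addition to perform after each recursive call returns, so it needs stack space proportional to n. If that ever matters, a common trick is to carry the running total along in an extra argument, so that the recursive call is the very last thing the function does. What follows is only a sketch of that idea; the names sum_from, acc and sum_until_acc are my own.

```ocaml
(* acc holds the sum of everything added so far, like the variable sum
   would in a C++ or Java loop. *)
let rec sum_from n acc =
  if n = 0
  then acc
  else sum_from (n - 1) (acc + n)

(* Start with an empty total. *)
let sum_until_acc n = sum_from n 0
```

Because nothing remains to be done after the call to sum_from returns, O'Caml can run this in constant stack space, exactly like a loop. The rest of this section carries on with the plain sum_until.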

Now let's see if this function works correctly:

# sum_until 10;;
- : int = 55
Looks OK to me. Perhaps you are not yet completely convinced that assignments are never necessary. OK, let's try the following example: compute the greatest common divisor (gcd) of two numbers. It is quite easy to prove the following three properties of the gcd:
gcd n m = gcd m n,
gcd 0 m = m,
gcd n m = gcd (n-m) m, provided that n >= m.
Again, we take a recursive approach:
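let rec gcd n m =
  if n = 0
  then m
  else if m = 0
       then n  (* gcd n 0 = gcd 0 n = n by properties 1 and 2;
                  without this case, gcd n 0 would recurse forever *)
       else if n >= m
            then gcd (n - m) m
            else gcd m n;;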
Let's try it out:
# gcd 24 48;;
- : int = 24
# gcd 6 8;;
- : int = 2
Note that the recursive solution is short, elegant, and readable, much more readable than a similar iterative solution would be.
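
As an aside, repeated subtraction can take many steps when one number is much larger than the other. A well-known variant replaces it by a remainder, using O'Caml's mod operator. This is a sketch of that variant, not the version used above, and gcd_mod is just a name I chose; it is meant for non-negative arguments.

```ocaml
let rec gcd_mod n m =
  if m = 0
  then n
  else gcd_mod m (n mod m)
```

For example, gcd_mod 24 48 gives 24 and gcd_mod 6 8 gives 2, the same answers as before.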

More input and output

Input and output is essential in any program. A program is not much good if you never get to see its results! So let's see if we can write a program which does some more I/O: we're going to write a simple calculator program. This program will display a menu with 5 choices. The first 4 allow you to add, subtract, multiply or divide two numbers. The fifth menu choice exits the program.

First, displaying the menu. For this, we're going to write a function display_menu:

let display_menu () =
  print_string "Make your choice:\n";
  print_string "1. Add two numbers\n";
  print_string "2. Subtract two numbers\n";
  print_string "3. Multiply two numbers\n";
  print_string "4. Divide two numbers\n";
  print_string "5. Exit program\n";;
This is pretty straightforward, but nevertheless, there are a few new things here. First of all, display_menu is a function that really takes no arguments at all. But in O'Caml, every function has to take at least one argument, otherwise O'Caml would consider display_menu to be an ordinary let-binding, not a function. So in fact, display_menu takes an argument of type unit (remember, unit is the equivalent of void in C++ and Java). The only possible value of unit is (). So we call this function in the following way:
display_menu ();;
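
The difference between a let-binding and a function of unit is easiest to see with something that prints. A small sketch, with names (banner, show_banner) that I made up:

```ocaml
(* Ordinary let-binding: print_string runs once, right here,
   and banner is simply bound to the result (). *)
let banner = print_string "printed once, when banner is defined\n";;

(* Function of type unit -> unit: nothing is printed until it is called. *)
let show_banner () = print_string "printed on every call\n";;

show_banner ();;
show_banner ();;
```

Mentioning banner later prints nothing at all, whereas every show_banner () prints its line again.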
The second new thing is the use of the single semicolon ;. The difference between the single semicolon ; and the double semicolon ;; is that ;; is used to separate different let-constructs, while ; is used to separate expressions within a single let-construct. Also note that, unlike in C++ and Java, you don't put a ; at the end. The best way to think about ; is as being similar to + and *, and you don't write a + b + c + when you're trying to add a, b and c, now do you?
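
Thinking of ; as an operator also explains what a sequence is worth: the value of a ; b is simply the value of b. The following sketch (announce_and_double is a hypothetical example of mine, and print_int is the integer counterpart of print_string) prints a message and then returns a number:

```ocaml
let announce_and_double n =
  print_string "Doubling ";
  print_int n;
  print_string "\n";
  n * 2;;
```

The first three expressions are evaluated only for their output; the last one, n * 2, is the result of the function, so announce_and_double has type int -> int.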

Now let's write a function that performs the desired operation on two numbers, based upon the user's choice:

let perform_operation a b choice =
  if choice = 1
  then a +. b
  else if choice = 2
       then a -. b
       else if choice = 3
            then a *. b
            else if choice = 4
                 then a /. b
                 else raise (Failure "Invalid choice");;
There are a few new things here. First, we haven't seen functions of more than one argument before, but the syntax is not really surprising, I hope. Secondly, we see that when choice is not one of 1, 2, 3 or 4, an exception is raised. Failure is a predefined exception; if it is not caught, it will terminate the program. This is quite similar to exception handling in C++ and Java. You can test this function with something like this:
perform_operation 1.0 2.0 1;;
Note that the arguments a and b are of type float.

OK, this last function was a bit ugly, I admit. In Java, you would use a switch statement. Well, there is such a statement in O'Caml too, except that it is called match.

let perform_operation a b choice =
  match choice with
      1 -> a +. b
    | 2 -> a -. b
    | 3 -> a *. b
    | 4 -> a /. b
    | _ -> raise (Failure "Invalid choice");;
That looks much nicer, doesn't it? The _ represents the default case, i.e. it is executed when choice is none of 1, 2, 3 or 4.
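
Two more things you can do with this, sketched below. First, the branches of a match can produce a value of any type, not just a float. Second, the counterpart of raise is try ... with, which catches an exception; its cases are written just like the cases of a match. The functions operation_name and describe are hypothetical helpers of mine, and they are not used in the rest of the calculator.

```ocaml
let perform_operation a b choice =
  match choice with
      1 -> a +. b
    | 2 -> a -. b
    | 3 -> a *. b
    | 4 -> a /. b
    | _ -> raise (Failure "Invalid choice");;

(* A match on an int whose branches are strings. *)
let operation_name choice =
  match choice with
      1 -> "add"
    | 2 -> "subtract"
    | 3 -> "multiply"
    | 4 -> "divide"
    | _ -> "unknown";;

(* Catch the Failure raised above and turn it into a message. *)
let describe a b choice =
  try string_of_float (perform_operation a b choice)
  with Failure message -> "error: " ^ message;;
```

With these definitions, describe 1.0 2.0 9 returns the string "error: Invalid choice" instead of terminating the program.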

We're almost done. We only have to write the main program.

let rec main () =
  display_menu ();
  print_string "Enter your choice: ";
  let choice = read_int ()
  in
    if choice = 5
    then ()
    else let a = print_string "Enter first number: "; read_float () 
         and b = print_string "Enter second number: "; read_float ()
         in
           print_float (perform_operation a b choice);
           print_string "\n";
           main ();;
Oof! There are quite a few new things here. First of all, you see a few new I/O functions: read_int, read_float and print_float. You can probably figure out what they are supposed to do... More confusing might be the use of local let-bindings, that is, the let...in... construct you see. This can be used to make definitions that are only known inside a function body. The let...in... construct can actually be used anywhere where an ordinary expression is valid, so you can simply enter something like:
let x = 5 in x + 4;;
in the O'Caml interpreter, and get the answer 9. Another refinement is that you can even have more than one definition within a single let...in... construct by putting and between the definitions; so the following should also give the answer 9:
let x = 5 and y = 4 in x + y;;
OK, that should explain most of this code. By the way, note how main calls itself recursively, except when you enter a choice of 5. That's why we need to define main using let rec instead of plain let. When you enter a choice of 5, main returns (), that is, nothing of interest at all.
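
A remark on and: definitions joined by and cannot refer to each other, and the O'Caml manual leaves the order in which they are evaluated unspecified. When the order matters, as it does when each definition prints a prompt and reads a number, you can nest several let...in... constructs instead, which spells the order out. Below is a sketch of that style. To keep it runnable without a keyboard, it uses a hypothetical stand-in, ask, which records its prompt in a buffer (prompts) and returns a fixed number, where the calculator would call print_string and read_float.

```ocaml
(* Hypothetical stand-in for "print a prompt, then read_float ()". *)
let prompts = Buffer.create 32

let ask prompt answer =
  Buffer.add_string prompts prompt;
  answer

let total =
  let a = ask "first;" 1.5 in     (* evaluated first *)
  let b = ask "second;" 2.5 in    (* evaluated second; could also refer to a *)
  a +. b
```

Afterwards, prompts contains "first;second;" and total is 4.0.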

So let's test our calculator:

main ();;
Well, it works for me... but how about our customers? Obviously, we cannot expect all our customers to enter the above definitions in their own O'Caml interpreter. No, we really need to build a stand-alone executable, which we can burn on a CD and distribute via shops world-wide, together with a big marketing campaign... OK, you get the idea. I cannot help you with the marketing campaign (if you actually succeed in convincing people to buy this program, you obviously have more talent for marketing than I), but building an executable is easy. We simply put all the definitions in a file calc.ml.
(* Simple calculator program *)

let display_menu () =
  print_string "Make your choice:\n";
  print_string "1. Add two numbers\n";
  print_string "2. Subtract two numbers\n";
  print_string "3. Multiply two numbers\n";
  print_string "4. Divide two numbers\n";
  print_string "5. Exit program\n"

let perform_operation a b choice =
  match choice with
      1 -> a +. b
    | 2 -> a -. b
    | 3 -> a *. b
    | 4 -> a /. b
    | _ -> raise (Failure "Invalid choice")

(* Main program *)
let rec main () =
  display_menu ();
  print_string "Enter your choice: ";
  let choice = read_int ()
  in
    if choice = 5  (* choice 5 means exit *)
    then ()
    else let a = print_string "Enter first number: "; read_float () 
         and b = print_string "Enter second number: "; read_float ()
         in
           print_float (perform_operation a b choice);
           print_string "\n";
           main ();;  (* recursive call to continue the program *)

(* Call the main program *)
main ()
OK, that should be pretty straightforward now. Note two small changes: first, comments between (* and *) were added, and second, the double semicolon ;; was removed in some places. The double semicolon can be omitted when O'Caml can figure out for itself that the definition terminates, that is: you can omit it just before another let-definition, or just before the end of the file. So the only place where it is really required here is at the end of the definition of main. Finally, note that main needs to be called explicitly, unlike the situation in C++, where main gets called "automagically". In fact, there is nothing special about the name main; I just used it to make things more recognisable for C++ programmers. The name pipo would do just as well.
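
If you would rather not have any ;; in a source file at all, one widely used idiom is to write the final call as a let-definition too, binding the result to the pattern (). Since that line starts with let, O'Caml knows the previous definition has ended. A minimal sketch, with an entry point that is indeed called pipo (greeting is another name I made up):

```ocaml
let greeting name = "Hello, " ^ name ^ "\n"

let pipo () =
  print_string (greeting "world")

(* Because this line starts with let, no ;; is needed before it. *)
let () = pipo ()
```

The same trick would work for the calculator: end the definition of main without ;; and write let () = main () as the last line.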

We are now ready to compile the program. Using the bytecode compiler, do (at the shell prompt):

ocamlc calc.ml -o calc
Start the program now with:
./calc
Using the native code compiler (only if you installed it!), do:
ocamlopt calc.ml -o calc.opt
Start the program with:
./calc.opt

Conclusion

Well, although I have hardly covered every aspect of O'Caml, this should be enough to get you going. A more in-depth tutorial is included with the O'Caml distribution, and can also be viewed here. If anyone actually cares, I might extend this tutorial in the future. Suggestions for future extensions, as well as comments about spelling and other misteakes, can be sent to my e-mail address, stephanh@win.tue.nl. Thanks in advance!

Stephan Houben