What are basic the data structures ?

Created in October 1995.

Caml provides the following data structures: numbers (integers or floating point numbers), booleans, characters, character strings, arrays, lists, references. In addition, some library modules define extra abstract data types on top of these basic ones (in particular sets and streams, see below).

If none of these predefined data structures is suitable to model your data, you have to define your own data structures, with a type definition.

Table of contents:

Numbers: integers and floating point numbers.
Characters
Character strings
Arrays
Lists
Booleans
Nothing
References
Pairs
Other data structures

Numbers: integers and floating point numbers.

Integers and floating point numbers correspond to the basic arithmetic capabilities provided by the hardware. Hence these numbers are easy to use but have some limitations, either in accuracy or range. Caml also provides numbers that support exact arithmetic operations: the arbitrary precision arithmetic package implements rational numbers, whose size may be as large as several thousand decimal digits. This package is a bit harder to use, than built-in arithmetics.

Integers are the elements of the type int and get usual operations: +, -, *, / (that returns the integer quotient of the division).
```
#1 + 2 * 3;;
- : int = 7
```
Other operations:
- Printing: print_int.
- Logical operations on bits: lsl (logical shift left), lsr (logical shift right), asr (arithmetic shift right), land (et), lor (or), lxor (or exclusive), lnot (non).
- Coercions to int: int_of_string, int_of_float, int_of_char.
- See the module int.
Floating point numbers are the elements of the type float and get the usual operations (with an extra . suffix): +., -., *., /.
```
#1.0 +. 2.0 *. 3.0;;
- : float = 7.0
```
Other operations:
- Printing: print_float.
- Transcendental operations: sin, cos, tan, asin, acos, atan, atan2, exp, log, **.
- Coercions to float: float_of_string, float_of_int.
- See the module float.

Characters

Characters are the elements of the type char. They are enclosed by the ` symbol:

#`a`;;
- : char = `a`
#`\n`;;
- : char = `\n`
#`\\`;;
- : char = `\\`

Other operations:

Printing: print_char.
Coercions to char: char_of_int, char_for_read
See the module char.

Character strings

Character strings are the elements of the type string; these are ordered sequels of letters. The size of a string is definitively defined when the string is created. Elements of a string may be updated.

Creation by citation: a string is implicitly created when we cite its characters enclosed by " symbols.
```
#let s = "ok";;
s : string = "ok"
```
Creation by filling: the room for the string is first created, and the string is entirely filled with an initial character. Then this initial value is modified where appropriate.
```
#let s1 = make_string 2 `1`;;
s1 : string = "11"
#s1.[1] <- `2`;;
- : unit = ()
#s1;;
- : string = "12"
```
Access to elements: the characters of a string are numbered, starting with 0, and are accessed by their number. So str.[i] returns the character number i of the string str.
```
#let s = "ok";;
s : string = "ok"
#s.[0];;
- : char = `o`
#s.[1];;
- : char = `k`
```
Updating elements: the command str.[i] <- char, writes the character char into the number i location of the string str:
```
#let s = "ok";;
s : string = "ok"
#s.[1] <- `h`;;
- : unit = ()
#s;;
- : string = "oh"
```

Iteration: a string is an iterative structure, so use a ``for'' loop to walk through a string.

#let s = "ok";;
s : string = "ok"
#for i = 0 to string_length s - 1 do
  print_char s.[i]; print_string " "
 done;;
o k - : unit = ()

Memory usage: a string generally needs a byte for each character, and an extra word to record its length.
Other operations:
- Printing: print_string.
- Length: string_length.
- Modification and copies: blit_string.
- Extraction of a sub string: sub_string.
- Coercions: string_of_int, string_of_float, string_of_bool, string_for_read, char_for_read.
- Concatenation: infix operator ``^''.
- Pattern matching: you may cite a string in a pattern.
- Deallocation is implicit.
- See the module string.

Arrays

Arrays or vectors are the elements of the type 'a vect; these are ordered sequels of elements of type 'a. The size of an array is definitively defined when the array is created. Elements of an array may be updated.

Creation by citation: an array is implicitly created when we cite its elements separated by ; and enclosed by the symbols [| and |].
```
#let v = [| 1; 2; 3 |];;
v : int vect = [|1; 2; 3|]
```
Creation by filling: the room for the array is first created, then the array is entirely filled with an initial value. Then this initial value is modified where appropriate.
```
#let v1 = make_vect 2 "Hello";;
v1 : string vect = [|"Hello"; "Hello"|]
#v1.(1) <- "World!";;
- : unit = ()
#v1;;
- : string vect = [|"Hello"; "World!"|]
```
Beware: the initial value is physically shared by all the elements of the vector being defined. If this initial value is a mutable value, this semantics may be surprising for the beginner. This appears in particular for matrices, defined as vectors of vectors: a careless initialization returns a matrix with a unique raw vector, shared by all the raws.
Access to elements: elements of an array are numbered, starting with 0, and are accessed by their number. So tab.(i) returns the element number i of the array tab.
```
#let v = [| 1; 2; 3 |];;
v : int vect = [|1; 2; 3|]
#v.(0);;
- : int = 1
```
Updating elements: the command tab.(i) <- elem, writes the value elem into the number i location of the array tab.
```
#let v = [| 1; 2; 3 |];;
v : int vect = [|1; 2; 3|]
#v.(1) <- 0;;
- : unit = ()
#v;;
- : int vect = [|1; 0; 3|]
```

Iteration: an array is an iterative structure, so use a ``for'' loop to walk through an array.

#let v1 = [|"Hello"; "World!"|];;
v1 : string vect = [|"Hello"; "World!"|]
#for i = 0 to vect_length v1 - 1 do
  print_string v1.(i); print_string " "
 done;;
Hello World! - : unit = ()

Memory usage: an array generally needs a word per element, and an extra word to record its length.
Other operations:
- Printing: use a for loop or the do_vect functional.
- Length: vect_length.
- Modification and copies: blit_vect.
- Extracting sub arrays: sub_vect.
- Concatenation: function concat_vect.
- Iterators: map_vect, do_vect,...
- Pattern matching: you cannot cite an array in a pattern.
- Deallocation is implicit.
- See the module vect.

Lists

Lists are the elements of the type 'a list; these are ordered sequels of elements of type 'a. The size of a list is dynamically modified, since elements can be added if necessary. A list is a recursive data structure: it is either empty, and then it is [] (say it ``nil''), or it is not empty and then it is built with an ``head'' of list, put in front of the ``rest'' or queue of the list. A non empty list is built by the application of the list constructor :: (an infix operator read as ``conse''). A non empty list with head ``x'' and queue ``rest'' is thus x :: rest. Elements of a list cannot be updated.

Creation by citation: a list is implicitly created when we cite its elements separated by ; and enclosed by the symbols [ and ].
```
#let l = [ 1; 2; 3 ];;
l : int list = [1; 2; 3]
```
Element access: elements of a list are not numbered, since one cannot directly access to a random element of a list. In fact direct access is only possible for the first item of the list, other elements are obtained after a sequential access to every preceding items of the list (recursively accessing the successive queues of the list).
```
#match l with
   x :: _ -> x
 | _ -> raise (Failure "empty");;
- : int = 1
#let rec nth n l =
   match l with
     [] -> raise (Failure "nth")
   | x :: l -> if n <= 0 then x else nth (n - 1) l;;
nth : int -> 'a list -> 'a = <fun>
#nth 2 l;;
- : int = 3
```
Updating elements: you cannot modify elements of a list.

Creation by filling: start with an empty list (that is []), and add the elements.

#let l1 = [];;
l1 : 'a list = []
#let l2 = "World!" :: l1;;
l2 : string list = ["World!"]
#let l3 = "Hello" :: l2;;
l3 : string list = ["Hello"; "World!"]
#let rec interval n m =
 if n > m then [] else n :: interval (n + 1) m;;
interval : int -> int -> int list = <fun>
#interval 3 7;;
- : int list = [3; 4; 5; 6; 7]

Iteration: a list is a recursive data structure, so use a recursive function to walk through a list.

#let rec print_list = function
  [] -> ()
| x :: l -> print_string x; print_string " "; print_list l in
print_list l3;;
Hello World! - : unit = ()

Memory usage: a list generally requires three memory words per element.
Other operations:
- Printing: use do_list.
- Length: list_length.
- Concatenation: infix operator ``@''.
- Iterators: map, it_list, list_it, ...
- Pattern matching: use pattern [] or pattern x :: lf.
- Deallocation is implicit.
- See the module list.

Booleans

Booleans are the elements of the type bool; these are values of boolean logic. true and false. These are the results of tests.

#1 = 2;;
- : bool = false
#1 <= 2;;
- : bool = true

Booleans get infix operations ``or'' and ``and'' of mathematical logic (respectively denoted by || and &&), and the negation (function not).

#(1 = 2) || not (1 <= 2);;
- : bool = false
#1 = 2 || not 1 <= 2;;
- : bool = false

Other operations:

Printing: print_bool where let print_bool b = print_string (string_of_bool b);;.

Coercion: string_of_bool, bool_of_string where

   let bool_of_string = function
     "true" -> true
   | "false" -> false
   | _ -> raise (Invalid_argument "bool_of_string");;

See the module bool.

Nothing

``void'' is the only element of the type unit. It is denoted by () (say it ``void''), and means the dummy result and argument of procedures.

#print_string "Hello World!";;
Hello World!- : unit = ()
#print_newline();;

- : unit = ()

References

References are the elements of the type 'a ref; these are memory locations that contain a value of type 'a. They are used in imperative programming to implement accumulators. The element stored in the location may be modified, that is replaced by another value of the same type.

Creation by citation: a reference is created when the constructor ref is applied to its inial contents.
```
#let r = ref 1;;
r : int ref = ref 1
```
Access to the contents of a reference: apply the operator ! (say it ``deref''). So !r returns the contents of the reference r.
```
#!r;;
- : int = 1
```

Updating the contents: the command r := elem, writes the value elem into the reference r.

#r := 0;;
- : unit = ()
#r;;
- : int ref = ref 0

Using a reference and a loop, you can write an imperative version of the factorial function:

#let fact x =
  let result = ref 1 in
  for i = 1 to x do
   result := i * !result
  done;
  !result;;
fact : int -> int = <fun>
#fact 10;;
- : int = 3628800

Memory usage: a reference generally requires 3 memory words.
Other operations:
- Incrementation and decrementation for integer reference: incr, decr.
- Pattern matching: you may use a reference in a pattern.
- Deallocation is implicit.
- See the module ref.

Pairs

Pairs of a value of type 'a, and of a value of type 'b are the elements of the Cartesian product of 'a and 'b, denoted as 'a * 'b in Caml.

Creation by citation: a pair is implicitly created when we cite its elements, comma separated and enclosed between parens.
```
#let p = (1, 2);;
p : int * int = 1, 2
```

Element access: one perform access to elements of a pair via pattern matching:

#let first = function (x, y) -> x;;
first : 'a * 'b -> 'a = <fun>
#first p;;
- : int = 1

Updating the elements: elements of a pair cannot be updated.
Memory usage: a pair generally needs 3 memory words.
Other operations:
- See the module pair.

Other data structures

Other data structures available are system dependent. For instance, the Caml Light system offers:

sets,
queues,
stacks,
streams,
hash tables,
balanced binary trees.

Contact the author Pierre.Weis@inria.fr