# What are basic the data structures ?

Contact the author Pierre.Weis@inria.fr

Created in October 1995.

Caml provides the following data structures: numbers (integers or floating point numbers), booleans, characters, character strings, arrays, lists, references. In addition, some library modules define extra abstract data types on top of these basic ones (in particular sets and streams, see below).

If none of these predefined data structures is suitable to model your data, you have to define your own data structures, with a type definition.

## Numbers: integers and floating point numbers.

Integers and floating point numbers correspond to the basic arithmetic capabilities provided by the hardware. Hence these numbers are easy to use but have some limitations, either in accuracy or range. Caml also provides numbers that support exact arithmetic operations: the arbitrary precision arithmetic package implements rational numbers, whose size may be as large as several thousand decimal digits. This package is a bit harder to use, than built-in arithmetics.

• Integers are the elements of the type `int` and get usual operations: +, -, *, / (that returns the integer quotient of the division).
```#1 + 2 * 3;;
- : int = 7
```

Other operations:

• Printing: `print_int`.
• Logical operations on bits: `lsl` (logical shift left), `lsr` (logical shift right), `asr` (arithmetic shift right), `land` (et), `lor` (or), `lxor` (or exclusive), `lnot` (non).
• Coercions to `int`: `int_of_string`, `int_of_float`, `int_of_char`.
• See the module `int`.
• Floating point numbers are the elements of the type `float` and get the usual operations (with an extra `.` suffix): `+.`, `-.`, `*.`, `/.`
```#1.0 +. 2.0 *. 3.0;;
- : float = 7.0
```

Other operations:

• Printing: `print_float`.
• Transcendental operations: `sin`, `cos`, `tan`, `asin`, `acos`, `atan`, `atan2`, `exp`, `log`, `**`.
• Coercions to `float`: `float_of_string`, `float_of_int`.
• See the module `float`.

## Characters

Characters are the elements of the type `char`. They are enclosed by the ``` symbol:

```#`a`;;
- : char = `a`
#`\n`;;
- : char = `\n`
#`\\`;;
- : char = `\\`
```

Other operations:

• Printing: `print_char`.
• Coercions to `char`: `char_of_int`, `char_for_read`
• See the module `char`.

## Character strings

Character strings are the elements of the type `string`; these are ordered sequels of letters. The size of a string is definitively defined when the string is created. Elements of a string may be updated.

• Creation by citation: a string is implicitly created when we cite its characters enclosed by `"` symbols.
```#let s = "ok";;
s : string = "ok"
```
• Creation by filling: the room for the string is first created, and the string is entirely filled with an initial character. Then this initial value is modified where appropriate.
```#let s1 = make_string 2 `1`;;
s1 : string = "11"
#s1. <- `2`;;
- : unit = ()
#s1;;
- : string = "12"
```
• Access to elements: the characters of a string are numbered, starting with 0, and are accessed by their number. So `str.[i]` returns the character number `i` of the string `str`.
```#let s = "ok";;
s : string = "ok"
#s.;;
- : char = `o`
#s.;;
- : char = `k`
```
• Updating elements: the command `str.[i] <- char`, writes the character `char` into the number `i` location of the string `str`:
```#let s = "ok";;
s : string = "ok"
#s. <- `h`;;
- : unit = ()
#s;;
- : string = "oh"
```
• Iteration: a string is an iterative structure, so use a ```for`'' loop to walk through a string.
```#let s = "ok";;
s : string = "ok"
#for i = 0 to string_length s - 1 do
print_char s.[i]; print_string " "
done;;
o k - : unit = ()
```
• Memory usage: a string generally needs a byte for each character, and an extra word to record its length.
• Other operations:
• Printing: `print_string`.
• Length: `string_length`.
• Modification and copies: `blit_string`.
• Extraction of a sub string: `sub_string`.
• Coercions: `string_of_int`, `string_of_float`, `string_of_bool`, `string_for_read`, `char_for_read`.
• Concatenation: infix operator ```^`''.
• Pattern matching: you may cite a string in a pattern.
• Deallocation is implicit.
• See the module `string`.

## Arrays

Arrays or vectors are the elements of the type `'a vect`; these are ordered sequels of elements of type `'a`. The size of an array is definitively defined when the array is created. Elements of an array may be updated.

• Creation by citation: an array is implicitly created when we cite its elements separated by `;` and enclosed by the symbols `[|` and `|]`.
```#let v = [| 1; 2; 3 |];;
v : int vect = [|1; 2; 3|]
```
• Creation by filling: the room for the array is first created, then the array is entirely filled with an initial value. Then this initial value is modified where appropriate.
```#let v1 = make_vect 2 "Hello";;
v1 : string vect = [|"Hello"; "Hello"|]
#v1.(1) <- "World!";;
- : unit = ()
#v1;;
- : string vect = [|"Hello"; "World!"|]
```
Beware: the initial value is physically shared by all the elements of the vector being defined. If this initial value is a mutable value, this semantics may be surprising for the beginner. This appears in particular for matrices, defined as vectors of vectors: a careless initialization returns a matrix with a unique raw vector, shared by all the raws.
• Access to elements: elements of an array are numbered, starting with 0, and are accessed by their number. So `tab.(i)` returns the element number `i` of the array `tab`.
```#let v = [| 1; 2; 3 |];;
v : int vect = [|1; 2; 3|]
#v.(0);;
- : int = 1
```
• Updating elements: the command `tab.(i) <- elem`, writes the value `elem` into the number `i` location of the array `tab`.
```#let v = [| 1; 2; 3 |];;
v : int vect = [|1; 2; 3|]
#v.(1) <- 0;;
- : unit = ()
#v;;
- : int vect = [|1; 0; 3|]
```
• Iteration: an array is an iterative structure, so use a ```for`'' loop to walk through an array.
```#let v1 = [|"Hello"; "World!"|];;
v1 : string vect = [|"Hello"; "World!"|]
#for i = 0 to vect_length v1 - 1 do
print_string v1.(i); print_string " "
done;;
Hello World! - : unit = ()
```
• Memory usage: an array generally needs a word per element, and an extra word to record its length.
• Other operations:
• Printing: use a `for` loop or the `do_vect` functional.
• Length: `vect_length`.
• Modification and copies: `blit_vect`.
• Extracting sub arrays: `sub_vect`.
• Concatenation: function `concat_vect`.
• Iterators: `map_vect`, `do_vect`,...
• Pattern matching: you cannot cite an array in a pattern.
• Deallocation is implicit.
• See the module `vect`.

## Lists

Lists are the elements of the type `'a list`; these are ordered sequels of elements of type `'a`. The size of a list is dynamically modified, since elements can be added if necessary. A list is a recursive data structure: it is either empty, and then it is `[]` (say it ``nil''), or it is not empty and then it is built with an ``head'' of list, put in front of the ``rest'' or queue of the list. A non empty list is built by the application of the list constructor `::` (an infix operator read as ``conse''). A non empty list with head ``x'' and queue ``rest'' is thus `x :: rest`. Elements of a list cannot be updated.

• Creation by citation: a list is implicitly created when we cite its elements separated by ; and enclosed by the symbols `[` and `]`.
```#let l = [ 1; 2; 3 ];;
l : int list = [1; 2; 3]
```
• Element access: elements of a list are not numbered, since one cannot directly access to a random element of a list. In fact direct access is only possible for the first item of the list, other elements are obtained after a sequential access to every preceding items of the list (recursively accessing the successive queues of the list).
```#match l with
x :: _ -> x
| _ -> raise (Failure "empty");;
- : int = 1
#let rec nth n l =
match l with
[] -> raise (Failure "nth")
| x :: l -> if n <= 0 then x else nth (n - 1) l;;
nth : int -> 'a list -> 'a = <fun>
#nth 2 l;;
- : int = 3
```
• Updating elements: you cannot modify elements of a list.
```#let l1 = [];;
l1 : 'a list = []
#let l2 = "World!" :: l1;;
l2 : string list = ["World!"]
#let l3 = "Hello" :: l2;;
l3 : string list = ["Hello"; "World!"]
#let rec interval n m =
if n > m then [] else n :: interval (n + 1) m;;
interval : int -> int -> int list = <fun>
#interval 3 7;;
- : int list = [3; 4; 5; 6; 7]
```
• Iteration: a list is a recursive data structure, so use a recursive function to walk through a list.
```#let rec print_list = function
[] -> ()
| x :: l -> print_string x; print_string " "; print_list l in
print_list l3;;
Hello World! - : unit = ()
```
• Memory usage: a list generally requires three memory words per element.
• Other operations:
• Printing: use `do_list`.
• Length: `list_length`.
• Concatenation: infix operator ```@`''.
• Iterators: `map`, `it_list`, `list_it`, ...
• Pattern matching: use pattern `[]` or pattern `x :: lf`.
• Deallocation is implicit.
• See the module `list`.

## Booleans

Booleans are the elements of the type `bool`; these are values of boolean logic. `true` and `false`. These are the results of tests.

```#1 = 2;;
- : bool = false
#1 <= 2;;
- : bool = true
```

Booleans get infix operations ``or'' and ``and'' of mathematical logic (respectively denoted by `||` and `&&`), and the negation (function `not`).

```#(1 = 2) || not (1 <= 2);;
- : bool = false
#1 = 2 || not 1 <= 2;;
- : bool = false
```

Other operations:

• Printing: `print_bool` where `let print_bool b = print_string (string_of_bool b);;`.
• Coercion: `string_of_bool`, `bool_of_string` where
```   let bool_of_string = function
"true" -> true
| "false" -> false
| _ -> raise (Invalid_argument "bool_of_string");;
```
• See the module bool.

## Nothing

``void'' is the only element of the type `unit`. It is denoted by `()` (say it ``void''), and means the dummy result and argument of procedures.

```#print_string "Hello World!";;
Hello World!- : unit = ()
#print_newline();;

- : unit = ()
```

## References

References are the elements of the type `'a ref`; these are memory locations that contain a value of type `'a`. They are used in imperative programming to implement accumulators. The element stored in the location may be modified, that is replaced by another value of the same type.

• Creation by citation: a reference is created when the constructor `ref` is applied to its inial contents.
```#let r = ref 1;;
r : int ref = ref 1
```
• Access to the contents of a reference: apply the operator `!` (say it ``deref''). So `!r` returns the contents of the reference `r`.
```#!r;;
- : int = 1
```
• Updating the contents: the command `r := elem`, writes the value elem into the reference `r`.
```#r := 0;;
- : unit = ()
#r;;
- : int ref = ref 0
```

Using a reference and a loop, you can write an imperative version of the factorial function:

```#let fact x =
let result = ref 1 in
for i = 1 to x do
result := i * !result
done;
!result;;
fact : int -> int = <fun>
#fact 10;;
- : int = 3628800
```
• Memory usage: a reference generally requires 3 memory words.
• Other operations:
• Incrementation and decrementation for integer reference: `incr`, `decr`.
• Pattern matching: you may use a reference in a pattern.
• Deallocation is implicit.
• See the module `ref`.

## Pairs

Pairs of a value of type `'a`, and of a value of type `'b` are the elements of the Cartesian product of `'a` and `'b`, denoted as `'a * 'b` in Caml.

• Creation by citation: a pair is implicitly created when we cite its elements, comma separated and enclosed between parens.
```#let p = (1, 2);;
p : int * int = 1, 2
```
• Element access: one perform access to elements of a pair via pattern matching:
```#let first = function (x, y) -> x;;
first : 'a * 'b -> 'a = <fun>
#first p;;
- : int = 1
```
• Updating the elements: elements of a pair cannot be updated.
• Memory usage: a pair generally needs 3 memory words.
• Other operations:

## Other data structures

Other data structures available are system dependent. For instance, the Caml Light system offers: