Modifiable Data Structures

Modifiable Data Structures

Values of the following types: vectors, character strings, records with mutable fields, and references are the data structures whose parts can be physically modified.

We have seen that an Objective CAML variable bound to a value keeps this value to the end of its lifetime. You can only modify this binding with a redefinition---in which case we are not really talking about the ``same'' variable; rather, a new variable of the same name now masks the old one, which is no longer directly accessible, but which remains unchanged. With modifiable values, you can change the value associated with a variable without having to redeclare the latter. You have access to the value of a variable for writing as well as for reading.

Vectors

Vectors, or one dimensional arrays, collect a known number of elements of the same type. You can write a vector directly by listing its values between the symbols [| and |], separated by semicolons as for lists.


# let v = [| 3.14; 6.28; 9.42 |] ;;
val v : float array = [|3.14; 6.28; 9.42|]

The creation function Array.create takes the number of elements in the vector and an initial value, and returns a new vector.


# let v = Array.create  3  3.14;;
val v : float array = [|3.14; 3.14; 3.14|]

To access or modify a particular element, you give the index of that element:

Syntax

expr₁ . ( expr₂ )

Syntax

expr₁ . ( expr₂ ) <- expr₃

expr₁ should be a vector (type array) whose values have type expr₃. The expression expr₂ must, of course, have type int. The modification is an expression of type unit. The first element of a vector has index 0 and the index of the last element is the length of the vector minus 1. The parentheses around the index expression are required.


# v.(1) ;;
- : float = 3.14
# v.(0) <- 100.0 ;;
- : unit = ()
# v ;;
- : float array = [|100; 3.14; 3.14|]

If the index used to access an element in an array is outside the range of indices of the array, an exception is raised at the moment of access.


# v.(-1) +. 4.0;;
Uncaught exception: Invalid_argument("Array.get")

This check is done during program execution, which can slow it down. Nevertheless it is essential, in order to avoid writing to memory outside the space allocated to a vector, which would cause serious execution errors.

The functions for manipulating arrays are part of the Array module in the standard library. We'll describe them in chapter 8 (page ??). In the examples below, we will use the following three functions from the Array module:

create which creates an array of the given size with the given initial value;
length which gives the length of a vector;
append which concatenates two vectors.

Sharing of Values in a Vector

All the elements of a vector contain the value that was passed in when it was created. This implies a sharing of this value, if it is a structured value. For example, let's create a matrix as a vector of vectors using the function create from the Array module.


# let v = Array.create 3 0;;
val v : int array = [|0; 0; 0|]
# let m = Array.create 3 v;;
val m : int array array = [|[|0; 0; 0|]; [|0; 0; 0|]; [|0; 0; 0|]|]

Figure 3.1: Memory representation of a vector sharing its elements.

If you modify one of the fields of vector v, which was used in the creation of m, then you automatically modify all the ``rows'' of the matrix together (see figures 3.1 and 3.2).


# v.(0) <- 1;;
- : unit = ()
# m;;
- : int array array = [|[|1; 0; 0|]; [|1; 0; 0|]; [|1; 0; 0|]|]

Figure 3.2: Modification of shared elements of a vector.

Duplication occurs if the initialization value of the vector (the second argument passed to Array.create) is an atomic value and there is sharing if this value is a structured value.

Values whose size does not exceed the standard size of Objective CAML values---that is, the memory word---are called atomic values. These are the integers, characters, booleans, and constant constructors. The other values---structured values---are represented by a pointer into a memory area. This distinction is detailed in chapter 9 (page ??).
Vectors of floats are a special case. Although floats are structured values, the creation of a vector of floats causes the the initial value to be copied. This is for reasons of optimization. Chapter 12, on the interface with the C language (page ??), describes this special case.

Non-Rectangular Matrices

A matrix, a vector of vectors, does not need not to be rectangular. In fact, nothing stops you from replacing one of the vector elements with a vector of a different length. This is useful to limit the size of such a matrix. The following value t constructs a triangular matrix for the coefficients of Pascal's triangle.


# let t = [| 
           [|1|];
           [|1; 1|];
           [|1; 2; 1|];
           [|1; 3; 3; 1|];
           [|1; 4; 6; 4; 1|];
           [|1; 5; 10; 10; 5; 1|]
         |] ;;
val t : int array array =
  [|[|1|]; [|1; 1|]; [|1; 2; 1|]; [|1; 3; 3; 1|]; [|1; 4; 6; 4; ...|]; ...|]
# t.(3) ;;
- : int array = [|1; 3; 3; 1|]

In this example, the element of vector t with index i is a vector of integers with size i+1. To manipulate such matrices, you have to calculate the size of each element vector.

Copying Vectors

When you copy a vector, or when you concatenate two vectors, the result obtained is a new vector. A modification of the original vectors does not result in the modification of the copies, unless, as usual, there are shared values.


# let v2 = Array.copy v ;;
val v2 : int array = [|1; 0; 0|]
# let m2 = Array.copy m ;;
val m2 : int array array = [|[|1; 0; 0|]; [|1; 0; 0|]; [|1; 0; 0|]|]
# v.(1)<- 352;;
- : unit = ()
# v2;; 
- : int array = [|1; 0; 0|]
# m2 ;;
- : int array array = [|[|1; 352; 0|]; [|1; 352; 0|]; [|1; 352; 0|]|]

We notice in this example that copying m only copies the pointers to v. If one of the elements of v is modified, m2 is modified too.

Concatenation creates a new vector whose size is equal to the sum of the sizes of the two others.


# let mm = Array.append m m ;;
val mm : int array array =
  [|[|1; 352; 0|]; [|1; 352; 0|]; [|1; 352; 0|]; [|1; 352; 0|];
    [|1; 352; ...|]; ...|]
# Array.length mm ;;
- : int = 6
# m.(0) <- Array.create 3 0 ;;
- : unit = ()
# m ;;
- : int array array = [|[|0; 0; 0|]; [|1; 352; 0|]; [|1; 352; 0|]|]
# mm ;;
- : int array array =
[|[|1; 352; 0|]; [|1; 352; 0|]; [|1; 352; 0|]; [|1; 352; 0|];
  [|1; 352; ...|]; ...|]

On the other hand, modification of v, a value shared by m and mm, does affect both these matrices.


# v.(1) <- 18 ;;
- : unit = ()
# mm;;
- : int array array =
[|[|1; 18; 0|]; [|1; 18; 0|]; [|1; 18; 0|]; [|1; 18; 0|]; [|1; 18; ...|];
  ...|]

Character Strings

Character strings can be considered a special case of vectors of characters. Nevertheless, for efficient memory usage¹ their type is specialized. Moreover, access to their elements has a special syntax:

Syntax

expr₁ . [expr₂]

The elements of a character string can be physically modified:

Syntax

expr₁ . [expr₂] <- expr₃


# let s = "hello";;
val s : string = "hello"
# s.[2];;
- : char = 'l'
# s.[2]<-'Z';;
- : unit = ()
# s;;
- : string = "heZlo"

Mutable Fields of Records

Fields of a record can be declared mutable. All you have to do is to show this in the declaration of the type of the record using the keyword mutable.

Syntax

type name = { ...; mutable name_i : t ; ...}

Here is a small example defining a record type for points in the plane:


# type point = { mutable xc : float; mutable yc : float } ;;
type point = { mutable xc: float; mutable yc: float }
# let p = { xc = 1.0; yc = 0.0 } ;;
val p : point = {xc=1; yc=0}

Thus the value of a field which is declared mutable can be modified using the syntax:

Syntax

expr₁ . name <- expr₂

The expression expr₁ should be a record type which has the field name. The modification operator returns a value of type unit.


# p.xc <- 3.0 ;;
- : unit = ()
# p ;;
- : point = {xc=3; yc=0}

We can write a function for moving a point by modifying its components. We use a local declaration with pattern matching in order to sequence the side-effects.


# let moveto p dx dy = 
   let () = p.xc <- p.xc +. dx 
   in p.yc <- p.yc +. dy ;;
val moveto : point -> float -> float -> unit = <fun>
# moveto p 1.1 2.2 ;;
- : unit = ()
# p ;;
- : point = {xc=4.1; yc=2.2}

It is possible to mix mutable and non-mutable fields in the definition of a record. Only those specified as mutable may be modified.


# type t = { c1 : int; mutable c2 : int } ;;
type t = { c1: int; mutable c2: int }
# let r = { c1 = 0; c2 = 0 } ;;
val r : t = {c1=0; c2=0}
# r.c1 <- 1 ;;
Characters 0-9:
The label c1 is not mutable
# r.c2 <- 1 ;;
- : unit = ()
# r ;;
- : t = {c1=0; c2=1}

On page ?? we give an example of using records with modifiable fields and arrays to implement a stack structure.

References

Objective CAML provides a polymorphic type ref which can be seen as the type of a pointer to any value; in Objective CAML terminology we call it a reference to a value. A referenced value can be modified. The type ref is defined as a record with one modifiable field:

type 'a ref = {mutable contents:'a}

This type is provided as a syntactic shortcut. We construct a reference to a value using the function ref. The referenced value can be reached using the prefix function (!). The function modifying the content of a reference is the infix function (:=).


# let x = ref 3 ;;       
val x : int ref = {contents=3}
# x ;;
- : int ref = {contents=3}
# !x ;;    
- : int = 3
# x := 4 ;;               
- : unit = ()
# !x ;;
- : int = 4
# x := !x+1 ;;
- : unit = ()
# !x ;;
- : int = 5

Polymorphism and Modifiable Values

The type ref is parameterized. This is what lets us use it to create references to values of any type whatever. However, it is necessary to place certain restrictions on the type of referenced values; we cannot allow the creation of a reference to a value with a polymorphic type without taking some precautions.

Let us suppose that there were no restriction; then someone could declare:

let x = ref [] ;;

Then the variable x would have type 'a list ref and its value could be modified in a way which would be inconsistent with the strong static typing of Objective CAML:


x := 1 :: !x ;; 
x := true :: !x ;;

Thus we would have one and the same variable having type int list at one moment and bool list the next.

In order to avoid such a situation, Objective CAML's type inference mechanism uses a new category of type variables: weak type variables. Syntactically, they are distinguished by the underscore character which prefixes them.


# let x = ref [] ;;
val x : '_a list ref = {contents=[]}

The type variable '_a is not a type parameter, but an unknown type awaiting instantiation; the first use of x after its declaration fixes the value that '_a will take in all types that depend on it, permanently.


# x := 0::!x ;;
- : unit = ()
# x ;;
- : int list ref = {contents=[0]}

From here onward, the variable x has type int list ref.

A type containing an unknown is in fact monomorphic even though its type has not been specified. It is not possible to instantiate this unknown with a polymorphic type.


# let x = ref [] ;;
val x : '_a list ref = {contents=[]}
# x := (function y -> ())::!x ;;
- : unit = ()
# x ;;
- : ('_a -> unit) list ref = {contents=[<fun>]}

In this example, even though we have instantiated the unknown type with a type which is a priori polymorphic ('a -> unit), the type has remained monomorphic with a new unknown type.

This restriction of polymorphism applies not only to references, but to any value containing a modifiable part: vectors, records having at least one field declared mutable, etc. Thus all the type parameters, even those which have nothing to do with a modifiable part, are weak type variables.


# type ('a,'b) t = { ch1 :'a list ; mutable ch2 : 'b list } ;;
type ('a, 'b) t = { ch1: 'a list; mutable ch2: 'b list }
# let x = { ch1 = [] ; ch2 = [] } ;;
val x : ('_a, '_b) t = {ch1=[]; ch2=[]}

Warning

This modification of the typing of application has consequences for pure functional programs.

Likewise, when you apply a polymorphic value to a polymorphic function, you get a weak type variable, because you must not exclude the possibility that the function may construct physically modifiable values. In other words, the result of the application is always monomorphic.


#  (function x -> x) []  ;;
- : '_a list = []

You get the same result with partial application:


# let f a b = a ;;
val f : 'a -> 'b -> 'a = <fun>
# let g = f 1 ;;
val g : '_a -> int = <fun>

To get a polymorphic type back, you have to abstract the second argument of f and then apply it:


# let h x = f 1 x ;;
val h : 'a -> int = <fun>

In effect, the expression which defines h is the functional expression function x -> f 1 x. Its evaluation produces a closure which does not risk producing a side effect, because the body of the function is not evaluated.

In general, we distinguish so-called ``non-expansive'' expressions, whose calculation we are sure carries no risk of causing a side effect, from other expressions, called ``expansive.'' Objective CAML's type system classifies expressions of the language according to their syntactic form:

``non-expansive'' expressions include primarily variables, constructors of non-mutable values, and abstractions;
``expansive'' expressions include primarily applications and constructors of modifiable values. We can also include here control structures like conditionals and pattern matching.