Many features of OCaml (and of other dialects of ML) can actually be formalized on top of core ML, either by selecting a particular choice of primitives, by encoding, or by a small extension.
The OCaml language contains primitive datatypes such as integers, floats, strings, arrays, etc. and operations over them. New datatypes can also be defined using a combinations of named records or variants and later be explored using pattern matching — a powerful mechanism that combines several projections and case analysis in a single construction.
For instance, the type of play cards can be defined as follows:

This declaration actually defines four different data types.
The type card
of cards is a variant type with two cases.
Joker
is a special card. Other cards are of the form Card
v
where v
is an element of the type regular
.
In turn regular
is the type of records with
two fields suit
and name
of respective types
card_suit
and card_name
, which are themselves variant types.
Cards can be created directly, using the variant tags and labels as constructors:
 

Of course, cards can also be created via functions:
 

Functions can be used to shorten notations, but also as a means of enforcing invariants.
The language OCaml, like all dialects of ML, also offers a convenient mechanism to explore and destructure values of datatypes by pattern matching, also known as case analysis. For instance, we could define the value of a card as follows:

The function value
explores the shape of the card given as
argument, by doing case analysis on the outermost constructors,
and whenever necessary, pursuing the analysis on the
inner values of the datastructure. Cases are explored in a topdown fashion:
when a branch fails, the analysis resumes with the next possible branch.
However, the analysis stops as soon as the branch is successful; then, its
right hand side is evaluated and returned as result.
find_compatible
that given a hand
(given as an unordered list of cards) returns a list of solutions. Each
solution should be a compatible set of cards (represented as an unordered
list of cards) of size greater or equal to four, and two different solutions
should be incompatible.
Data types may also be parametric, that is, some of their constructors may take arguments of arbitrary types. In this case, the type of these arguments must be shown as an argument to the type (symbol) of the datastructure. For instance, OCaml predefines the option type as follows:
type 'a option = Some of 'a  None 
The option type can be used to get inject values v of type '
a
into
Some
(v) of type '
a
option
with an extra value None
.
(For historical reason, the type argument '
a
is postfix in
'
a
option
.)
Superficial pattern matching (ie. testing only the top constructor) can easily be formalized in core ML by the declaration of new type constructors, new constructors, and new constants. For the sake of simplicity, we assume that all datatype definitions are given beforehand. That is, we parameterize the language by a set of type definitions. We also consider the case of a single datatype definition, but the generalization to several definitions is easy.
Let us consider the following definition, prior to any expression:
type g(α) = C_{1}^{g} of τ_{i} ∣ … C_{n}^{g} of τ_{n} 
where free variables of τ_{i} are all taken among α. (We use the standard prefix notation in the formalization, as opposed to OCaml postfix notation.)
This amounts to introducing a new type symbol g_{f} of arity given by the length of α, n unary constructors C_{1}^{g}, …C_{n}^{g}, and a primitive f^{g} of arity n+1 with the following δrule:
f^{g} (C_{k}^{g} v) v_{1} … v_{k} … v_{n} → v_{k} v (δ_{g}) 
The typing environment must also be extended with the following type assumptions:

Finally, it is convenient to add the syntactic sugar
match a with C_{1}^{g} (x) ⇒ a_{1} … ∣ C_{n}^{g} (x) ⇒ a_{n} 
for
f^{g} a (λ x. a_{1}) … (λ x. a_{n}) 
Note that, since we can assume that the type symbol g is given first, then the types τ_{i} may refer to g. This allows, recursive types definitions such as the natural numbers in unary basis (analogous to the definition of list in OCaml!):
type IN = Zero ∣ Succ of IN 
OCaml imposes a restriction, however, that if a datatype definition of g(α) is recursive, then all occurrences of g should appear with exactly the same parameters α. This restriction preserves the decidability of the equivalence of two type definitions. That is, the problem “Are two given datatype definitions defining isomorphic structures?” would not be decidable anymore, if the restriction was relaxed. However, this question is not so meaningful, since datatype definitions are generative, and types (of datatypes definitions) are always compared by name. Other dialects of ML do not impose this restriction. However, the gain is not significant as long as the language does not allow polymorphic recursion, since it will not be possible to write interesting function manipulating datatypes that would not follow this restriction.
As illustrated by the following exercise, the fixpoint combinator, and more generally the whole lambdacalculus, can be encoded using variant datatypes. Note that this is not surprising, since the fix point can be implemented by a δrule, and variant datatypes have been encoded with special forms of δrules.
Note that the encoding uses negative recursion, that is, a recursive occurrence on the left of an arrow type. It could be shown that restricting datatypes to positive recursion would preserve termination (of course, in ML without any other form of recursion).
value
to define two inverse functions fold
and unfold
of
respective types:

fold
and unfold
so that for an expression of the
encode of any expression of the lambda calculus are welltyped terms.
Finally, check that [[fix ]] is welltyped.
OCaml also allows type abbreviations declared as type g(α) = τ. These are conceptually quite different from datatypes: note that τ is not preceded by a constructor here, and that multiple cases are not allowed. Moreover, a data type definition type g(α) = C^{g} τ would define a new type symbol g incompatible with all others. On the opposite, the type abbreviation type g(α) = τ defines a new type symbol g that is compatible with the top type symbol of τ since g(τ') should be interchangeable with τ anywhere.
In fact, the simplest, formalization of abbreviations is to expand them
in a preliminary phase. As long as recursive abbreviations are not allowed,
this allows to replace all abbreviations by types without any
abbreviations. However, this view of abbreviation raises several problem.
As we have just mentioned, it does not work if abbreviations can be defined
recursively. Furthermore, compact types may become very large after
expansions. Take for example an abbreviation window
that stands for a
product type describing several components of windows: title
,
body
, etc. that are themselves abbreviations for larger types.
Thus, we need another more direct presentation of abbreviations. Fortunately, our treatment of unifications with unificands is welladapted to abbreviations: Formally, defining an abbreviation amounts to introducing a new symbol h together with an axiom h (α) = τ. (Note that this is an axiom and not a multiequation here.) Unification can be parameterized by a set of abbreviation definitions {h (α_{h}) = τ_{h} ∣ h ∈ A} Abbreviations are then expanded during unification, but only if they would otherwise produce a clash with another symbol. This is obtained by adding the following rewriting rule for any abbreviation h:

Note that sharing is kept all the way, which is represented by variable α in both the premise and the conclusion: before expansions, several parts of the type may use the same abbreviation represented by α, and all of these nodes will see the expansions simultaneously.
The rule Abbrev can be improved, so as to keep the abbreviation even after expansion:

The abbreviation can be recursive, in the sense that h may appear in τ_{h} but, as for datatypes, with the tuple of arguments α as the one of its definition. The the occurrence of τ_{h} in the conclusion of rule Abbrev' must be replaced by τ_{h} [g(α) ← α].
See also Section 3.2.1 for use of abbreviations with object types.
Record type definitions can be formalized in a very similar way to variant type definitions. The definition
type g(α) = {f_{1}^{g} : τ_{1}; … f_{2}^{g} : τ_{n}} 
amounts to the introduction of a new type symbol g of arity given by the length of α, one nary constructor C^{g} and n unary primitives f_{i}^{g} with the following δrules:
f_{i}^{g} (C^{g} v_{1} … v_{i} … v_{n}) → v_{i} (δ_{g}) 
As for variant types, we require that all free variables of τ_{i} be taken among α. The typing assumptions for these constructors and constant are:

The syntactic sugar is to write a.f_{i}^{g} and { f_{1}^{g} = a_{1}; … f_{n}^{g} = a_{n}} instead of f_{i}^{g} a and C^{g} a_{1} … a_{n}.
The language we have described so far is purely functional. That is,
several evaluations of the same expression will always produce the same
answer. This prevents, for instance, the implementation of a counter whose
interface is a single function next
:
unit
>
int
that increments the
counter and returns its new value. Repeated invocation of this function
should return a sequence of consecutive integers —a different answer each
time.
Indeed, the counter needs to memorize its state in some particular location,
with read/write accesses, but before all, some information must be shared
between two calls to next
. The solution is to use mutable storage and
interact with the store by socalled side effects.
In OCaml, the counter could be defined as follows:

Another, maybe more concrete, example of mutable storage is a bank account. In OCaml, record fields can be declared mutable, so that new values can be assigned to them later. Hence, a bank account could be a twofield record, its number, and its balance, where the balance is mutable.

In fact, in OCaml, references are not primitive: they are special cases of mutable records. For instance, one could define:

We choose to model singlefield store cells, ie. references. Multiplefield records with mutable fields can be modeled in a similar way, but the notations become heavier.
Certainly, the store cannot be modeled by just using δrules. There should necessarily be another mechanism to produce some side effects so that repeated computations of the same expression may return different values.
The solution is to model the store, rather intuitively. For that purpose, we introduce a denumerable collection of store locations l ∈ L. We also extend the syntax of programs with store locations and with constructions for manipulating the store:
a ::= … ∣ l ∣ ref a ∣ deref a ∣ assign a a' 
Following the intuition, the store is modeled as a global partial mapping s from store locations to values. Smallstep reduction should have access to the store and be able to change its content. We model this by transforming pairs a/s composed of an expression and a store rather than by transforming expressions alone.
Store locations are values.
v ::= … ∣ l 
The semantics of programs that do not manipulate the store is simply lifted to leave the store unchanged:
a / s → a' / s if a → a' 
Primitives operating on the store behaves as follows:

Hence, we must count store location among values: Additionally, we lift the context rule to valuestore pairs:
E [a] / s → E [a'] /s if a/s → a' /s 

An attempt to model garbage collection of unreachable locations is to use an additional rule.
a / s → a / (s ∖ l) l∉ a 
However, this does not work for several reasons.
Firstly, the location l may still be accessible, indirectly: starting from the expression a one may reach a location l' whose value s(l') may still refer to l. Changing the condition to l ∉ a, (s ∖ l) would solve this problem but raise another one: cycles in s will never be collected, even if not reachable from a. So, the condition should be that of the form “l is not accessible from a using store s”. Writing, this condition formally, is the beginning of a specification of garbage collection...
Secondly, it would not be correct to apply this rule locally, to a subterm, and then lift the reduction to the whole expression by an application of the context rule. There are two solutions to this last problem: one is to define a notion of toplevel reduction to prevent local applications of garbage collection; The other one is to complicate the treatment of store so that locations can be treated locally (see [77] for more details).
In order to type programs with locations, we must extend typing environment with assumptions for the type of locations:
A ::= … ∣ A, l : τ 
Remark that store locations are not allowed to be polymorphic (see the discussion below). Hence the typing rule for using locations is simply

Operations on the store can be typed as the application of constants with the following type schemes in the initial environment A_{0}:

(Giving specific typing rules Ref, Deref, and Assign would unnecessarily duplicate rule App into each of them)
We first define store typing judgments: we write A ⊢ a/s : τ if there exists a store extension A' of A (ie. outside of domain of A) such that A' ⊢ a : τ and A' ⊢ s(l) : A'(l) for all l ∈ dom (A'). We then redefine ≤ to be the inclusion of store typings.
(a / s ≤ a' / s') ⇔∀ (A, τ) (A ⊢ a / s : τ ⇒ A ⊢ a' / s' : τ) 
Note that store locations cannot be polymorphic. Furthermore, so as to preserve subject reduction, expressions such as ref v should not be polymorphic either, since ref v reduces to l where l is a new location of the same type as the type of v. The simplest solution to enforce this restriction is to restrict let x = a in a' to the case where a is a value v (other cases can still be seen as syntactic sugar for (λ x. a') a.) Since ref a is not a value —it is an application— it then cannot be polymorphic. Replacing a by a value v does not make any difference, since ref is not a constructor but a primitive. Of course, this solution is not optimal, ie. there are safe cases that are rejected. All other solutions that have been explored end up to be too complicated, and also restrictive. This solution, known as “valueonly polymorphism” is unambiguously the best compromise between simplicity and expressiveness.
To show how subject reduction could fail with polymorphic references, consider the following counterexample.

If "id" had a polymorphic type ∀ α. τ, it would be possible to assign to id
a function of the
less general type, eg. the type int
>
int
of succ
, and then to
read the reference with another incompatible less general type
bool
>
bool
; however, the new content of id
, which is the function
succ
, does not have type bool
>
bool
.
Another solution would be to ensure that values assigned to id
have a
type scheme at least as general as the type of the location. However, ML
cannot force expressions to have polymorphic types.
fix
can be defined using
references alone (ie. using without recursive bindings, recursive types
etc.).
In OCaml references cells are just a particular case of records with mutable fields. To model those, one should introduce locations with several fields as well. The does not raise problem in principle but makes the notations significantly heavier.
Exceptions are another imperative construct. As for references, the semantics of exceptions cannot be given only by introducing new primitives and δrules.
We extend the syntax of core ML with:
a ::= … ∣ try a with x ⇒ a ∣ raise a 
We extend the evaluation contexts, so as to allow evaluation of exceptions and exception handlers.
E :: … ∣ try E with x ⇒ a ∣ raise E 
Finally, we add the following redex rules:

with the side condition for the Raise rule that the evaluation context E' does not contain any node of the form (try _ with _ ⇒ _). More precisely, such evaluation contexts can be defined by the grammar:
E' ::= [·] ∣ E' a ∣ v E' ∣ raise E' 
Informally, the Raise rule says that if the evaluation of a raises an exception with a value v, then the evaluation should continue at the first enclosing handler by applying the right handside of the handler value v. Conversely, is the evaluation of a returns a value, then the Try rule simply removes the handler.
The typechecking of exceptions raises similar problems to the typechecking of references: if an exception could be assigned a polymorphic type σ, then it could be raised with an instance τ_{1} of σ and handled with the asumption that it has type τ_{2} —another instance of σ. This could lead to a type error if τ_{1} and τ_{2} are incompatible. To avoid this situation, we assume given a particular closed type τ_{0} to be taken for the type of exceptions. The typing rules are:

We have only formalized a few of the ingredients of a real language. Moreover, we abstracted over many details. For instance, we assumed given the full program, so that type declaration could be moved ahead of all expressions.
Despite many superficial differences, Standard ML is actually very close to OCaml. Standard ML has also been formalized, but in much more details [51, 50]. This is a rather different task: the lower level and finer grain exposition, which is mandatory for a specification document, may unfortunately obscure the principles and the underlying simplicity behind ML.
Among many extensions that have been proposed to ML, a few of them would have deserved more attention, because there are expressive, yet simple to formalize, and in essence very close to ML.
Records as datatype definitions have the inconvenience that they must always be declared prior to their use. Worse, they do not allow to define a function that could access some particular field uniformly in any record containing at least this field. This problem, known as polymorphic record access, has been proposed several solutions [65, 57, 33, 37], all of which relying more or less directly on the powerful idea of row variables [73]. Some of these solutions simultaneously allow to extend records on a given field uniformly, ie. regardless of the other fields. This operation, known as polymorphic record extension, is quite expressive. However, extensible records cannot be typed as easily or compiled as efficiently as simple records.
Dually, variant allows building values of an open sum type by tagging with labels without any prior definition of all possible cases. Actually, OCaml was recently extended with such variants [23]
Datatypes can also be used to embed existential or universal types into ML [41, 64, 53, 24].