[Caml-list] Proposal for separate compilation
Date: 2004-05-21 (16:32)
From: Alain Frisch <Alain.Frisch@e...>
Subject: [Caml-list] Proposal for separate compilation
Hello list,

I'd like to get feedback on a suggestion to modify the way OCaml
handles separate compilation. I already mentioned this proposal to
Xavier and Jacques.

This is an attempt to address the issue of collision between the names
of two external modules. This should also merge the concept of
-pack'ed units and libraries, to get the best of both worlds.

First, let me describe (my understanding of) the current system.

i) The compiler transforms a source interface (.mli) to a compiled
interface (.cmi). When other external modules are referenced in the
.mli file, the corresponding .cmi files are looked up in the "include"
directories (-I options on the command line). The names of these
interfaces are stored symbolically in the .cmi. Their md5 are also
stored in the produced .cmi so as to detect inconsistencies.

ii) The bytecode compiler transforms a source implementation (.ml)
to a compiled implementation (.cmo). The names of the external
implementations and interfaces used in the .ml are stored in the .cmo.
The md5 of the interfaces are all stored in the .cmo.

iii) The native compiler produces a .cmx and a .o file. If the .cmx
of other compiled implementations are found at compile time,
they are used to propagate constants and perform cross-unit
inlining. Their md5 are also stored in the produced .cmx.
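
To make the consistency check in (i)-(iii) concrete, here is a minimal sketch of the idea, not the compiler's actual data structures. The record type, the helper names, and the use of a plain string as a stand-in for an interface are all assumptions made for illustration.

```ocaml
(* Hypothetical model of a compiled unit in the current scheme: it records
   the symbolic name and md5 of every interface it was compiled against. *)
type compiled_unit = {
  name : string;
  interface_digest : Digest.t;         (* md5 of this unit's own interface *)
  imports : (string * Digest.t) list;  (* (symbolic name, expected md5) *)
}

(* Build a unit from a textual stand-in for its interface. *)
let make name interface_text imports =
  { name; interface_digest = Digest.string interface_text; imports }

(* Names of imports whose recorded md5 disagrees with (or is missing from)
   the interfaces actually present in [available]. *)
let inconsistent_imports available u =
  List.filter_map
    (fun (n, expected) ->
      match List.find_opt (fun v -> v.name = n) available with
      | Some v when Digest.equal v.interface_digest expected -> None
      | _ -> Some n)
    u.imports

(* B was compiled against the first version of A. *)
let a_v1 = make "A" "type t" []
let a_v2 = make "A" "type t = int" []
let b = make "B" "type s = X of A.t" [ ("A", a_v1.interface_digest) ]
```

With these definitions, `inconsistent_imports [a_v1] b` is empty, while `inconsistent_imports [a_v2] b` reports `"A"`: the name still resolves, but the md5 no longer matches.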

Currently, a compiled unit (.cmi,.cmo,.cmx,.o) refers to another unit
through its symbolic name, as used in the source file.  My proposal is
to replace these names with unique identifiers.

Bottom-line: when a unit (.ml,.mli) is compiled, the external units
are looked up. The information we really want to store in the compiled
unit is a pointer to the external units as found during compilation.
Since we want to be able to move or to repackage (in a library)
compiled units, we can implement these pointers with unique
identifiers: either a random number, or a md5 hash of the unit.
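
As a rough illustration of the md5 variant, a uid can be derived from the content of the unit alone, so that it survives moving or repackaging the file. The type and function names below are hypothetical, and a string stands in for the unit's content.

```ocaml
(* Hypothetical uid: the md5 of the unit's content, independent of its path. *)
type uid = Uid of Digest.t

let uid_of_contents (contents : string) : uid = Uid (Digest.string contents)

(* Printable form, e.g. for embedding in symbol names or messages. *)
let uid_to_string (Uid d) = Digest.to_hex d

(* Two units that happen to share the name "a" but differ in content. *)
let dir1_a = uid_of_contents "type t"
let dir2_a = uid_of_contents "type t = int"
```

Here `dir1_a` and `dir2_a` differ even though both units are called `a`, and recomputing the uid of the same content elsewhere gives the same value.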

Let me give an example.

a.mli:
type t

b.mli:
type s = X of A.t

First you compile a.mli and you get a.cmi. When you compile b.mli,
the file a.cmi is found, and intuitively, the content of b.cmi
corresponds to something like:

type s = X of #{a.cmi}.t

where #{a.cmi} denotes the unique identifier (uid) of a.cmi as found at
compile time. Same story for implementations.

The compiler needs two distinct lookup mechanisms:

- a mapping from symbolic names (found in source code) to uids
- a mapping from uids to compiled units (either files, or part of an

The first mapping is (almost) a preprocessing pass.
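
The two mappings could be sketched as follows, using association lists for brevity. Everything here is an assumption made for illustration: uids are plain strings, `resolve` only handles a single `Module.rest` qualification, and the `Member` constructor is my guess at a unit stored inside a library archive.

```ocaml
type uid = string  (* stand-in, e.g. a hex md5 *)

(* Where a compiled unit lives: a file on disk, or (assumption) a member
   of a library archive, given as (archive, member). *)
type location =
  | File of string
  | Member of string * string

(* First mapping, applied like a preprocessing pass: rewrite "A.t" into
   "#{uid}.t" using the table from symbolic names to uids. *)
let resolve (names : (string * uid) list) (qualified : string) : string =
  match String.index_opt qualified '.' with
  | None -> qualified
  | Some i ->
    let modname = String.sub qualified 0 i in
    let rest = String.sub qualified i (String.length qualified - i) in
    (match List.assoc_opt modname names with
     | Some u -> "#{" ^ u ^ "}" ^ rest
     | None -> qualified)

(* Second mapping: from a uid to the compiled unit's location. *)
let locate (units : (uid * location) list) (u : uid) : location option =
  List.assoc_opt u units
```

For example, `resolve [("A", "3f2a")] "A.t"` gives `"#{3f2a}.t"`, after which only `locate` is needed to find the unit.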

Now it is possible to have several units with the same name.
For instance, consider a project with these files:


Imagine that the only dependency is from dir1/b to dir1/a.
To compile these:
(cd dir1; ocamlopt -c a.mli b.mli)
(cd dir2; ocamlopt -c a.mli)

ocamlopt produces: dir1/{a,b}.{cmi,cmx,o} and dir2/a.{cmi,cmx,o}.
dir1/b.{cmi,cmx} reference dir1/a.{cmi,cmx} through the uid
of these files.

Now you can compile a module that uses dir1/b.cmx and dir2/a.cmx.
You just need to let the compiler/linker know how to find a compiled
unit given its uid. For instance, you could just give it a list
of directories, and it would look at all the .cmi,.cmx files in these
directories. More realistically, you would keep the symbolic
name of the unit in the uid, so as to look only at the .cmi,.cmx files with
the correct name. (You also want to store where the unit was
found, for error messages, see below.)
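
A possible shape for this lookup, as a sketch only: the uid carries the symbolic name, the hash, and the place where the unit was found, and the search filters on the name before comparing hashes. The file system is replaced by an in-memory list, and all type and function names are hypothetical.

```ocaml
(* Hypothetical uid carrying the symbolic name and the original location. *)
type uid = { name : string; hash : Digest.t; found_in : string }

(* In-memory stand-in for a compiled file on disk. *)
type file = { dir : string; unit_name : string; contents : string }

let uid_of_file f =
  { name = f.unit_name; hash = Digest.string f.contents; found_in = f.dir }

(* Search the given directories for the unit designated by [uid]:
   filter on the name first, then confirm with the hash. *)
let find_unit ~dirs ~files uid =
  List.find_opt
    (fun f ->
      List.mem f.dir dirs
      && f.unit_name = uid.name
      && Digest.equal (Digest.string f.contents) uid.hash)
    files

let dir1_a = { dir = "dir1"; unit_name = "a"; contents = "type t" }
let dir2_a = { dir = "dir2"; unit_name = "a"; contents = "type u" }
let files = [ dir2_a; dir1_a ]
```

Looking up `uid_of_file dir1_a` in both directories returns the dir1 unit even though dir2 also contains a unit named `a`; restricting the search to dir2 returns nothing.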

A library (.cma,.cmxa) would just be an archive of compiled units
(*including* compiled interfaces), probably with an index: uid ->
component.  It can be used for both mappings.  You need only add a
reference to the library file on the compiler command line (no need
for a -I flag). Additionally, when building and/or compiling with a
library, it should be possible to specify a prefix, or to give new
names to its components. This would only affect the mapping from names
to compiled units. This replaces the -pack option, with slightly
different semantics: when the same unit is packaged in several
libraries, and several of these libraries are used, the fact that it
is the same module is retained (for the typing and link phases: types
are compatible, and only one implementation of the module is
included). As a by-product, one no longer needs to rename symbols in
.o objects when -pack'ing (we get rid of objutils), because the symbols
in the .o objects would be prefixed with the uid.
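
The difference with -pack can be sketched like this: a prefix changes only the names under which components are visible, while linking keeps one copy per uid. The record types and the `link` function are hypothetical and ignore everything a real linker does beyond deduplication.

```ocaml
type uid = string  (* stand-in for a real unique identifier *)

type component = { uid : uid; name : string; code : string }

(* A library: an archive of components, optionally used under a prefix. *)
type library = { prefix : string option; components : component list }

(* The prefix affects only the mapping from names to uids. *)
let visible_names lib =
  List.map
    (fun c ->
      let n =
        match lib.prefix with
        | None -> c.name
        | Some p -> p ^ "." ^ c.name
      in
      (n, c.uid))
    lib.components

(* Linking several libraries: a unit present in more than one of them
   is included only once, because it is identified by its uid. *)
let link libs =
  List.fold_left
    (fun acc lib ->
      List.fold_left
        (fun acc c ->
          if List.exists (fun c' -> c'.uid = c.uid) acc then acc else c :: acc)
        acc lib.components)
    [] libs
  |> List.rev

let util = { uid = "u1"; name = "Util"; code = "..." }
let x = { uid = "u2"; name = "X"; code = "..." }
let lib1 = { prefix = Some "A"; components = [ util; x ] }
let lib2 = { prefix = Some "B"; components = [ util ] }
```

Here `A.Util` and `B.Util` map to the same uid, so their types stay compatible, and `link [lib1; lib2]` contains `Util` only once.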

We can imagine fancy features on the command line, like adding a
prefix for a library:

ocamlopt -c -prefix:A=mylib.cmxa

or for a directory:

ocamlopt -c -prefix:A=/home/joe/mylib/

It would compile, and references to modules A.* would actually
be resolved in mylib.cmxa (resp. ~/mylib/).
(And for linking, simply: ocamlopt -o mylib.cmxa x.cmx, resp.:
ocamlopt -o -I /home/joe/mylib x.cmx).

You no longer get an error message:
« Files b.cmo and a.cmo make inconsistent assumptions over interface A »
but something like:
« Couldn't find interface A with hash XXXXXX (which was found in
  /home/joe/mylib.cma during compilation) »
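
A sketch of how such a message could be produced, assuming the uid records the symbolic name, the hash, and where the unit was found at compile time. The types and the `lookup` function are hypothetical; available units are modelled as (name, hash) pairs.

```ocaml
(* Hypothetical uid, keeping enough information for a useful message. *)
type uid = { name : string; hash : string; found_in : string }

(* Look the unit up among the available (name, hash) pairs; on failure,
   report the hash and the location recorded during compilation. *)
let lookup (available : (string * string) list) (u : uid)
    : (string, string) result =
  match List.find_opt (fun (n, h) -> n = u.name && h = u.hash) available with
  | Some (n, _) -> Ok n
  | None ->
    Error
      (Printf.sprintf
         "Couldn't find interface %s with hash %s (which was found in %s during compilation)"
         u.name u.hash u.found_in)

let a = { name = "A"; hash = "XXXXXX"; found_in = "/home/joe/mylib.cma" }
```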

There are a lot of practical issues to deal with (such as getting
decent outputs for ocamlc -i and gprof, etc.).

What do people think about the proposal?

-- Alain
