[Caml-list] Proposal for separate compilation
Date: -- (:)
From: Xavier Leroy <xavier.leroy@i...>
Subject: Re: [Caml-list] Proposal for separate compilation
> Currently, a compiled unit (.cmi,.cmo,.cmx,.o) refers to another unit
> through its symbolic name, as used in the source file.  My proposal is
> to replace these names with unique identifiers.

This is an interesting proposal.  You might be interested in Martin
Elsman's PhD (DIKU, 1998), which uses techniques along these lines in
a defunctorizing, whole-program SML compiler.

There is one thing that puzzles me in your proposal.  Consider two
compilation units A and B, where B refers to A.  

When we compile B.ml, it can be that the only thing the compiler knows
about A is its interface A.cmi.  This is certainly true for ocamlc.
ocamlopt can take advantage of information on A's implementation, as
found in A.cmx, but in the current model the presence of A.cmx isn't
mandatory to compile B.ml: ocamlopt will generate less efficient but
correct code if A.cmx isn't there.

So, what unique identifier is B going to use to refer to A's definition?
Since A.cmi is the only available info on A, that
identifier must be tied to A.cmi: either a hash of A's interface, or
some unique identifier generated when A.mli is compiled into A.cmi.
It looks like you're going to get name collisions when several
compilation units have the same interface.  More generally, you
haven't fully severed the connection that we have in the current
system between the identifier representing the definitions of a
compilation unit and the name or identifier of its interface.
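
To make the collision concrete, here is a minimal sketch.  It assumes a
scheme in which the identifier is simply a digest of the interface text;
the record type, the unit names Stack and Queue, and the function
id_of_interface are all hypothetical and only serve the illustration.

```ocaml
(* Hypothetical model: a client sees a unit only through its interface. *)
type unit_info = { name : string; interface : string }

(* Assumed scheme: the identifier is the MD5 digest of the interface text
   alone, since nothing else about the unit is available to the client. *)
let id_of_interface (u : unit_info) : string =
  Digest.to_hex (Digest.string u.interface)

(* Two different units that happen to export the same signature. *)
let signature = "type t val empty : t val push : int -> t -> t"
let stack = { name = "Stack"; interface = signature }
let queue = { name = "Queue"; interface = signature }

(* Both units receive the same identifier, so a client cannot tell
   which implementation it is referring to. *)
let collide =
  String.equal (id_of_interface stack) (id_of_interface queue)
```

Here collide is true even though the two units have different names and
presumably different implementations.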

There are several ways to work around this issue.  One is to restrict
your scheme to native-code compilation (ocamlopt) and demand that
A.cmx is available when compiling B.ml where B depends on A.  Then,
A.cmx can contain the truly unique identifier for A's implementation,
and that identifier can be used in the code generated for B.  But the
bytecode compiler will not be able to take advantage of your scheme.
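
As a rough sketch of this first option, the compiler could refuse to
compile B.ml when A.cmx is missing.  The cmx_info record, the lookup
function, and the identifier "A_7f3e" are assumptions made up for the
example, not the actual contents of a .cmx file.

```ocaml
(* Hypothetical content of A.cmx: the unique identifier chosen when
   A.ml was compiled. *)
type cmx_info = { impl_ident : string }

(* lookup is assumed to return Some info when the .cmx file of the
   given unit is found on the load path, and None otherwise. *)
let ident_for_import ~(lookup : string -> cmx_info option)
    (unit_name : string) : (string, string) result =
  match lookup unit_name with
  | Some info -> Ok info.impl_ident
  | None -> Error (unit_name ^ ".cmx is required but was not found")

(* A toy load path containing only A.cmx. *)
let lookup = function
  | "A" -> Some { impl_ident = "A_7f3e" }
  | _ -> None

let for_a = ident_for_import ~lookup "A"
let for_c = ident_for_import ~lookup "C"
```

With this toy load path, for_a carries A's identifier while for_c is an
error, which is exactly the restriction that ocamlc cannot accommodate.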

Another option is to generate unique identifiers not just for the
exports of a compilation unit but also for its imports.  That is,
compiling B.ml will use some random ident "xyz" for B's exports
and another random ident "abc" for A's exports.  Compiling A.ml will
assign a different ident "def" to A's exports.  All these (ident, name)
associations are recorded in the generated .cmo and .cmx files, of course.
At link-time, the linker recognizes that "abc" and "def" are two
identifiers for the same thing, and equates them.  (Most native-code
linkers allow two symbols to be identified with each other, although
I'm not sure all of them support this feature.)
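
The link-time step could look roughly like the sketch below.  It reuses
the idents "xyz", "abc" and "def" from the example above; the binding
record, the role type, and the link function are hypothetical, and a
real linker would of course work on symbol tables rather than lists.

```ocaml
type role = Export | Import

(* One (ident, name) association as recorded in a .cmo or .cmx file. *)
type binding = { ident : string; unit_name : string; role : role }

(* Associations assumed to be recorded when compiling B.ml and A.ml. *)
let b_obj = [ { ident = "xyz"; unit_name = "B"; role = Export };
              { ident = "abc"; unit_name = "A"; role = Import } ]
let a_obj = [ { ident = "def"; unit_name = "A"; role = Export } ]

(* Map every ident to the ident under which its unit is exported.
   Fails if a unit is exported twice or imported but never exported. *)
let link (objs : binding list list) : (string * string) list =
  let all = List.concat objs in
  let exports = Hashtbl.create 16 in
  List.iter
    (fun b ->
      if b.role = Export then begin
        if Hashtbl.mem exports b.unit_name then
          failwith ("unit defined twice: " ^ b.unit_name);
        Hashtbl.add exports b.unit_name b.ident
      end)
    all;
  List.map
    (fun b ->
      match Hashtbl.find_opt exports b.unit_name with
      | Some canonical -> (b.ident, canonical)
      | None -> failwith ("undefined unit: " ^ b.unit_name))
    all

let table = link [ b_obj; a_obj ]
```

In the resulting table, "abc" is mapped to "def", so B's references to A
resolve to the definitions produced by compiling A.ml.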

- Xavier Leroy
