Re: [Caml-list] Bytecode object files structure
Date: 2006-11-15 (13:50)
From: Xavier Clerc <xcforum@f...>
Subject: Re: [Caml-list] Bytecode object files structure

On 13 Nov 06, at 16:50, Pierre-Etienne Meunier wrote:

> Hello,
> I'd like to write an assembler, to be able to understand how the VM
> really works. I have to work on this for a school project (a compiler;
> I want it to output Caml bytecode object files).

If you are working on a compiler that should output files to be
executed by the OCaml runtime, it does not seem necessary to handle
cmo/cmi files: knowing the format of bytecode executables should be
sufficient to write your compiler, unless you have to link with OCaml
modules.

> I've understood that the data part, after the code itself, was
> generated using output_value (I didn't know this function before).

This function is used by the Marshal module. It transforms any
non-abstract value into a sequence of bytes.
The marshalling format can be understood from the extern_rec
function in the byterun/extern.c file.
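A quick way to see this format in practice is to marshal a tiny value and dump the resulting bytes. This is a sketch assuming a reasonably recent OCaml (4.06 or later, for `List.init`); `Marshal.to_string` with no flags produces the same bytes as `output_value`. The constants mentioned in the comments (the magic number 0x8495A6BE and the one-byte encoding 0x40 + n for small integers) are assumptions taken from reading byterun/intext.h and extern.c, and should be checked against the sources of your version.

```ocaml
(* Marshal a small value and dump the resulting bytes, so that they can be
   compared with what extern_rec in byterun/extern.c writes. *)
let hex_of_string s =
  String.concat " "
    (List.init (String.length s) (fun i -> Printf.sprintf "%02x" (Char.code s.[i])))

(* No flags: plain, uncompressed marshalling, as done by output_value. *)
let marshalled = Marshal.to_string 42 []

(* Assumed layout for such a small value: a header starting with the magic
   number 84 95 a6 be, followed by the data part, where the integer 42 is
   encoded in a single byte (0x40 + 42 = 0x6a). *)
let magic = String.sub marshalled 0 4
let data_len = Marshal.data_size (Bytes.of_string marshalled) 0

let () =
  Printf.printf "total size: %d bytes\n" (String.length marshalled);
  Printf.printf "data size:  %d byte(s)\n" data_len;
  Printf.printf "bytes:      %s\n" (hex_of_string marshalled)
```

Replacing 42 with a string, a list or a value containing shared sub-values, and comparing the dump with the cases of extern_rec, is a fairly direct way to learn the rest of the format.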

> What I don't get now are the cu_reloc, cu_primitives and cu_imports
> fields of the compilation_unit type.

You should remember that cmo files are parts that will be put
together (linked) in order to create a bytecode file.
Given this context:
	- cu_imports lists the names of the imported (used) modules the
current cmo should be linked with in order to produce a bytecode file
(the digest of each imported module is also kept, to ensure that you
link with the same version you compiled against);
	- cu_primitives lists the primitives declared by the current module
(each 'external f : type1 -> type2 = "primitive"' results in a
"primitive" entry in this list), needed to ensure that all required C
primitives are provided;
	- cu_reloc: as each module is compiled independently, it can
declare some elements (e.g. global variables) and refer to them using a
0-based index; cu_reloc records the positions in the code where such
references occur. Thus, when you link several modules together, you
have to relocate these references to ensure that the first module uses
indexes from 0 to n, the second module uses indexes from n+1 to n+m,
and so on.
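To make the three fields concrete, here is a toy model of what a linker does with them. Everything in it is hypothetical: the `toy_unit` record, its integer "code" and the `link` function are simplifications for illustration, not the compiler's actual compilation_unit type or linker (whose relocation entries are richer). Each unit numbers its globals from 0 and `reloc` lists the code positions holding such a local index; linking checks imports and primitives, then shifts each unit's indexes by the number of globals already allocated.

```ocaml
(* Toy model of linking compilation units. Hypothetical types, for
   illustration only: NOT the compiler's real compilation_unit or linker. *)
type toy_unit = {
  name : string;
  digest : string;                    (* checked by units importing this one *)
  imports : (string * string) list;   (* like cu_imports: name, expected digest *)
  primitives : string list;           (* like cu_primitives *)
  nglobals : int;                     (* globals declared, numbered locally from 0 *)
  code : int array;                   (* toy "bytecode": arbitrary ints *)
  reloc : int list;                   (* like cu_reloc: positions holding a local global index *)
}

exception Link_error of string

(* Link units in order: check imports and primitives, then relocate each
   unit's global indexes by the number of globals already allocated. *)
let link ~available_prims units =
  let linked = Hashtbl.create 16 in   (* unit name -> digest *)
  let step (offset, chunks) u =
    List.iter
      (fun (m, d) ->
        match Hashtbl.find_opt linked m with
        | Some d' when d' = d -> ()
        | Some _ -> raise (Link_error (u.name ^ ": inconsistent version of " ^ m))
        | None -> raise (Link_error (u.name ^ ": missing import " ^ m)))
      u.imports;
    List.iter
      (fun p ->
        if not (List.mem p available_prims) then
          raise (Link_error (u.name ^ ": missing primitive " ^ p)))
      u.primitives;
    let code = Array.copy u.code in
    List.iter (fun pos -> code.(pos) <- code.(pos) + offset) u.reloc;
    Hashtbl.replace linked u.name u.digest;
    (offset + u.nglobals, code :: chunks)
  in
  let _, chunks = List.fold_left step (0, []) units in
  Array.concat (List.rev chunks)

(* Example: unit A declares 2 globals, unit B (which imports A) declares 1.
   "prim_print" is a made-up primitive name. *)
let unit_a = {
  name = "A"; digest = "dA"; imports = []; primitives = ["prim_print"];
  nglobals = 2; code = [| 10; 0; 11; 1 |]; reloc = [1; 3];
}

let unit_b = {
  name = "B"; digest = "dB"; imports = [("A", "dA")]; primitives = [];
  nglobals = 1; code = [| 10; 0; 12 |]; reloc = [1];
}

let program = link ~available_prims:["prim_print"] [unit_a; unit_b]
```

In this example A keeps indexes 0 and 1, and B's local index 0 becomes global index 2, which is the 0..n / n+1..n+m scheme described for cu_reloc. Linking B without A, or A without its primitive, raises `Link_error`.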

Hope this helps,

Xavier Clerc

PS: I am working on some documents describing the marshalling format,
bytecode files and instruction opcodes.
I will hopefully release them before Christmas, but don't hold your
breath, as I don't have much spare time these days.
In the meantime, you can contact me off-list with any related question.

> If you can help with this, thanks!
> P.E. Meunier
> On Monday 13 November 2006 11:53, you wrote:
>> Hello,
>> As I have read a substantial part of the OCaml source code, I may be
>> able to help you understand the file formats.
>> Could you be more precise about what you are particularly interested in:
>> 	- file type: bytecode file, cmo file, cmi file?
>> 	- code or data section of these files?
>> May I also ask what you are trying to do with these elements?
>> Cordially,
>> Xavier Clerc
>> On 12 Nov 06, at 15:42, Pierre-Etienne Meunier wrote:
>>> Hi,
>>> I'm trying to decipher .cmo files produced by simple programs, such as
>>> 1+1;;
>>> or
>>> print_string "string";;
>>> or
>>> List.length [1;2;3;4;5];;
>>> According to the OCaml source, there's something called the
>>> "cmo_magic_number", systematically written at the beginning of all
>>> .cmo files. Does it have a real function for executing the programs,
>>> or is it just a way to make sure the file contains OCaml bytecode?
>>> Then, there's the address of what seems to be the last bytecode
>>> instruction.
>>> Then, the bytecode instructions, as documented in
>>> After that, I can't understand anything: there vaguely seems to be
>>> some information related to linking or so... What is the precise
>>> structure of this part? Is there some kind of bytecode assembler?
>>> Thanks,
>>> P.E. Meunier
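Regarding the .cmo layout asked about in the original question, here is a minimal sketch that reads the fixed part of a .cmo file. It rests on assumptions to check against bytecomp/emitcode.ml for your OCaml version: a 12-character magic number beginning with "Caml1999O", then a 4-byte big-endian integer giving the file offset of the marshalled compilation_unit descriptor (which starts right after the last bytecode instruction), with the code in between. The magic number is only used as a sanity check here; decoding the descriptor itself would need the compiler's own type definitions, so the sketch stops there. It assumes OCaml 4.08 or later (for `Fun.protect`).

```ocaml
(* Sketch: read the fixed part of a .cmo file. Assumed layout: a
   12-character magic number starting with "Caml1999O", then a 4-byte
   big-endian offset of the marshalled compilation_unit descriptor;
   the bytecode lies between the two. *)
let magic_length = 12
let cmo_magic_prefix = "Caml1999O"

type cmo_header = {
  magic : string;        (* full magic number, including version digits *)
  compunit_pos : int;    (* file offset of the marshalled compilation_unit *)
  code_words : int;      (* bytecode size, in 4-byte words *)
}

let read_cmo_header filename =
  let ic = open_in_bin filename in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () ->
      let magic = really_input_string ic magic_length in
      let plen = String.length cmo_magic_prefix in
      if String.sub magic 0 plen <> cmo_magic_prefix then
        failwith (filename ^ ": not a .cmo file");
      (* input_binary_int reads 4 bytes in big-endian order *)
      let compunit_pos = input_binary_int ic in
      (* the code starts right after the magic number and the offset *)
      let code_start = pos_in ic in
      { magic; compunit_pos; code_words = (compunit_pos - code_start) / 4 })

let () =
  if Array.length Sys.argv > 1 then begin
    let h = read_cmo_header Sys.argv.(1) in
    Printf.printf "magic: %s\ncompilation unit at: %d\ncode: %d words\n"
      h.magic h.compunit_pos h.code_words
  end
```

Run as a script or compiled, with a .cmo file (for instance one produced from `print_string "string";;`) as its argument, it prints the magic number, the descriptor offset and the code size.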