Date: 2007-10-12 (12:45)
From: Gordon Henriksen <gordonhenriksen@m...>
Subject: Re: [Caml-list] Adding new architecture to ocamlopt
On Oct 12, 2007, at 09:17, Christoph Sieghart wrote:

> Is there any documentation for adding a new architecture to
> ocamlopt? I would like to build a cross-compiler from one of the
> existing architectures to an embedded microcontroller.
> I have searched the mailing list archives and the documentation, but
> have not found anything. Any pointers are welcome. Is my assumption
> correct that the major code generation work is done by the code in
> $caml/asmcomp?


Yes, asmcomp contains both the middle-end and the back-end code  
generators. Note that the architecture-specific features are injected  
by configure creating various symlinks of the form
asmcomp/<foo>.ml -> asmcomp/<arch>/<foo>.ml. On one hand, this means
you should be able to clone the contents of one of the asmcomp/<arch>
subdirectories and get your project off to a start pretty quickly. On  
the other, ocamlopt is not a cross-compiler, so you may have a bit of  
a challenge just getting the paths to the cross tools into the right  
places without breaking ocamlc.

I'm sure you'll get more detailed pointers, but here's a quick overview.

ocamlc and ocamlopt share code through the "Lambda" representation  
(bytecomp/lambda.mli). After this point, ocamlopt transfers control  
into asmcomp/, which has a fairly straightforward pass  
pipeline in Asmgen.compile_implementation.

The Lambda representation is first translated into Closed Lambda  
(asmcomp/clambda.mli), which is similar except that closures are
made explicit.
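
To give a feel for what a "closed" representation means, here is a toy
sketch of closure conversion. The types and function names are made up for
illustration and are far simpler than the real Lambda and Clambda
definitions; the only point is that each function ends up listing the
variables it captures from its surroundings.

```ocaml
(* Toy illustration of closure conversion. These types are hypothetical
   and much simpler than the real Lambda and Clambda definitions. *)

(* Source language: a function body may mention outer variables. *)
type expr =
  | Var of string
  | Int of int
  | Add of expr * expr
  | Fun of string * expr                       (* fun x -> body *)

(* Closed language: each function lists the variables it captures. *)
type closed =
  | CVar of string
  | CInt of int
  | CAdd of closed * closed
  | CClosure of string list * string * closed  (* captured, param, body *)

(* Free variables of an expression, without duplicates. *)
let rec free_vars = function
  | Var x -> [x]
  | Int _ -> []
  | Add (a, b) ->
      let fa = free_vars a in
      fa @ List.filter (fun v -> not (List.mem v fa)) (free_vars b)
  | Fun (x, body) -> List.filter (fun v -> v <> x) (free_vars body)

(* Rewrite every function into a closure with an explicit capture list. *)
let rec close = function
  | Var x -> CVar x
  | Int n -> CInt n
  | Add (a, b) -> CAdd (close a, close b)
  | Fun (x, body) as f -> CClosure (free_vars f, x, close body)

(* fun x -> x + y captures only y. *)
let example = close (Fun ("x", Add (Var "x", Var "y")))
```

Here example is CClosure (["y"], "x", CAdd (CVar "x", CVar "y")): the
parameter x is bound by the function, so only y appears in the capture list.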

Next, ocamlopt transforms Clambda into its middle-end representation,  
C--. This form is somewhat well documented online
and in various academic papers. The C-- representation is
architecture-neutral in form, but not content. Target dependencies  
are injected through the Arch module, which specifies address sizes,
endianness, and so on. This is also the point where displacement
calculations are performed.
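
As a rough sketch of how a target description can feed a displacement
calculation, consider the following. The module type ARCH, the two target
modules, and field_offset are invented for this example and are not copied
from any real asmcomp/<arch>/arch.ml; it also assumes every field of a block
is one pointer-sized word.

```ocaml
(* Hypothetical stand-in for a target description. The names are
   illustrative, not taken from a real asmcomp/<arch>/arch.ml. *)
module type ARCH = sig
  val size_addr : int      (* bytes per pointer *)
  val big_endian : bool
end

module Target32 : ARCH = struct
  let size_addr = 4
  let big_endian = false
end

module Target64 : ARCH = struct
  let size_addr = 8
  let big_endian = false
end

(* Byte displacement of field [n] of a block, assuming word-sized fields.
   The same source-level access yields different offsets per target. *)
module Layout (A : ARCH) = struct
  let field_offset n = n * A.size_addr
end

module L32 = Layout (Target32)
module L64 = Layout (Target64)

let () =
  Printf.printf "field 3: %d bytes (32-bit), %d bytes (64-bit)\n"
    (L32.field_offset 3) (L64.field_offset 3)
```

The generic code is written once against the signature, and only the
numbers that come out depend on the target, which is roughly what
"architecture-neutral in form, but not content" amounts to.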

The C-- representation is the input to the architecture-specific
back-end code generators, which are driven by the architecture-neutral
Asmgen.compile_phrase and Asmgen.compile_fundecl. In particular, this  
pipeline is pleasantly self-documenting:

let (++) x f = f x

let compile_fundecl (ppf : formatter) fd_cmm =
   fd_cmm (* <-- The C-- representation for the function *)
   ++ Selection.fundecl
   ++ pass_dump_if ppf dump_selection "After instruction selection"
   ++ Comballoc.fundecl
   ++ pass_dump_if ppf dump_combine "After allocation combining"
   ++ liveness ppf
   ++ pass_dump_if ppf dump_live "Liveness analysis"
   ++ Spill.fundecl
   ++ liveness ppf
   ++ pass_dump_if ppf dump_spill "After spilling"
   ++ Split.fundecl
   ++ pass_dump_if ppf dump_split "After live range splitting"
   ++ liveness ppf
   ++ regalloc ppf 1
   ++ Linearize.fundecl
   ++ pass_dump_linear_if ppf dump_linear "Linearized code"
   ++ Scheduling.fundecl
   ++ pass_dump_linear_if ppf dump_scheduling "After instruction scheduling"
   ++ Emit.fundecl
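
If the (++) operator looks unfamiliar: it is plain reverse application, so
x ++ f is just f x, and a chain of passes reads top to bottom in the order
they run. A small standalone sketch, where the three "passes" are made-up
list functions standing in for real compiler passes:

```ocaml
(* Reverse application: [x ++ f] is [f x]. *)
let (++) x f = f x

(* Hypothetical stand-ins for compiler passes, operating on an int list. *)
let drop_zeros xs = List.filter (fun x -> x <> 0) xs
let double xs = List.map (fun x -> 2 * x) xs
let total xs = List.fold_left (+) 0 xs

(* Equivalent to: total (double (drop_zeros [0; 1; 2; 3])) *)
let result =
  [0; 1; 2; 3]
  ++ drop_zeros
  ++ double
  ++ total
```

Here result is 12. Since OCaml 4.01 the standard library provides |> with
the same meaning, so newly written code would likely use that instead.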

You can identify the target-dependent phases by correlating the  
passes with the contents of a target subdirectory.  Have fun!

— Gordon