Version française
Home     About     Download     Resources     Contact us    
Browse thread
Function Inlining
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: John Carr <jfc@M...>
Subject: Re: [Caml-list] Function Inlining

> You're probably running into restrictions other than size that OCaml has 
> on function inlining, such as function definitions inside the function, 
> structured constants and let rec.
> 
> Getting rid of these restrictions could potentially improve OCaml code 
> generation considerably, but as far as I can tell, that would require 
> some additional features in the code generation such as sharing 
> structured constants, explicitly transforming tail calls to loops etc.

I have been working on part of this.

My 64 bit SPARC code generator produces bad assembly code when
compiling the TK library.  The failure is caused by poor code
generation in the intermediate phase of the compiler.  There is a
nested (not top level) module named Tkintf containing hundreds of
values.  The code generated to initialize Tk.Tkintf looks like:

	1. compute the hundreds of values to be stored in the Tkintf module
	2. allocate a block to hold Tkintf
	3. store the hundreds of values computed earlier into the Tkintf block
	4. store a pointer to the Tkintf block into the appropriate field of Tk

This sequence results in hundreds of live values.  Most have to be
spilled to the stack between 1 and 3, but the stack can only hold
about 256 values.  (Because ocamlopt does not use register windows
it is limited to a 2 kilobyte stack frame.)  The assembler fails
with address offset out of range errors.

The ocaml developers were aware of this sort of problem and a
different method is used to initialize top level modules, though
there is a separate inefficiency there because it isn't necessary to
compute at runtime the majority of module values which are constant,
either integer constants or pointers to statically allocated blocks.

I have been experimenting with making both constant closures and
structured constants shareable and (for top level modules) compiling
constant initializers into the data section.  So given a module like

	let x = 1 let y = (1,2,3) let rec f x = ... and g x = ...

the assembly code would look like (with tag values omitted for clarity):

	.data
L1:	# begin three value tuple
	.word	3	# integer 1
	.word	5	# integer 2
	.word	7	# integer 3
L2:	# begin block describing mutually recursive functions f and g
	.word	1	# f arity
	.word	f	# pointer to function f
	.word	1	# g arity
	.word	g	# pointer to function g
module:
	.word	3	# integer 1
	.word	L1	# pointer to (1,2,3)
	.word	L2	# pointer to closure for f
	.word	L2+8	# pointer to closure for g

Values not computable at compile time cause a .skip directive and the
module entry code computes the initial value as in the current compiler.

This solves the stack overflow problem.  I'm not convinced that
the asmcomp phase of the compiler is the right place to do this
optimization.  Perhaps it should be moved to the intermediate
phase where the top level module optimization is done now.

(This is more than just optimizing the performance of code that is
executed once.  For TK a substantial fraction of the code size is
the module entry point.  Also, a few jobs ago I cut the startup time
of a product I worked on by several seconds by optimizing the dynamic
loading process, which is similarly a one time act.)