Browse thread
Function Inlining
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | John Carr <jfc@M...> |
| Subject: | Re: [Caml-list] Function Inlining |
> You're probably running into restrictions other than size that OCaml has > on function inlining, such as function definitions inside the function, > structured constants and let rec. > > Getting rid of these restrictions could potentially improve OCaml code > generation considerably, but as far as I can tell, that would require > some additional features in the code generation such as sharing > structured constants, explicitly transforming tail calls to loops etc. I have been working on part of this. My 64 bit SPARC code generator produces bad assembly code when compiling the TK library. The failure is caused by poor code generation in the intermediate phase of the compiler. There is a nested (not top level) module named Tkintf containing hundreds of values. The code generated to initialize Tk.Tkintf looks like: 1. compute the hundreds of values to be stored in the Tkintf module 2. allocate a block to hold Tkintf 3. store the hundreds of values computed earlier into the Tkintf block 4. store a pointer to the Tkintf block into the appropriate field of Tk This sequence results in hundreds of live values. Most have to be spilled to the stack between 1 and 3, but the stack can only hold about 256 values. (Because ocamlopt does not use register windows it is limited to a 2 kilobyte stack frame.) The assembler fails with address offset out of range errors. The ocaml developers were aware of this sort of problem and a different method is used to initialize top level modules, though there is a separate inefficiency there because it isn't necessary to compute at runtime the majority of module values which are constant, either integer constants or pointers to statically allocated blocks. I have been experimenting with making both constant closures and structured constants shareable and (for top level modules) compiling constant initializers into the data section. So given a module like let x = 1 let y = (1,2,3) let rec f x = ... and g x = ... the assembly code would look like (with tag values omitted for clarity): .data L1: # begin three value tuple .word 3 # integer 1 .word 5 # integer 2 .word 7 # integer 3 L2: # begin block describing mutually recursive functions f and g .word 1 # f arity .word f # pointer to function f .word 1 # g arity .word g # pointer to function g module: .word 3 # integer 1 .word L1 # pointer to (1,2,3) .word L2 # pointer to closure for f .word L2+8 # pointer to closure for g Values not computable at compile time cause a .skip directive and the module entry code computes the initial value as in the current compiler. This solves the stack overflow problem. I'm not convinced that the asmcomp phase of the compiler is the right place to do this optimization. Perhaps it should be moved to the intermediate phase where the top level module optimization is done now. (This is more than just optimizing the performance of code that is executed once. For TK a substantial fraction of the code size is the module entry point. Also, a few jobs ago I cut the startup time of a product I worked on by several seconds by optimizing the dynamic loading process, which is similarly a one time act.)