Function inlining and functor
Date: -- (:)
From: Quc_Peyrot <chojin@l...>
Subject: Function inlining and functor

I have a program with a generic function which takes a function as
a parameter and calls it heavily. Something along the lines of:

let toto f =
   (* call f a couple of million times *)

I was trying to see whether or not I could force the inlining of "f"
when f is a small function.

For the sake of simplicity, let's imagine we have:

let toto f =
   let a = ref 0 in
   for i = 0 to 10 do
     a := !a + f i
   done;
   !a

let f a = a * a

let _ =
   print_endline (string_of_int (toto f))

Of course, we can see that f is not inlined in the inner loop:
    lwz   r4, 0(r1)
    lwz   r17, 0(r4)
    mtctr r17 -> prepare the call
L108: bctrl -> call it
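
To make the goal concrete, here is a minimal sketch of the same loop with f inlined by hand. The name toto_inlined is made up for this illustration; it only shows the code shape one would hope to obtain, and makes no claim about what ocamlopt actually generates.

```ocaml
(* Generic version: f is called through a closure on every iteration. *)
let toto f =
  let a = ref 0 in
  for i = 0 to 10 do
    a := !a + f i
  done;
  !a

(* Hypothetical hand-inlined version: the body of f (a * a) is written
   directly in the loop, so no call remains in the inner loop. *)
let toto_inlined () =
  let a = ref 0 in
  for i = 0 to 10 do
    a := !a + i * i
  done;
  !a

let () =
  (* Both compute the sum of squares from 0 to 10. *)
  print_endline (string_of_int (toto (fun a -> a * a)));
  print_endline (string_of_int (toto_inlined ()))
```

Both lines print 385, so the two versions differ only in how the multiplication is reached.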

I tried to use a functor, hoping that it would help the compiler to
inline the function:

module type A = sig
  val f: int -> int
end

module Make (F:A) = struct

let toto () =
   let a = ref 0 in
   for i = 0 to 10 do
     a := !a + F.f i
   done;
   !a

end

let f x = x * x

module Mod = Make (struct let f = f end)

let _ =
   print_endline (string_of_int (Mod.toto ()))

But it doesn't seem to help at all: I can still see the call in my
inner loop:
    lwz   r5, 0(r1)
    lwz   r19, 8(r5)
    lwz   r4, 0(r19)
    lwz   r17, 0(r4)
    mtctr r17
L114: bctrl

I was in fact hoping to get the same results as in C++ using templates:
#include <iostream>

using namespace std;

template<class F>
class Mod
{
public:
   int toto()
   {
     int res = 0;
     for (int i = 0; i <= 10; ++i)
       res += F::f(i);
     return res;
   }
};

class Foo
{
public:
   static int f(int i) { return i * i; }
};

int main(int argc, char**argv)
{
   Mod<Foo> mod;
   cout << mod.toto() << endl;
   return 0;
}

which gives this nice inlining:
    mullw r0,r2,r2
    addi r2,r2,1
    add r4,r4,r0
    bdnz L15
    addis r2,r31,ha16(L__ZSt4cout$non_lazy_ptr-"L00000000002$pb")
    lwz r3,lo16(L__ZSt4cout$non_lazy_ptr-"L00000000002$pb")(r2)

Am I out of luck if I want performance similar to C++?


Best Regards,