Function inlining and functor
| Date: | -- (:) |
| From: | Quôc_Peyrot <chojin@l...> |
| Subject: | Function inlining and functor |
Hello,
I have a program with a generic function which takes a function as
a parameter and calls it heavily. Something along the lines of:
let toto f =
  (* call f a couple of million times *)
I was trying to see whether or not I could force the inlining of "f"
when f is a small function.
For the sake of simplicity, let's imagine we have:
let toto f =
  let a = ref 0 in
  for i = 0 to 10 do
    a := !a + f i
  done;
  !a

let f a = a * a

let _ =
  print_endline (string_of_int (toto f))
Of course, we can see that f is not inlined in the inner loop:
(PPC)
L106:
lwz r4, 0(r1)
lwz r17, 0(r4)
mtctr r17 -> prepare the call
L108: bctrl -> call it
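For comparison, the loop I would like the compiler to end up with is the hand-inlined one. A minimal sketch, where toto_inlined is a name made up for illustration and is not part of my actual program:

```ocaml
(* Hand-inlined variant of toto for f a = a * a.
   toto_inlined is a hypothetical name used only for this illustration. *)
let toto_inlined () =
  let a = ref 0 in
  for i = 0 to 10 do
    (* body of f substituted directly: no indirect call in the loop *)
    a := !a + i * i
  done;
  !a

let () =
  print_endline (string_of_int (toto_inlined ()))
```

This prints 385, the same value as toto f above, but it gives up the genericity I am after, since the loop is now tied to one particular f.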
I tried to use a functor, hoping that it would help the compiler to
inline the function:
module type A =
sig
  val f: int -> int
end

module Make (F:A) = struct
  let toto () =
    let a = ref 0 in
    for i = 0 to 10 do
      a := !a + F.f i
    done;
    !a
end

let f x = x * x

module Mod = Make (struct let f = f end)

let _ =
  print_endline (string_of_int (Mod.toto ()))
but it doesn't seem to help at all: I can still see the call in my
inner loop:
L109:
lwz r5, 0(r1)
lwz r19, 8(r5)
lwz r4, 0(r19)
lwz r17, 0(r4)
mtctr r17
L114: bctrl
I was in fact hoping to get the same result as in C++ using
template metaprogramming:
#include <iostream>

using namespace std;

template<class F>
class Mod
{
public:
    int toto()
    {
        int res = 0;
        for (int i = 0; i <= 10; ++i)
            res += F::f(i);
        return res;
    }
};

class Foo
{
public:
    static int f(int i) { return i * i; }
};

int main(int argc, char**argv)
{
    Mod<Foo> mod;
    cout << mod.toto() << endl;
    return 0;
}
which gives this nice inlining:
L15:
mullw r0,r2,r2
addi r2,r2,1
add r4,r4,r0
bdnz L15
addis r2,r31,ha16(L__ZSt4cout$non_lazy_ptr-"L00000000002$pb")
lwz r3,lo16(L__ZSt4cout$non_lazy_ptr-"L00000000002$pb")(r2)
Am I out of luck in trying to get performance similar to C++?
Thanks,
--
Best Regards,
Quôc