Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006019OCamlback end (clambda to assembly)public2013-05-23 16:562015-03-27 07:49
Assigned To 
PlatformOSOS Version
Product Version4.00.1 
Target VersionFixed in Version 
Summary0006019: Unnecessary calls to caml_modify
DescriptionFor some mutation heavy code, the c-call to caml_modify is extremely high on our profiles (exhibiting almost a full order of magnitude difference in tight loops). This is in some sense a duplicate of 0005592, but it highlights the fact that writes could be made cheaper even absent the inlining improvements by a emitting a dynamic check that would skip the C-call to caml_modify.

We've added something like this:

let unsafe_set t i obj =
  let old_obj = unsafe_get t i in
  if Obj.is_int old_obj && Obj.is_int obj then
    (* It is OK to skip [caml_modify] if both the old and new values are integers. *)
    Array.unsafe_set (Obj.magic (t : t) : int array) i (Obj.obj obj : int)
  else if not (old_obj == obj) then
    Array.unsafe_set t i obj

around wrappers of arrays that often hold immediate values with good results. It seems like it makes more sense to perform at the level of code generation. For the included test I've demonstrated it for a polymorphic mutable field, but doing this sort of dynamic check in our code feels somewhat impractical.

Steps To Reproduce$ ocamlopt.opt -inline 20 -nodynlink -g -o z
$ ./z
int field took 0.120980
poly field took 1.021845
poly field with dynamic check took 0.240963
TagsNo tags attached.
Attached Files? file icon [^] (1,132 bytes) 2013-05-23 16:56 [Show Content]

- Relationships

-  Notes
xleroy (administrator)
2013-05-24 19:22

Yes, caml_modify is too slow, and we should do something about this. Thanks for raising the issue on the bug tracker.

Before inlining a fast path, as you do, I think we need to rewrite caml_modify so that fast paths (there are several possible) are clearly identified. I suspect most of the cost of caml_modify comes primarily from calls to Is_in_heap, which can be avoided in several cases, and not from the call to caml_modify instead.

To be investigated...
gasche (developer)
2013-05-24 23:53

For the record, I modified the proposed example as such to test the performance hit on non-integer modifications.

  time_it "addr field" (fun () ->
    let t = Poly_field.create [] in
    for i = 1 to times do
      Poly_field.set t [i]

  time_it "addr field with dynamic check" (fun () ->
    let t = Poly_field.create [] in
    for i = 1 to times do
      Poly_field.set_dynamic t [i]

The results on my amd64 machine are the following:

  int field took 0.116000
  int poly field took 0.940000
  int field with dynamic check took 0.468000
  addr field took 1.680000
  addr field with dynamic check took 2.244000

so this particular microbenchmark must be relativized in that it may speed up integer modifications by 50%, but also slow down the other modifications by 30%. It probably wouldn't be appropriate to blindly inject the corresponding code in all modification places.
bnigito (reporter)
2013-05-27 17:32

Would you mind running that benchmark again ensuring the resulting set_dynamic code is inlined? I believe that is the only reason you would see a difference of such magnitude (our standard compilation tools already have aggressive inlining options enabled, so I sometimes assume it).

Here's what I see:
addr field took 1.872715
addr field (no-inlining) with dynamic check took 2.492622
addr field (inlined) with dynamic check took 1.929706

Certainly it is a tradeoff, not in all modification sites but only those with a polymorphic value (if this is known/possible at compilation). If it were to slow down polymorphic modifications by 30% I would not have suggested it, but if it's in the range of 3-5%, I think there is a case to consider it.

I also believe the microbenchmark might be understating the benefit, as the an unnecessary caml_modify call could also be evicting useful cache lines, etc.

thanks for looking into it.
xleroy (administrator)
2013-06-01 09:59

The SVN trunk (r13723) reimplements caml_modify() to reduce overheads and take advantage of several fast paths. On the microbenchmark of this PR, the performance improvement is significant:

                                 old implem new implem

int field took 0.060003 0.056003
poly field took 0.872055 0.568036 <--
poly field with dynamic check took 0.412026 0.412025
addr field took 1.292080 0.888056 <--
addr field with dynamic check took 1.704107 1.340084

In particular, the performance improvement resulting from open-coding the dynamic check is much more modest. Given that the dynamic check is not always a win, I'd rather not pursue this direction.
yminsky (reporter)
2013-06-01 16:35

When trying to build the latest trunk on MacOSX, I get the following error. I'm guessing this is related to the present changes.

rm minor_gc.d.c
ln -s -f memory.c memory.d.c
gcc -c -DCAML_NAME_SPACE -g -DDEBUG -fno-defer-pop -Wall -D_FILE_OFFSET_BITS=64 -D_REENTRANT memory.d.c
### stderr ###
memory.d.c: In function ‘caml_modify’:
memory.d.c:541: error: invalid operands to binary & (have ‘value *’ and ‘int’)
memory.d.c:541: warning: left-hand operand of comma expression has no effect
make[2]: *** [memory.d.o] Error 1
make[1]: *** [coldstart] Error 2
make: *** [world] Error 2
yminsky (reporter)
2013-06-01 16:54

Hmm. I tried to replicate this outside of OPAM, but the checkout of the specific revision builds without incident, and the current tip breaks in a different place (scanf related). When tip settles down, I'll try again.
xleroy (administrator)
2013-06-01 16:56

Indeed there was a missing cast, but it only shows up when building the runtime system in debug mode. Fixed in commit r13727.
yminsky (reporter)
2013-06-01 17:22

I suppose this bug is unrelated?

### stderr ###
File "", line 640, characters 14-75:
Error: This expression has type string -> string
       but an expression was expected of type string
make[2]: *** [scanf.cmo] Error 2
make[1]: *** [coldstart] Error 2
make: *** [world] Error 2
xleroy (administrator)
2013-06-01 17:25

Yes it is unrelated. Just look at the committers.
yminsky (reporter)
2013-06-06 02:47

I've tested out the new version, and it's clearly and significantly better, but I think an update of Brian's original benchmark shows that there's still significant room to improve. In particular, the dynamic check still seems like a real win. Here's the benchmark results I see:

First, with the old compiler:

                               int field took: 0.110356
                     poly field with int took: 1.184308
                     poly field with obj took: 1.398438
      poly field with int, dynamic check took: 0.253358
      poly field with obj, dynamic check took: 1.386732

And with the new compiler:

                               int field took: 0.113187
                     poly field with int took: 0.627932
                     poly field with obj took: 0.874185
      poly field with int, dynamic check took: 0.253959
      poly field with obj, dynamic check took: 0.868465

In both cases, it looks to me like the dynamic check wins, and by a decent margin. The fact that the poly-field version with the dynamic check does better by a bit I suspect is just some noise, more indicating that the two are very close than that there's a real advantage. That said, for the int, the poly check seems to be more than 3 times better. I suspect this win comes from avoiding the C call to caml_modify.

My benchmark code is below:

module Int_field = struct
  type t = { mutable x : int; }
  let create x = { x }
  let set t x = t.x <- x

module Poly_field = struct
  type 'a t = { mutable x : 'a; }
  let create x = { x }
  let set t x = t.x <- x

  let set_dynamic t x =
    let old_obj = Obj.repr t.x in
    let new_obj = Obj.repr x in
    if Obj.is_int old_obj && Obj.is_int new_obj then
      Array.unsafe_set (Obj.magic t : int array) 0 (Obj.magic x : int)
    else if not (old_obj == new_obj) then
      t.x <- x

type obj = Obj of int

let time_it desc f =
  let start = Sys.time () in
  f ();
  let elapsed = (Sys.time ()) -. start in
  Printf.printf "%40s took: %6f%!\n" desc elapsed

let obj0 = Obj 0
let obj1 = Obj 1

let () =

  let times = 200_000_000 in
  time_it "int field" (fun () ->
    let t = Int_field.create 0 in
    for i = 1 to times do
      Int_field.set t i;

  time_it "poly field with int" (fun () ->
    let t = Poly_field.create 0 in
    for i = 1 to times do
      Poly_field.set t i;

  time_it "poly field with obj" (fun () ->
    let t = Poly_field.create (Obj 0) in
    for i = 1 to times do
      Poly_field.set t (if i mod 2 = 0 then obj0 else obj1)

  time_it "poly field with int, dynamic check" (fun () ->
    let t = Poly_field.create 0 in
    for i = 1 to times do
      Poly_field.set_dynamic t i;

  time_it "poly field with obj, dynamic check" (fun () ->
    let t = Poly_field.create (Obj 0) in
    for i = 1 to times do
      Poly_field.set t (if i mod 2 = 0 then obj0 else obj1)
doligez (administrator)
2015-03-05 21:04

Note to future visitors: there's a bug in Yaron's latest benchmark: it runs the exact same code for cases 3 and 5, which explains the very close performance numbers.

- Issue History
Date Modified Username Field Change
2013-05-23 16:56 bnigito New Issue
2013-05-23 16:56 bnigito File Added:
2013-05-24 19:22 xleroy Note Added: 0009332
2013-05-24 19:22 xleroy Status new => acknowledged
2013-05-24 23:53 gasche Note Added: 0009334
2013-05-27 17:32 bnigito Note Added: 0009359
2013-06-01 09:59 xleroy Note Added: 0009374
2013-06-01 09:59 xleroy Status acknowledged => resolved
2013-06-01 09:59 xleroy Resolution open => suspended
2013-06-01 16:35 yminsky Note Added: 0009375
2013-06-01 16:54 yminsky Note Added: 0009376
2013-06-01 16:56 xleroy Note Added: 0009377
2013-06-01 17:22 yminsky Note Added: 0009378
2013-06-01 17:25 xleroy Note Added: 0009379
2013-06-06 02:47 yminsky Note Added: 0009416
2015-03-05 21:04 doligez Note Added: 0013393
2017-02-23 16:35 doligez Category OCaml backend (code generation) => Back end (clambda to assembly)
2017-02-23 16:44 doligez Category Back end (clambda to assembly) => back end (clambda to assembly)

Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker