Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0005674OCamlback end (clambda to assembly)public2012-07-07 18:152012-07-10 10:35
Assigned Togarrigue 
PlatformMACOSMac OS XOS Version10.7.4
Product Version4.00.0+beta2/+rc1 
Target Version4.00.0+devFixed in Version4.00.0+dev 
Summary0005674: Methods call are 2 times slower with the 4.0 native compiler code than with the 3.12 native compiler code
DescriptionI notice a slow down of several of my programs that use objects. The attached file shows an example that exhibits a slowdown of factor 2 between the native compilation with 3.12 and the native compilation with 4.0.

This slowdown exists also for byte-code but with a factor of 1.3.
Steps To ReproduceCompile and run the attached file with the various versions of the compiler. The program must be linked with the Unix library to print de time usage.

On my mac it produces:

The OCaml compiler, version 4.00.0+beta2
### resultat 98765432 in 37.940392

The OCaml native-code compiler, version 4.00.0+beta2
### resultat 98765432 in 1.009058

The Objective Caml compiler, version 3.12.1+rc1
### resultat 98765432 in 28.797747

The Objective Caml native-code compiler, version 3.12.1+rc1
### resultat 98765432 in 0.535362
TagsNo tags attached.
Attached Files? file icon [^] (975 bytes) 2012-07-07 18:15 [Show Content]
? file icon simplif.diffs [^] (819 bytes) 2012-07-08 08:13
? file icon exp_extras.diffs [^] (9,570 bytes) 2012-07-09 03:20

- Relationships

-  Notes
gasche (developer)
2012-07-07 19:15

I could partially reproduce the bug, with strange result:
- with 3.12.1, the test takes roughly 0.5
- on SVN trunk, the test takes 1.0s as reported
- I backtracked before the merge of the binannot branch, and the test seems to take 0.9s rather reliably

So we might be looking for several performance-affecting changes, some more severe than others.
gasche (developer)
2012-07-07 19:29

I could reproduce the bug with revision 11121 of the SVN:
> The OCaml native-code compiler, version 3.13.0+dev4 (2011-06-20)
gasche (developer)
2012-07-08 00:25

With the bytecode compiler, the performance story is different. The test case above terminates in 5s on my machine just before the merging of bin-annot (like for 3.12), and in 10s just after. There is clearly a performance regression introduced here, only it is very sensible in bytecode and not so much on native code.

Manual inspection of the -dlambda output reveals that the method definitions are no longer curried; this seems to be a kind of arity-raising problem. The following kind code was emitted before the bin-annot merge:

           (apply (field 10 (global CamlinternalOO!)) class/1066
                  (makeblock 0 m1/1035
                    (function self-1/1041 x/1042
                      (seq (+:=1 (field 0 (global Test2!))) x/1042))

While the following is emitted just after the merge:

                (apply (field 10 (global CamlinternalOO!)) class/1044
                  (makeblock 0 m1/1013
                    (function self-1/1019
                      (function x/1020
                        (seq (+:=1 (field 0 (global Test2!))) x/1020)))

My guess is that the real performance regression is introduced here (in the giant bin-annot patch), while other changes also influence the native measurement.
garrigue (manager)
2012-07-08 08:19

I've tried fixing this without trying to understand what binannot did,
just by merging curried lambdas in Simplif.simplify_lets.
The patch is attached to this report.

This seems to completely fix the problem for native code: I get exactly the same speed as with 3.12.
For bytecode, I still get a 10% slowdown, but this seems small enough.

Anybody sees a problem in merging this patch?
garrigue (manager)
2012-07-08 08:42

I've looked at the -dlambda code after with the patch, and the only difference seems to be whether an array of strings is shared or not. As this cannot make such a big difference (it is only used for class initialization), the source should rather be in the bytecode interpreter. However I see no change there either...
After more tests, it might just be the C compiler used for compiling ocamlrun.
I.e., the ocamlrun 3.12.1 compiled by hand is slower than the one from macports.
gasche (developer)
2012-07-08 15:45

I think I now understand where the bug comes from.

The "currification" of consecutive abstractions is performed in

  and transl_function loc untuplify_fn repr partial pat_expr_list =
    match pat_expr_list with
      [pat, ({exp_desc = Texp_function(_, pl,partial')} as exp)]
      when Parmatch.fluid pat ->
        let param = name_pattern "param" pat_expr_list in
        let ((_, params), body) =
          transl_function exp.exp_loc false repr partial' pl in
        ((Curried, param :: params),
         Matching.for_function loc None (Lvar param) [pat, body] partial)
    | ({pat_desc = Tpat_tuple pl}, _) :: _ when untuplify_fn ->
        let params = (fun p -> Ident.create "param") pl in
        ((Tupled, params),
         Matching.for_tupled_function loc params
           (transl_tupled_cases pats_expr_list) partial)
    | _ ->
        let param = name_pattern "param" pat_expr_list in
        ((Curried, [param]),
         Matching.for_function loc repr (Lvar param)
           (transl_cases pat_expr_list) partial)

This pattern matches the exact Texp_function constructor to use
a Curried function. The functions of interest in our case where
produced by typing/

  let make_method self_loc cl_num expr =
    let mkpat d = { ppat_desc = d; ppat_loc = self_loc } in
    let mkid s = mkloc s self_loc in
    { pexp_desc =
        Pexp_function ("", None,
                       [mkpat (Ppat_alias (mkpat (Ppat_var (mkid "self-*")),
                                           mkid ("self-" ^ cl_num))),
      pexp_loc = expr.pexp_loc }

This code is produced during method typing (as a desugaring of "method
foo = ..."), then typed, which produces a Texp_function, whose
translation to Lambda-code in turns call the `transl_function`
function above. For currifycation to be performed, we need to make
sure that the method body (the `expr` argument of `make_method`) has
`Texp_function` as head constructor.

But the parser does not produce a function directly:

  concrete_method :
      METHOD override_flag private_flag label strict_binding
        { mkloc $4 (rhs_loc 4), $3, $2, ghexp(Pexp_poly ($5, None)) }

There is this silent Pexp_poly constructor that, in this case, carries
no type information (it is used for polymorphic methods to carry the
explicit polymorphic annotation). The content of Pexp_poly,
`$5`, is indeed a Pexp_function in our case.

Those Pexp_poly constructors used to be erased during the typing phase.
The bin-annot merge has added a new Texp_poly constructor to preserve
that information explicitly in the typedtree. In the bin-annot patch:

  | Pexp_poly(sbody, sty) ->
- begin
+ let exp =
        match (expand_head env ty).desc with
          Tpoly (ty', []) ->
            let exp = type_expect env sbody ty' in
            re { exp with exp_type = instance env ty }
        | Tpoly (ty', tl) ->
            re { exp with exp_type = instance env ty }
        | Tvar _ ->
            let exp = type_exp env sbody in
            let exp = {exp with exp_type = newty (Tpoly (exp.exp_type, []))} in
            unify_exp env exp ty;
            re exp
        | _ -> assert false
- end
+ in
+ re { exp with exp_desc = Texp_poly(exp, cty) }

So here is the problem: with the extraneous Texp_poly constructor
inserted in the AST, a simple optimizations at the typedtree level
that proceeded by naive matching of the AST doesn't fire anymore.

I'm not sure what is the best way to solve the issue. Jacques solution
of re-implementing the currifying optimization at the Lambda level
seems to be a good approach: Texp_poly are erased during the
translation to lambda-code, so it will work, should be robust to
further changes of the typedtree format. At the same time, I have two
small concerns with this:

- I am not sure Jacques simple implementation is the correct one
  (it seems to neglect the Curried/Tupled information of the outer
  function, maybe this is not right)

- I suspect the reason why this transformation was done at the
  typedtree level might be a synergy with pattern matching
  compilation, which is also done at the typedtree -> lambdacode
  transition. If that was the case, relocating the optimization at
  a lower level (with no pattern information) could result in
  performance degradation in some cases.
garrigue (manager)
2012-07-08 22:41

> - I am not sure Jacques simple implementation is the correct one
> (it seems to neglect the Curried/Tupled information of the outer
> function, maybe this is not right)

The kind of the outer function is checked in the when clause of the inner match.

> - I suspect the reason why this transformation was done at the
> typedtree level might be a synergy with pattern matching
> compilation, which is also done at the typedtree -> lambdacode
> transition. If that was the case, relocating the optimization at
> a lower level (with no pattern information) could result in
> performance degradation in some cases.

Right, for native code one may wrongly create a tuplified function.
So removing the Texp_poly node (moving it to exp_extra) seems a better solution.
garrigue (manager)
2012-07-08 23:20

I added a patch exp_extras.diffs, moving the Texp_poly and Texp_newtype nodes to exp_extra.
Also fixed and
garrigue (manager)
2012-07-10 10:35

Fixed by apply the patch exp_extras.diffs, which moves Texp_poly and Texp_newtype to exp_extra.
This way binannot does not introduce new nodes anymore.
Committed in trunk and 4.00, revisions 12680 and 12681.

- Issue History
Date Modified Username Field Change
2012-07-07 18:15 giavitto New Issue
2012-07-07 18:15 giavitto File Added:
2012-07-07 19:15 gasche Note Added: 0007651
2012-07-07 19:15 gasche Status new => confirmed
2012-07-07 19:29 gasche Note Added: 0007652
2012-07-08 00:25 gasche Note Added: 0007653
2012-07-08 08:13 garrigue File Added: simplif.diffs
2012-07-08 08:19 garrigue Note Added: 0007654
2012-07-08 08:42 garrigue Note Added: 0007655
2012-07-08 15:45 gasche Note Added: 0007659
2012-07-08 22:41 garrigue Note Added: 0007660
2012-07-08 23:18 garrigue Assigned To => garrigue
2012-07-08 23:18 garrigue Status confirmed => assigned
2012-07-08 23:18 garrigue File Added: exp_extras.diffs
2012-07-08 23:20 garrigue Note Added: 0007661
2012-07-09 00:42 garrigue File Deleted: exp_extras.diffs
2012-07-09 00:42 garrigue File Added: exp_extras.diffs
2012-07-09 03:19 garrigue File Deleted: exp_extras.diffs
2012-07-09 03:20 garrigue File Added: exp_extras.diffs
2012-07-09 14:30 doligez Target Version => 4.00.0+dev
2012-07-10 10:35 garrigue Note Added: 0007674
2012-07-10 10:35 garrigue Status assigned => closed
2012-07-10 10:35 garrigue Resolution open => fixed
2012-07-10 10:35 garrigue Fixed in Version => 4.00.0+dev
2017-02-23 16:35 doligez Category OCaml backend (code generation) => Back end (clambda to assembly)
2017-02-23 16:44 doligez Category Back end (clambda to assembly) => back end (clambda to assembly)

Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker