Methods call are 2 times slower with the 4.0 native compiler code than with the 3.12 native compiler code #5674
Comments
Comment author: @gasche I could partially reproduce the bug, with strange results:
So we might be looking for several performance-affecting changes, some more severe than others.
Comment author: @gasche I could reproduce the bug with revision 11121 of the SVN:
Comment author: @gasche With the bytecode compiler, the performance story is different. The test case above terminates in 5s on my machine just before the merging of bin-annot (as with 3.12), and in 10s just after. There is clearly a performance regression introduced here, only it is very noticeable in bytecode and much less so in native code. Manual inspection of the -dlambda output reveals that the method definitions are no longer curried; this seems to be a kind of arity-raising problem. The following kind of code was emitted before the bin-annot merge:
While the following is emitted just after the merge:
My guess is that the real performance regression is introduced here (in the giant bin-annot patch), while other changes also influence the native measurement.
Comment author: @garrigue I've tried fixing this without trying to understand what binannot did. This seems to completely fix the problem for native code: I get exactly the same speed as with 3.12. Does anybody see a problem with merging this patch?
Comment author: @garrigue I've looked at the -dlambda code with the patch, and the only difference seems to be whether an array of strings is shared or not. As this cannot make such a big difference (it is only used for class initialization), the source should rather be in the bytecode interpreter. However, I see no change there either...
Comment author: @gasche I think I now understand where the bug comes from. The "currification" of consecutive abstractions is performed in
and transl_function loc untuplify_fn repr partial pat_expr_list =
This pattern matches the exact Texp_function constructor to use
let make_method self_loc cl_num expr =
This code is produced during method typing (as a desugaring of "method
But the parser does not produce a function directly:
concrete_method :
There is this silent Pexp_poly constructor that, in this case, carries
Those Pexp_poly constructors used to be erased during the typing phase.
| Pexp_poly(sbody, sty) ->
So here is the problem: with the extraneous Texp_poly constructor
I'm not sure what is the best way to solve the issue. Jacques' solution
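The mechanism described in this comment can be illustrated with a toy model. This is only a sketch: the `expr` type, `collect_params`, and `arity` below are hypothetical names and are not the compiler's real Typedtree or Translcore definitions. It assumes that the currying pass merges abstractions only when one is the direct body of another, so a wrapper node standing in for Texp_poly stops the merge after the first parameter.

```ocaml
(* Toy model only: none of these names exist in the OCaml compiler. *)
type expr =
  | Var of string
  | Fun of string * expr   (* one-argument abstraction *)
  | Poly of expr           (* stand-in for a Texp_poly-like wrapper node *)

(* Collect the parameters of directly nested abstractions, the way a
   currying pass would, stopping at the first node that is not a Fun. *)
let rec collect_params = function
  | Fun (x, body) ->
      let params, rest = collect_params body in
      (x :: params, rest)
  | e -> ([], e)

let arity e = List.length (fst (collect_params e))

(* A method taking one argument, seen as: fun self -> fun x -> x *)
let direct = Fun ("self", Fun ("x", Var "x"))

(* The same method with a wrapper node between the two abstractions. *)
let wrapped = Fun ("self", Poly (Fun ("x", Var "x")))

let () =
  Printf.printf "direct: %d parameter(s)\n" (arity direct);
  Printf.printf "wrapped: %d parameter(s)\n" (arity wrapped)
```

In this model the direct form is seen as a two-parameter function, while the wrapped form is seen as a one-parameter function returning another function, which corresponds to the loss of currying observed in the -dlambda output.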
Comment author: @garrigue
The kind of the outer function is checked in the when clause of the inner match.
Right, for native code one may wrongly create a tuplified function.
Comment author: @garrigue I added a patch exp_extras.diffs, moving the Texp_poly and Texp_newtype nodes to exp_extra.
Comment author: @garrigue Fixed by applying the patch exp_extras.diffs, which moves Texp_poly and Texp_newtype to exp_extra.
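The idea behind the fix can be sketched with a toy model. The types and names below (`extra`, `extras`, `mk`, `collect_params`) are hypothetical and simplified, not the real definitions from the patch. The sketch assumes that wrapper information is stored as an annotation on the expression node rather than as a node of its own, so a pass that pattern-matches on the description still sees consecutive abstractions.

```ocaml
(* Toy model only: simplified, hypothetical types, not the compiler's. *)
type extra = Poly | Newtype of string

type expr = { desc : desc; extras : extra list }
and desc =
  | Var of string
  | Fun of string * expr

(* Build a node, optionally carrying annotations on the side. *)
let mk ?(extras = []) desc = { desc; extras }

(* The currying pass only looks at [desc], so annotations stored in
   [extras] no longer interrupt a chain of abstractions. *)
let rec collect_params e =
  match e.desc with
  | Fun (x, body) ->
      let params, rest = collect_params body in
      (x :: params, rest)
  | _ -> ([], e)

let arity e = List.length (fst (collect_params e))

(* fun self -> fun x -> x, where the inner function carries a Poly
   annotation instead of being wrapped in a Poly node. *)
let meth = mk (Fun ("self", mk ~extras:[Poly] (Fun ("x", mk (Var "x")))))

let () = Printf.printf "method arity: %d\n" (arity meth)
```

The annotation is still available to any pass that needs it (here through the `extras` field), while passes that only care about the shape of the expression can ignore it.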
Original bug ID: 5674
Reporter: giavitto
Assigned to: @garrigue
Status: closed (set by @garrigue on 2012-07-10T08:35:03Z)
Resolution: fixed
Priority: normal
Severity: minor
Platform: MAC
OS: Mac OS X
OS Version: 10.7.4
Version: 4.00.0+beta2/+rc1
Target version: 4.00.0+dev
Fixed in version: 4.00.0+dev
Category: back end (clambda to assembly)
Monitored by: @gasche @hcarty
Bug description
I noticed a slowdown in several of my programs that use objects. The attached file shows an example that exhibits a slowdown by a factor of 2 between native compilation with 3.12 and native compilation with 4.0.
This slowdown also exists for bytecode, but with a factor of 1.3.
Steps to reproduce
Compile and run the attached file with the various versions of the compiler. The program must be linked with the Unix library to print the time usage.
On my Mac it produces:
The OCaml compiler, version 4.00.0+beta2
resultat 98765432 in 37.940392
The OCaml native-code compiler, version 4.00.0+beta2
resultat 98765432 in 1.009058
The Objective Caml compiler, version 3.12.1+rc1
resultat 98765432 in 28.797747
The Objective Caml native-code compiler, version 3.12.1+rc1
resultat 98765432 in 0.535362
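The attached file is not reproduced here. For readers who want to try a comparable measurement, the following is a minimal sketch of a method-call microbenchmark, not the reporter's program: the `counter` class, the `run` function, and the iteration count are all assumptions. It uses `Sys.time` from the standard library instead of the Unix library, so it needs no extra linking.

```ocaml
(* Hypothetical stand-in for the attached test file. *)
class counter = object
  val mutable n = 0
  method add k = n <- n + k   (* method with one argument besides self *)
  method get = n
end

(* Call [add] in a tight loop and report the result and the CPU time used. *)
let run iterations =
  let c = new counter in
  let start = Sys.time () in
  for _i = 1 to iterations do
    c#add 1
  done;
  let elapsed = Sys.time () -. start in
  (c#get, elapsed)

let () =
  let result, elapsed = run 10_000_000 in
  Printf.printf "result %d in %f\n" result elapsed
```

Building the same file with ocamlc and ocamlopt under each compiler version gives timings that can be compared in the same way as the figures above, although the absolute numbers will differ from the report.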
File attachments