Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Class compilation shows non-linear runtime behaviour. #6542

Closed
vicuna opened this issue Sep 4, 2014 · 19 comments
Closed

Class compilation shows non-linear runtime behaviour. #6542

vicuna opened this issue Sep 4, 2014 · 19 comments
Assignees
Labels
Milestone

Comments

@vicuna
Copy link

vicuna commented Sep 4, 2014

Original bug ID: 6542
Reporter: choeger
Assigned to: @garrigue
Status: closed (set by @xavierleroy on 2017-02-16T14:18:23Z)
Resolution: fixed
Priority: normal
Severity: major
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: ~DO NOT USE (was: OCaml general)
Tags: compiler-time-or-space-regression
Monitored by: @jmeber

Bug description

I also asked for this on stackoverflow, but apparantly there is no simple solution:

http://stackoverflow.com/questions/25639619/performance-of-class-compilation

Basically, when compiling large classes, the process takes O(n²) amount of time for n methods:

This indicates that the compilation of a method somehow depends on the presence of other methods. This may be due to some optimization regarding the method lookup, but I am not convinced it is necessary.

Steps to reproduce

(echo "class foo = object" ; for ((i = 0; i < 2000; i++)) do echo method x$i = $i; done ; echo end) > test.ml
ocamlc test.ml

File attachments

@vicuna
Copy link
Author

vicuna commented Sep 24, 2014

Comment author: @chambart

It seams that this creates long lists of linked types that are not compressed. The majority of the run time is taken following those links in Btype.repr.

@vicuna
Copy link
Author

vicuna commented Dec 2, 2015

Comment author: @alainfrisch

I think Jacques recently added compression of Tlinks. Could this have improved this case?

@vicuna
Copy link
Author

vicuna commented Dec 2, 2015

Comment author: @alainfrisch

Actually, it seems this is getting worse. With current trunk, ocamlc.opt compiles test.ml in 1.39s, while it took 0.4s in a version from a few months ago.

Jacques: do you have any idea where the slowdown could come from?

@vicuna
Copy link
Author

vicuna commented Dec 2, 2015

Comment author: @alainfrisch

I bump the severity, since there seems to be some quite serious compile-time degradation here.

@vicuna
Copy link
Author

vicuna commented Dec 3, 2015

Comment author: @garrigue

I tested with ocamlc and only 500 methods.
I see 2 bumps: commit a335b18 of October 23 (merge gadt-warnings) doubles times (2.8s -> 5.6s), and commit 68f9634 (merge compress-Tlink) increases again (5.6s -> 6.8s).
Still have to see asymptotic behavior.
This seems to be a pathological cases, since no significant performance degradation was observed on normal code.

@vicuna
Copy link
Author

vicuna commented Dec 3, 2015

Comment author: @garrigue

Actually the tests for ocamlc.opt show that the real problem is gadt-warnings, the impact of compress-Tlink being negative but small.
That's very strange, because there is no pattern-matching in this code (at least none that would require using type information for exhaustiveness!)

ocamlc.opt test.ml (1000 methods) test2.ml (2000 methods)
68f9634: 2s 10s (merge compress-Tlink)
a335b18: 1.9s 9.8s (merge gadt-warnings)
7d5a2ec: 0.4s 1.5s (just before)
4.02.2: 0.4s 1.5s

The quadratic behavior itself cannot be completely avoided, because objects are represented as linked lists of methods (same problem as for polymorphic variants). However, I admit this is pretty bad in this case.

@vicuna
Copy link
Author

vicuna commented Dec 3, 2015

Comment author: @garrigue

I could track down the two causes:

  • for gadt-warnings, the culprit was the copying of the type to allow delaying the unused pattern check. Only do it when there are polymorphic variants in the patterns.
  • for compress-Tlink, the function definition was inefficient. Correct by unrolling and avoiding the forward reference.

The code is on github: #327
I shall merge it soon.

Concerning the quadratic behavior, I see no easy solution: the type of each method contains the type of the whole object, and we end up having to copy it repeatedly...
(Since we already copy it, path compression doesn't help)

@vicuna
Copy link
Author

vicuna commented Dec 3, 2015

Comment author: @alainfrisch

Thanks Jacques!

Pierre noticed that:

The majority of the run time is taken following those links in Btype.repr.

I'm not sure how to interpret your comment:

(Since we already copy it, path compression doesn't help)

Is this because the copy is done before the compression?

@vicuna
Copy link
Author

vicuna commented Dec 3, 2015

Comment author: @garrigue

Well, is it really the majority of the time?
If it were the case, then certainly path compression should help.

My point was just that copying can be costly, but in the new instance there are no Tlinks.

@vicuna
Copy link
Author

vicuna commented Dec 3, 2015

Comment author: @garrigue

By the way, the fact that optimizing repr for the non-recursive case improves performance noticeably suggests that actually there are not many long Tlink chains.

@vicuna
Copy link
Author

vicuna commented Dec 3, 2015

Comment author: choeger

I played around with this some weeks ago and came up with at least one regression due to the handling of method types. Since methods are basically handled like ordinary functions, there is some redundancy in the processing of the self-pattern. I cannot check my changes right now, but I recall that this modification was sound and provided some enhancements.

Does the attached patch (against 4.02.3) provide any insights?

@vicuna
Copy link
Author

vicuna commented Dec 3, 2015

Comment author: @alainfrisch

at least one regression due to the handling of method types

Did you observe a regression between different versions (independently of those fixed by Jacques)?

@vicuna
Copy link
Author

vicuna commented Dec 3, 2015

Comment author: choeger

No, at that point I focused solely on 4.02.3, I noticed however that some unification of the self-variable was invoked once per method in the post-processing (which is of course O(n²) and non-necessary since self has the same type in all methods).

@vicuna
Copy link
Author

vicuna commented Dec 3, 2015

Comment author: @alainfrisch

Ok, thanks for clarifying. Proper regression between versions have higher priority, but it's certainly worth looking at your suggestion. Jacques is again the one competent to comment on it.

@vicuna
Copy link
Author

vicuna commented Dec 4, 2015

Comment author: @garrigue

Concretely, which unification are you talking about?
I expect all of them to be required (there are many versions of self, whose contents could differ).
Note that unifying two physically identical types costs nothing, but unifying two full copies of the same type has a linear cost. Usually types are partially shared, so the cost lies in between.

If you're convinced some unification is useless, you can always try removing it, and see whether this improves performance, and whether the testsuite goes through.

@vicuna
Copy link
Author

vicuna commented Dec 4, 2015

Comment author: @alainfrisch

Jacques: cf the attached patch. choeger mentions that it improved performance and that it looked safe (I guess here ran the testsuite?).

@vicuna
Copy link
Author

vicuna commented Dec 4, 2015

Comment author: @garrigue

After fixing the badly placed end_def, which cause failure in 7 tests, this tests indeed appears to pass all the tests, and improve perfomance by a factor of 2, but I don't think this solves the quadratic problem: 2000 methods still takes almost 4 times as much as 1000.

Actually I need to correct a bit more: the begin_def/end_def pair should be completely avoided for methods.
I merged the resulting patch in the pull request.
(It would probably be better to have a special path for all cases when the pattern matching is trivial, but this requires more thinking.)

@vicuna
Copy link
Author

vicuna commented Dec 4, 2015

Comment author: @alainfrisch

Thanks! We know the type-checker has non-linear behaviors all over the place, and it's good to track those that can be eliminated (without degrading common cases too much); but then significant constant-factor optimizations are also important.

@vicuna
Copy link
Author

vicuna commented Dec 8, 2015

Comment author: @garrigue

The pull request is now merged: #327.

On trunk, commit 7a2a87f

@vicuna vicuna closed this as completed Feb 16, 2017
@vicuna vicuna added this to the 4.03.0 milestone Mar 14, 2019
@vicuna vicuna added the bug label Mar 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants