New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
-short-paths slow :( #6600
Comments
Comment author: @garrigue Just to clarify: does this slow down only occur when some types are printed, or does it appear for normal compilation too ? |
Comment author: @diml The slow down occurs only when some types are printed. |
Comment author: @mshinwell I have heard that MLton has a solution to this problem that doesn't take so long; it might be worth having a look to see how it does it. |
Comment author: yminsky The fact that it only shows up when types are printed helps a lot, but It's also quite frustrating in utop, where it hits you immediately at |
Comment author: @garrigue I'm not sure I believe the "not so long" part. Now that we have module aliases, I'm looking at how to optimize the way they are handled in Env. Another solution would be to introduce some pruning, such as only looking for long identifiers of length 1 or 2, but I'm not sure it would help much, and it would be arbitrary. |
Comment author: @garrigue I ran your test, with just Core, not Async, and I get the following numbers (user time, for 4.01, but this shouldn't matter) without -short-paths: 2s So this means that there is no miracle: if we want to be faster we need to improve both the module system performance, and the map construction performance. |
Comment author: @garrigue Actually, the numbers with 4.02.1 are a bit different: without -short-paths: 0.14s So it seems that with aliases the main cost is in forcing the environment. |
Comment author: @garrigue Tried again on another machine, with the environment sharing patch: 4.02 branch without sharing
So the main improvement is in the forcing part, but short-paths is twice faster as a result... |
Comment author: @sliquister In case it's helpful, judging from strace, for a compilation taking 1.9s, I see a 1.1s gap in the trace right before loading async_kernel (so I think something in the typer happens when core is fully loaded, applying some substitution maybe?), a 0.4s gap right before writing the error message (I suppose after async is fully loaded) and the remaining 0.4s seem to be loading/searching the various cmis. I am not sure this would be applicable except in toy example, but it sounds like the whole environment doesn't have to be forced. In the example, the types are int and char so why look in any module? Module can only contain type paths with at least one module component, so they are too long. Also, still on trying to force less the environment, when trying to shorten Deferred.t (from async), looking in Core is not going to help simply because Async depends on Core and Deferred.t is abstract, so Deferred.t can't have an alias in Core. I don't have a concrete improvement to suggest that uses this property though. |
Comment author: @garrigue
This is a bit strange, as these numbers do not seem to coincide with what I am observing using 4.02. The probable explanation is that actually this 0.07s is only for loading Core and Async, but not their dependencies, which may be loaded lazily. This means that the loading of async_kernel is triggered by the search, so what you see is really a mix of search, loading and substitution (intermingled because of laziness), and it is hard to conclude anything from it.
Yes, I've thought about all these heuristics. But this is not the way it is implemented: we build the complete map before looking at the type. Such a heuristic would require doing a first path on the type (which actually we might not know fully when we build the map), and would only work when none of the types are qualified. |
Comment author: @garrigue I have fixed the bug in the patch (version 4), but unfortunately it reduces the sharing, and we don't gain that much anymore ( Core with short-paths : 3.7s down to 2.3s (40% speedup) Maybe we should try to do something about #2871, outside of laziness... |
Comment author: @damiendoligez Note: this doesn't look like something that can be fixed in the (very short) time frame of 4.02.1, especially if the fix introduces some incompatibilities, so I'm postponing to the next version. |
Comment author: @garrigue Added a new patch env-share-components5.diff, which fixes the bug without disabling local sharing. I plan to eventually commit this in trunk after discussion on the developers' list. |
Comment author: @garrigue I've tried another approach, with the printtyp-incremental-map.diff patch. Could you give it a try in practice ? |
Comment author: @garrigue printtyp-incremental-map2.diff respects the original specification: it incrementally deepens the traversal of the environment until it finds an optimal path, or there is nothing to deepen. |
Comment author: @let-def Just to let you know, I started working on specific tweaks for merlin. The second part is bringing this into the original -short-paths:
For 2., the reason is that most of the paths that are expected to be rewritten when used from Core are reachable from "open Core.Std" / "open Async.Std". |
Comment author: @garrigue I see. Unfortunately, considering only module aliases is only half of the specification: IIRC, one of the goals of short-paths was to simplify paths that went through functors, and for that you need to fully simplify. Also, with environment sharing, (2) might not give you much: once you've opened Async.Std and Core.Std you have already all of Core and Async there. Cacheing is something I thought about before; ideas are welcome. |
Comment author: @garrigue Just a reminder that this PR is in feedback mode.
|
Comment author: yminsky Here are my thoughts:
def did some experiments that indicated that most of the cost was in the computation, not in loading the modules, which makes one wonder if some kind of eager pre-computation of the map along with the ordinary build would help. Another simple heuristic that seems worth trying would be to not walk the entire environment: to use the ordinary short-paths heuristic, but restricted to the modules that OCaml requires to be loaded given what is being referred to. Thus, any types in modules that by the current rules would not be linked in would not be considered for short paths either. |
Comment author: @garrigue I'm not sure the problem is the same for Merlin and utop. If performance matters, it is possible to do more cacheing. Concerning the cost, I think my numbers already show that there are two costs: forcing the environment and computing the map. For compilation, the cost is dominated by the environment, as both are done only once. But for interactive use, the environment cost becomes negligible after the first computation, so we really need to cache the map more efficiently. |
Comment author: yminsky The utop and merlin case sound like they should be different, but my experience is that they're the same. In particular, the first lookup by merlin is slow, but after that, it gets faster, just as utop does. I showed this to def, who saw the same effect, but didn't understand how it could be so. Def, do you now understand what's going on? |
Comment author: @garrigue The similarity is not surprising: the first lookup requires forcing the whole environment, which is the most costly part. The result of this forcing is fully cached. Subsequent calls only require to force new parts of the environment (usually small), and to recompute the map (if the environment has changed). |
Comment author: yminsky Fair enough. My experience though is that for both of them it's pretty close: a little too long to be pleasant, but an improvement of a factor of 2 or 3 would make a big difference. |
Comment author: @damiendoligez garrigue, do you think this patch is mature enough for inclusion in 4.02.2? |
Comment author: @mshinwell I thought there were still concerns about speed (Damien, Ron knows about this) |
Comment author: @garrigue Since env-share-components.diff changes the semantics (in a very minor way), it should rather go in trunk. This said, if there is demand it is possible to include only printtyp-incremental-map2.diff in 4.02.2. It should make the toplevel much faster, by avoiding to always force everything, and it is only about -short-paths, so it won't break anything. Whether we can do better is another question. |
Comment author: yminsky If there's a material improvement we can get for the toplevel, I'd be quite eager to get it for 4.02.2. Right now, our custom built utop (with lots of libraries linked in) takes 10 or so seconds to evaluate "let x = 3;;". This is a pretty serious usability problem. Weirdly, it gets faster the second time you do it, but it still takes real time, say 4 seconds. Subsequent expressions are essentially instantaneous. |
Comment author: @garrigue Merged printtyp-incremental-map2.diff in 4.02 and trunk (after some cleanup), at revisions 15774 and 15775. |
Comment author: @lpw25 I can confirm that with the 4.02 branch the Jane St. custom utop now evaluates "let x = 3;;" instantaneously. So this is a definite improvement. In terms of testing that the behaviour of -short-paths has not changed, there is not much that I can say. There is no large collection of programs containing errors with which to perform such tests. However, whilst trying to test the performance I did get an unhandled "Not_found" exception from the compiler, which I am currently investigating. |
Comment author: @lpw25 The "Not_found" comes from the call to |
Comment author: @garrigue Do you have a repro case? |
Comment author: @lpw25
Not one that is easily extracted.
I confirm that it fixes the issue.
I don't have commit access, so can someone else please merge this? |
Comment author: @damiendoligez Applied patch to branch 4.02 (rev 16078) and trunk (rev 16079). Some cleanup of the Changes file sneaked into the commit for 4.02, sorry. I'm now postponing this issue to 4.03, as we have done all we can for 4.02.2, any further changes look like they'll be slightly incompatible. |
Comment author: @xavierleroy Do we still have this performance issue in 4.02.2 or in trunk? Otherwise I move to mark this PR resolved. |
Comment author: @gasche My understanding is that short-paths is still a source of performance issues for some workflows, and that there will be further changes down the chain. Thomas Réfis and Frédéric Bour have both worked on short-paths recently, and I think that they plan to work on it again. I kept the issue open, but moved it to "later" target as it's clear that it is not on the agenda for 4.03. |
Comment author: @mshinwell Leo is working on a completely new implementation of short-paths with the aim of solving this performance problem. |
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc. |
I heard that @lpw25 is planning to submit his improved short-paths implementation any time soon! |
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc. |
There is still plan to fix this issue, but no sooner than OCaml 5.1 . |
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc. |
This is still on my todo list, hopefully for 5.2 . |
Original bug ID: 6600
Reporter: @diml
Assigned to: @lpw25
Status: assigned (set by @mshinwell on 2017-03-07T13:02:11Z)
Resolution: open
Priority: normal
Severity: feature
Version: 4.02.0
Target version: later
Category: typing
Tags: patch
Related to: #2871
Monitored by: @trefis @gasche @jmeber @hcarty yminsky @yakobowski
Bug description
Adding -short-paths can slow down compilation a lot, up to several seconds. This is becoming a problem for us.
All the time is spent in building the path map. Taking into account only module aliases and opens should be enough to get good result with core libraries. So there might be a way to speed up the search.
Steps to reproduce
$ cat > foo.ml <<EOF
open Core.Std
open Async.Std
let x = 1 + 'e'
EOF
$ time ocamlopt.opt
ocamlfind query -i-format -r async
-c foo.mlFile "foo.ml", line 4, characters 12-15:
Error: This expression has type char but an expression was expected of type
int
real 0m0.177s
user 0m0.041s
sys 0m0.013s
$ time ocamlopt.opt
ocamlfind query -i-format -r async
-short-paths -c foo.mlFile "foo.ml", line 4, characters 12-15:
Error: This expression has type char but an expression was expected of type
int
real 0m2.040s
user 0m1.783s
sys 0m0.155s
File attachments
The text was updated successfully, but these errors were encountered: