New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dramatic performance increase when compiling with annotations #6030
Comments
Comment author: @alainfrisch You should try the trunk version. It has already been optimized a lot, after observing (as you did) how slow the previous version was. Commits: http://caml.inria.fr/cgi-bin/viewvc.cgi?view=revision&revision=13070 http://caml.inria.fr/cgi-bin/viewvc.cgi?view=revision&revision=13071 |
Comment author: @gasche Alain has already made those changes in trunk, commits 13070 and 13071. Can you confirm that compiling trunk as-is solves your problem? |
Comment author: gmelquiond This is a bit better with trunk, but nowhere near as good as with my patch. The profile with trunk looks like: 35,36% ocamlc.opt [kernel.kallsyms] [k] aes_encrypt System time is at 8 seconds, while total time is at 14 seconds. A decent compiler should never waste 60% of its time doing I/O. (With my patch, system time is at 1.5 seconds.) For instance, if we take a look again at the compilation of src/core/term.cmo in Why3, we see that:
|
Comment author: @gasche The only difference I found (by code inspection) that may be relevant to performances is the fact that Alain's code has a flush in the [Ti_expr] case of [print_info], after the opening parenthesis, while Guillaume's patch only uses "@\n" (newline with no flush). If I can validate this hypothesis on some long file, we can validate the change. |
Comment author: @alainfrisch I'm working on it. |
Comment author: @alainfrisch Commit 13742 on trunk should fix the problem. When compiling typing/typecore.cmo (with -annot), the number of writes is reduced from 29494 to 39. Can you try it and report how it affects the performances on your example? |
Comment author: gmelquiond Note that this is not the only flush in Alain's code; there is also one hidden behind pp_print_newline. Also, I don't know the internals of Format, but the fact that there is no final flush for annotations in trunk worries me a bit. Could annotations be abruptly truncated once the compiler terminates? (If not, then the explicit flush I put in my patch is redundant.) |
Comment author: @alainfrisch
Annotations are collected and all dumped at once (Stypes.dump), so I don't see how this could happen, except if you kill the process explicitly. |
Comment author: @alainfrisch
The commit also get rid of this one. All uses of Format are now on Format.str_formatter (no syscall). |
Comment author: gmelquiond I understand that they are all dumped at once, but I don't see where the final half-filled buffer is flushed to disk in the code. Anyway, as far as I am concerned, the number of system calls for writing annotations is now optimal, so the bug can be marked as fixed. |
Comment author: @alainfrisch
All open out_channels are flushed when the process exits (Pervasives.flush_all is the default exit_function in Pervasives). |
Comment author: @alainfrisch Guillaume, thanks for the detailed report and quick feedback. |
Comment author: @alainfrisch Actually, it would be useful to compare the total time between the trunk and your patch (which is much less invasive that my attempt to reduce the use of Format and of format strings). |
Comment author: gmelquiond I have measured again how long it takes to compile why3.cma. There is no doubt about it: your version is so much better than mine. In fact, the huge timing difference on the actual compilation time (20% overhead) kind of frightens me about the performance of Format. Trunk: 5.30user 1.80system 0:08.72elapsed 81%CPU Trunk + reverting stypes.ml + applying my patch - the final flush: 6.36user 1.79system 0:09.75elapsed 83%CPU |
Comment author: @alainfrisch Thanks again for the results. They justify the extra effort put into de-Formatting this piece of code. And indeed, there are reasons to be concerned with the performance of Format and format strings. See: http://www.lexifi.com/blog/note-about-performance-printf-and-format Hopefully, #6017 will remove some overhead related to parsing format strings at runtime (but not the overhead related to Format itself, I guess). |
Original bug ID: 6030
Reporter: gmelquiond
Assigned to: @alainfrisch
Status: closed (set by @xavierleroy on 2015-12-11T18:19:29Z)
Resolution: fixed
Priority: normal
Severity: major
Version: 4.00.1
Fixed in version: 4.01.0+dev
Category: typing
Bug description
When compiling the core library of Why3 (lib/why3/why3.cma) on my computer, it takes about 28 seconds. What is interesting is the ratio between user time and system time: actual compilation takes 7 seconds, while the kernel wastes 21 seconds outputting files.
A look at the top four entries of the kernel profile shows:
49,56% ocamlc.opt [kernel.kallsyms] [k] aes_encrypt
7,17% ocamlc.opt [kernel.kallsyms] [k] memcpy
4,08% ocamlc.opt [kernel.kallsyms] [k] crypto_xor
1,82% ocamlc.opt [kernel.kallsyms] [k] crypto_cbc_encrypt
(As you can guess from the profile, the disk is encrypted.)
If we now take a look at the compilation of a single file, e.g. src/core/term.cmo, strace shows that ocamlc does 47,000 writes to src/core/term.annot. In some cases, it performs one system call per two characters!
The culprits are the print_ident_annot and print_info functions from typing/stype.ml. They contain format strings such as "@.)@.", which are definitely a recipe for disaster.
With the attached patch, overall compilation speed is multiplied by 3.5, system time is divided by 12, and the kernel profile now looks a lot saner:
6,60% ocamlc.opt ocamlc.opt [.] mark_slice
5,21% ocamlc.opt ocamlc.opt [.] caml_page_table_lookup
3,74% ocamlc.opt ocamlc.opt [.] caml_oldify_one
2,48% ocamlc.opt [kernel.kallsyms] [k] aes_encrypt
Please apply.
File attachments
The text was updated successfully, but these errors were encountered: