|Anonymous | Login | Signup for a new account||2015-05-23 10:54 CEST|
|Main | My View | View Issues | Change Log | Roadmap|
|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0006030||OCaml||OCaml typing||public||2013-06-04 14:50||2013-06-04 21:24|
|Target Version||Fixed in Version||4.01.0+dev|
|Summary||0006030: Dramatic performance increase when compiling with annotations|
|Description||When compiling the core library of Why3 (lib/why3/why3.cma) on my computer, it takes about 28 seconds. What is interesting is the ratio between user time and system time: actual compilation takes 7 seconds, while the kernel wastes 21 seconds outputting files.|
A look at the top four entries of the kernel profile shows:
49,56% ocamlc.opt [kernel.kallsyms] [k] aes_encrypt
7,17% ocamlc.opt [kernel.kallsyms] [k] memcpy
4,08% ocamlc.opt [kernel.kallsyms] [k] crypto_xor
1,82% ocamlc.opt [kernel.kallsyms] [k] crypto_cbc_encrypt
(As you can guess from the profile, the disk is encrypted.)
If we now take a look at the compilation of a single file, e.g. src/core/term.cmo, strace shows that ocamlc does 47,000 writes to src/core/term.annot. In some cases, it performs one system call per two characters!
The culprits are the print_ident_annot and print_info functions from typing/stype.ml. They contain format strings such as "@.)@.", which are definitely a recipe for disaster.
With the attached patch, overall compilation speed is multiplied by 3.5, system time is divided by 12, and the kernel profile now looks a lot saner:
6,60% ocamlc.opt ocamlc.opt [.] mark_slice
5,21% ocamlc.opt ocamlc.opt [.] caml_page_table_lookup
3,74% ocamlc.opt ocamlc.opt [.] caml_oldify_one
2,48% ocamlc.opt [kernel.kallsyms] [k] aes_encrypt
|Tags||No tags attached.|
|Attached Files||stypes.patch [^] (2,092 bytes) 2013-06-04 14:50 [Show Content]|
You should try the trunk version. It has already been optimized a lot, after observing (as you did) how slow the previous version was.
|Alain has already made those changes in trunk, commits 13070 and 13071. Can you confirm that compiling trunk as-is solves your problem?|
This is a bit better with trunk, but nowhere near as good as with my patch. The profile with trunk looks like:
35,36% ocamlc.opt [kernel.kallsyms] [k] aes_encrypt
4,84% ocamlc.opt [kernel.kallsyms] [k] memcpy
3,80% ocamlc.opt ocamlc.opt [.] mark_slice
2,98% ocamlc.opt [kernel.kallsyms] [k] crypto_xor
System time is at 8 seconds, while total time is at 14 seconds. A decent compiler should never waste 60% of its time doing I/O. (With my patch, system time is at 1.5 seconds.)
For instance, if we take a look again at the compilation of src/core/term.cmo in Why3, we see that:
- ocaml-4.00.01 performs 47,000 calls to write,
- ocaml-trunk performs 15,000 calls to write,
- my patch performs only 18 calls to write.
|The only difference I found (by code inspection) that may be relevant to performances is the fact that Alain's code has a flush in the [Ti_expr] case of [print_info], after the opening parenthesis, while Guillaume's patch only uses "@\n" (newline with no flush). If I can validate this hypothesis on some long file, we can validate the change.|
|I'm working on it.|
edited on: 2013-06-04 15:52
Commit 13742 on trunk should fix the problem. When compiling typing/typecore.cmo (with -annot), the number of writes is reduced from 29494 to 39.
Can you try it and report how it affects the performances on your example?
|Note that this is not the only flush in Alain's code; there is also one hidden behind pp_print_newline. Also, I don't know the internals of Format, but the fact that there is no final flush for annotations in trunk worries me a bit. Could annotations be abruptly truncated once the compiler terminates? (If not, then the explicit flush I put in my patch is redundant.)|
> Could annotations be abruptly truncated once the compiler terminates?
Annotations are collected and all dumped at once (Stypes.dump), so I don't see how this could happen, except if you kill the process explicitly.
edited on: 2013-06-04 15:57
> Note that this is not the only flush in Alain's code; there is also one hidden behind pp_print_newline.
The commit also get rid of this one. All uses of Format are now on Format.str_formatter (no syscall).
I understand that they are all dumped at once, but I don't see where the final half-filled buffer is flushed to disk in the code.
Anyway, as far as I am concerned, the number of system calls for writing annotations is now optimal, so the bug can be marked as fixed.
> I understand that they are all dumped at once, but I don't see where the final half-filled buffer is flushed to disk in the code.
All open out_channels are flushed when the process exits (Pervasives.flush_all is the default exit_function in Pervasives).
|Guillaume, thanks for the detailed report and quick feedback.|
|Actually, it would be useful to compare the total time between the trunk and your patch (which is much less invasive that my attempt to reduce the use of Format and of format strings).|
I have measured again how long it takes to compile why3.cma. There is no doubt about it: your version is so much better than mine. In fact, the huge timing difference on the actual compilation time (20% overhead) kind of frightens me about the performance of Format.
5.30user 1.80system 0:08.72elapsed 81%CPU
5.30user 1.80system 0:08.71elapsed 81%CPU
5.29user 1.78system 0:08.66elapsed 81%CPU
5.25user 1.96system 0:08.84elapsed 81%CPU
5.35user 1.75system 0:08.71elapsed 81%CPU
Trunk + reverting stypes.ml + applying my patch - the final flush:
6.36user 1.79system 0:09.75elapsed 83%CPU
6.34user 1.78system 0:09.72elapsed 83%CPU
6.25user 1.80system 0:09.68elapsed 83%CPU
6.18user 1.92system 0:09.72elapsed 83%CPU
6.34user 1.82system 0:09.77elapsed 83%CPU
Thanks again for the results. They justify the extra effort put into de-Formatting this piece of code. And indeed, there are reasons to be concerned with the performance of Format and format strings. See:
Hopefully, 0006017 will remove some overhead related to parsing format strings at runtime (but not the overhead related to Format itself, I guess).
|2013-06-04 14:50||gmelquiond||New Issue|
|2013-06-04 14:50||gmelquiond||File Added: stypes.patch|
|2013-06-04 14:55||frisch||Note Added: 0009388|
|2013-06-04 14:57||gasche||Note Added: 0009389|
|2013-06-04 14:57||gasche||Status||new => feedback|
|2013-06-04 15:20||gmelquiond||Note Added: 0009391|
|2013-06-04 15:20||gmelquiond||Status||feedback => new|
|2013-06-04 15:33||gasche||Note Added: 0009392|
|2013-06-04 15:38||frisch||Assigned To||=> frisch|
|2013-06-04 15:38||frisch||Status||new => assigned|
|2013-06-04 15:40||frisch||Note Added: 0009393|
|2013-06-04 15:48||frisch||Note Added: 0009394|
|2013-06-04 15:48||gmelquiond||Note Added: 0009395|
|2013-06-04 15:50||frisch||Note Edited: 0009394||View Revisions|
|2013-06-04 15:51||frisch||Note Added: 0009396|
|2013-06-04 15:52||frisch||Note Edited: 0009394||View Revisions|
|2013-06-04 15:57||frisch||Note Added: 0009397|
|2013-06-04 15:57||frisch||Note Edited: 0009397||View Revisions|
|2013-06-04 16:00||gmelquiond||Note Added: 0009398|
|2013-06-04 17:10||frisch||Note Added: 0009399|
|2013-06-04 17:14||frisch||Status||assigned => resolved|
|2013-06-04 17:14||frisch||Resolution||open => fixed|
|2013-06-04 17:14||frisch||Fixed in Version||=> 4.01.0+dev|
|2013-06-04 17:16||frisch||Note Added: 0009400|
|2013-06-04 17:38||frisch||Note Added: 0009402|
|2013-06-04 18:44||gmelquiond||Note Added: 0009404|
|2013-06-04 21:24||frisch||Note Added: 0009405|
|Copyright © 2000 - 2011 MantisBT Group|