New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Long compiles (to bytecode) / stackoverflow (to native code) on generated data #6713
Comments
Comment author: @gasche It is a general development principle of the compiler distribution that whenever code readability/maintainability is in conflict with the ability to support large generated programs, readability/maintainability is favored. This means that you will probably have to make your code generator smarter to avoid producing unreasonably-large linear structures (which is the cause for the various stack overflows). The unreasonably-large compile-time you observe in bytecode is solely due to the type-checker. I'm not sure of the details, but each polymorphic variant gets an independent polymorphic type, and you have so many it kills the performances. With a couple of minutes and Emacs macros I have changed each occurence of, say, [| into something like (* at the beginning of your program *) [| al; al; al; ... |] The difference is that, because Uucp_break_base.sequence is a closed variant (of the form [ The resulting program -- attached here -- compiles in only a few seconds. Changing your code generator to produce things in this form should be straightforward. I had a quick look at the native stackoverflow; I'm skeptical it can be straightforwardly fixed at the compiler level. I think could just change your code generator to replace huge arrays [| x1; x2; ...; xN |] into something like Array.concat [ |
Comment author: @dbuenzli Ok thanks for the tips. Went anyways to binary diet maps which compiled fine but since they were too slow, I'm now use byte maps; encode the polyvars as a byte and use an array of array of string, recovering the polyvar from the byte is a table lookup away. Just for the record do you know if there's a solution to this problem that wouldn't involve factorization ? Like putting coercions or type constraints ? That would be less painful to generate when you hit the problem. |
Comment author: @gasche Yes, I think It's also possible that change to the type-checker could improve performances of this use-case (I initially suspected it was an instance of a path-compression issue I'm interested in, but unfortunately that's a separate situation). I'm not sure why type annotation propagation doesn't already solve this problem. |
Original bug ID: 6713
Reporter: @dbuenzli
Assigned to: @gasche
Status: closed (set by @xavierleroy on 2016-12-07T10:37:11Z)
Resolution: won't fix
Priority: normal
Severity: minor
Version: 4.02.1
Category: ~DO NOT USE (was: OCaml general)
Duplicate of: #4405
Monitored by: @dbuenzli
Bug description
I'm adding new data to uucp and the attached file uucp_break_data.ml takes a lot of time (~8min) to compile. It's four large constants of array of array of array of polyvars but those are closed and constrained in the ml file.
This may not be data structure I use in the end for representing that data as it's a bit too large in memory. But there may be something to do about it. This is with 4.02.1 (and 4.01 has the same kind of behaviour, at least for ocamlc) on macosx 10.9.5.
Steps to reproduce
tar -xvzf long-compile.tgz
cd long-compile
ocamlc.opt -c uucp_tmap.ml
ocamlc.opt -c uucp_break_base.ml
real 8m34.324s
user 8m33.472s
sys 0m0.575s
ocamlopt.opt -c uucp_tmap.ml
ocamlopt.opt -c uucp_break_base.ml
time ocamlopt.opt -c uucp_break_data.ml
Fatal error: exception Stack overflow
Called from file "arg.ml", line 214, characters 2-73
real 8m42.277s
user 8m41.358s
sys 0m0.651s
File attachments
The text was updated successfully, but these errors were encountered: