Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0005809OCamlOCaml generalpublic2012-11-05 17:402012-11-08 14:53
Reporterfrisch 
Assigned Tofrisch 
PrioritynormalSeverityminorReproducibilityhave not tried
StatusresolvedResolutionfixed 
PlatformOSOS Version
Product Version 
Target VersionFixed in Version4.01.0+dev 
Summary0005809: Generating .cmt files takes a long time, in case of type error
DescriptionWhen a unit does not type-check, a .cmt file is generated, with one entry per valid sub-node of the typed AST.

I've observed a case where compiling a (quite large, but hand-written) module takes a lot of time in case of type error: about 28 seconds, mostly to generate a 8.8 Mb .cmt file. By setting OCAML_BINANNOT_WITHENV, we ask the compiler not to remove environment for AST node; interestingly, the whole compilation now takes only 0.81 seconds and generates a 2.7 Mb .cmt file (i.e. smaller than the original one, even though environments are not discarded!). Also, I've tried to remove the hash consing in Cmt_format.keep_only_summary. As expected, the size of the .cmt file increases (to 14 Mb) and compilation takes 1.19 seconds.
TagsNo tags attached.
Attached Files

- Relationships
related to 0005814resolvedfrisch read_cmt -annot does not report internal references 

-  Notes
(0008422)
frisch (developer)
2012-11-05 17:48
edited on: 2012-11-05 17:49

I've also tried replacing hash consing with the following simpler strategy, which avoids repeated successive calls to Env.keep_only_summary on identical environment (compared with physical equality):

===================================================================
--- typing/cmt_format.ml	(revision 50428)
+++ typing/cmt_format.ml	(working copy)
@@ -76,17 +76,19 @@
   try ignore (Sys.getenv "OCAML_BINANNOT_WITHENV"); false
   with Not_found -> true
 
-(* Re-introduce sharing after clearing environments *)
+let last_env = ref None
+
 let env_hcons = Hashtbl.create 133
 let keep_only_summary env =
-  let new_env = Env.keep_only_summary env in
-  try
-    Hashtbl.find env_hcons new_env
-  with Not_found ->
-    Hashtbl.add env_hcons new_env new_env;
-    new_env
-let clear_env_hcons () = Hashtbl.clear env_hcons
+  match !last_env with
+  | Some (e0, e1) when e0 == env -> e1
+  | _ ->
+      let new_env = Env.keep_only_summary env in
+      last_env := Some (env, new_env);
+      new_env
 
+let clear_env_hcons () = last_env := None
+
 module ClearEnv  = TypedtreeMap.MakeMap (struct
   open TypedtreeMap
   include DefaultMapArgument
===================================================================


This gives a 8.9 Mb .cmt file, generated in 1.16 seconds.

I'm still confused as to why setting OCAML_BINANNOT_WITHENV generates a much smaller .cmt file.

(0008424)
frisch (developer)
2012-11-06 09:31

The explanation for the increase of size of the .cmt file is actually quite simple: in "error mode", the cmt file lists all type-checked nodes; mapping over them to simplify the environment loses all the sharing between them. I can see two simple work-arounds:

  - Do not export the children of an exported node. (When a node is to be added to the list of exported nodes, this list is first back-tracked to its content before the node was considered.)

  - In error mode, never discard the environment.

  - Keep sharing when mapping the AST to simplify the environment. (Likely to reduce the size of .cmt file but increase even more the compilation time.)


One should also consider displaying the error message before creating the .cmt file.
(0008425)
frisch (developer)
2012-11-06 10:04

A concrete reproduction case with typing/typecore.ml from the OCaml compiler, compiled with ocamlc:

  0.54s without -bin-annot
  0.98s with -bin-annot -> 1.1 Mb .cmt file
  0.54s with -bin-annot and OCAML_BINANNOT_WITHENV=1 -> 1.4 Mb .cmt file
  0.57s with -bin-annot and the "simpler hash-consing" patch -> 1.1 Mb

Now, add a type error at the end of the file:

  0.37s without -bin-annot
  7.82s with -bin-annot -> 4.6 Mb .cmt file
  0.39s with -bin-annot and OCAML_BINANNOT_WITHENV=1 -> 1.5 Mb .cmt file
  0.54s with -bin-annot and the "simpler hash-consing" patch -> 4.8 Mb


What about always keeping the environment in the cmt file? This is much faster, and does not seem to produce huge cmt files.

If we insist on discarding the environment, I'd suggest to adopt a more efficient way to implement that, maybe with my "simpler hash-consing" patch (less sharing, slight increase of size of .cmt files, but more more efficient) and/or by making the env fields
in nodes mutable (so that discarding the environment can be done in place without loosing sharing between nodes in the error case),
or even by making the fields of the env record themselves mutable (so that all sharing can be kept without a costly hash-table).
(0008434)
frisch (developer)
2012-11-06 18:22

Keeping environments would also solve 0005814.
(0008445)
frisch (developer)
2012-11-08 11:34

Commit 13078: remove hash-consing of environments, replaced by a much cheaper one-slot memoization.
(0008446)
frisch (developer)
2012-11-08 11:55

Commit 13080: do not keep both an expression and its sub-expressions in a partial .cmt file (in case of type error).
(0008448)
frisch (developer)
2012-11-08 14:49
edited on: 2012-11-08 14:49

Commit 13081: do not store a structure item together with sub-components in partial cmt files.

(0008449)
frisch (developer)
2012-11-08 14:53

New timings for compiling typing/typecore.ml with ocamlc (not ocamlc.opt):

 0.54s without -bin-annot
 0.56s with -bin-annot -> 1162282 bytes for the .cmt file

With a type error inserted at the end of the module:

 0.35s without -bin-annot
 0.36s with -bin-annot -> 1170623 bytes for the .cmt file

- Issue History
Date Modified Username Field Change
2012-11-05 17:40 frisch New Issue
2012-11-05 17:48 frisch Note Added: 0008422
2012-11-05 17:49 frisch Note Edited: 0008422 View Revisions
2012-11-06 09:31 frisch Note Added: 0008424
2012-11-06 10:04 frisch Note Added: 0008425
2012-11-06 18:22 frisch Note Added: 0008434
2012-11-06 18:22 frisch Relationship added related to 0005814
2012-11-08 11:34 frisch Note Added: 0008445
2012-11-08 11:55 frisch Note Added: 0008446
2012-11-08 14:24 frisch Note Added: 0008447
2012-11-08 14:49 frisch Note Deleted: 0008447
2012-11-08 14:49 frisch Note Added: 0008448
2012-11-08 14:49 frisch Note Edited: 0008448 View Revisions
2012-11-08 14:53 frisch Note Added: 0008449
2012-11-08 14:53 frisch Status new => resolved
2012-11-08 14:53 frisch Fixed in Version => 4.01.0+dev
2012-11-08 14:53 frisch Resolution open => fixed
2012-11-08 14:53 frisch Assigned To => frisch


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker