New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backtraces in toplevel #6468
Comments
Comment author: @whitequark @gasche, I've reworked the patch to apply on trunk. $ ocaml let f () = raise Not_found;;val f : unit -> 'a = let g () = f (); 1;;val g : unit -> int = g ();;Exception: Not_found. The patch is somewhat complex, so I will explain in detail how it works. Logically, it consists out of four parts:
The changes to the runtime representation of the debug information are as follows. In trunk, debug information is represented by an array of events, represented as C structures. In the original patch, it's a list of event lists, represented as OCaml structures. I implemented a middle ground between these approaches: there is a global linked list of all registered debug information, but every entry is a custom block containing a C structure. This allows to keep the fast backtrace extraction code that uses binary search intact. The change to the startup process is necessary, as caml_stash_backtrace walks the stack by looking at every successive value and checking whether it points into a known range of bytecode. Thus, either the debug information could be loaded at startup, or the code would have to handle the "main" bytecode range specially. I think the former is cleaner. The original patch also exported a variant of caml_input_val function that allows to allocate the space for the blocks from the old generation right away. I'm not sure why was this done--I don't see a reason for it--and the compiler can bootstrap itself just fine without it when built with -g (i.e. stressing the relevant code), so I assume it's not necessary. The rest of the changes are straightforward. |
Comment author: @alainfrisch FWIW, it's only a recent addition that debug info is loaded only once in bytecode. Previously, it was loaded everytime it was requested. And this was really slow: on one big application, the order of magnitude was around 1s to load the debug info. Now that this is done at most once, this is fine. But I'm wondering about doing it non-lazily, since one would have to pay this price at startup, which might be sub-optimal e.g. for command-line applications. |
Comment author: @whitequark I had a variant of patch which did it lazily (when Printexc.record_backtrace true was called), but it added quite a bit of complexity for unclear benefit. It also had awkward semantics in presence of dynlinking. What about not compiling command-line applications with debug info when this is critical? |
Comment author: @alainfrisch
I think we always want backtraces to get useful information in case of unexpected errors. I can imagine large application with short-lived processes for which raw performance doesn't really matter but for which a delay on startup will be noticed. |
Comment author: @whitequark But this wouldn't work for unexpected errors. If the error is unexpected, then the backtrace will not be computed at all for absence of debugging information, and if you always enable backtraces via Printexc or read the debugging information on first raised exception, the same delay will be present. |
Comment author: @gasche I think the style Alain has in mind is to use raise_notraice if and only if he can prove that the exception will be caught, and always enable debug information. There are two cases of "unexpected errors", the ones where you didn't use _notrace (typically out-of-bounds error), which get traces, and the ones where the proof was or became wrong -- then you're out of luck as you say. |
Comment author: @whitequark @gasche, actually 6317 is not a complete duplicate. The reason ocamldebug can't set breakpoints on dynlinked and otherwise loaded code is that the runtime keeps a duplicate array of all bytecode so that the byte that has been in the place of breakpoint could be restored. Of course, this array does not cover dynlinked code. |
Comment author: @whitequark It would be unrelated to this issue. The way #6317 should be solved is to move the tracking of replaced bytecode to ocamldebug itself; it is not the runtime's job, and it is even easier to implement in ocamldebug. |
Comment author: @gasche I only did a very quick review of the patch so far (I'm not yet in holidays!):
struct debug_info *caml_find_debuginfo(code_t pc) that returns a debuginfo di such that (di->start <= pc && pc < di->end), or NULL if none exists.
|
Comment author: @whitequark re 2: yes, #debug is probably not useful. Regarding -g (Clflags.debug) always being enabled: I can't imagine a reason someone would want to explicitly disable generation of debug information in toplevel, and the performance hit is certainly insignificant here. Do you have other opinion? |
Comment author: @gasche I thought of the following approach to re-enable lazy loading: in caml_add_debug_info, store a dummy description of the debug_info chunk in the global list, with just the start and stop position available (events is NULL). In the caml_find_debuginfo function I suggested above, force the actual computation of the debuginfo information on the chunk that corresponds to the program counter. This would mean that reading the debuginfo in the file is done eagerly, but the rest is done lazily (there is a "relocation" step that is currently done eagerly and could be delayed or done completely on the fly). We could think of even delaying reading the serialized debuginfo by changing the storage format to serialize the stop position in addition to the start position. Is that the approach you had tried and that was too complex? I haven't implemented it so it's a game of guessing, but I don't think it would make the code too difficult to follow -- the idea is that the structure of the chunk list does not change when the debuginfo is forced. |
Comment author: @whitequark I was thinking of hardcoding a comparison with caml_start_code. I like your suggestion much better and will implement it. |
Comment author: @whitequark I've updated the patch:
|
Comment author: @gasche (Sorry for the long time before feedback. The happy ignorance of vacation. I probably won't be able to do much in the following month for other reasons.) I like the general shape of the patch, but here are a few nitpicking comments. There is a comment in bytecomp/emitcode.mli that is made incorrect by your patch. I think "lazy_loaded" is a bad name choice (I am not able to tell from the name alone whether a 1 value suggests the lazy loading has already happened.) I would suggest just "loaded", or "already_read" (it's strange that the function using to make a debuginfo "loaded" would be "read_debug_info"; "read" is also reasonable but ambiguous). There is a consistency problem with the name "debuginfo" that I unfortunately suggested (it is more natural to me): the existing code uses "debug_info" all over the place. I think you should use "debug_info" as well (in find_debug_info), and maybe in the Debug_info_val macro as well (one may or may not like the fact that the two underscores have a different status here, but that's already used in Unsigned_long_val for example). Finally, I think you could change event_for_location so that you use an early exit (if (di == NULL)\n\treturn NULL;) rather than a long conditional. This removes the "short case last" issue, but more importantly it allows to go back to the indentation level of the previous implementation, which should make for a more readable diff. |
Comment author: @whitequark Addressed. |
Comment author: @gasche Applying the patch to the current trunk and running the testsuite gives me one test failure in testsuite/tests/typing-objects. The failure message is rather worrying: $ diff Tests.ml.{result,reference}
|
Comment author: @gasche I can reliably reproduce this failure -- it seems to be a problem of the patch. Minimized test case, run in the toplevel: (good) with 4.02.1 or trunk before the patch: (Marshal.from_string (Marshal.to_string (object method m = 33 end) [Marshal.Closures]) 0);;
(bad) with the patch: Marshal.from_string (Marshal.to_string (object method m = 33 end) [Marshal.Closures]) 0;;Exception: Invalid_argument "output_value: abstract value (outside heap)". |
Comment author: @whitequark Ok. I know why the failure happens and will fix soon. |
Comment author: @whitequark After investigation, I admit that I did not, in fact, know the cause of the failure. What puzzles me is the fact that the following code:
fails in toplevel with the same message since 4.00.1. My guess is that the following code:
triggers the same bug, but only after my patch exposes it somehow. Since the code you highlight works with both ocamlc and ocamlc -g, but reliably fails in toplevel, I'm going to say that I happened to expose a latent bug in extern.c. I'm not sure which one though, and why is it failing intermittently. |
Comment author: @whitequark After applying the patch for 6760 and adjusting the test outputs for the backtraces that appear, all tests pass. |
Comment author: @gasche I find it disturbing that your patch would provoke a change in difference between closures and object marshalling, and would like to understand better what is happening here. I'll try to get opinions from developpers that actually know about these things (here and #6760) before considering merge. |
Comment author: @whitequark Yes, I would like to understand that better too. I think there might be more bugs hidden in code that is triggered by object serialization. |
Comment author: @jhjourdan Now I understand this: when using this patch, the compiler is instructed to |
Comment author: @gasche Now that we also have the patch for #6760, I'm wondering why we have this duplication of two different tables that basically index information by code fragment intervals. Could we not store the (delayed) ev_info information (the fragment-local set of debug events) in a new field of the caml_code_fragments table? |
Comment author: @whitequark @gasche, not easily, since the caml_code_fragments table is managed in C but caml_debug_info is managed by OCaml GC. It would require a major refactoring and I am not sure what is the value. |
Comment author: @gasche I applied the patch as-is. You changed some tests in the testsuite (exceptions raised from toplevel tests, to add a backtrace), but on my machine the tests are run without OCAMLRUNPARAM=b so I reverted those changes to have the test pass. It may be more robust to decide once and forall whether we want to set OCAMLRUNPARAM=b to run the toplevel tests. Do you always have b=1 in your OCAMLRUNPARAM environment? |
Comment author: @whitequark Yeah; and as described in #6728 it is my view that OCAMLRUNPARAM=b should be the default. Now that I think about it, didn't I always enable backtraces for the toplevel in this patch? This seems odd... |
Comment author: @whitequark Ah right--the toplevel does not currently (try to) enable backtraces by default, it only does that for -g. I was confused momentarily. In my opinion it should. There is little performance reasons not to. What do you think? |
Comment author: @alainfrisch Commit 16077 fixes a bug introduced with this change: caml_init_debug_info needs to be called from caml_startup_code as well to support 'ocamlc -custom'. (The commit messages wrongly mentions caml_init_code_fragments.) |
Original bug ID: 6468
Reporter: @whitequark
Assigned to: @whitequark
Status: closed (set by @xavierleroy on 2016-12-07T10:49:08Z)
Resolution: fixed
Priority: normal
Severity: feature
Version: 4.02.0+beta1 / +rc1
Fixed in version: 4.03.0+dev / +beta1
Category: ~DO NOT USE (was: OCaml general)
Tags: patch
Has duplicate: #6317
Related to: #6760
Monitored by: @gasche @hcarty
Bug description
I just realized there appears to be no way to get backtraces in toplevel. This makes it much less useful. I see there is a patch for 3.11; what's holding its integration?
https://github.com/jaked/ocaml-backtrace-patch/blob/master/patch-3.11.2
File attachments
The text was updated successfully, but these errors were encountered: