New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose a way to inspect the current call stack #5899
Comments
Comment author: @dbuenzli A little bit OT but could we at the same time have the same resulting type as Printexc.current_call_stack for a variant of Printexc.get_backtrace ? I already had to parse this string more than once (notably to be able to pretty print it). Thanks. |
Comment author: @alainfrisch Daniel: agreed! |
Comment author: @mshinwell We already have something similar in Core (see module [Backtrace]). I was planning to clean this up; maybe we could merge these patches. |
Comment author: @alainfrisch Mark: great! Let's wait to see if there is any theoretical opposition against the proposal from other core developers. I really believe the feature deserves to be in the core system. |
Comment author: @diml I tried that too for Lwt once ;-) But it wasn't enough so I didn't proposed it. I'm also for having this. |
Comment author: @jhjourdan What I propose is the following : 1- Two functions set_backtrace_limit : option int -> unit and get_backtrace_limit : unit -> option int. set_backtrace_limit changes the mode of the backtrace recorder.
This allows the client code to easily implement the current_call_stack function itself, while allowing to record a full backtrace when an exception is raised for debugging purposes. 2- Exposing the backtrace internal type of the Printexc module, and the function convert_raw_backtrace : raw_backtrace -> backtrace option, in order to make the client code able to manipulate the backtrace programatically. I am not really in favor of 2), because what is recorded in the backtrace is not well specified. It has subtle interaction with the "-g" flag, with whether the code is executed in bytecode mode or in native mode... |
Comment author: @gasche So the way to implement current_call_stack would be to raise an exception and immediately catch it back, with a longer backtrace limit? I think that's a bit ugly, and besides you have no way to tell in your interface that you want the trace upto the beginning of the application (sure you can put max_int there, but urgh.). On the other hand, if you had a feature to just return the N last cells (or all upto the start if None is passed), you could also express the behavior of recording "beyond the exception handler" in user-code: just use this new feature in the "with" block, that by construction will see the same cell history as the "raise" it catched (I mean "get 10 more cells higher than my try" at raise point is equivalent to "get 10 cells" at the handler point). So while I agree that "get farther than my raise" and "get me some cells" are equivalently expressive, I think the latter is a less convoluted primitive to start with. |
Comment author: @alainfrisch
I agree that this is indeed more user-friendly. |
Comment author: @gasche Another point about the different interface: it allows to have the number of desired cell in the function call, instead of as a global mutable variable (which was necessary with the previous model, as "raise" cannot be parametrized). |
Comment author: @jhjourdan I understand that not using a global variable seems to be a good thing. However, for debugging, using a global variable is convenient, as it does not imply modifying the handler code. Concerning the fact that there is no way to ask for the whole trace, I am in favor of that : the user has to know that putting a big value here could be a performance problem in case of a big backtrace. |
Comment author: @alainfrisch
I don't known what kind of debugging you have in mind, but for many uses cases of the proposed feature ("get a description of the top N slots of the current call stack"), there won't even be any exception at all in the client code. |
Comment author: @jhjourdan
This is not a real problem: you can create one as in the proposed patch. I agree that for the use cases of the proposed feature, the problem does not change. However, I see an other use case : what I have in mind is that sometimes it is not enough to see the backtrace of an exception, and having some context on where it has been catched is usefull. Both approaches can help. It can be achieved with the global variable approach by modifying the global variable, and in the gasche's approach by modifying the handler's code. My point is that while debugging, modifying a global variable is maybe more convenient than modifying a handler : there might be several interesting handlers, you have to keep track where they are... By the way, I am thinking of an other feature that could be interesting : why not adding a record_and_raise primitive of the Printexc module, that raises an exception, recording its backtrace even if the backtrace flag is deactivated. That could be interesting, for example, when a performance critical code raises an exception signaling an error, while this code uses exceptions for control (so that you do not want to activate backtraces globally). What do you think ? |
Comment author: @alainfrisch
I'm not against the global variables (in Printexc) if you think it can simplify debugging existing code, but the standard library should also expose a nice API (maybe not in Printexc) to retrieve with top of the stack without a hack (changing a global value, raising a dummy exception, filtering the result, restoring the global value).
I don't think you can easily traverse intermediate exception handlers which re-raise the exception.
I don't think that in practice people will segregate their code into performance-critical modules (compiled without -g) and others (compiled with -g). Tweaking compilation flags on a file-by-file basis is not realistic for large code bases. The safe assumption is rather that code running production will be entirely compiled with -g and executed with stacktrace enabled, and then it makes more sense to provide a light "raise" for those cases where performance matters. |
Comment author: @jhjourdan I uploaded a patch to implement the feature. It brings the following advantages from the original one :
It is still needed to provide a way to programmatically inspect raw_bactrace inhabitants. On this question, I do not know what is the best thing to do : we can either make public the "backtrace" private type of the Printexc module, or provide an interface in the style of the alain's patch. |
Comment author: @gasche Call me superstitious, but I would feel better without the name change and corresponding asm-level renames. What about just keeping ..._backtrace and adding ..._backtrace_custom? Regarding programmatic inspection, I think that the "backtrace" type inside the Printexc module should remain abstract (to allow for easy change of implementation later), with a conversion function (that may be just the identity) to a public type, that may be either the current type or Alain's simpler interface. Of course, so far we aren't exposing any of the functions manipulating this middle-end type, so the distinction between this and a fully public type does not exist yet. |
Comment author: @jhjourdan New version : keeping the names of the ASM functions, added list_of_raw_backtrace. |
Comment author: @alainfrisch I believe the new [current_backtrace] function should rather be in Pervasives since it is not related to exceptions. |
Comment author: @gasche I'm worried about the (<> "printexc.ml") hack coming from Alain's original patch. What is that supposed to do (it would deserve a comment in the code at least), and won't it blow in our face in some scenarios? (You really want to trust the backtrace-reporting functions.) Other than that, I think the documentation for current_backtrace should mention the fact that it will collecth the [n] last frames, or less if less are available. I understand the idea of not favoring the "get me all traces" scenario (though frankly I'm still not fond of having to use a high constant in this case), but the documentation should at least make clear that it is possible and won't fail instead. Alain > moving current_backtrace to Pervasives may require a bit of work, as it currently depends an the abstract type defined in Printexc, and we can't have Pervasives depending on Printexc. Before Jacques-Henri gently does the work, are you sure this is desirable, and do you have more precise guidelines on what you want (eg. should we have raw_backtrace defined in Pervasives and re-exported in Printexc? Only in pervasives? What about an eventual exposed 'backtrace' type? Should 'record_backtrace' also move?). My gut feeling is that it's a bit perfectionist. |
Comment author: @alainfrisch
In the original form, it was supposed to remove the top stack slots corresponding to the fake exception handler used to trigger the stack capture. (IIRC, we needed to drop 1 or 2 slots depending on whether we were running bytecode vs. native).
Desirable, yes, important no.
If the two features share the same type, then yes, raw_backtrace could go to Pervasives. I don't think it is then required to re-export it from Printexc. Or maybe we could introduce a new
Same as raw_backtrace, I guess.
This is related to exceptions (even if not to printing them), so Printexc seems fine. An alternative could be to introduce a new stdlib module "Backtrace", where everything related to stack traces (even for exceptions) would go. Printexc would only keep the exception-printing functions (plus the one which exist in 4.00, marked as deprecated, for compatibility). But this might be overkill for a small feature, hence my suggestion to move the new things to Pervasives. But you should definitely check with Xavier! |
Comment author: @jhjourdan
I removed it. It is no longer necessary in this context. Concerning Printexc vs. Pervasives vs. Backtrace, I would be definitely in favor of putting it in a new Backtrace module, and this is not very difficult. But this is a rather more philosophical question. There is another perfectionists-kind problem : now, the printer of backtraces is shared between exception backtraces and this new kind of backtraces. This causes the first line of the backtrace to be : "Raised by primitive operation at ...." instead of "Called at ..." I do not know what kind of "primitive operation" can cause this message appear in the case of an exception (array accesses ? match failures ? static raises ? ... ?), but I wonder whether it is possible to "re-raise by primitive operation", in which case the message won't be displayed correctly anyway... |
Comment author: @gasche Xavier remarks that a Backtrace module would risk conflicting with userspace modules of the same name (that exist in some dark place with probability 1). Alain > none of Jacques-Henri's recent raw_backtrace work has been released yet; as far as I know, there was no change in Printexc since 3.12, so we have no deprecation to worry about: the raw_backtrace stuff can be moved, and even renamed, if deemed necessary. After some more reflection, I find it weird that list_of_raw_backtrace may lose information if called on a trace recorded from an exception: as the re-raise information is ignored, it is only an injection if it is only used on traces produced by current_backtrace. I think we should either have two types (error_trace and call_trace) and two printing functions, or expose a representation that can fully handle both situations (eg. the one currently used in printexc). A conceptual test for any API design would be to check that the printing function(s) can be reimplemented from the user side, using only the exposed program-friendly representation. |
Comment author: @alainfrisch
I know, but there was already some backtrace-oriented functions in Printexc in 4.00 (like record_backtrace).
Ah, I guess we desperately need namespaces, then. We could use OCaml_backtrace for now and hope that this namespace system will allow it to be re-mapped to just "Backtrace" in client code :-) Or just put everything in Pervasives. Note that technically, any addition to the stdlib could break existing code (with probability < 1, I agree). |
Comment author: @jhjourdan So, what is the conclusion of this discussion ? Should I create a new module of the stdlib, called OCaml_backtrace, that contains all the functions related to backtraces, and depreciate existing ones ? I would rather not adding this to Pervasives, as it is expected to be used only by some specialized parts of code. |
Comment author: @alainfrisch
I don't know. After all, keeping everything in Printexc -- even if illogical -- might not be so bad. |
Comment author: @gasche I agree that Printexc is a good-enough choice. One more reason is that the security-minded people (eg. LaFoSec study) will want to disable this function as well as the others in Printexc. |
Comment author: @gasche Progress note: I'm currently reviewing the patch and changing/discussing it with Jacques-Henri. |
Comment author: @gasche We came up with a new patch proposal that is eager to get external reviews. (1) patch caml_stash_current_backtrace.diff implements the runtime (2) patch printexc_get_callstack.diff implements the user-visible val current_backtrace: int -> raw_backtrace val list_of_raw_backtrace: raw_backtrace -> (string * int * int * int) list the current patch instead exposes the following interface: val get_callstack: int -> raw_backtrace type loc_info = val loc_info_of_raw_backtrace : raw_backtrace -> loc_info array option val loc_info_to_string : int -> loc_info -> string The type "loc_info" is precisely the one used in the Printexc Note that for both interfaces, the user is forced to give a maximal In the case of both (1) and (2), I would also welcome feedback on the |
Comment author: @dbuenzli Naming.
|
Comment author: @gasche That is sound advice, thanks. However, the naming of loc_info was not of my choosing: contrarily to the rest of the exposed interface that is new, this type is what was used internally in the Printexc implementation. I'll have to check that it is ok to rename it (may make life harder for people with on-top-of-the-compiler patches, etc.), and maybe decide not to change it after all. You have probably noticed that loc_info_of_raw_backtrace is partial (may fail when no debug information is available); since I submitted the patch here, I added an explicit mention of it in the .mli documentation. |
Comment author: @alainfrisch
I'd propose something more self-documented and more extensible. What about: type loc_info = and loc = ? |
Comment author: @gasche My hope was to get the time to develop a more complete solution, but I haven't get to do it yet (and will be away for holidays for most of the time between now and beginning of August). If you or Jacques-Henri wants to tackle something, that would be fine. Otherwise exposing this function is already a good first step for 4.01, and we can discuss additional features for trunk later. I think a nice interface would have been along the lines of: type call_kind = Raise | Call type backtrace = (call_kind * location) array with a C-side variant of caml_convert_raw_backtrace that would return a "callstack" directly (I looked, and most of the code would actually be shared in the extract_loc_info function, so that can be implemented rather painlessly). |
Comment author: @alainfrisch Ok, I'll just expose get_callstack in Printexc (so that it is available in 4.01). This is much better than nothing. |
Comment author: @alainfrisch Printexc.get_callstack is now available (commit 13886 on trunk, 13884/13885 on 4.01). |
Comment author: @dbuenzli Just noting that I'm currently parsing the result of raw_backtrace_to_string and while I thank for having the get_callstack function, I'm slightly annoyed. |
Comment author: @alainfrisch It will also be useful to implement generic operation (hash, compare) for the raw_backtrace datatype. An interesting use of get_callstack is to instrument code to collect statistics such as the "most common" call sites for a given function; being able to compare opaque raw_backtrace would make this quite efficient. |
Comment author: @jhjourdan I come with a new patch to include those remarks. There are several changes:
val convert_raw_backtrace_slot: raw_backtrace_slot -> backtrace_slot Rather than returning an option, it raises Failure when it is not possible to get the debugging information. It seems more idiomatic, especially because the exceptional case cannot appear only for a part of the executable.
|
Comment author: @dbuenzli I'm not sure I understand the rationale behind having both raw_backtrace_slot and backtrace_slot and exposing the structure of raw_backtraces. I'd rather have only backtrace_slot and a function: slots_of_raw_backtrace : raw_backtrace -> backtrace_slot list |
Comment author: @gasche The design is constrained by many different use cases. Jacques-Henri uses raw slots for statistical profiling; during profiling, you do not need to inspect OCaml-friendly information, just to store traces away as fast as possible -- and turn them into non-raw slots or string only later when presenting results. Having access to slots instead of just the raw trace is useful to share common suffixes of different traces. You're interested in programmer-friendly access to trace locations. The current design makes this possible, but maybe not as friendly as one could hope (eg. get a record with well-named fields, as in asmcomp/debuginfo.mli http://github.com/ocaml/ocaml/blob/trunk/asmcomp/debuginfo.mli ), to avoid an additional data conversion pass -- or making the change more invasive. |
Comment author: @dbuenzli Ok didn't get this was being used for other things. No need for a record for me, the access is sufficiently programmer-friendly through pattern matching, especially if this means nice profiling tools in the future... I was just concerned that exposing the raw_backtrace structure was a needless interface commitment. |
Comment author: @alainfrisch I did not look at the implementation, but I'm ok with the proposed API. |
Comment author: @alainfrisch Concerning the part which addresses #6302 (not reloading debug events every time in bytecode), I'm wondering if it wouldn't be worth copying the data into a native C data structure. It's not the RAM usage which worries me, but the fact that we keep a rather big data structure in the OCaml heap (which means some overhead for every major GC). |
Comment author: @jhjourdan Yet another patch, addressing Alain's comment. It is also more efficient, as the events are stored in ram in increasing pc order, so that a binary search can be performed when converting backtrace slots. |
Comment author: @alainfrisch This feature is quite useful, and it would be sad to miss the 4.02 feature freeze because of the API design. I suggest to include the latest patch and mark the API as experimental in the documentation. |
Comment author: @jhjourdan I attach a new version of the patch : it removes the backtrace_slot type, and provide getters and setters over the raw_backtrace type, so that the payload is extensible. I do not expect the conversion to take a long time, as it is mainly a context switch to C code, some bits twiddling and a small allocation. I also documented some functions left undocumented in the previous release. Does the API convince you ? |
Comment author: @alainfrisch I'm ok with this latest API. It will always be possible to add in the future a function which returns all the relevant information together if performance concerns justify it. I'm in favor of pushing the patch to trunk (you might want to wait a few hours to see if someone on caml-devel objects to it). |
Comment author: @dbuenzli I personally dislike the use of
|
Comment author: @jhjourdan I added has_location. |
Comment author: @dbuenzli Thanks ! Another thing. Is there any boolean to check to detect if Failure will be raised ? If that is the case the functions could raise Invalid_argument whenever that boolean is set to a given value, rather than Failure (that's a rather personal point of view, use only the Invalid_argument exception for denoting programming errors, here the error being calling these function when the flag is set to a given value). This boolean could also be used at early program startup to indicate that locations will not be reported rather than having to try it to discover that this will not be the case. |
Comment author: @jhjourdan Basically, Failure is raised when it has not been possible to read the debugging information from a bytecode executable. This can happen for several reasons (not linked with -g, executable file not found, out of memory...). However, this cannot happen when this info has been loaded. That is, if one of these calls succeed, none will fail in the future. I can provide a function to load debugging information before using it, guaranteeing it will not raise Failure in the future. We could choose not to report this error and raise Not_found (or even Invalid_argument) instead, but the error message gives some information about the actual cause of the error. I do not know what is the best thing to do... |
Comment author: @dbuenzli Thought a little bit about alternatives, but maybe it's alright the way it is now then. |
Comment author: @gasche Merged in trunk, with a small variant of Jacques-Henri's API (no more exceptions, the raw_backtrace type is kept private). |
git-svn-id: http://caml.inria.fr/svn/ocaml/version/4.01@13884 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
git-svn-id: http://caml.inria.fr/svn/ocaml/version/4.01@13885 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
Original bug ID: 5899
Reporter: @alainfrisch
Assigned to: @jhjourdan
Status: closed (set by @xavierleroy on 2015-12-11T18:26:52Z)
Resolution: fixed
Priority: normal
Severity: feature
Target version: 4.02.0+dev
Category: standard library
Tags: patch
Related to: #5935 #6302
Monitored by: @gasche mfp @diml @ygrek @jmeber @hcarty @dbuenzli gildor @yakobowski
Bug description
The machinery required to support exception traces actually makes it possible to return, at a given point in the program, (a fragment of) the call stack of the current context. This can be useful on its own. I can think of the following uses:
debug;
log the context of an error, where it is detected, but without failing;
run the code with exception traces disabled, for performance reasons, but manually extract the stack and put it as an argument of some specific exception, when it is raised;
instrument the code to perform context-sensitive profiling (how many times is this function called and from which call sites).
I attach a patch which concretely reuses the mechanism for exception traces. With more effort, it would be possible to do something cleaner. And there is no reason to put the new feature in Printexc, since it is not related to exceptions at all.
File attachments
The text was updated successfully, but these errors were encountered: