Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006468OCamlOCaml generalpublic2014-06-24 14:242015-01-25 21:59
Reporterwhitequark 
Assigned Towhitequark 
PrioritynormalSeverityfeatureReproducibilityalways
StatusassignedResolutionopen 
PlatformOSOS Version
Product Version4.02.0+beta1 / +rc1 
Target VersionFixed in Version 
Summary0006468: Backtraces in toplevel
DescriptionI just realized there appears to be no way to get backtraces in toplevel. This makes it much less useful. I see there is a patch for 3.11; what's holding its integration?

https://github.com/jaked/ocaml-backtrace-patch/blob/master/patch-3.11.2 [^]
Tagspatch
Attached Filespatch file icon backtrace.patch [^] (50,838 bytes) 2015-01-25 21:04 [Show Content]

- Relationships
has duplicate 0006317confirmed ocamldebug should be able to work with Dynlink code 
related to 0006760assignedwhitequark Closure marshalling in toplevel is broken since at least 4.00 

-  Notes
(0012897)
whitequark (developer)
2014-12-20 13:03
edited on: 2014-12-20 13:07

Reminder sent to: gasche

@gasche, I've reworked the patch to apply on trunk.

$ ocaml
        OCaml version 4.03.0+dev5-2014-10-15

# let f () = raise Not_found;;
val f : unit -> 'a = <fun>
# let g () = f (); 1;;
val g : unit -> int = <fun>
# g ();;
Exception: Not_found.
Raised at file "//toplevel//", line 1, characters 17-26
Called from file "//toplevel//", line 1, characters 11-15
#

The patch is somewhat complex, so I will explain in detail how it works. Logically, it consists out of four parts:

  * First, the runtime representation of the debug information in backtrace.c is changed, so that new chunks of it, corresponding to newly loaded (and unloaded) code can be added and removed.
  * Second, the startup process is changed so that the debug information, if present in the executable, is loaded at startup and not lazily. This affects startup.c and io.c.
  * Third, the bytecode compiler and the toplevel are changed so as to generate and register the debug information for toplevel phrases. This affects bytecomp/emitcode.ml, bytecomp/meta.ml, toplevel/toploop.ml and toplevel/topdirs.ml. This also requires a bootstrap, as new primitives were added.
  * Fourth, the Dynlink module was extended to also register debug information. This affects otherlibs/dynlink/dynlink.ml.

The changes to the runtime representation of the debug information are as follows. In trunk, debug information is represented by an array of events, represented as C structures. In the original patch, it's a list of event lists, represented as OCaml structures. I implemented a middle ground between these approaches: there is a global linked list of all registered debug information, but every entry is a custom block containing a C structure. This allows to keep the fast backtrace extraction code that uses binary search intact.

The change to the startup process is necessary, as caml_stash_backtrace walks the stack by looking at every successive value and checking whether it points into a known range of bytecode. Thus, either the debug information could be loaded at startup, or the code would have to handle the "main" bytecode range specially. I think the former is cleaner.

The original patch also exported a variant of caml_input_val function that allows to allocate the space for the blocks from the old generation right away. I'm not sure why was this done--I don't see a reason for it--and the compiler can bootstrap itself just fine without it when built with -g (i.e. stressing the relevant code), so I assume it's not necessary.

The rest of the changes are straightforward.

(0012964)
frisch (developer)
2014-12-23 16:12

FWIW, it's only a recent addition that debug info is loaded only once in bytecode. Previously, it was loaded everytime it was requested. And this was really slow: on one big application, the order of magnitude was around 1s to load the debug info. Now that this is done at most once, this is fine. But I'm wondering about doing it non-lazily, since one would have to pay this price at startup, which might be sub-optimal e.g. for command-line applications.
(0012965)
whitequark (developer)
2014-12-23 16:22

I had a variant of patch which did it lazily (when Printexc.record_backtrace true was called), but it added quite a bit of complexity for unclear benefit. It also had awkward semantics in presence of dynlinking.

What about not compiling command-line applications with debug info when this is critical?
(0012966)
frisch (developer)
2014-12-23 16:26

> What about not compiling command-line applications with debug info when this is critical?

I think we always want backtraces to get useful information in case of unexpected errors. I can imagine large application with short-lived processes for which raw performance doesn't really matter but for which a delay on startup will be noticed.
(0012968)
whitequark (developer)
2014-12-23 16:29

But this wouldn't work for *unexpected* errors. If the error is unexpected, then the backtrace will not be computed at all for absence of debugging information, and if you always enable backtraces via Printexc or read the debugging information on first raised exception, the same delay will be present.
(0012969)
gasche (developer)
2014-12-23 16:57

I think the style Alain has in mind is to use raise_notraice if and only if he can prove that the exception will be caught, and always enable debug information. There are two cases of "unexpected errors", the ones where you didn't use _notrace (typically out-of-bounds error), which get traces, and the ones where the proof was or became wrong -- then you're out of luck as you say.
(0012971)
whitequark (developer)
2014-12-23 17:08

@gasche, actually 6317 is not a complete duplicate. The reason ocamldebug can't set breakpoints on dynlinked and otherwise loaded code is that the runtime keeps a duplicate array of all bytecode so that the byte that has been in the place of breakpoint could be restored. Of course, this array does not cover dynlinked code.
(0012972)
gasche (developer)
2014-12-23 17:13

Does it seem unreasonable to ask for a patch that also solves PR#6317, then?
(0012973)
whitequark (developer)
2014-12-23 17:15

It would be unrelated to this issue. The way PR#6317 should be solved is to move the tracking of replaced bytecode to ocamldebug itself; it is not the runtime's job, and it is even easier to implement in ocamldebug.
(0012974)
gasche (developer)
2014-12-23 17:50

I only did a very quick review of the patch so far (I'm not yet in holidays!):

1) the linked-list-traversal logic is duplicated many times; if I understand correctly, for any program counter there is at most one chunk of debuginfo that matches it. You could thus encapsulate almost all the linked-list logic in a single function

  struct debug_info *caml_find_debuginfo(code_t pc)

that returns a debuginfo di such that (di->start <= pc && pc < di->end), or NULL if none exists.

2) The user-interface of control from the toplevel is a bit unclear to me. If I understand correctly:
- the toplevel always consider "-debug" is passed by default (why?)
- but there is a toplevel directive ("#debug true;;", "#debug false;;") to set it explicitly (when do we expect it to be used?)

3) The first diff chunk after
  @@ -200,7 +210,14 @@ let print_out_exception ppf exn outv =
hurts my OCaml sensibility: there should really be a begin..end bracketing between the "then" and the "match".
(0012977)
whitequark (developer)
2014-12-23 18:05

re 2: yes, #debug is probably not useful.

Regarding -g (Clflags.debug) always being enabled: I can't imagine a reason someone would want to explicitly disable generation of debug information in toplevel, and the performance hit is certainly insignificant here. Do you have other opinion?
(0012987)
gasche (developer)
2014-12-25 22:29

I thought of the following approach to re-enable lazy loading: in caml_add_debug_info, store a dummy description of the debug_info chunk in the global list, with just the start and stop position available (events is NULL). In the caml_find_debuginfo function I suggested above, force the actual computation of the debuginfo information on the chunk that corresponds to the program counter.

This would mean that reading the debuginfo in the file is done eagerly, but the rest is done lazily (there is a "relocation" step that is currently done eagerly and could be delayed or done completely on the fly). We could think of even delaying reading the serialized debuginfo by changing the storage format to serialize the stop position in addition to the start position.

Is that the approach you had tried and that was too complex? I haven't implemented it so it's a game of guessing, but I don't think it would make the code too difficult to follow -- the idea is that the structure of the chunk list does not change when the debuginfo is forced.
(0012988)
whitequark (developer)
2014-12-25 22:30

I was thinking of hardcoding a comparison with caml_start_code. I like your suggestion much better and will implement it.
(0012999)
whitequark (developer)
2014-12-29 05:37

I've updated the patch:

  * #debug toplevel directive was removed;
  * toplevel now sets Clflags.debug earlier, so that $ ocaml t.ml also returns proper backtraces, ditto for ocaml init;
  * find_debuginfo was factored out;
  * debug information from the main executable is loaded lazily; debug information from elsewhere is still loaded eagerly, as I do not anticipate this to cause a slowdown.
(0013004)
gasche (developer)
2015-01-03 20:19

(Sorry for the long time before feedback. The happy ignorance of vacation. I probably won't be able to do much in the following month for other reasons.)

I like the general shape of the patch, but here are a few nitpicking comments.

There is a comment in bytecomp/emitcode.mli that is made incorrect by your patch.

I think "lazy_loaded" is a bad name choice (I am not able to tell from the name alone whether a 1 value suggests the lazy loading has already happened.) I would suggest just "loaded", or "already_read" (it's strange that the function using to make a debuginfo "loaded" would be "read_debug_info"; "read" is also reasonable but ambiguous).

There is a consistency problem with the name "debuginfo" that I unfortunately suggested (it is more natural to me): the existing code uses "debug_info" all over the place. I think you should use "debug_info" as well (in find_debug_info), and maybe in the Debug_info_val macro as well (one may or may not like the fact that the two underscores have a different status here, but that's already used in Unsigned_long_val for example).

Finally, I think you could change event_for_location so that you use an early exit (if (di == NULL)\n\treturn NULL;) rather than a long conditional. This removes the "short case last" issue, but more importantly it allows to go back to the indentation level of the previous implementation, which should make for a more readable diff.
(0013010)
whitequark (developer)
2015-01-06 01:49

Addressed.
(0013174)
gasche (developer)
2015-01-24 18:11

Applying the patch to the current trunk and running the testsuite gives me one test failure in testsuite/tests/typing-objects. The failure message is rather worrying:

$ diff Tests.ml.{result,reference}
303,304c303,304
< # Exception: Invalid_argument "output_value: abstract value (outside heap)".
< # Exception: Invalid_argument "output_value: abstract value (outside heap)".
---
> # - : int * int * int * int * int = (6, 7, 8, 33, 33)
> # - : int * int * int * int * int * int * int = (9, 10, 10, 11, 11, 33, 33)

This might be related to other stuff that happened in trunk recently rather than this patch, or maybe a bootstrap error on my part. Can you reproduce the failure on your machine?
(0013175)
gasche (developer)
2015-01-25 14:09

I can reliably reproduce this failure -- it seems to be a problem of the patch.

Minimized test case, run in the toplevel:

(good) with 4.02.1 or trunk before the patch:

# (Marshal.from_string (Marshal.to_string (object method m = 33 end) [Marshal.Closures]) 0);;
- : 'a = <poly>

(bad) with the patch:

# Marshal.from_string (Marshal.to_string (object method m = 33 end) [Marshal.Closures]) 0;;
Exception: Invalid_argument "output_value: abstract value (outside heap)".
(0013176)
whitequark (developer)
2015-01-25 14:20

Ok. I know why the failure happens and will fix soon.
(0013180)
whitequark (developer)
2015-01-25 19:44

After investigation, I admit that I did not, in fact, know the cause of the failure. What puzzles me is the fact that the following code:

    Marshal.to_string (fun () -> ()) [Marshal.Closures];;

fails in toplevel with the same message since 4.00.1. My guess is that the following code:

    Marshal.to_string (object method m = 33 end) [Marshal.Closures]

triggers the same bug, but only after my patch exposes it somehow.

Since the code you highlight works with both ocamlc and ocamlc -g, but reliably fails in toplevel, I'm going to say that I happened to expose a latent bug in extern.c. I'm not sure which one though, and why is it failing intermittently.
(0013182)
whitequark (developer)
2015-01-25 21:04

After applying the patch for 6760 and adjusting the test outputs for the backtraces that appear, all tests pass.
(0013185)
gasche (developer)
2015-01-25 21:50

I find it disturbing that your patch would provoke a change in difference between closures and object marshalling, and would like to understand better what is happening here. I'll try to get opinions from developpers that actually know about these things (here and PR#6760) before considering merge.
(0013186)
whitequark (developer)
2015-01-25 21:59

Yes, I would like to understand that better too. I think there might be more bugs hidden in code that is triggered by object serialization.

- Issue History
Date Modified Username Field Change
2014-06-24 14:24 whitequark New Issue
2014-07-16 10:21 doligez Severity minor => feature
2014-07-16 10:21 doligez Status new => acknowledged
2014-07-16 10:21 doligez Tag Attached: patch
2014-12-20 12:35 whitequark Assigned To => whitequark
2014-12-20 12:35 whitequark Status acknowledged => assigned
2014-12-20 12:35 whitequark File Added: backtrace.patch
2014-12-20 13:03 whitequark Note Added: 0012897
2014-12-20 13:03 whitequark File Deleted: backtrace.patch
2014-12-20 13:04 whitequark File Added: backtrace.patch
2014-12-20 13:06 whitequark File Deleted: backtrace.patch
2014-12-20 13:06 whitequark File Added: backtrace.patch
2014-12-20 13:07 whitequark Note Edited: 0012897 View Revisions
2014-12-23 16:12 frisch Note Added: 0012964
2014-12-23 16:22 whitequark Note Added: 0012965
2014-12-23 16:26 frisch Note Added: 0012966
2014-12-23 16:29 whitequark Note Added: 0012968
2014-12-23 16:57 gasche Note Added: 0012969
2014-12-23 17:02 gasche Relationship added related to 0006137
2014-12-23 17:03 gasche Note Added: 0012970
2014-12-23 17:04 gasche Note Deleted: 0012970
2014-12-23 17:05 gasche Relationship deleted related to 0006137
2014-12-23 17:05 gasche Relationship added has duplicate 0006317
2014-12-23 17:08 whitequark Note Added: 0012971
2014-12-23 17:13 gasche Note Added: 0012972
2014-12-23 17:15 whitequark Note Added: 0012973
2014-12-23 17:50 gasche Note Added: 0012974
2014-12-23 18:05 whitequark Note Added: 0012977
2014-12-25 22:29 gasche Note Added: 0012987
2014-12-25 22:30 whitequark Note Added: 0012988
2014-12-29 05:34 whitequark File Deleted: backtrace.patch
2014-12-29 05:34 whitequark File Added: backtraces.patch
2014-12-29 05:37 whitequark Note Added: 0012999
2015-01-03 20:19 gasche Note Added: 0013004
2015-01-06 01:49 whitequark File Deleted: backtraces.patch
2015-01-06 01:49 whitequark File Added: backtrace.patch
2015-01-06 01:49 whitequark Note Added: 0013010
2015-01-24 18:11 gasche Note Added: 0013174
2015-01-25 14:09 gasche Note Added: 0013175
2015-01-25 14:20 whitequark Note Added: 0013176
2015-01-25 17:28 gasche Note Added: 0013178
2015-01-25 17:28 gasche Note Edited: 0013178 View Revisions
2015-01-25 17:28 gasche Note Deleted: 0013178
2015-01-25 19:44 whitequark Note Added: 0013180
2015-01-25 20:14 whitequark Relationship added related to 0006760
2015-01-25 21:04 whitequark File Deleted: backtrace.patch
2015-01-25 21:04 whitequark File Added: backtrace.patch
2015-01-25 21:04 whitequark Note Added: 0013182
2015-01-25 21:50 gasche Note Added: 0013185
2015-01-25 21:59 whitequark Note Added: 0013186


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker