In error messages, source highlighting does not work for long toplevel phrases #7925

vicuna · 2019-02-20T08:52:25Z

Original bug ID: 7925
Reporter: @Armael
Status: confirmed (set by @gasche on 2019-02-20T15:37:04Z)
Resolution: open
Priority: normal
Severity: minor
Version: 4.09.0+dev
Category: toplevel
Tags: junior_job
Monitored by: @nojb

Bug description

When reporting an error, the toplevel (and since 4.08 also the compiler) highlights the line(s) of the source corresponding to the error:

# let x = 1 + "abc" in ();;
Line 1, characters 12-17:
1 | let x = 1 + "abc" in ();;
                ^^^^^
Error: This expression has type string but an expression was expected of type
         int

However, in the toplevel, if the toplevel phrase is long enough (and the error is near the beginning), the code highlight feature stops working:

# let x = 1 + "abc" in
let x = 1 in
[...] (same x75 times)
let x = 1 in ();;
Line 1, characters 12-17:
Error: This expression has type string but an expression was expected of type
         int

I have seen this issue in a teaching situation (in combination with other issues that have now been fixed in 4.08/trunk),
here students were using emacs+tuareg and using M-x tuareg-eval-buffer to evaluate their 1000-lines long code.

How to fix

The "code highlight" feature for the toplevel is implemented by directly peeking into the lexer's buffer. This stops working if the lexbuf has been refilled in the meantime, which happens if the input is long enough.
A fix would probably to instead have the toplevel store the complete current toplevel phrase in a Buffer.t, and have the error reporting function read into that instead of the lexing buffer.

Steps to reproduce

Paste the attached source snippet into the toplevel.

File attachments

longphrase.ml

The text was updated successfully, but these errors were encountered:

vicuna · 2019-03-01T07:21:19Z

Comment author: kaaira

can i please work on this issue?

mylekiller · 2019-04-01T17:31:07Z

Is there anyone currently working on this bug? If not I would like to try it as a first contribution to the project. Any help in getting started would be greatly appreciated.

gasche · 2019-04-02T06:40:49Z

User "kaaira" mentioned interest last month, but we haven't heard from them since, so I would assume that no one is working on this bug.

On how to get started: I see two approaches

We could just extend the size of the lexing buffer beyond what a user would reasonably send to their toplevel. Currently the size is set in stdlib.ml/lexing.ml in from_function, and is hardcoded to 1024 (bytes). We could bump this to, say, 2^17 (131072), if we decided that the increase in memory usage is reasonable; or we could make this value an optional parameter of the function, and only bump it for the one call that initializes the toplevel lexing buffer, in toplevel/toploop.ml:loop.
We could follow the discipline suggested by @Armael of storing the whole toplevel phrase in a buffer exposed as a new global value by toplevel/toploop. Then in the part of parsing/location.ml that highlights code sections, when we detect that the input name is "//toplevel//", we can go look in that phrase buffer instead of the lexing buffer. This is a cleaner solution, but it is a bit more delicate to implement because you have to understand the phrase-parsing logic in toploop (where phrases start and stop) to reset the phrase buffer at the right time. I think that you should add content to this buffer in toploop.ml:refill_lexbuf and reset it where Lexing.flush_input is called.

Note: To understand how toplevel parsing work, in addition to toplevel/toploop.ml (and maybe stdlib/lexing.ml) you may want to have a look at parsing/parse.ml, the skip_phrase business.

mylekiller · 2019-04-02T15:34:18Z

Thank you so much for your help! I will get started on this.

mylekiller · 2019-04-11T06:29:12Z

While working on this issue, I came across something that I need to be clarified. Line 355 of parsing/location.ml reads if !lines >= Terminfo.num_lines stdout - 2 then raise Exit;. This is in the highlighting code and it seems to me that what this is doing is that regardless of how big the buffer storing the phrase is, the code will bail out if the phrase exceeds a certain number of lines. Is my thinking on this issue correct and if so how would I go about fixing the code so that the cursor can be moved up an arbitrary amount of lines and doesn't bail out for expressions with exceptionally long lines?

Armael · 2019-04-11T06:55:40Z

Ah, indeed. I think my original bug report is missing some information, and I'm sorry about that.

The interactive toplevel has two different ways of "highlighting" an error. In the "fancy" mode, it uses ANSI codes to go up in the terminal and underline the part of the input phrase corresponding to the location. This is the mode used by default (provided your terminal supports this but it is generally the case). The function responsible for this is highlight_terminfo as you have seen.

Alternatively (typically, because it is running in emacs through M-x eval-buffer, or if you run the toplevel with TERM=dumb in the environment), the toplevel can fall back to "highlighting" the location by printing it again (just like the normal compiler output).

Regarding this bug report, I was originally concerned only by the "dumb" highlighting mode. So I think it would already be nice to get this one fixed.
After this is done and if you feel motivated, you could try investigating if something could be done for the "fancy" mode as well. Maybe it could fallback to the "dumb" mode if the error appears outside of the terminal screen -- or maybe underlining text outside of the terminal screen is okay (the user would just have to scroll up to see it), and the test if !lines >= Terminfo.num_lines ... could simply be removed.

mylekiller · 2019-04-11T06:59:11Z

Ahhh thank you so much for the clarification that helps out a lot!

…s for long inputs to the toplevel

Fix #7925: error messages for long toplevel inputs would have dummy locations

vicuna added toplevel newcomer-job bug labels Mar 14, 2019

mylekiller added a commit to mylekiller/ocaml that referenced this issue Apr 11, 2019

This commit Fixes Issue ocaml#7925 which has to do with error message…

25cb0b6

…s for long inputs to the toplevel

mylekiller mentioned this issue Apr 11, 2019

Fix source highlighting for long toplevel phrases (#7925) #8611

Merged

Armael closed this as completed in #8611 Sep 11, 2019

Armael pushed a commit that referenced this issue Sep 11, 2019

Fix source highlighting for long toplevel phrases (#7925) (#8611)

03ddf29

Fix #7925: error messages for long toplevel inputs would have dummy locations

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

In error messages, source highlighting does not work for long toplevel phrases #7925

In error messages, source highlighting does not work for long toplevel phrases #7925

vicuna commented Feb 20, 2019

vicuna commented Mar 1, 2019

mylekiller commented Apr 1, 2019

gasche commented Apr 2, 2019 •

edited

mylekiller commented Apr 2, 2019

mylekiller commented Apr 11, 2019

Armael commented Apr 11, 2019

mylekiller commented Apr 11, 2019

In error messages, source highlighting does not work for long toplevel phrases #7925

In error messages, source highlighting does not work for long toplevel phrases #7925

Comments

vicuna commented Feb 20, 2019

Bug description

How to fix

Steps to reproduce

File attachments

vicuna commented Mar 1, 2019

mylekiller commented Apr 1, 2019

gasche commented Apr 2, 2019 • edited

mylekiller commented Apr 2, 2019

mylekiller commented Apr 11, 2019

Armael commented Apr 11, 2019

mylekiller commented Apr 11, 2019

gasche commented Apr 2, 2019 •

edited