Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In error messages, source highlighting does not work for long toplevel phrases #7925

Closed
vicuna opened this issue Feb 20, 2019 · 7 comments · Fixed by #8611
Closed

In error messages, source highlighting does not work for long toplevel phrases #7925

vicuna opened this issue Feb 20, 2019 · 7 comments · Fixed by #8611

Comments

@vicuna
Copy link

vicuna commented Feb 20, 2019

Original bug ID: 7925
Reporter: @Armael
Status: confirmed (set by @gasche on 2019-02-20T15:37:04Z)
Resolution: open
Priority: normal
Severity: minor
Version: 4.09.0+dev
Category: toplevel
Tags: junior_job
Monitored by: @nojb

Bug description

When reporting an error, the toplevel (and since 4.08 also the compiler) highlights the line(s) of the source corresponding to the error:

# let x = 1 + "abc" in ();;
Line 1, characters 12-17:
1 | let x = 1 + "abc" in ();;
                ^^^^^
Error: This expression has type string but an expression was expected of type
         int

However, in the toplevel, if the toplevel phrase is long enough (and the error is near the beginning), the code highlight feature stops working:

# let x = 1 + "abc" in
let x = 1 in
[...] (same x75 times)
let x = 1 in ();;
Line 1, characters 12-17:
Error: This expression has type string but an expression was expected of type
         int

I have seen this issue in a teaching situation (in combination with other issues that have now been fixed in 4.08/trunk),
here students were using emacs+tuareg and using M-x tuareg-eval-buffer to evaluate their 1000-lines long code.

How to fix

The "code highlight" feature for the toplevel is implemented by directly peeking into the lexer's buffer. This stops working if the lexbuf has been refilled in the meantime, which happens if the input is long enough.
A fix would probably to instead have the toplevel store the complete current toplevel phrase in a Buffer.t, and have the error reporting function read into that instead of the lexing buffer.

Steps to reproduce

Paste the attached source snippet into the toplevel.

File attachments

@vicuna
Copy link
Author

vicuna commented Mar 1, 2019

Comment author: kaaira

can i please work on this issue?

@mylekiller
Copy link
Contributor

Is there anyone currently working on this bug? If not I would like to try it as a first contribution to the project. Any help in getting started would be greatly appreciated.

@gasche
Copy link
Member

gasche commented Apr 2, 2019

User "kaaira" mentioned interest last month, but we haven't heard from them since, so I would assume that no one is working on this bug.

On how to get started: I see two approaches

  • We could just extend the size of the lexing buffer beyond what a user would reasonably send to their toplevel. Currently the size is set in stdlib.ml/lexing.ml in from_function, and is hardcoded to 1024 (bytes). We could bump this to, say, 2^17 (131072), if we decided that the increase in memory usage is reasonable; or we could make this value an optional parameter of the function, and only bump it for the one call that initializes the toplevel lexing buffer, in toplevel/toploop.ml:loop.

  • We could follow the discipline suggested by @Armael of storing the whole toplevel phrase in a buffer exposed as a new global value by toplevel/toploop. Then in the part of parsing/location.ml that highlights code sections, when we detect that the input name is "//toplevel//", we can go look in that phrase buffer instead of the lexing buffer. This is a cleaner solution, but it is a bit more delicate to implement because you have to understand the phrase-parsing logic in toploop (where phrases start and stop) to reset the phrase buffer at the right time. I think that you should add content to this buffer in toploop.ml:refill_lexbuf and reset it where Lexing.flush_input is called.

Note: To understand how toplevel parsing work, in addition to toplevel/toploop.ml (and maybe stdlib/lexing.ml) you may want to have a look at parsing/parse.ml, the skip_phrase business.

@mylekiller
Copy link
Contributor

Thank you so much for your help! I will get started on this.

@mylekiller
Copy link
Contributor

While working on this issue, I came across something that I need to be clarified. Line 355 of parsing/location.ml reads if !lines >= Terminfo.num_lines stdout - 2 then raise Exit;. This is in the highlighting code and it seems to me that what this is doing is that regardless of how big the buffer storing the phrase is, the code will bail out if the phrase exceeds a certain number of lines. Is my thinking on this issue correct and if so how would I go about fixing the code so that the cursor can be moved up an arbitrary amount of lines and doesn't bail out for expressions with exceptionally long lines?

@Armael
Copy link
Member

Armael commented Apr 11, 2019

Ah, indeed. I think my original bug report is missing some information, and I'm sorry about that.

The interactive toplevel has two different ways of "highlighting" an error. In the "fancy" mode, it uses ANSI codes to go up in the terminal and underline the part of the input phrase corresponding to the location. This is the mode used by default (provided your terminal supports this but it is generally the case). The function responsible for this is highlight_terminfo as you have seen.

Alternatively (typically, because it is running in emacs through M-x eval-buffer, or if you run the toplevel with TERM=dumb in the environment), the toplevel can fall back to "highlighting" the location by printing it again (just like the normal compiler output).

Regarding this bug report, I was originally concerned only by the "dumb" highlighting mode. So I think it would already be nice to get this one fixed.
After this is done and if you feel motivated, you could try investigating if something could be done for the "fancy" mode as well. Maybe it could fallback to the "dumb" mode if the error appears outside of the terminal screen -- or maybe underlining text outside of the terminal screen is okay (the user would just have to scroll up to see it), and the test if !lines >= Terminfo.num_lines ... could simply be removed.

@mylekiller
Copy link
Contributor

Ahhh thank you so much for the clarification that helps out a lot!

mylekiller added a commit to mylekiller/ocaml that referenced this issue Apr 11, 2019
Armael pushed a commit that referenced this issue Sep 11, 2019
Fix #7925: error messages for long toplevel inputs would have dummy locations
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants