OCaml parser errors are not meaningful most of the time #5270

Closed
vicuna opened this issue May 19, 2011 · 16 comments
Comments

vicuna commented May 19, 2011

Original bug ID: 5270
Reporter: nicolas_boulay
Status: acknowledged (set by @damiendoligez on 2012-06-27T13:08:03Z)
Resolution: open
Priority: normal
Severity: feature
Version: 3.12.0
Category: lexing and parsing
Child of: #5068
Monitored by: @gasche

Bug description

The OCaml parser does not give precise enough information about typos. If you forget a ';' or an 'in', compilation ends with a "syntax error" pointing to the end of the file. The code shown is an example of code that can make you lose a lot of time debugging, compared to other typical languages such as Java under Eclipse.
Don't forget that humans make stupid mistakes :) Most of the time, this kind of code is debugged by commenting out code until it compiles.

There is also a problem with the type checker, which gives the position of the first inconsistency and not the "definition" place, where the bug could be. Both pieces of information are needed most of the time. To debug this, each "definition" place has to be guessed and verified.

This unfriendly behavior of the tools is a real pain for beginners.

Additional information

type val_t =
| Int32 of Int32.t
| Int24 of int
| Int16 of int
| Int8 of int
| String of string

type df_t = {
mutable h: (val_t * string) list
mutable is_reverted: bool
}
(*
/!\ This list is used backward to be in O(1) at insertion. Then
everything has to be reverted before printing.
*)

let add_last (ctx:df_t) (comment:string) (x:val_t) =
assert (not ctx.is_reverted);
ctx.is_reverted <- false;
ctx.h <- (x,comment)::ctx.h

let revert_before_printing (ctx:df_t) =
assert (not ctx.is_reverted);
ctx.is_reverted <- true;
ctx.h <- List.rev ctx.h

let add_df_footer ctx =
add_last ctx Int8(0x0E) "FOOTER";
add_last ctx Int24(0x000000) "UnusedPad"

vicuna commented May 20, 2011

Comment author: @ygrek

For the first problem there is a (standard?) solution: camlp4o source.ml

vicuna commented May 20, 2011

Comment author: @damiendoligez

I don't get it. ocamlc reports a syntax error at the second "mutable". It's obvious that a semicolon is missing right before that keyword.
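For reference, here is a minimal sketch of the reported code with the record field separator restored. The constructor applications in add_df_footer are also parenthesized and reordered to match the signature of add_last; that second change is an editorial assumption about what the reporter intended, not something discussed in the thread.

```ocaml
type val_t =
  | Int32 of Int32.t
  | Int24 of int
  | Int16 of int
  | Int8 of int
  | String of string

type df_t = {
  mutable h : (val_t * string) list;  (* this ';' was missing in the report *)
  mutable is_reverted : bool;
}

let add_last (ctx : df_t) (comment : string) (x : val_t) =
  assert (not ctx.is_reverted);
  ctx.h <- (x, comment) :: ctx.h

(* Constructor applications need parentheses when passed as arguments,
   and the comment comes before the value in add_last. *)
let add_df_footer ctx =
  add_last ctx "FOOTER" (Int8 0x0E);
  add_last ctx "UnusedPad" (Int24 0x000000)
```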

Can you give an example where it reports a syntax error at the end of the file?

Also, for beginners my advice is to use ;; everywhere, it helps the parser figure out things.

For the typing errors, it might also be an inconsistency between two uses. It's a research problem to find the best way to display type errors. But most of the time, the hard-to-understand errors are in recursive definitions, and in that case you can annotate the definitions with the expected types and get good error reports.
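As a small illustration of annotating recursive definitions, the sketch below writes the intended types on a pair of mutually recursive functions. The function names are hypothetical and only serve the example; the idea is that each body is then checked against the declared type rather than against whatever was inferred from the other body.

```ocaml
(* Mutually recursive functions with their intended types written down. *)
let rec is_even (n : int) : bool =
  if n = 0 then true else is_odd (n - 1)

and is_odd (n : int) : bool =
  if n = 0 then false else is_even (n - 1)
```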

Lastly, if you find it hard to track down the definition of a given identifier, the -annot option and the caml-types.el emacs file can help you.

vicuna commented May 20, 2011

Comment author: nicolas_boulay

It's obvious only if you know that 'type' needs a ';'. The error points to the entire following line, but the error is just before it.

Sorry, I can't find a way to reduce my error case where the error is reported at the end of the file. Each time, it was a missing ';' or a missing 'in'.

For the type error case, the problem is that you use a function before its definition. So if you give a wrong argument, the error will be reported at the definition of the argument and not at the faulty call. I understand that the compiler does not know which of the 2 places is faulty. But why give only the place of the inconsistency instead of giving both places? The error message should just add the line/column of both places, not only the last one. Otherwise you have to review all the call sites of the function.

I haven't tried the OCaml plugin for Eclipse yet, but nowadays Java errors are underlined in red as you type, which gives you a huge productivity boost.

When you see a line/column with some OCaml error, the error could be elsewhere and you don't have any hint about the kind of error. For me, it's like going back to the days of gcc 2.95, where header errors were reported in the C file in which the header is included.

I did not know about -annot, caml-types.el, or camlp4o. Is there a place where you describe such tools for tracking down tricky bugs?

vicuna commented May 23, 2011

Comment author: nicolas_boulay

Here is a new example of a stupid typo, where you can lose a lot of time:


let digit_of_unsigned (u:int) =
"u"

let string_of_u8 u:int =
assert(0<=u && u<=(1 lsl 8));
let u1 = u mod (1 lsl 4) in
let u2 = u / (1 lsl 4) in
(digit_of_unsigned u2) ^ (digit_of_unsigned u1) (ocamlc: File "essai2.ml", line 8, characters 2-49: Error: This expression has type string but an expression was expected of type int)


let copy_df_t (from:df_t) (_to:df_t) =
assert (not tfrom.is_reverted);
assert (not _to.is_reverted); (File "essai.ml", line 57, characters 2-4: Error: Syntax error)
to.n <- _to.n+from.n;
to.h <- from.h @ _to.h;*)
()
;;

vicuna commented Apr 10, 2012

Comment author: @damiendoligez

For the first example, it's not a stupid typo. If you specify that string_of_u8 returns an int, you should not return a string.
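To make the point concrete: without parentheses, the annotation in the report applies to the result of the function, not to its parameter. The sketch below parenthesizes the parameter and declares the result as string. The body of digit_of_unsigned is a hypothetical hexadecimal helper (the report only had a placeholder), and the upper bound is written as a strict comparison, which is an assumption about the intended range.

```ocaml
(* Hypothetical helper: one hexadecimal digit for a value in 0..15. *)
let digit_of_unsigned (u : int) : string =
  assert (0 <= u && u < 16);
  String.make 1 ("0123456789ABCDEF".[u])

(* With parentheses, the int annotation applies to the parameter u.
   Without them, the annotation would apply to the result of the function. *)
let string_of_u8 (u : int) : string =
  assert (0 <= u && u < (1 lsl 8));
  let u1 = u mod (1 lsl 4) in
  let u2 = u / (1 lsl 4) in
  digit_of_unsigned u2 ^ digit_of_unsigned u1
```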

For the second example, you should use an emacs mode that colors the keywords, then the problem becomes evident.

vicuna commented Apr 11, 2012

Comment author: @pierreweis

As Damien said, please add a double semicolon at the end of each of your phrases. This way the parser would not run for ever to find the rest of the code.

Concerning your type error example, it seems to be clear enough

line 8, characters 2-4:
Error: This expression has type string but an expression was expected of type int

Since characters 2-4 are exactly "u", I cannot understand what better message you are expecting from the compiler?

Also, adding type annotations all over the place is not a very good idea in general: it is a waste of time, and further data type modifications become cumbersome since you need to modify these annotations.

For your information:

  • ``-annot'' is a compiler option to (automatically) annotate your program with types. This way you can read the types from the emacs editor with a simple key binding (C-x-t by default); this is of utmost help when debugging type errors.

  • caml-types.el is an emacs companion package to get these type annotations in emacs. You should add it to your .emacs init file.

  • camlp4 is a preprocessor/parser/pretty-printer for Caml. If you use the option
    -pp camlp4o you get somewhat more relevant syntax error messages. You could use for instance

ocamlc -pp camlp4o file.ml

vicuna commented Apr 11, 2012

Comment author: nicolas_boulay

Since 05-2011, I have made some progress in OCaml.

Stupid typos are always where you just edited your code, whatever the compiler says. So you need to compile the code often.

From my point of view, a type system is a way to declare things twice, with the compiler checking the consistency between the two. An error is a contradiction between the two. For a crystal-clear error message, something like the following would be great: (ocamlc: File "essai2.ml", line 8, characters 2-49: Error: This expression has type string but an expression was expected of type int, file "essai2.ml", line 4, characters 20-23)

I understand that the compiler can't be magic, but it's always strange to get a reference into the source file that is not where the error is. If you give both locations, the error will be between the two.

I have used the trick of putting ";;" everywhere, but some people said that this token is deprecated. So keeping it did not seem like a good idea.

When I wrote this feedback, it was to help you understand why beginners find OCaml code "hard to compile".

If you need many tools to debug OCaml code effectively, a dedicated manual should be written.

Nowadays, C compilers give hints about the most common mistakes.

vicuna commented Apr 12, 2012

Comment author: @pierreweis

The end of phrase marker ";;" is not deprecated.

It is mandatory for the interactive system and optional for source files.
It is convenient not only to restrict syntax errors but also to write code that can easily be cut and pasted into the interactive system for rapid testing and modification.

In short: feel free to keep it if you want, it never harms and can be useful sometimes.
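A short sketch of what this looks like in a source file follows. The definitions are hypothetical; the point is only that each phrase ends with ";;", so the file compiles as is and each phrase can also be pasted into the interactive system one at a time.

```ocaml
let square x = x * x
;;

let total = List.fold_left ( + ) 0 [1; 2; 3]
;;

(* After ";;" a bare expression is accepted as a phrase of its own. *)
print_endline (string_of_int (square total))
;;
```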

vicuna commented Apr 12, 2012

Comment author: @pierreweis

The OCaml type system was designed and developed to support programs with no type annotations at all. This is really useful and convenient: you may concentrate on the algorithms and data manipulated instead of thinking about the type of variables and expressions.

Many OCaml programmers never add type annotations to their programs, except in places where it is mandatory, namely in data type definitions and module interface declarations. Elsewhere, just let the compiler handle types.

In short: feel free to annotate your programs with the amount of type information you are comfortable with, knowing that it is not mandatory.

vicuna commented Apr 12, 2012

Comment author: nicolas_boulay

I often use annotations to document the code when the name of a parameter could be confusing. This could also be a problem in .mli files, where only types are shown (I don't like comments that could become obsolete without any compiler warning :). Maybe labels could help.


For information on a new way to write code and algorithms, I think the following presentation is very interesting. Look at it starting at 17'30.

It's like merging the interpreter and an editor, to show the effect of the code on some example.
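As a sketch of how labels could play that documenting role, the signature below names each argument, so the interface stays readable without comments and the compiler checks the labels at every call. The module Frame and all of its functions are hypothetical names made up for this example.

```ocaml
module Frame : sig
  type t
  val create : unit -> t
  (* The labels document the arguments directly in the interface. *)
  val add_last : t -> comment:string -> byte:int -> unit
  val contents : t -> (int * string) list
end = struct
  type t = { mutable items : (int * string) list }
  let create () = { items = [] }
  let add_last t ~comment ~byte = t.items <- (byte, comment) :: t.items
  let contents t = List.rev t.items
end

let () =
  let f = Frame.create () in
  (* In a full application, labelled arguments can be given in any order. *)
  Frame.add_last f ~comment:"FOOTER" ~byte:0x0E;
  Frame.add_last f ~byte:0x00 ~comment:"UnusedPad";
  List.iter
    (fun (byte, comment) -> Printf.printf "%02X %s\n" byte comment)
    (Frame.contents f)
```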

https://www.youtube.com/watch?v=PUv66718DII at 17 min 30

vicuna commented Apr 12, 2012

Comment author: @pierreweis

Last note: you're right about comments that could become obsolete with no compiler warnings. Mutatis mutandis, that's exactly why type annotations can be harmful in your programs: they can become obsolete because you renamed a type or changed its definition. But then the compiler will not transparently modify the type names in the annotations you wrote: you will have to maintain these useless type annotations. Not a big deal, but a real burden and useless waste of time!

vicuna commented Jun 27, 2012

Comment author: @damiendoligez

For type errors, it's not just a question of reporting "both" places, because a type error is an inconsistency that can involve an arbitrary number of pieces of the program. So it's really not easy to find an algorithm that gives good error messages.

vicuna commented Jun 27, 2012

Comment author: nicolas_boulay

A type annotation cannot be silently false; being annoyed by the compiler is much less of a problem than having false comments (besides the fact that an OCaml type declaration is graphically heavier than a C type declaration).

You could have many places with conflicting types, for sure. But ocamlc stops at the first mismatch, giving the expected type and the current type. You give the position of the current type, so why not also give the position seen for the expected type?

From my external point of view, it looks like keeping a Lexing.position beside each type, and giving this value when printing the error.
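A toy sketch of that idea follows, under strong simplifying assumptions: it uses two made-up base types rather than the compiler's real type representation, and every name except Lexing.position is hypothetical. It only shows what carrying a position next to a type and printing both positions on a mismatch could look like; it says nothing about how hard this is with real type inference.

```ocaml
(* Toy types only; real OCaml types are far richer than this. *)
type ty = Int | String

let string_of_ty = function
  | Int -> "int"
  | String -> "string"

(* Hypothetical: a type paired with the position it was introduced at. *)
type located_ty = { ty : ty; origin : Lexing.position }

(* Build a Lexing.position from a file name, a line and a column. *)
let make_pos file line col =
  { Lexing.pos_fname = file; pos_lnum = line; pos_bol = 0; pos_cnum = col }

let describe_pos (p : Lexing.position) =
  Printf.sprintf "File %S, line %d, character %d"
    p.Lexing.pos_fname p.Lexing.pos_lnum
    (p.Lexing.pos_cnum - p.Lexing.pos_bol)

(* On a mismatch, report the faulty expression and the origin of the
   expected type. *)
let check ~(expected : located_ty) ~(actual : located_ty) : (unit, string) result =
  if expected.ty = actual.ty then Ok ()
  else
    Error
      (Printf.sprintf
         "%s: this expression has type %s but an expression was expected of type %s (expected type introduced at %s)"
         (describe_pos actual.origin)
         (string_of_ty actual.ty)
         (string_of_ty expected.ty)
         (describe_pos expected.origin))

let () =
  let expected = { ty = Int; origin = make_pos "essai2.ml" 4 20 } in
  let actual = { ty = String; origin = make_pos "essai2.ml" 8 2 } in
  match check ~expected ~actual with
  | Ok () -> print_endline "no mismatch"
  | Error msg -> print_endline msg
```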

vicuna commented Jun 27, 2012

Comment author: @garrigue

You could have many places with conflicting types, for sure. But ocamlc stops at the first mismatch, giving the expected type and the current type. You give the position of the current type, so why not also give the position seen for the expected type?

Interesting idea.
This could probably be done to some extent, when we know where the expected type comes from.
But as Damien pointed out, in many situations the expected type is actually synthesized by type inference from multiple sources, and you cannot give a single location.
Worse, it is actually difficult to detect whether you are in this situation or not (except when a complete type annotation was given).
A stronger approach, like adding location information to every type node, would require rethinking completely the .cmi format, among other things.

I'll try to think about it.
By the way, this has become completely unrelated to parsing :-)

vicuna commented Jun 28, 2012

Comment author: nicolas_boulay

Beginners always begin with simple features. My OCaml code consists of simple modules without functors or objects. Most of the real types can be deduced from the .mli file. Reporting a place where you think the type was deduced will cover most stupid mistakes. Don't try to cover 100% of the cases; 80% is already a big step.

For example, in a match expression, the first "|" clause defines the expected output type; if the second "|" clause is different, the error message will point to the second case, but the error could be in the first.
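The match example can be sketched as follows, with a hypothetical function. The unannotated faulty variant is only described in a comment so that the block still compiles; the annotated version shows one way to get the report at the branch that disagrees with the intended type.

```ocaml
(* Without annotations the first branch fixes the result type, so a variant
   whose first branch returned 0 and whose second returned "many" would be
   rejected at the second branch, even if the first one is the mistake. *)

(* With the intended result type written down, each branch is checked
   against string on its own, so the branch that disagrees is reported. *)
let describe (n : int) : string =
  match n with
  | 0 -> "zero"
  | _ -> "many"
```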

@github-actions

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.
