Mantis Bug Tracker

ID: 0005270
Project: OCaml
Category: lexing and parsing
View Status: public
Date Submitted: 2011-05-19 17:40
Last Update: 2017-03-03 16:05
Assigned To:
Platform:
OS:
OS Version:
Product Version: 3.12.0
Target Version:
Fixed in Version:
Summary: 0005270: OCaml parser errors are not meaningful most of the time
Description: The OCaml parser does not give precise enough information about typos. If you forget a ';' or an 'in', compilation ends with a "syntax error" pointing to the end of the file. The code shown below is an example of code that can cost a lot of debugging time, compared to other typical languages such as Java under Eclipse.
Don't forget that humans make stupid mistakes :) Most of the time, this kind of code is debugged by commenting code out until it compiles.

There is also a problem with the type checker, which gives the position of the first inconsistency and not the "definition" place, where the bug could be. Both pieces of information are needed most of the time. To debug this, each candidate "definition" place has to be guessed and verified.

This unfriendly behavior of the tools is a real pain for beginners.
Additional Information:
type val_t =
 | Int32 of Int32.t
 | Int24 of int
 | Int16 of int
 | Int8 of int
 | String of string

type df_t = {
 mutable h: (val_t * string) list
 mutable is_reverted: bool
 /!\ This list is used backward to be in O(1) at insertion. Then
everything has to be reverted before printing.

let add_last (ctx:df_t) (comment:string) (x:val_t) =
 assert (not ctx.is_reverted);
 ctx.is_reverted <- false;
 ctx.h <- (x,comment)::ctx.h

let revert_before_printing (ctx:df_t) =
 assert (not ctx.is_reverted);
 ctx.is_reverted <- true;
 ctx.h <- List.rev ctx.h

let add_df_footer ctx =
 add_last ctx Int8(0x0E) "FOOTER";
 add_last ctx Int24(0x000000) "UnusedPad"
Tags: No tags attached.
Attached Files

- Relationships
child of 0005068 (acknowledged): ocamlc/camlp4 should give better error messages for syntax errors

-  Notes
ygrek (reporter)
2011-05-20 10:05

For the first problem there is a (standard?) solution: camlp4o.
doligez (administrator)
2011-05-20 14:06

I don't get it. ocamlc reports a syntax error at the second "mutable". It's obvious that a semicolon is missing right before that keyword.

Can you give an example where it reports a syntax error at the end of the file?

Also, for beginners my advice is to use ;; everywhere, it helps the parser figure out things.

For the typing errors, it might also be an inconsistency between two uses. It's a research problem to find the best way to display type errors. But most of the time, the hard-to-understand errors are in recursive definitions, and in that case you can annotate the definitions with the expected types and get good error reports.

Lastly, if you find it hard to track down the definition of a given identifier, the -annot option and the caml-types.el emacs file can help you.
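
As a sketch of the fix described above, here is the reported code with the field separator restored. This is a reconstruction under assumptions: the closing brace of the record and the comment delimiters are missing from the report, so their placement here is a guess, and the argument order in add_df_footer is adjusted to match the definition of add_last.

```ocaml
(* Sketch only: the types from the report with the ';' restored. *)
type val_t =
  | Int32 of Int32.t
  | Int24 of int
  | Int16 of int
  | Int8 of int
  | String of string

type df_t = {
  mutable h : (val_t * string) list;  (* this ';' is missing in the report *)
  mutable is_reverted : bool;
}

(* The list is built backward so that insertion is O(1). *)
let add_last (ctx : df_t) (comment : string) (x : val_t) =
  assert (not ctx.is_reverted);
  ctx.h <- (x, comment) :: ctx.h

(* A constructor application passed as an argument needs parentheses, and
   the arguments follow the order of the definition: comment, then value. *)
let add_df_footer ctx =
  add_last ctx "FOOTER" (Int8 0x0E);
  add_last ctx "UnusedPad" (Int24 0x000000)
```

With the separator in place, the record definition is accepted and any remaining errors are reported inside the functions that use it.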
nicolas_boulay (reporter)
2011-05-20 14:48

It's obvious only if you know that a 'type' definition needs a ';'. The error points at the entire following line, but the mistake is just before it.

Sorry, I can't find a way to reduce my error case where the error is reported at the end of the file. Each time, it was a missing ';' or a missing 'in'.

For the type error case, the problem is that you use a function before its definition. So if you give a wrong argument, the error will be reported at the definition of the argument and not at the faulty call. I understand that the compiler does not know which of the two places is faulty. But why give only the place of the inconsistency instead of giving both places? The error message should just add the line/column of both places, not only the last one. Otherwise you have to review all the call sites of the function.

I haven't tried the OCaml plugin for Eclipse yet, but nowadays Java errors are underlined in red as you type, which gives you a huge productivity boost.

When you see a line/column with an OCaml error, the error could be elsewhere and you don't have any hint about the kind of error. For me, it's like going back to the days of gcc 2.95, when header errors were reported in the C file that included them.

I did not know about -annot, caml-type.el, or camlp4o. Is there a place where you describe such tools for tracking down tricky bugs?
nicolas_boulay (reporter)
2011-05-23 17:02
edited on: 2011-05-24 15:15

Here is a new example of a stupid typo, where you can lose a lot of time:
let digit_of_unsigned (u:int) =

let string_of_u8 u:int =
  assert(0<=u && u<=(1 lsl 8));
  let u1 = u mod (1 lsl 4) in
  let u2 = u / (1 lsl 4) in
  (digit_of_unsigned u2) ^ (digit_of_unsigned u1)
(*ocamlc: File "", line 8, characters 2-49: Error: This expression has type string but an expression was expected of type int*)

let copy_df_t (from:df_t) (_to:df_t) =
  assert (not tfrom.is_reverted);
  assert (not _to.is_reverted); (*File "", line 57, characters 2-4: Error: Syntax error*)
  to.n <- _to.n+from.n;
  to.h <- from.h @ _to.h;*)

doligez (administrator)
2012-04-10 18:11

For the first example, it's not a stupid typo. If you specify that string_of_u8 returns an int, you should not return a string.

For the second example, you should use an emacs mode that colors the keywords, then the problem becomes evident.
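
To illustrate the first point, here is a minimal sketch of how the annotation could be written so that it applies to the parameter rather than to the result. The body of digit_of_unsigned is hypothetical (the report leaves it empty), and the upper bound in the assertion is tightened to 255, which is an assumption about the intent.

```ocaml
(* Hypothetical helper: the report does not show the body of this function. *)
let digit_of_unsigned (u : int) : string =
  assert (0 <= u && u < 16);
  String.make 1 ("0123456789ABCDEF".[u])

(* [let string_of_u8 u:int = ...] reads as "string_of_u8 takes u and
   returns an int". Parenthesizing the parameter annotates the argument,
   and the result type is then stated separately. *)
let string_of_u8 (u : int) : string =
  assert (0 <= u && u < (1 lsl 8));
  let u1 = u mod (1 lsl 4) in
  let u2 = u / (1 lsl 4) in
  digit_of_unsigned u2 ^ digit_of_unsigned u1
```

Written this way, the annotation and the body agree, so the message quoted in the report no longer appears.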
weis (developer)
2012-04-11 17:12
edited on: 2012-04-12 10:22

As Damien said, please add a double semicolon at the end of each of your phrases. This way the parser will not run on forever looking for the rest of the code.

Concerning your type error example, the message seems clear enough:

line 8, characters 2-4:
Error: This expression has type string but an expression was expected of type int

Since characters 2-4 are exactly "u", I cannot see what better message you expect from the compiler.

Also, adding type annotations all over the place is not a very good idea in general: it is a waste of time, and further data type modifications become cumbersome since you need to modify these annotations.

For your information:

* ``-annot'' is a compiler option to (automatically) annotate your program with types. This way you can read the types from the emacs editor with a simple key binding (C-x-t by default); this is of utmost help when debugging type errors.

* caml-type.el is an emacs companion package to get these type annotations in emacs. You should add it to your .emacs init file.

* camlp4 is a preprocessor/parser/pretty-printer for Caml. If you use the option -pp camlp4o you get somewhat more relevant syntax error messages. You could use for instance

ocamlc -pp camlp4o

nicolas_boulay (reporter)
2012-04-11 17:55

Since 05-2011, I have made some progress in OCaml.

The stupid typos are always where you just edited your code, whatever the compiler says. So you need to compile the code often.

From my point of view, a type system is a way to declare things twice, with the compiler checking the consistency between the two. An error is a contradiction between the two. For a crystal-clear error message, something like the following would be great: (*ocamlc: File "", line 8, characters 2-49: Error: This expression has type string but an expression was expected of type int, file "", line 4, characters 20-23*)

I understand that the compiler can't be magic, but it's always strange to get a reference into the source file that is not where the error is. If you give both locations, the error will be between the two.

I have used the trick of putting ";;" everywhere, but some people said that this token is deprecated. So keeping it did not seem to be a good idea.

When I wrote this feedback, it was to help you understand why beginners find OCaml code "hard to compile".

If you need many tools to debug OCaml code effectively, a dedicated manual should be written.

Nowadays, C compilers suggest fixes for the most common errors.
weis (developer)
2012-04-12 09:33
edited on: 2012-04-12 09:34

The end of phrase marker ";;" is not deprecated.

It is mandatory for the interactive system and optional for source files.
It is convenient not only to restrict the scope of syntax errors but also to write code that can easily be cut and pasted into the interactive system for rapid testing and modification.

In short: feel free to keep it if you want, it never harms and can be useful sometimes.
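
A minimal sketch of this style, with hypothetical function names, might look as follows. Each phrase ends with ";;", so a phrase that is syntactically incomplete (for example because of a missing 'in') cannot silently absorb the definitions that follow it.

```ocaml
let square x = x * x
;;

let sum_of_squares xs =
  List.fold_left (fun acc x -> acc + square x) 0 xs
;;

(* prints 14 *)
let () = Printf.printf "%d\n" (sum_of_squares [1; 2; 3])
;;
```

The same text can be pasted phrase by phrase into the interactive system, which requires the ";;" marker anyway.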

weis (developer)
2012-04-12 09:48
edited on: 2012-04-12 09:50

The OCaml type system was designed and developed to support programs with no type annotations at all. This is really useful and convenient: you may concentrate on the algorithms and data manipulated instead of thinking about the type of variables and expressions.

Many OCaml programmers never add type annotations to their programs, except in places where it is mandatory, namely in data type definitions and module interface declarations. Elsewhere, just let the compiler handle types.

In short: feel free to annotate your programs with the amount of type information you are comfortable with, knowing that it is not mandatory.
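
As an illustration, here is a small sketch with hypothetical names in which the only explicit types are in the data type definition; everything else is inferred.

```ocaml
(* The data type definition is the only place where types are written. *)
type shape =
  | Circle of float
  | Rect of float * float

(* Inferred as: shape -> float *)
let area = function
  | Circle r -> 3.14159 *. r *. r
  | Rect (w, h) -> w *. h

(* Inferred as: shape list -> float *)
let total_area shapes =
  List.fold_left (fun acc s -> acc +. area s) 0.0 shapes
```

If the definition of shape changes later, there are no annotations in area or total_area to update by hand.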

nicolas_boulay (reporter)
2012-04-12 09:57

I often use annotations to document the code when the name of a parameter could be confusing. This can also be a problem in .mli files, where only types are shown (I don't like comments that can become obsolete without any compiler warning :). Maybe labels could help.
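
As a sketch of what labels could look like here, assuming a reduced version of the types from the report: the labels document the role of each argument, and they would also appear in the .mli signature as `val add_last : ctx:df_t -> comment:string -> val_t -> unit`.

```ocaml
(* Reduced, hypothetical version of the types from the report. *)
type val_t = Int8 of int | Int24 of int

type df_t = {
  mutable h : (val_t * string) list;
  mutable is_reverted : bool;
}

(* Labeled parameters name the role of each argument at every call site. *)
let add_last ~(ctx : df_t) ~(comment : string) (x : val_t) =
  assert (not ctx.is_reverted);
  ctx.h <- (x, comment) :: ctx.h

(* In a full application, labeled arguments can be given in any order. *)
let add_df_footer ctx =
  add_last ~ctx ~comment:"FOOTER" (Int8 0x0E);
  add_last ~comment:"UnusedPad" ~ctx (Int24 0x000000)
```

Because the labels are checked by the compiler, they cannot drift out of date the way a comment can.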


For information on a new way to write code and algorithms, I think that the following presentation is very interesting. Look at it starting at 17'30.

It's like merging the interpreter and an editor, to show the effect of the code on some example.
weis (developer)
2012-04-12 10:20

Last note: you're right about comments that could become obsolete with no compiler warnings. Mutatis mutandis, that's exactly why type annotations can be harmful in your programs: they can become obsolete, because you renamed a type or changed its definition. But then the compiler will not transparently modify the type names in the annotations you wrote: you will have to maintain these useless type annotations. Not a big deal, but a real burden and a useless waste of time!
doligez (administrator)
2012-06-27 15:08

For type errors, it's not just a question of reporting "both" places, because a type error is an inconsistency that can involve an arbitrary number of pieces of the program. So it's really not easy to find an algorithm that gives good error messages.
nicolas_boulay (reporter)
2012-06-27 15:47

A type annotation cannot be silently false. Being annoyed by the compiler is much less of a problem than having false comments (besides the fact that an OCaml type declaration is visually heavier than a C type declaration).

You could have many places with conflicting types, for sure. But ocamlc stops at the first mismatch, giving the expected type and the current type. You give the position of the current type, so why not also give the position where the expected type was seen?

From my external point of view, it looks like keeping a Lexing.position beside each type, and giving this value when printing the error.
garrigue (manager)
2012-06-28 01:06

> You could have many places with conflicting types, for sure. But ocamlc stops at the first mismatch, giving the expected type and the current type. You give the position of the current type, so why not also give the position where the expected type was seen?

Interesting idea.
This could probably be done to some extent, when we know where the expected type comes from.
But as Damien pointed out, in many situations the expected type is actually synthesized by type inference from multiple sources, and you cannot give a single location.
Worse, it is actually difficult to detect whether you are in this situation or not (except when a complete type annotation was given).
A stronger approach, like adding location information to every type node, would require rethinking completely the .cmi format, among other things.

I'll try to think about it.
By the way, this has become completely unrelated to parsing :-)
nicolas_boulay (reporter)
2012-06-28 09:31

Beginners always begin with simple features. My OCaml code consists of simple modules without functors or objects. Most of the real types can be deduced from the .mli file. Reporting a place where you think the type was deduced will cover most of the stupid mistakes. Don't try to cover 100% of the cases; 80% is already a big step.

For example, in a match expression, the first "|" clause defines the expected output type. If the second "|" clause has a different type, the error message will point to the second case, but the error could be in the first.
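
A small hypothetical sketch of this situation: the code below compiles, and the commented-out branch shows the variant that is rejected. The error is reported on the second branch, whichever branch was actually written by mistake.

```ocaml
(* The first branch fixes the result type to string. *)
let describe n =
  match n with
  | 0 -> "zero"
  (* | _ -> n      rejected here: type int where string was expected *)
  | _ -> string_of_int n

(* Annotating the result states the intended type up front, so a
   mismatch in any branch is reported against the declared type. *)
let describe_annotated n : string =
  match n with
  | 0 -> "zero"
  | _ -> string_of_int n
```

The annotation does not give the compiler a second location to report, but it does settle which of the two branches is considered wrong.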

- Issue History
Date Modified Username Field Change
2011-05-19 17:40 nicolas_boulay New Issue
2011-05-20 10:05 ygrek Note Added: 0005913
2011-05-20 14:06 doligez Note Added: 0005914
2011-05-20 14:06 doligez Status new => feedback
2011-05-20 14:48 nicolas_boulay Note Added: 0005915
2011-05-23 17:02 nicolas_boulay Note Added: 0005928
2011-05-23 17:12 nicolas_boulay Note Edited: 0005928
2011-05-24 15:15 nicolas_boulay Note Edited: 0005928
2012-04-10 18:11 doligez Note Added: 0007315
2012-04-11 17:12 weis Note Added: 0007330
2012-04-11 17:55 nicolas_boulay Note Added: 0007331
2012-04-11 17:55 nicolas_boulay Status feedback => new
2012-04-12 09:33 weis Note Added: 0007338
2012-04-12 09:34 weis Note Edited: 0007338
2012-04-12 09:48 weis Note Added: 0007339
2012-04-12 09:50 weis Note Edited: 0007339
2012-04-12 09:57 nicolas_boulay Note Added: 0007340
2012-04-12 10:20 weis Note Added: 0007341
2012-04-12 10:22 weis Note Edited: 0007330
2012-06-27 15:08 doligez Note Added: 0007615
2012-06-27 15:08 doligez Assigned To => doligez
2012-06-27 15:08 doligez Status new => acknowledged
2012-06-27 15:47 nicolas_boulay Note Added: 0007617
2012-06-28 01:06 garrigue Note Added: 0007618
2012-06-28 09:31 nicolas_boulay Note Added: 0007619
2012-08-09 08:48 doligez Assigned To doligez =>
2013-07-29 08:05 gasche Relationship added child of 0005068
2017-02-23 16:36 doligez Category OCaml general => -OCaml general
2017-03-03 16:05 doligez Category -OCaml general => lexing and parsing
