Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0005768OCamlOCaml generalpublic2012-10-01 12:212012-10-16 19:12
Reportergasche 
Assigned Tofrisch 
PrioritynormalSeverityfeatureReproducibilityN/A
StatusresolvedResolutionfixed 
PlatformOSOS Version
Product Version 
Target Version4.01.0+devFixed in Version4.01.0+dev 
Summary0005768: [patch] On "unbound identifier" errors, use spell-checking to suggest names present in the environment
DescriptionThe Clang compiler uses spell-checking techniques ( http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#spell_checker [^] ) :

  t.c:4:13: error: no member named 'st_blocksize' in 'struct stat'; did you mean 'st_blksize'?

I think this feature is cool and could be useful as well in the OCaml compiler. I have attached a patch that implements this.

Some examples:

# mods;;
Error: Unbound value mods
Did you mean mod or modf?
# List.lengt;;
Error: Unbound value List.lengt
Did you mean length?
# open List;;
# lengt;;
Error: Unbound value lengt
Did you mean length?
# {content = 1; foo = 2};;
Error: Unbound record field label content
Did you mean contents?
# (fun (x : int lis) -> x);;
Error: Unbound type constructor lis
Did you mean list?
# Lis.lengt;;
Error: Unbound module Lis
Did you mean List?

Would you be ready to integrate such a feature? If so, which technical changes do you think would be necessary for the patch to be acceptable?


## Additional details

The patch was designed to be non-invasive, and only modifies the final error reporting function in typetexp.ml; it does not modify the semantics of the type-checker at all. I still needed to convey the current environment to the point of error reporting, so the Unbound_foo exceptions have changed to take an additional parameter.
Is it ok to change the exception definitions? (May user code be affected by such a change?). If this is unacceptable, it would be possible to change the patch to use an ugly global variable.

The last example (Lis.lengt) exhibits a slight downside of the feature: while there are two typos in the complete path, it will only report the first one. This is not due to the patch itself, but to the fact that it is grafted onto the error reporting function, after the "unbound name narrowing" of narrow_unbound_lid_error has taken place, rather than at the exception raising site. I think this is the right choice.

The heuristics of the spell-checking (when is another word considered a good suggestion?) can be fine-tuned. Currently, the patch uses the Damerau-Levensthein distance ( http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance [^] : an edit distance taking swapping of adjacent letters into account ), with a low cutoff designed to detect only obvious typos: it does no correction for identifiers of size below 2, look at distance 1 at most for below 4, distance 2 for sizes below 6 and distance 3 for larger identifiers. It returns the set of words found in the environment with the minimal distance, if it is smaller than the distance limit.

I have done some performance testing. For the real-world cases I could try the suggestion is instantaneous. I have included in the patch a test generator used to creates code with 1000 and 500_000 identifiers unqualified in the environment. With 1000 identifiers (to put it in perspective, Pervasives exports around 150 identifiers and the modules of the OCaml compiler itself have at most around 500 identifiers in the global scope), spell-checking takes 0.12s with ocamlc and 0.015s with ocamlc.opt. For 500_000 identifiers it takes 5s with ocamlc and 0.8s with ocamlc.opt. Given that this computation is only done in the case of an Unbound_foo error, I think performance is not an issue -- if it was considered to be, I could add a flag to disable the feature. The spell-checking runs after the preliminary "Error: Unbound ..." has been flushed, so even in the case of a particularly long delay the user would have the choice to interrupt the computation (as there is nothing done afterwards); I have tested that this works in the toplevel as well (without exiting the toplevel).

The patch currently doesn't handle the "Unbound type variable" error, as it relies on a different form of environment and would therefore make the code a bit more complex. I'm ready to add this as a separate patch afterwards.

The patch also does not rely on any form of type information, only names bound in the environment. One could dream of suggestions refined by type-checking, but this is outside the scope of the current, modest implementation.
TagsNo tags attached.
Attached Filespatch file icon 0001-Spell-check-unbound-names-to-suggest-fix-in-error-me.patch [^] (14,945 bytes) 2012-10-03 15:58 [Show Content]
patch file icon 0002-a-testsuite-for-the-edit_distance-function-in-utils-.patch [^] (3,979 bytes) 2012-10-03 15:58 [Show Content]

- Relationships

-  Notes
(0008182)
doligez (administrator)
2012-10-01 16:02

This has been proposed before and I think the consensus was that it's not a job for the compiler, but rather for some development environment like TypeRex. OTOH, it's hard for TypeRex to get the info it needs if the file doesn't compile (and hence doesn't generate a typed AST).
(0008184)
protz (manager)
2012-10-01 16:38

I would like to point out that the error-reporting mechanism of clang has been widely praised in every single presentation/report I've seen about clang; and that's one of the reasons why people are switching from gcc to clang. Moreover, the patch is really small and self-contained, I think this is a low-cost / big-win kind of bug.
(0008186)
meyer (developer)
2012-10-01 19:11

I think with the patch that Gabriel is proposing has my positive vote. The patch itself is not too destructive and improves a lot user experience so it's a step in the right direction.

From the side note there are other things in Clang toolchain worth to look at, also Mlton has nice schemes for typing problems.
(0008187)
frisch (developer)
2012-10-01 19:35

I haven't looked at the implementation, but in general, I believe that improvements to error messages should go to the compiler, not an external tool, even if technically possible.
(0008188)
pilki (reporter)
2012-10-01 22:47

I confirm that some are using Clang *exclusively* for error reporting.

Also, doing this kind of things in the type checker itself allows some clever things:
- type aware suggestion: if c is a record with a field foobar, and there is a variable foocar, when the user types c.foolar, you can suggest only foobar. Same for a module, or even more clever things.
- if there is only one suggestion, the type checking can keep going after replacing the identifier with the suggested one to report more errors in one pass.
(0008191)
frisch (developer)
2012-10-02 15:29
edited on: 2012-10-02 15:30

This is probably not necessary, but there are simple ways to improve performance:

 - Do not allocate a matrix: keeping the last two rows (and the current one) is enough. For repeated distance calculation with a fixed target, these rows can be allocated once and for all. (If the function returns a cutoff + 1 instead of None, one can arrange so that the lookup for suggestions does not allocate at all.)

 - Drop the longest common suffix and the longest common prefix from the two strings.

 - Use the current best distance as a cutoff.

(0008192)
gasche (developer)
2012-10-02 16:02
edited on: 2012-10-02 16:02

I had tested that matrix allocation is not the performance bottleneck (by measuring the time of the allocation alone), so I intentionally neglected the first optimization. I have not experimented with the two others, and I agree this may be worth it (but I'm not sure how to test them reliably). I'm ready to experiment with this afterwards if the feature gets accepted.

(0008196)
gasche (developer)
2012-10-03 16:01

I have slightly modified the patch to take into account:
- an error in the handling of cutoff reported by 'yoch'
- some protection against overflows if "max_int" is used as cutoff
- some style comments by Damien

I have also attached a second patch providing a testsuite for the edit_distance function.
(0008267)
frisch (developer)
2012-10-16 15:55

Thanks!

I've applied the patch to the trunk (commit 13018).
(0008269)
gasche (developer)
2012-10-16 16:16

Thanks Alain, I will be happy to make typos in the next version of OCaml...

Maybe you could also apply the patch that adds the test to the testsuite? It's not world-changing but could provide regression test against very imaginary future changes to the edit function (eg. for performance reasons).
(0008274)
frisch (developer)
2012-10-16 19:12

Ok, I've committed the test to the trunk as well (commit 13021).

- Issue History
Date Modified Username Field Change
2012-10-01 12:21 gasche New Issue
2012-10-01 12:21 gasche File Added: 0001-Spell-check-unbound-names-to-suggest-fix-in-error-me.patch
2012-10-01 16:02 doligez Note Added: 0008182
2012-10-01 16:02 doligez Status new => acknowledged
2012-10-01 16:38 protz Note Added: 0008184
2012-10-01 17:12 gasche Description Updated View Revisions
2012-10-01 19:11 meyer Note Added: 0008186
2012-10-01 19:35 frisch Note Added: 0008187
2012-10-01 22:47 pilki Note Added: 0008188
2012-10-02 15:29 frisch Note Added: 0008191
2012-10-02 15:30 frisch Note Edited: 0008191 View Revisions
2012-10-02 16:02 gasche Note Added: 0008192
2012-10-02 16:02 gasche Note Edited: 0008192 View Revisions
2012-10-03 15:58 gasche File Deleted: 0001-Spell-check-unbound-names-to-suggest-fix-in-error-me.patch
2012-10-03 15:58 gasche File Added: 0001-Spell-check-unbound-names-to-suggest-fix-in-error-me.patch
2012-10-03 15:58 gasche File Added: 0002-a-testsuite-for-the-edit_distance-function-in-utils-.patch
2012-10-03 16:01 gasche Note Added: 0008196
2012-10-16 15:55 frisch Note Added: 0008267
2012-10-16 15:55 frisch Status acknowledged => resolved
2012-10-16 15:55 frisch Fixed in Version => 4.01.0+dev
2012-10-16 15:55 frisch Resolution open => fixed
2012-10-16 15:55 frisch Assigned To => frisch
2012-10-16 16:16 gasche Note Added: 0008269
2012-10-16 19:12 frisch Note Added: 0008274


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker