Turn warning on using non-US-ASCII ISO-8859-1 characters in identifiers into error #6697
Comments
Comment author: @whitequark @gasche, please take a look at the attached patch. The patch simply removes the lexing rules corresponding to Latin-1 in identifiers, which will fall back to the rule handling all other invalid characters.
Comment author: @gasche I think the patch is correct and would be in favor of merging, but I'll wait for a second positive opinion on this.
Comment author: @damiendoligez #6695 is not relevant for this: we have some users who use latin-1 characters in their variable names, and the warning was introduced relatively recently. Is it already time to turn it into an error? What's the hurry anyway?
Comment author: @whitequark I've linked #6695 as a part of a bigger effort to remove latin1-specific code.
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.
Still an issue.
I'm going to make a decision on this one. I believe we have users of accented letters in identifiers. Hence, we cannot remove support for Latin-1 accented letters in identifiers if we have no alternative to offer, e.g. some support for Unicode / UTF-8. So, we will not do what this issue suggests.
That's a reasonable approach; when filing this I was hoping that this would be a first step to offering UTF-8 support by default, since Latin-1 accented identifiers are invalid in UTF-8. But now that I think about it again, I think it's possible to automatically and reliably disambiguate between a Latin-1 encoded source file and a UTF-8 encoded source file, so it's not necessary to forbid Latin-1 identifiers altogether.
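One way such a disambiguation could work, sketched here as an illustration only: try to decode the source bytes as UTF-8 and fall back to Latin-1 when that fails. A Latin-1 accented letter is a single byte in the range 0xC0-0xFF, and in an identifier it is usually followed by an ASCII byte, which is not a valid UTF-8 continuation byte. The function name `guess_source_encoding` is hypothetical; nothing like it is proposed in this thread or present in the OCaml lexer, and Python is used purely for brevity.

```python
def guess_source_encoding(data: bytes) -> str:
    """Guess whether raw source bytes are ASCII, UTF-8, or Latin-1.

    Hypothetical helper for illustration; not part of any OCaml tooling.
    """
    # Pure ASCII is identical in both encodings, so there is nothing to decide.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    try:
        # Strict decoding rejects stray lead bytes and missing continuation bytes.
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not well-formed UTF-8: treat the file as Latin-1, where every byte is valid.
        return "latin-1"
    return "utf-8"


if __name__ == "__main__":
    print(guess_source_encoding("let été = 1".encode("latin-1")))
    print(guess_source_encoding("let été = 1".encode("utf-8")))
```

The check is a heuristic rather than a proof: a Latin-1 file whose non-ASCII bytes happen to form well-formed UTF-8 sequences (for example the two Latin-1 characters `Ã©`, bytes 0xC3 0xA9) would be classified as UTF-8. Such sequences are unlikely in real identifiers, but a compiler adopting this approach would still need to decide how to treat them.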
Original bug ID: 6697
Reporter: @whitequark
Status: acknowledged (set by @gasche on 2014-12-19T17:23:09Z)
Resolution: open
Priority: normal
Severity: feature
Target version: 4.07.0+dev/beta2/rc1/rc2
Category: lexing and parsing
Tags: patch
Related to: #6692 #6695
Monitored by: @gasche @hcarty
Bug description
As discussed in #6695, OCaml is currently unable to turn identifiers containing such characters into proper filenames anyway.
File attachments