Turn warning on using non-US-ASCII ISO-8859-1 characters in identifiers into error #6697

Closed
vicuna opened this issue Dec 8, 2014 · 8 comments

Comments

@vicuna

vicuna commented Dec 8, 2014

Original bug ID: 6697
Reporter: @whitequark
Status: acknowledged (set by @gasche on 2014-12-19T17:23:09Z)
Resolution: open
Priority: normal
Severity: feature
Target version: 4.07.0+dev/beta2/rc1/rc2
Category: lexing and parsing
Tags: patch
Related to: #6692 #6695
Monitored by: @gasche @hcarty

Bug description

As discussed in #6695, OCaml is currently unable to turn identifiers containing such characters into proper filenames anyway.

File attachments

@vicuna
Author

vicuna commented Dec 19, 2014

Comment author: @whitequark

@gasche, please take a look at the attached patch.

The patch simply removes the lexing rules corresponding to Latin-1 in identifiers, which will fall back to the rule handling all other invalid characters.
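
To make the effect of such a change concrete, here is a minimal sketch in plain OCaml. It is not the compiler's ocamllex source: the function names are hypothetical and the Latin-1 ranges are an assumption chosen for illustration. It shows an identifier-character test with and without the Latin-1 ranges; once they are switched off, accented bytes are treated like any other invalid character.

```ocaml
(* Hypothetical sketch, not the compiler's lexer. *)

(* ASCII characters allowed inside an identifier. *)
let is_ascii_ident_char c =
  match c with
  | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '_' | '\'' -> true
  | _ -> false

(* Latin-1 accented letters: bytes 0xC0-0xFF, minus the multiplication
   sign (0xD7) and the division sign (0xF7). Ranges assumed here. *)
let is_latin1_letter c =
  match c with
  | '\192' .. '\214' | '\216' .. '\246' | '\248' .. '\255' -> true
  | _ -> false

(* With [~latin1:true], accented letters are accepted as identifier
   characters; with [~latin1:false], they are rejected like any other
   non-identifier byte. *)
let is_ident_char ~latin1 c =
  is_ascii_ident_char c || (latin1 && is_latin1_letter c)
```

For example, `is_ident_char ~latin1:true '\233'` (the Latin-1 byte for é) holds, while `is_ident_char ~latin1:false '\233'` does not.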

@vicuna
Author

vicuna commented Dec 19, 2014

Comment author: @gasche

I think the patch is correct and would be in favor of merging, but I'll wait for a second positive opinion on this.

@vicuna
Author

vicuna commented Jan 13, 2015

Comment author: @damiendoligez

#6695 is not relevant for this: we have some users who use Latin-1 characters in their variable names, and the warning was introduced relatively recently. Is it already time to turn it into an error? What's the hurry anyway?

@vicuna
Author

vicuna commented Jan 13, 2015

Comment author: @whitequark

I've linked #6695 as a part of a bigger effort to remove latin1-specific code.

@github-actions

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 11, 2020
@whitequark
Member

Still an issue.

@xavierleroy
Contributor

I'm going to make a decision on this one. I believe we have users for accented letters in identifiers. Hence, we cannot remove support for Latin-1 accented letters in identifiers if we have no alternative to offer, e.g. some support for Unicode / UTF-8. So, we will not do what this issue suggests.

@whitequark
Member

That's a reasonable approach; when filing this I was hoping that this would be a first step to offering UTF-8 support by default, since Latin-1 accented identifiers are invalid in UTF-8. But now that I think about it again, I think it's possible to automatically and reliably disambiguate between a Latin-1 encoded source file and a UTF-8 encoded source file, so it's not necessary to forbid Latin-1 identifiers altogether.
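
A rough sketch of how that disambiguation could work, assuming a simple byte-level check. The type and function names are hypothetical, and nothing like this is confirmed to exist in the compiler. The idea is that a UTF-8 multi-byte sequence needs a lead byte followed by continuation bytes in the range 0x80-0xBF, whereas a Latin-1 accented letter (0xC0-0xFF) inside an identifier is normally followed by an ASCII byte or another accented letter, neither of which is a continuation byte.

```ocaml
(* Hypothetical sketch: guess a source file's encoding from its bytes. *)
type encoding = Ascii | Utf8 | Latin1_or_other

(* Is [c] a UTF-8 continuation byte (0b10xxxxxx)? *)
let is_cont c = Char.code c land 0xC0 = 0x80

(* Simplified structural check: it does not reject every overlong or
   surrogate encoding, which a full UTF-8 validator would. *)
let guess_encoding (s : string) : encoding =
  let n = String.length s in
  (* [all_cont i k]: bytes i .. i+k-1 exist and are continuation bytes. *)
  let rec all_cont i k =
    k = 0 || (i < n && is_cont s.[i] && all_cont (i + 1) (k - 1))
  in
  let rec scan i seen_multibyte =
    if i >= n then (if seen_multibyte then Utf8 else Ascii)
    else
      let b = Char.code s.[i] in
      if b < 0x80 then scan (i + 1) seen_multibyte
      else
        (* Number of continuation bytes expected after this lead byte. *)
        let k =
          if b >= 0xC2 && b <= 0xDF then 1
          else if b >= 0xE0 && b <= 0xEF then 2
          else if b >= 0xF0 && b <= 0xF4 then 3
          else 0 (* stray continuation byte or invalid lead byte *)
        in
        if k > 0 && all_cont (i + 1) k then scan (i + 1 + k) true
        else Latin1_or_other
  in
  scan 0 false
```

With this sketch, `guess_encoding "let \xC3\xA9t\xC3\xA9 = 1"` (UTF-8 bytes) gives `Utf8`, while `guess_encoding "let \xE9t\xE9 = 1"` (Latin-1 bytes) gives `Latin1_or_other`, because 0xE9 is followed by the ASCII byte `t` rather than a continuation byte.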
