no more identifiers with accented characters #5348
Comments
Comment author: poirriez Same with OCaml 3.12.1.
Comment author: @xclerc Well, I am not sure it is an OCaml issue. You can change the terminal encoding in the "Advanced" pane.
Comment author: gerd Maybe the real issue is that ocaml should support UTF-8 identifiers (instead of Latin-1). Almost the whole world has switched to UTF-8 in the meantime (e.g. many Linux distros now use it as the default), and it is becoming more and more painful that ocaml is so old-fashioned. The full solution is complicated to implement, though: ocamllex would have to be changed so that it can deal with multi-byte encodings. But as a tiny step in this direction, ocaml could at least accept UTF-8 as the external encoding while keeping Latin-1 as the internal encoding. That would mean a recoding step for every identifier that is read or written.
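The "UTF-8 outside, Latin-1 inside" recoding step described above can be sketched as a pair of conversion functions. This is a minimal illustration, not compiler code: the names `latin1_of_utf8` and `utf8_of_latin1` are hypothetical, and it assumes OCaml 4.14 or later for `String.get_utf_8_uchar`.

```ocaml
(* Hypothetical helper: recode a UTF-8 identifier into Latin-1 (one byte per
   character). Returns None on malformed UTF-8 or on any code point above
   U+00FF, which Latin-1 cannot represent. Needs OCaml >= 4.14. *)
let latin1_of_utf8 (s : string) : string option =
  let buf = Buffer.create (String.length s) in
  let rec loop i =
    if i >= String.length s then Some (Buffer.contents buf)
    else
      let d = String.get_utf_8_uchar s i in
      if not (Uchar.utf_decode_is_valid d) then None
      else
        let cp = Uchar.to_int (Uchar.utf_decode_uchar d) in
        if cp > 0xFF then None
        else begin
          Buffer.add_char buf (Char.chr cp);
          loop (i + Uchar.utf_decode_length d)
        end
  in
  loop 0

(* Hypothetical inverse: each Latin-1 byte is the code point of the same
   value, re-encoded as UTF-8 when an identifier is written back out. *)
let utf8_of_latin1 (s : string) : string =
  let buf = Buffer.create (2 * String.length s) in
  String.iter (fun c -> Buffer.add_utf_8_uchar buf (Uchar.of_int (Char.code c))) s;
  Buffer.contents buf

let () =
  (* "carré" as UTF-8 is 6 bytes; its Latin-1 form is 5 bytes. *)
  match latin1_of_utf8 "carr\xc3\xa9" with
  | Some l -> Printf.printf "%d bytes\n" (String.length l)
  | None -> print_endline "not representable in Latin-1"
```

The cost gerd mentions is visible here: every identifier passes through a decode/encode loop on the way in and out, and identifiers outside Latin-1 simply cannot be represented internally.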
Comment author: @damiendoligez The real solution is to disallow accented letters in identifiers and accept only ASCII letters, but of course we cannot do that without breaking some existing programs.
Comment author: @alainfrisch Damien: What about a warning that reports accented letters in identifiers as a deprecated feature? Gerd: It would be weird to accept UTF-8 identifiers while still parsing string literals as sequences of bytes (i.e. if we assume the source code is UTF-8-encoded, String.length on a string literal would not return the length of the literal seen as a sequence of Unicode code points, unless you also change the semantics of strings, which I don't think you are proposing).
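Alain's point about `String.length` can be shown directly. The sketch below writes "carré" as its UTF-8 bytes (what a UTF-8-encoded source file would contain) and compares the byte count with a code-point count. The helper `utf8_length` is hypothetical and assumes its input is valid UTF-8.

```ocaml
(* "carré" written as its UTF-8 bytes, i.e. what a UTF-8 source file holds. *)
let s = "carr\xc3\xa9"

(* Hypothetical helper: count code points in a string assumed to be valid
   UTF-8. Every byte that is not a continuation byte (10xxxxxx) starts a
   new code point. *)
let utf8_length (str : string) : int =
  let n = ref 0 in
  String.iter (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n) str;
  !n

let () =
  (* String.length counts bytes; utf8_length counts code points. *)
  Printf.printf "bytes = %d, code points = %d\n" (String.length s) (utf8_length s)
  (* prints "bytes = 6, code points = 5" *)
```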
Comment author: gerd Alain: I agree that a complete solution would also include a Unicode version of "string", perhaps called ustring, with literals like U"xyz". I don't think we should redefine "string", because there is also a need for byte arrays, and we would run into endless compatibility problems. So having both string and ustring would be ideal. I see that there are currently not enough resources to get there, and the question is how many of the pieces we can implement nevertheless. That could also mean only deprecating accented letters for now. By the way, ustring would not need much, provided we accept that ocaml offers only basic Unicode support (string literals, one possible representation (ustring = int array), basic input/output, ocamllex) and leave the rest (alternate representations, character classes, transformations, ...) to add-on libraries like Camomile. In any case, I hope there is at least consensus that Unicode support is essential nowadays. The world is changing, and the fact that Latin-1 is still sufficient for most languages has become irrelevant.
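The "ustring = int array" representation gerd mentions could look roughly like this. It is a sketch under stated assumptions: `ustring`, `ustring_of_utf8` and `utf8_of_ustring` are hypothetical names (OCaml has no such type), and the decoder relies on the standard library's UTF-8 functions from OCaml 4.14 or later.

```ocaml
(* Hypothetical ustring: one int per Unicode code point. *)
type ustring = int array

(* Decode UTF-8 into code points. For malformed input the stdlib decoder
   yields U+FFFD (Uchar.rep) and a length of at least 1, so the loop always
   makes progress. Needs OCaml >= 4.14. *)
let ustring_of_utf8 (s : string) : ustring =
  let rec go i acc =
    if i >= String.length s then Array.of_list (List.rev acc)
    else
      let d = String.get_utf_8_uchar s i in
      go (i + Uchar.utf_decode_length d)
        (Uchar.to_int (Uchar.utf_decode_uchar d) :: acc)
  in
  go 0 []

(* Encode back to UTF-8 for output. Uchar.of_int raises Invalid_argument
   if an element is not a valid Unicode scalar value. *)
let utf8_of_ustring (u : ustring) : string =
  let buf = Buffer.create (Array.length u) in
  Array.iter (fun cp -> Buffer.add_utf_8_uchar buf (Uchar.of_int cp)) u;
  Buffer.contents buf
```

With this representation, `Array.length (ustring_of_utf8 "carr\xc3\xa9")` is 5, the length a reader would expect. The price is that each character takes a full machine word of memory.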
Comment author: @damiendoligez Alain: I would like such a warning, but I'm not sure we have a consensus among OCaml developers at this point.
Comment author: @zoggy +1 for Gerd's proposal on Unicode support in the ocaml distro ;-)
Comment author: @damiendoligez What we have here are several feature wishes:
Comment author: @ygrek Concerning point 3: what about adding a Bytes module equal to the current String, and providing a compiler switch that exposes only read-only access to the String module? That way, interested people could start migrating their code right away.
Comment author: @dbuenzli In my opinion, a first good step would be to
The main advantage of this is that it allows to write UTF-8
To me UTF-8 identifiers are more a curse than a benefit because I have written many programs that deal with unicode by just UTF-8
Regarding having unicode string support in the distribution. I'd
Comment author: @damiendoligez Note: 4.01.0 adds a warning that flags Latin-1 characters in identifiers as a deprecated feature.
Comment author: @damiendoligez Note: 4.02.0 adds a type for mutable byte arrays, bytes (module Bytes).
Comment author: @damiendoligez What's left of the discussion is a wish for Unicode string support. Unicode support is best left to external libraries, and several such libraries already exist.
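The string/bytes split that ygrek proposed and 4.02.0 introduced can be illustrated with the standard `Bytes` conversions. This is a minimal sketch: mutation happens on a `bytes` value, and the conversions copy, so the original `string` is never modified.

```ocaml
(* A string is treated as immutable; Bytes.of_string makes a fresh,
   mutable copy of it. *)
let original = "carre"
let b = Bytes.of_string original

(* Mutate the copy in place: replace the final 'e' with the Latin-1 byte for é. *)
let () = Bytes.set b 4 '\xe9'

(* Bytes.to_string takes an immutable snapshot of the current contents. *)
let modified = Bytes.to_string b

let () =
  Printf.printf "%s is unchanged; modified has %d bytes\n"
    original (String.length modified)
  (* prints "carre is unchanged; modified has 5 bytes" *)
```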
Original bug ID: 5348
Reporter: poirriez
Status: resolved (set by @damiendoligez on 2017-03-03T14:44:53Z)
Resolution: not a bug
Priority: normal
Severity: feature
Version: 3.12.0
Category: ~DO NOT USE (was: OCaml general)
Related to: #6694
Monitored by: poirriez mehdi furuse @glondu @dbuenzli
Bug description
I just updated to 3.12.0 and, on Mac OS X 10.6.8, accented characters are no longer usable:
$ ocaml
Objective Caml version 3.12.0
let carr?? x = x*x;;
Error: Illegal character (?)
The ?? was é
And under Emacs:
Characters 9-10:
let carré n = n * n;;
^
Error: Illegal character (\251)
Vincent
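The Emacs transcript is consistent with the source being UTF-8 while the lexer expects Latin-1. In UTF-8, é is the two bytes 0xC3 0xA9. The sketch below assumes the 3.12 lexer's identifier-character class is ASCII letters, digits, `_`, `'`, and the Latin-1 letters `\192`-`\214`, `\216`-`\246`, `\248`-`\255`; `is_latin1_identchar` is a hypothetical re-creation, not the lexer itself. Under that assumption, 0xC3 (Ã in Latin-1) is accepted as a letter, while 0xA9 (© in Latin-1) is not. The scan therefore stops at offset 9, matching "Characters 9-10", and 0xA9 written in octal is `\251`.

```ocaml
(* Hypothetical re-creation of the Latin-1 identifier-character class
   (assumed to match the 3.12 lexer; check lexer.mll for the exact set). *)
let is_latin1_identchar c =
  match c with
  | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '_' | '\''
  | '\192' .. '\214' | '\216' .. '\246' | '\248' .. '\255' -> true
  | _ -> false

(* The phrase as typed in a UTF-8 buffer: é is the two bytes C3 A9. *)
let src = "let carr\xc3\xa9 n = n * n;;"

(* Offset of the first byte that cannot continue the identifier, which
   starts right after "let " (offset 4). *)
let first_illegal =
  let rec scan i =
    if i < String.length src && is_latin1_identchar src.[i] then scan (i + 1)
    else i
  in
  scan 4

let () =
  Printf.printf "character %d, byte \\%03o (octal)\n"
    first_illegal (Char.code src.[first_illegal])
  (* prints "character 9, byte \251 (octal)" *)
```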