The accented characters in strings are automatically uppercased #5732
Comments
Comment author: Ted The two characters I have found for which the problem does not appear are these: http://fr.wikipedia.org/wiki/%E1%BA%80 (the page does not exist on the English Wikipedia)
Comment author: @protz From what OCaml prints, your Ô character uses two bytes, so I guess you're inputting UTF-8. OCaml still lives in the previous millennium and is not UTF-8-aware, so I assume these uppercase and lowercase routines only work properly on Latin-1-encoded strings, unfortunately :). I suggest you take a look at the Batteries project. It has a BatUTF8 module that provides some UTF-8 handling routines. If you need more advanced routines, Camomile is the Unicode library for OCaml.
Comment author: @protz OCaml version 4.00.0 String.length "Ô";;
(If you get the same results on your machine, then you're inputting UTF-8.)
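A minimal sketch of the check suggested above, assuming the input really is UTF-8. The string is written with explicit byte escapes so the result does not depend on the terminal or source-file encoding; `byte_codes` is a hypothetical helper introduced only for this illustration.

```ocaml
(* "Ô" (U+00D4) written as its two UTF-8 bytes, 0xC3 0x94. *)
let o_circumflex_utf8 = "\195\148"

(* Hypothetical helper: decimal value of every byte in a string. *)
let byte_codes (s : string) : int list =
  List.init (String.length s) (fun i -> Char.code s.[i])

let () =
  (* String.length counts bytes, not characters. *)
  Printf.printf "length = %d\n" (String.length o_circumflex_utf8);
  List.iter (Printf.printf "%d ") (byte_codes o_circumflex_utf8);
  print_newline ()
```

A length of 2 for a single visible character is the sign that the string holds a multi-byte encoding rather than Latin-1.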
Comment author: Ted It looks like I am inputting UTF-8 then. It does not surprise me that there are such encoding problems, but I really do not get why I got things like: String.lowercase "é";;
"é";;
Couldn't String.lowercase just ignore accented characters when it does not recognize them? As I do not need to actually print anything, the strange output does not bother me much, but the strange behaviour of String.lowercase does.
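For readers on OCaml 4.03 or later, the standard library's `String.lowercase_ascii` and `String.uppercase_ascii` behave roughly as asked here: they change only ASCII letters and leave every other byte alone. A small sketch, with the UTF-8 bytes of "é" spelled out explicitly:

```ocaml
(* "é" (U+00E9) as its two UTF-8 bytes, 0xC3 0xA9. *)
let e_acute_utf8 = "\195\169"

let () =
  (* Bytes outside the ASCII range are returned unchanged. *)
  assert (String.lowercase_ascii e_acute_utf8 = e_acute_utf8);
  (* ASCII letters around them are still converted: "zéphyr" -> "ZéPHYR". *)
  assert (String.uppercase_ascii "z\195\169phyr" = "Z\195\169PHYR")
```

This keeps UTF-8 sequences intact, but it does not case-convert accented letters; that still needs a Unicode-aware library.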
Comment author: @dbuenzli But it does recognize them: the String module interprets strings as Latin-1 encoded. The behaviour is correct. In Latin-1, \195\169 is the sequence Ã©, which String.lowercase correctly maps to \227\169, the sequence ã©. Consult the table on this page: http://en.wikipedia.org/wiki/ISO_8859-1
Comment author: Ted Aah, I get it. Well, sorry for the "wrong" bug report, then.
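The mapping described above can be reproduced with a small sketch. `latin1_lowercase_char` and `latin1_lowercase` are hypothetical re-implementations written for illustration, not the standard library's code; they assume the ISO 8859-1 layout, in which the uppercase letters at codes 192 to 214 and 216 to 222 have their lowercase forms 32 positions higher.

```ocaml
(* Hypothetical Latin-1 lowercase for a single byte.
   Code 215 (the multiplication sign) is deliberately excluded. *)
let latin1_lowercase_char (c : char) : char =
  match c with
  | 'A' .. 'Z' | '\192' .. '\214' | '\216' .. '\222' ->
    Char.chr (Char.code c + 32)
  | _ -> c

(* Apply the byte-wise mapping to a whole string. *)
let latin1_lowercase (s : string) : string =
  String.map latin1_lowercase_char s

let () =
  (* UTF-8 "é" is the bytes 195 169, which Latin-1 reads as "Ã©". *)
  let lowered = latin1_lowercase "\195\169" in
  String.iter (fun c -> Printf.printf "\\%d" (Char.code c)) lowered;
  print_newline ()  (* prints \227\169 *)
```

The same table explains why uppercasing appears to do nothing on this input: byte 195 is already the uppercase Ã, and byte 169 (©) is not a letter.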
Original bug ID: 5732
Reporter: Ted
Assigned to: @protz
Status: closed (set by @xavierleroy on 2016-12-07T10:37:03Z)
Resolution: not a bug
Priority: normal
Severity: minor
Platform: Laptop
OS: Debian Unstable
OS Version: 3.2.0-3-amd64
Version: 3.12.1
Category: ~DO NOT USE (was: OCaml general)
Child of: #6694
Bug description
(I have also reproduced this bug with OCaml 3.10.)
A little example is worth a long speech:
$ ocaml
Objective Caml version 3.12.1
"Ô, mon brûlant zéphyr doré";;
String.lowercase "Ô, mon brûlant zéphyr doré";;
String.uppercase "Ô, mon brûlant zéphyr doré";;
I don't know if the encoding problem is normal, but I am pretty sure that this behaviour is not: String.uppercase does nothing, which means that the system automatically transforms the letter "é" into "É", etc. This bug is present for many accented letters:
String.uppercase "éèàâôû?ãõëäöÿçùò?" = "éèàâôû?ãõëäöÿçùò?";;
but, quite surprisingly, not for every one of them:
String.uppercase "?" = "?";;
String.uppercase "?" = "?";;
This problem happens even when I do not use my usual alias (ocaml="rlwrap ocaml") or my usual shell (zsh), and it also occurs when compiling OCaml code with ocamlc or ocamlopt.