| ID | Project | Category | View Status | Date Submitted | Last Update | |||||||
| 0005732 | OCaml | OCaml general | public | 2012-08-17 11:25 | 2012-08-17 13:06 | |||||||
| Reporter | Ted | |||||||||||
| Assigned To | protz | |||||||||||
| Priority | normal | Severity | minor | Reproducibility | always | |||||||
| Status | resolved | Resolution | no change required | |||||||||
| Platform | Laptop | OS | Debian Unstable | OS Version | 3.2.0-3-amd64 | |||||||
| Product Version | 3.12.1 | |||||||||||
| Target Version | Fixed in Version | |||||||||||
| Summary | 0005732: The accented characters in strings are automatically uppercased | |||||||||||
| Description | (I have also reproduced this bug with OCaml 3.10.) A little example is worth a long speech:

    $ ocaml
            Objective Caml version 3.12.1
    # "Ô, mon brûlant zéphyr doré";;
    - : string = "\195\148, mon br\195\187lant z\195\169phyr dor\195\169"
    # String.lowercase "Ô, mon brûlant zéphyr doré";;
    - : string = "\227\148, mon br\227\187lant z\227\169phyr dor\227\169"
    # String.uppercase "Ô, mon brûlant zéphyr doré";;
    - : string = "\195\148, MON BR\195\187LANT Z\195\169PHYR DOR\195\169"

I don't know whether the encoding problem is normal, but I am pretty sure that this behaviour is not: String.uppercase does nothing to the accented letters, which means that the system automatically transforms the letter "é" into "É", etc. This bug is present for many accented letters:

    # String.uppercase "éèàâôû?ãõëäöÿçùò?" = "éèàâôû?ãõëäöÿçùò?";;
    - : bool = true

but, quite surprisingly, not for every one of them:

    # String.uppercase "?" = "?";;
    - : bool = false
    # String.uppercase "?" = "?";;
    - : bool = false

This problem happens even when I do not use my usual alias (ocaml="rlwrap ocaml") or my usual shell (zsh), and it also occurs when compiling OCaml code with ocamlc or ocamlopt. |
| Tags | No tags attached. | |||||||||||
| Attached Files | ||||||||||||
Notes |
|
|
(0007950) Ted (reporter) 2012-08-17 11:28 |
The two characters I have found for which the problem does not appear are these: http://fr.wikipedia.org/wiki/%E1%BA%80 (no article on the English Wikipedia) and http://en.wikipedia.org/wiki/%E1%BA%BC |
|
(0007951) protz (manager) 2012-08-17 11:34 |
From what OCaml prints, your Ô character uses two bytes, so I guess you're inputting UTF-8. OCaml still lives in the previous millennium and is not UTF-8-aware, so I assume these uppercase and lowercase routines only work properly on Latin-1-encoded strings, unfortunately :). I suggest you take a look at the Batteries project. It has a BatUTF8 module that provides some UTF-8 handling routines. If you need more advanced routines, Camomile is the Unicode library for OCaml. |
|
(0007952) protz (manager) 2012-08-17 11:35 |
    OCaml version 4.00.0
    # String.length "Ô";;
    - : int = 2

(If you get the same result on your machine, then you're inputting UTF-8.) |
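The result of 2 comes from `String.length` counting bytes rather than characters. A minimal sketch of the difference, assuming the input is valid UTF-8 (the helper name `utf8_length` is hypothetical, not part of the standard library): it counts code points by skipping continuation bytes, i.e. bytes of the form 10xxxxxx.

```ocaml
(* Hypothetical helper: counts Unicode code points in a string that is
   ASSUMED to be valid UTF-8. Continuation bytes (0b10xxxxxx) are skipped,
   so only the first byte of each encoded character is counted. *)
let utf8_length (s : string) : int =
  let n = ref 0 in
  String.iter
    (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n)
    s;
  !n

let () =
  (* "\195\148" is the UTF-8 encoding of the letter O with circumflex
     (U+00D4), written with decimal escapes so the example does not
     depend on the encoding of the source file. *)
  let o_circ = "\195\148" in
  Printf.printf "bytes = %d, code points = %d\n"
    (String.length o_circ) (utf8_length o_circ)
```

The decimal escapes keep the example independent of the terminal or editor encoding, which is exactly the variable at play in this report.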
|
(0007953) Ted (reporter) 2012-08-17 11:45 |
It looks like I am inputting UTF-8, then. It does not surprise me that there are such encoding problems, but I really do not get why I got things like:

    # String.lowercase "é";;
    - : string = "\227\169"
    # "é";;
    - : string = "\195\169"

Couldn't String.lowercase just ignore accented letters when it does not recognize them? As I do not need to actually print anything, the strange output does not bother me much, but the strange behaviour of String.lowercase does. |
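An ASCII-only conversion of the kind asked for here can be sketched as follows. The function `lowercase_ascii_only` is a hypothetical illustration: it changes only the bytes A..Z and leaves everything else, including every byte of a UTF-8 multi-byte sequence, untouched. The standard library gained an equivalent, `String.lowercase_ascii`, in OCaml 4.03, after this report was filed; the second assertion assumes a compiler at least that recent.

```ocaml
(* Hypothetical helper: lowercases ASCII letters only. Bytes >= 128 are
   never modified, so UTF-8 encoded accented letters pass through as-is
   (they are not case-converted, but they are not corrupted either). *)
let lowercase_ascii_only (s : string) : string =
  String.map
    (fun c -> if c >= 'A' && c <= 'Z' then Char.chr (Char.code c + 32) else c)
    s

let () =
  (* UTF-8 bytes of the start of the sentence from the report, with the
     ASCII letters in upper case. *)
  let s = "\195\148, MON BR\195\187LANT" in
  assert (lowercase_ascii_only s = "\195\148, mon br\195\187lant");
  (* Requires OCaml >= 4.03. *)
  assert (lowercase_ascii_only s = String.lowercase_ascii s)
```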
|
(0007954) dbuenzli (reporter) 2012-08-17 12:34 edited on: 2012-08-17 12:35 |
But it does recognize them: the String module interprets strings as Latin-1 encoded. The behaviour is correct. In Latin-1, \195\169 is the sequence Ã©, which String.lowercase correctly maps to \227\169, the sequence ã© (String.uppercase maps it back). Consult the table on this page: http://en.wikipedia.org/wiki/ISO_8859-1 |
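The Latin-1 interpretation can be reproduced byte for byte. The sketch below is an assumption-laden re-creation, not the standard library's code: the names `latin1_lowercase` and `latin1_uppercase` are hypothetical, and the ranges are taken from the ISO 8859-1 table (upper-case letters at 192..214 and 216..222, lower-case letters 32 positions higher, with the multiplication and division signs at 215 and 247 excluded).

```ocaml
(* Sketch of Latin-1 case mapping on single bytes. Written out by hand
   so that it does not rely on the Latin-1-aware String.lowercase and
   String.uppercase discussed in this report. *)
let latin1_lowercase_char c =
  if (c >= 'A' && c <= 'Z')
  || (c >= '\192' && c <= '\214')
  || (c >= '\216' && c <= '\222')
  then Char.chr (Char.code c + 32)
  else c

let latin1_uppercase_char c =
  if (c >= 'a' && c <= 'z')
  || (c >= '\224' && c <= '\246')
  || (c >= '\248' && c <= '\254')
  then Char.chr (Char.code c - 32)
  else c

let latin1_lowercase = String.map latin1_lowercase_char
let latin1_uppercase = String.map latin1_uppercase_char

let () =
  (* UTF-8 bytes of e with acute accent; read as Latin-1 they are the
     two characters A-tilde (195) and the copyright sign (169). *)
  let e_acute_utf8 = "\195\169" in
  (* Byte 195 is an upper-case Latin-1 letter, so lowercasing changes it. *)
  assert (latin1_lowercase e_acute_utf8 = "\227\169");
  (* Uppercasing finds nothing to change, hence the report's observation. *)
  assert (latin1_uppercase e_acute_utf8 = e_acute_utf8);
  assert (latin1_uppercase "\227\169" = "\195\169")
```

This also accounts for the two exceptions in the report: their UTF-8 encodings start with byte 225 (per the %E1 in the linked URLs), which is a lower-case letter in Latin-1 and is therefore changed by the upper-case mapping.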
|
(0007955) Ted (reporter) 2012-08-17 13:06 |
Aah, I get it. Well, sorry for the "wrong" bug report, then. |
Issue History |
|||
| Date Modified | Username | Field | Change |
| 2012-08-17 11:25 | Ted | New Issue | |
| 2012-08-17 11:28 | Ted | Note Added: 0007950 | |
| 2012-08-17 11:34 | protz | Note Added: 0007951 | |
| 2012-08-17 11:35 | protz | Note Added: 0007952 | |
| 2012-08-17 11:35 | protz | Status | new => resolved |
| 2012-08-17 11:35 | protz | Resolution | open => no change required |
| 2012-08-17 11:35 | protz | Assigned To | => protz |
| 2012-08-17 11:45 | Ted | Note Added: 0007953 | |
| 2012-08-17 12:34 | dbuenzli | Note Added: 0007954 | |
| 2012-08-17 12:35 | dbuenzli | Note Edited: 0007954 | View Revisions |
| 2012-08-17 13:06 | Ted | Note Added: 0007955 | |