Mantis Bug Tracker

ID: 0007737
Project: OCaml
Category: compiler driver
View Status: public
Date Submitted: 2018-02-20 22:43
Last Update: 2018-02-21 16:46
Assigned To: octachron
Status: resolved
Resolution: no change required
Platform:
OS:
OS Version:
Product Version: 4.06.0
Target Version:
Fixed in Version:
Summary: 0007737: A few character codes do not match in ISO 8859-1 with an azerty keyboard
Description: Reference manual page 110. It is written: "The current implementation interprets character codes between 128 and 255 following the ISO 8859-1 standard."
But the given codes are wrong after code 126 (7Eh).
# int_of_char('£');;
- : int = 156
instead of 163 (A3h)

# int_of_char('é');;
- : int = 130
instead of 233 (E9h)

... and so on.
The result is the same if I open the module Char.
# open Char;;
# code 'é';;
- : int = 130 (* instead of 233 *)
# code '£';;
- : int = 156 (* instead of 163 *)
... and so on.
Tags: No tags attached.
Attached Files

- Relationships
related to 0007740 (resolved): ISO-Latin 1 characters with Linux does not work version 4.02.3-9 for Debian

-  Notes
octachron (developer)
2018-02-20 23:02

The encoding error is on your side: you are not using the ISO 8859-1 character encoding and may be using the code page 437 which does encode 'é' and '£' as 130 and 156 respectively. In other words, when you are typing 'é', you are indeed sending '\130' to the REPL. May I encourage you to address your further inquiries to [^] ?
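
To illustrate the mismatch described in this note, here is a minimal OCaml sketch. It assumes the console is sending code page 437 bytes. The names `cp437_to_latin1` and `latin1_of_cp437` are hypothetical, and the table covers only the two characters from the report, not the full code page.

```ocaml
(* Byte values of the two characters from the report, as pairs of
   (code page 437 byte, ISO 8859-1 code). Illustrative only: this is
   not a complete code page 437 mapping. *)
let cp437_to_latin1 =
  [ (130, 233);   (* e-acute *)
    (156, 163) ]  (* pound sign *)

(* Translate a byte read from a code page 437 console into its
   ISO 8859-1 code. Bytes missing from the table are returned unchanged. *)
let latin1_of_cp437 (c : char) : char =
  match List.assoc_opt (Char.code c) cp437_to_latin1 with
  | Some code -> Char.chr code
  | None -> c

let () =
  Printf.printf "%d\n" (Char.code (latin1_of_cp437 '\130'));
  Printf.printf "%d\n" (Char.code (latin1_of_cp437 '\156'))
```

The values 130 and 156 reported in the description are what `Char.code` returns for the bytes the console delivered. The sketch only shows which ISO 8859-1 codes those bytes would correspond to.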
vanto (viewer)
2018-02-20 23:30

I do not agree.
With Haskell, when I type 'é', the result is:
Prelude> import Data.Char
Prelude Data.Char> ord 'é'
Prelude Data.Char>

With Caml Light version 0.74, the result is:
> Caml Light version 0.74

#int_of_char (`é`);;
- : int = 233

Reference manual of Caml Light, page 12 (book by Xavier Leroy and Pierre Weis). It is written:
"Letters include at least the 52 uppercase and lowercase letters of the ASCII character set. The implementation may allow, among the letters, characters outside the ASCII character set, that is, with a code greater than or equal to 128, such as the accented letters in the ISO 8859-1 character set."

Are you convinced of what I say?
octachron (developer)
2018-02-21 01:47

Am I wrong to think that you are using a Windows terminal and that you are comparing with the graphical version of Caml Light? If so, please check the code page of your terminal (probably 850). Otherwise, it would be helpful if you detailed your system and terminal settings.
vanto (viewer)
2018-02-21 12:14

Do not focus on Caml Light.
When I use Haskell in a Windows terminal, it gives me a good answer. See above. When I use OCaml in the same terminal, it gives me a wrong answer. It responds with an ASCII code and not with an ISO 8859-1 code.
It is obvious that there is a bug in OCaml in the character encoding.
I do not work with Windows 3.0
vanto (viewer)
2018-02-21 14:09

For Code Pages, please look at these two addresses. [^] [^]
dra (developer)
2018-02-21 16:46

It's not obvious that there's a bug in OCaml, because there isn't one - you'd do better to answer the questions asked!

As @octachron notes, the issue is not having code page 1252 selected (in the transcript below, the file loaded with `#use` contains `let c = 'é';;` where the character is definitely \233).

Active code page: 437

let c = '?';;
        OCaml version 4.05.0

# #use "";;
val c : char = '\233'
# let c2 = 'é';;
val c2 : char = '\130'
# #quit;;

C:\Users\DRA>chcp 1252
Active code page: 1252

let c = 'é';;
        OCaml version 4.05.0

# #use "";;
val c : char = '\233'
# let c2 = 'é';;
val c2 : char = '\233'
# #quit;;

The Caml Light graphical application will use Code Page 1252, and hence "work". It's more time than I can be bothered to invest to spin up an x86 Windows box to be able to run the binary distribution's CAML.EXE, but I would put considerable metamoney on the behaviour of Caml Light run from a Console being the same as OCaml.

Haskell in interactive mode uses Haskeline, which directly calls ReadConsoleW, thus bypassing code pages completely. OCaml, via ReadFile, uses ReadConsoleA, which performs code page conversion on the input. As you can see from my example, if you supply the correct input (e.g. via a file) then it performs correctly - note that even the console itself cannot display the character correctly when it bounces through 1252->437/850 conversion and back. Note that Haskell's Char type (the one Data.Char operates on) is equivalent to OCaml's Uchar.t rather than char.
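
As a rough sketch of the last two points: decimal escapes name the byte directly, so they give the same `char` whichever code page the console uses, and a `char` read as ISO 8859-1 can be widened to a `Uchar.t`. The names `e_acute`, `pound` and `scalar_of_latin1` are hypothetical, chosen for this illustration only.

```ocaml
(* Decimal escapes name the byte itself, so these values do not depend
   on the code page of the console the source was typed in. *)
let e_acute = '\233'  (* ISO 8859-1 code for e-acute *)
let pound = '\163'    (* ISO 8859-1 code for the pound sign *)

(* Uchar.of_char treats the char as a Latin-1 character, so the Unicode
   scalar value it yields equals the char code. *)
let scalar_of_latin1 (c : char) : int = Uchar.to_int (Uchar.of_char c)

let () =
  Printf.printf "%d %d\n" (Char.code e_acute) (scalar_of_latin1 e_acute);
  Printf.printf "%d %d\n" (Char.code pound) (scalar_of_latin1 pound)
```

This does not change how the toplevel reads console input. It only avoids depending on it for character literals.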

- Issue History
Date Modified Username Field Change
2018-02-20 22:43 vanto New Issue
2018-02-20 23:02 octachron Note Added: 0018891
2018-02-20 23:02 octachron Status new => resolved
2018-02-20 23:02 octachron Resolution open => no change required
2018-02-20 23:02 octachron Assigned To => octachron
2018-02-20 23:30 vanto Note Added: 0018892
2018-02-21 01:47 octachron Note Added: 0018893
2018-02-21 12:14 vanto Note Added: 0018897
2018-02-21 14:09 vanto Note Added: 0018898
2018-02-21 15:52 xleroy Relationship added related to 0007740
2018-02-21 16:46 dra Note Added: 0018900
