
A few character codes do not match in ISO 8859-1 with an azerty keyboard #7737

Closed
vicuna opened this issue Feb 20, 2018 · 6 comments

@vicuna

vicuna commented Feb 20, 2018

Original bug ID: 7737
Reporter: vanto
Assigned to: @Octachron
Status: resolved (set by @Octachron on 2018-02-20T22:02:48Z)
Resolution: not a bug
Priority: normal
Severity: minor
Version: 4.06.0
Category: compiler driver
Related to: #7740

Bug description

The reference manual, page 110, states: "The current implementation interprets character codes between 128 and 255 following the ISO 8859-1 standard."
But the codes returned are wrong above code 126 (7Eh).
Examples:

int_of_char('£');;
- : int = 156   (instead of 163, A3h)

int_of_char('é');;
- : int = 130   (instead of 233, E9h)

... and so on.
The result is the same if I open the module Char.

open Char;;
code 'é';;
- : int = 130 (* instead of 233 *)
code '£';;
- : int = 156 (* instead of 163 *)
... and so on.
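One way to take the keyboard and terminal out of the picture is to write the characters as decimal escapes, which the OCaml lexer turns directly into byte values. A minimal sketch (the names e_acute and pound are illustrative only):

```ocaml
(* Decimal escapes are read as raw byte values, so no console code page is involved *)
let e_acute = '\233'  (* 'é' in ISO 8859-1 *)
let pound = '\163'    (* '£' in ISO 8859-1 *)

let () =
  (* int_of_char and Char.code are the same function *)
  Printf.printf "%d %d\n" (Char.code e_acute) (Char.code pound)
  (* prints: 233 163 *)
```

If these print 233 and 163 while typing the characters interactively does not, the difference lies in the bytes the terminal sends, not in how OCaml interprets them.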
@vicuna
Author

vicuna commented Feb 20, 2018

Comment author: @Octachron

The encoding error is on your side: you are not using the ISO 8859-1 character encoding and may be using the code page 437 which does encode 'é' and '£' as 130 and 156 respectively. In other words, when you are typing 'é', you are indeed sending '\130' to the REPL. May I encourage you to address your further inquiries to https://discuss.ocaml.org/ ?
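To illustrate the explanation, here is a sketch of a partial code page 437 to ISO 8859-1 translation. The function cp437_to_latin1 is hypothetical and its table covers only the two characters from this report; in CP437, 'é' is byte 130 and '£' is byte 156, which are exactly the values the reporter observed.

```ocaml
(* Hypothetical helper: map a CP437 byte to its ISO 8859-1 equivalent.
   Only the ASCII range and the two characters from this report are covered. *)
let cp437_to_latin1 (c : char) : char option =
  match Char.code c with
  | n when n < 128 -> Some c  (* ASCII is identical in both encodings *)
  | 130 -> Some '\233'        (* 'é' *)
  | 156 -> Some '\163'        (* '£' *)
  | _ -> None                 (* outside this partial table *)
```

Code page 850 places 'é' and '£' at the same two byte values, so the same mapping would apply to it for these characters.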

@vicuna vicuna closed this as completed Feb 20, 2018
@vicuna
Author

vicuna commented Feb 20, 2018

Comment author: vanto

I do not agree.
With Haskell, when I type 'é', the result is:
Prelude> import Data.Char
Prelude Data.Char> ord 'é'
233
Prelude Data.Char>

With Caml Light version 0.74, the result is:

        Caml Light version 0.74

#int_of_char (`é`);;
- : int = 233

From the Caml Light reference manual, page 12 (a book by Xavier Leroy and Pierre Weis):
"Letters include at least the 52 uppercase and lowercase letters of the ASCII character set. The implementation may also accept as letters characters outside the ASCII character set, that is, with a code greater than or equal to 128, such as the accented letters of the ISO 8859-1 set."

Does that convince you?

@vicuna
Author

vicuna commented Feb 21, 2018

Comment author: @Octachron

Am I wrong to think that you are using a Windows terminal and comparing with the graphical version of Caml Light? If so, please check the code page of your terminal (probably 850). Otherwise, please describe your system and terminal settings.

@vicuna
Author

vicuna commented Feb 21, 2018

Comment author: vanto

Do not focus on Caml Light.
When I use Haskell in a Windows terminal, it gives me the right answer (see above). When I use OCaml in the same terminal, it gives me a wrong answer: it responds with an ASCII code and not with an ISO 8859-1 code.
Clearly there is a bug in OCaml's character encoding.
I do not work with Windows 3.0.

@vicuna
Author

vicuna commented Feb 21, 2018

@vicuna
Author

vicuna commented Feb 21, 2018

Comment author: @dra27

It's not obvious that there's a bug in OCaml, because there isn't one - you'd do better to answer the questions asked!

As @Octachron notes, the issue is that code page 1252 is not selected (here foo.ml contains let c = 'é';; where the character is definitely \233).

C:\Users\DRA>chcp
Active code page: 437

C:\Users\DRA>type foo.ml
let c = '?';;
C:\Users\DRA>ocaml
        OCaml version 4.05.0

# #use "foo.ml";;
val c : char = '\233'
# let c2 = 'é';;
val c2 : char = '\130'
# #quit;;

C:\Users\DRA>chcp 1252
Active code page: 1252

C:\Users\DRA>type foo.ml
let c = 'é';;
C:\Users\DRA>ocaml
        OCaml version 4.05.0

# #use "foo.ml";;
val c : char = '\233'
# let c2 = 'é';;
val c2 : char = '\233'
# #quit;;
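The transcript shows that the same keystroke reaches the toplevel as byte 233 under code page 1252 but as byte 130 under code page 437. A small helper makes it easy to inspect which bytes a string literal actually contains; byte_escapes is a hypothetical name, not part of the standard library.

```ocaml
(* Hypothetical helper: render every byte of a string as an OCaml decimal escape *)
let byte_escapes (s : string) : string =
  let buf = Buffer.create (4 * String.length s) in
  String.iter
    (fun c -> Buffer.add_string buf (Printf.sprintf "\\%03d" (Char.code c)))
    s;
  Buffer.contents buf

let () =
  print_endline (byte_escapes "\233");  (* 'é' as sent under code page 1252 *)
  print_endline (byte_escapes "\130")   (* 'é' as sent under code page 437/850 *)
```

Pasting byte_escapes "é" into the toplevel under each code page should reproduce the \233 versus \130 difference from the transcript.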

The Caml Light graphical application will use code page 1252, and hence "work". It's more time than I can be bothered to invest to spin up an x86 Windows box to run the binary distribution's CAML.EXE, but I would put considerable metamoney on the behaviour of Caml Light run from a console being the same as OCaml's.

Haskell in interactive mode uses Haskeline, which directly calls ReadConsoleW, thus bypassing code pages completely. OCaml, via ReadFile, uses ReadConsoleA, which performs code page conversion on the input. As you can see from my example, if you supply the correct input (e.g. via a file), it behaves correctly; note that even the console itself cannot display the character correctly when it bounces through 1252->437/850 conversion and back. Note also that Haskell's Char type (used by Data.Char) is equivalent to OCaml's Uchar.t rather than char.
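The Char versus Uchar.t distinction can be sketched as follows: a Uchar.t is a Unicode code point, while a char is a single byte. Code points up to 255 coincide with ISO 8859-1, so 'é' converts cleanly between the two. This assumes OCaml 4.06 or later, since Buffer.add_utf_8_uchar first appeared in that release.

```ocaml
(* U+00E9 LATIN SMALL LETTER E WITH ACUTE, as a code point (like Haskell's Char) *)
let e_acute : Uchar.t = Uchar.of_int 0xE9

let () =
  (* The code point is 233 regardless of how any terminal encodes it *)
  assert (Uchar.to_int e_acute = 233);
  (* Code points 0..255 are representable as an OCaml char (ISO 8859-1 byte) *)
  assert (Uchar.is_char e_acute);
  assert (Uchar.to_char e_acute = '\233');
  (* Encoded as UTF-8, the same code point takes two bytes *)
  let b = Buffer.create 2 in
  Buffer.add_utf_8_uchar b e_acute;
  assert (Buffer.contents b = "\xC3\xA9")
```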
