Re: localization, internationalization and Caml

From: Francis Dupont (Francis.Dupont@inria.fr)
Date: Sun Oct 17 1999 - 17:54:47 MET DST


Message-Id: <199910171554.RAA18696@givry.inria.fr>
From: Francis Dupont <Francis.Dupont@inria.fr>
To: skaller <skaller@maxtal.com.au>
Subject: Re: localization, internationalization and Caml
Date: Sun, 17 Oct 1999 17:54:47 +0200

 In your previous mail you wrote:

> The current 'support' for 8 bit characters in ocaml should be
> deprecated immediately. It is an extremely bad thing to have, since
> Latin-1 et al are archaic 8 bit standards incompatible with the
> international standard for ISO10646 communication, namely
> the UTF-8 encoding.
>
> => there is a rather strong opposition against UTF-8 in France
> because it is not a natural encoding (i.e. ASCII maps to ASCII,
> but this is not the case for ISO 8859-* characters; imagine a new
> UTF-X encoding that maps ASCII to strange things and you will
> understand our concern).
   
           I do understand the concern, but the decision on
   the International Standards has been made.

=> this is not so obvious because there are other encodings (UTF-X)
without this kind of problem. I'll send this thread to a colleague
who tried to get something better than UTF-8 at the IETF (but he was
too late).
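
To make the ASCII versus ISO 8859-* point concrete, here is a minimal
OCaml sketch that re-encodes a Latin-1 string as UTF-8. The function
name latin1_to_utf8 is hypothetical (it is not part of any Caml
library), and the code only covers ISO 8859-1 input. Bytes below 0x80
pass through unchanged, while every byte from 0x80 to 0xFF becomes a
two-byte sequence, which is the asymmetry discussed above.

```ocaml
(* Sketch: re-encode an ISO 8859-1 (Latin-1) string as UTF-8.
   [latin1_to_utf8] is a hypothetical helper, not a library function. *)
let latin1_to_utf8 (s : string) : string =
  let buf = Buffer.create (2 * String.length s) in
  String.iter
    (fun c ->
      let code = Char.code c in
      if code < 0x80 then
        (* ASCII range: the byte is copied unchanged *)
        Buffer.add_char buf c
      else begin
        (* 0x80..0xFF: two bytes of the form 110xxxxx 10xxxxxx *)
        Buffer.add_char buf (Char.chr (0xC0 lor (code lsr 6)));
        Buffer.add_char buf (Char.chr (0x80 lor (code land 0x3F)))
      end)
    s;
  Buffer.contents buf

let () =
  (* "\233" is e-acute (0xE9) in Latin-1; it becomes the bytes C3 A9 *)
  let utf8 = latin1_to_utf8 "\233t\233" in
  String.iter (fun c -> Printf.printf "%02X " (Char.code c)) utf8;
  print_newline ()
```

A three-byte Latin-1 word with two accented letters therefore grows to
five bytes, and an 8-bit editor shows each accented letter as two
unrelated characters.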

> Yes, I know Latin-1 is useful now for French.
>
> => it is more than useful, Latin-1 (soon ISO IS 8859-15) is necessary
> if you need really readable texts in French.
   
           No, what you mean is that with _current technology_
   there is plenty of support for 8 bit characters, using code pages,
   so that Latin-1 is well supported.
   
=> yes, for instance you have a reasonable set of fonts.

           For example, there are a lot of text editors that
   accept 8 bit characters, and even permit switching code pages.
   There are almost none that work with ISO10646 or unicode,
   let alone accept UTF-8 encoding. (Yudit is the only one I know of).
   
=> I'd like to get some free ISO10646/Unicode fonts. I believe
without them ISO10646/Unicode will not be accepted by users.

           I agree that this is a problem, but supporting
   Latin-1, or any other archaic standard, is not going
   to help move forward.

=> Latin-1 is not so archaic (it is not yet old enough to have
become archaic :-).

   It is bad enough that most vendors
   only support Unicode, which is a small, almost filled,
   16 bit subset of the full 31 bit ISO-10646 Standard.
    
=> Unicode is not that well supported either...
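
As a rough illustration of the 16 bit versus 31 bit remark, the sketch
below computes how many bytes UTF-8, as specified at the time for the
full 31 bit ISO 10646 code space, needs for a given code point. The
name utf8_length is hypothetical, and the code assumes a 64-bit OCaml
where int can hold 0x7FFFFFFF.

```ocaml
(* Sketch: byte count of a code point in the original UTF-8 scheme,
   which covered the whole 31-bit ISO 10646 code space.
   [utf8_length] is a hypothetical name used only for illustration.
   Assumes a 64-bit OCaml so that 0x7FFFFFFF fits in an int. *)
let utf8_length (cp : int) : int =
  if cp < 0 then invalid_arg "utf8_length: negative code point"
  else if cp < 0x80 then 1           (* ASCII *)
  else if cp < 0x800 then 2          (* includes Latin-1 0x80..0xFF *)
  else if cp < 0x10000 then 3        (* rest of the 16-bit range *)
  else if cp < 0x200000 then 4
  else if cp < 0x4000000 then 5
  else if cp <= 0x7FFFFFFF then 6    (* end of the 31-bit range *)
  else invalid_arg "utf8_length: beyond 31 bits"

let () =
  List.iter
    (fun cp -> Printf.printf "U+%X -> %d byte(s)\n" cp (utf8_length cp))
    [ 0x41; 0xE9; 0xFFFF; 0x10000; 0x7FFFFFFF ]
```

Everything in the 16 bit subset fits in at most three bytes; the longer
forms are only needed for code points outside it.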
   
           I believe people with international concerns can work
   together no matter what their native language. Some English
   speakers may be concerned, some, like me, are somewhat
   embarrassed to be non-fluent in _any_ other language.
   [I speak a smattering of high school German]
   
=> It is great that English speakers support internationalization,
but we need speakers of other languages in order to make it as complete
as possible. For instance, where is the first character of a string?
An Arabic speaker can easily show them that this is not so obvious.

           However, Australia, where I live, has migrants
   from all over the world and support for many languages
   is an important issue here. Particularly Asian languages.

=> Asian languages seem hard and we can't ignore one third of the world...

Regards

Francis.Dupont@inria.fr


