Message-Id: <199910150826.KAA12903@givry.inria.fr>
From: Francis Dupont <Francis.Dupont@inria.fr>
To: skaller <skaller@maxtal.com.au>
Subject: Re: localization, internationalization and Caml
Date: Fri, 15 Oct 1999 10:26:12 +0200
In your previous mail you wrote:
The current 'support' for 8 bit characters in ocaml should be
deprecated immediately. It is an extremely bad thing to have, since
Latin-1 et al are archaic 8 bit standards incompatible with the
international standard for ISO10646 communication, namely
the UTF-8 encoding.
=> there is a rather strong opposition against UTF-8 in France
because it is not a natural encoding (ie. if ASCII maps to ASCII
it is not the case for ISO 8859-* characters, imagine a new UTF-X
encoding maps ASCII to strange things and you'd be able to understand
our concern).
Yes, I know Latin-1 is useful now for French.
=> it is more than useful, Latin-1 (soon ISO IS 8859-15) is necessary
if you need really readable texts in French.
The way forward may well be to provide an input filter to convert
Latin-1 (or any other encoding) to UTF8, and have ocaml process that.
=> my problem is the output of the filter will be no more readable when
I've put too much French in the program (in comments for instance).
This requires almost no changes to the compiler: the design should
open the set of characters acceptable in identifiers, probably
to some subset of the set recommended in one of the ISO10646 related
documents; the other change required is to accept \uXXXX and \UXXXXXXXX
escapes in strings. String processing functions should generally
continue to be 8 bit [per octet]: full internationalisation of client
string handling functions is a very complex, non-trivial, task]
=> I believe internationalization should not be done by countries
where English is the only used language: this is at least awkward...
Regards
This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:27 MET