Re: localization, internationalization and Caml

From: skaller (skaller@maxtal.com.au)
Date: Fri Oct 15 1999 - 00:20:20 MET DST


Date: Fri, 15 Oct 1999 08:20:20 +1000
From: skaller <skaller@maxtal.com.au>
To: STARYNKEVITCH Basile <Basile.Starynkevitch@cea.fr>
Subject: Re: localization, internationalization and Caml

STARYNKEVITCH Basile wrote:
>
> By the way, I more and more believe that the printf interface is (in C
> as in Ocaml) a big mistake (which could easily be avoided in Ocaml,
> thanks to its typing)

        I agree, but ...
 
> We should code
>
> print [Int 2; String " < "; Float 3.14]
>
> instead of
>
> printf "%d < %g" 2 3.14

        However, I do not agree with the solution.
The correct method, IMHO, is to provide some proper formatting
functions (OCaml's are plain WRONG!) such as

        formatted_string_of_int justify width value

[where justify is LeftSpace | RightSpace | LeftZero]

        and then use the power of functional programming
to create output strings. [The above is only a quick illustrative
interface, not a well-considered one.]
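
        To make the idea concrete, here is a minimal sketch of such a
function. The padding semantics are an assumption (LeftSpace and
LeftZero pad on the left with spaces or zeros, RightSpace pads on the
right with spaces), and nothing here is an existing library function:

```ocaml
(* Hypothetical interface, sketched from the description above. *)
type justify = LeftSpace | RightSpace | LeftZero

(* Render [value] in decimal, padded to at least [width] characters.
   Values wider than [width] are returned unpadded, never truncated. *)
let formatted_string_of_int justify width value =
  let s = string_of_int value in
  let pad = max 0 (width - String.length s) in
  match justify with
  | LeftSpace -> String.make pad ' ' ^ s
  | RightSpace -> s ^ String.make pad ' '
  | LeftZero ->
    if value < 0 then
      (* Keep the sign in front of the zero padding. *)
      "-" ^ String.make pad '0' ^ String.sub s 1 (String.length s - 1)
    else String.make pad '0' ^ s

(* Output strings are then built with ordinary string functions. *)
let line =
  String.concat " < "
    [ formatted_string_of_int LeftSpace 4 2;
      formatted_string_of_int LeftZero 4 3 ]
```

Here `line` is "   2 < 0003"; no format string is involved, so the
type checker sees every argument directly.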
 
> Again, I am *not* asking for localization in Ocaml, but if somebody
> needs it (I don't) I still hope it would be implemented better than in
> C. And I think that Unicode would be more useful than localization.

        Please, ISO10646, not Unicode.
We have International Standards. There is a lot of work to be done in
internationalisation. If it is worth doing, it is worth doing right.

        The current 'support' for 8 bit characters in ocaml should be
deprecated immediately. It is an extremely bad thing to have, since
Latin-1 et al are archaic 8 bit standards incompatible with the
international standard for ISO10646 communication, namely
the UTF-8 encoding. Yes, I know Latin-1 is useful now for French.
The way forward may well be to provide an input filter to convert
Latin-1 (or any other encoding) to UTF-8, and have ocaml process that.
This requires almost no changes to the compiler: the design should
open the set of characters acceptable in identifiers, probably
to some subset of the set recommended in one of the ISO10646 related
documents; the other change required is to accept \uXXXX and \UXXXXXXXX
escapes in strings. String processing functions should generally
continue to be 8 bit [per octet]: full internationalisation of client
string handling functions is a very complex, non-trivial task.
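
        As a rough illustration of the input filter, here is a sketch
that maps each Latin-1 octet to its ISO10646 code point and writes it
out as UTF-8. The names add_utf8 and latin1_to_utf8 are hypothetical,
and the encoder assumes 0 <= cp <= 0x10FFFF without checking for
surrogates; the same encoder could serve for expanding \uXXXX and
\UXXXXXXXX escapes:

```ocaml
(* Append the UTF-8 encoding of code point [cp] to [buf].
   Sketch only: assumes 0 <= cp <= 0x10FFFF and does not reject
   surrogate code points. *)
let add_utf8 buf cp =
  let add n = Buffer.add_char buf (Char.chr n) in
  if cp < 0x80 then add cp
  else if cp < 0x800 then begin
    add (0xC0 lor (cp lsr 6));
    add (0x80 lor (cp land 0x3F))
  end else if cp < 0x10000 then begin
    add (0xE0 lor (cp lsr 12));
    add (0x80 lor ((cp lsr 6) land 0x3F));
    add (0x80 lor (cp land 0x3F))
  end else begin
    add (0xF0 lor (cp lsr 18));
    add (0x80 lor ((cp lsr 12) land 0x3F));
    add (0x80 lor ((cp lsr 6) land 0x3F));
    add (0x80 lor (cp land 0x3F))
  end

(* Latin-1 octets coincide with code points 0..255, so the filter
   just re-encodes each octet of the input. *)
let latin1_to_utf8 s =
  let buf = Buffer.create (String.length s * 2) in
  String.iter (fun c -> add_utf8 buf (Char.code c)) s;
  Buffer.contents buf
```

Pure ASCII input passes through unchanged, which is why existing
sources would need no conversion.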



This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:27 MET