Re: localization, internationalization and Caml

From: Matías Giovannini (matias@k-bell.com)
Date: Thu Oct 21 1999 - 14:05:00 MET DST


Date: Thu, 21 Oct 1999 09:05:00 -0300
From: Matías Giovannini <matias@k-bell.com>
To: caml-list@inria.fr
Subject: Re: localization, internationalization and Caml

Gerd Stolpmann wrote:
>
> On Tue, 19 Oct 1999, John Skaller wrote:
> >Gerd Stolpmann wrote:
> >> The enlarged character sets become more and more important, and it is only a
> >> matter of time until every piece of software which wants to be taken seriously
> >> can process them, even a dumb terminal or simple text editor. So you will be
> >> able to put accented characters into your comments, and you will see them as
> >> such even if you 'cat' the program text to the terminal or printer; this will
> >> work everywhere...
> >
> > Yes. This time is not here yet, but it will come soon that
> >international support is mandatory for all large software purchases
> >by governments and large corporations.
>
> I do not believe that this will be the driving force because the current
> solutions exist, and it is VERY expensive to replace them. It is even cheaper
> to replace a language than a character set/encoding. Looks like another Year
> 2000 but without deadline.

I still don't understand the point of this discussion. As a MacOS
programmer of many years, I tend to view localization and
internationalization as tasks best performed by the operating system, or
at least by pluggable modules. This discussion of patching l10n and i18n
functions *into* OCaml is, to me at least, losing direction.

OCaml uses Latin1 for its *internal* encoding of identifiers. While I'll
agree that my view is chauvinistic (and selfish, perhaps: I already have
"u" for writing in Spanish, why should I ask for more?),
I see no restriction in that (well, if I were Chinese, or Egyptian, I
would see things differently). What's more, the whole syntactic
apparatus of a programming language *assumes* a Latin setting, where
things make sense when read from left to right, from top to bottom; and
where punctuation is what we're used to. Programming languages suited
for a Han, or Arabic, or even a Hebrew audience would have to be rethought
from the ground up.

On the other hand, OCaml provides a String type that *can be* seen as a
variable-length sequence of uninterpreted bytes. We have uninterpreted
bytes! It's all we need to build whatever I18NString type we may need.
What is missing is *library* facilities to abstract that view into a
full-fledged i18n machinery. Of course, there's a problem with the
manipulation of 32-bit integer values, but if used with care, the Nat
datatype could serve perfectly well as the underlying, low-level datatype.
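
As a rough illustration of the point above, here is a minimal sketch of
such a library-level type layered over plain byte strings. Everything in
it is an assumption made for the example: the module name I18n_string,
its four functions, and the choice of UTF-8 as the byte encoding are
hypothetical, not an existing OCaml library. Since Unicode code points
fit in 21 bits, the sketch gets by with the native int and does not need
Nat.

```ocaml
(* Hypothetical sketch: an abstract "I18NString" built on OCaml's plain
   byte strings. UTF-8 is an assumed encoding, chosen for illustration.
   It does not reject overlong forms or surrogate code points. *)
module I18n_string : sig
  type t
  val of_code_points : int list -> t  (* encode code points as UTF-8 bytes *)
  val to_code_points : t -> int list  (* decode; Invalid_argument if malformed *)
  val length : t -> int               (* number of characters, not bytes *)
  val to_bytes : t -> string          (* the underlying uninterpreted bytes *)
end = struct
  type t = string

  (* Append the UTF-8 encoding of one code point to [buf]. *)
  let encode buf cp =
    let add n = Buffer.add_char buf (Char.chr n) in
    if cp < 0 || cp > 0x10FFFF then
      invalid_arg "I18n_string: code point out of range"
    else if cp < 0x80 then add cp
    else if cp < 0x800 then begin
      add (0xC0 lor (cp lsr 6));
      add (0x80 lor (cp land 0x3F))
    end else if cp < 0x10000 then begin
      add (0xE0 lor (cp lsr 12));
      add (0x80 lor ((cp lsr 6) land 0x3F));
      add (0x80 lor (cp land 0x3F))
    end else begin
      add (0xF0 lor (cp lsr 18));
      add (0x80 lor ((cp lsr 12) land 0x3F));
      add (0x80 lor ((cp lsr 6) land 0x3F));
      add (0x80 lor (cp land 0x3F))
    end

  let of_code_points cps =
    let buf = Buffer.create 16 in
    List.iter (encode buf) cps;
    Buffer.contents buf

  let to_code_points s =
    let n = String.length s in
    let byte i = Char.code s.[i] in
    (* Read [k] continuation bytes starting at [i], folding them into [acc]. *)
    let rec cont i k acc =
      if k = 0 then (acc, i)
      else if i >= n || byte i land 0xC0 <> 0x80 then
        invalid_arg "I18n_string: malformed UTF-8"
      else cont (i + 1) (k - 1) ((acc lsl 6) lor (byte i land 0x3F))
    in
    let rec go i acc =
      if i >= n then List.rev acc
      else
        let b = byte i in
        let cp, next =
          if b < 0x80 then (b, i + 1)
          else if b land 0xE0 = 0xC0 then cont (i + 1) 1 (b land 0x1F)
          else if b land 0xF0 = 0xE0 then cont (i + 1) 2 (b land 0x0F)
          else if b land 0xF8 = 0xF0 then cont (i + 1) 3 (b land 0x07)
          else invalid_arg "I18n_string: malformed UTF-8"
        in
        go next (cp :: acc)
    in
    go 0 []

  let length s = List.length (to_code_points s)
  let to_bytes s = s
end
```

With this sketch, I18n_string.of_code_points [0x4D; 0x61; 0x74; 0xED;
0x61; 0x73] has a length of six characters while its underlying string
holds seven bytes. The compiler and the String module never need to know
about the encoding; only the library does.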

Which makes me think, John, you already have variable-length int arrays.
Nats are as unsafe as they get :-)

Regards,
Matías.

-- 
I got your message. I couldn't read it. It was a cryptogram.
-- Laurie Anderson