Re: localization, internationalization and Caml

From: Matías Giovannini (
Date: Thu Oct 21 1999 - 18:27:41 MET DST

Date: Thu, 21 Oct 1999 13:27:41 -0300
From: =?iso-8859-1?Q?Mat=EDas?= Giovannini <>
Subject: Re: localization, internationalization and Caml

skaller wrote:
> Matías Giovannini wrote:
> >What's more, the whole syntactic
> > apparatus of a programming language *assumes* a Latin setting, where
> > things make sense when read from left to right, from top to bottom; and
> > where punctuation is what we're used to. Programming languages suited
> > for a Han, or Arab, or even a Hebrew audience would have to be rethinked
> > from the grounds up.
> Actually, no. Most of these peoples learn English and learn
> computing, if they are to work with computers. But they still wish
> to use comments, strings, and identifiers in their native script.

Strings can be localized with a package mechanism, à la Java. I don't
like hardwired strings in code, they're a maintenance nightmare (not
that I always abide by my own rule :-)

> Have you ever seen a Japanese program? I have.
> Quite an interesting challenge: normal C/C++ code, with
> Latin characters encoding Japanese character names in identifiers,
> and actual Japanese characters in comments and strings.

I agree that comments should be written in the language most suited to
the intended audience (I normally comment my code in English, unless I
know I wnat someone else to maintain it, in which case I comment it in Spanish.)

> > On the other hand, OCaml provides a String type that *can be* seen as a
> > variable-length sequence of uninterpreted bytes.
> Yes. What ocaml does not provide is a way of encoding
> extended characters -- \uXXXX \UXXXXXXXXX in strings, or in identifiers.

No need to. Use \HH\LL. Again, what OCaml does is sensible, if crude.

> >Of course, there's a problem with the
> > manipulation of 32-bit integer values, but if used with care, the Nat
> > datatype could serve perfectly well as the underlying, low-level datatype.
> >
> > Which makes me think, John, you already have variable-length int arrays.
> But they're not standard (yet).

They are! Don't be put off by its status as "experimental feature".
Nat's been around since CamlLight. You could even use it as a template
implementation of unsafe longint varlen arrays and link a custom
toplevel. Yet again, OCaml provides the tools.

> So there are TWO issues -- one is to make ocaml itself
> ISO10646 aware (i.e., the compiler), and the other is to provide
> users with libraries to manipulate extended characters.

I think a more realistic goal would be making OCaml ISO10646-tolerant in
comments. Perhaps adding real conditional compilation and transparent
comments would suffice.

Again, anyone can download the source code and modify OCaml to suit his
tastes. OCaml's goal is not to be a model of i18n awareness, but a
platform for experimenting with types in a functional setting. It
happens that OCaml is open enough, and extensible enough, and efficient
enough to make a good i18n effort possible, and that is a tribute to its
success as strongly-typed, imperative, fast functional language.

> Please note: neither of these features would be optional,
> were ocaml to be submitted for ISO standardisation. ISO directives
> require all ISO languages to upgrade to provide international
> support. I know ocaml isn't an ISO language, but I think the
> basic intent is sound. [In some sense, ocaml is already a leader,
> accepting Latin-1 characters when other languages only allowed ASCII]

The implementors have made clear in more than one occasion that they're
not interested in making OCaml a standard language (remember the thread
"How to convince management?"). But don't take my word for it, ask Pierre.

I got your message. I couldn't read it. It was a cryptogram.
-- Laurie Anderson

This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:27 MET