Re: localization, internationalization and Caml
Date: 1999-10-28 (17:05)
From: Benoit Deboursetty <debourse@e...>
Subject: Go for ultimate localization!
[ Since my post ended up being *very* long, I have added some headers and a
  summary... "rhetoric sugar" ]

--- Summary

  Present situation: O'CaML allows Latin1 in identifiers, but keywords are
  still in English -> one's code is made of intermingled languages

  Why not localize the parser? the libraries?

  Is this not just another useless theory?
  [ as was said of functional programming a few years ago ]

  Well, time to set to work, people :)


On Thu, 21 Oct 1999, Matías Giovannini wrote:

> OCaml uses Latin1 for its *internal* encoding of identifiers. While I'll
> agree that my view is chauvinistic (and selfish, perhaps: I already have
> "¿¡áéíóúuñÁÉÍÓÚÜÑ" for writing in Spanish, why should I ask for more?),
> I see no restriction in that (well, If I were Chinese, or Egiptian, I
> would see things differently). What's more, the whole syntactic
> apparatus of a programming language *assumes* a Latin setting, where
> things make sense when read from left to right, from top to bottom; and
> where punctuation is what we're used to. Programming languages suited
> for a Han, or Arab, or even a Hebrew audience would have to be rethinked
> from the grounds up.



	Thanks for giving me this idea, Matías. "Rethought from the
ground up" was the only wrong thing: you only have to change the

  [ Since I am catching this thread in the middle, I apologize in advance
    for any redundancy with what has been said before ]

	So, O'CaML handles Latin-1 characters in the identifiers. This
allows most Europeans, well at least you and me, to use their own
language; but to me, it has always seemed a bit strange to write a program
with keywords in English, and identifiers in French. Indeed, I feel
it's really bogus when I try to read this out loud:

	"let élément = String.sub chaîne where chaîne = "...

Do you read this with a French accent? With an English accent? Do you
switch accents? How do you say "=", "égal" or "equals"? Do you translate
every word to your mother tongue on the fly?

  [ You know, there is something with the French and the "purity" of their
    language... ]

Don't think I chose this example just to be funny: when building
software in teams, you often want to explain what your program does to
someone who speaks your language, and you feel uncomfortable reading
your code aloud. I'm speaking from personal experience; I hope I am not
an exception.

	At least there is a certain consistency in C allowing only
English characters: were I on a standards committee, I would even
demand, in the C standard, that compliant source code have everything,
identifiers and comments, in American English (you must be hoping I will
never be given such power).

	Everyone is happy with O'CaML handling Latin1 within Western
Europe and the US *because* the grammar differs only slightly from one
country to another. That was your point, Matías. But the Indo-European
kind of syntax is not universal. Of the syntaxes I know, the one that
differs most from Indo-European languages is that of Japanese. And Japan
is a country too often forgotten when it comes to localization. It must
be difficult to deal with Japanese. Anyway, this makes that economically
powerful country very sensitive to locale-aware software (an OS such as
BeOS is therefore quite popular there).

	To sum up the present situation: half-half. Your program is half
in your language, half in English. The syntax corresponds to the native
language of half the number of potential users. Latin1 is meaningful to
only part of the users, too.


	I know one of the authors of olabl is Japanese; I would like to
know whether it makes sense to have identifiers in Japanese in a language
that has English keywords.

	For instance, we often give verbal identifiers to functions that
perform an action. I think if the Japanese had invented the first computer
languages, we would naturally put such function calls after the arguments
(a bit like in RPL), because the verb is always put after its objects in
Japanese. [it has nothing to do with writing right-to-left]
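
  [ A minimal sketch of what "verb after its objects" can look like in
    OCaml itself. The reverse-application operator is defined locally
    here, and the names prefix_style and postfix_style are only
    illustrative; this is an assumption about how one might write it,
    not a proposal for a new syntax. ]

```ocaml
(* Reverse application: the argument comes first, the "verb" last.
   Defined locally so the sketch does not depend on the library version. *)
let ( |> ) x f = f x

(* Prefix style: verbs first, as in English. *)
let prefix_style = String.length (String.uppercase_ascii "chaine")

(* Postfix style: the object first, then the verbs applied to it in order. *)
let postfix_style = "chaine" |> String.uppercase_ascii |> String.length
```

Both definitions compute the same value; only the reading order changes.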

	So, from my naive point of view, using for identifiers a language
different from the one the keywords are taken from is very
unnatural. Another way of seeing things would be to have different parsers
for the same compiler: one with English keywords (the one that exists
now), one with French keywords and a similar syntax, one with Japanese
keywords and perhaps a completely different syntax, etc., each one
adapting the same abstract language, "abstract ocaml", to different
natural languages.
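
  [ To make the idea of several front ends for one "abstract ocaml"
    concrete, here is a rough sketch with one keyword table per
    language. The token type, the table names and the French keyword
    spellings ("soit", "dans", "si", "alors", "sinon") are hypothetical
    choices made for illustration, and splitting on spaces stands in for
    a real lexer. ]

```ocaml
(* Tokens of the shared, language-neutral layer. *)
type token = Let | In | If | Then | Else | Ident of string

(* One keyword table per localization (French spellings are hypothetical). *)
let english =
  [ ("let", Let); ("in", In); ("if", If); ("then", Then); ("else", Else) ]

let french =
  [ ("soit", Let); ("dans", In); ("si", If); ("alors", Then); ("sinon", Else) ]

(* Classify one word with a given table; unknown words are identifiers. *)
let token_of_word table w =
  match List.assoc_opt w table with
  | Some t -> t
  | None -> Ident w

(* Very naive lexer: split on spaces, then classify each word. *)
let tokens table phrase =
  String.split_on_char ' ' phrase
  |> List.filter (fun w -> w <> "")
  |> List.map (token_of_word table)
```

With these tables, `tokens english "let x = 1 in x"` and
`tokens french "soit x = 1 dans x"` yield the same token list, so
everything after the lexer could stay unchanged.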

	In other words, localizing error messages and charsets is fine, but
also *localizing the parser* seems more consistent to me. Of course, this
raises again the issue I was speaking of in my previous post: using
languages different from the "computer esperanto" in computer programs
would probably block international exchanges (but translation here is just
another pretty-printing; see the last part of this post). I am just
following the idea of localization to its logical conclusion.

	Perhaps one should go further: library interfaces, after
all, should also have the possibility of being localized. Why would you
use a module named "String" in a French program? This word has a
completely different meaning in French, and every French computer
scientist sounds really suspect to a non-scientific person when he speaks
of "strings" all the time. The same goes for "bit". It is also because of
things like that that computer people are sometimes thought to be unable
to communicate.

  [ In French, "un string" is only a kind of underwear. And there is a
    very unfortunate collision between the English word "bit" and another
    word. ]
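
  [ A localized library interface could be as simple as a thin wrapper
    module. The sketch below is only an assumption about how it might
    look: the module name Chaine and the function names longueur,
    sous_chaine and majuscules are invented for illustration, and they
    are spelled without accents so that the example compiles whichever
    identifier characters the compiler accepts. ]

```ocaml
(* A hypothetical French-named view of part of the standard String module. *)
module Chaine : sig
  val longueur : string -> int
  val sous_chaine : string -> int -> int -> string
  val majuscules : string -> string
end = struct
  let longueur = String.length
  let sous_chaine = String.sub
  let majuscules = String.uppercase_ascii
end

(* Usage: extract the first five characters of a string. *)
let element = Chaine.sous_chaine "localisation" 0 5
```

The wrapper adds no behaviour of its own; it only renames, which is why
such a layer could in principle be generated mechanically.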


	So, well, is all this theory going to be useful to anyone? I
really think there would be a gain in implementing such a feature. There
is a pervasive, wasteful effort of translation being made by non-native
English speakers when programming. It can be ridiculous when two speakers
of the same language end up speaking English to each other.

	Of course, such an extension probably won't be used in business as
long as everybody speaks English at work in almost every software
development group anyway.

	But first, I think I read that O'CaML was designed primarily as an
experiment in language design. Second, it's no wonder everybody
speaks English in software businesses if it is the language that
underlies every computer language (a chicken-and-egg problem). Third,
there is a constant movement from natural language to computer language and
vice versa... For instance, most constructs in computer languages (except
in lisp, scheme,...) are based on English. Perhaps we are missing some
useful constructs that could come from other natural languages.

	Another, more concrete example. There is no simple translation in
French for the expression "to match some constant against a pattern".
Instead of saying "matcher" (which is what we do), we had better try
to find a better word. But even if a good word were invented, it
would never be used in practice, I mean in the code of programs, and we
would still say "matcher" (erk).


	All this may be very difficult to implement, I don't know; I am
speaking from the user's point of view. This may also be a project
completely different from what O'CaML wants to focus on. But couldn't the
camlp4 "PreProcessorPrettyPrinter" by D. de Rauglaudre be the beginning of
a solution, at least for parser localization? It would even provide
translations between different localizations of the parser by
pretty-printing, wouldn't it?
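
  [ Translation by pretty-printing can be sketched as follows: once a
    program is held as abstract syntax, printing it with another set of
    keywords gives the other localization. This is not camlp4's actual
    interface; the expr type, the keywords record and the French
    spellings are all hypothetical, parsing is left out, and no
    parentheses are inserted, so it is only good for small flat
    examples. ]

```ocaml
(* A tiny abstract syntax, standing in for "abstract ocaml". *)
type expr =
  | Var of string
  | Int of int
  | Let of string * expr * expr
  | If of expr * expr * expr

(* The concrete keywords of one localization. *)
type keywords = {
  kw_let : string; kw_in : string;
  kw_if : string; kw_then : string; kw_else : string;
}

let english =
  { kw_let = "let"; kw_in = "in"; kw_if = "if"; kw_then = "then"; kw_else = "else" }

(* Hypothetical French keywords, chosen only for illustration. *)
let french =
  { kw_let = "soit"; kw_in = "dans"; kw_if = "si"; kw_then = "alors"; kw_else = "sinon" }

(* Print an expression using the keywords of the chosen localization. *)
let rec print k = function
  | Var x -> x
  | Int n -> string_of_int n
  | Let (x, e1, e2) ->
      Printf.sprintf "%s %s = %s %s %s" k.kw_let x (print k e1) k.kw_in (print k e2)
  | If (c, t, e) ->
      Printf.sprintf "%s %s %s %s %s %s"
        k.kw_if (print k c) k.kw_then (print k t) k.kw_else (print k e)

(* The same abstract program, to be shown in two localizations. *)
let example = Let ("x", Int 1, If (Var "x", Var "x", Int 0))
```

Here `print english example` gives "let x = 1 in if x then x else 0" and
`print french example` gives "soit x = 1 dans si x alors x sinon 0".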

	Congratulations if you've read this far; I await your
comments on this utopian project.

	Benoît de Boursetty


PS: the camlp4 manual:

PPS: Is there an "official" French translation of "to match sthg against a
     pattern"? What do you think of "plaquer qqch sur un motif"?

PPPS: I apologize for any mistakes in English (I myself am not perfectly
      localized) and for taking examples only in the French & Japanese