Go for ultimate localization!

From: Benoit Deboursetty (debourse@email.enst.fr)
Date: Tue Oct 26 1999 - 06:36:58 MET DST


Date: Tue, 26 Oct 1999 06:36:58 +0200 (MET DST)
From: Benoit Deboursetty <debourse@email.enst.fr>
To: caml-list@inria.fr
Subject: Go for ultimate localization!
In-Reply-To: <380F0157.CDBBAD7D@k-bell.com>

[ Since my post ended up being *very* long, so I put some headers and a
  summary... "rhetoric sugar" ]

--- Summary

1. INTRODUCTION
  Present situation: O'CaML allows Latin1 in identifiers, but keywords are
  still in English -> one's code is made of intermingled languages

2. ULTIMATE LANGUAGE LOCALIZATION?
  Why not localize the parser? the libraries?

3. RELEVANCE
  Is this not just another useless theory?
  [ like was said of functional programming a few years ago ]

4. CONCLUSION
  Well, time to set to work, people :)

---

On Thu, 21 Oct 1999, Mat=EDas Giovannini wrote:

> OCaml uses Latin1 for its *internal* encoding of identifiers. While I'll > agree that my view is chauvinistic (and selfish, perhaps: I already have > "=BF=A1=E1é=ED=F3=FAu=F1=C1=C9=CD=D3=DA=DC=D1" for writing in Spanish, = why should I ask for more?), > I see no restriction in that (well, If I were Chinese, or Egiptian, I > would see things differently). What's more, the whole syntactic > apparatus of a programming language *assumes* a Latin setting, where > things make sense when read from left to right, from top to bottom; and > where punctuation is what we're used to. Programming languages suited > for a Han, or Arab, or even a Hebrew audience would have to be rethinked > from the grounds up.

1. INTRODUCTION ---------------

=09Hi,

=09Thanks for giving me this idea, Mat=EDas. "Rethought from the ground up" was the only wrong thing: you only have to change the preprocessor...

[ Since I am catching this thread in the middle, I preventively apologize for any possible redundancy with what has been said before]

=09So, O'CaML handles Latin-1 characters in the identifiers. This allows most Europeans, well at least you and me, to use their own language; but to me, it has always seemed a bit strange to write a program with keywords in English, and identifiers in French. Indeed, I feel it's really bogus when I try to read this out loud:

=09"let élément = String.sub cha=EEne where cha=EEne = "...

Do you take the French accent to read this? an English accent? Do you switch accents? How do you say "=", "égal" or "equals"? Do you translat= e every word to your mother tongue on-the-fly?

[ You know, there is something with the French and the "purity" of their language... ]

Don't think I put this example just to try to be funny: when building software in teams, it is often that you want to explain what your program does to someone that speaks your language, and that you feel uncomfortable with reading your code. I'm speaking from personal experience, I hope not to be an exception.

=09At least, there is a certain consistency in C allowing only English characters: were I in a normalization committee, I would even demand, in the C norm, that a compliant source code has everything, identifiers and comments, in American English (you must be hoping I will never be given such power).

=09Everyone is happy with O'CaML handling Latin1 within Western Europe and the US *because* the grammar differ very slightly from one country to another. That was your point, Matias. And the Indo-European kind of syntax is not universal. The syntax I know that differs most from Indo-European languages is that of Japanese, of course. And Japan is a country too often forgotten when it comes to localization. Must be difficult to deal with Japanese. Anyway, it makes this economically powerful country over-sensitive to locale-aware software (such an OS as BeOS is therefore quite popular there).

=09To sum up the present situation: half-half. Your program is half in your language, half in English. The syntax corresponds to the native language of half the number of potential users. Latin1 is meaningful to only part of the users, too.

2. ULTIMATE LANGUAGE LOCALIZATION? ----------------------------------

=09I know one of the authors of olabl is a Japanese, I would like to know whether it makes sense to have identifiers in Japanese in a language having English keywords.

=09For instance, we often give verbal identifiers to functions that perform an action. I think if the Japanese had invented the first computer languages, we would naturally put such function calls after the arguments (a bit like in RPL), because the verb is always put after its objects in Japanese. [it has nothing to do with writing right-to-left]

=09So, from my naive point of view, using for identifiers a language different from the one which the keywords are chosen from is very unnatural. Another way of seeing things would be to have different parsers for the same compiler: one with English keywords (the one that exists now), one with French keywords and a similar syntax, one with Japanese keywords and perhaps a completely different syntax, etc., each one adapting the same abstract language, "abstract ocaml", to different natural languages.

=09In other words, localizing error messages, charsets is fine: also *localizing the parser* seems more consistent to me. Of course, this raises again the issue I was speaking of in my previous post: using languages different from the "computer esperanto" in computer programs would probably block international exchanges (but translation here is just another pretty-printing -- see in 4.). I am just following the idea of localization to its logical conclusion.

=09Perhaps there would be to go further: library interfaces, after all, should also have the possibility of being localized. Why would you use a module named "String" in a French program. This word has a completely different meaning in French, and every French computer scientist sounds really suspect to a non-scientific person when he speaks of "strings" all the time. The same with "bit". It's also because of things like that, that computer people are sometimes thought of they can't communicate.

[ "un string" is only an underwear in French. And there is a very unfortunate collision between the English word "bit" and another=

word. ]

3. RELEVANCE ------------

=09So, well, can all this theory going to be useful to anyone? I really think there would be a gain in implementing such a feature. There is a pervasive, wasteful effort of translation being made by non-native English speakers when programming. It can be ridiculous when two speakers of the same language come to speaking English to each other.

=09Of course, such an extension won't probably be used in business as long as everybody speaks English at work in almost every software development group anyway.

=09But first, I think I read O'CaML was designed primarily to be an experimentation in language design. Second, it's no wonder everybody speaks English in software businesses if it is the language that lies under every computer language (problème de "la poule et l'oeuf"). Third, there is a constant motion from natural language to computer language and vice-versa... For instance, most constructs in computer languages (except in lisp, scheme,...) are based on English. Perhaps are we missing some useful constructs that could come from other natural languages.

=09Another example, more concrete. There is no simple translation in French for the expression "to match some constant against a pattern". Instead of saying "matcher" (which is what we do), we should better try and find a better word. But even if a good word had been invented, it would never be used in practice, I mean in the code of programs, and we would still use "matcher" (erk).

4. CONCLUSION -------------

=09All this may be very difficult to implement, I don't know, I am telling things from the user's point of view. This may also be a project completely different from what O'CaML wants to focus on. But couldn't the camlp4 "PreProcessorPrettyPrinter" by D. de Rauglaurde be the beginning of a solution, at least for parser localization? It would even provide translations between different localizations of the parser by pretty-printing, wouldn't it?

=09Congratulations if you've read this post until here, I await your comments on this utopical project.

=09Beno=EEt de Boursetty =09Benoit.de-Boursetty@polytechnique.org

---

PS: the camlp4 manual: http://caml.inria.fr/pub/old_caml_site/camlp4/manual/index.html

PPS: Pour "to match sthg against a pattern", existe-t-il une traduction "officielle" ? Que pensez-vous de "plaquer qqch sur un motif" ?

PPPS: I apologize for any mistakes in English (I myself am not perfectly localized) and for taking examples only in the French & Japanese languages.



This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:27 MET