Version française
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] Printf and i18n
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: John Prevost <visigoth@c...>
Subject: Re: [Caml-list] Printf and i18n
>>>>> "dr" == Daniel de Rauglaudre <> writes:

    dr> For the translations, I use a lexicon file, containing a list
    dr> of entries which I want to be i18n. It is an idea I took from
    dr> Caml Light. The first string, without language, is the format
    dr> to be translated, in a vehicular language (actually English),
    dr> and I give the translation in each language I want. E.g. an
    dr> entry can be:

    dr>      hello, my name is %s\n
    dr>  en: hello, my name is %s\n
    dr>  es: buenos dias, me llamo %s\n
    dr>  fr: bonjour, je m'appelle %s\n

A good try, but unfortunately it doesn't allow for the more
significant problem, which is word order variation.  Imagine a simple
format like:

Printf.printf "%s's %s" "John" "document"

The problem here is that we have the following translations (sort of,
no character sets here):

en: X's Y
ja: X no Y
ru: Y u X

And of course, with more arguments you get more variety.

One solution is to treat your formats as functions, rather than
formats.  This cuts down on the utility somewhat, since you have to
sprintf instead of printing directly, but:

let owns = [En, fun owner owned -> Printf.sprintf "%s's %s" owner owned;
            Ja, fun owner owned -> Printf.sprintf "%s no %s" owner owned;
            Ru, fun owner owned -> Printf.sprintf "%s u %s" owned owner]

sort of gets you moving in the right direction.  Of course, you need
an automated system to handle things reasonably.

This is why many modern printf-style systems, especially those for
handling internationalization, include numbered or named slots to be
filled in.  For example, in Python:

fmt = "%(owner)s's %(owned)s"
fmt = "%(owner)s no %(owned)s"
fmt = "%(owned) u %(owner)"

can all be used as:

fmt % {'owner':'John', 'owned':'document'}

In Java:

fmt = new MessageFormat("{0}''s {1}");
fmt = new MessageFormat("{0} no {1}");
fmt = new MessageFormat("{1} u {0}");

can all be used as:

fmt.format(new Object[] {"John", "document"})

Either of these mechanisms allow reading a format from a catalog and
using it in that way.  The only way I can see doing this with
something like O'Caml's current mechanism is introducing a new format
function like printf which takes a format like:

("%s%s:{0}'s {1}" : ('a,'b,'c) format)
("%s%s:{0} no {1}" : ('a,'b,'c) format)
("%s%s:{1} u {0}" : ('a,'b,'c) format)

Use the % model up front to indicate the argument types in argument
order, then indicate what order the arguments are to be used
separately.  This is, of course, rather ugly.  A richer format
language would be preferable.  If keyword arguments can't be made to
work right when included in the type (as seems likely), perhaps a
format language like Python's, but using numbered argument positions
instead of names, so:

("%s %d" : (string -> int -> 'a,'b,'a) format)

but also

("%(1)d %(0)s" : (string -> int -> 'a, 'b, 'a) format)

To unsubscribe, mail Archives:
Bug reports: FAQ:
Beginner's list: