[Caml-list] Q: multibyte encoding for CJK
From: Jun P. FURUSE <Jun.Furuse@i...>
Subject: Re: [Caml-list] Q: multibyte encoding for CJK
Hi,

> When I tested multibyte variables in caml-light,
> it reported "Illegal character".
> 
> Do you have any idea
> how to use multibyte variable names for Chinese, Japanese, or Korean
> in caml-light or ocaml?

Camllight (and O'Caml) was not designed for multibyte Asian languages.
In Camllight, identifiers (variables) must begin with a "letter",
followed by letters, digits, _, or '.

The "letters" are A-Z, a-z, and accented characters such as
á and ç.

However, with enough luck, you can still use identifiers in your own
language. The conditions are: you must use EUC (Extended Unix Code)
encoding, and every non-ASCII byte of your identifier must lie in the
ranges 0xc0-0xd6, 0xd8-0xf6, or 0xf8-0xff (on Unix). (The only legal
upper-byte characters in identifiers are the accented European
letters.)
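The rule above can be modeled as a byte-by-byte check. This is a minimal sketch in modern OCaml, assuming the quoted ranges are the only upper bytes accepted; it is not Caml Light's actual lexer, and the names `is_accented`, `is_letter`, `is_ident_char`, and `is_valid_ident` are hypothetical.

```ocaml
(* Hypothetical model of the identifier rule described above;
   not the real Caml Light lexer. *)

(* Upper bytes accepted as accented letters: 0xc0-0xd6, 0xd8-0xf6,
   0xf8-0xff. 0xd7 and 0xf7 (the Latin-1 multiplication and
   division signs) are excluded. *)
let is_accented c =
  let n = Char.code c in
  (n >= 0xc0 && n <= 0xd6) || (n >= 0xd8 && n <= 0xf6) || n >= 0xf8

(* A "letter": A-Z, a-z, or an accented upper byte. *)
let is_letter c =
  (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || is_accented c

(* Bytes allowed after the first one: letters, digits, '_' and '\''. *)
let is_ident_char c =
  is_letter c || (c >= '0' && c <= '9') || c = '_' || c = '\''

(* True if [s] is non-empty, starts with a letter, and every later
   byte is an identifier character. The check is per byte, so a
   multibyte character passes only if each of its bytes passes. *)
let is_valid_ident s =
  let n = String.length s in
  let rec rest i = i >= n || (is_ident_char s.[i] && rest (i + 1)) in
  n > 0 && is_letter s.[0] && rest 1
```

For example, the EUC-JP bytes for あ ("\xa4\xa2") are rejected, while a two-byte sequence such as "\xc0\xc1", whose bytes both lie in 0xc0-0xd6, is accepted, as is a Latin-1 identifier like "caf\xe9".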

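As far as I know, this makes Japanese identifiers practically
impossible: in EUC-JP, every hiragana character begins with byte 0xa4
and every katakana character with 0xa5, both outside these ranges. I am
not an expert on Asian encodings, but I am afraid the same holds for
Chinese and Korean.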

BTW, using your language inside strings "..." is no problem if you
use EUC encoding. But of course you will run into trouble with
string_length, sub_string, etc., since they count bytes, not characters.
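To illustrate the byte/character mismatch, here is a minimal sketch in modern OCaml, where String.length and String.sub play roughly the role of Caml Light's string_length and sub_string. It assumes well-formed EUC-JP specifically (other EUC variants such as EUC-KR or EUC-CN use different multibyte rules), and the helpers `euc_width`, `advance`, `euc_length`, and `euc_sub` are hypothetical names.

```ocaml
(* Assumes well-formed EUC-JP input; all names are hypothetical. *)

(* Byte length of the EUC-JP character whose first byte is [c]. *)
let euc_width c =
  let n = Char.code c in
  if n < 0x80 then 1        (* ASCII *)
  else if n = 0x8f then 3   (* SS3 prefix + two bytes (JIS X 0212) *)
  else 2                    (* JIS X 0208, or SS2 (0x8e) + half-width kana *)

(* Byte offset reached after skipping [k] characters from offset [i]. *)
let rec advance s i k =
  if k = 0 || i >= String.length s then i
  else advance s (i + euc_width s.[i]) (k - 1)

(* Number of characters (not bytes) in [s]. *)
let euc_length s =
  let n = String.length s in
  let rec go i acc =
    if i >= n then acc else go (i + euc_width s.[i]) (acc + 1)
  in
  go 0 0

(* [len] characters starting at character index [start]. Both byte
   offsets are clamped to the string length, so a truncated final
   character cannot make String.sub raise. *)
let euc_sub s start len =
  let n = String.length s in
  let b0 = min n (advance s 0 start) in
  let b1 = min n (advance s b0 len) in
  String.sub s b0 (b1 - b0)
```

On the string "a\xa4\xa2\xa4\xa4b" (a, あ, い, b), String.length returns 6 while euc_length returns 4, and a byte-based String.sub s 1 1 would return only the first half of あ.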

Hope this helps,
--
JPF