Version française
Home     About     Download     Resources     Contact us    
Browse thread
Correct way of programming a CGI script
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: tab@s...
Subject: Re: [Caml-list] Re: Rope is the new string
On Tue, Oct 09, 2007 at 06:42:32PM +0200, Loup Vaillant wrote:
> 2007/10/9, Vincent Hanquez <tab@snarc.org>:
> > On Tue, Oct 09, 2007 at 02:40:48PM +0100, Jon Harrop wrote:
> > > Out of curiosity, do your ropes handle UTF-8 and UTF-16?
> >
> > Out of curiosity, why would a string implementation (has a handle of
> > chars bundle together) has to handle UTF-X ?
> 
> My 2 cents:
> 
> It is more convenient to consider strings as characters arrays. Then,
> these characters are handled as atoms, even if they take several bytes
> in the chosen encoding. Of course, multi-byte characters must be
> supported as well.
> 
> Still, I can use byte arrays as strings. But it limits me to ASCII and
> Latin-like encodings: if I want to do UTF-X, then I have to worry
> about multi-bytes characters myself. Internationalization made hard...
> 
> I would find very convenient to have plain unicode strings (and
> chars), with appropriate scan, print, byte_array_from_string, and
> string_from_byte_array functions, one bundle per supported encoding.
> So I don't need to think about the internals of such a string.

By my question i wasn't suggesting that everybody should do
internationalization by hand.

definitely we also need some UTFstring type library (which can use rope,
string, whatever internally), with all common type of operations
(appending, finding, ...), but it's a just a specific sub case and also
a different type not compatible with strings (in OCaml terminology).

-- 
Vincent Hanquez