From: Loup Vaillant <loup.vaillant@g...>
Subject: Re: [Caml-list] Re: Rope is the new string
2007/10/9, Vincent Hanquez <tab@snarc.org>:
> On Tue, Oct 09, 2007 at 02:40:48PM +0100, Jon Harrop wrote:
> > Out of curiosity, do your ropes handle UTF-8 and UTF-16?
>
> Out of curiosity, why would a string implementation (has a handle of
> chars bundle together) has to handle UTF-X ?

My 2 cents:

It is more convenient to treat strings as arrays of characters. Each
character is then handled as an atom, even if it takes several bytes
in the chosen encoding. Of course, this means the string type itself
must support multi-byte characters.

I can still use byte arrays as strings, but that limits me to ASCII and
single-byte, Latin-like encodings: if I want to do UTF-X, then I have
to handle multi-byte characters myself. Internationalization made hard...
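
To make the byte/character mismatch concrete, here is a minimal sketch.
The helper name utf8_length is made up for this illustration, and it
assumes the input is already well-formed UTF-8 (it does no validation):

```ocaml
(* Count Unicode code points in a UTF-8 encoded OCaml string by
   skipping continuation bytes (those of the form 10xxxxxx).
   Hypothetical helper; assumes well-formed UTF-8 input. *)
let utf8_length (s : string) : int =
  let n = ref 0 in
  String.iter
    (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n)
    s;
  !n

let () =
  (* "naïve" in UTF-8: the ï takes two bytes (0xC3 0xAF). *)
  let s = "na\xC3\xAFve" in
  Printf.printf "bytes: %d, characters: %d\n"
    (String.length s) (utf8_length s)
```

String.length reports 6 here while the text has 5 characters, which is
exactly the bookkeeping a byte-array string leaves to the programmer.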

I would find it very convenient to have plain Unicode strings (and
chars), with appropriate scan, print, byte_array_from_string, and
string_from_byte_array functions, one bundle per supported encoding.
Then I would not need to think about the internals of such a string.
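
One possible shape for such a bundle, as a rough sketch only. The names
ustring, ENCODING, Latin1 and Utf8 are invented for this example, it
covers just the two conversion functions (not scan or print), and the
UTF-8 decoder assumes well-formed input:

```ocaml
(* Hypothetical sketch: a Unicode string is an array of code points,
   and each encoding is a module converting to and from raw bytes. *)
type ustring = Uchar.t array

module type ENCODING = sig
  val string_from_byte_array : bytes -> ustring
  val byte_array_from_string : ustring -> bytes
end

module Latin1 : ENCODING = struct
  (* Every Latin-1 byte is the code point of the same value. *)
  let string_from_byte_array b =
    Array.init (Bytes.length b)
      (fun i -> Uchar.of_int (Char.code (Bytes.get b i)))

  (* Raises Invalid_argument (from Char.chr) for code points above 255. *)
  let byte_array_from_string u =
    Bytes.init (Array.length u) (fun i -> Char.chr (Uchar.to_int u.(i)))
end

module Utf8 : ENCODING = struct
  (* Assumes well-formed UTF-8; no validation is performed. *)
  let string_from_byte_array b =
    let len = Bytes.length b in
    let byte i = Char.code (Bytes.get b i) in
    let rec go i acc =
      if i >= len then Array.of_list (List.rev acc)
      else begin
        let c = byte i in
        (* The lead byte gives the sequence length and first payload bits. *)
        let size, init =
          if c < 0x80 then (1, c)
          else if c < 0xE0 then (2, c land 0x1F)
          else if c < 0xF0 then (3, c land 0x0F)
          else (4, c land 0x07)
        in
        let cp = ref init in
        for k = 1 to size - 1 do
          cp := (!cp lsl 6) lor (byte (i + k) land 0x3F)
        done;
        go (i + size) (Uchar.of_int !cp :: acc)
      end
    in
    go 0 []

  (* Buffer.add_utf_8_uchar is in the standard library since OCaml 4.06. *)
  let byte_array_from_string u =
    let buf = Buffer.create (Array.length u) in
    Array.iter (Buffer.add_utf_8_uchar buf) u;
    Buffer.to_bytes buf
end

let () =
  (* "naïve" in Latin-1: the ï is the single byte 0xEF. *)
  let latin1 = Bytes.of_string "na\xEFve" in
  let u = Latin1.string_from_byte_array latin1 in
  let utf8 = Utf8.byte_array_from_string u in
  Printf.printf "%d characters, %d Latin-1 bytes, %d UTF-8 bytes\n"
    (Array.length u) (Bytes.length latin1) (Bytes.length utf8)
```

Code working on a ustring only ever sees whole characters; the encoding
matters only at the boundary where bytes come in or go out.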

Loup Vaillant