Correct way of programming a CGI script
Date: -- (:)
From: tab@s...
Subject: Re: [Caml-list] Re: Rope is the new string
On Tue, Oct 09, 2007 at 07:32:25PM +0200, Loup Vaillant wrote:
> > definitely we also need some UTFstring type library (which can use rope,
> > string, whatever internally), with all common type of operations
> > (appending, finding, ...), but it's a just a specific sub case and also
> > a different type not compatible with strings (in OCaml terminology).
> 
> Then, we should have both byte arrays (the native Ocaml strings), and
> unicode strings. We will also need proper syntactic sugar for unicode
> strings. Operators, and literal values (like #"example"). Only then,
> ropes could feel like native strings --and be useful as such.

Not sure I see your point here, since you are mixing ropes and
Unicode. However, I think we are missing some other type of string
implementation (maybe ropes) *alongside* the current implementation of
string.

While we also lack properly integrated Unicode support, which
implementation of the underlying basic byte string is used is
irrelevant.

> > [...] it's a just a specific sub case [...]
> 
> Internationalization is, mere text crunching is not. (You meant that,
> right?) With properly interfaced unicode strings, I can do my text
> crunching without worrying about internationalization, and with no
> programming overhead. Then, when (if) I have to internationalize, it
> is much easier.

Absolutely. What I meant basically boils down to this: Unicode strings
are just a subset of strings (as arrays of bytes). You can store a
Unicode string in a byte string, whereas you can't store an arbitrary
byte string in a Unicode string.

I want a UTF library that lets me do something like:

type ustring = unicode_type * string
val of_string : string -> ustring  (* raises if not Unicode compliant *)
val to_string : ustring -> string
val append : ustring -> ustring -> ustring
...etc

That way, when I'm manipulating Unicode strings, I can't accidentally
append a binary string to a Unicode string. I can code safely with my
Unicode strings (whatever the format, UTF-8 through UTF-32), and I can
expect the type system to complain loudly when I do something that
might break Unicode.
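
To make that a bit more concrete, here is a minimal sketch of how such an
interface could look. Everything in it is an assumption made for
illustration: the module name Ustring, the exception Invalid_unicode, and
the restriction to UTF-8 only (so the unicode_type tag is dropped). It is
not an existing library, just one way the abstract type could keep byte
strings and Unicode strings apart.

```ocaml
(* Hypothetical sketch: Ustring and Invalid_unicode are illustrative names,
   and only UTF-8 is handled. *)
module Ustring : sig
  type t                        (* abstract, so it cannot be mixed with string *)
  exception Invalid_unicode
  val of_string : string -> t   (* raises Invalid_unicode on malformed UTF-8 *)
  val to_string : t -> string
  val append : t -> t -> t
end = struct
  type t = string
  exception Invalid_unicode

  (* True if byte [i] exists and is a continuation byte (10xxxxxx). *)
  let is_cont s i =
    i < String.length s && (Char.code s.[i]) land 0xC0 = 0x80

  (* Walk the string one encoded code point at a time. *)
  let valid_utf8 s =
    let n = String.length s in
    let rec go i =
      if i >= n then true
      else
        let b = Char.code s.[i] in
        if b < 0x80 then go (i + 1)
        else if b >= 0xC2 && b <= 0xDF then
          is_cont s (i + 1) && go (i + 2)
        else if b >= 0xE0 && b <= 0xEF then
          is_cont s (i + 1) && is_cont s (i + 2)
          && (let b1 = Char.code s.[i + 1] in
              (b <> 0xE0 || b1 >= 0xA0)       (* reject overlong forms *)
              && (b <> 0xED || b1 < 0xA0))    (* reject surrogates *)
          && go (i + 3)
        else if b >= 0xF0 && b <= 0xF4 then
          is_cont s (i + 1) && is_cont s (i + 2) && is_cont s (i + 3)
          && (let b1 = Char.code s.[i + 1] in
              (b <> 0xF0 || b1 >= 0x90)       (* reject overlong forms *)
              && (b <> 0xF4 || b1 < 0x90))    (* stay below U+110000 *)
          && go (i + 4)
        else false
    in
    go 0

  let of_string s = if valid_utf8 s then s else raise Invalid_unicode
  let to_string u = u
  (* Concatenating two well-formed UTF-8 strings stays well-formed. *)
  let append a b = a ^ b
end

let hello = Ustring.of_string "h\xC3\xA9llo "
let world = Ustring.of_string "w\xC3\xB6rld"
let greeting = Ustring.to_string (Ustring.append hello world)
```

With this signature, something like Ustring.append hello "raw bytes" is
rejected at compile time, because a plain string is not a Ustring.t; the
only way in is of_string, which checks the bytes first.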

> About the incompatibility, the two types of strings are incompatible
> anyway.
>
> Maybe even more than ints and floats. Sure you once tried some
> "Obj.magic" conversions of an non-English text with emacs. :-)

I use vim ;), but heh after using Obj.magic you're on your own :)

-- 
Vincent Hanquez