Thread: [Caml-list] Stop at exception
| From: | Mattias Waldau <mattias.waldau@a...> |
| Subject: | RE: [Caml-list] Non-mutable strings |
Nice to see that there is general interest in non-mutable strings. However, as Xavier says, maybe it is a bit late.

We have another string problem, namely handling non-ASCII text. As I understand it, one of the points of nML (http://ropas.kaist.ac.kr/n/), which is a new ML language currently being built using Ocaml, is that it handles Asian characters. Also, there was a recent post to this list about Asian character encodings. I don't think any language can stay pure ASCII forever. One of the reasons for Ruby's success is that it handles non-ASCII text (I think it was created by a Japanese developer). Even we Swedes have problems: only 2 of our 3 special characters are in the lower 7 bits, and sorting is always wrong.

In UTF-8, a Unicode character takes between 1 and 4 bytes, which means that str[i] doesn't work (unless you do as NT or Java and store strings internally as wide chars, which of course Ocaml could do too). You always have to start at the beginning of the string to find the i:th character. Thus, introducing Unicode strings (or something similar; I have heard that Asians don't like Unicode at all) and introducing non-mutable strings should preferably be done at the same time.

In order to have 8-bit char strings and Unicode strings side by side, we need something like a u"..." prefix, and maybe the possibility to say that all strings are Unicode. Can this be done using a module, just like 'open Float' redefines '+' to '+.'? Or should Ocaml v4 go the whole way and make all strings (and identifiers) Unicode?

/mattias

P.S. Microsoft NT, 2000 and XP handle double-byte chars everywhere; the type is called BSTR, and library routines are called all the time to do string comparison etc. However, since a Unicode character can need 4 bytes, I don't know how that is encoded in 2 bytes.

-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs
FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr
Archives: http://caml.inria.fr
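The point that str[i] stops working once characters take 1 to 4 bytes can be made concrete. Below is a minimal sketch in present-day OCaml, assuming the string holds well-formed UTF-8. The module `Utf8` and the function `utf16_units` are hypothetical illustrations, not part of the OCaml standard library. `Utf8.get` has to walk from the start of the string, so it is O(n) rather than O(1). The local open `Utf8.( ... )` shows the kind of shadowing the message asks about: a module can replace functions and operators, but by itself it cannot change what a string literal means. The last function answers the P.S. question: in UTF-16, a code point above U+FFFF is split into a "surrogate pair" of two 16-bit units.

```ocaml
(* Hypothetical module, not part of the OCaml standard library.
   OCaml strings are byte sequences, so s.[i] is the i-th BYTE, not the
   i-th character. This sketch assumes the bytes are well-formed UTF-8. *)
module Utf8 = struct
  (* Length in bytes of the UTF-8 sequence whose lead byte is [c]. *)
  let seq_len c =
    let b = Char.code c in
    if b < 0x80 then 1
    else if b land 0xE0 = 0xC0 then 2
    else if b land 0xF0 = 0xE0 then 3
    else if b land 0xF8 = 0xF0 then 4
    else invalid_arg "Utf8.seq_len: not a lead byte"

  (* Number of code points: needs a full scan of the string. *)
  let length s =
    let rec go pos count =
      if pos >= String.length s then count
      else go (pos + seq_len s.[pos]) (count + 1)
    in
    go 0 0

  (* Byte offset of the n-th code point (0-based). Sequences have
     variable length, so we must walk from the start: O(n), not O(1). *)
  let offset s n =
    let rec go pos count =
      if count = n then pos
      else if pos >= String.length s then invalid_arg "Utf8.offset"
      else go (pos + seq_len s.[pos]) (count + 1)
    in
    go 0 0

  (* The n-th code point as an int: the "str[i]" of a UTF-8 string. *)
  let get s n =
    let i = offset s n in
    let len = seq_len s.[i] in
    let lead = Char.code s.[i] in
    (* Keep only the payload bits of the lead byte. *)
    let first =
      match len with
      | 1 -> lead
      | 2 -> lead land 0x1F
      | 3 -> lead land 0x0F
      | _ -> lead land 0x07
    in
    (* Each continuation byte contributes its low 6 bits. *)
    let cp = ref first in
    for k = 1 to len - 1 do
      cp := (!cp lsl 6) lor (Char.code s.[i + k] land 0x3F)
    done;
    !cp
end

(* UTF-16: a code point above U+FFFF does not fit in one 16-bit unit,
   so it is split into a high surrogate (0xD800..0xDBFF) followed by a
   low surrogate (0xDC00..0xDFFF), each carrying 10 bits. *)
let utf16_units cp =
  if cp < 0x10000 then [ cp ]
  else
    let v = cp - 0x10000 in
    [ 0xD800 lor (v lsr 10); 0xDC00 lor (v land 0x3FF) ]

let () =
  let s = "T\xc3\xa5g" in (* "Tåg" in UTF-8, written with byte escapes *)
  (* Local open: inside Utf8.( ... ), [length] and [get] mean Utf8's. *)
  Printf.printf "bytes=%d chars=%d second char=U+%04X\n"
    (String.length s) Utf8.(length s) Utf8.(get s 1)
  (* prints: bytes=4 chars=3 second char=U+00E5 *)
```

Storing strings as fixed-width wide characters, as the message notes NT and Java do, would restore O(1) indexing at the cost of memory. With UTF-16, however, the surrogate pairs computed above mean that even 2-byte units are not strictly one per character.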