Date: 2002-01-16 (19:22)
From: Mattias Waldau <mattias.waldau@a...>
Subject: RE: [Caml-list] Non-mutable strings
Nice to see that there is a general interest in non-mutable strings.
However, as Xavier says, maybe it is a bit late.

We have another string problem, namely handling non-ASCII. As I understand
it, one of the points of nML, which is a new ML language currently built
using Ocaml, is that it handles Asian characters. Also, there was a post to
this list recently about Asian character encodings.

I don't think any language can continue to be pure-ASCII forever. One of
the reasons for Ruby's success is that it handles non-ASCII (I think it was
created by a Japanese developer). However, even we Swedes have problems:
only 2 of our 3 special characters are in the lower 7 bits, and sorting is
always wrong.

Encoded as UTF-8, a Unicode char takes between 1 and 4 bytes, which means
that str[i] doesn't work (unless you do as NT or Java do and store it as
wide chars internally, which of course Ocaml could do too). You always have
to start at the beginning of the string to find the i-th char.
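
To make the cost concrete, here is a minimal sketch of finding the i-th char
in a UTF-8 string by walking from the start. The names utf8_seq_len,
utf8_offset and utf8_nth are made up for illustration, and the code assumes
the input is well-formed UTF-8 (it does no validation).

```ocaml
(* Byte length of a UTF-8 sequence, judged from its lead byte.
   Assumption: the input is well-formed UTF-8; nothing is validated. *)
let utf8_seq_len (c : char) : int =
  let b = Char.code c in
  if b < 0x80 then 1        (* 0xxxxxxx: plain ASCII *)
  else if b < 0xE0 then 2   (* 110xxxxx *)
  else if b < 0xF0 then 3   (* 1110xxxx *)
  else 4                    (* 11110xxx *)

(* Byte offset of the i-th char (0-based). There is no way to jump
   straight to it: every earlier char has to be skipped first. *)
let utf8_offset (s : string) (i : int) : int =
  let rec go pos k =
    if pos >= String.length s then
      invalid_arg "utf8_offset: index out of range"
    else if k = 0 then pos
    else go (pos + utf8_seq_len s.[pos]) (k - 1)
  in
  if i < 0 then invalid_arg "utf8_offset: negative index" else go 0 i

(* The i-th char, returned as the bytes that encode it. *)
let utf8_nth (s : string) (i : int) : string =
  let pos = utf8_offset s i in
  String.sub s pos (utf8_seq_len s.[pos])
```

With the Swedish word "räka" written as the bytes "r\xc3\xa4ka", the char at
index 2 is "k", but it sits at byte offset 3, so a byte-indexed str[2] would
land in the middle of the "ä".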

Thus, introducing Unicode strings (or something similar; I heard that Asians
don't like Unicode at all) and introducing non-mutable strings should
preferably be done simultaneously.

In order to have 8-bit char strings and Unicode strings simultaneously we
need something like 'u"', and maybe the possibility to say that all strings
are Unicode. Can this be done using a module, just like 'open Float'
redefines '+' to '+.'?
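
For reference, the operator half of that idea can be sketched as below. The
module name Float_ops is hypothetical, and the local 'let open' form assumes
a compiler that supports it. As far as I know, opening a module can rebind
operators and functions like this, but it cannot change the type given to a
string literal, so a 'u"' literal would probably need support from the
compiler or a preprocessor.

```ocaml
(* Hypothetical module: rebinds the integer operators to their float
   versions for any code that opens it. *)
module Float_ops = struct
  let ( + ) = ( +. )
  let ( * ) = ( *. )
end

(* Inside the local open, + and * work on floats. *)
let area : float =
  let open Float_ops in
  2.0 * 3.0 + 1.5

(* Outside the open, + is integer addition again. *)
let count : int = 1 + 2
```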

Or should Ocaml v 4 go the whole way and let all strings (also identifiers)
be Unicode?


P.S. Microsoft NT, 2000, and XP handle double-byte chars everywhere (the
string type is called BSTR), and library routines are called all the time
for string comparison etc. However, since a Unicode char can take 4 bytes, I
don't know how that is encoded into 2 bytes.
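
One way this is handled, as a sketch: in UTF-16, code points above U+FFFF
do not fit in a single 16-bit unit and are written as two units, a
"surrogate pair". The function name utf16_units is made up for
illustration, and the code assumes its argument is a valid code point
(it does not reject the reserved range D800..DFFF).

```ocaml
(* Encode one Unicode code point as a list of 16-bit UTF-16 units.
   Assumption: cp is a valid code point; reserved values are not
   rejected. *)
let utf16_units (cp : int) : int list =
  if cp < 0 || cp > 0x10FFFF then
    invalid_arg "utf16_units: not a Unicode code point"
  else if cp < 0x10000 then
    [cp]                                (* fits in one 16-bit unit *)
  else begin
    let v = cp - 0x10000 in             (* 20 bits left to encode *)
    let high = 0xD800 + (v lsr 10) in   (* top 10 bits *)
    let low = 0xDC00 + (v land 0x3FF) in (* bottom 10 bits *)
    [high; low]
  end
```

So 'ä' (U+00E4) stays a single unit, while a code point such as U+1F600
becomes the pair D83D DE00. This also means that indexing by 16-bit unit
is not exactly indexing by char either.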
