Unicode (was RE: JIT-compilation for OCaml?)

Date: 2001-01-12 (09:19)
From: John Max Skaller <skaller@o...>
Subject: Re: Unicode (was RE: JIT-compilation for OCaml?)
Dave Berry wrote:
> I thought Unicode was a recognised subset of ISO-10646, corresponding to the
> range 0-2^16.  Also, don't Windows NT/2000 use Unicode?

	Yes and yes. More precisely, Unicode is often 'ahead' of ISO,
adding new characters which later make it into new versions of ISO-10646.

> My knowledge of C/C++ is probably out of date, but I thought they just used
> the wide character type, without requiring a particular internal
> representation.  In what way do ISO C/C++ support ISO-10646?

	There are, for example, both 16 and 31 bit escapes
(written \uXXXX and \UXXXXXXXX). What the compiler does with them
is implementation defined, I think: it can silently truncate to
16 or even 8 bits, but the programmer can still encode any
ISO-10646 character.
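
	To make the cost of silent truncation concrete, here is a small
sketch in OCaml. It only models the arithmetic of keeping the low 16 or
8 bits of a code point; it does not show what any particular C or C++
compiler does. The name truncate_to_bits is hypothetical, chosen for
this illustration.

```ocaml
(* Hypothetical helper: keep only the low [bits] bits of a code point,
   imitating storage in a narrower character type. *)
let truncate_to_bits (bits : int) (code_point : int) : int =
  code_point land ((1 lsl bits) - 1)

let () =
  (* U+1D11E MUSICAL SYMBOL G CLEF does not fit in 16 bits. *)
  let g_clef = 0x1D11E in
  Printf.printf "16-bit: U+%04X\n" (truncate_to_bits 16 g_clef);
  Printf.printf " 8-bit: U+%04X\n" (truncate_to_bits 8 g_clef)
```

	The truncated values are valid but unrelated code points, so the
loss would not show up as an error.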

	The type 'wchar_t' has implementation defined size in C++
(like all the other integral types). This doesn't exclude using
32 bit characters.

> (I realise this isn't directly on-topic, but it may be relevant for future
> extensions to OCaml?)

	I think it is. In particular, Ocaml supports 8 bit characters,
and even allows the high 128 byte values to be used in identifiers
(to allow French names :-).

	When and if this support is upgraded, Ocaml should go to
full ISO-10646 support: for identifiers this is easily done by
using UTF-8 (and providing a codec to convert Latin-1 for
backward compatibility). Supporting 2^31 code points in regular
expressions is more difficult. Collation is a nightmare :-)
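
	As a rough idea of how small the Latin-1 side of such a codec
could be, here is a minimal sketch in OCaml. It assumes the input is
Latin-1 (ISO-8859-1), where each byte value equals its code point, so
bytes below 0x80 are copied unchanged and the rest become two-byte UTF-8
sequences. The name latin1_to_utf8 is hypothetical and is not part of
any OCaml distribution.

```ocaml
(* Hypothetical helper: re-encode a Latin-1 string as UTF-8. *)
let latin1_to_utf8 (s : string) : string =
  let buf = Buffer.create (2 * String.length s) in
  String.iter
    (fun c ->
      let code = Char.code c in
      if code < 0x80 then
        (* ASCII range: identical in Latin-1 and UTF-8. *)
        Buffer.add_char buf c
      else begin
        (* 0x80..0xFF: emit 110xxxxx 10xxxxxx. *)
        Buffer.add_char buf (Char.chr (0xC0 lor (code lsr 6)));
        Buffer.add_char buf (Char.chr (0x80 lor (code land 0x3F)))
      end)
    s;
  Buffer.contents buf

let () =
  (* 0xE9 is e-acute in Latin-1; in UTF-8 it is the bytes 0xC3 0xA9. *)
  assert (latin1_to_utf8 "caf\xE9" = "caf\xC3\xA9")
```

	The reverse direction, and the regular expression and collation
problems, are not covered by this sketch.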

John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net