Unicode (was RE: JIT-compilation for OCaml?)
Date: 2001-01-12 (09:19)
From: John Max Skaller <skaller@o...>
Subject: Re: Unicode (was RE: JIT-compilation for OCaml?)
Dave Berry wrote:
>
> I thought Unicode was a recognised subset of ISO-10646, corresponding to the
> range 0-2^16. Also, don't Windows NT/2000 use Unicode?

Yes and yes. More precisely, Unicode is often 'ahead' of ISO, adding new
characters which make it into new versions of ISO-10646 later.

> My knowledge of C/C++ is probably out of date, but I thought they just used
> the wide character type, without requiring a particular internal
> representation. In what way do ISO C/C++ support ISO-10646?

There are, for example, both 16- and 31-bit escapes (the universal character
names \uXXXX and \UXXXXXXXX). What the compiler does with them is
implementation-defined, I think; that is, it can silently truncate them to 16
or even 8 bits, but the programmer can still encode any ISO-10646 character.
The type 'wchar_t' has an implementation-defined size in C++ (like all the
other integral types). This doesn't exclude using 32-bit characters.

> (I realise this isn't directly on-topic, but it may be relevant for future
> extensions to OCaml?)

I think it is. In particular, OCaml supports 8-bit characters, and even
allows the high 128 byte values to be used in identifiers (to allow French
names :-). When and if this support is upgraded, OCaml should go to full
ISO-10646 support: for identifiers this is easily done by using UTF-8 (and
providing a codec to convert Latin-1 for backward compatibility). Supporting
2^31 code points in regular expressions is more difficult. Collation is a
nightmare :-)

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net
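As a rough illustration of the "use UTF-8 for identifiers, plus a Latin-1 codec for backward compatibility" suggestion in the message, here is a minimal C sketch. It assumes the original 31-bit form of UTF-8 (as in RFC 2279), which uses 5- and 6-byte sequences to cover all 2^31 ISO-10646 code points; present-day UTF-8 (RFC 3629) stops at U+10FFFF and 4 bytes. The function names utf8_encode and latin1_to_utf8 are hypothetical and are not part of OCaml or any existing library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one ISO-10646 code point in the original
   31-bit range (0..0x7FFFFFFF) as UTF-8, RFC 2279 style (up to 6 bytes).
   Writes at most 6 bytes to out and returns the number written. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(cp <= 0x7FFFFFFFUL);
    if (cp < 0x80) {            /* ASCII is passed through unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    size_t len;
    unsigned char lead;         /* marker bits of the first byte */
    if      (cp < 0x800UL)     { len = 2; lead = 0xC0; }
    else if (cp < 0x10000UL)   { len = 3; lead = 0xE0; }
    else if (cp < 0x200000UL)  { len = 4; lead = 0xF0; }
    else if (cp < 0x4000000UL) { len = 5; lead = 0xF8; }
    else                       { len = 6; lead = 0xFC; }
    /* Fill continuation bytes (10xxxxxx) from the end, 6 bits each. */
    for (size_t i = len - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    out[0] = (unsigned char)(lead | cp);  /* remaining high bits */
    return len;
}

/* Hypothetical codec: convert a NUL-terminated Latin-1 (ISO-8859-1) string
   to UTF-8. Each Latin-1 byte value is the code point U+0000..U+00FF, so no
   lookup table is needed and each byte becomes at most 2 UTF-8 bytes.
   out must have room for 2 * strlen(in) + 1 bytes. Returns the output
   length, not counting the terminating NUL. */
size_t latin1_to_utf8(const char *in, char *out)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++)
        n += utf8_encode(*p, (unsigned char *)out + n);
    out[n] = '\0';
    return n;
}
```

For example, latin1_to_utf8("caf\xe9", buf) turns the 4-byte Latin-1 spelling of "café" into the 5 bytes "caf\xc3\xa9". Because ASCII maps to itself, existing pure-ASCII identifiers would be unaffected by such a conversion.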