Unicode (was RE: JIT-compilation for OCaml?)
Date: 2001-01-20 (15:32)
From: Pierpaolo BERNARDI <bernardp@c...>
Subject: Re: Unicode (was RE: JIT-compilation for OCaml?)

On Thu, 18 Jan 2001, John Max Skaller wrote:

> Pierpaolo BERNARDI wrote:
> >
> > On Thu, 11 Jan 2001, Dave Berry wrote:
> >
> > > I thought Unicode was a recognised subset of ISO-10646, corresponding to the
> > > range 0-2^16.
> >
> > No. ISO-10646 and Unicode contains exactly the same code points.
> > Unicode has room for about 2^20 code points. The ISO committee has
> > agreed to limit ISO-10646 to the same range.
>
> Unless it has changed recently, the first 64K code points of ISO-10646
> are known as the Basic Multilingual Plane (BMP), which corresponds
> to ISO-10646. The other 'planes' are not currently used AFAIK,
> but they exist.

Let me repeat: ISO has formally agreed not to use code points outside
the range that Unicode can address. This leaves room for about 2^20
characters.

A draft of Unicode 3.1 was published today (the definitive version is
due out in a couple of months), and it already uses code points outside
of the BMP.

See the Unicode FAQs at www.unicode.org for more information.

> Indeed, some code points from the BMP are reserved
> so Unicode can use multi-word encodings of the lower 4 planes.

Unicode can be encoded in several ways, for example UTF-8, UTF-16,
UTF-32, UCS2, etc. The choice of encoding has nothing to do with the
number of characters that can be encoded.

Cheers,
Pierpaolo
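
To make the arithmetic behind "about 2^20" and the point about encodings concrete, here is a minimal Python sketch. It is not part of the original message: the helper name `to_utf16_units` and the sample code point U+10400 are picked purely for illustration, and the surrogate ranges used (U+D800..U+DBFF and U+DC00..U+DFFF) are the ones UTF-16 reserves inside the BMP.

```python
# Illustrative sketch only: names and the sample code point are assumptions.

HIGH_START = 0xD800         # first high (leading) surrogate
LOW_START = 0xDC00          # first low (trailing) surrogate
SURROGATES_PER_HALF = 0x400 # 1024 values in each surrogate half

def to_utf16_units(code_point):
    """Return the 16-bit UTF-16 code units for a single code point."""
    if code_point < 0x10000:
        return [code_point]              # BMP: one unit, no surrogates
    offset = code_point - 0x10000        # 20-bit offset above the BMP
    return [HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)]

# Code points reachable through surrogate pairs: 1024 * 1024 = 2^20.
supplementary = SURROGATES_PER_HALF * SURROGATES_PER_HALF

# Highest addressable code point: end of the BMP plus 2^20.
max_code_point = 0xFFFF + supplementary

# One code point outside the BMP, in three different encoding forms.
ch = chr(0x10400)
utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-be")
utf32 = ch.encode("utf-32-be")

print(hex(max_code_point), supplementary == 2 ** 20)
print([hex(u) for u in to_utf16_units(0x10400)])
print(utf8.hex(), utf16.hex(), utf32.hex())
```

All three byte sequences decode back to the same single code point, so the encoding form only changes how a code point is serialised, not how many code points exist.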