Unicode (was RE: JIT-compilation for OCaml?)

Date: 2001-01-20 (15:32)
From: Pierpaolo BERNARDI <bernardp@c...>
Subject: Re: Unicode (was RE: JIT-compilation for OCaml?)

On Thu, 18 Jan 2001, John Max Skaller wrote:

> Pierpaolo BERNARDI wrote:
> > 
> > On Thu, 11 Jan 2001, Dave Berry wrote:
> > 
> > > I thought Unicode was a recognised subset of ISO-10646, corresponding to the
> > > range 0-2^16.
> > 
> > No. ISO-10646 and Unicode contain exactly the same code points.
> > Unicode has room for about 2^20 code points. The ISO committee has
> > agreed to limit ISO-10646 to the same range.
> 	Unless it has changed recently, the first 64K code points of ISO-10646
> are known as the Basic Multilingual Plane (BMP), which corresponds
> to ISO-10646. The other 'planes' are not currently used AFAIK,
> but they exist. 

Let me repeat: ISO has formally agreed not to use code points outside
the range that Unicode can address.  This leaves room for about 2^20
characters.  A draft of Unicode 3.1 was published today (the definitive
version is due out in a couple of months), and it already uses code points
outside the BMP.  See the Unicode FAQs at www.unicode.org for more.
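
The "about 2^20" figure can be checked with a little arithmetic. What follows is only a sketch in Python, and it assumes the surrogate ranges D800-DBFF and DC00-DFFF that the Unicode standard reserves inside the BMP; the variable names are mine, not anything from the standard.

```python
# Surrogate ranges reserved inside the BMP (assumed from the Unicode standard).
HIGH_SURROGATES = range(0xD800, 0xDC00)  # 1024 lead values
LOW_SURROGATES = range(0xDC00, 0xE000)   # 1024 trail values

BMP_SIZE = 2 ** 16  # code points U+0000..U+FFFF

# Each (high, low) pair addresses one code point beyond the BMP.
supplementary = len(HIGH_SURROGATES) * len(LOW_SURROGATES)

total = BMP_SIZE + supplementary
max_code_point = total - 1

print(supplementary == 2 ** 20)   # pairs cover exactly 2^20 code points
print(total == 17 * 2 ** 16)      # i.e. 17 planes of 64K each
print(hex(max_code_point))        # upper limit of the code space
```

So the code space is the BMP plus 2^20 supplementary code points, which is where the upper limit of 10FFFF comes from.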

> Indeed, some code points from the BMP are reserved
> so Unicode can use multi-word encodings of the lower 4 planes.

Unicode can be encoded in several ways, for example UTF-8, UTF-16,
UTF-32, UCS-2, etc.  This has nothing to do with the number of characters
that can be encoded.
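
To illustrate the point, here is a minimal sketch in Python that encodes one code point outside the BMP in three encoding forms and decodes each back. The helper name `encode_all` and the choice of U+10400 as the sample are assumptions made for the example; the codec names are the ones Python's standard library provides.

```python
def encode_all(code_point):
    """Encode a single code point in several Unicode encoding forms."""
    ch = chr(code_point)
    return {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

# U+10400 lies outside the BMP (hypothetical sample chosen for illustration).
forms = encode_all(0x10400)

for name, data in forms.items():
    # Every form decodes back to the same code point; only the bytes differ.
    decoded = ord(data.decode(name))
    print(name, data.hex(), hex(decoded))
```

The byte sequences have different lengths and contents (UTF-16 uses a surrogate pair here), yet each one round-trips to the same code point, so the choice of encoding form does not change which code point is represented.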