Unicode (was RE: JIT-compilation for OCaml?)
Date: 2001-01-12 (18:57)
From: John Max Skaller <skaller@o...>
Subject: Re: Unicode (was RE: JIT-compilation for OCaml?)
Xavier Leroy wrote:

> Shall we "do it right" (for some notion of "right") or favor
> interoperability?  Hard question.  My current answer is to
> procrastinate...  Actually, multi-byte encoded strings (UTF-8) are not
> so bad and already have full support in OCaml :-)

	I personally think this is the right first step, since no
new data types are required. What is needed seems to be
simple. I believe the requirements are:

	1. changes to the lexer to support \uXXXX and \UXXXXXXXX escapes
(in strings, and probably in identifiers)

	2. changes to the lexer to recognize the 'letters'
which can be used in identifiers. The letters which should be
allowed are specified in an ISO document. 

	3. a codec to convert Latin-1 to UTF-8.
[One can argue about whether it is applied by default or not :-]
You might provide other codecs too, such as UCS-2/UTF-16 -> UTF-8
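
To make items 1 and 3 concrete, here is a minimal sketch in plain OCaml strings, not the actual lexer change. The names add_utf8, latin1_to_utf8 and expand_escapes are hypothetical, and the sketch assumes code points up to U+10FFFF only (no 31-bit values, no surrogate checks).

```ocaml
(* Hypothetical helper: append the UTF-8 encoding of code point [cp] to [buf].
   Assumption: only 0..0x10FFFF is accepted; surrogates are not rejected. *)
let add_utf8 buf cp =
  if cp < 0 || cp > 0x10FFFF then invalid_arg "add_utf8: out of range"
  else if cp < 0x80 then Buffer.add_char buf (Char.chr cp)
  else if cp < 0x800 then begin
    Buffer.add_char buf (Char.chr (0xC0 lor (cp lsr 6)));
    Buffer.add_char buf (Char.chr (0x80 lor (cp land 0x3F)))
  end else if cp < 0x10000 then begin
    Buffer.add_char buf (Char.chr (0xE0 lor (cp lsr 12)));
    Buffer.add_char buf (Char.chr (0x80 lor ((cp lsr 6) land 0x3F)));
    Buffer.add_char buf (Char.chr (0x80 lor (cp land 0x3F)))
  end else begin
    Buffer.add_char buf (Char.chr (0xF0 lor (cp lsr 18)));
    Buffer.add_char buf (Char.chr (0x80 lor ((cp lsr 12) land 0x3F)));
    Buffer.add_char buf (Char.chr (0x80 lor ((cp lsr 6) land 0x3F)));
    Buffer.add_char buf (Char.chr (0x80 lor (cp land 0x3F)))
  end

(* Item 3: each Latin-1 byte is the code point of the same value. *)
let latin1_to_utf8 s =
  let buf = Buffer.create (2 * String.length s) in
  String.iter (fun c -> add_utf8 buf (Char.code c)) s;
  Buffer.contents buf

let is_hex c =
  match c with
  | '0' .. '9' | 'a' .. 'f' | 'A' .. 'F' -> true
  | _ -> false

(* Item 1, as a post-pass over a string rather than inside the lexer:
   replace \uXXXX (4 hex digits) and \UXXXXXXXX (8 hex digits) by UTF-8.
   Any other backslash sequence is copied through unchanged. *)
let expand_escapes s =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec go i =
    if i >= n then ()
    else if s.[i] = '\\' && i + 1 < n
            && (s.[i + 1] = 'u' || s.[i + 1] = 'U') then begin
      let digits = if s.[i + 1] = 'u' then 4 else 8 in
      if i + 2 + digits > n then invalid_arg "expand_escapes: truncated";
      let hex = String.sub s (i + 2) digits in
      String.iter
        (fun c -> if not (is_hex c) then invalid_arg "expand_escapes: bad digit")
        hex;
      add_utf8 buf (int_of_string ("0x" ^ hex));
      go (i + 2 + digits)
    end else begin
      Buffer.add_char buf s.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf
```

For example, latin1_to_utf8 applied to the four bytes "caf\xe9" should give the five bytes "caf\xc3\xa9", and expand_escapes "caf\\u00e9" should give the same result. Item 2 (which letters are legal in identifiers) needs the tables from the ISO document and is not attempted here.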

	I guess most of the rest can be done in OCaml or C
without impacting the compiler/run-time, and once the design is right,
the compiler/run-time can be tuned to make more efficient
representations possible. [For example, to generate inline
code to compare 16- or 31-bit unsigned integers, rather than
call a C routine]
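
As an illustration of the library-level approach, here is a sketch that assumes a wide string is represented as an int array of code points (my assumption, not something proposed above). The function name compare_code_points is hypothetical. Because the elements are annotated as int, the native compiler can typically emit direct integer comparisons instead of calling the generic comparison routine.

```ocaml
(* Lexicographic comparison of two code-point arrays.
   Returns a negative, zero or positive int, like Stdlib.compare.
   Assumption: code points are stored as non-negative OCaml ints. *)
let compare_code_points (a : int array) (b : int array) : int =
  let la = Array.length a and lb = Array.length b in
  let rec go i =
    if i = la || i = lb then compare (la : int) lb  (* shorter prefix sorts first *)
    else
      let x : int = a.(i) and y : int = b.(i) in
      if x < y then -1
      else if x > y then 1
      else go (i + 1)
  in
  go 0
```

With this, [|0x61; 0x3B1|] sorts before [|0x61; 0x3B2|], and a proper prefix sorts before any longer array that extends it.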

John (Max) Skaller,
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper
download Interscript