Date: 2002-09-01 (08:52)
From: John Max Skaller <skaller@o...>
Subject: Re: [Caml-list] Announcement: PXP 1.1.92 (development version)
Gerd Stolpmann wrote:

> In previous versions of PXP, the internal representation of the XML trees was 
> restricted to either UTF-8 or ISO-8859-1. Now, a number of additional 
> encodings are supported, including the whole ISO-8859 series. 

I have ALL the code sets specified at in
programmatic form. It would be easy to generate Ocaml
versions of the tables.

However, how about developing a standard I18n library,
with an eye to future inclusion in the standard

The main question is: what form should the
encode/decode functions take?

My functions are written in Python and take the form:

	decode: string -> (int * string)
	encode: int -> string

where string is an 8-bit byte stream,
and int is a Unicode (or other) code point.
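For illustration, here is a minimal OCaml sketch of the two signatures above. It assumes that decode peels one character off the front of the byte string and returns its code point together with the unconsumed rest, and that encode returns the bytes for a single code point. The exception Malformed and all function names are hypothetical, not part of any existing library; the UTF-8 decoder is deliberately simple and does not reject overlong forms or surrogates.

```ocaml
(* Hypothetical error for invalid input or unencodable code points. *)
exception Malformed

(* ISO-8859-1: every byte is its own code point. *)
let latin1_decode (s : string) : int * string =
  if s = "" then raise Malformed
  else (Char.code s.[0], String.sub s 1 (String.length s - 1))

let latin1_encode (cp : int) : string =
  if cp < 0 || cp > 0xFF then raise Malformed
  else String.make 1 (Char.chr cp)

(* UTF-8: 1 to 4 bytes per code point.  Simplified: overlong
   sequences and surrogate code points are not rejected. *)
let utf8_decode (s : string) : int * string =
  let len = String.length s in
  if len = 0 then raise Malformed;
  let b0 = Char.code s.[0] in
  (* n = sequence length, init = payload bits of the first byte *)
  let n, init =
    if b0 < 0x80 then (1, b0)
    else if b0 land 0xE0 = 0xC0 then (2, b0 land 0x1F)
    else if b0 land 0xF0 = 0xE0 then (3, b0 land 0x0F)
    else if b0 land 0xF8 = 0xF0 then (4, b0 land 0x07)
    else raise Malformed
  in
  if len < n then raise Malformed;
  let cp = ref init in
  for i = 1 to n - 1 do
    let b = Char.code s.[i] in
    (* continuation bytes must look like 10xxxxxx *)
    if b land 0xC0 <> 0x80 then raise Malformed;
    cp := (!cp lsl 6) lor (b land 0x3F)
  done;
  (!cp, String.sub s n (len - n))

let utf8_encode (cp : int) : string =
  let b = Buffer.create 4 in
  let add x = Buffer.add_char b (Char.chr x) in
  if cp < 0 then raise Malformed
  else if cp < 0x80 then add cp
  else if cp < 0x800 then begin
    add (0xC0 lor (cp lsr 6));
    add (0x80 lor (cp land 0x3F))
  end else if cp < 0x10000 then begin
    add (0xE0 lor (cp lsr 12));
    add (0x80 lor ((cp lsr 6) land 0x3F));
    add (0x80 lor (cp land 0x3F))
  end else if cp < 0x110000 then begin
    add (0xF0 lor (cp lsr 18));
    add (0x80 lor ((cp lsr 12) land 0x3F));
    add (0x80 lor ((cp lsr 6) land 0x3F));
    add (0x80 lor (cp land 0x3F))
  end else raise Malformed;
  Buffer.contents b

(* Decode a whole string by repeatedly peeling off one character. *)
let rec decode_all decode s =
  if s = "" then []
  else
    let cp, rest = decode s in
    cp :: decode_all decode rest
```

Returning the rest as a fresh string copies the tail on every call, so a real library would probably pass an offset instead; the sketch just keeps the string-in, string-out shape proposed above.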

The actual Python functions use dynamically loaded
data tables, but each character set has a fixed
table format that reflects the raw structure of the
character set (e.g. which ranges of high and low
bytes are allowed in two-byte encodings such as
Shift-JIS, KSC, etc.). For Ocaml, we'd probably
want to bind the encodings at compile time,
since there is no well-defined way to find
the data tables at run time :(
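Here is a sketch of the "fixed table format per character set" idea for a two-byte encoding, with the table bound at compile time as an ordinary OCaml value. Assumptions: the lead and trail byte ranges follow the usual Shift_JIS layout, single bytes below 0x80 are treated as ASCII, and dbcs_table is a hypothetical two-entry stand-in for a table that would be generated from the published mapping files. The record type dbcs and the function dbcs_decode are illustrative names only.

```ocaml
exception Malformed

(* Structural knowledge about one double-byte character set. *)
type dbcs = {
  is_lead : int -> bool;          (* byte may start a 2-byte sequence *)
  is_trail : int -> bool;         (* byte may follow a lead byte *)
  single : int -> int option;     (* code point of a 1-byte character *)
  double : (int, int) Hashtbl.t;  (* (lead lsl 8) lor trail -> code point *)
}

let in_range lo hi b = lo <= b && b <= hi

(* Hypothetical stand-in for a generated table: two sample entries
   (ideographic space and hiragana "a"). *)
let dbcs_table = [ (0x8140, 0x3000); (0x82A0, 0x3042) ]

let shift_jis_like = {
  is_lead = (fun b -> in_range 0x81 0x9F b || in_range 0xE0 0xEF b);
  is_trail = (fun b -> in_range 0x40 0x7E b || in_range 0x80 0xFC b);
  single = (fun b ->
    if b < 0x80 then Some b                                (* treated as ASCII *)
    else if in_range 0xA1 0xDF b then Some (b + 0xFEC0)    (* half-width katakana *)
    else None);
  double =
    (let t = Hashtbl.create 16 in
     List.iter (fun (k, v) -> Hashtbl.replace t k v) dbcs_table;
     t);
}

(* Same shape as the proposed decode: string -> (int * string). *)
let dbcs_decode enc (s : string) : int * string =
  let len = String.length s in
  if len = 0 then raise Malformed;
  let b0 = Char.code s.[0] in
  if enc.is_lead b0 then begin
    if len < 2 then raise Malformed;
    let b1 = Char.code s.[1] in
    if not (enc.is_trail b1) then raise Malformed;
    match Hashtbl.find_opt enc.double ((b0 lsl 8) lor b1) with
    | Some cp -> (cp, String.sub s 2 (len - 2))
    | None -> raise Malformed
  end else
    match enc.single b0 with
    | Some cp -> (cp, String.sub s 1 (len - 1))
    | None -> raise Malformed
```

The range predicates stay the same for every table of a given character set, so only the Hashtbl contents would need generating per encoding.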

The tables are very compact, but there are quite
a few encodings, so there would be some overhead
if they were all in one module.
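One possible way to address the overhead, sketched under assumptions: each encoding implements a hypothetical common signature CODEC, so it can live in its own module (ideally its own compilation unit), and a small registry of first-class modules allows choosing an encoding by name at run time. Latin1 and Ascii are toy codecs standing in for generated ones; CODEC, registry, find_codec and transcode are illustrative names, not an existing API.

```ocaml
(* Hypothetical common interface every generated codec would satisfy. *)
module type CODEC = sig
  val name : string
  val decode : string -> int * string
  val encode : int -> string
end

exception Malformed

module Latin1 : CODEC = struct
  let name = "ISO-8859-1"
  let decode s =
    if s = "" then raise Malformed
    else (Char.code s.[0], String.sub s 1 (String.length s - 1))
  let encode cp =
    if cp < 0 || cp > 0xFF then raise Malformed
    else String.make 1 (Char.chr cp)
end

module Ascii : CODEC = struct
  let name = "US-ASCII"
  let decode s =
    if s = "" || Char.code s.[0] > 0x7F then raise Malformed
    else (Char.code s.[0], String.sub s 1 (String.length s - 1))
  let encode cp =
    if cp < 0 || cp > 0x7F then raise Malformed
    else String.make 1 (Char.chr cp)
end

(* Run-time lookup by name via first-class modules. *)
let registry : (string * (module CODEC)) list =
  [ (Latin1.name, (module Latin1 : CODEC));
    (Ascii.name, (module Ascii : CODEC)) ]

let find_codec name : (module CODEC) option = List.assoc_opt name registry

(* Transcode by decoding with one codec and re-encoding with another. *)
let transcode ~from ~into s =
  let module F = (val from : CODEC) in
  let module T = (val into : CODEC) in
  let buf = Buffer.create (String.length s) in
  let rec go s =
    if s <> "" then begin
      let cp, rest = F.decode s in
      Buffer.add_string buf (T.encode cp);
      go rest
    end
  in
  go s;
  Buffer.contents buf
```

When codecs are separate compilation units in a library archive, the OCaml linker normally includes only the units a program references (unless -linkall is given). A registry that names every codec references all of them, so run-time lookup by name gives back some of that saving.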

John Max Skaller,
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
