Date: 2007-10-16 (00:00)
From: skaller <skaller@u...>
Subject: Re: [Caml-list] Warning on home-made functions dealing with UTF-8.

On Mon, 2007-10-15 at 22:35 +0200, Julien Moutinho wrote:
> On Fri, Oct 12, 2007 at 12:48:16AM +1000, skaller wrote:
> > On Thu, 2007-10-11 at 16:21 +0200, Vincent Hanquez wrote:
> > > On Thu, Oct 11, 2007 at 11:54:24PM +1000, skaller wrote:
> > > > You can't: Camomile is massive for a reason.. the problem it
> > > > aims to solve is complex and hard to do efficiently without
> > > > a large set of specialised functions.
> > > 
> > > You are assuming that i want efficiency where i want to print few
> > > unicode string in an ui here and there. I *DON'T* want to be exposed to
> > > full unicode, i need something like 1/100 of camomile library.
> > 
> > In that case, you can use an int Array.t for Unicode provided 
> > it is only 31 bit OR you have a 64 bit machine. These routines 
> > should help converting to and from UTF-8:
> > [...]
> Just in case someone would want to use this parse_utf8,
> be aware that depending on the trust you have in your input,
> it may be sorely discouraged to do so.
> Indeed, this code does not check comprehensively for invalid characters.

That is correct. It is specifically designed NOT to do so.
The last thing you want in 99% of codec use is to abort due 
to an error.
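
To make the "don't abort" policy concrete, here is a minimal sketch of a lenient decoder. It is not the parse_utf8 routine discussed above (that code is elided in the quote); the name decode_utf8_lenient and the choice of U+FFFD as a stand-in for bad bytes are assumptions made for illustration. It returns code points as an int array and never raises on malformed input.

```ocaml
(* Hypothetical helper for illustration; NOT the parse_utf8 from the thread. *)

(* U+FFFD REPLACEMENT CHARACTER, used here in place of undecodable bytes. *)
let replacement = 0xFFFD

(* Decode [s] as UTF-8 into an array of code points.
   Never raises: a byte that cannot start a sequence, or a sequence
   with a missing or wrong continuation byte, yields U+FFFD and
   decoding resumes at the very next byte.
   Assumption of this sketch: overlong forms and surrogate code
   points are passed through, not rejected. *)
let decode_utf8_lenient (s : string) : int array =
  let n = String.length s in
  let byte i = Char.code s.[i] in
  (* True if position [i] exists and holds a 10xxxxxx continuation byte. *)
  let is_cont i = i < n && byte i land 0xC0 = 0x80 in
  (* Read [k] continuation bytes starting at [i], folding their low
     six bits into [acc]. Returns the code point and the next index. *)
  let rec conts i k acc =
    if k = 0 then Some (acc, i)
    else if is_cont i then
      conts (i + 1) (k - 1) ((acc lsl 6) lor (byte i land 0x3F))
    else None
  in
  let rec loop i acc =
    if i >= n then Array.of_list (List.rev acc)
    else
      let b = byte i in
      let attempt =
        if b < 0x80 then Some (b, i + 1)                                (* 1 byte  *)
        else if b land 0xE0 = 0xC0 then conts (i + 1) 1 (b land 0x1F)  (* 2 bytes *)
        else if b land 0xF0 = 0xE0 then conts (i + 1) 2 (b land 0x0F)  (* 3 bytes *)
        else if b land 0xF8 = 0xF0 then conts (i + 1) 3 (b land 0x07)  (* 4 bytes *)
        else None
      in
      match attempt with
      | Some (cp, next) -> loop next (cp :: acc)
      | None -> loop (i + 1) (replacement :: acc)  (* skip one bad byte, keep going *)
  in
  loop 0 []
```

For example, decode_utf8_lenient "a\xFFb" gives [|0x61; 0xFFFD; 0x62|]: the stray byte becomes a visible marker and the text on either side of it survives.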

Try switching codecs on Firefox.. do you really want to abort
if you have bad input or the wrong codec?

UTF-8 is primarily used for Unicode, which is human-readable text.
Errors and faults in the text aren't important most of the time.

It has nothing to do with a 'trusted' source. It has to do with
the fact that the human text is an approximation in the first

John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net