From: Julien Moutinho <julien.moutinho@g...>
Subject: Re: [Caml-list] Warning on home-made functions dealing with UTF-8.
On Tue, Oct 16, 2007 at 09:51:16AM +1000, skaller wrote:
> On Mon, 2007-10-15 at 22:35 +0200, Julien Moutinho wrote:
> > Just in case someone would want to use this parse_utf8,
> > be aware that depending on the trust you have in your input,
> > it may be sorely discouraged to do so.
> > Indeed, this code does not check comprehensively for invalid characters.
> 
> That is correct. It is specifically designed NOT to do so.
Use it at your own risk, then; no offense was intended.

> The last thing you want in 99% of codec use is to abort due 
> to an error.
> 
> Try switching codecs on Firefox.. do you really want to abort
> if you have bad input or the wrong codec?
I would say that, whatever Firefox does, I want it to be at least minimally safe.

> UTF-8 is primarily used for Unicode which is human readable text.
> Errors and faults in the text aren't important most of the time.
Unless they are deliberately inserted by a malicious attacker.
Cf. [1], where a backslash is encoded with the overlong UTF-8 form "\xC1\x9C"
instead of "\x5C", fooling IIS' string search algorithm.
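
To make the overlong case concrete, here is a minimal sketch, not the
parse_utf8 from this thread: decode2_lenient and decode2_strict are
hypothetical helpers that handle only two-byte sequences. The lenient one
decodes without any check, so "\xC1\x9C" silently becomes a backslash; the
strict one rejects it because a two-byte sequence must encode at least U+0080.

```ocaml
(* Hypothetical helpers for illustration; two-byte sequences only. *)

(* Decode s.[i], s.[i+1] with no validation, as a lenient decoder might. *)
let decode2_lenient (s : string) (i : int) : int =
  let b0 = Char.code s.[i] and b1 = Char.code s.[i + 1] in
  ((b0 land 0x1F) lsl 6) lor (b1 land 0x3F)

(* Strict variant: reject truncated input, bad lead or continuation bytes,
   and overlong forms. *)
let decode2_strict (s : string) (i : int) : int option =
  if i < 0 || i + 1 >= String.length s then None
  else
    let b0 = Char.code s.[i] and b1 = Char.code s.[i + 1] in
    (* Lead byte must be 110xxxxx, continuation byte must be 10xxxxxx. *)
    if b0 land 0xE0 <> 0xC0 || b1 land 0xC0 <> 0x80 then None
    else
      let cp = ((b0 land 0x1F) lsl 6) lor (b1 land 0x3F) in
      (* Anything below U+0080 fits in one byte, so this form is overlong. *)
      if cp < 0x80 then None else Some cp
```

With these definitions, decode2_lenient "\xC1\x9C" 0 yields 0x5C, whereas
decode2_strict "\xC1\x9C" 0 yields None and decode2_strict "\xC3\xA9" 0
yields Some 0xE9. A filter that searches the raw bytes for "\x5C" before
lenient decoding would miss the first input entirely.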

> It has nothing to do with a 'trusted' source. It has to do with
> the fact that the human text is an approximation in the first
> place.

[1] http://www.securityfocus.com/bid/1806/discuss