| Date: | -- (:) |
| From: | Julien Moutinho <julien.moutinho@g...> |
| Subject: | Re: [Caml-list] Warning on home-made functions dealing with UTF-8. |
On Tue, Oct 16, 2007 at 09:51:16AM +1000, skaller wrote:
> On Mon, 2007-10-15 at 22:35 +0200, Julien Moutinho wrote:
> > Just in case someone would want to use this parse_utf8,
> > be aware that depending on the trust you have in your input,
> > it may be sorely discouraged to do so.
> > Indeed, this code does not check comprehensively for invalid characters.
>
> That is correct. It is specifically designed NOT to do so.

Use it at your own risk, then; no offense was intended.

> The last thing you want in 99% of codec use is to abort due
> to an error.
>
> Try switching codecs on Firefox.. do you really want to abort
> if you have bad input or the wrong codec?

I would say that, whatever Firefox does, I want it to be at least minimally safe.

> UTF-8 is primarily used for Unicode which is human readable text.
> Errors and faults in the text aren't important most of the time.

Unless they are put there deliberately by a malicious attacker.
cf. [1], where a backslash is encoded with the overlong UTF-8 form
"\xC1\x9C" instead of "\x5C", fooling IIS' string search algorithm.

> It has nothing to do with a 'trusted' source. It has to do with
> the fact that the human text is an approximation in the first
> place.

[1] http://www.securityfocus.com/bid/1806/discuss
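
To make the overlong-form issue concrete, here is a minimal sketch in OCaml. It is not the parse_utf8 discussed in this thread; the names decode_two_bytes and lenient_two_bytes are hypothetical and only handle two-byte sequences. It shows that a decoder which skips the range check maps "\xC1\x9C" to the backslash 0x5C, while a strict one can reject it.

```ocaml
(* Hypothetical helper: decode the two-byte UTF-8 sequence starting at
   index i of s, rejecting overlong forms. Only two-byte sequences are
   handled; this is an illustration, not a full decoder. *)
let decode_two_bytes (s : string) (i : int) : (int, string) result =
  if i < 0 || i + 1 >= String.length s then Error "truncated sequence"
  else
    let b0 = Char.code s.[i] and b1 = Char.code s.[i + 1] in
    if b0 land 0xE0 <> 0xC0 then Error "not a two-byte lead byte"
    else if b1 land 0xC0 <> 0x80 then Error "invalid continuation byte"
    else
      let cp = ((b0 land 0x1F) lsl 6) lor (b1 land 0x3F) in
      (* Code points below 0x80 must be encoded in one byte, so a
         two-byte encoding of them is overlong and gets rejected. *)
      if cp < 0x80 then Error "overlong encoding" else Ok cp

(* Hypothetical lenient variant: same bit arithmetic, no validation.
   The caller must guarantee that i + 1 is a valid index. *)
let lenient_two_bytes (s : string) (i : int) : int =
  ((Char.code s.[i] land 0x1F) lsl 6) lor (Char.code s.[i + 1] land 0x3F)
```

With these definitions, lenient_two_bytes "\xC1\x9C" 0 evaluates to 0x5C, so a filter that searched the raw bytes for "\x5C" before decoding would miss the backslash. The strict variant returns Error "overlong encoding" for the same input and lets the caller decide whether to abort, skip, or substitute a replacement character.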