Storing UTF-8 in plain strings
Date: -- (:)
From: Dario Teixeira <darioteixeira@y...>
Subject: Re: [Caml-list] Storing UTF-8 in plain strings
Hi,

> Thank you all for your comments.  Ulex has caught all the intentionally
> malformed code points I've inserted in the stream, so I'm fairly confident
> it's up to the task.  But if I find a problem I'll keep Netconversion's
> and Extlib's validation functions in mind...

By the way, I just noticed that the 'validate' function in Extlib's UTF8
module accepts 5-byte and 6-byte sequences.  Though these were part of
UTF-8's original specification, RFC 3629 removed them by restricting UTF-8
to the range U+0000..U+10FFFF, so they are no longer valid.  Perhaps adding
a 'Deprecated_code' exception for these cases is in order (or just raising
the existing 'Malformed_code' exception)?  Note that Ulex correctly raises
an exception if any of these obsolete sequences is found.
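
For illustration, here is a minimal sketch of what a stricter check could
look like.  It is not Extlib's or Ulex's actual code: the 'Malformed_code'
exception is redeclared locally, and 'sequence_length', 'validate' and
'is_valid' are hypothetical names.  The idea is simply that the lead byte
determines the sequence length, and that any lead byte above 0xF4 (which
covers the old 5-byte and 6-byte forms, 0xF8..0xFD) gets rejected.

```ocaml
(* Local stand-in for the exception mentioned above; an assumption,
   not Extlib's own definition. *)
exception Malformed_code

(* Sequence length implied by a lead byte under RFC 3629.
   0xC0/0xC1 (overlong 2-byte forms) and 0xF5..0xFF are rejected;
   the latter range includes the old 5- and 6-byte lead bytes. *)
let sequence_length lead =
  if lead < 0x80 then 1
  else if lead >= 0xC2 && lead <= 0xDF then 2
  else if lead >= 0xE0 && lead <= 0xEF then 3
  else if lead >= 0xF0 && lead <= 0xF4 then 4
  else raise Malformed_code

(* Continuation bytes have the bit pattern 10xxxxxx. *)
let is_continuation c = (c land 0xC0) = 0x80

(* Raises Malformed_code on the first invalid sequence. *)
let validate (s : string) : unit =
  let n = String.length s in
  let byte i = Char.code s.[i] in
  let rec loop i =
    if i < n then begin
      let lead = byte i in
      let len = sequence_length lead in
      (* Truncated sequence at the end of the string. *)
      if i + len > n then raise Malformed_code;
      for k = 1 to len - 1 do
        if not (is_continuation (byte (i + k))) then raise Malformed_code
      done;
      (* Second-byte range checks: overlong 3- and 4-byte forms,
         UTF-16 surrogates, and code points above U+10FFFF. *)
      if len >= 3 then begin
        let b1 = byte (i + 1) in
        if (lead = 0xE0 && b1 < 0xA0)
        || (lead = 0xED && b1 > 0x9F)
        || (lead = 0xF0 && b1 < 0x90)
        || (lead = 0xF4 && b1 > 0x8F)
        then raise Malformed_code
      end;
      loop (i + len)
    end
  in
  loop 0

(* Boolean wrapper, convenient for quick checks. *)
let is_valid s =
  try validate s; true with Malformed_code -> false
```

With this sketch, 'is_valid "\xF8\x88\x80\x80\x80"' (a 5-byte form) and
'is_valid "\xFC\x84\x80\x80\x80\x80"' (a 6-byte form) both return false,
while ordinary 1- to 4-byte sequences are accepted.  A separate
'Deprecated_code' exception could be raised in the final branch of
'sequence_length' instead, if distinguishing the two cases is useful.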

Cheers,
Dario Teixeira