Storing UTF-8 in plain strings
| Date: | -- (:) |
| From: | Dario Teixeira <darioteixeira@y...> |
| Subject: | Re: [Caml-list] Storing UTF-8 in plain strings |
Hi,
> Thank you all for your comments. Ulex has caught all the intentionally
> malformed code points I've inserted in the stream, so I'm fairly confident
> it's up to the task. But if I find a problem I'll keep Netconversion's
> and Extlib's validation functions in mind...
By the way, I just noticed that the 'validate' function in Extlib's UTF8
module accepts 5-byte and 6-byte sequences. Though these were part of
UTF-8's original specification, RFC 3629 no longer allows them: it restricts
UTF-8 to the range U+0000..U+10FFFF, which needs at most four bytes.
Perhaps adding a 'Deprecated_code' exception for these cases is in order?
(Or just raise the existing 'Malformed_code' exception.) Note that Ulex
correctly raises an exception if any of these deprecated sequences is
found.
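To illustrate the kind of check I have in mind, here is a rough sketch. It
is not Extlib's or Ulex's actual code: the function names are hypothetical,
and the 'Deprecated_code' exception is only the one proposed above, here
assumed to carry the byte offset. It relies on the fact that the lead bytes
of 5-byte and 6-byte sequences (0xF8 to 0xFD) can never appear anywhere in
RFC 3629 UTF-8, so a plain byte scan is enough to spot them.

```ocaml
(* Hypothetical exception, as proposed above; the int is the byte offset
   of the offending lead byte. Not part of Extlib. *)
exception Deprecated_code of int

(* True if [c] is the lead byte of a 5-byte (0xF8..0xFB) or 6-byte
   (0xFC..0xFD) sequence from the original UTF-8 specification. *)
let is_long_form_lead (c : char) : bool =
  let b = Char.code c in
  b >= 0xF8 && b <= 0xFD

(* Scan every byte of [s] and raise [Deprecated_code i] at the first
   5-byte or 6-byte lead byte found. Returns unit otherwise. *)
let reject_long_forms (s : string) : unit =
  String.iteri
    (fun i c -> if is_long_form_lead c then raise (Deprecated_code i))
    s
```

This only covers the 5-byte and 6-byte case. It is not a full validator: it
does not check continuation bytes, overlong encodings, surrogates, or 4-byte
sequences above U+10FFFF, so it would have to run alongside the existing
checks rather than replace them.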
Cheers,
Dario Teixeira