Storing UTF-8 in plain strings
Date: 2009-08-13 (13:14)
From: Dario Teixeira <darioteixeira@y...>
Subject: Re: [Caml-list] Storing UTF-8 in plain strings

> Thank you all for your comments.  Ulex has caught all the intentionally
> malformed code points I've inserted in the stream, so I'm fairly confident
> it's up to the task.  But if I find a problem I'll keep Netconversion's
> and Extlib's validation functions in mind...

By the way, I just noticed that the 'validate' function in Extlib's UTF8
module accepts 5-byte and 6-byte sequences.  Though these were part of
UTF-8's original specification, RFC 3629 removed them by restricting
UTF-8 to code points up to U+10FFFF (at most four bytes per sequence).
Perhaps adding a 'Deprecated_code' exception for these cases is in order?
(Or simply raising the existing 'Malformed_code' exception.)  Note that Ulex
correctly raises an exception if any of these deprecated sequences are
encountered.
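
As a rough illustration of the check suggested above, here is a minimal
sketch that rejects any byte announcing a 5-byte or 6-byte sequence.  It
does not use Extlib's or Ulex's code: the names 'announced_length' and
'reject_long_sequences' are hypothetical, and 'Deprecated_code' is only
the exception name proposed in this message, defined locally here.  It is
a pre-check, not a full UTF-8 validator.

```ocaml
(* Hypothetical exception, named after the suggestion in this message;
   it is defined here and is not taken from Extlib. *)
exception Deprecated_code

(* Sequence length announced by a lead byte under the original UTF-8
   scheme.  Returns None for a continuation byte and for 0xFE/0xFF,
   which never start a sequence. *)
let announced_length (c : char) : int option =
  let b = Char.code c in
  if b < 0x80 then Some 1
  else if b < 0xC0 then None          (* continuation byte 10xxxxxx *)
  else if b < 0xE0 then Some 2
  else if b < 0xF0 then Some 3
  else if b < 0xF8 then Some 4
  else if b < 0xFC then Some 5        (* 111110xx, gone in RFC 3629 *)
  else if b < 0xFE then Some 6        (* 1111110x, gone in RFC 3629 *)
  else None

(* Raise Deprecated_code if any byte announces a 5- or 6-byte sequence.
   Bytes 0xF8..0xFD cannot occur anywhere in RFC 3629 UTF-8, so it is
   enough to look at every byte in turn. *)
let reject_long_sequences (s : string) : unit =
  String.iter
    (fun c ->
       match announced_length c with
       | Some n when n > 4 -> raise Deprecated_code
       | _ -> ())
    s
```

Such a function could be run before an existing validator, or the same
test could be folded into the validator's lead-byte dispatch so that it
raises 'Malformed_code' instead.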

Dario Teixeira