Storing UTF-8 in plain strings
Date: 2009-08-13 (09:37)
From: Dario Teixeira <darioteixeira@y...>
Subject: Re: [Caml-list] Storing UTF-8 in plain strings

> I'm using Ulex + Menhir to parse UTF-8 encoded source code, and I'm relying
> on plain strings for processing and storing data.  I *think* I can get away
> with using only the String module to handle this variable-length encoding
> as long as I am careful with the way I treat these strings.  Here are the
> assumptions I am making:

Thank you all for your comments.  Ulex has caught all the intentionally
malformed code points I've inserted in the stream, so I'm fairly confident
it's up to the task.  But if I find a problem I'll keep Netconversion's
and Extlib's validation functions in mind...

Best regards,
Dario Teixeira
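
For readers wondering what a byte-level validity check on a plain OCaml string might look like, here is a minimal sketch that uses only the standard String and Char modules. It is not taken from this thread, and it is not the API of Ulex, Netconversion, or Extlib; the name `is_valid_utf8` is hypothetical. It assumes the RFC 3629 rules: no overlong encodings, no surrogates, and nothing above U+10FFFF.

```ocaml
(* Hypothetical helper: not part of Ulex, Netconversion, or Extlib. *)
let is_valid_utf8 (s : string) : bool =
  let n = String.length s in
  let byte i = Char.code s.[i] in
  (* True when position i exists and holds a continuation byte 10xxxxxx. *)
  let cont i = i < n && byte i land 0xC0 = 0x80 in
  let rec go i =
    if i >= n then true
    else
      let b = byte i in
      if b < 0x80 then go (i + 1)                  (* ASCII *)
      else if b >= 0xC2 && b <= 0xDF then          (* 2-byte sequence *)
        cont (i + 1) && go (i + 2)
      else if b >= 0xE0 && b <= 0xEF then          (* 3-byte sequence *)
        cont (i + 1) && cont (i + 2)
        && (let b1 = byte (i + 1) in
            (b <> 0xE0 || b1 >= 0xA0)              (* reject overlong forms *)
            && (b <> 0xED || b1 <= 0x9F))          (* reject surrogates *)
        && go (i + 3)
      else if b >= 0xF0 && b <= 0xF4 then          (* 4-byte sequence *)
        cont (i + 1) && cont (i + 2) && cont (i + 3)
        && (let b1 = byte (i + 1) in
            (b <> 0xF0 || b1 >= 0x90)              (* reject overlong forms *)
            && (b <> 0xF4 || b1 <= 0x8F))          (* reject > U+10FFFF *)
        && go (i + 4)
      else false                                   (* 0x80-0xC1, 0xF5-0xFF *)
  in
  go 0
```

A check like this only confirms that the bytes form well-formed UTF-8. It says nothing about whether later String operations (slicing, indexing) keep the string well-formed.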