| From: | Yamagata Yoriyuki <yoriyuki@m...> |
| Subject: | Re: [Caml-list] Re: Common IO structure |
From: John Goerzen <jgoerzen@complete.org>
Subject: Re: [Caml-list] Re: Common IO structure
Date: Thu, 29 Apr 2004 09:02:40 -0500

> On Thu, Apr 29, 2004 at 10:40:36PM +0900, Yamagata Yoriyuki wrote:
> > > > > > OK, but then you can leave out readline(), readlines() and xreadlines(),
> > > > > > because they don't make any sense unless you've already dealt with
> > > > > > character encodings.
> > > > >
> > > > > No, they can simply be implemented in terms of read().
> > > >
> > > > It will break when UTF-16/UTF-32 are used. The line separator should
> > > > be handled after code conversion. At least that is the idea of
> > > > Unicode standard. (But Since Unicode standard is challenged by
> > > > reality in every aspect, maybe nobody cares.)
> > >
> > > You are missing the point. read() could handle the code conversion.
> >
> > No, what I wanted to say is that the line separator should be handled
> > in the Unicode level, not the byte-character level. Your design
> > assumes read() always returns new line characters as in ASCII. This
> > would not hold when read() returns UTF-16/UTF-32.
>
> I don't see why that is the case. If read() returns UTF-16 data,
> readlines() works with it, and would of course be scanning it for a
> UTF-16 EOL character or string. I don't see where that's the problem.

An encoding could be stateful, so there would be no single
representation of EOL. (*) OK, this is a very unlikely case currently,
but I think there is an interesting encoding for Unicode which is fully
stateful. So readlines() needs to be fully aware of the encoding.

My proposal is mainly for sharing common channel types among libraries,
so that a user can pass a channel from one library to another without
writing glue code. Since parsing line endings, or loading the whole
file into a string, mainly occurs at the endpoint of IO, I do not think
standardizing them is necessary for this purpose.

I do not think standardizing the endpoint API is important, because I
think that in the end, we will use only one library as the endpoint of
IO.

(*) IIRC, an RFC defines that the endianness of UTF-16 is swapped in
the middle of the stream when the "BOM" 0xfffe appears.

--
Yamagata Yoriyuki
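
The byte-level versus Unicode-level point in the quoted exchange can be illustrated with a small sketch. It is written in Python only because the method names under discussion (read(), readlines()) mirror Python's file API; it is not the OCaml channel interface being proposed. The sample text and all variable names are assumptions made up for the demonstration.

```python
# Illustration only: why scanning raw bytes for "\n" fails on UTF-16 data.
# The sample text is an assumption chosen for the demo. U+010A encodes to
# the bytes 01 0A in UTF-16BE, so it contains a 0x0A byte without being a
# line separator.
text = "a\u010ab\nc"

data = text.encode("utf-16-be")

# Byte-level split, as a readlines() layered on a raw byte read() might do.
byte_pieces = data.split(b"\n")

# Split after code conversion, i.e. at the Unicode level.
unicode_lines = data.decode("utf-16-be").split("\n")

# The same line separator has a different byte form in each encoding.
eol_be = "\n".encode("utf-16-be")
eol_le = "\n".encode("utf-16-le")

print(len(byte_pieces), [len(p) for p in byte_pieces])
print(unicode_lines)
print(eol_be, eol_le)
```

The byte-level split cuts the data inside the two-byte unit for U+010A and leaves odd-length fragments that no longer decode as UTF-16, while the split performed after decoding finds the single real line break. The last two values show that even the plain line feed has no single byte representation across UTF-16 variants, so a line splitter has to know which encoding it is reading.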