[Caml-list] Bug with really_input under cygwin
| From: | Nuutti Kotivuori <naked+caml@n...> |
| Subject: | Re: [Caml-list] Bug with really_input under cygwin |
skaller@users.sourceforge.net wrote:
> On Wed, 2004-03-10 at 09:30, Eric Dahlman wrote:
>> Howdy all,
>>
>> I have some code which is reads in a whole file in and returns it
>> as a string.

If you have a master's degree in reading in between the rant, you
probably picked out the right answer from the text below. But here it
is as a simple answer:

Loop doing 'input' on the file, until 'input' returns zero.

'really_input' is of course nice and easy, but since you have no
really proper way of knowing how large the entire file is going to be
in the end, you need to make a decision about the buffer size anyway.

Binary or non-binary mode only affects the \r\n -> \n translation
while reading the file - and vice versa while writing.

> The only correct way to do this is to read a block at a time
> until you get a partial block.
>
> This is so EVEN in 'binary' mode, which is just another
> ill conceived Unix hack :-)

[...]

> It is unfortunate that C and Unix do not provide a coherent
> abstraction in this area. Even binary I/O is ill-conceived:

[...]

> C has been plagued by extremely ill considered functions.
> Even the basic IO operation is not correctly defined.

[...]

> There is no such thing as 'the number of characters
> in a file'. Perhaps there is a number of bytes in a file.

[...]

> In MS-DOS, files *always* consist of a number of 256
> byte blocks. It is impossible to have a file with
> a non-256 byte multiple size. Of course, text files
> uses an encoding with a Ctrl-Z at the end.

[...]

> Under Linux, the Standard for text encoding is UTF-8.

[...]

> I personally believe the easiest way to work around this
> quagmire of malspecification is to
>
> (a) ONLY use 8 bit binary I/O
> (b) ALWAYS read and write bytes
>
> even if you're processing text. Never depend on the
> language or OS conversion functions, its very unlikely
> they'll be right. Do all the conversions needed yourself.
> At least when you find a problem you're not handling
> correctly you can fix it.
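For what it's worth, the "loop doing 'input' until it returns zero"
approach might be sketched roughly as follows. This is only an
illustration, not code from the original poster: the names read_all
and read_file and the 4096-byte chunk size are my own assumptions, and
it uses the Bytes/Buffer modules of a reasonably recent OCaml.

```ocaml
(* Read everything from an already-open channel by calling [input]
   repeatedly until it returns 0, which signals end of file.
   [read_all] is a hypothetical helper name. *)
let read_all ic =
  let chunk_size = 4096 in                (* arbitrary choice, an assumption *)
  let chunk = Bytes.create chunk_size in  (* scratch space for one read *)
  let buf = Buffer.create chunk_size in   (* grows as data arrives *)
  let rec loop () =
    (* [input] may return fewer bytes than requested; only 0 means EOF. *)
    let n = input ic chunk 0 chunk_size in
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf

(* Open [path] in binary mode so that no \r\n -> \n translation is
   applied, read the whole file, and close the channel even on error.
   [read_file] is likewise a hypothetical name. *)
let read_file path =
  let ic = open_in_bin path in
  match read_all ic with
  | s -> close_in ic; s
  | exception e -> close_in_noerr ic; raise e
```

Since the file size is never consulted, it should not matter whether
the length reported for the file agrees with the number of bytes
actually delivered by the reads.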

Luckily not everybody sees the world as glum :-)

-- Naked