[Caml-list] Bug with really_input under cygwin
Date: 2004-03-10 (15:25)
From: Nuutti Kotivuori <naked+caml@n...>
Subject: Re: [Caml-list] Bug with really_input under cygwin
skaller@users.sourceforge.net wrote:
> On Wed, 2004-03-10 at 09:30, Eric Dahlman wrote:
>> Howdy all,
>> I have some code which reads in a whole file and returns it
>> as a string.

If you have a master's degree in reading between the lines of a rant,
you probably picked out the right answer from the text below. But here
it is as a simple answer:

  Loop doing 'input' on the file, until 'input' returns zero.

'really_input' is of course nice and easy, but since you have no
reliable way of knowing in advance how large the entire file is going
to be, you need to decide on a buffer size anyway.
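
A minimal sketch of that loop, assuming a current OCaml where 'input'
fills a 'bytes' buffer (in 2004 it filled a mutable string). The name
'read_all' and the default chunk size of 4096 are made up for
illustration, not taken from the original code:

```ocaml
(* Read everything remaining on [ic] by calling [input] repeatedly.
   [input] may return fewer bytes than requested; it returns 0 only
   at end of file (given a positive chunk size). *)
let read_all ?(chunk_size = 4096) ic =
  let chunk = Bytes.create chunk_size in
  let buf = Buffer.create chunk_size in
  let rec loop () =
    let n = input ic chunk 0 chunk_size in
    if n > 0 then begin
      (* Keep only the [n] bytes that were actually read. *)
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf
```

Typical use would be to open the file with 'open_in_bin', call
'read_all ic', and then 'close_in ic'. The Buffer grows as needed, so
the chunk size only affects how many calls to 'input' are made, not
how large a file can be read.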

Binary or non-binary mode only affects the \r\n -> \n translation while
reading the file - and vice versa while writing.
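
To see the difference between the modes for yourself, here is a rough
sketch that writes a small file with \r\n line endings and counts how
many bytes 'input' delivers in each mode. The helper names
'count_bytes' and 'count_with' are hypothetical. I am assuming that on
a platform which performs the translation the text-mode count comes
out smaller, while on plain Unix both counts are the same:

```ocaml
(* Count how many bytes [input] delivers from [ic] until it returns 0. *)
let count_bytes ic =
  let chunk = Bytes.create 4096 in
  let rec loop total =
    let n = input ic chunk 0 (Bytes.length chunk) in
    if n = 0 then total else loop (total + n)
  in
  loop 0

(* Hypothetical helper: open [path] with [opener], count the bytes,
   and close the channel whether or not counting succeeded. *)
let count_with opener path =
  let ic = opener path in
  match count_bytes ic with
  | n -> close_in ic; n
  | exception e -> close_in_noerr ic; raise e

let () =
  let path = Filename.temp_file "mode_demo" ".txt" in
  (* Write in binary mode so the bytes on disk are exactly these 10. *)
  let oc = open_out_bin path in
  output_string oc "one\r\ntwo\r\n";
  close_out oc;
  let binary = count_with open_in_bin path in
  let text = count_with open_in path in
  Sys.remove path;
  Printf.printf "binary: %d bytes, text: %d bytes\n" binary text
```

If the two numbers differ, a length taken from the file on disk will
not match what a text-mode channel actually hands you, which is one
more reason to loop until 'input' returns zero.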

> The only correct way to do this is to read a block at a time
> until you get a partial block.
> This is so EVEN in 'binary' mode, which is just another
> ill conceived Unix hack :-)


> It is unfortunate that C and Unix do not provide a coherent
> abstraction in this area. Even binary I/O is ill-conceived:


> C has been plagued by extremely ill considered functions.
> Even the basic IO operation is not correctly defined.


> There is no such thing as 'the number of characters
> in a file'. Perhaps there is a number of bytes in a file.


> In MS-DOS, files *always* consist of a number of 256
> byte blocks. It is impossible to have a file with
> a non-256 byte multiple size. Of course, text files
> use an encoding with a Ctrl-Z at the end.


> Under Linux, the Standard for text encoding is UTF-8.


> I personally believe the easiest way to work around this
> quagmire of malspecification is to 
> (a) ONLY use 8 bit binary I/O
> (b) ALWAYS read and write bytes
> even if you're processing text. Never depend on the
> language or OS conversion functions, it's very unlikely
> they'll be right. Do all the conversions needed yourself.
> At least when you find a problem you're not handling
> correctly you can fix it.

Luckily not everybody sees the world as glum :-)

-- Naked
