[Caml-list] [ANN] The Missing Library
Date: -- (:)
From: Benjamin Geer <ben@s...>
Subject: Re: [Caml-list] Re: Common IO structure
Yamagata Yoriyuki wrote:
> I doubt the benefit of
> buffered IO, as I stated in the previous mail.  Unless operation is
> very simple, and atoms are very small, (that is, character IO) extra
> cost of element-wise IO is not important.

But there are times when you want to read one byte or character at a 
time, and in those cases, buffering saves the overhead of a function or 
method call per byte/char.

Buffering is also useful when you have to process a very large amount of 
data, and cannot keep it all in memory at once.

> I'm interested in (potential) users of IO libraries.  Could someone
> comment on IO system of Java, Perl, Python, for example?

In Java there are two I/O libraries, the original one (java.io)[1] and 
the new one (java.nio)[2].  The old one has the virtue of being easy to 
understand and use, and flexible enough for many situations.  The basic 
InputStream and OutputStream classes deal only with bytes, have 
Unix-like 'read' and 'write' methods, and do no buffering.  There are 
derived classes such as FileInputStream, and a Socket supplies its own 
streams via getInputStream() and getOutputStream().  The API 
allows you to add functionality to a stream by using wrappers.  For 
example, to add buffering to any InputStream, you wrap it in a 
BufferedInputStream (which is a class derived from InputStream).  To 
marshal Java objects to a byte stream, you wrap an OutputStream in an 
ObjectOutputStream, and pass objects to the ObjectOutputStream.
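
As a rough sketch of the wrapper idea (the class and method names here are 
made up for illustration, and the syntax is newer than the 1.4.2 docs 
linked below), buffering and object marshalling can both be layered on 
top of plain byte streams like this:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; only the java.io types are real library API.
class WrapperSketch {
    // Reads one byte at a time. Because of the BufferedInputStream wrapper,
    // most read() calls are served from an in-memory buffer rather than
    // going to the underlying stream each time.
    static int countBytes(InputStream raw) {
        try (InputStream in = new BufferedInputStream(raw)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Marshals an object to bytes by wrapping an OutputStream in an
    // ObjectOutputStream, then reads it back through an ObjectInputStream.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countBytes(new ByteArrayInputStream(new byte[10000])));
        System.out.println(roundTrip("hello"));
    }
}
```

The in-memory streams are only there to keep the sketch self-contained; 
the same wrappers go around a FileInputStream or a socket's stream.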

Classes derived from Reader and Writer deal with characters, and can be 
wrapped around streams to perform conversions between bytes and 
characters.  For example, to read bytes and convert them to characters, 
you wrap an InputStream in an InputStreamReader, which has a constructor 
that says which encoding to read, and 'read' methods that return 
(Unicode) characters.  Another example of a Reader is LineNumberReader, 
which counts lines in its input.
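
Here is a small sketch of that layering, assuming UTF-8 input (the class 
and method names are hypothetical; StandardCharsets is a convenience from 
later Java versions, where 1.4 code would pass the encoding name as a 
string):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only the java.io types are real library API.
class ReaderSketch {
    // Wraps a byte stream in an InputStreamReader (bytes -> characters),
    // then in a LineNumberReader, and returns the number of lines read.
    static int countLines(InputStream bytes) {
        try (LineNumberReader reader = new LineNumberReader(
                new InputStreamReader(bytes, StandardCharsets.UTF_8))) {
            while (reader.readLine() != null) {
                // Each call returns one decoded line; we only count them.
            }
            return reader.getLineNumber();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9t\u00e9\nhiver\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(countLines(new ByteArrayInputStream(utf8)));
    }
}
```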

This is all fine as far as it goes, but it turns out to be cumbersome, 
and in some cases impossible, to implement certain things efficiently 
using this API.  The java.nio API solves these problems, but it is much 
more complicated to use.

For example, suppose you have to read a large amount of text from a 
network connection, convert it to another encoding, and save it in a 
file.  There's too much text to store all of it in memory at once, and 
you're dealing with a lot of network requests at the same time, so in 
any case you want to minimise the amount of memory used by each request. 
You'd like to be able to read about 4K at a time, convert the bytes to 
the target encoding, and write them to the file.  You could make a 4K 
byte array and use it as a buffer, but what if the input encoding is 
UTF-8?  You might get an incomplete character at the end of the buffer; 
if the UTF-8 decoder is expecting a complete string, it will choke.

The solution in java.nio is to have two different kinds of buffer 
classes: ByteBuffer and CharBuffer.  You can fill up a ByteBuffer, and 
use a CharsetDecoder to convert the bytes to Unicode characters; the 
decoder will read as many complete characters as it can, and put them in 
a CharBuffer.  You then 'compact' the ByteBuffer, which moves any 
remaining bytes to the beginning of the buffer, and start again. 
(Similarly, you can use a CharsetEncoder to convert the characters to 
bytes in the target encoding, filling up a ByteBuffer which you can then 
write to an output channel.)
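
The fill/decode/compact cycle might look roughly like the sketch below. 
It covers only the decoding half, reads from an in-memory channel instead 
of a network connection, and collects the result in a StringBuilder 
purely so the output can be checked; the class name, method name and 
buffer sizes are my own assumptions, and the buffer must hold at least 4 
bytes (the longest UTF-8 sequence).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only the java.nio types are real library API.
class ChunkedDecodeSketch {
    // Decodes UTF-8 input in fixed-size chunks. bufferSize must be >= 4.
    static String decodeInChunks(byte[] utf8, int bufferSize) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bytes = ByteBuffer.allocate(bufferSize);
        CharBuffer chars = CharBuffer.allocate(bufferSize);
        StringBuilder out = new StringBuilder();
        try (ReadableByteChannel channel =
                Channels.newChannel(new ByteArrayInputStream(utf8))) {
            boolean eof = false;
            while (!eof) {
                eof = channel.read(bytes) == -1;  // fill the byte buffer
                bytes.flip();                     // switch it to draining
                CoderResult result;
                do {
                    // Decodes as many complete characters as possible; an
                    // incomplete trailing sequence stays in 'bytes'.
                    result = decoder.decode(bytes, chars, eof);
                    if (result.isError()) {
                        result.throwException();
                    }
                    drain(chars, out);
                } while (result.isOverflow());
                bytes.compact();  // move leftover bytes to the front
            }
            while (decoder.flush(chars).isOverflow()) {
                drain(chars, out);
            }
            drain(chars, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Appends the decoded characters to 'out' and empties the CharBuffer.
    private static void drain(CharBuffer chars, StringBuilder out) {
        chars.flip();
        out.append(chars);
        chars.clear();
    }

    public static void main(String[] args) {
        String text = "na\u00efve caf\u00e9 \u65e5\u672c\u8a9e";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(text.equals(decodeInChunks(utf8, 4)));
    }
}
```

With a 4-byte buffer, multi-byte characters regularly straddle a chunk 
boundary, which is exactly the case the compact() step is there for.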

Some of the other useful things java.nio provides are:

* 'Direct' byte buffers.  'Given a direct byte buffer, the Java virtual 
machine will make a best effort to perform native I/O operations 
directly upon it. That is, it will attempt to avoid copying the buffer's 
content to (or from) an intermediate buffer before (or after) each 
invocation of one of the underlying operating system's native I/O 
operations.'

* Buffers that correspond to a memory-mapped region of a file.  This can 
be useful for dealing with huge files; it takes advantage of the 
operating system's support for memory-mapped files, where available.

* 'Scattering' channels.  'A scattering read operation reads, in a 
single invocation, a sequence of bytes into one or more of a given 
sequence of buffers. Scattering reads are often useful when implementing 
network protocols or file formats that, for example, group data into 
segments consisting of one or more fixed-length headers followed by a 
variable-length body.  Similar gathering write operations are defined in 
the GatheringByteChannel interface.'
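
To make the scattering and gathering calls concrete, here is a sketch 
using FileChannel, which implements both interfaces.  The record layout 
(a 4-byte length header followed by a UTF-8 body of at most 1024 bytes), 
the class name and the method names are all invented for the example; 
the body buffer is a direct buffer just to show how one is allocated.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; only the java.nio types are real library API.
class ScatterSketch {
    // Gathering write: the header and body buffers go out in one call.
    static void writeRecord(Path file, String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        ByteBuffer header = ByteBuffer.allocate(4);
        header.putInt(data.length);
        header.flip();
        ByteBuffer body = ByteBuffer.wrap(data);
        ByteBuffer[] parts = { header, body };
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.WRITE, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            while (header.hasRemaining() || body.hasRemaining()) {
                channel.write(parts);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Scattering read: the channel fills 'header' first, then 'body'.
    static String readRecord(Path file) {
        ByteBuffer header = ByteBuffer.allocate(4);
        ByteBuffer body = ByteBuffer.allocateDirect(1024);  // assumed max size
        ByteBuffer[] parts = { header, body };
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (channel.read(parts) > 0) {
                // Keep reading until end of file or both buffers are full.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        header.flip();
        body.flip();
        if (header.remaining() < 4) {
            throw new IllegalStateException("truncated header");
        }
        int length = header.getInt();
        if (length < 0 || length > body.remaining()) {
            throw new IllegalStateException("truncated or oversized body");
        }
        byte[] data = new byte[length];
        body.get(data);
        return new String(data, StandardCharsets.UTF_8);
    }

    // Writes one record to a temporary file and reads it back.
    static String demo(String text) {
        try {
            Path file = Files.createTempFile("scatter", ".bin");
            try {
                writeRecord(file, text);
                return readRecord(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("hello scatter"));
    }
}
```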

My own view is that the flexibility and efficiency permitted by java.nio 
are valuable, but that its complexity is a problem.  The behaviour of 
the buffer classes[3] is tricky to understand and therefore error-prone.

Ben

[1] http://java.sun.com/j2se/1.4.2/docs/api/java/io/package-summary.html

[2] http://java.sun.com/j2se/1.4.2/docs/api/java/nio/package-summary.html

[3] http://java.sun.com/j2se/1.4.2/docs/api/java/nio/Buffer.html

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners