Version française
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] [ANN] The Missing Library
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: John Goerzen <jgoerzen@c...>
Subject: Re: [Caml-list] Re: Common IO structure
On Thu, Apr 29, 2004 at 12:23:03PM +0100, Benjamin Geer wrote:
> >InputStreamReader, OutputStream, OutputStreamWriter, RandomAccessFile,
> >Reader, or Writer.  Really, I literally *do not know how to open a
> >simple file*.  I would not call that intuitive.
> 
> You actually have to *read* the documentation, not just glance at the 
> class names. :)  That's to be expected with a powerful API.  Once you 

We were talking about being intuitive here.  I'd have to read maybe a
dozen different class descriptions, have to understand the differences
between them, and cross-reference back and forth between them to figure
out how to get the object I want.  Or pay for a Java book that describes
these relationships itself.

> understand the key concepts governing the design of the API, it makes 
> sense, it and becomes intuitive to select the classes you need.  I tried 
> to point out these concepts in the message you replied to.

Even when I lived and breathed Java every day for a year, its I/O API
was not intuitive.  Actually, most of its APIs were not intuitive.
Everybody I worked with had the Javadoc for the API bookmarked and
referred to it constantly.

In contrast, the API in Python, Perl, or even C is very easy to use.

Here's one of the problems.  Java's API makes it complex to do simple
things without simplifying complex things.  Recall my example -- being
able to open a file read/write and seeking around in it?  In C, I'd do:

   int file = open(filename, O_RDWR) or FILE * file = fopen(filename, "r+")

Perl, it is:
 
   open(FH, "+<", $filename)

Or, in Python:

   file = open(filename, "r+")

If I don't know my language's code for file modes, I have one simple
place to look.

Now, none of these examples do UTF-8 or other conversions.  That's fine.
I usually don't need that.  In fact, I dare say that any kind of
conversion like that is by far the minority case.

Java requires me to wade through and think about all of these things
plus the way the file will eventually be used (do I want an array of
bytes, an array of chars, strings, etc?) right when I open it.  That's
bad form.  Make the open a generic call, and let people build upon the
file object from there.  This is how C and Python work.  (Perl is a
little wacko with its open call, but it works that way too, mostly.)

> To read a file containing UTF-8 text, one line at a time:
> 
> BufferedReader in =
>     new BufferedReader
>     (new InputStreamReader
>      (new FileInputStream(filename), "UTF8"));
> 
> while (true)
> {
>     String line = in.readLine();
> 
>     if (line == null)
>     {
>         break;
>     }
> 
>     System.out.println(line);
> }

But the scary part is that this is about how hard it is to read a file
of ASCII text, one line at a time.  Whereas, with Python, I'd do:

for line in open(filename, "r").xreadlines():
    print line

See what I mean about intuitive?

But what about UTF-8 in Python?

import codecs
file = codecs.open(filename, "r", "UTF-8")
for line in file.xreadlines():
    print line

By all means, if we are going to emulate a design from another language,
let us emulate this one.  It is far cleaner and sensible.  For more
info, see file:/usr/share/doc/python2.3/html/lib/module-codecs.html.

Essentially, the codecs.open call opens a file handle and returns a
StreamReader object that has the file handle passed in to it.  But
here's the key: this StreamReader object is itself a "file-like object"
in Python parlance.  That means you can use it everywhere you could have
used a standard file object (assuming the code is capable of handling
Unicode strings, which it usually is.)  So you still have the helpful
abstraction of Java without all the mess.

And before you say, "See, Java has a StreamReader too!", note that
codecs defines *4* classes: StreamWriter, StreamReader,
StreamReaderWriter, and StreamRecoder.  I can handle that.

> functionality (like buffering).  All the classes whose names end in 
> 'Stream' deal with bytes only; the ones whose names end in 'Reader' or 
> 'Writer' deal with characters.  See?  It's easy once you know the pattern.

But the point is, this distinction is at the wrong place.

-- John

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners