Version française
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] GC and file descriptors
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Brian Hurt <bhurt@s...>
Subject: Re: [Caml-list] GC and file descriptors
On 18 Nov 2003, skaller wrote:

> On Mon, 2003-11-17 at 06:19, Brian Hurt wrote:
> > On 16 Nov 2003, skaller wrote:
> 
> > I wish the mainstream would crawl out of the sixties era computer science 
> > and get at least into the eighties. 
> 
> Lol, they're up to the sixties already?

Pretty precisely, actually.  Java's type system is quite comprable to 
Algol 68's- which is why downcasting from Object is such a common thing in 
Java.  Generics are being added to the language, but they will still be 
"special" things (the Java designers are seemingly intent on turning Java 
into C++).  And Java is the only language whose memory management is more 
advanced than 1968-era LISP.

> 
> >  The only "mainstream" programming 
> > language I know of with *excusable* garbage collection is Java. 
> 
> I think Ocaml is very close to mainstream now.
> [And I think the FP contest and Bagley are responsible,
> and ORiley book in English will be the clincher]

The O'Reilly book will be a great benefit.  Let me know when it comes out, 
I want a copy.  But I don't know how close to mainstream it is.  Perl, 
Python, and Ruby are scripting languages, still mainly used for short, 
single-person, throw-away projects.  And they aren't that far from 
"conventional" languages (think Perl and Shell, Python/Ruby and Java).  
Ocaml is more of an applications language- it's benefits start to really 
shine when you're looking at tens of thousands (or more) lines of code.  
C++ succeed (and Objective-C and Smalltalk didn't) because you could write 
C in it.  Java succeeded because IBM, Sun, Oracle, and a number of other 
huge companies got behind it.

> 
> Very hard to be the second (and third) fastest programming
> language around and not be noticed :-)

Not that hard.  All you need to have happen is to have dismissed as a 
"fruity, compsci language".

> 
> >  Of 
> > course, Java's type system is state of the art- for 1968. 
> 
> Err.. since when is downcasting everthing from Object
> a type system??

It's a type system with a hole in it.  Which I'll agree is almost as bad 
as no type system at all, but Java does have a type system (and DOS 
qualifies as an operating system, technically).

> 
> > > Because of this problem, my own major Python program
> > > interscript would not run on my interpreter, since
> > > it relied on orderly finalisation.
> 
> > You were abusing finalization.  
> 
> Well no. I was *trying* to abuse it but failed :-)
> 
> In fact, partly my prompting on this was responsible
> for Ocaml coded finalisers (instead of just C ones).
> 
> > A much better way to do this would be to 
> > have a central command queue. 
> 
> This would have been impossible in interscript.
> It works by executing user commands directly
> (but in a restricted environment).

Then you need to have an execute() function which is called as soon as the 
command can execute.  I assumed you were trying to delay execution of the 
commands for a while.

> 
> > > 
> > > Anyhow .. there do exist cases where explicit finalisation
> > > is difficult. 
> > 
> > IIRC, in the general case it's equivelent to solving the halting problem.
> 
> If I can make a general comment here: there are things like comparing
> functions which are equivalent to the halting problem. Another is
> verifying program correctness :-)
> 
> But it is irrelevant, because programmers do not intend to write
> arbitrary code. They intend to write comprehensible code, whose
> correctness is therefore necessarily easy to prove.

What they intend to do and what they do do are often completely different.  
It is a rare programmer who *intentionally* adds bugs to his code.  Yet 
bugs still happen.

But my point here is that determining the *exact* time that an object 
becomes garbage can be arbitrarily complex.  Most of the time, I agree, it 
won't be.  It's like factoring numbers- half of all numbers have 2 as a 
factor, and a third of the rest have 3 as a factor.  So 2/3rds of all 
numbers are easy to get a factor out of.  But there are still some numbers 
which are very hard to factor.

Even reference counting can fail at this.  It's possible for references 
to an object to "escape".  The object is still dead, but it's not being 
collected for a while.  Which is why depending on finalizations happening 
at a particular time, or in a particular order, is a bad idea.

Note that the style/paradigm of code you are writting makes a huge 
difference.  It's a lot easier to track object lifetimes in a procedural 
language than in an object oriented language.

Once you allow for the fact that finalizers/destructors may not happen in 
a defined order or at defined times, why not go whole-hog?  Especially 
given that it gives you a non-trivial performance boost.  This performance 
boost is critical for functional languages like Ocaml, which has (compared 
to imperitive programs) a shockingly high allocation rate.  I've seen 
studies that say that your average ML program allocates one word of memory 
every six program instructions.

> 
> This is also true of finalisation. Necessarily. The general case is
> irrelevant, because people don't intend to write arbitrary code.
> 
> Even when multiple people work on an Open Source project (just about the
> worst environment I can think of .. ) there is at least an intent to
> follow a design.
> 
> > On a more general note, I will agree that having the GC free non-memory
> > resources raises problems.  I agree (and strongly beleive) that everything
> > that isn't memory should have some way for the program to free them
> > without invoking the GC- for example, file descriptors should have a close 
> > routine.  And I agree that programs *should* use them instead of relying 
> > on the GC in all but the most intractable cases.
> 
> If I may modify that: *even* in the most intractible cases :-)

In the most intractible cases, neither the compiler nor the programmer may 
be able to determine when the last reference is released.  But even in the 
simple cases, the programmer may have the intent to release the resources 
and simply doesn't.  It's called a bug.  

> 
> However, two things spring to mind. First: if close is a primitive
> operation, we need a high level way of not only tracking
> what to close, but also of doing that tracking efficiently.
> 
> Secondly, since the resource is represented in memory,
> and much of the time the dependencies which are *not*
> represented in memory could be, using the code
> of the garbage collector (and not just the same algorithm),
> makes some sense.

Why special-case close?  It's what we've been talking about, but it could 
just as easily be "release a mutex", etc.

> 
> 
> > The question is, is it worse to have the GC try to reclaim the resources
> > than it is to have a guarenteed leak? 
> 
> You are making an incorrect assumption here. You're assuming
> that finalisers only release resources. Consider again
> the case where the final action in generating a document
> is to create a table of contents (know nnow that all chapters
> are seen).
> 
> That is a quite non-trivial task which can take a lot
> of time and involve a very large amount of code,
> build new objects, etc.
> 
> The time to do this operation is "when the user cannot
> add anymore chapters to the document" and the easiest
> time to specify that is "when the user no longer
> has any copy of the document handle".
> 
> This finalisation task isn't a destructor and it isn't
> releasing anything, but it is, by specification,
> a finalisation task.


I wouldn't put that into a destructor.  DON'T USE DESTRUCTORS FOR 
ENFORCING THE ORDER OF OPERATIONS.  Destructors are safety nets.  When all 
else fails, they have a shot at catching and fixing the mistakes.
You're asking the language to support a seriously broken programming 
example.  There is a huge difference between "creating an appendix" and 
"closing a file handle".  

> 
> >  Consider the case where close 
> > return EINTR and doesn't close the file descriptor.  Since the descriptor 
> > has already leaked- for it to reach this point the program no longer has 
> > any way to reach the descriptor (if the program did, the object wouldn't 
> > be being collected).  So what's the difference in this case between the 
> > collector silently ignoring the error, and the collector not even trying 
> > to free the resource?
> 
> Probably none, BUT you have again made an assumption and this
> time it is a fatal mistake. Closing a file is not usually done
> to free the resource, but to flush the buffers.
> 
> In interscript, this proved difficult, because sometimes
> client code would expect to reopen and read the file
> after the last reference: unfortunately, the open actually
> works and reads 90% of the file (the last 10% still in a buffer).
> 
> I fix that problem by always writing a temporary file, and copying
> it when the file is closed. So here, the finalisation is vital
> (it copies a file, as well as closing one).

And this fixes the problem how?

If it matters when the buffers are flushed, manually flush the buffers.  
Even reference counting can fail if a reference to the object "leaks" (I'm 
being nice and assuming that circular references aren't a problem).

> 
> > Actually, an idea I've had is to add Java-style "throws" information- 
> > basically what exceptions a function can throw- to the type information of 
> > a function.  With type inference, this shouldn't be the headache in Ocaml 
> > it is in Java.  The advantage here is that you could enlist the type 
> > system to gaurentee that a destructor doesn't throw any exceptions- i.e. 
> > the code for a destructor should handle all exceptions it generates.  
> 
> Sigh. I argued strongly in C++ that exception specifications should
> be statically enforced. They're not. Its a total pain, because
> they ALMOST are. Unfortunately, if you throw an exception you're
> not allowed to .. a run time check detects the fault
> and throw an Unexpected exception. You can rely on it .. and
> it totally defeats any attempt to do static analysis (that one leak
> can be turned into a complete defeat of the exception specifications).

This seems par for the course for the C/C++ standards committee.  Consider 
const and register as an example.

> 
> On thing is for sure .. I *hate* seeing
> 
> "program terminated with uncaught "Not_found" exception"
> 
> because I do hundreds of Hashtbl.find calls....
> 
> 

There is a religous debate between returning 'a and throwing an exception 
if the item isn't found, or returning 'a option.  Myself, I'd prefer 'a 
option *currently* because that enlists the typechecker to find errors.  
The counter argument is that this forces you to make a lot of unnecessary 
checks for finds you know will succeed.  I'd rather have redundant error 
checks than uncaught errors, but that's me.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners