finalisation

From: skaller (skaller@maxtal.com.au)
Date: Thu Feb 03 2000 - 15:06:04 MET

  • Next message: Xavier Leroy: "Re: <cannot fetch remote object> - WHY ?"

    I'm not sure if my previous email got through to this list,
    so I'll address this issue again.

    I've installed David's ocaml 2.999, which supports ocaml finalisation
    functions.

    The semantics provided offer a number of significant advances over other
    finalisation technology, particularly the allowance of multiple
    finalisers,
    and the fact that finalisers can cause an object to become reachable --
    without any problems. Contrast this to the mess Java made of
    finalisation.

    However, the code I have has a serious deficiency: there is no way
    to control the _order_ in which finalisers are invoked, and in
    particular,
    the order implemented is 'first in, first finalised', which is usually
    the _wrong_ order for me.

    In Vyper, my ocaml based Python interpreter, I have now enabled
    python __del__ methods, which are finalisers for python class instances.
    CPython uses reference counting, and so it finalises an acyclic graph
    of objects top down: first the finaliser (__del__ method) of the root
    object is invoked, possibly causing some child object's ref counts
    to go to zero, in which case they're finalised recursively, then the
    attributes of the object are detached, causing all remaining child
    reference count to be decremented, possibly going to zero (and invoking
    finalisers).

    This scheme has two advantages: finalisers are invoked as early as
    possible, and always in the correct order, and it has a disadvantage,
    that circular references leave garbage.

    Using the new finalisation feature, synchronous finalisation
    is lost, which is not really a problem, whereas garbage is correctly
    collected, which is a distinct advantage.

    However, the problem is that the finalisers are invoked
    in an 'arbitrary' order, which happens to be the order of
    object construction. This is no good at all. In a typical
    scenario, child objects are created first, and passed as
    arguments to the parent constructor. Finalising the child
    objects first, before the parent, makes it impossible to
    correctly execute the parent finaliser, since it assumes
    the parent object maintains invariants, including the ability
    to access fully constructed children.

    When there is an acyclic graph of references, the parent MUST be
    finalised before any children. If the order of finalisers
    was first in, last finalised, then more code would work correctly
    (almost nothing works correctly at the moment). However, this isn't
    really an adequate solution either.

    An algorithm which finds a garbage object on which no other garbage
    objects depend (if any) can proceed to safely finalise that object.
    By applying this algorithm repeatedly to all garbage objects, a correct
    order
    of finalisation is found for acyclic graphs of objects.

    In principle, the garbage collector must already know how to do this,
    although I do not know the details enough to know if there is a
    convenient
    way to reuse existing code to sequence finalisers correctly in this
    case.

    This leaves the problem of cyclic references. In Python, this will
    rarely
    be a problem because where finalisation is important, the client must
    already break any cycles manually: the only potential issue would be
    if the code depended on a finaliser of an object in a cycle which is not
    broken NOT being executed -- and ocaml actually invoked it.

    However, to extend the available functionality, it would be useful
    to be able to break cycles. Clearly, in a cycle, if there is only
    one object with a finaliser, it can be invoked without a problem.
    [Note it is important never to _collect_ any objects which a finaliser
    _might_ refer to!]

    This leaves the problem of a cycle with more than one finaliser.
    In this case, for Python, there is no way to control the order
    of finalisation (because it is never done). So it would probably
    be reasonable to invoke the finalisers in an arbitrary order.
    However, in the case of _other_ software, a different strategy might
    be required (weak pointers? explicit declaration of a finalisation
    order?)

    I'd be very interested to see if it is possible to produce a
    finalisation
    strategy which at least handles acyclic graphs, since this is necessary
    for
    my interpreter to operate correctly. I'd be even more interested
    in any comments on how to handle cycles (I have no idea, really).

    -- 
    John (Max) Skaller, mailto:skaller@maxtal.com.au
    10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
    homepage: http://www.maxtal.com.au/~skaller
    download: ftp://ftp.cs.usyd.edu/au/jskaller
    



    This archive was generated by hypermail 2b29 : Fri Feb 04 2000 - 08:50:23 MET