Intro
=====

The basic rule for dealing with weakref callbacks (and __del__ methods too,
for that matter) during cyclic gc:

    Once gc has computed the set of unreachable objects, no Python-level
    code can be allowed to access an unreachable object.

If that can happen, then the Python code can resurrect unreachable objects
too, and gc can't detect that without starting over.  Since gc eventually
runs tp_clear on all unreachable objects, if an unreachable object is
resurrected then tp_clear will eventually be called on it (or may already
have been called before resurrection).  At best (and this has been an
historically common bug), tp_clear empties an instance's __dict__, and
"impossible" AttributeErrors result.  At worst, tp_clear leaves behind an
insane object at the C level, and segfaults result (historically, most
often by setting a class's mro pointer to NULL, after which attribute
lookups performed by the class can segfault).

OTOH, it's OK to run Python-level code that can't access unreachable
objects, and sometimes that's necessary.  The chief example is the callback
attached to a reachable weakref W to an unreachable object O.  Since O is
going away, and W is still alive, the callback must be invoked.  Because W
is still alive, everything reachable from its callback is also reachable,
so it's also safe to invoke the callback (although that's trickier than it
sounds, since other reachable weakrefs to other unreachable objects may
still exist, and be accessible to the callback -- there are lots of painful
details like this covered in the rest of this file).
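
The "reachable weakref W to an unreachable object O" case can be observed
from Python.  What follows is a minimal sketch, not part of the original
analysis; the names Node, on_death and callback_on_reachable_weakref are
made up for illustration, and the behavior described is CPython's.

```python
import gc
import weakref


class Node:
    """Plain object; instances can be weakly referenced."""


def callback_on_reachable_weakref():
    seen = []

    o = Node()
    o.me = o                      # self-cycle: only cyclic gc can reclaim O

    def on_death(wr):
        # The callback is handed the weakref W, never O itself.
        # Record whether W is already dead when the callback runs.
        seen.append(wr() is None)

    w = weakref.ref(o, on_death)  # W stays reachable through this local name
    del o                         # O is now unreachable cyclic trash
    gc.collect()                  # gc reclaims O and invokes W's callback
    return seen, w() is None


print(callback_on_reachable_weakref())
```

On CPython this should report that the callback ran exactly once and that W
was already dead both inside the callback and afterwards, so the callback
had no way to get at O through W.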

Python 2.4/2.3.5
================

The "Before 2.3.3" section below turned out to be wrong in some ways, but
I'm leaving it as-is because it's more right than wrong, and serves as a
wonderful example of how painful analysis can miss not only the forest for
the trees, but also miss the trees for the aphids sucking the trees
dry <wink>.

The primary thing it missed is that when a weakref to a piece of cyclic
trash (CT) exists, then any call to any Python code whatsoever can end up
materializing a strong reference to that weakref's CT referent, and so
possibly resurrect an insane object (one for which cyclic gc has called -- or
will call before it's done -- tp_clear()).  It's not even necessarily that a
weakref callback or __del__ method does something nasty on purpose:  as
soon as we execute Python code, threads other than the gc thread can run
too, and they can do ordinary things with weakrefs that end up resurrecting
CT while gc is running.

    https://www.python.org/sf/1055820

shows how innocent it can be, and also how nasty.  Variants of the three
focused test cases attached to that bug report are now part of Python's
standard Lib/test/test_gc.py.

Jim Fulton gave the best nutshell summary of the new (in 2.4 and 2.3.5)
approach:

    Clearing cyclic trash can call Python code.  If there are weakrefs to
    any of the cyclic trash, then those weakrefs can be used to resurrect
    the objects.  Therefore, *before* clearing cyclic trash, we need to
    remove any weakrefs.  If any of the weakrefs being removed have
    callbacks, then we need to save the callbacks and call them *after* all
    of the weakrefs have been cleared.

Alas, doing just that much doesn't work, because it overlooks what turned
out to be the much subtler problems that were fixed earlier, and described
below.  We do clear all weakrefs to CT now before breaking cycles, but not
all callbacks encountered can be run later.  That's explained in horrid
detail below.
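
The "clear first, call back afterwards" ordering in that summary can be
checked from Python.  This is a hedged sketch, not code from the collector:
Node, make_callback and callbacks_see_dead_weakrefs are hypothetical names,
and both weakrefs are deliberately given callbacks.  Each callback asks the
*other* weakref for its referent, to see whether it can still reach the
trash.

```python
import gc
import weakref


class Node:
    """Plain object; instances can be weakly referenced."""


def callbacks_see_dead_weakrefs():
    observed = {}
    refs = {}

    a, b = Node(), Node()
    a.other, b.other = b, a       # two-object trash cycle (once unreferenced)

    def make_callback(name, other_name):
        def callback(wr):
            # Try to reach the other trash object through its weakref.
            observed[name] = refs[other_name]()
        return callback

    # Both weakrefs live in a reachable dict, so both callbacks must run.
    refs["a"] = weakref.ref(a, make_callback("a", "b"))
    refs["b"] = weakref.ref(b, make_callback("b", "a"))

    del a, b
    gc.collect()
    return observed


print(callbacks_see_dead_weakrefs())
```

On CPython each callback should find that the other weakref already returns
None, whichever of the two callbacks happens to run first.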

Older text follows, with some later comments in [] brackets:

Before 2.3.3
============

Before 2.3.3, Python's cyclic gc didn't pay any attention to weakrefs.
Segfaults in Zope3 resulted.

weakrefs in Python are designed to, at worst, let *other* objects learn
that a given object has died, via a callback function.  The weakly
referenced object itself is not passed to the callback, and the presumption
is that the weakly referenced object is unreachable trash at the time the
callback is invoked.

That's usually true, but not always.  Suppose a weakly referenced object
becomes part of a clump of cyclic trash.  When enough cycles are broken by
cyclic gc that the object is reclaimed, the callback is invoked.  If it's
possible for the callback to get at objects in the cycle(s), then it may be
possible for those objects to access (via strong references in the cycle)
the weakly referenced object being torn down, or other objects in the cycle
that have already suffered a tp_clear() call.  There's no guarantee that an
object is in a sane state after tp_clear().  Bad things (including
segfaults) can happen right then, during the callback's execution, or can
happen at any later time if the callback manages to resurrect an insane
object.

[That missed that, in addition, a weakref to CT can exist outside CT, and
 any callback into Python can use such a non-CT weakref to resurrect its CT
 referent.  The same bad kinds of things can happen then.]

Note that if it's possible for the callback to get at objects in the trash
cycles, it must also be the case that the callback itself is part of the
trash cycles.  Else the callback would have acted as an external root to
the current collection, and nothing reachable from it would be in cyclic
trash either.

[Except that a non-CT callback can also use a non-CT weakref to get at
 CT objects.]

More, if the callback itself is in cyclic trash, then the weakref to which
the callback is attached must also be trash, and for the same kind of
reason:  if the weakref acted as an external root, then the callback could
not have been cyclic trash.

So a problem here requires that a weakref, that weakref's callback, and the
weakly referenced object, all be in cyclic trash at the same time.  This
isn't easy to stumble into by accident while Python is running, and, indeed,
it took quite a while to dream up failing test cases.  Zope3 saw segfaults
during shutdown, during the second call of gc in Py_Finalize, after most
modules had been torn down.  That creates many trash cycles (esp. those
involving classes), making the problem much more likely.  Once you
know what's required to provoke the problem, though, it's easy to create
tests that segfault before shutdown.

In 2.3.3, before breaking cycles, we first clear all the weakrefs with
callbacks in cyclic trash.  Since the weakrefs *are* trash, and there's no
defined -- or even predictable -- order in which tp_clear() gets called on
cyclic trash, it's defensible to first clear weakrefs with callbacks.  It's
a feature of Python's weakrefs too that when a weakref goes away, the
callback (if any) associated with it is thrown away too, unexecuted.

[In 2.4/2.3.5, we first clear all weakrefs to CT objects, whether or not
 those weakrefs are themselves CT, and whether or not they have callbacks.
 The callbacks (if any) on non-CT weakrefs (if any) are invoked later,
 after all weakrefs-to-CT have been cleared.  The callbacks (if any) on CT
 weakrefs (if any) are never invoked, for the excruciating reasons
 explained here.]
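
The difference between a CT weakref and a non-CT weakref to the same trash
object can be seen in a few lines.  This is an illustrative sketch under
the rule just stated, with made-up names (Node,
trash_weakref_callback_is_dropped); it exercises CPython's collector rather
than reproducing its C code.

```python
import gc
import weakref


class Node:
    """Plain object; instances can be weakly referenced."""


def trash_weakref_callback_is_dropped():
    log = []

    o = Node()
    o.me = o                      # self-cycle, so O becomes cyclic trash

    # Stored on O itself: this weakref (and its callback) is reachable only
    # from O, so it is part of the cyclic trash too.
    o.wr = weakref.ref(o, lambda wr: log.append("CT"))

    # Held by a local name: this weakref stays reachable (non-CT).
    outside = weakref.ref(o, lambda wr: log.append("non-CT"))

    del o
    gc.collect()
    return log, outside() is None


print(trash_weakref_callback_is_dropped())
```

On CPython only the callback on the non-CT weakref should be recorded; the
callback attached to the weakref that died along with O is thrown away
unexecuted.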

Just that much is almost enough to prevent problems, by throwing away
*almost* all the weakref callbacks that could get triggered by gc.  The
problem remaining is that clearing a weakref with a callback decrefs the
callback object, and the callback object may *itself* be weakly referenced,
via another weakref with another callback.  So the process of clearing
weakrefs can trigger callbacks attached to other weakrefs, and those
latter weakrefs may or may not be part of cyclic trash.

So, to prevent any Python code from running while gc is invoking tp_clear()
on all the objects in cyclic trash,

[That was always wrong:  we can't stop Python code from running when gc
 is breaking cycles.  If an object with a __del__ method is not itself in
 a cycle, but is reachable only from CT, then breaking cycles will, as a
 matter of course, drop the refcount on that object to 0, and its __del__
 will run right then.  What we can and must stop is running any Python
 code that could access CT.]
                                     it's not quite enough just to invoke
tp_clear() on weakrefs with callbacks first.  Instead the weakref module
grew a new private function (_PyWeakref_ClearRef) that does only part of
tp_clear():  it removes the weakref from the weakly-referenced object's list
of weakrefs, but does not decref the callback object.  So calling
_PyWeakref_ClearRef(wr) ensures that wr's callback object will never
trigger, and (unlike weakref's tp_clear()) also prevents any callback
associated *with* wr's callback object from triggering.

[Although we may trigger such callbacks later, as explained below.]

Then we can call tp_clear on all the cyclic objects and never trigger
Python code.

[As above, not so:  it means never trigger Python code that can access CT.]
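
The bracketed correction (Python code does run during collection, for
instance the __del__ of an object that is reachable only from CT) can be
observed directly.  A minimal sketch with hypothetical names (Resource,
Node, finalizer_runs_during_collection); the exact point inside the
collector at which the finalizer fires has changed across CPython versions,
so the sketch only shows that it fires while gc.collect() is running.
Automatic collection is switched off for the duration so the ordering is
deterministic.

```python
import gc


def finalizer_runs_during_collection():
    log = []

    class Resource:
        def __del__(self):
            log.append("Resource.__del__")

    class Node:
        pass

    was_enabled = gc.isenabled()
    gc.disable()                      # keep automatic gc out of the picture
    try:
        a, b = Node(), Node()
        a.partner, b.partner = b, a   # the trash cycle
        a.resource = Resource()       # in no cycle, reachable only from it
        del a, b                      # nothing is freed yet: cycle holds on
        log.append("before collect")
        gc.collect()
        log.append("after collect")
    finally:
        if was_enabled:
            gc.enable()
    return log


print(finalizer_runs_during_collection())
```

The finalizer's entry should land between the two markers, showing that
Python-level code executed inside the collection itself.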

After we do that, the callback objects still need to be decref'ed.  Callbacks
(if any) *on* the callback objects that were also part of cyclic trash won't
get invoked, because we cleared all trash weakrefs with callbacks at the
start.  Callbacks on the callback objects that were not part of cyclic trash
acted as external roots to everything reachable from them, so nothing
reachable from them was part of cyclic trash, so gc didn't do any damage to
objects reachable from them, and it's safe to call them at the end of gc.

[That's so.  In addition, now we also invoke (if any) the callbacks on
 non-CT weakrefs to CT objects, during the same pass that decrefs the
 callback objects.]
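
A "callback on a callback" is easy to build on purpose.  The sketch below
is an assumption-labeled illustration (Node, inner, outer, watcher and
callback_on_a_callback are invented names): the callback object `inner` is
itself cyclic trash, and a reachable weakref with its own callback watches
it.

```python
import gc
import weakref


class Node:
    """Plain object; instances can be weakly referenced."""


def callback_on_a_callback():
    log = []

    o = Node()
    o.me = o                           # self-cycle, so O becomes trash

    def inner(wr):
        log.append("inner")

    # A trash weakref whose callback object is `inner`.
    o.wr = weakref.ref(o, inner)

    def outer(wr):
        log.append("outer")

    # A reachable (non-CT) weakref to the callback object itself.
    watcher = weakref.ref(inner, outer)

    # After this, `inner` is referenced only by the trash weakref o.wr.
    del o, inner
    gc.collect()
    return log, watcher() is None


print(callback_on_a_callback())
```

On CPython `inner` should never run, because its weakref died with the
trash, while `outer` should run once, because its weakref was reachable and
its referent (the `inner` function) went away.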

An alternative would have been to treat objects with callbacks like objects
with __del__ methods, refusing to collect them, appending them to gc.garbage
instead.  That would have been much easier.  Jim Fulton gave a strong
argument against that (on Python-Dev):

    There's a big difference between __del__ and weakref callbacks.
    The __del__ method is "internal" to a design.  When you design a
    class with a del method, you know you have to avoid including the
    class in cycles.

    Now, suppose you have a design that makes has no __del__ methods but
    that does use cyclic data structures.  You reason about the design,
    run tests, and convince yourself you don't have a leak.

    Now, suppose some external code creates a weakref to one of your
    objects.  All of a sudden, you start leaking.  You can look at your
    code all you want and you won't find a reason for the leak.

IOW, a class designer can out-think __del__ problems, but has no control
over who creates weakrefs to his classes or class instances.  The class
user has little chance either of predicting when the weakrefs he creates
may end up in cycles.
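
The outcome of the design that was actually chosen, namely that a weakref
created by outside code does not turn a collectable cycle into a leak, can
be checked with a short sketch.  The names Node and
external_weakref_does_not_leak are hypothetical, and the sketch assumes no
gc debugging flags (such as gc.DEBUG_SAVEALL) are set.

```python
import gc
import weakref


class Node:
    """Plain object; instances can be weakly referenced."""


def external_weakref_does_not_leak():
    garbage_before = len(gc.garbage)
    notified = []

    a, b = Node(), Node()
    a.other, b.other = b, a       # the designer's cyclic data structure

    # "External code" takes a weakref, with a callback, to one object.
    external = weakref.ref(a, notified.append)

    del a, b
    gc.collect()

    new_garbage = len(gc.garbage) - garbage_before
    return external() is None, new_garbage, len(notified)


print(external_weakref_does_not_leak())
```

On CPython the cycle should be reclaimed, nothing new should appear in
gc.garbage, and the external callback should have been notified once.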

Callbacks on weakref callbacks are executed in an arbitrary order, and
that's not good (a primary reason not to collect cycles with objects with
__del__ methods is to avoid running finalizers in an arbitrary order).
However, a weakref callback on a weakref callback has got to be rare.
It's possible to do such a thing, so gc has to be robust against it, but
I doubt anyone has done it outside the test case I wrote for it.

[The callbacks (if any) on non-CT weakrefs to CT objects are also executed
 in an arbitrary order now.  But they were before too, depending on the
 vagaries of when tp_clear() happened to break enough cycles to trigger
 them.  People simply shouldn't try to use __del__ or weakref callbacks to
 do fancy stuff.]