More Warts in Python Exceptions
Thursday, 17 May 2007
I’ve previously blogged about a little Python wart in the way an exception can differ from the way it was raised.
Here’s another one that may surprise you, especially if you are writing long-running processes like servers. I have a real love/hate relationship with it, though: on one hand, the behavior makes Python really sweet to work with; on the other, it can really trip people up.
Have a look, see what you think will happen here, and then tell me which you prefer: easy debuggability or unsurprising behavior?
class A:
    def __init__(self):
        print "Initialized %r" % self

    def __del__(self):
        print "Finalized %r" % self

def foo():
    a = A()
    raise Exception

try:
    foo()
except:
    pass

raw_input("Press enter to continue:")
The surprising result is that the instance of A is not finalized when it goes out of scope:
Initialized <__main__.A instance at 0x00B86B20>
Press enter to continue: Press enter
Finalized <__main__.A instance at 0x00B86B20>
Reason: When an exception is thrown, Python stores every stack frame (including local variables) in the traceback. This means that the stack frame holding a reference to the A instance has not been destroyed yet, as it’s still available for debuggers to inspect. Finalization occurs only when another exception is thrown.
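A minimal sketch of that mechanism, written in Python 3 syntax rather than the Python 2 of the listing above (an assumption made so it runs on a current interpreter; the names frame_name and still_alive are hypothetical). It walks the traceback down to the frame of foo and checks that the local a is still reachable from it:

```python
import sys

class A:
    pass

def foo():
    a = A()
    raise Exception("boom")

try:
    foo()
except Exception:
    # The traceback is a chain of entries, one per stack frame.
    tb = sys.exc_info()[2]
    # Walk to the innermost entry, i.e. the frame that raised.
    while tb.tb_next is not None:
        tb = tb.tb_next
    frame_name = tb.tb_frame.f_code.co_name
    # The frame still exposes its locals, so the A instance is alive.
    still_alive = isinstance(tb.tb_frame.f_locals["a"], A)
    # Drop our own reference to the traceback when done with it.
    del tb

print(frame_name, still_alive)
```

As long as anything keeps the traceback around, the frame and every local in it stay alive too.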
If the instance a holds open file handles that only get closed when it is finalized, you end up with a mystery process that is locking your file.
No. 1 — May 17th, 2007 at 6:16 pm
Standard behavior for garbage-collected languages. You should use finally or the newly-introduced with.
No. 2 — May 17th, 2007 at 7:52 pm
I think it’s probably good that people don’t rely on objects getting finalised immediately as they go out of scope, as it’s not something Python guarantees (in fact, the language spec explicitly says that it’s not required behaviour). Other implementations, like Jython or IronPython, don’t finalise their objects immediately even when exceptions aren’t thrown, as they don’t use refcounting; instead, objects only get collected when the garbage collector reaches them. Even future versions of CPython could potentially be implemented like this. If people are surprised by objects not being immediately collected, then it’s probably good that they are, before they write unportable code that relies on it.
That said, exceptions holding onto objects for a long time could potentially lead to confusion. On the whole though, I think it’s a net benefit to get the whole frame data when debugging.
No. 3 — May 17th, 2007 at 11:14 pm
help(sys.exc_clear)
No. 4 — May 18th, 2007 at 12:14 am
I have very little trust for __del__. Well, it’s not a matter of trust, it’s just what you point out here - not knowing when the final reference goes away.
In any case, it’s probably better to use try/finally (or Python 2.5’s context managers and ‘with’ statement). That’s a habit I’m now getting into, finally (ha!), after many years.
Change ‘foo’ to put the `a = A(); raise Exception` in a `try` statement, followed by `finally: del a`. Then you’ll see the behavior you want.
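The change suggested above might look like the following sketch. It uses Python 3 syntax and records events in a list instead of printing (both are assumptions for illustration, as is the events name), and it relies on CPython’s reference counting to finalize the object as soon as `del a` runs:

```python
events = []  # hypothetical log, used instead of print

class A:
    def __init__(self):
        events.append("initialized")

    def __del__(self):
        events.append("finalized")

def foo():
    a = A()
    try:
        raise Exception("boom")
    finally:
        # Remove the local name so the frame kept alive by the
        # traceback no longer references the instance.
        del a

try:
    foo()
except Exception:
    events.append("caught")

print(events)
```

On CPython the instance is finalized inside the finally clause, before the exception reaches the caller. Other implementations may finalize it later, which is why explicit cleanup is the safer habit.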
If you want to ensure that resources get cleaned up (files get closed, etc), try/finally is your friend. And when it comes to open files, the ‘with’ statement in Python 2.5 makes this easier to work with and remember.
http://docs.python.org/ref/try.html (the bottom paragraphs explain the rules on when ‘finally’ is executed)
http://docs.python.org/ref/with.html
A good summary of the ‘with’ statement is here: http://www.python.org/doc/2.5/whatsnew/pep-343.html
PEP 343 itself has some examples of context managers handling things like locks, open files, etc.
http://www.python.org/dev/peps/pep-0343/
And there you have it - the best way to ensure an object is ‘finalized’ when you want it to be is to do it yourself. try/finally allows this.
No. 5 — May 18th, 2007 at 12:42 am
In my opinion, this looks like a wart because it’s being viewed from a C++-type background, where resource acquisition is initialization (RAII) is a _really_cool_ pattern, and you want to use the same pattern in Python. The problem, however, is that __del__ is sometimes called at surprising times. The solution pre-2.5 has been to use a try:…finally:… wrapper around anything that needs to be finalized, just like you have to do in Java. It has never, to the best of my knowledge, been recommended to use __del__ in the way you describe.
The solution now (post-2.5) is to use the “with” statement (http://www.python.org/dev/peps/pep-0343/). Basically, you say “with open(filename) as f:…”, and your file will be automatically closed for you at the end of the block. The same pattern can be used with cursors, mutexes, etc. Anyway, just my $.02.
No. 6 — May 18th, 2007 at 2:02 am
“Finalization occurs only when another exception is thrown.”
No. 7 — May 18th, 2007 at 2:03 am
You said “Finalization occurs only when another exception is thrown.”, but even that is not necessarily true.
http://www.python.org/doc/current/ref/customization.html#l2h-177
“It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.”
It doesn’t sound like __del__ is the reliable finalization method you’re looking for. But __exit__ when used with the new ‘with’ statement, that might do what you are hoping. See http://www.python.org/dev/peps/pep-0343/
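As a rough illustration of the __exit__ approach mentioned above, here is a sketch in Python 3 syntax. The Resource class and the events list are hypothetical names, not anything from the original post:

```python
events = []  # hypothetical log of what happened, in order

class Resource:
    def __enter__(self):
        events.append("acquired")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block is left, whether normally or
        # because of an exception.
        events.append("released")
        return False  # do not swallow the exception

def foo():
    with Resource():
        raise Exception("boom")

try:
    foo()
except Exception:
    events.append("caught")

print(events)
```

The cleanup here does not depend on when the object is garbage collected, so it behaves the same regardless of what the traceback keeps alive.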
No. 8 — May 18th, 2007 at 4:39 am
Python actually makes no guarantees about finalization when an object leaves a scope like C++ does. This is why we promote explicit finalization when it’s proper (e.g., calling file.close() instead of leaving it up to the implicit finalization of when the file object is garbage collected).
So it really isn’t a wart about exceptions, but about how the language semantics specify when garbage collection occurs. You should never rely on an object being garbage collected quickly; it just happens that this tends to be the case.
No. 9 — May 18th, 2007 at 8:03 am
Generally, you should avoid relying on __del__ for finalization. It’s much better to explicitly close any resources you are no longer using. This is because Python uses reference counting, which does not work when you have reference cycles. There is a garbage collector that can find and kill these loops, but only if no object in the cycle has an __del__. See the gc module documentation (http://docs.python.org/lib/module-gc.html) for more on this.
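A small sketch of the reference cycle problem described above, in Python 3 syntax. The Node class and variable names are hypothetical, and the classes deliberately define no __del__, so the restriction mentioned in the comment does not come into play. Automatic collection is switched off for the duration so the effect is easy to observe:

```python
import gc
import weakref

class Node:
    pass

gc.disable()  # keep the cyclic collector from running on its own

a = Node()
b = Node()
a.other = b  # a -> b
b.other = a  # b -> a, completing the cycle

probe = weakref.ref(a)  # lets us observe a without keeping it alive
del a, b

# Reference counts never reach zero because of the cycle.
alive_after_del = probe() is not None

gc.collect()  # the cyclic garbage collector finds and frees the pair
alive_after_collect = probe() is not None

gc.enable()
print(alive_after_del, alive_after_collect)
```

Reference counting alone leaves the pair alive after the names are deleted; only the explicit collection frees it.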
No. 10 — May 18th, 2007 at 10:27 am
I knew in the back of my mind that there’s no guarantee about the finalization, but, there you go. Theory not applied is as good as wasted.
No. 11 — May 18th, 2007 at 9:46 pm
By the way, thanks Rick. I already feel pretty embarrassed about the whole thing. You see, in the back of my mind, I already *know* I’m running against CPython, which primarily uses refcounts, and GC collects circular references, so in theory it’s OK.
It reminds me this is precisely why LISP has macros like WITH-OPEN-FILE.
Here’s another example of how you can have an unexpected failure.
import os

open('myfile.txt', 'w').write('abc')

def process():
    fd = open('myfile.txt', 'r')
    text = process(fd.read())
    return text+1

try:
    text = process()
except:
    text = ''
os.unlink('myfile.txt')

$ python test.py
Traceback (most recent call last):
  File "test.py", line 14, in ?
    os.unlink('myfile.txt')
OSError: [Errno 13] Permission denied: 'myfile.txt'
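One way to avoid this failure, sketched in Python 3 syntax with a with block so the file is closed before the exception leaves process. The temporary directory, the opened list and the closed_before_unlink name are assumptions for illustration, and the failing expression is a simplified stand-in for the one in the listing above:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "myfile.txt")

with open(path, "w") as out:
    out.write("abc")

opened = []  # hypothetical: keeps the file object so we can inspect it

def process():
    with open(path, "r") as fd:
        opened.append(fd)
        # Deliberate error (str + int) to trigger an exception
        # while the file is open.
        return fd.read() + 1

try:
    text = process()
except TypeError:
    text = ""

# The with block already closed the file, even though the traceback
# may still reference the frame of process().
closed_before_unlink = opened[0].closed
os.unlink(path)
os.rmdir(tmp_dir)
```

Because the file is closed explicitly on the way out of the with block, the unlink no longer depends on when the file object is finalized.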