class Iter(object):
    def __init__(self, iterable):
        self.iterable = iterable
        self.closed = False

    def __iter__(self):
        try:
            for chunk in self.iterable:
                yield chunk
        finally:
            self.closed = True

iter = Iter(range(5))
try:
    for i in iter:
        raise IOError()
except IOError:
    pass
assert iter.closed is True
@lrowe okay that makes sense. I had a suspicion it had to do with garbage collection but I wasn't 100% certain.
Let's say that what we're wrapping here is actually a file and if there's any kind of exception we want to close it to prevent the number of open file descriptors from growing too large. Is there a way that we can be certain that the file is always closed?
Normally I'd reach for contextlib.closing but if I make the script look like:
import contextlib

class FileLikeObject(object):
    def __init__(self):
        self.iter = [b"foo", b"bar", b"baz"]
        self.closed = False

    def __iter__(self):
        for i in self.iter:
            yield i

    def close(self):
        self.closed = True

class Iter(object):
    def __init__(self):
        self.iterable = FileLikeObject()

    def __iter__(self):
        with contextlib.closing(self.iterable):
            for chunk in self.iterable:
                yield chunk

iter = Iter()
try:
    for i in iter:
        raise IOError()
except IOError:
    pass
assert iter.iterable.closed is True
Then I get:

$ python example.py
$ pypy example.py
Traceback (most recent call last):
  File "<builtin>/app_main.py", line 75, in run_toplevel
  File "example.py", line 34, in <module>
    assert iter.iterable.closed is True
AssertionError
Basically, I just want to make sure that the file is always closed. Do I have to add an explicit gc.collect() call just to have that guarantee?
Rather than calling gc.collect(), which does not really guarantee that all garbage will be collected, you want to restructure things so that you manage resources in your main routine rather than in the generator coroutine:
import contextlib

class FileLikeObject(object):
    def __init__(self):
        self.iter = [b"foo", b"bar", b"baz"]
        self.closed = False

    def __iter__(self):
        for i in self.iter:
            yield i

    def close(self):
        self.closed = True

class Iter(object):
    def __init__(self, iterable):
        self.iterable = iterable

    def __iter__(self):
        with contextlib.closing(self.iterable):
            for chunk in self.iterable:
                yield chunk

f = FileLikeObject()
with contextlib.closing(f):
    for i in Iter(f):
        break
assert f.closed is True
Otherwise you need to ensure that you explicitly close all generators:
import contextlib

class FileLikeObject(object):
    def __init__(self):
        self.iter = [b"foo", b"bar", b"baz"]
        self.closed = False

    def __iter__(self):
        for i in self.iter:
            yield i

    def close(self):
        self.closed = True

class Iter(object):
    def __init__(self):
        self.iterable = FileLikeObject()

    def __iter__(self):
        with contextlib.closing(self.iterable):
            for chunk in self.iterable:
                yield chunk

it = Iter()
iter_it = iter(it)
with contextlib.closing(iter_it):
    try:
        for i in iter_it:
            raise IOError()
    except IOError:
        pass
assert it.iterable.closed is True
The difference in behaviour is down to the difference in garbage collection between CPython and PyPy.
I've forked your gist to show more detail of what's going on: https://gist.github.com/lrowe/581702c59869143dd6e2
When the generator object is garbage collected, the generator is closed: GeneratorExit is raised within the generator and then the finally clause is executed.
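You can observe that close-on-finalization behaviour directly by calling close() on a suspended generator by hand, which is exactly what finalization does implicitly. A minimal sketch (the gen and log names are just for illustration):

```python
log = []

def gen():
    try:
        yield 1
    except GeneratorExit:
        # close() raises GeneratorExit at the paused yield
        log.append("GeneratorExit")
        raise  # re-raise so the generator actually finishes
    finally:
        log.append("finally")

g = gen()
next(g)    # advance to the first yield
g.close()  # what garbage collection does implicitly
assert log == ["GeneratorExit", "finally"]
```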
With CPython this happens promptly through reference counting. PyPy uses a different GC strategy; calling gc.collect() manually shows it closing the generator during garbage collection.
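The refcount-driven case can be sketched like this. Note the immediacy guarantee is specific to CPython's reference counting; on PyPy the finally clause only runs once its collector gets around to the object. Names here are illustrative:

```python
log = []

def reader():
    try:
        yield "chunk"
    finally:
        log.append("closed")

g = reader()
next(g)  # suspend at the yield
del g    # refcount hits zero; CPython finalizes the generator right away
assert log == ["closed"]  # holds on CPython; PyPy defers until its gc runs
```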
By creating a reference cycle which prevents garbage collection through reference counting you see the same behaviour with both CPython and PyPy. https://gist.github.com/lrowe/95c5e76a86a27eb00b49
(Interestingly, it seems Python 2 is not able to collect the generator once it is caught in the reference cycle, while Python 3 behaves the same as PyPy.)
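The cycle case on Python 3 can be sketched as follows: the generator is unreachable only via a reference cycle, so reference counting alone never frees it, but the cyclic collector still finds and closes it when gc.collect() runs. The Holder class and other names are just for illustration:

```python
import gc

log = []

def gen():
    try:
        yield 1
    finally:
        log.append("closed")

class Holder(object):
    pass

h = Holder()
h.self = h    # reference cycle keeps h alive past refcounting
h.gen = gen()
next(h.gen)   # suspend the generator inside the cycle
del h         # unreachable now, but only the cyclic gc can see that
gc.collect()  # Python 3's collector finalizes the generator too
assert log == ["closed"]
```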