Last active August 29, 2015 14:19
QuerySet chunker
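def queryset_chunker(queryset, order_by='-pk', chunk_size=5000):
    """
    Takes a lazy queryset and chunks it by the given chunk_size.

    :param queryset: a lazy queryset to split into chunks
    :type queryset: QuerySet
    :param order_by: field used for ordering when the queryset is unordered
    :type order_by: str
    :param chunk_size: size of the smaller pieces
    :type chunk_size: int
    """
    # Slicing is only stable on an ordered queryset, so apply an ordering
    # when the caller has not set one.
    if not queryset.query.order_by:
        # distinct_fields is empty for a plain .distinct(), so check it
        # before indexing.
        if queryset.query.distinct and queryset.query.distinct_fields:
            order_by = queryset.query.distinct_fields[0]
        queryset = queryset.order_by(order_by)

    # count() is evaluated once; each slice is still a lazy queryset.
    for pivot in xrange(0, queryset.count(), chunk_size):
        yield queryset[pivot:pivot + chunk_size]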
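The slicing arithmetic above can be tried without a database. The following is a minimal Python 3 sketch, not the gist's code: `chunk_bounds` and `sliceable_chunker` are hypothetical helper names, and a plain list stands in for an ordered queryset.

```python
def chunk_bounds(total, chunk_size=5000):
    """Yield (start, stop) slice bounds that cover `total` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for pivot in range(0, total, chunk_size):
        yield pivot, pivot + chunk_size


def sliceable_chunker(items, total, chunk_size=5000):
    """Slice any sliceable object into chunks (hypothetical helper)."""
    for start, stop in chunk_bounds(total, chunk_size):
        yield items[start:stop]


# A list stands in for an ordered queryset (assumption for illustration).
rows = list(range(12))
chunks = list(sliceable_chunker(rows, len(rows), chunk_size=5))
print(chunks)  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]
```

The last slice may run past the end; lists simply return the remaining items, and a sliced queryset likewise returns only the rows that exist.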
Actually, it wouldn't throw an error if an empty queryset is passed. xrange
will simply produce an empty sequence, so the loop body never runs.
Ex:
In [1]: list(xrange(0,0,1000))
Out[1]: []
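The same check in Python 3, where `range` replaces `xrange`, as a small sketch. `count_iterations` is a hypothetical helper that only counts how many times the chunking loop would run for a given total.

```python
def count_iterations(total, chunk_size=1000):
    """Count how many chunks the loop would yield for `total` rows."""
    iterations = 0
    for _pivot in range(0, total, chunk_size):
        iterations += 1
    return iterations


print(count_iterations(0))            # 0: an empty queryset yields no chunks
print(count_iterations(47603, 5000))  # 10: pivots 0, 5000, ..., 45000
```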
gc.collect() seems unnecessary, here is an example:
In [1]: users=User.objects.all()
In [2]: users.count()
Out[2]: 47603
In [3]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:import gc
:
:
:def queryset_chunker(queryset, chunk_size=5000):
:    """
:    Takes lazy queryset and chunks them by given chunk_size.
:
:    :param queryset: type of queryset. This is a lazy queryset to
:                     split into chunks
:    :type queryset: QuerySet
:    :param chunk_size: size of smaller pieces
:    :type chunk_size: int
:    """
:
:    queryset = queryset.order_by('-pk')
:    for pivot in xrange(0, queryset.count(), chunk_size):
:        yield queryset[pivot:pivot + chunk_size]
:        print gc.collect()
:
:<EOF>
In [4]: for chunk in queryset_chunker(users):
   ...:     pass
   ...:
0
0
0
0
0
0
0
0
0
0
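For context on those zeros: gc.collect() returns the number of unreachable objects it found, so 0 after every chunk means the cyclic collector had nothing to free. The sketch below, assuming CPython and using a hypothetical `Node` class, shows what a non-zero result looks like when a reference cycle is left behind.

```python
import gc


class Node:
    """Tiny object that can take part in a reference cycle."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    # a and b reference each other, so reference counting alone
    # cannot free them once the function returns.
    a.other, b.other = b, a


gc.disable()          # keep the automatic collector out of the way
gc.collect()          # clear any garbage that already exists
make_cycle()
found = gc.collect()  # number of unreachable objects found
gc.enable()
print(found)          # greater than 0 because of the cycle
```

In the session above each chunk is dropped by ordinary reference counting when the loop variable is rebound, which is consistent with the collector reporting 0.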
I would add:
Just to be certain you won't throw an error. It might also be good to run a test to see how much using the gc module helps memory, so adding a gc.collect() after the yield might be good, as an example.