Created April 1, 2011 08:41
queryset_generator and queryset_list_generator
import gc


def queryset_generator(queryset, chunksize=1000):
    """
    Iterate over a Django queryset ordered by the primary key.

    This function loads at most chunksize (default: 1000) rows into memory
    at a time, whereas Django would normally load all rows into memory.
    Using the iterator() method only prevents Django from preloading all
    the model instances.

    Note that this generator does not support ordered querysets: any
    existing ordering is replaced by ordering on the primary key.
    """
    try:
        last_pk = queryset.order_by('-pk')[0].pk
    except IndexError:
        return  # Empty queryset: nothing to yield.
    queryset = queryset.order_by('pk')
    # Start just below the smallest pk (assumes an integer primary key).
    pk = queryset[0].pk - 1
    while pk < last_pk:
        # Fetch the next chunk: rows whose pk is strictly greater than the
        # last pk already yielded.
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        gc.collect()


def queryset_list_generator(queryset, listsize=10000, chunksize=1000):
    """
    Iterate over a Django queryset ordered by the primary key and yield
    lists of model objects of size listsize.

    This function fetches at most chunksize (default: 1000) rows per query,
    whereas Django would normally load all rows into memory at once.
    In contrast to queryset_generator, it doesn't yield each row on its own,
    but yields a list of listsize (default: 10000) rows at a time. The last
    list may be shorter.

    Note that this generator does not support ordered querysets.
    """
    it = queryset_generator(queryset, chunksize)
    i = 0
    row_list = []
    for row in it:
        i += 1
        row_list.append(row)
        if i >= listsize:
            yield row_list
            i = 0
            row_list = []
    # Yield the remaining rows when the total is not a multiple of listsize.
    if row_list:
        yield row_list
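The grouping step of queryset_list_generator does not depend on Django. As a minimal sketch (the helper name `batched_lists` is made up for illustration and is not part of the gist), the same batching can be written over any iterable with `itertools.islice`. Note that this version also yields a final, shorter list when the number of items is not a multiple of `listsize`:

```python
from itertools import islice


def batched_lists(iterable, listsize):
    """Yield lists of up to listsize items; the last list may be shorter."""
    it = iter(iterable)
    while True:
        # Pull at most listsize items from the shared iterator.
        batch = list(islice(it, listsize))
        if not batch:
            return
        yield batch


batches = list(batched_lists(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

With a real queryset, one could presumably pass `queryset_generator(queryset, chunksize)` as the iterable; that combination is an assumption and has not been tested against Django here.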
Oops, you're right. I guess your fix is quite ok then :) I changed the gist to `pk = queryset[0].pk - 1`.
Just making sure. Thanks for writing and updating it! It is an
exceptionally helpful code snippet!
Jon
Thanks. By the way, I just realized the naming of the functions is kind of wrong. Those are generators, not iterators. I'll change the snippet accordingly :)
Thanks, very useful improvements! Most of the "it's" should be changed to "its", though!
@gwrtheyrn, I added a date-based queryset generator if you're interested in 'pulling' it in. It's here.
@mlissner that's already pretty specialized, so I'll leave it out for now. But it's now linked in your comment and in the "forks" list, so people who need it will find it :) Thanks for the contribution.
But then would you be including the last row of each chunk twice, both in the previous and the next chunk?
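Assuming the question refers to the loop as it stands in the gist, this can be checked without a database. The filter is `pk__gt=pk`, a strict greater-than, so each chunk should start after the last primary key already yielded. The sketch below imitates the loop with a plain sorted list of integers standing in for primary keys; `fetch_chunk` and `chunked_pks` are hypothetical helpers that only mimic `queryset.filter(pk__gt=pk)[:chunksize]` and the surrounding `while` loop:

```python
def fetch_chunk(pks, after, chunksize):
    """Stand-in for queryset.filter(pk__gt=after)[:chunksize] on sorted pks."""
    return [pk for pk in pks if pk > after][:chunksize]


def chunked_pks(pks, chunksize):
    """Mimic the gist's loop over a sorted, non-empty list of integer pks."""
    last_pk = pks[-1]
    pk = pks[0] - 1
    while pk < last_pk:
        for row_pk in fetch_chunk(pks, pk, chunksize):
            pk = row_pk
            yield row_pk


pks = [1, 2, 3, 5, 8, 13, 21]
seen = list(chunked_pks(pks, chunksize=3))
print(seen == pks)  # True: every pk appears exactly once
```

The chunk boundaries fall after 3 and after 13, and neither value shows up twice. A non-strict filter (`pk__gte`) would repeat the boundary row, which is likely the case the question has in mind.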