queryset_generator and queryset_list_generator
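import gc


def queryset_generator(queryset, chunksize=1000):
    """
    Iterate over a Django QuerySet ordered by the primary key.

    This loads at most chunksize (default: 1000) rows into memory at a time,
    whereas Django would normally load all rows into memory at once. Using
    the iterator() method on its own only avoids building and caching all
    the model instances up front.

    Note that this generator does not support ordered querysets: any
    existing ordering is replaced by primary-key order. It also assumes
    integer primary keys, because the first key is decremented by one.
    """
    if not queryset.exists():
        return  # nothing to iterate; avoids an IndexError on [0] below
    last_pk = queryset.order_by('-pk')[0].pk
    queryset = queryset.order_by('pk')
    # Start one below the first pk so the pk__gt filter includes the first row.
    pk = queryset[0].pk - 1
    while pk < last_pk:
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        # Collect garbage (e.g. reference cycles) left over from this chunk.
        gc.collect()
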
def queryset_list_generator(queryset, listsize=10000, chunksize=1000):
    """
    Iterate over a Django QuerySet ordered by the primary key and yield
    lists of model objects of size listsize.

    Rows are fetched at most chunksize (default: 1000) per query, but each
    yielded list holds up to listsize (default: 10000) model objects.
    In contrast to queryset_generator, it doesn't yield each row on its own,
    but yields lists of listsize rows at a time; the last list may be shorter.

    Note that this generator does not support ordered querysets.
    """
    row_list = []
    for row in queryset_generator(queryset, chunksize):
        row_list.append(row)
        if len(row_list) >= listsize:
            yield row_list
            row_list = []
    # Yield the remaining rows that did not fill a whole list.
    if row_list:
        yield row_list

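The paging idea above does not depend on Django. Below is a minimal, framework-free sketch of the same keyset ("seek") pagination, assuming a hypothetical `fetch_after(last_key, limit)` callable that returns up to `limit` rows whose keys are strictly greater than `last_key`, in ascending key order (`None` meaning "from the start"). Using a `None` sentinel instead of `first_pk - 1` also works for non-integer keys and for an empty source. `fetch_after`, `keyset_generator`, `batched` and the in-memory `ROWS` table are illustrative names, not part of Django or of the gist.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar('T')


def keyset_generator(
    fetch_after: Callable[[Optional[Any], int], Sequence[T]],
    key: Callable[[T], Any],
    chunksize: int = 1000,
) -> Iterator[T]:
    """Yield rows chunk by chunk, resuming strictly after the last key seen."""
    last_key = None  # None means "start from the smallest key"
    while True:
        chunk = fetch_after(last_key, chunksize)
        if not chunk:
            return  # source exhausted (or empty to begin with)
        for row in chunk:
            yield row
        last_key = key(chunk[-1])  # strictly-greater filter avoids repeats
        if len(chunk) < chunksize:
            return  # a short chunk is the last one, assuming no concurrent inserts


def batched(iterable: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group items into lists of `size`, including a final shorter list."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical in-memory "table" standing in for the database.
ROWS = [{'pk': pk} for pk in (3, 5, 8, 13, 21, 34, 55)]


def fetch_after(last_key, limit):
    """Emulate queryset.filter(pk__gt=last_key).order_by('pk')[:limit]."""
    matching = [r for r in ROWS if last_key is None or r['pk'] > last_key]
    return sorted(matching, key=lambda r: r['pk'])[:limit]


pks = [r['pk'] for r in keyset_generator(fetch_after, key=lambda r: r['pk'], chunksize=3)]
print(pks)  # [3, 5, 8, 13, 21, 34, 55]
print([len(b) for b in batched(range(7), 3)])  # [3, 3, 1]
```

Unlike the gist's version, this sketch never issues a separate query for the last primary key; it stops as soon as a chunk comes back short or empty, at the cost of one extra (empty) fetch when the row count is an exact multiple of `chunksize`.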
@jonathanmorgan

commented May 6, 2011

Hello,

I am trying this out in a large-scale django app I am building, and it is not processing the very first record. I believe it is because of the way pk is first initialized. You set it to the first ID, then you do a pk__gt filter on pk, which skips that first ID. Am I mistaken? I am setting pk to pk-1 initially in my code, and it is now getting the initial pk value in the first chunk where it would not have before. Is this a reasonable fix?

Thanks,

Jon Morgan

@dbrgn

Owner Author

commented May 6, 2011

Jon: Looks like you're right. A cleaner way to fix this than to set pk to pk-1 is to use the queryset filter "__gte": queryset.filter(pk__gte=pk)[:chunksize]. Thanks for the comment.

@jonathanmorgan

commented May 6, 2011

But then would you be including the last row of each chunk twice, both in the previous and the next chunk?

@dbrgn

Owner Author

commented May 6, 2011

Oops, you're right. I guess your fix is quite ok then :) I changed the gist to pk = queryset[0].pk - 1.
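The exchange above is about how the loop is seeded. A small pure-Python emulation of the gist's `while` loop, using a hypothetical sorted list of integer primary keys and a hypothetical helper `collect`, shows the three variants: `pk__gt` seeded with the first pk skips the first row, `pk__gte` re-emits the boundary row of every chunk, and `pk__gt` seeded with `first_pk - 1` (the fix adopted in the gist) emits every row exactly once. The `- 1` trick assumes integer primary keys.

```python
def collect(pks, chunksize, inclusive, seed):
    """Emulate the gist's while-loop over a sorted list of integer pks."""
    emitted = []
    last_pk = max(pks)
    pk = seed
    while pk < last_pk:
        if inclusive:
            chunk = [p for p in pks if p >= pk][:chunksize]  # like pk__gte
        else:
            chunk = [p for p in pks if p > pk][:chunksize]   # like pk__gt
        for p in chunk:
            pk = p
            emitted.append(p)
    return emitted


PKS = [1, 2, 3, 4, 5, 6, 7]
print(collect(PKS, 3, inclusive=False, seed=PKS[0]))      # [2, 3, 4, 5, 6, 7]
print(collect(PKS, 3, inclusive=True, seed=PKS[0]))       # [1, 2, 3, 3, 4, 5, 5, 6, 7]
print(collect(PKS, 3, inclusive=False, seed=PKS[0] - 1))  # [1, 2, 3, 4, 5, 6, 7]
```

The inclusive variant is worse than it looks: with `chunksize=1` it would fetch the same row forever, because the filter never moves past the current pk.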

@jonathanmorgan

commented May 6, 2011

@dbrgn

Owner Author

commented May 6, 2011

Thanks. By the way, I just realized the naming of the functions is kind of wrong. Those are generators, not iterators. I'll change the snippet accordingly :)
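On the naming point: a function containing `yield` is a generator function, and calling it returns a generator object, which is itself an iterator (it implements `__iter__` and `__next__`). The standard library's `inspect` and `collections.abc` modules make the distinction visible; `count_up` below is a hypothetical stand-in for the gist's functions.

```python
import inspect
from collections.abc import Generator, Iterator


def count_up(limit):
    """A generator function: calling it does not run the body yet."""
    n = 0
    while n < limit:
        yield n
        n += 1


gen = count_up(3)
print(inspect.isgeneratorfunction(count_up))  # True
print(inspect.isgenerator(gen))               # True
print(isinstance(gen, Generator))             # True
print(isinstance(gen, Iterator))              # True: every generator is an iterator
print(isinstance([0, 1, 2], Iterator))        # False: a list is iterable, not an iterator
print(list(gen))                              # [0, 1, 2]
```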

@atdt

commented Jun 2, 2011

Thanks, very useful improvements! Most of the "it's" should be changed to "its", though!

@dbrgn

Owner Author

commented Jun 2, 2011

@atdt: Thanks for the hint. The gist is based on this code, so I simply copied most of the comment without noticing the grammar errors. I'll fix it :)

@mlissner

commented Mar 10, 2012

@gwrtheryrn, I added a date-based queryset generator if you're interested in 'pulling' it in. It's here.

@dbrgn

Owner Author

commented Mar 11, 2012

@mlissner That's already pretty specialized, so I'll leave it out for now. But it's now linked in your comment and in the "forks" list, so people who need it will find it :) Thanks for the contribution.
