@sivy
Last active September 17, 2022 10:23
Split a large array (inlist) into sublists (shards) of length (shard_size). Good for batch-jobbing large lists of data.
def _shard_array(inlist, shard_size):
    # Example: inlist is a 150-element list, shard_size == 40
    # Ceiling division, so the trailing partial shard is kept: num_shards == 4
    num_shards = (len(inlist) + shard_size - 1) // shard_size
    shards = []
    for i in range(num_shards):
        start = shard_size * i  # start == 0, then 40, then 80...
        end = start + shard_size  # end == 40, then 80, then 120... (slice end is exclusive)
        shards.append(inlist[start:end])  # the last shard may be shorter than shard_size
    return shards

berg commented Jan 7, 2013

Sure, that'll work. Here's what we use in the App.net codebase (it's more concise, but that's not necessarily a good thing):

def chunks(l, n):
    """ Yield successive n-sized chunks from l.
    """
    for i in range(0, len(l), n):
        yield l[i:i + n]
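
As a rough illustration of how a generator like this behaves on the gist's 150-element, size-40 example, here is a minimal sketch written with `range` so it runs on Python 3. The names `data` and `sizes` are made up for the example and are not from any real codebase.

```python
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

# Hypothetical input: 150 integers, split into chunks of 40.
data = list(range(150))
sizes = [len(c) for c in chunks(data, 40)]
print(sizes)  # [40, 40, 40, 30]
```

Because the slice end is exclusive and the loop steps by `n`, no element is dropped and the final chunk simply comes out shorter.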

akavlie commented Jan 7, 2013

Wait... app.net is Python? I thought Dalton Caldwell was a Ruby guy.

Anyway, yeah, yield is definitely the thing to use when dealing with huge lists of stuff. And ideally it's generators all the way down, so the whole list isn't being stored in RAM, if that becomes a constraint.
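
One way to sketch the "generators all the way down" idea is to chunk an arbitrary iterable with `itertools.islice`, so the input never has to be a list in memory. This is only an illustration under that assumption; `iter_chunks` and `squares` are hypothetical names, not something from the thread.

```python
from itertools import islice

def iter_chunks(iterable, n):
    """Yield lists of up to n items from any iterable, one chunk at a time."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))  # pull at most n items from the source
        if not chunk:
            return  # source exhausted
        yield chunk

# Hypothetical lazy source: a generator expression, never built as a full list.
squares = (x * x for x in range(10))
result = list(iter_chunks(squares, 4))
print(result)  # [[0, 1, 4, 9], [16, 25, 36, 49], [64, 81]]
```

Only one chunk is held in memory at a time, which is the point when the source is a file, a database cursor, or another generator.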
