@miku
Created February 10, 2011 13:16
How do you split a csv file into evenly sized chunks in Python?
#!/usr/bin/env python
# import csv
# reader = csv.reader(open('4956984.csv', 'rb'))
def gen_chunks(reader, chunksize=100):
    """
    Chunk generator. Take a CSV `reader` and yield
    `chunksize` sized slices.
    """
    chunk = []
    for index, line in enumerate(reader):
        if (index % chunksize == 0 and index > 0):
            yield chunk
            # Start a new list instead of clearing the old one in place,
            # so chunks the caller has kept are not emptied afterwards.
            chunk = []
        chunk.append(line)
    yield chunk
import sys
print(sys.version)
for chunk in gen_chunks(range(10), chunksize=3):
    print(chunk)  # process chunk
# $ python 4956984.py
# 2.5.4 (r254:67916, Jun 24 2010, 21:47:25)
# [GCC 4.2.1 (Apple Inc. build 5646)]
# [0, 1, 2]
# [3, 4, 5]
# [6, 7, 8]
# [9]
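
The demo above runs on `range(10)` rather than a real file. As a rough sketch of the same idea applied to CSV rows on Python 3, the variant below uses `itertools.islice` instead of an index counter. The name `iter_chunks` and the in-memory sample data are assumptions made for illustration and are not part of the original gist.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunksize=100):
    """Yield lists of up to `chunksize` items from any iterable."""
    iterator = iter(rows)
    while True:
        # islice pulls at most `chunksize` items; list() makes a fresh list each time.
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return  # input exhausted, no empty trailing chunk
        yield chunk


# Stand-in for an open CSV file; the contents are made up for illustration.
sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n")
reader = csv.reader(sample)
header = next(reader)  # keep the header row out of the data chunks

chunks = list(iter_chunks(reader, chunksize=2))
for chunk in chunks:
    print(chunk)  # process chunk
```

With a real file, `sample` would be replaced by something like `open(path, newline="")`. Only the last chunk can be shorter than `chunksize`.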
@carolinamattsson

This is a godsend for parallelizing .csv processing tasks - it works seamlessly with multiprocessing.Pool - thank you!
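
A minimal sketch of the kind of usage this comment describes, assuming Python 3 and a hypothetical worker `sum_chunk` that adds up the second column of each row. The chunker here builds a fresh list per chunk, because `Pool.map` collects every chunk from the generator before dispatching them to workers. The row data is made up for illustration.

```python
import multiprocessing


def gen_chunks(rows, chunksize=100):
    """Yield lists of up to `chunksize` rows, a fresh list per chunk."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunksize:
            yield chunk
            chunk = []  # new list, so earlier chunks stay intact
    if chunk:
        yield chunk


def sum_chunk(chunk):
    """Hypothetical worker: total the second column of every row in a chunk."""
    return sum(int(row[1]) for row in chunk)


def main():
    # Made-up rows standing in for what csv.reader would produce.
    rows = [[str(i), str(i * 10)] for i in range(10)]
    with multiprocessing.Pool(processes=2) as pool:
        # Each chunk is pickled and sent to a worker process.
        totals = pool.map(sum_chunk, gen_chunks(rows, chunksize=3))
    print(totals)


if __name__ == "__main__":
    main()
```

The worker is defined at module level so it can be pickled, and the `if __name__ == "__main__"` guard keeps child processes from re-running the pool setup on platforms that spawn rather than fork.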

@wantongtang

thanks, good to me
