@palewire
Created September 24, 2010 21:09
A Python CSV splitter
import csv
import os


def split(filehandler, delimiter=',', row_limit=10000,
          output_name_template='output_%s.csv', output_path='.', keep_headers=True):
    """
    Splits a CSV file into multiple pieces.

    A quick bastardization of the Python CSV library.

    Arguments:
        `filehandler`: An open, readable file object for the input CSV.
        `delimiter`: The field separator used for reading and writing. A comma by default.
        `row_limit`: The number of rows you want in each output file. 10,000 by default.
        `output_name_template`: A %s-style template for the numbered output files.
        `output_path`: Where to stick the output files.
        `keep_headers`: Whether or not to print the headers in each output file.

    Example usage:
        >>> from toolbox import csv_splitter
        >>> csv_splitter.split(open('/home/ben/input.csv', 'r', newline=''))
    """
    reader = csv.reader(filehandler, delimiter=delimiter)
    current_piece = 1
    current_out_path = os.path.join(
        output_path,
        output_name_template % current_piece
    )
    # Python 3: newline='' keeps the csv module from producing a blank line
    # after every record on Windows (on Python 2, open with 'wb' instead).
    current_out_file = open(current_out_path, 'w', newline='')
    current_out_writer = csv.writer(current_out_file, delimiter=delimiter)
    current_limit = row_limit
    if keep_headers:
        # next(reader) replaces the Python 2-only reader.next().
        headers = next(reader)
        current_out_writer.writerow(headers)
    for i, row in enumerate(reader):
        if i + 1 > current_limit:
            # The current piece is full: close it and start the next one.
            current_out_file.close()
            current_piece += 1
            current_limit = row_limit * current_piece
            current_out_path = os.path.join(
                output_path,
                output_name_template % current_piece
            )
            current_out_file = open(current_out_path, 'w', newline='')
            current_out_writer = csv.writer(current_out_file, delimiter=delimiter)
            if keep_headers:
                current_out_writer.writerow(headers)
        current_out_writer.writerow(row)
    # Flush and close the last piece.
    current_out_file.close()
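
The chunking logic above can also be written with context managers so that every file is closed even if an error occurs. The following is a minimal sketch for Python 3, not the gist author's code: the name `split_csv`, the choice to take a path instead of an open file, and the returned list of output paths are all assumptions made for illustration. Unlike `split`, it creates no output file when the input has no data rows.

```python
import csv
import os
from itertools import islice


def split_csv(in_path, row_limit=10000, output_name_template='output_%s.csv',
              output_path='.', keep_headers=True, delimiter=','):
    """Hypothetical variant of split(): takes a path and closes every file it opens."""
    written = []
    with open(in_path, 'r', newline='') as source:
        reader = csv.reader(source, delimiter=delimiter)
        # Read the header row once so it can be repeated in every piece.
        headers = next(reader, None) if keep_headers else None
        piece = 1
        while True:
            # Pull at most row_limit rows for the next piece.
            rows = list(islice(reader, row_limit))
            if not rows:
                break
            out_path = os.path.join(output_path, output_name_template % piece)
            with open(out_path, 'w', newline='') as out:
                writer = csv.writer(out, delimiter=delimiter)
                if headers is not None:
                    writer.writerow(headers)
                writer.writerows(rows)
            written.append(out_path)
            piece += 1
    return written
```

Calling `split_csv('/home/ben/input.csv', row_limit=500)` would write `output_1.csv`, `output_2.csv`, and so on into the current directory, each holding at most 500 data rows. Each piece is held in memory as a list before it is written, so very large `row_limit` values trade memory for simplicity.
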
@airbob

airbob commented May 20, 2013

Based on this script, I added one more filter criterion (splitting on the unique values of certain columns)

https://github.com/airbob/personal-backup/blob/master/scripts/csvsplit.py
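
For readers who want the idea without the linked script, splitting on a column's distinct values might look like the sketch below. It is an independent illustration for Python 3, not airbob's code: the helper name `split_by_column` and its parameters are hypothetical, and it assumes the input has a header row and that the column's values are safe to use in file names.

```python
import csv
import os


def split_by_column(in_path, column, output_path='.',
                    output_name_template='output_%s.csv', delimiter=','):
    """Hypothetical helper: write one CSV per distinct value found in `column`."""
    outputs = {}  # column value -> (open file, csv.DictWriter)
    try:
        with open(in_path, 'r', newline='') as source:
            reader = csv.DictReader(source, delimiter=delimiter)
            for row in reader:
                key = row[column]
                if key not in outputs:
                    # First time this value appears: open its file and write the header.
                    out_path = os.path.join(output_path, output_name_template % key)
                    out = open(out_path, 'w', newline='')
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                                            delimiter=delimiter)
                    writer.writeheader()
                    outputs[key] = (out, writer)
                outputs[key][1].writerow(row)
    finally:
        # Close every output file, even if reading failed part-way through.
        for out, _ in outputs.values():
            out.close()
    return sorted(outputs)
```

The sketch keeps one file open per distinct value, so it suits columns with a modest number of categories; a column with thousands of distinct values could hit the operating system's open-file limit.
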

@bbmak

bbmak commented Mar 13, 2015

It creates an empty line after each record in the CSV file.
Does anyone know why?

@mihaispetrescu

To avoid the empty lines on Python 2.7 (tested), open the output files in binary mode: 'wb' instead of 'w'.
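
Some background on why this works: `csv.writer` ends each row with `\r\n` by default, and a file opened in text mode on Windows translates the `\n` into `\r\n` again, producing `\r\r\n`, which shows up as a blank line after every record. Binary mode avoids the translation on Python 2. On Python 3 the `csv` module needs a text-mode file, and passing `newline=''` to `open` has the same effect. A small sketch covering both cases follows; the helper name `open_csv_for_writing` is hypothetical.

```python
import csv
import sys


def open_csv_for_writing(path):
    """Hypothetical helper: open `path` so csv.writer does not emit blank lines."""
    if sys.version_info[0] >= 3:
        # Python 3: text mode with newline translation switched off.
        return open(path, 'w', newline='')
    # Python 2: binary mode, as suggested in the comment above.
    return open(path, 'wb')


def write_rows(path, rows, delimiter=','):
    """Write `rows` to `path` and close the file when done."""
    out = open_csv_for_writing(path)
    try:
        csv.writer(out, delimiter=delimiter).writerows(rows)
    finally:
        out.close()
```

In the splitter, the two `open(current_out_path, 'w')` calls are the places where this matters.
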

@ApoloSiskos

> Based on this script, I added one more filter criterion (splitting on the unique values of certain columns)
>
> https://github.com/airbob/personal-backup/blob/master/scripts/csvsplit.py

The link has expired.
