Created September 24, 2010 21:09
Gist palewire/596056
A Python CSV splitter
```python
import csv
import os


def split(filehandler, delimiter=',', row_limit=10000,
          output_name_template='output_%s.csv', output_path='.', keep_headers=True):
    """
    Splits a CSV file into multiple pieces.

    A quick bastardization of the Python CSV library.

    Arguments:

        `delimiter`: The field delimiter used for reading and writing. ',' by default.
        `row_limit`: The number of data rows (not counting the header) you want
            in each output file. 10,000 by default.
        `output_name_template`: A %s-style template for the numbered output files.
        `output_path`: Where to stick the output files.
        `keep_headers`: Whether or not to print the headers in each output file.

    Example usage:

        >>> from toolbox import csv_splitter
        >>> csv_splitter.split(open('/home/ben/input.csv', 'r', newline=''))
    """
    reader = csv.reader(filehandler, delimiter=delimiter)
    current_piece = 1
    current_out_path = os.path.join(
        output_path,
        output_name_template % current_piece
    )
    # newline='' keeps the csv writer from emitting blank lines on Windows.
    current_out_file = open(current_out_path, 'w', newline='')
    try:
        current_out_writer = csv.writer(current_out_file, delimiter=delimiter)
        current_limit = row_limit
        if keep_headers:
            # Python 3 readers have no .next() method; use the next() builtin.
            headers = next(reader)
            current_out_writer.writerow(headers)
        for i, row in enumerate(reader):
            if i + 1 > current_limit:
                # The current piece is full: close it and start the next one.
                current_out_file.close()
                current_piece += 1
                current_limit = row_limit * current_piece
                current_out_path = os.path.join(
                    output_path,
                    output_name_template % current_piece
                )
                current_out_file = open(current_out_path, 'w', newline='')
                current_out_writer = csv.writer(current_out_file, delimiter=delimiter)
                if keep_headers:
                    current_out_writer.writerow(headers)
            current_out_writer.writerow(row)
    finally:
        # Always close the last piece so its buffered rows are flushed to disk.
        current_out_file.close()
```
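The loop in `split` starts a new piece whenever the 1-based data-row count exceeds `row_limit * current_piece`, so data row `i` (0-based, header excluded) lands in piece `i // row_limit + 1`. As a minimal sketch of the same chunking, the hypothetical helper `chunk_rows` below (not part of the gist) pulls fixed-size batches with `itertools.islice` and returns the pieces as in-memory lists instead of writing files.

```python
import csv
import io
from itertools import islice


def chunk_rows(filehandler, row_limit=10000, keep_headers=True, delimiter=','):
    """Hypothetical helper: split CSV rows into lists of at most row_limit data rows.

    Mirrors split(): when keep_headers is True, the first row is treated as
    the header and repeated at the top of every piece.
    """
    reader = csv.reader(filehandler, delimiter=delimiter)
    headers = next(reader, None) if keep_headers else None
    pieces = []
    while True:
        # Pull the next batch of up to row_limit data rows from the reader.
        batch = list(islice(reader, row_limit))
        if not batch:
            break
        pieces.append(([headers] if keep_headers else []) + batch)
    return pieces


# Five data rows with row_limit=2 give pieces of 2, 2 and 1 data rows.
data = io.StringIO("id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n")
pieces = chunk_rows(data, row_limit=2)
for number, piece in enumerate(pieces, start=1):
    print(number, piece)
```

One design difference: `split` always creates the first output file, even for an input with no data rows, while this sketch simply returns no pieces in that case.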
It creates an empty line after each record in the CSV file.
Does anyone know why?
To avoid the empty lines, open the output file in binary mode, 'wb' instead of 'w' (tested on Python 2.7).
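Why this happens: `csv.writer` ends every record with its default `lineterminator` of `'\r\n'`. In text mode on Windows, each `'\n'` is then translated to `'\r\n'` again, producing `'\r\r\n'`, which shows up as an extra empty line. The `'wb'` fix applies to Python 2 only; in Python 3 the writer needs a text-mode file, and the `csv` module documentation says to open it with `newline=''`. A minimal sketch (the temporary file name is a placeholder):

```python
import csv
import os
import tempfile

rows = [['id', 'name'], ['1', 'a'], ['2', 'b']]

# Placeholder output location for this demo.
out_path = os.path.join(tempfile.mkdtemp(), 'output_1.csv')

# newline='' disables newline translation, so the writer's own '\r\n'
# terminator is written exactly once per record on every platform.
with open(out_path, 'w', newline='') as out_file:
    csv.writer(out_file).writerows(rows)

# Read the raw bytes back to confirm there is no doubled line ending.
with open(out_path, 'rb') as raw:
    content = raw.read()
print(content)
```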
Based on this script, I added one more filter criterion: splitting on the unique values of certain columns.
https://github.com/airbob/personal-backup/blob/master/scripts/csvsplit.py
Link has expired
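Since the linked script is no longer available, here is a minimal sketch of the idea described in that comment; it is not a reconstruction of that script. Each row is routed to a separate output file keyed by the value in one column. The function name `split_by_column`, the `key_column` parameter, and the reuse of the `output_%s.csv` naming are assumptions for illustration.

```python
import csv
import io
import os
import tempfile


def split_by_column(filehandler, key_column, output_path='.',
                    output_name_template='output_%s.csv', delimiter=','):
    """Hypothetical sketch: write one CSV per distinct value of key_column.

    Assumes the first row is a header and that key values are safe to use
    in file names.
    """
    reader = csv.reader(filehandler, delimiter=delimiter)
    headers = next(reader)
    key_index = headers.index(key_column)
    files, writers = {}, {}
    try:
        for row in reader:
            key = row[key_index]
            if key not in writers:
                # First row with this value: open its file and write the header.
                path = os.path.join(output_path, output_name_template % key)
                files[key] = open(path, 'w', newline='')
                writers[key] = csv.writer(files[key], delimiter=delimiter)
                writers[key].writerow(headers)
            writers[key].writerow(row)
    finally:
        for out_file in files.values():
            out_file.close()
    # Return the distinct key values that got their own file.
    return sorted(files)


out_dir = tempfile.mkdtemp()
data = io.StringIO("state,city\nCA,LA\nNY,NYC\nCA,SF\n")
keys = split_by_column(data, 'state', output_path=out_dir)
print(keys)
```

This keeps one file open per distinct value, so an input with very many distinct values could run into the operating system's open-file limit.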