@jpterry
Last active August 29, 2015 14:12
Pipe-delimited file splitter
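```ruby
require 'csv'

# Splits a pipe-delimited file into comma-separated chunks named
# split_out_1.csv, split_out_2.csv, ..., each holding SPLIT_COUNT data rows.
class FileSplitter
  SPLIT_COUNT = 250_000 # data rows per output file
  COPY_HEADERS = true   # repeat the header row at the top of every chunk

  def initialize(filename)
    @filename = filename
    @header = nil
    @file_count = 0
    @outfile = nil
  end

  def split
    data_rows = 0
    start_new_file
    CSV.foreach(@filename, col_sep: '|') do |row|
      if @header.nil?
        # The first row is the header; it always goes into the first chunk.
        @header = row
        @outfile << row.to_csv
        next
      end

      # Roll over once the current file holds SPLIT_COUNT data rows.
      if data_rows > 0 && (data_rows % SPLIT_COUNT).zero?
        puts data_rows # progress indicator
        start_new_file
      end
      @outfile << row.to_csv
      data_rows += 1
    end
  ensure
    # Flush and close the last chunk, which would otherwise be left open.
    @outfile&.close
  end

  private

  def start_new_file
    @outfile&.close
    @file_count += 1
    @outfile = File.open("split_out_#{@file_count}.csv", 'wb')
    @outfile << @header.to_csv if COPY_HEADERS && @header
  end
end

FileSplitter.new(ARGV.fetch(0)).split if __FILE__ == $PROGRAM_NAME
```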
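The same chunking rule can be tried without touching the filesystem. Below is a minimal in-memory sketch, not part of the original gist: `split_pipe_text` is a hypothetical helper that parses pipe-delimited text with the standard `csv` library, slices the data rows into groups of `chunk_size`, and re-emits each group as comma-separated text, with the header on the first chunk and, when `copy_headers` is true, on every later chunk as well. It assumes the whole input fits in memory, so it is only meant for checking the splitting logic on small samples.

```ruby
require 'csv'

# Hypothetical helper (not from the gist): in-memory version of the split.
# Returns an array of comma-separated strings, one per chunk, each holding
# at most chunk_size data rows.
def split_pipe_text(text, chunk_size, copy_headers: true)
  header, *data = CSV.parse(text, col_sep: '|')
  return [] if header.nil?            # empty input
  return [header.to_csv] if data.empty? # header only

  data.each_slice(chunk_size).map.with_index do |slice, i|
    lines = slice.map(&:to_csv)
    # The first chunk always starts with the header; later chunks only
    # when copy_headers is set, mirroring COPY_HEADERS in FileSplitter.
    lines.unshift(header.to_csv) if i.zero? || copy_headers
    lines.join
  end
end

chunks = split_pipe_text("id|name\n1|a\n2|b\n3|c\n", 2)
p chunks # => ["id,name\n1,a\n2,b\n", "id,name\n3,c\n"]
```

Using `each_slice` keeps every chunk at exactly `chunk_size` data rows except possibly the last, which is the behavior the streaming version aims for with its `data_rows` counter.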