@tomprats
Last active September 25, 2015 02:22
Split a large CSV into smaller CSVs, specifying the maximum number of rows per file.
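require "csv"
require "fileutils"

# Split a large CSV into smaller CSVs with at most `max` data rows each.
# Every output file repeats the header row of the input.
def split_csv(file, max)
  max = Integer(max)
  raise ArgumentError, "max must be a positive integer" unless max.positive?

  # With headers: true the header row is parsed separately, so
  # table.length counts data rows only.
  table = CSV.read(file, headers: true)
  csv_headers = table.headers
  total_files = (table.length / max.to_f).ceil

  # Use only the base name so an input like "data/big.csv" writes ./csv/big-1.csv.
  name = File.basename(file, ".csv")
  path = "./csv/"
  FileUtils.mkdir_p(path)

  puts "Splitting #{table.length} rows into #{total_files} files"
  # each_slice yields consecutive, non-overlapping groups of up to `max` rows.
  table.each_slice(max).with_index(1) do |rows, index|
    output = "#{path}#{name}-#{index}.csv"
    puts output
    CSV.open(output, "w") do |writer|
      writer << csv_headers
      rows.each { |row| writer << row }
    end
  end
end

if __FILE__ == $PROGRAM_NAME
  abort "Usage: ruby #{File.basename($PROGRAM_NAME)} FILE MAX_ROWS" unless ARGV.length == 2
  split_csv(ARGV[0], ARGV[1])
end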
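The script reads the entire input into memory with `CSV.read` before writing anything, which can be costly for a very large file. A possible alternative is to stream rows with `CSV.foreach` and start a new output file every `max` rows. This is a minimal sketch under that assumption; `split_csv_streaming` and its `out_dir` parameter are hypothetical names that are not part of the original gist.

```ruby
require "csv"
require "fileutils"

# Hypothetical streaming variant: rows are read one at a time with
# CSV.foreach, so memory use does not grow with the input size.
# Returns the paths of the files it wrote.
def split_csv_streaming(file, max, out_dir = "./csv")
  max = Integer(max)
  raise ArgumentError, "max must be a positive integer" unless max.positive?

  FileUtils.mkdir_p(out_dir)
  name = File.basename(file, ".csv")
  written = []
  writer = nil
  count = 0

  CSV.foreach(file, headers: true) do |row|
    # Open a new output file before the first row and after every `max` rows.
    if (count % max).zero?
      writer&.close
      output = File.join(out_dir, "#{name}-#{written.length + 1}.csv")
      writer = CSV.open(output, "w")
      writer << row.headers
      written << output
    end
    writer << row
    count += 1
  end

  written
ensure
  # Close the last open output file, including when an error interrupts the loop.
  writer&.close
end
```

For example, `split_csv_streaming("big.csv", 1000)` writes `./csv/big-1.csv`, `./csv/big-2.csv`, and so on. Only one output file is open at a time, at the cost of not knowing the total number of files until the input has been fully read.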