@pnomolos
Created October 30, 2012 18:19
Splits a CSV file into multiple, smaller chunks.
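#!/usr/bin/env ruby
# Run with `ruby -w` to enable warnings; flags after `env ruby` in a shebang
# are not portable across systems.
#
# Splits a CSV file into numbered chunks (products1.csv, products2.csv, ...),
# repeating the header line(s) at the top of each chunk.
file = "products.csv"
lines_per_file = 20000
header_lines = 1
extension = File.extname(file)
basename = File.basename(file, extension)

File.open(file) do |f|
  i = 0
  header = []
  header_lines.times { header << f.readline }
  lines = []

  # Writes the buffered lines, preceded by the header, to the next chunk file.
  write_chunk = lambda do
    File.open("#{basename}#{i += 1}#{extension}", "w") do |out|
      out.write header.join + lines.join
    end
    puts "Wrote #{lines.length}"
    lines = []
  end

  # gets returns nil at end of file, so the loop ends without raising EOFError.
  while (line = f.gets)
    lines << line
    # Only split after a line that ends with a closing quote, so a quoted
    # field containing embedded newlines is not cut across two files.
    write_chunk.call if lines.length >= lines_per_file && line[-2..-2] == '"'
  end

  # Write whatever is left over as the final, shorter chunk.
  write_chunk.call unless lines.empty?
end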
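
The script above decides where it is safe to split by checking that a line ends with a closing quote, which only works when the last field of every row is quoted. As a rough alternative, here is a minimal sketch that lets Ruby's `csv` library find the row boundaries instead. The `split_csv` helper and its parameters are hypothetical names, not part of the original gist, and the sketch assumes the file has exactly one header row and that the `csv` library is available (it ships with most Ruby installations).

```ruby
require "csv"

# Hypothetical helper, not part of the original script: splits the CSV at
# +path+ into files of at most +rows_per_file+ parsed rows each, repeating
# the header row in every file. Returns the names of the files written.
def split_csv(path, rows_per_file)
  extension = File.extname(path)
  prefix = path.chomp(extension) # "products.csv" -> "products"
  written = []

  # With headers: true the first row is treated as the header and every
  # following row is yielded as a CSV::Row, even if it spans several lines.
  CSV.foreach(path, headers: true).each_slice(rows_per_file).with_index(1) do |rows, n|
    out_name = "#{prefix}#{n}#{extension}"
    CSV.open(out_name, "w") do |out|
      out << rows.first.headers
      rows.each { |row| out << row.fields }
    end
    written << out_name
  end

  written
end
```

Calling `split_csv("products.csv", 20000)` would write `products1.csv`, `products2.csv`, and so on next to the input file. Because the rows are parsed and re-serialized, quoting in the output may differ from the input, and this is likely slower than the line-based approach on very large files.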