@jkamenik
Created May 1, 2014 19:27
Performance testing CSV
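#!/usr/bin/env ruby
require 'benchmark'
require 'csv'

# Path of the CSV file to benchmark, given as the first argument.
file = ARGV.first

Benchmark.bm 40, 'total' do |x|
  # Parse from an open File with header handling (one CSV::Row per line).
  t1 = x.report "csv parse" do
    CSV.parse(File.open(file), headers: true) do |row|
      # get the 'sa' column
      row['sa']
    end
  end

  # Stream the file by path with header handling.
  t2 = x.report 'csv foreach' do
    CSV.foreach(file, headers: true) do |row|
      # get the 'sa' column
      row['sa']
    end
  end

  # Stream the file as plain arrays and treat the first line as the header.
  t3 = x.report 'csv foreach without headers' do
    headers = nil
    CSV.foreach(file) do |row|
      if headers.nil?
        headers = row
        next
      end
      # get the 'sa' column by its position in the header line
      row[headers.index('sa')]
    end
  end

  # The returned array fills in the extra 'total' line.
  [t1+t2+t3]
end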

jkamenik commented May 1, 2014

output

[15:04 lookingglass ~/lookingglass/netflow (johnk_performance)]$ bin/csv-foreach-streaming-compression output-1mb/flo30_for_sv_2013121202_1_1.csv
                                               user     system      total        real
csv parse                                  0.150000   0.000000   0.150000 (  0.150464)
csv foreach                                0.140000   0.000000   0.140000 (  0.145904)
csv foreach without headers                0.090000   0.000000   0.090000 (  0.088820)
total                                      0.380000   0.000000   0.380000 (  0.385188)
[15:04 lookingglass ~/lookingglass/netflow (johnk_performance)]$ bin/csv-foreach-streaming-compression output-100mb/flo30_for_sv_2013121202_1_1.csv
                                                user     system      total        real
csv parse                                 14.140000   0.040000  14.180000 ( 14.177631)
csv foreach                               14.410000   0.050000  14.460000 ( 14.451660)
csv foreach without headers                8.640000   0.040000   8.680000 (  8.686026)
total                                     37.190000   0.130000  37.320000 ( 37.315317)

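About 40% less run time (8.69 s vs. 14.45 s on the 100 MB file, 0.089 s vs. 0.146 s on the 1 MB file) by reading each row as a plain array instead of using headers: true, which builds a CSV::Row for every line.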
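
The header-less variant can be packaged as a small helper. This is a minimal sketch, not something from the benchmark above: the method name each_column_value is hypothetical, and it looks up the column position once from the header line rather than on every row. Whether that changes the timings was not measured here.

```ruby
require 'csv'

# Hypothetical helper (not part of the original script): yields the value of
# one column for every data row, reading rows as plain arrays.
def each_column_value(path, column)
  index = nil
  CSV.foreach(path) do |row|
    if index.nil?
      # The first line is the header; resolve the column position a single time.
      index = row.index(column)
      raise ArgumentError, "no column named #{column.inspect}" if index.nil?
      next
    end
    yield row[index]
  end
end
```

It would be called as each_column_value(file, 'sa') { |value| ... } in place of the third report block.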
