@simpl1g
Last active May 29, 2024 18:07
Arrow CSV bench
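require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'benchmark-ips'
  gem 'red-arrow'
end

require 'arrow'
require 'benchmark/ips'
require 'csv' # used by the 'CSV' report below

# Benchmarks three ways of summing the revenue column of a CSV file.
def benchmark_sum(filename)
  f = Arrow::Function.find('sum')

  Benchmark.ips do |ips|
    # Arrow: load the whole table, then run the 'sum' compute function on the column
    ips.report('Arrow') { f.execute([Arrow::Table.load(filename)['revenue'].data]).value.value }
    # File: split each raw line on commas and take the last field
    ips.report('File') { c = 0; File.foreach(filename) { |row| c += row.split(',').last.to_f } }
    # CSV: parse each row with the csv library and take the second field
    ips.report('CSV') { c = 0; CSV.foreach(filename) { |row| c += row[1].to_f } }
    ips.compare!
  end
end
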
benchmark_sum('data10k.csv')
# Warming up --------------------------------------
#                Arrow   132.000  i/100ms
#                 File    46.000  i/100ms
#                  CSV    22.000  i/100ms
# Calculating -------------------------------------
#                Arrow      1.373k (±12.8%) i/s -      6.732k in   5.024702s
#                 File    722.441  (± 2.6%) i/s -      3.634k in   5.033618s
#                  CSV    211.504  (± 5.7%) i/s -      1.078k in   5.114960s
#
# Comparison:
#                Arrow:     1372.6 i/s
#                 File:      722.4 i/s - 1.90x  (± 0.00) slower
#                  CSV:      211.5 i/s - 6.49x  (± 0.00) slower

benchmark_sum('data10M.csv')
# Warming up --------------------------------------
#                Arrow     1.000  i/100ms
#                 File     1.000  i/100ms
#                  CSV     1.000  i/100ms
# Calculating -------------------------------------
#                Arrow      1.769  (±56.5%) i/s -      8.000  in   5.238670s
#                 File      0.069  (± 0.0%) i/s -      1.000  in  14.407080s
#                  CSV      0.016  (± 0.0%) i/s -      1.000  in  62.368109s
#
# Comparison:
#                Arrow:        1.8 i/s
#                 File:        0.1 i/s - 25.48x  (± 0.00) slower
#                  CSV:        0.0 i/s - 110.31x  (± 0.00) slower
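
The benchmark reads `data10k.csv` and `data10M.csv`, but the gist does not include those files or describe their layout. The sketch below is one way to produce compatible input, under assumptions: a two-column file whose second (and last) column is named `revenue`, which is the only layout detail the three reports above jointly imply. The helper names `generate_revenue_csv` and `csv_revenue_sum`, the `id` column, and the random values are hypothetical and not taken from the original data.

```ruby
require 'csv'

# Hypothetical helper: writes a header plus `rows` data rows to `path`.
# Only the `revenue` column name comes from the benchmark; the `id`
# column and the random values are assumptions made for illustration.
def generate_revenue_csv(path, rows, seed: 42)
  rng = Random.new(seed) # fixed seed so repeated runs produce the same file
  CSV.open(path, 'w') do |csv|
    csv << %w[id revenue]
    rows.times do |i|
      csv << [i + 1, (rng.rand * 1000).round(2)]
    end
  end
  path
end

# Plain-Ruby reference sum over the `revenue` column, useful for
# sanity-checking a generated file before benchmarking it.
def csv_revenue_sum(path)
  total = 0.0
  CSV.foreach(path, headers: true) { |row| total += row['revenue'].to_f }
  total
end
```

Calling `generate_revenue_csv('data10k.csv', 10_000)` and `generate_revenue_csv('data10M.csv', 10_000_000)` would produce files of the sizes the file names suggest. Absolute timings will differ from the numbers above, since the original data is unknown.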
@SrikantPadala

thanks
