@lokeshh
Last active September 17, 2016 18:36
require 'daru'
require 'benchmark'
# Vector :b is not needed for the lookup itself. It is included only so that
# df1 has two columns, like df2 below, which keeps the comparison fair.
df1 = Daru::DataFrame.new(
  { a: [1, 2, 3] * 1000,
    b: [1, 2, 3] * 1000 },
  index: Daru::CategoricalIndex.new([:a, :b, :c, :d, :e, :f] * 500)
)
# Here the same labels are stored as an ordinary column (:idx) instead of
# being used as a categorical index.
df2 = Daru::DataFrame.new(
  a: [1, 2, 3] * 1000,
  idx: [:a, :b, :c, :d, :e, :f] * 500
)
# Let's fetch the entries labelled :a
Benchmark.bm do |x|
  x.report 'with categorical index' do
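    # Caution: DataFrame#[] selects a vector by default, so this returns column :a;
    # rows under index label :a would be fetched with df1.row[:a].
    1000.times { df1[:a] }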
  end
  x.report 'without categorical index' do
    1000.times { df2.where df2[:idx].eq(:a) }
  end
end
# Result
#                                 user     system      total        real
# with categorical index      0.010000   0.000000   0.010000 (  0.010677)
# without categorical index   4.820000   0.020000   4.840000 (  4.885409)
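
The gap in the timings above is in line with the general difference between looking rows up through a precomputed category-to-positions map and re-scanning a label column on every query. The following is a minimal plain-Ruby sketch of that idea under stated assumptions: it does not use Daru and is not Daru's implementation, and the helper names `build_positions`, `lookup_indexed`, and `lookup_scan` are hypothetical, introduced only for illustration.

```ruby
# Hypothetical illustration in plain Ruby; this is NOT how Daru is implemented.
labels = [:a, :b, :c, :d, :e, :f] * 500
values = [1, 2, 3] * 1000

# Build, once, a map from each category to the row positions that carry it.
def build_positions(labels)
  positions = Hash.new { |hash, key| hash[key] = [] }
  labels.each_with_index { |label, i| positions[label] << i }
  positions
end

# Indexed lookup: one hash access, then only the matching rows are touched.
def lookup_indexed(positions, values, label)
  positions.fetch(label, []).map { |i| values[i] }
end

# Scan lookup: every label is compared against the target on every call.
def lookup_scan(labels, values, label)
  labels.each_index.select { |i| labels[i] == label }.map { |i| values[i] }
end

positions = build_positions(labels)
indexed = lookup_indexed(positions, values, :a)
scanned = lookup_scan(labels, values, :a)
```

Both helpers return the same rows; the indexed version pays the cost of building the map once, while the scan pays a full pass over the labels on each call. Wrapping the two calls in `Benchmark.bm`, as in the script above, would be one way to compare them on your own machine.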