include? vs. cover? vs. between?

include-vs-cover-vs-between.rb
require 'date'
require 'benchmark'

n = 1_000_000
start_date = Date.new(2012, 1, 1)
end_date = Date.new(2012, 3, 1)
act_date = Date.new(2012, 2, 1)

Benchmark.bm(10) do |x|
  # Range#include? on a Date range
  x.report('include?') do
    n.times do
      (start_date..end_date).include?(act_date)
    end
  end
  # Range#cover? only compares against the two endpoints
  x.report('cover?') do
    n.times do
      (start_date..end_date).cover?(act_date)
    end
  end
  # Comparable#between? compares the date itself, no Range involved
  x.report('between?') do
    n.times do
      act_date.between?(start_date, end_date)
    end
  end
end
output.txt
js@horadrim [~/Code] % ruby ./include-vs-cover-vs-between.rb
                 user     system      total        real
include?    38.910000   0.130000  39.040000 ( 40.826650)
cover?       1.100000   0.000000   1.100000 (  1.147182)
between?     0.630000   0.010000   0.640000 (  0.642790)

What about converting dates to UNIX timestamp?

Small monkey patching:

class Date
  def to_i
    to_time.to_i
  end
end

Related benchmark:

x.report('to_i') do
  n.times do
    start_date = start_date.to_i
    end_date = end_date.to_i
    act_date = act_date.to_i

    act_date >= start_date && act_date <= end_date
  end
end

My times:

                 user     system      total        real
include?    19.630000   0.010000  19.640000 ( 19.649433)
cover?       0.540000   0.000000   0.540000 (  0.539379)
between?     0.290000   0.000000   0.290000 (  0.284120)
to_i         0.240000   0.000000   0.240000 (  0.242973)

Ugly, but if you're looking for every possible optimization, it may be a solution.

Stupid me! I replaced the original variables. :P

My solution is slower, actually.
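
For reference, the to_i benchmark above overwrites start_date, end_date and act_date, so after the first iteration it only measures Integer#to_i on values that are already integers. A minimal sketch of a version that converts the Date objects on every iteration might look like the following. The to_ts lambda and the short local names are made up for this example, and n is reduced just to keep the run short. No timings are quoted here because they depend on the machine and Ruby version.

```ruby
require 'date'
require 'benchmark'

n = 100_000 # smaller than the gist's 1_000_000, only to keep the run short
start_date = Date.new(2012, 1, 1)
end_date   = Date.new(2012, 3, 1)
act_date   = Date.new(2012, 2, 1)

# Hypothetical helper: same conversion as the Date#to_i monkey patch,
# but without reopening Date.
to_ts = ->(date) { date.to_time.to_i }

Benchmark.bm(10) do |x|
  x.report('between?') do
    n.times { act_date.between?(start_date, end_date) }
  end
  x.report('to_i') do
    n.times do
      # Separate local names, so the Date objects are never overwritten
      # and the conversion cost is paid on every iteration.
      s = to_ts.call(start_date)
      e = to_ts.call(end_date)
      a = to_ts.call(act_date)
      a >= s && a <= e
    end
  end
end
```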

@fnando your idea still has merit. Storing dates (times, really) as integers can yield performance improvements if you're dealing with time series data where relative comparison is more important than having native Date objects. The key is, the data should be persisted as timestamps and not converted to Date or Time unless it's required for display.

For example, I wrote a few internal tools for working with call data from a telecom carrier (the records are called CDRs, call detail records). I had to calculate a lot of durations, measured in seconds. I also needed to answer questions like "how many calls were active at 2012-11-13 10:45 UTC?" In this case, storing the time portion of the records as timestamps made a lot of sense. I could convert the input (one Time object operation), then perform all the comparisons using integer comparison. All my date comparisons became integer operations, which are much faster than Date/Time/DateTime operations.
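
A rough sketch of that approach, assuming each record carries its start and end as integer UNIX timestamps. The Call struct, its field names, the sample data and the half-open interval (a call no longer counts at its end second) are all made up for illustration; they are not taken from the tools described above.

```ruby
# Hypothetical call record: both ends stored as integer UNIX timestamps.
Call = Struct.new(:started_at, :ended_at) do
  # Duration in seconds is plain integer subtraction.
  def duration
    ended_at - started_at
  end

  # Assumption: a call is active from its start up to, but not including, its end.
  def active_at?(timestamp)
    timestamp >= started_at && timestamp < ended_at
  end
end

calls = [
  Call.new(Time.utc(2012, 11, 13, 10, 30).to_i, Time.utc(2012, 11, 13, 10, 50).to_i),
  Call.new(Time.utc(2012, 11, 13, 10, 44).to_i, Time.utc(2012, 11, 13, 10, 46).to_i),
  Call.new(Time.utc(2012, 11, 13, 11, 0).to_i,  Time.utc(2012, 11, 13, 11, 5).to_i)
]

# Convert the question once, then compare integers only.
query  = Time.utc(2012, 11, 13, 10, 45).to_i
active = calls.count { |call| call.active_at?(query) }

puts active                        # prints 2
puts calls.map(&:duration).inspect # prints [1200, 120, 300]
```

Time objects are only needed at the edges, for parsing the input and for display.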

I don't understand what would make between? faster than cover?. Any idea?
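
One possible factor, offered as a guess rather than a measured answer: the gist builds a new Range with (start_date..end_date) on every iteration before calling cover?, while between? compares the three dates directly. A small sketch to check this hoists the range out of the loop and times both variants. The report labels and the reduced n are choices made for this example only.

```ruby
require 'date'
require 'benchmark'

n = 200_000 # reduced from the gist's 1_000_000 to keep the run short
start_date = Date.new(2012, 1, 1)
end_date   = Date.new(2012, 3, 1)
act_date   = Date.new(2012, 2, 1)
range      = start_date..end_date # built once, outside the timed loops

Benchmark.bm(16) do |x|
  # Same as the gist: a fresh Range object per iteration.
  x.report('cover? (new)')    { n.times { (start_date..end_date).cover?(act_date) } }
  # Range reused, so only the cover? call itself is timed.
  x.report('cover? (reused)') { n.times { range.cover?(act_date) } }
  x.report('between?')        { n.times { act_date.between?(start_date, end_date) } }
end
```

If the reused variant lands close to between?, the Range allocation explains most of the gap; if it does not, something else is going on.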
