include? vs. cover? vs. between?
require 'date'
require 'benchmark'

n = 1_000_000
start_date = Date.new(2012, 1, 1)
end_date = Date.new(2012, 3, 1)
act_date = Date.new(2012, 2, 1)

Benchmark.bm(10) do |x|
  x.report('include?') do
    n.times do
      (start_date..end_date).include?(act_date)
    end
  end
  x.report('cover?') do
    n.times do
      (start_date..end_date).cover?(act_date)
    end
  end
  x.report('between?') do
    n.times do
      act_date.between?(start_date, end_date)
    end
  end
end
js@horadrim [~/Code] % ruby ./include-vs-cover-vs-between.rb
                 user     system      total        real
include?    38.910000   0.130000  39.040000 ( 40.826650)
cover?       1.100000   0.000000   1.100000 (  1.147182)
between?     0.630000   0.010000   0.640000 (  0.642790)
@fnando

What about converting the dates to UNIX timestamps?

Small monkey patching:

class Date
  def to_i
    to_time.to_i
  end
end

Related benchmark:

x.report('to_i') do
  n.times do
    start_date = start_date.to_i
    end_date = end_date.to_i
    act_date = act_date.to_i

    act_date >= start_date && act_date <= end_date
  end
end

My times:

                 user     system      total        real
include?    19.630000   0.010000  19.640000 ( 19.649433)
cover?       0.540000   0.000000   0.540000 (  0.539379)
between?     0.290000   0.000000   0.290000 (  0.284120)
to_i         0.240000   0.000000   0.240000 (  0.242973)

Ugly, but if you're looking for every possible optimization, it may be a solution.

@fnando

Stupid me! I replaced the original variables, so after the first iteration the loop was only calling to_i on integers. :P

My solution is actually slower.
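
For reference, a minimal sketch of the same measurement with the conversion written into fresh locals, so the Date objects are converted again on every iteration. It calls `to_time.to_i` directly instead of relying on the monkey patch above. The names `start_ts`, `end_ts`, `act_ts`, and `in_range` are made up for this sketch, and `n` is reduced from the gist's 1_000_000 only to keep the run short.

```ruby
require 'date'
require 'benchmark'

n = 100_000 # assumption: smaller than the gist's 1_000_000 to keep the run short
start_date = Date.new(2012, 1, 1)
end_date   = Date.new(2012, 3, 1)
act_date   = Date.new(2012, 2, 1)
in_range   = nil

Benchmark.bm(10) do |x|
  x.report('to_i') do
    n.times do
      # Fresh locals: the Date variables are left untouched, so every
      # iteration pays for the Date -> Time -> Integer conversion.
      start_ts = start_date.to_time.to_i
      end_ts   = end_date.to_time.to_i
      act_ts   = act_date.to_time.to_i

      in_range = act_ts >= start_ts && act_ts <= end_ts
    end
  end
end
```

Timings are left out on purpose, since they depend on the Ruby version and the machine.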

@bradland

@fnando your idea still has merit. Storing dates (times, really) as integers can yield performance improvements if you're dealing with time series data where relative comparison is more important than having native Date objects. The key is, the data should be persisted as timestamps and not converted to Date or Time unless it's required for display.

For example, I wrote a few internal tools for working with call data from a telecom carrier (call detail records, or CDRs). I had to calculate a lot of durations measured in seconds. I also needed to answer questions like "how many calls were active at 2012-11-13 10:45 UTC?" In this case, storing the time portion of the records as timestamps made a lot of sense. I could convert the input once (a single Time object operation), then perform all the comparisons on integers. All my date comparisons became integer operations, which are much faster than Date/Time/DateTime operations.
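
A rough sketch of what that could look like, not the actual tooling described above. The `Call` struct, its field names, and the sample records are all hypothetical, and treating both ends of a call as inclusive is an assumption.

```ruby
# Hypothetical call record: both ends stored as Unix timestamps (seconds).
Call = Struct.new(:started_at, :ended_at) do
  # Duration in seconds is a plain integer subtraction.
  def duration
    ended_at - started_at
  end

  # Integer comparison only; both ends inclusive (an assumption).
  def active_at?(timestamp)
    started_at <= timestamp && timestamp <= ended_at
  end
end

# Made-up sample records.
calls = [
  Call.new(Time.utc(2012, 11, 13, 10, 40).to_i, Time.utc(2012, 11, 13, 10, 50).to_i),
  Call.new(Time.utc(2012, 11, 13, 10, 44).to_i, Time.utc(2012, 11, 13, 10, 47).to_i),
  Call.new(Time.utc(2012, 11, 13, 11, 0).to_i,  Time.utc(2012, 11, 13, 11, 5).to_i)
]

# Convert the query time once, then compare integers for every record.
query  = Time.utc(2012, 11, 13, 10, 45).to_i
active = calls.count { |call| call.active_at?(query) }

puts "active calls: #{active}"
puts "total seconds: #{calls.sum(&:duration)}"
```

The only Time object created per query is the one for the query instant; everything after that is integer arithmetic.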

@jrochkind

I don't understand what would make between? faster than cover?. Any idea?

@gerrywastaken

@fnando on top of what @bradland said, I think you are comparing apples and oranges. The initial tests assume static objects that already have their values set up, whereas you are calling to_i on all three variables during each run. It would be more in line with the original tests if you also set all of these values before the test. In many cases, if somebody wants to do a lot of between tests, either the act_date or the start-end date combination won't be changing.

However, unless all three values are fixed, between? is still going to win. :)
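
A minimal sketch of that setup, with all three timestamps converted once before the timed loop. The `*_ts` names and the `'to_i (pre)'` label are made up here, and `n` is smaller than in the gist only to keep the run short.

```ruby
require 'date'
require 'benchmark'

n = 100_000 # assumption: smaller than the gist's 1_000_000 to keep the run short
start_date = Date.new(2012, 1, 1)
end_date   = Date.new(2012, 3, 1)
act_date   = Date.new(2012, 2, 1)

# Convert once, outside the timed block, as the original tests do with Dates.
start_ts = start_date.to_time.to_i
end_ts   = end_date.to_time.to_i
act_ts   = act_date.to_time.to_i

Benchmark.bm(10) do |x|
  x.report('between?') do
    n.times { act_date.between?(start_date, end_date) }
  end
  x.report('to_i (pre)') do
    n.times { act_ts >= start_ts && act_ts <= end_ts }
  end
end
```

If only one side is fixed (say the start-end pair), the conversion for the other side would move back inside the loop, which is the case where between? is expected to win.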
