Apprentice Blog of the Week: Setting Date Ranges in Ruby

Makis Otman

September 03, 2014

Yesterday, my pairing partner and I came across an interesting situation. The feature we were working on required receiving some data, constructing a date range (a year, for example), and then filtering that data to see which entries fell within the range.

A simple example to demonstrate this:

require 'date' # Date is not loaded by default

dates_sample_one = {
  person_1: Date.today + 10,
  person_2: Date.today,
  person_3: Date.today + 366,
  person_4: Date.today + 95,
  person_5: Date.today + 400
}

# Keep only the entries whose date falls within one year from today
def filter_dates_with_include(dates)
  dates.select do |person, date|
    (Date.today..Date.today + 365).include?(date)
  end
end

filter_dates_with_include(dates_sample_one)
#=> { person_1, person_2, ... }

The above example is fairly simple: Five people with different dates, some that go beyond a year and others that fall within the year. The method has a fixed range of one year, starting from the current date, and checks if the dates coming through fall within that year. Something like the above will work fine for situations in which the data set is fairly small and the calculation is fairly simple. But once you're dealing with a decent amount of data, then you have a problem.

To show this, I'll use Ruby's Benchmark module and increase the data pool:

require 'benchmark'
require 'date'

dates_sample_one = {
  person_1: Date.today + 10,
  person_2: Date.today,
  person_3: Date.today + 366,
  person_4: Date.today + 95,
  person_5: Date.today + 400
}

# 40 random dates spread over the next two years
dates_sample_two = {}
(1..730).to_a.shuffle.take(40).each do |days|
  dates_sample_two["person_#{days}"] = Date.today + days
end

def filter_dates_with_include(dates)
  dates.select do |person, date|
    (Date.today..Date.today + 365).include?(date)
  end
end

# Benchmarking both samples
Benchmark.bm do |x|
  x.report { filter_dates_with_include(dates_sample_one) }
end
#=> user system total real
#=> times: 0.000000 0.000000 0.000000 ( 0.002043) # measured in seconds
Benchmark.bm do |x|
  x.report { filter_dates_with_include(dates_sample_two) }
end
#=> user system total real
#=> 0.040000 0.000000 0.040000 ( 0.040604)

Here, the second batch is made up of 40 random dates. When running the benchmark, the impact on performance is clear: It's nearly 20 times slower, and that's with just 40 records.

Include vs. Cover

As the example above shows, using include? can be quite costly. When we ran the code, the page was taking quite a while to load, but our assumption at the time was that there could be an issue with the API we were hitting in order to get the data. However, after running the tests, it became obvious that the problem was elsewhere.

The documentation for the include? method in the Range class states: "Returns true if obj is an element of the range, false otherwise." In other words, when you ask whether a date is in a range, include? checks it against each entry of the range in turn. (Ranges of numbers are a special case and are compared against the endpoints directly.) If it finds a match, it stops there and doesn't continue the search. The problem is that when the element is not within the range, it goes through the whole range, checking every element. Efficiency fail.
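One way to see this walk happen is to count how often Ruby asks for the next element. The sketch below uses a hypothetical CountingDay class (not part of our feature, and standing in for Date) that records every call to succ. It assumes only that the class is Comparable and responds to succ, which is what a range needs in order to be iterated:

```ruby
# Hypothetical stand-in for Date that counts how often the range
# asks for "the next day" via succ.
class CountingDay
  include Comparable
  attr_reader :n

  class << self
    attr_accessor :succ_calls
  end
  self.succ_calls = 0

  def initialize(n)
    @n = n
  end

  def <=>(other)
    n <=> other.n
  end

  def succ
    self.class.succ_calls += 1
    CountingDay.new(n + 1)
  end
end

range   = CountingDay.new(0)..CountingDay.new(365)
outside = CountingDay.new(400) # not in the range, so nothing stops the walk early

CountingDay.succ_calls = 0
range.include?(outside)
include_steps = CountingDay.succ_calls # hundreds of steps: the whole range was walked

CountingDay.succ_calls = 0
range.cover?(outside)
cover_steps = CountingDay.succ_calls   # no steps: only the endpoints were compared

puts "include? stepped #{include_steps} times, cover? stepped #{cover_steps} times"
```

The counter only measures calls to succ, so it says nothing about timing; it just makes the walk visible.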

After we had grown quite frustrated with the slow tests, a friend asked if we were using cover?. After checking the documentation, we realised that this was the method we should have been using. Instead of walking the range, cover? compares the given date against only the beginning and the end of the range. The comparison below is the equivalent check for a range whose end is included (see exclude_end?):

date_to_check = Date.today + 30
Date.today <= date_to_check && date_to_check <= Date.today + 365
#=> true
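To make the exclude_end? remark concrete, here is a small sketch comparing a two-dot range (end included) with a three-dot range (end excluded). The fixed start date and the variable names are only for illustration:

```ruby
require 'date'

start  = Date.new(2014, 9, 3) # arbitrary fixed date so the example is repeatable
finish = start + 365

inclusive = start..finish  # two dots: the end is part of the range
exclusive = start...finish # three dots: the end is left out

inclusive.exclude_end?  #=> false
exclusive.exclude_end?  #=> true

inclusive.cover?(finish) #=> true
exclusive.cover?(finish) #=> false

# The hand-written comparison for the three-dot case uses < on the right-hand side
by_hand = start <= finish && finish < finish #=> false
```

With an excluded end, the last comparison becomes a strict less-than, which is why the final day drops out of the range.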

This takes two comparisons no matter how long the range is, so it is far more efficient than include?. But I will let the data do the talking:


# Running the same benchmarks but with cover? instead
def filter_dates_with_cover(dates)
  dates.select do |person, date|
    (Date.today..Date.today + 365).cover?(date)
  end
end

Benchmark.bm do |x|
  x.report { filter_dates_with_cover(dates_sample_one) }
end
#=> user system total real
#=> times: 0.000000 0.000000 0.000000 ( 0.000055)

Benchmark.bm do |x|
  x.report { filter_dates_with_cover(dates_sample_two) }
end
#=> user system total real
#=> 0.040000 0.000000 0.040000 ( 0.000181) # Over 200 times faster than include!
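Speed only matters if both methods select the same entries, so it is worth checking that alongside the timings. The following is a minimal sketch, assuming a deterministic sample of 40 dates (one every 18 days) rather than the shuffled one above, and using Benchmark.realtime, which returns the elapsed seconds as a float. The variable names are illustrative:

```ruby
require 'benchmark'
require 'date'

today = Date.today # captured once so every check uses the same day
year  = today..(today + 365)

# 40 dates, 18 days apart: the first 20 fall within the year, the rest do not
sample = (1..40).each_with_object({}) do |i, dates|
  dates["person_#{i}"] = today + (i * 18)
end

with_include = nil
with_cover   = nil

include_time = Benchmark.realtime do
  with_include = sample.select { |_person, date| year.include?(date) }
end

cover_time = Benchmark.realtime do
  with_cover = sample.select { |_person, date| year.cover?(date) }
end

puts format('include?: %.6fs, cover?: %.6fs', include_time, cover_time)
puts "same result: #{with_include == with_cover}"
```

The timings will vary from machine to machine, but the two result hashes should always be identical for a range of dates.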

Knowing the above will definitely make me think twice about the choices I make. It's certainly easy to fall into the trap of following what seems to be the right choice. Having tools like the Benchmark module helps with measuring each decision. It's all nice and beautiful when talking about the options and writing them down; but at the end of the day, numbers rule.