Last active
November 12, 2019 07:44
-
-
Save tnj/38cfd344d539f2e2882dce10971bce73 to your computer and use it in GitHub Desktop.
Efficiently finds a record by creation time in a huge table even without having index on created_at
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
class ActiveRecord::Base | |
# @param [Time] created_at | |
# @return [ActiveRecord::Base] | |
def self.find_closest_created_at(created_at) | |
return if self.last.created_at < created_at | |
high = self.last.id | |
low = self.first.id | |
while low <= high | |
mid = (high - low) / 2 + low | |
closest = self.select(:id, :created_at).where('id >= ?', mid).first | |
return if closest.nil? | |
return self.find(closest.id) if high == low | |
if created_at > closest.created_at | |
low = mid + 1 | |
else | |
high = mid | |
end | |
end | |
end | |
end |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Usage
Install this as an initializer on your Rails project, or just paste whole code into
rails console
.Example
You have a
Log
model that contains billions of rows and doesn't have an index oncreated_at
.To retrieve rows for the last 24 hours, do the following two steps.
id
of the row closest to the time of 24 hours agoid
to narrow down the range of data using a primary key.It generates a query that uses the primary key to determine the range and runs very fast, even on a table with billions of rows.
Above code is an equivalent to this naïve one:
However, this will result in a full table scan that doesn't return like forever. Not a good idea to run on a table with billions of rows.