@EdwardDiehl
Forked from iamatypeofwalrus/README.md
Created June 11, 2016 09:39
A Rails 4 pluck in batches implementation

pluck_in_batches

Sometimes you need to iterate over a ton of items and you don't want the overhead of instantiating ActiveRecord objects for all of them. Hell, you only need a few columns! Well, #pluck has your back.

But what if you want to iterate over many tonnes of items?

Pluck in batches to the rescue!

This isn't the exact code that I use in my code base, but it is damn close.

Enjoy!

# This assumes you are on Rails 4, where #pluck accepts multiple columns,
# and that the table has a positive integer :id primary key.
class ActiveRecord::Relation
  # pluck_in_batches: yields an array of rows, at most batch_size long,
  # to a block. Each row holds the values of *columns.
  #
  # Special case: if only one column is selected, then each batch
  # is a flat array like [value, value, ...]
  # rather than [[value], [value], ...]
  #
  # Arguments
  #   columns    -> an arbitrary selection of columns found on the table.
  #   batch_size -> How many items to pluck at a time
  #   &block     -> A block that processes an array of returned rows.
  #                 Array is, at most, size batch_size
  #
  # Returns
  #   nothing is returned from the function
  def pluck_in_batches(*columns, batch_size: 1000)
    if columns.empty?
      raise "There must be at least one column to pluck"
    end

    # the :id to start the query at (ids below 1 are never visited)
    batch_start = 1

    # It's cool. We're only taking in symbols
    # no deep clone needed
    select_columns = columns.dup

    # Find index of :id in the array
    remove_id_from_results = false
    id_index = columns.index(:id)

    # :id is still needed to calculate offsets
    # add it to the front of the array and remove it when yielding
    if id_index.nil?
      id_index = 0
      select_columns.unshift(:id)
      remove_id_from_results = true
    end

    # Qualify :id with the table name so the condition survives joins
    where_statement = "#{quoted_table_name}.id >= ?"

    loop do
      items = where(where_statement, batch_start)
                .limit(batch_size)
                .order(:id)
                .pluck(*select_columns)
      break if items.empty?

      # Use the last id to calculate where to offset queries.
      # pluck returns a flat array when only one column is selected.
      last_id = select_columns.size == 1 ? items.last : items.last[id_index]

      # Remove :id column if not in *columns, flattening single-column rows
      if remove_id_from_results
        items.map! { |row| columns.size == 1 ? row[1] : row[1..-1] }
      end

      yield items
      batch_start = last_id + 1
    end
  end
end
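
The loop above pages by id rather than by OFFSET: each query starts just past the last id the previous batch returned. Here is a rough plain-Ruby sketch of that idea, with no Rails required. Everything in it is made up for illustration: ROWS is an in-memory stand-in for a table, fetch_batch stands in for the where/order/limit/pluck query, and each_name_batch is a hypothetical name.

```ruby
# Hypothetical in-memory "table": [id, name] rows, with gaps in the ids.
ROWS = [[1, "ann"], [2, "bob"], [5, "cy"], [8, "dee"], [9, "eve"]].freeze

# Stand-in for: where("id >= ?", start).order(:id).limit(size).pluck(:id, :name)
def fetch_batch(start, size)
  ROWS.select { |id, _| id >= start }.sort_by(&:first).first(size)
end

# Yields batches of names, using the last id seen as the cursor
# for the next query.
def each_name_batch(batch_size: 2)
  batch_start = 1
  loop do
    rows = fetch_batch(batch_start, batch_size)
    break if rows.empty?

    # Next batch starts right after the last id in this one
    batch_start = rows.last.first + 1

    # Drop the id, which was only needed for the cursor
    yield rows.map { |_, name| name }
  end
end

batches = []
each_name_batch(batch_size: 2) { |names| batches << names }
p batches # => [["ann", "bob"], ["cy", "dee"], ["eve"]]
```

Gaps in the ids don't matter, because the cursor comes from the rows actually returned rather than from counting.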