
@iamatypeofwalrus
Last active March 30, 2023 20:54
A Rails 4 pluck in batches implementation

pluck_in_batches

Sometimes you need to iterate over a ton of items and you don't want the overhead of creating AR objects out of all of them. Hell, you only need a few things! Well, #pluck has your back.

But what if you want to iterate over many tonnes of items?

Pluck in batches to the rescue!

This isn't the exact code that I use in my code base, but it is damn close.

Enjoy!

```ruby
# This assumes you are on Rails 4, where #pluck accepts multiple columns
class ActiveRecord::Relation
  # pluck_in_batches: yields arrays of the plucked *columns to a block,
  # each array holding at most batch_size rows.
  #
  # Special case: if only one column is selected, each batch is a flat
  # array like [value, value, ...] rather than [[value], [value], ...]
  #
  # Arguments
  #   columns    -> an arbitrary selection of columns found on the table
  #   batch_size -> how many rows to pluck per query
  #   &block     -> a block that processes one batch of returned columns;
  #                 the array has at most batch_size entries
  #
  # Returns
  #   nil; results are only delivered through the block
  def pluck_in_batches(*columns, batch_size: 1000)
    if columns.empty?
      raise ArgumentError, "There must be at least one column to pluck"
    end

    # It's cool. We're only taking in symbols, so no deep clone needed
    select_columns = columns.dup

    # Find the index of :id in each plucked row
    remove_id_from_results = false
    id_index = columns.index(:id)

    # :id is still needed to know where the next batch starts, so
    # add it to the front of the array and remove it when yielding
    if id_index.nil?
      id_index = 0
      select_columns.unshift(:id)
      remove_id_from_results = true
    end

    # The largest :id seen so far; nil until the first batch is fetched
    last_id = nil

    loop do
      # reorder (not order) so an existing ORDER BY can't break the paging
      batch = reorder(:id).limit(batch_size)
      batch = batch.where(table[:id].gt(last_id)) unless last_id.nil?
      items = batch.pluck(*select_columns)
      break if items.empty?

      # With exactly one selected column (only possible when it is :id),
      # pluck returns bare values instead of one-element arrays
      last_row = items.last
      last_id = select_columns.size == 1 ? last_row : last_row[id_index]

      if remove_id_from_results
        # Drop the :id we added; unwrap rows if only one column was requested
        items.map! { |row| columns.size == 1 ? row[1] : row[1..-1] }
      end

      yield items

      # A short batch means we've reached the end of the table
      break if items.size < batch_size
    end
  end
end
```
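To see how the batching walks the table without a database, here is a plain-Ruby simulation. `ROWS`, `fetch_batch` and `simulated_pluck_in_batches` are hypothetical stand-ins, not part of ActiveRecord: `fetch_batch` imitates a `WHERE id > last_id ORDER BY id LIMIT batch_size` query followed by a pluck. It follows the same keyset idea as `pluck_in_batches`, where each query resumes just past the largest id from the previous batch, so no `OFFSET` is ever needed.

```ruby
# Hypothetical in-memory "table"; ids are deliberately out of order
ROWS = [
  { id: 3, email: "c@example.com" },
  { id: 1, email: "a@example.com" },
  { id: 7, email: "g@example.com" },
  { id: 4, email: "d@example.com" },
  { id: 9, email: "i@example.com" }
].freeze

# Stands in for: WHERE id > last_id ORDER BY id LIMIT batch_size, then pluck
def fetch_batch(rows, columns, last_id, batch_size)
  rows
    .select { |row| last_id.nil? || row[:id] > last_id }
    .sort_by { |row| row[:id] }
    .first(batch_size)
    .map { |row| columns.map { |col| row[col] } }
end

def simulated_pluck_in_batches(rows, *columns, batch_size: 2)
  # Always fetch :id so we know where the next batch starts
  select_columns = columns.include?(:id) ? columns : [:id] + columns
  id_index = select_columns.index(:id)
  last_id = nil
  loop do
    items = fetch_batch(rows, select_columns, last_id, batch_size)
    break if items.empty?
    # The next query starts strictly after the largest id in this batch
    last_id = items.last[id_index]
    # Strip the helper :id column if the caller did not ask for it
    items = items.map { |row| row.drop(1) } unless columns.include?(:id)
    # Flatten single-column results, matching the documented special case
    items = items.map(&:first) if columns.size == 1
    yield items
  end
end

batches = []
simulated_pluck_in_batches(ROWS, :email, batch_size: 2) { |batch| batches << batch }
p batches
# => [["a@example.com", "c@example.com"], ["d@example.com", "g@example.com"], ["i@example.com"]]
```

Against a real model the call would look like `User.where(active: true).pluck_in_batches(:email, batch_size: 500) { |emails| ... }`, where `User` and `active` are placeholders for your own model and column.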
@amalagaura

I think this is necessary in ActiveRecord core. Have you tried seeing if the core team is interested in this?

@siannopollo

I made some changes that get this working more smoothly.

@iamatypeofwalrus
Author

@amalagaura: I hadn't thought about it. Seems like a quick add.

@iamatypeofwalrus
Author

@siannopollo: great fixes. thanks!

@ericcj

ericcj commented Jan 12, 2017

And I made it work on any sortable column, not just an integer key: https://gist.github.com/ericcj/e58ae3b11ef14fc070862c8567cd1b7f
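Stepping past the last id works because `:id` is unique and ordered. A sort column that can repeat (a date, a name) needs a tie-breaker, or rows sharing the boundary value get skipped or yielded twice. One common approach, sketched here in plain Ruby and not necessarily what the linked gist does, is a composite cursor of (sort value, id) compared strictly greater-than. `EVENTS`, `next_keyset_batch` and `keys_in_batches` are hypothetical names for illustration.

```ruby
# Hypothetical rows: created_on repeats, id is unique
EVENTS = [
  { id: 1, created_on: "2017-01-02" },
  { id: 2, created_on: "2017-01-01" },
  { id: 3, created_on: "2017-01-02" },
  { id: 4, created_on: "2017-01-01" },
  { id: 5, created_on: "2017-01-03" }
].freeze

# Stands in for: WHERE (created_on, id) > (cursor_date, cursor_id)
#                ORDER BY created_on, id LIMIT batch_size
def next_keyset_batch(rows, cursor, batch_size)
  rows
    .map { |row| [row[:created_on], row[:id]] }
    .select { |key| cursor.nil? || (key <=> cursor) == 1 }
    .sort
    .first(batch_size)
end

def keys_in_batches(rows, batch_size:)
  cursor = nil
  loop do
    keys = next_keyset_batch(rows, cursor, batch_size)
    break if keys.empty?
    yield keys
    # The cursor is the whole (sort value, id) pair, not "value + 1"
    cursor = keys.last
  end
end

keys_in_batches(EVENTS, batch_size: 2) { |keys| p keys.map(&:last) }
# prints [2, 4], then [1, 3], then [5]
```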
