@tonilin
Last active August 29, 2015 14:17
pluck_in_batches.rb
# Reference:
# 1. https://gist.github.com/siannopollo/03d646eb7525f7fce678#file-pluck_in_batches-rb
# 2. https://github.com/rails/rails/blob/master/activerecord/lib/active_record/relation/batches.rb
class ActiveRecord::Relation
  # pluck_in_batches: yields arrays of the requested *columns to a block,
  # at most batch_size rows at a time, ordered by primary key.
  #
  # Special case: if only one column is selected, then each batch is a
  # flat array like [value, value, ...] rather than [[value], [value], ...].
  #
  # Arguments
  #   columns    -> one or more columns found on the table
  #   batch_size -> how many rows to pluck per query (default 1000)
  #   start      -> optional primary key value to start from (inclusive)
  #   &block     -> a block that processes each batch; a batch holds
  #                 at most batch_size rows
  #
  # Returns
  #   nil when a block is given; otherwise an Enumerator over the batches.
  #
  # Note: :id is used as the batching cursor, so this assumes the table's
  # primary key column is named id.
  def pluck_in_batches(*columns, batch_size: 1000, start: nil)
    if columns.empty?
      raise ArgumentError, "There must be at least one column to pluck"
    end

    relation = self
    select_columns = columns.dup

    # Find the index of :id in the requested columns
    remove_id_from_results = false
    id_index = columns.index(:id)

    # :id is still needed to compute the offset of the next batch, so if it
    # was not requested, prepend it and strip it again before yielding
    if id_index.nil?
      id_index = 0
      select_columns.unshift(:id)
      remove_id_from_results = true
    end

    unless block_given?
      # Pass the caller's original columns (not select_columns) so the
      # enumerator does not yield the injected :id column
      return to_enum(:pluck_in_batches, *columns, batch_size: batch_size, start: start) do
        # Enumerator#size: the number of batches, i.e. ceil(total / batch_size)
        total = start ? where(table[primary_key].gteq(start)).size : size
        (total - 1).div(batch_size) + 1
      end
    end
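    if logger && (arel.orders.present? || arel.taken.present?)
      logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
    end

    # Keyset pagination: order by primary key, fetch batch_size rows per query
    relation = relation.reorder(batch_order).limit(batch_size)
    items = start ? relation.where(table[primary_key].gteq(start)).pluck(*select_columns) : relation.pluck(*select_columns)

    while items.any?
      items_size = items.size
      # pluck returns flat values when only one column is selected (columns == [:id])
      primary_key_offset = select_columns.length == 1 ? items.last : items.last[id_index]

      if remove_id_from_results
        # Drop the injected :id column; with a single requested column,
        # return a flat array [value, ...] instead of [[value], ...].
        # Unlike flatten!, this leaves array-typed column values intact.
        items.map! { |row| select_columns.length == 2 ? row[1] : row[1..-1] }
      end

      yield items

      # A short batch means there are no more rows
      break if items_size < batch_size

      # Next batch: rows whose primary key is greater than the last one seen
      items = relation.where(table[primary_key].gt(primary_key_offset)).pluck(*select_columns)
    end
  end
end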
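The loop above is keyset pagination: every query orders by primary key, takes `batch_size` rows, and the next query starts after the last `id` seen rather than using `OFFSET`. The sketch below replays that logic in plain Ruby so it runs without a database. The module `KeysetSketch`, its `pluck_in_batches` method, and the array-of-hashes "table" are hypothetical stand-ins for ActiveRecord and SQL, not part of the gist.

```ruby
# Hypothetical, database-free sketch of the gist's keyset batching.
# `rows` stands in for a table: an array of hashes keyed by column name.
module KeysetSketch
  # Simulates repeated queries of the form
  #   SELECT id, cols... FROM t WHERE id > last_id ORDER BY id LIMIT batch_size
  def self.pluck_in_batches(rows, *columns, batch_size: 1000)
    raise ArgumentError, "There must be at least one column to pluck" if columns.empty?
    return enum_for(__method__, rows, *columns, batch_size: batch_size) unless block_given?

    select_columns = columns.dup
    id_index = columns.index(:id)
    remove_id = id_index.nil?
    if remove_id
      # :id is needed as the cursor even when the caller did not ask for it
      select_columns.unshift(:id)
      id_index = 0
    end

    sorted = rows.sort_by { |row| row[:id] } # ORDER BY id
    last_id = nil
    loop do
      # WHERE id > last_id ... LIMIT batch_size
      page = sorted.select { |row| last_id.nil? || row[:id] > last_id }.first(batch_size)
      break if page.empty?

      items = page.map { |row| select_columns.map { |col| row[col] } }
      last_id = items.last[id_index]             # cursor for the next "query"
      items.map! { |row| row.drop(1) } if remove_id
      items.map!(&:first) if columns.length == 1 # one column -> flat array
      yield items
      break if page.size < batch_size            # a short page is the last one
    end
  end
end

table = [
  { id: 3, name: "c" }, { id: 1, name: "a" }, { id: 5, name: "e" },
  { id: 2, name: "b" }, { id: 4, name: "d" }
]

KeysetSketch.pluck_in_batches(table, :name, batch_size: 2) { |batch| p batch }
```

Applied to the real monkey-patch, a call such as `User.where(active: true).pluck_in_batches(:email, batch_size: 500) { |emails| ... }` would yield flat arrays of at most 500 emails; `User`, `active`, and `email` are hypothetical model and column names. Because each query filters on `id > last_id`, its cost does not grow with how far into the table the batch is, which is the usual reason to prefer this over `OFFSET`.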
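When no block is given, the method returns an Enumerator whose size block computes the number of batches as `(total - 1).div(batch_size) + 1`. This is integer ceiling division, and because `Integer#div` rounds toward negative infinity it also gives 0 for an empty relation. A minimal check in plain Ruby; `batch_count` is a hypothetical helper name, not part of the gist:

```ruby
# Hypothetical helper mirroring the gist's Enumerator size block.
def batch_count(total, batch_size)
  # Integer#div floors, so (0 - 1).div(n) == -1 and an empty relation gives 0
  (total - 1).div(batch_size) + 1
end

[0, 1, 999, 1000, 1001, 2500].each do |total|
  puts "#{total} rows -> #{batch_count(total, 1000)} batches"
end
```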