@searls
Created December 24, 2021 15:05
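This is an attempt to efficiently upsert and overwrite the associated rows (terms) of a large set of records (items): for each batch of items, upsert the terms that should exist and delete the ones that no longer apply.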
Item.where("updated_at > ?", after).includes(:english_terms).find_in_batches(batch_size: 5000).with_index do |items, i|
  good_term_attrs = []
  bad_term_ids = []

  items.each do |item|
    # Gather all the potentially valid English terms associated with a dictionary entry:
    term_texts = @expands_parentheticals.call(item.meaning_texts).map { |s|
      @massages_english.call(s)
    }.uniq

    good_term_attrs += term_texts.map { |text|
      {item_id: item.id, type: item.type, text: text}
    }

    # Existing terms whose text is no longer among the valid ones get deleted:
    bad_term_ids += item.english_terms.reject { |existing_term|
      term_texts.include?(existing_term.text)
    }.map(&:id)
  end

  # unique_by relies on a unique index covering (item_id, text)
  if good_term_attrs.present?
    EnglishTerm.upsert_all(good_term_attrs, unique_by: [:item_id, :text])
  end

  if bad_term_ids.present?
    EnglishTerm.delete_by(id: bad_term_ids)
  end

  puts "Updated english_terms through #{items.size + i * 5000} items"
end
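
The per-item diffing step can be tried outside Rails. What follows is only a minimal plain-Ruby sketch of that idea, not the gist's actual code: `ExistingTerm` and `plan_term_sync` are hypothetical names introduced for illustration, and the sample texts are made up. It assumes the desired term texts have already been computed, and it uses a `Set` for the membership check, which is an optional tweak rather than something the original does.

```ruby
require "set"

# Hypothetical stand-in for an EnglishTerm row (the real one is an ActiveRecord model).
ExistingTerm = Struct.new(:id, :text)

# Hypothetical helper: given the texts an item should have and the terms it
# currently has, return the attribute hashes to upsert and the ids to delete.
def plan_term_sync(item_id:, type:, desired_texts:, existing_terms:)
  wanted = desired_texts.uniq
  wanted_set = wanted.to_set # constant-time lookups instead of Array#include?

  attrs = wanted.map { |text| {item_id: item_id, type: type, text: text} }
  stale_ids = existing_terms.reject { |term| wanted_set.include?(term.text) }.map(&:id)

  [attrs, stale_ids]
end

# Made-up sample data for illustration only.
existing = [
  ExistingTerm.new(1, "cat"),
  ExistingTerm.new(2, "feline"),
  ExistingTerm.new(3, "kitty")
]

attrs, stale_ids = plan_term_sync(
  item_id: 42,
  type: "Vocab",
  desired_texts: ["cat", "kitty", "house cat", "cat"],
  existing_terms: existing
)
```

In the batch loop, the `attrs` collected across items would be handed to `upsert_all` and the `stale_ids` to `delete_by`, as in the snippet above.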