
@DonSchado
Created November 24, 2014 10:50
The elasticsearch-rails client provides a relatively simple interface to the Elasticsearch Bulk API, which can speed up indexing.
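
The Bulk API takes a flat list of actions. The elasticsearch-ruby client lets each action carry its document inline under a `:data` key, and that is the shape the importer's `prepare_documents` method builds. Below is a minimal sketch of that transformation. It assumes a plain `Struct` named `Article` (hypothetical) as a stand-in for an ActiveRecord model, so it runs without Rails or an Elasticsearch connection.

```ruby
# Hypothetical stand-in for an ActiveRecord model: it only needs #id and
# #as_indexed_json, the two methods the importer calls on each record.
Article = Struct.new(:id, :title, :tags) do
  def as_indexed_json(_options = {})
    { title: title, tags: tags }
  end
end

# Build one bulk "index" action per record. The client turns the inline
# :data document into the action/source line pair the Bulk API expects.
def bulk_actions(models)
  models.map do |model|
    { index: { _id: model.id, data: model.as_indexed_json } }
  end
end

batch = [Article.new(1, 'Hello', ['ruby']), Article.new(2, 'Bulk', [])]
actions = bulk_actions(batch)
actions.each { |action| p action }
```

Each batch therefore becomes a single HTTP request instead of one request per record.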
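
```ruby
# A FASTER IMPORTER
#
# The built-in import is not very efficient. It grabs every record,
# builds the ActiveRecord object and calls #as_indexed_json on it.
#
# For large data sets that span multiple relations, this can take a long
# time to complete, since each record's associations are typically loaded
# with separate queries.
#
# The elasticsearch-rails client provides a relatively simple interface to
# the Elasticsearch Bulk API. This importer eager-loads the associations
# that the mapping declares as nested or object fields and sends each batch
# of documents in a single bulk request, which can speed up this process.
#
# Usage:
#
#   FastElasticsearchImporter.new(Model1, Model2, Model3).import!
#
require 'logger'

class FastElasticsearchImporter
  attr_accessor :client, :klasses

  def initialize(*klasses)
    @klasses = klasses
  end

  # All models share the client and the index of the first model passed in.
  def client
    @client ||= klasses.first.__elasticsearch__.client
  end

  def target_index
    klasses.first.index_name
  end

  # Settings and mappings of all models, deep-merged into one hash each
  # (Hash#deep_merge! comes from ActiveSupport).
  def settings
    with_all_klasses do |klass, hash|
      hash.deep_merge! klass.__elasticsearch__.settings.to_hash
    end
  end

  def mappings
    with_all_klasses do |klass, hash|
      hash.deep_merge! klass.__elasticsearch__.mappings.to_hash
    end
  end

  # Yields each model with a shared accumulator hash and returns that hash.
  def with_all_klasses(&block)
    klasses.each_with_object({}) { |klass, hash| block.call(klass, hash) }
  end

  def create_index
    if client.indices.exists(index: target_index)
      logger.info "index: #{target_index} already exists"
    else
      logger.info "creating index: #{target_index}"
      client.indices.create(index: target_index, body: { settings: settings, mappings: mappings })
    end
  end

  def import!
    create_index
    import
  end

  # Loads each model in batches (1000 records by default) with its mapped
  # associations eager-loaded, and indexes every batch in one bulk request.
  # Note: bulk responses report per-document failures in their "errors" and
  # "items" fields instead of raising, and this method does not inspect them.
  def import
    klasses.each do |klass|
      included = associations(klass)
      klass.includes(*included).find_in_batches do |models|
        logger.info "importing #{klass.document_type} with associated #{included}"
        client.bulk(
          index: target_index,
          type: klass.document_type,
          body: prepare_documents(models)
        )
      end
    end
  end

  # Names of the nested/object properties in the model's mapping. They are
  # assumed to match ActiveRecord association names, so they can be passed
  # straight to #includes.
  def associations(klass)
    klass.__elasticsearch__
         .mapping
         .to_hash[klass.document_type.to_sym]
         .fetch(:properties)
         .select { |_name, options| %w[nested object].include?(options[:type].to_s) }
         .keys
  end

  # One bulk "index" action per record; the client sends :data as the
  # document source.
  def prepare_documents(models)
    models.map do |model|
      { index: { _id: model.id, data: model.as_indexed_json } }
    end
  end

  private

  def logger
    @logger ||= Logger.new(STDOUT)
  end
end
```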
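
`associations` relies on a convention: every mapping property of type `nested` or `object` is assumed to share its name with an ActiveRecord association, so the names can be handed to `includes` and loaded eagerly for each batch. The sketch below runs the same filter on a hand-written mapping hash. The hash and the helper name `embedded_properties` are hypothetical; the hash follows the type-keyed shape the importer indexes into (`to_hash[klass.document_type.to_sym]`) and is not output captured from a real model.

```ruby
# Hypothetical type-keyed mapping, shaped like the hash the importer reads
# from klass.__elasticsearch__.mapping.to_hash.
MAPPING = {
  article: {
    properties: {
      title:    { type: 'text' },
      author:   { type: 'object', properties: { name: { type: 'text' } } },
      comments: { type: 'nested', properties: { body: { type: 'text' } } },
      tags:     { type: 'keyword' }
    }
  }
}.freeze

# Keep only the properties that hold embedded documents; these are the
# names the importer passes to ActiveRecord's #includes.
def embedded_properties(mapping, document_type)
  mapping
    .fetch(document_type.to_sym)
    .fetch(:properties)
    .select { |_name, options| %w[nested object].include?(options[:type].to_s) }
    .keys
end

p embedded_properties(MAPPING, 'article')
```

Because the result goes straight into `includes`, a mapped object field with no matching association (a computed field, for example) makes loading the batch fail. Filtering the names through `klass.reflect_on_association` is one way to guard against that.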
@toomus

toomus commented Jun 15, 2021

Thx, it helps a lot to speed up the Elasticsearch import

@DonSchado
Author

DonSchado commented Jun 16, 2021

@toomus wow, didn't even remember that I had written this code :D
glad you found it and that it was useful to you
