@karmi
Last active December 14, 2015 19:19
Simple CSV import to Elasticsearch with Tire
# Usage:
#
# $ ruby generate.rb > people.csv
require 'time'
require 'faker'

COUNT = (ENV['COUNT'] || 1000).to_i

people = []

1.upto(COUNT) do
  people << {
    'first_name' => Faker::Name.first_name,
    'last_name'  => Faker::Name.last_name,
    'phone'      => Faker::PhoneNumber.cell_phone,
    'department' => ['Marketing', 'Development', 'Production'].sample,
    # Random timestamp between the Unix epoch and roughly 20 years ago.
    'birthday'   => Time.at(rand * (Time.now - 631152000).to_f).iso8601
  }
end

STDOUT.puts 'FirstName,LastName,Phone,Department,Birthday'

people.each do |person|
  STDOUT.puts [
    person['first_name'],
    person['last_name'],
    person['phone'],
    person['department'],
    person['birthday']
  ].join(',')
end
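
The generator above joins fields with a plain comma, so any value that itself contains a comma or a quote would produce a malformed row. A minimal sketch of the same output written through Ruby's standard `csv` library, which quotes such values. The helper name `people_to_csv` and the sample record are assumptions made for illustration, not part of the original script.

```ruby
require 'csv'

HEADERS = %w[FirstName LastName Phone Department Birthday].freeze

# Hypothetical helper: serialise an array of person hashes to CSV text.
# CSV.generate quotes any field that contains a comma, quote or newline.
def people_to_csv(people)
  CSV.generate do |csv|
    csv << HEADERS
    people.each do |person|
      csv << person.values_at('first_name', 'last_name', 'phone',
                              'department', 'birthday')
    end
  end
end

# Made-up sample record with a comma inside a field.
sample = [
  { 'first_name' => 'Ann', 'last_name' => 'Lee, Jr.', 'phone' => '555-0100',
    'department' => 'Marketing', 'birthday' => '1980-01-01T00:00:00Z' }
]

puts people_to_csv(sample)
```

Because the import script reads the file with `CSV.foreach`, quoted fields written this way should round-trip unchanged.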
# Usage:
#
# $ ruby import /path/to/file.csv
require 'tire'
require 'csv'
path = ARGV[0]

# Check the argument first: File.basename below raises on a nil path.
unless path
  puts "Usage: #{__FILE__} /path/to/csv"
  exit 1
end

batch_size = 100
buffer     = []
index_name = ENV['INDEX'] || File.basename(path, '.*')
CSV.foreach path, headers: true do |row|
  # Add the row to the buffer as a Hash keyed by the CSV headers.
  #
  buffer << row.to_hash

  # When we hit the batch boundary...
  if buffer.size % batch_size == 0
    # ... load batch into Elasticsearch ...
    Tire.index index_name do
      STDERR.puts import(buffer), '-'*80
    end

    # ... and empty the buffer.
    buffer = []
  end
end
# Import any remaining rows.
#
Tire.index index_name do
  STDERR.puts import(buffer), '-'*80
end unless buffer.empty?
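
The manual buffer in the import script can also be expressed with `Enumerable#each_slice`, which yields full batches followed by one final partial batch, so no separate step for the leftover rows is needed. A minimal sketch, assuming a hypothetical helper `each_csv_batch` and an inline CSV string in place of a file; it collects the batches in an array instead of sending them to Elasticsearch, so it runs without Tire.

```ruby
require 'csv'

# Hypothetical helper: parse CSV with a header row and yield arrays of
# row hashes, each holding at most batch_size rows.
def each_csv_batch(io, batch_size = 100)
  CSV.new(io, headers: true).each_slice(batch_size) do |rows|
    yield rows.map(&:to_h)
  end
end

# Made-up input: three data rows, batch size of two.
csv_text = "FirstName,LastName\nAnn,Lee\nBob,Ray\nCy,Poe\n"

batches = []
each_csv_batch(csv_text, 2) { |batch| batches << batch }

puts batches.map(&:size).inspect # prints [2, 1]
```

With a real file, `File.open(path)` could be passed as `io`, and each yielded batch handed to the `import` call used in the script.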