@drush
Created July 28, 2015 20:55
Elasticsearch Scan & Scroll Proto
require 'json'      # needed for Hash#to_json and JSON.parse below
require 'mechanize'

a = Mechanize.new
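
# Open a scan-type scroll that returns only document IDs and keeps the
# scroll context alive for 1 minute between requests. In scan mode, "size"
# applies per shard.
# NOTE: search_type=scan was deprecated in Elasticsearch 2.1 and removed in 5.0.
body = a.post('http://localhost:9200/sbir_dom/Firm/_search?search_type=scan&scroll=1m', {
  "query": {
    "query_string": { "query": "name:z*" }
  },
  "_source": false, "fields": ["_id"], "size": 10
}.to_json).body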
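
loops = 1
ids = []
loop do
  result = JSON.parse(body)
  # puts result  # uncomment to inspect the raw response
  scroll = result["_scroll_id"]
  raise "no _scroll_id in response" if scroll.nil? || scroll == ''
  hits = result["hits"]["hits"]
  ids += hits.collect { |h| h["_id"] }
  # Progress line: loop number, hits in this page, last ID seen so far
  puts "#{loops}\t#{hits.count}\t#{ids.last}"
  # The initial scan response carries no hits, so an empty page only
  # signals the end of the scroll after the first loop.
  break if hits.count == 0 && loops > 1
  loops += 1
  url = "http://localhost:9200/_search/scroll?scroll=1m&scroll_id=#{scroll}"
  body = a.get(url).body
end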
puts "Found #{ids.count} objects in #{loops} loops"
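
The loop above mixes HTTP calls with response parsing, which makes it hard to try without a running cluster. A minimal sketch of the same paging logic, with the fetching passed in as a block, might look like the following. The names `parse_scroll_page` and `collect_ids` are hypothetical helpers introduced here for illustration; they are not part of the gist or of any library.

```ruby
require 'json'

# Hypothetical helper: extract the scroll id and document ids from one
# raw scroll response body (a JSON string).
def parse_scroll_page(body)
  result = JSON.parse(body)
  scroll = result["_scroll_id"]
  raise ArgumentError, "response has no _scroll_id" if scroll.nil? || scroll.empty?
  hits = result.fetch("hits", {}).fetch("hits", [])
  { scroll_id: scroll, ids: hits.map { |h| h["_id"] } }
end

# Hypothetical driver: takes the body of the initial search response and a
# block that returns the next page's body for a given scroll id.
def collect_ids(first_body)
  ids = []
  page = parse_scroll_page(first_body)
  first = true
  loop do
    ids.concat(page[:ids])
    # As in the script above, the first (scan) response may be empty,
    # so an empty page only ends the scroll after the first iteration.
    break if page[:ids].empty? && !first
    first = false
    page = parse_scroll_page(yield(page[:scroll_id]))
  end
  ids
end
```

Assuming the same Mechanize agent `a` and initial `body` as in the script, this could be driven with `collect_ids(body) { |sid| a.get("http://localhost:9200/_search/scroll?scroll=1m&scroll_id=#{sid}").body }`. Passing canned JSON strings from the block instead allows the paging logic to be checked offline.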