@toripiyo
Last active April 24, 2020 15:04

Obtain more than 10,000 documents from Elasticsearch with the scroll API

query.json:
{
  "size": 10000,
  "sort": ["_doc"],
  "query": {
    "bool": {
      "must": [
        { "match": { "sports": "baseball" } },
        { "match": { "date": "2020-04-20" } },
        { "match": { "price": "400" } }
      ]
    }
  }
}
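Because the scroll can run for many pages, it may help to check first how many documents the query matches. A minimal sketch, assuming `jq` is installed: `count_body` is a hypothetical helper (not part of the original gist) that keeps only the `query` clause of the search body, since the `_count` API expects a body containing just `query` and is likely to reject `size` and `sort`. The request itself is left commented out because it needs a live cluster; `es_url` is the placeholder from the script.

```shell
#!/bin/bash
# Hypothetical helper: print a compact _count request body built from a
# search body file, keeping only its "query" clause (drops size and sort).
count_body() {
  jq -c '{query: .query}' "$1"
}

# Usage against a cluster (es_url is a placeholder, as in the scroll script):
# es_url='https://elasticsearch-domain'
# count_body query.json \
#   | curl -s "$es_url/_count" -H 'Content-Type: application/json' -d @- \
#   | jq .count
```

`curl -d @-` reads the request body from standard input, so the trimmed query never has to be written to a temporary file.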
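#!/bin/bash
# Fetch every document matching query.json via the scroll API and append each
# page of results to result.json. Requires curl and jq.
es_url='https://elasticsearch-domain'  # placeholder: replace with your cluster URL
# To search a single index instead of all indices:
# index=my-index
# response=$(curl -s "$es_url/$index/_search?scroll=1m" -H 'Content-Type: application/json' -d @query.json)
response=$(curl -s "$es_url/_search?scroll=1m" -H 'Content-Type: application/json' -d @query.json)
scroll_id=$(printf '%s' "$response" | jq -r '._scroll_id')
hits_count=$(printf '%s' "$response" | jq -r '.hits.hits | length')
hits_so_far=$hits_count
echo "Got initial response with $hits_count hits and scroll ID $scroll_id"
# TODO: process the first page of results here (e.g. append it to result.json).
# Truncate first so a rerun does not append to the previous run's output.
: > result.json
printf '%s' "$response" | jq . >> result.json
# Keep scrolling while the last page had hits. An empty count (e.g. the
# response was not valid JSON) falls back to 0 and also ends the loop.
while [ "${hits_count:-0}" -gt 0 ]; do
  response=$(curl -s "$es_url/_search/scroll" -H 'Content-Type: application/json' \
    -d "{ \"scroll\": \"1m\", \"scroll_id\": \"$scroll_id\" }")
  scroll_id=$(printf '%s' "$response" | jq -r '._scroll_id')
  hits_count=$(printf '%s' "$response" | jq -r '.hits.hits | length')
  # The final page is empty; stop without writing it to result.json.
  [ "${hits_count:-0}" -gt 0 ] || break
  hits_so_far=$((hits_so_far + hits_count))
  echo "Got response with $hits_count hits (hits so far: $hits_so_far), new scroll ID $scroll_id"
  # TODO: process this page of results (e.g. append it to result.json).
  printf '%s' "$response" | jq . >> result.json
done
# Release the scroll context now instead of waiting for the 1m keep-alive to expire.
curl -s -X DELETE "$es_url/_search/scroll" -H 'Content-Type: application/json' \
  -d "{ \"scroll_id\": \"$scroll_id\" }" > /dev/null
echo 'Done!'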
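Since result.json ends up holding one pretty-printed search response per page, it is usually easier to flatten it afterwards. A minimal sketch, assuming `jq` is installed; `extract_sources` and `count_hits` are hypothetical helper names, not part of the original gist. The first prints each hit's `_source` as one compact JSON object per line (NDJSON); the second sums the hits across all stored pages, which for a single run should match the script's final "hits so far" figure.

```shell
#!/bin/bash
# Hypothetical post-processing helpers for result.json, which holds one
# search response per scrolled page.

# Print each hit's _source as one compact JSON object per line (NDJSON).
# The `?` skips pages whose .hits.hits is missing instead of erroring.
extract_sources() {
  jq -c '.hits.hits[]?._source' "$1"
}

# Slurp all stored responses into an array and sum their hit counts
# (0 if the file contains no responses).
count_hits() {
  jq -s 'map(.hits.hits | length) | add // 0' "$1"
}
```

Usage: `extract_sources result.json > documents.ndjson` and `count_hits result.json`.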