Elasticsearch Python bulk index API example
>>> import itertools
>>> import string
>>> from elasticsearch import Elasticsearch, helpers
>>> es = Elasticsearch()
>>> # k is a generator expression that produces
... # a series of dictionaries containing test data.
... # The test data are just letter permutations
... # created with itertools.permutations.
... #
... # We then reference k as the iterator that's
... # consumed by the elasticsearch.helpers.bulk method.
>>> k = ({'_type':'foo', '_index':'test','letters':''.join(letters)}
... for letters in itertools.permutations(string.letters,2))
>>> # calling k.next() shows examples
... # (while consuming the generator, of course)
>>> # each dict contains a doc type, index, and data (at minimum)
>>> k.next()
{'_type': 'foo', 'letters': 'ab', '_index': 'test'}
>>> k.next()
{'_type': 'foo', 'letters': 'ac', '_index': 'test'}
>>> # create our test index
>>> es.indices.create('test')
{u'acknowledged': True}
>>> helpers.bulk(es,k)
(2650, [])
>>> # check to make sure we got what we expected...
>>> es.count(index='test')
{u'count': 2650, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
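The session above is a Python 2 transcript (string.letters, k.next()). A rough Python 3 equivalent is sketched below; the localhost URL and the test index name are assumptions carried over from the example, and _type is dropped because newer Elasticsearch versions no longer use mapping types.

import itertools
import string

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# Generator of bulk actions: one dict per document, each targeting the
# 'test' index and carrying a two-letter permutation in 'letters'.
actions = (
    {"_index": "test", "letters": "".join(pair)}
    for pair in itertools.permutations(string.ascii_letters, 2)
)

es.indices.create(index="test")
success, errors = helpers.bulk(es, actions)
print(success, errors)         # e.g. (2652, []) -- permutations of 2 from 52 letters

print(es.count(index="test"))  # may lag briefly until the index refreshes
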
zyxwu commented Apr 3, 2016

But when I index documents one by one, everything's fine:
res = es.index(index=INDEX, doc_type=DOC_TYPE, id=ind, body=JS_message)
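For comparison, a loop over es.index() like the line above makes one HTTP request per document, whereas helpers.bulk sends the same documents in batches. A minimal sketch, with placeholder data standing in for INDEX, DOC_TYPE, and JS_message, and assuming a client/cluster version that still uses mapping types as in the comment:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# Placeholder data standing in for the comment's INDEX / DOC_TYPE / JS_message.
INDEX = "test"
DOC_TYPE = "foo"
docs = [(i, {"letters": s}) for i, s in enumerate(["ab", "ac", "ad"])]

# One request per document -- works, but slow for large numbers of documents.
for ind, message in docs:
    es.index(index=INDEX, doc_type=DOC_TYPE, id=ind, body=message)

# The same documents sent in a single bulk request.
helpers.bulk(
    es,
    ({"_index": INDEX, "_type": DOC_TYPE, "_id": ind, "_source": message}
     for ind, message in docs),
)
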

k2xl commented Mar 22, 2017

Would this work for a Python list of JSON documents?
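It should: helpers.bulk accepts any iterable of action dicts, so a plain Python list works the same way as the generator in the gist. A minimal sketch, reusing the gist's test index name:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# A list of already-parsed JSON documents (i.e. Python dicts).
docs = [
    {"letters": "ab"},
    {"letters": "ac"},
    {"letters": "ad"},
]

# Wrap each document in a bulk action and hand the list to helpers.bulk;
# any iterable of actions is accepted, not just a generator.
actions = [{"_index": "test", "_source": doc} for doc in docs]
helpers.bulk(es, actions)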

thannaske commented Jul 8, 2017

@zyxwu But it's hellishly slow.

tusharkale197 commented Nov 27, 2017

I have a use case where I add or update documents with the bulk API and then immediately check the document count with the count API. The problem is that there is a delay after the bulk call before the correct count is reflected. How do I make sure the count is correct right after I fire the bulk call?

vinitpayal commented Feb 5, 2018

@tusharkale197 Elasticsearch is near-real-time, not strictly real-time: once you index a document, the change becomes visible within the index.refresh_interval period, which defaults to 1s. You can change that setting, or refresh the index manually with the refresh API.
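For example, forcing a refresh right after the bulk call should make the new documents visible to the count API immediately; a minimal sketch against the gist's test index:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

actions = ({"_index": "test", "letters": s} for s in ("xy", "yz", "zx"))

# helpers.bulk forwards extra keyword arguments to the bulk API, so
# refresh="wait_for" blocks until the new documents are searchable.
helpers.bulk(es, actions, refresh="wait_for")

# Alternatively, trigger a refresh explicitly before counting.
es.indices.refresh(index="test")
print(es.count(index="test"))  # the count now includes the bulk-indexed documents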
