- Doing a query with a has_parent filter when a parent-child relation references a mapping that doesn't exist returns a NullPointerException (instead of a more informative error)
- Adding a port number to a unicast host in elasticsearch.yml causes that node to receive invalid (i.e., unparseable) HTTP requests
- Missing a newline in a bulk insert request caused subsequent queries on that index to return invalid json
- Doing a delete by query on an index that removed a significant number of documents caused refresh requests on that index to return NullPointerExceptions
- Shards moving between nodes for no apparent reason
- Shards becoming unassigned for no apparent reason
- Shards becoming unassigned even when all of the shards in the cluster had been routed manually and shard allocation had been disabled
- Shards losing all of their documents if a write is performed while the shard is unavailable
Manual refresh API requests?
Yes (via elasticsearch-head)
Not really a bug: Elasticsearch will rebalance the cluster to maintain an even distribution, according to a variety of mechanisms (disk-based allocation, balance settings, etc.)
This might be intended behavior but it makes the db difficult to work with operationally. You want to be able to predict when expensive operations are going to happen so you can make sure that they don't interfere with the work your db is supposed to be doing.
Not sure I understand this one. Was it a valid, unused port? An invalid string? A port already in use? host:port combinations are definitely allowed.
It was like this: discovery.zen.ping.unicast.hosts: ["db1.example.com:1234", "db2.example.com:1234"]
Why were you manually routing shards? What do you mean by "unavailable"?
I was manually routing shards because when I didn't, shards were getting rerouted continuously and were constantly becoming unassigned. The "Overview" view in elasticsearch-head looked like Conway's Game of Life.
By "unavailable" I mean not in a state where it can accept writes (e.g., in the "initializing" state).
Opened this ticket for the bulk problem (it seems to apply to regular indexing too): elastic/elasticsearch#7299
Will look into the delete-by-query and refresh situation, and see if I can reproduce it.
Will look into the "initializing"-delete-all-docs situation too.
No idea about the port situation... I've never heard of that before (and we routinely change/configure ports for ourselves and various customers). Were you using the transport port, or the HTTP port?
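One possible explanation, offered as a guess rather than a diagnosis: unicast discovery connects to the listed hosts using the binary transport protocol (port 9300 by default), so if the port in those host entries is actually a node's HTTP port (9200 by default), that node's HTTP layer would be handed bytes it cannot parse as HTTP. The sketch below is a hypothetical pre-flight check (check_unicast_hosts is not an Elasticsearch tool) that flags host entries pointing at the HTTP port; it assumes the default HTTP port unless HTTP_PORT is set.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of Elasticsearch.
# Assumption: the HTTP port is 9200 unless HTTP_PORT says otherwise.
HTTP_PORT="${HTTP_PORT:-9200}"

# Returns non-zero if any host:port argument points at the HTTP port,
# since unicast discovery expects transport addresses (default 9300).
check_unicast_hosts() {
  for host in "$@"; do
    case "$host" in
      *:"$HTTP_PORT")
        echo "warning: $host uses the HTTP port; use the transport port instead" >&2
        return 1
        ;;
    esac
  done
  return 0
}
```

For the configuration quoted earlier, running HTTP_PORT=1234 before calling check_unicast_hosts db1.example.com:1234 db2.example.com:1234 would report whether 1234 was the HTTP port.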
If you don't want shards moving around at all, you can set:
curl -XPUT "http://localhost:9200/_cluster/settings" -d'
{
  "persistent": {
    "cluster.routing.allocation.enable" : "none"
  }
}'
This prevents any rebalancing or allocation at all. Alternatively, you could set it to new_primaries, which is likely the better option: it will allocate new primaries but nothing else.
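The new_primaries variant uses the same cluster-settings request as the "none" example above, with only the value changed. A minimal sketch, assuming a node reachable at ES_URL (a hypothetical variable defaulting to localhost:9200); the body is kept in a variable so it can be inspected before anything is sent.

```shell
#!/bin/sh
# Assumption: ES_URL points at any node in the cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Persistent cluster setting: only allow allocation of primary shards
# belonging to newly created indices.
NEW_PRIMARIES_BODY='{
  "persistent": {
    "cluster.routing.allocation.enable": "new_primaries"
  }
}'

# Sends the setting; nothing is sent until this function is called.
allow_new_primaries_only() {
  curl -XPUT "$ES_URL/_cluster/settings" -d "$NEW_PRIMARIES_BODY"
}
```

Because the setting is persistent, it survives a full cluster restart; using "transient" instead would make it last only until the next restart.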
Ultimately, ES is designed to perform these maintenance operations in the background. Rather than preventing it, it's better to just throttle the operations until they don't affect your cluster anymore.
- You can throttle the process using indices.recovery.max_bytes_per_sec, and set it to something reasonable that doesn't overwhelm your network/disk IO.
- You could also set cluster.routing.allocation.cluster_concurrent_rebalance: 1, which only allows one rebalance to occur in the cluster at a given time, to throttle how much background activity is happening.
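Both throttling suggestions are dynamic cluster settings, so they can be applied together in one cluster-settings request on a running cluster. A sketch, again assuming a hypothetical ES_URL; the "20mb" figure is purely illustrative and should be tuned to what your disks and network can absorb alongside normal traffic.

```shell
#!/bin/sh
# Assumption: ES_URL points at any node in the cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Caps recovery/relocation bandwidth per node and allows only one
# rebalance at a time across the whole cluster.
# "20mb" is an illustrative value, not a recommendation.
THROTTLE_BODY='{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "20mb",
    "cluster.routing.allocation.cluster_concurrent_rebalance": 1
  }
}'

# Sends the settings; nothing is sent until this function is called.
throttle_background_work() {
  curl -XPUT "$ES_URL/_cluster/settings" -d "$THROTTLE_BODY"
}
```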
Will try to reproduce and open a ticket myself if possible
Heh, so it does:
The newlines are how ES decides where to split elements into internal bulk requests, so that the actual JSON doesn't need to be parsed. After the first action item, everything between newlines is considered one doc, and from then on the bulk syntax is screwy, so nothing else gets indexed.
That said, there should probably be a warning/flag/exception that alerts the user to this, instead of blindly indexing invalid JSON. I'll open a ticket.
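To make the newline rule concrete, here is a shell sketch that writes a bulk body with one JSON object per line (each action line followed by its document) and refuses to send a body whose last byte isn't a newline. It only checks the trailing newline, not every line. The index name test, type doc, and the helper function names are hypothetical. It sends with curl's --data-binary because -d @file strips newlines from the file, which would destroy the line structure the bulk API relies on.

```shell
#!/bin/sh
# Writes a small bulk body: each action line is followed by its source
# document, and every line (including the last) ends with "\n".
# Index "test", type "doc" and the documents are hypothetical.
write_bulk_body() {
  out="$1"
  {
    printf '%s\n' '{"index":{"_index":"test","_type":"doc","_id":"1"}}'
    printf '%s\n' '{"field":"value1"}'
    printf '%s\n' '{"index":{"_index":"test","_type":"doc","_id":"2"}}'
    printf '%s\n' '{"field":"value2"}'
  } > "$out"
}

# True if the file is non-empty and its last byte is a newline
# ($(...) strips a trailing newline, leaving an empty string).
ends_with_newline() {
  [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Sends the file to the bulk endpoint; --data-binary keeps the newlines
# intact, whereas -d @file would strip them.
send_bulk() {
  if ! ends_with_newline "$1"; then
    echo "bulk body must end with a newline: $1" >&2
    return 1
  fi
  curl -XPOST "${ES_URL:-http://localhost:9200}/_bulk" --data-binary "@$1"
}
```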
The remaining shard issues we'd have to dig into more deeply; as you said, they are tricky to reproduce reliably. Any info you could provide would be helpful. Why were you manually routing shards? What do you mean by "unavailable"?