Skip to content

Instantly share code, notes, and snippets.

@yuriybash
Created August 23, 2018 23:50
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save yuriybash/4933e99af0faa41e588d6f8b70950b30 to your computer and use it in GitHub Desktop.
Save yuriybash/4933e99af0faa41e588d6f8b70950b30 to your computer and use it in GitHub Desktop.

Elasticsearch tips & tricks: part 1 - Speeding up migrations & reindexing

Background

Elasticsearch is a distributed, document-oriented storage engine particularly adept at search (thanks partly to the use of Lucene under the hood). It exposes a RESTful API for both query execution and management.

At Percolate, we managed multiple ES clusters (ranging from 50M-5B docs each) for a variety of uses, including storing denormalized object data and analytics data about those objects (i.e. social metrics). ES's aggregation capabilities are useful for slicing and dicing across various dimensions.

ES is a powerful tool for specific purposes, but there are nuances, pitfalls, and tricks that can cause you many headaches, several of which are not explicitly listed in documentation. This article is the first in a series to share that knowledge. The parts are:

  • Part 1: Speeding up migrations
  • Part 2: Ensuring snapshots succeed
  • Part 3: Field data cache size
  • Part 4: Obscure errors

Before proceeding, I'd

Part 1: Speeding up migrations & reindexing

From time to time, you will want to run an entire cluster migration, whether for a version upgrade, fleet-wide instance replacement, or other reasons. Or you may wish to reindex due to a change in a field. The amount of time this takes can vary wildly based on cluster size, configuration, instance type, network speed, and a myriad of other factors.

There are two config options that have a large impact on this: replica count and refresh interval.

Temporarily decreasing the replica count

Each index (or indices) have a specific number of replica shards - this can be queried via

https://gist.github.com/c44c5a9b7952ac2462273c62da79eb82

Replica shards are used for both backup and reads, and if you are running ES in any kind of production environment, you should absolutely have multiple replicas for each index.

However, having a large number of replicas during a migration can slow down the speed. One way to increase migration speed is to temporarily decrease the replica count, then restore it to a higher number once the initial migration is done with fewer replicas. This allows ES to run a simple disk copy operation at the end, which is faster than the default migration scheme.

If you have 3 replicas, as in the sample response above, you can update the replica count to 1 via a PUT:

https://gist.github.com/2261748e9284b1dbff79ad429d26012e

If it is production data, I recommend keeping at least 1 replica at all times, even during the migration. Once the migration is complete (you can confirm via kopf or GET _cluster/health), you can send another PUT request to bring the replica count back up.

Temporarily increasing the refresh interval

First, some background:

When a document is indexed, it is first added to an in-memory buffer and appended to the translog. When an index refresh takes place, documents held in the buffer are written to a new segment. This interval is set by the index.refresh_interval setting, but it does not mean that the document has been persisted (fsync'd) to disk. This interval is determined by the index.translog.sync_interval setting. In other words, a document's search-ability and whether it has persisted to disk are orthogonal.

Let's say you have the following index settings:

https://gist.github.com/77f369807ca92b99015385ff22572e0e

This diagram roughly outlines what happens when you add a document to the cluster at t=0:

Image of ES Intervals

The act of refreshing indices slows down indexing. You can increase this interval, or turn it off altogether to speed up indexing:

https://gist.github.com/26be39c44392c7195dcbe4563799e1a3

The command above turns off indexing until it is turned back on again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment