@haizi-zh
Last active February 5, 2016 17:44
Elasticsearch: Index Management

Index Management

Creation

The easiest way to create an index is to index a document into it. The index is then created with the default settings, and new fields are added to the type mapping through dynamic mapping.

For more control over the process, create the index explicitly with the following API:

PUT /my_index
{
	"settings": { ... any settings ... },
	"mappings": {
		"type_1": { ... any mappings ... },
		"type_2": { ... any mappings ... }
	}
}
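
As a minimal sketch of the request above, the following Python builds (but does not send) the same PUT call with the standard library. The node address localhost:9200, the settings values and the title field are assumptions made for illustration only; they do not come from the notes.

```python
import json
from urllib.request import Request

# Assumption: a node listening on localhost:9200 (not stated in the notes).
BASE_URL = "http://localhost:9200"


def build_create_index_request(index, settings, mappings):
    """Build (without sending) a PUT /<index> request with explicit settings and mappings."""
    body = {"settings": settings, "mappings": mappings}
    return Request(
        url=f"{BASE_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical settings and mapping, chosen only to make the body concrete.
request = build_create_index_request(
    "my_index",
    settings={"number_of_shards": 1, "number_of_replicas": 0},
    mappings={"type_1": {"properties": {"title": {"type": "string"}}}},
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it, pass the object to urllib.request.urlopen(request).
```

Keeping the request construction separate from sending makes it easy to inspect the body before anything touches the cluster.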

The following setting in elasticsearch.yml disables automatic index creation:

action.auto_create_index: false

Deleting an Index

DELETE /my_index
DELETE /my_ind*
DELETE /my_index1,other_index*
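
Because a wildcard delete can remove more than intended, it may help to preview which indices an expression would select. The sketch below only approximates the matching locally: it treats `*` as "any run of characters" and a comma as "or". The index names are hypothetical, and the real resolution is done by Elasticsearch itself.

```python
import re


def _pattern_to_regex(pattern):
    """Translate a pattern where only '*' is special into a compiled regex."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def resolve_targets(expression, existing_indices):
    """Return the existing indices selected by a comma-separated expression."""
    regexes = [_pattern_to_regex(p) for p in expression.split(",")]
    return [
        name for name in existing_indices
        if any(rx.fullmatch(name) for rx in regexes)
    ]


# Hypothetical index names on a cluster.
indices = ["my_index", "my_index1", "my_index_old", "other_index_2016", "logs"]

print(resolve_targets("my_index", indices))
print(resolve_targets("my_ind*", indices))
print(resolve_targets("my_index1,other_index*", indices))
```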

Index Settings

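  • number_of_shards: the number of primary shards. Defaults to 5 and cannot be changed after index creation.
  • number_of_replicas: the number of replica copies of each primary shard. Defaults to 1 and can be changed at any time on a live index.

The number of replicas can be updated through the _settings API:
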
PUT /my_index/_settings
{
	"number_of_replicas": 1
}
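
To make the relationship between the two settings concrete, here is a small sketch. It computes how many shard copies an index holds in total and builds the body of a replica update. The helper names are hypothetical and nothing is sent to a cluster.

```python
import json


def total_shard_copies(primaries, replicas):
    """Each primary shard exists once plus once per replica."""
    return primaries * (1 + replicas)


def build_replica_update(index, replicas):
    """Return the (method, path, body) triple for changing number_of_replicas."""
    if replicas < 0:
        raise ValueError("number_of_replicas must be zero or greater")
    return "PUT", f"/{index}/_settings", json.dumps({"number_of_replicas": replicas})


# With the defaults described above (5 primaries, 1 replica): 10 shard copies.
print(total_shard_copies(5, 1))

method, path, body = build_replica_update("my_index", 2)
print(method, path)
print(body)
```

Raising the replica count from 1 to 2 on the default 5 primaries would take the index from 10 to 15 shard copies, which is why this setting is the one to adjust as the cluster grows.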

Configuring Analyzers

By default, an index uses the standard analyzer. In most circumstances, this is a good choice for Western languages.
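
For intuition only, the sketch below imitates what the standard analyzer does to simple ASCII text: it splits on word boundaries and lowercases each token. It is a rough stand-in, not the real implementation, which follows Unicode text segmentation rules and will differ on inputs such as apostrophes or non-Latin scripts.

```python
import re


def approximate_standard_analyzer(text):
    """Rough approximation: split on non-word characters, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


tokens = approximate_standard_analyzer(
    "Set the shape to semi-transparent by calling set_trans(5)"
)
print(tokens)
# ['set', 'the', 'shape', 'to', 'semi', 'transparent', 'by', 'calling', 'set_trans', '5']
```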

How to Deal with Mappings

The uppermost level of a mapping is known as the root object. It contains the following:

  • properties: lists the mapping of each field.
  • Various metadata fields, such as _type, _id and _source.
  • Other settings, such as dynamic and dynamic_templates.

Properties

For each field listed under properties, the most important settings are:

  • type: the datatype of the field, such as string or date.
  • index: analyzed for full text, not_analyzed for strings that are searchable as exact values, and no for fields that are not searchable at all.
  • analyzer: which analyzer to use for a full-text field, both at index time and search time.
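
The three settings above can be sketched as a mapping body. The field names title, tag and internal_note are hypothetical, and the check function is only a local sanity check, not something Elasticsearch provides.

```python
import json

# The three values of "index" described above.
VALID_INDEX_VALUES = {"analyzed", "not_analyzed", "no"}

# Hypothetical fields: one full-text, one exact-value, one not searchable.
properties = {
    "title": {"type": "string", "index": "analyzed", "analyzer": "english"},
    "tag": {"type": "string", "index": "not_analyzed"},
    "internal_note": {"type": "string", "index": "no"},
}


def check_properties(props):
    """Raise ValueError if a field lacks a type or uses an unknown index value."""
    for field, spec in props.items():
        if "type" not in spec:
            raise ValueError(f"{field}: missing type")
        # String fields are analyzed when "index" is not given.
        index = spec.get("index", "analyzed")
        if index not in VALID_INDEX_VALUES:
            raise ValueError(f"{field}: unknown index value {index!r}")


check_properties(properties)
mapping_body = {"mappings": {"my_type": {"properties": properties}}}
print(json.dumps(mapping_body, indent=2))
```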

_source field

The _source field stores the original JSON document. It is important for the following reasons:

  • It provides a convenient way to retrieve data: the document is available directly from the search result. Without _source, you need a round trip to another datastore to fetch the document.
  • Partial update requests do not work without this field.
  • When re-indexing, there is no need to retrieve the original documents from another datastore, which might be slower. All the documents are already stored in Elasticsearch, in the _source field.
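
The re-indexing point can be illustrated with a short sketch: given search hits that carry _source, it builds the newline-delimited body for a POST to the _bulk API that writes each document into a new index. The hits and the target name my_index_v2 are made up for the example, and fetching the hits (for instance with a scan-and-scroll search) is left out.

```python
import json


def hits_to_bulk_body(hits, target_index):
    """Turn search hits into a _bulk body that indexes each _source into target_index."""
    lines = []
    for hit in hits:
        # Action line: where the document goes, reusing its type and ID.
        action = {"index": {"_index": target_index, "_type": hit["_type"], "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        # Source line: the original document, taken straight from _source.
        lines.append(json.dumps(hit["_source"]))
    # Every line of a bulk body, including the last one, ends with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical hits, shaped like entries of hits.hits in a search response.
hits = [
    {"_index": "my_index", "_type": "my_type", "_id": "1", "_source": {"title": "first"}},
    {"_index": "my_index", "_type": "my_type", "_id": "2", "_source": {"title": "second"}},
]

bulk_body = hits_to_bulk_body(hits, "my_index_v2")
print(bulk_body, end="")
```

If _source were disabled, the hits would carry no document body and this step would have to fall back to the original datastore.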

However, if none of these reasons matter to you, you can disable the _source field to save disk space:

PUT /my_index
{
	"mappings": {
		"my_type": {
			"_source": {
				"enabled": false
			}
		}
	}
}