Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save glenacota/7ab4a33c9d30518ea1532d75dcafe13c to your computer and use it in GitHub Desktop.
Save glenacota/7ab4a33c9d30518ea1532d75dcafe13c to your computer and use it in GitHub Desktop.
An exercise to practice with aliases, reindexing, and ingest pipelines in Elasticsearch.
# GOAL: Create an alias, reindex indices, and create ingest pipelines
# INITIAL SETUP: (i) a running Elasticsearch cluster with at least one node and a Kibana instance, (ii) no index or index template that starts by `hamlet`
# Copy-paste the following instructions into your Kibana console, and work directly from there
# Create the indices `hamlet-1` and `hamlet-2`, each with two primary shards and no replicas
# Add some documents to `hamlet-1` by running the following _bulk command
PUT hamlet-1/_doc/_bulk
{"index":{"_index":"hamlet-1","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet-1","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet-1","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet-1","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}
# Add some documents to `hamlet-2` by running the following _bulk command
PUT hamlet-2/_doc/_bulk
{"index":{"_index":"hamlet-2","_id":4}}
{"line_number":"2.1.1","speaker":"LORD POLONIUS","text_entry":"Give him this money and these notes, Reynaldo."}
{"index":{"_index":"hamlet-2","_id":5}}
{"line_number":"2.1.2","speaker":"REYNALDO","text_entry":"I will, my lord."}
{"index":{"_index":"hamlet-2","_id":6}}
{"line_number":"2.1.3","speaker":"LORD POLONIUS","text_entry":"You shall do marvellous wisely, good Reynaldo,"}
{"index":{"_index":"hamlet-2","_id":7}}
{"line_number":"2.1.4","speaker":"LORD POLONIUS","text_entry":"Before you visit him, to make inquire"}
# Create the alias `hamlet` that maps both `hamlet-1` and `hamlet-2`
# Verify that the documents grouped by `hamlet` are 8
# Configure `hamlet-1` to be the write index of the `hamlet` alias
# Add a document to `hamlet`, so that the document (i) has id "8", (ii) has "_doc" type, (iii) has a field `text_entry` with value "With turbulent and dangerous lunacy?", (iv) has a field `line_number` with value "3.1.4", (v) has a field `speaker` with value "KING CLAUDIUS"
# Create a script named `control_reindex_batch` and save it into the cluster state. The script checks whether a document has the field `reindexBatch`, and (i) in the affirmative case, it increments the field value by a script parameter named `increment`, (ii) otherwise, the script adds the field to the document setting its value to "1"
# Create the index `hamlet-new` with two primary shard and no replicas
# Reindex `hamlet` into `hamlet-new`, while satisfying the following criteria: (i) disable any refresh of `hamlet-new` during the reindexing, (ii) apply the `control_reindex_batch` script with the `increment` parameter set to "1", (iii) reindex using two parallel slices
# In one request, add `hamlet-new` to the alias `hamlet` and delete the `hamlet` and `hamlet-2` indices
# Create an ingest pipeline named `split_act_scene_line`. The pipeline splits the value of `line_number` using the dots as a separator, and stores the split values into three new fields named `number_act`, `number_scene`, and `number_line`, respectively
# Test the ingest pipeline on the following document
{
"_source": {
"line_number": "1.2.3"
}
}
# Update all documents in `hamlet-new` by using the `split_act_scene_line` pipeline
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment