@glenacota
Last active February 21, 2019 21:57
An extended exercise that covers the "Indexing Data" and part of the "Mapping and Text Analysis" objectives of the Elastic exam.
# ** EXAM OBJECTIVES: INDEXING DATA + MAPPINGS AND TEXT ANALYSIS **
# (remove, if present, any `hamlet*` index and index template)
# Create the index `hamlet_raw`, with one primary shard and four replicas
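A minimal sketch of this step follows. It writes the request as a Python dict that mirrors the JSON you would send from the Kibana console. The `method`/`path`/`body` layout is only an illustrative convention, not a client API. It assumes Elasticsearch 6.x, which was current when this gist was last active.

```python
import json

# Illustrative layout (not a client API): HTTP method, path and JSON body of
# the request you would type in the Kibana console.
create_hamlet_raw = {
    "method": "PUT",
    "path": "/hamlet_raw",
    "body": {
        "settings": {
            "number_of_shards": 1,    # one primary shard
            "number_of_replicas": 4,  # four copies of that primary
        }
    },
}

print(create_hamlet_raw["method"], create_hamlet_raw["path"])
print(json.dumps(create_hamlet_raw["body"], indent=2))
```

A replica is never allocated on the same node as its primary. On a single-node cluster the four replicas therefore stay unassigned, and the index health is yellow.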
# Index a document in `hamlet_raw` that satisfies the following criteria: (i) has id "1"; (ii) has the default type; (iii) has a field `line` with value "To be, or not to be: that is the question"
# Update the document with id "1" by adding the field `line_number` with value "3.1.64"
# Index a new document in `hamlet_raw` without specifying an id. The fields of this document are: (i) `text_entry` with value "Whether tis nobler in the mind to suffer"; (ii) `line_number` with value "3.1.66"
# Update the previous document by setting `line_number` to "3.1.65"
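One possible sequence of requests for the four steps above follows, assuming Elasticsearch 6.x syntax. On 7.x and later the update endpoint is `POST hamlet_raw/_update/<id>` instead. `AUTO_ID` is a hypothetical placeholder for the `_id` that Elasticsearch returns when it generates one.

```python
# Elasticsearch 6.x syntax assumed; each step is (method, path, body).
steps = [
    # (i) id "1", (ii) the default type `_doc`, (iii) field `line`
    ("PUT", "/hamlet_raw/_doc/1",
     {"line": "To be, or not to be: that is the question"}),
    # Partial update: the `doc` object is merged into the existing _source
    ("POST", "/hamlet_raw/_doc/1/_update",
     {"doc": {"line_number": "3.1.64"}}),
    # POST without an id: Elasticsearch generates one and returns it as `_id`
    ("POST", "/hamlet_raw/_doc",
     {"text_entry": "Whether tis nobler in the mind to suffer",
      "line_number": "3.1.66"}),
]

# Hypothetical placeholder: substitute the `_id` returned by the previous request.
AUTO_ID = "AUTO_GENERATED_ID"
steps.append(("POST", f"/hamlet_raw/_doc/{AUTO_ID}/_update",
              {"doc": {"line_number": "3.1.65"}}))

for method, path, body in steps:
    print(method, path, body)
```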
# (in one request) Update all documents in `hamlet_raw` by adding a new field `speaker` with value "HAMLET"
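A sketch of the single request follows. It uses the update-by-query API with a Painless script and no query, so every document in the index is updated. `apply_add_speaker` is a hypothetical pure-Python mirror of the script's effect, included only to make that effect explicit.

```python
# Update-by-query request; with no "query" clause it matches every document.
add_speaker = {
    "method": "POST",
    "path": "/hamlet_raw/_update_by_query",
    "body": {
        "script": {
            "lang": "painless",
            "source": "ctx._source.speaker = 'HAMLET'",
        }
    },
}

def apply_add_speaker(sources):
    """Pure-Python mirror of the Painless script, for illustration only."""
    for source in sources:
        source["speaker"] = "HAMLET"
    return sources

docs = apply_add_speaker([
    {"line": "To be, or not to be: that is the question", "line_number": "3.1.64"},
    {"text_entry": "Whether tis nobler in the mind to suffer", "line_number": "3.1.65"},
])
print(docs)
```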
# Update the document with id "1" by renaming the field `line` to `text_entry`
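There is no dedicated rename operation for a single document. A common approach is a scripted update: in Painless, `Map.remove` returns the removed value, so one statement can move it to the new key. The sketch below assumes 6.x paths. `rename_field` is a hypothetical Python equivalent used only for illustration.

```python
rename_line = {
    "method": "POST",
    "path": "/hamlet_raw/_doc/1/_update",
    "body": {
        "script": {
            "lang": "painless",
            # remove() returns the old value, which is stored under the new key
            "source": "ctx._source.text_entry = ctx._source.remove('line')",
        }
    },
}

def rename_field(source, old, new):
    """Python equivalent of the Painless statement above."""
    source[new] = source.pop(old)
    return source

doc1 = {"line": "To be, or not to be: that is the question",
        "line_number": "3.1.64", "speaker": "HAMLET"}
print(rename_field(doc1, "line", "text_entry"))
```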
# Delete the `hamlet_raw` index
# Create the index template `hamlet_template`, which satisfies the following criteria: (i) it matches the index patterns "hamlet_*" and "hamlet-*"; (ii) it allocates one primary shard and no replicas for each matching index
# Create two indices named `hamlet2` and `hamlet_test`. Verify that `hamlet_template` was applied only to `hamlet_test`
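A sketch of the template follows, using the 6.x `_template` API. It is paired with a rough local check of which index names the wildcard patterns match. `fnmatchcase` only approximates Elasticsearch's `*` wildcard, which is enough for these names. `template_applies` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

hamlet_template = {
    "method": "PUT",
    "path": "/_template/hamlet_template",
    "body": {
        "index_patterns": ["hamlet_*", "hamlet-*"],
        "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    },
}

def template_applies(index_name, patterns):
    # Index-template patterns use `*` wildcards; fnmatchcase approximates that.
    return any(fnmatchcase(index_name, p) for p in patterns)

patterns = hamlet_template["body"]["index_patterns"]
for name in ["hamlet2", "hamlet_test"]:
    print(name, template_applies(name, patterns))
```

On the cluster, `GET hamlet2,hamlet_test/_settings` should show one shard and zero replicas only for `hamlet_test`. `hamlet2` keeps the defaults, which are five shards and one replica on 6.x.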
# (in one request) Delete the `hamlet2` and `hamlet_test` indices
# Update `hamlet_template` by defining a mapping that satisfies the following criteria: (i) the type is "_doc"; (ii) it has three fields named `speaker`, `line_number` and `text_entry`; (iii) `speaker` and `line_number` are mapped to unanalysed strings; (iv) `text_entry` is a text field that uses the "english" analyzer
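One way to express this mapping is sketched below, assuming 6.x, where mappings are still nested under the type name. `PUT _template` replaces the whole template, so the patterns and settings from the previous step are repeated. A `keyword` field is indexed as a single unanalysed term.

```python
hamlet_template = {
    "method": "PUT",
    "path": "/_template/hamlet_template",
    "body": {
        "index_patterns": ["hamlet_*", "hamlet-*"],
        "settings": {"number_of_shards": 1, "number_of_replicas": 0},
        "mappings": {
            "_doc": {  # (i) the mapping type
                "properties": {
                    # (iii) keyword: stored as one unanalysed term
                    "speaker": {"type": "keyword"},
                    "line_number": {"type": "keyword"},
                    # (iv) full-text field using the built-in english analyzer
                    "text_entry": {"type": "text", "analyzer": "english"},
                }
            }
        },
    },
}
```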
# Create the index `hamlet_1` and populate it by running the `_bulk` command with the request body below
{"index":{"_index":"hamlet_1","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet_1","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet_1","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet_1","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}
{"index":{"_index":"hamlet_1","_id":4}}
{"line_number":"1.2.2","speaker":"KING CLAUDIUS","text_entry":"The memory be green, and that it us befitted"}
{"index":{"_index":"hamlet_1","_id":5}}
{"line_number":"1.3.1","speaker":"LAERTES","text_entry":"My necessaries are embarkd: farewell:"}
{"index":{"_index":"hamlet_1","_id":6}}
{"line_number":"1.3.4","speaker":"LAERTES","text_entry":"But let me hear from you."}
{"index":{"_index":"hamlet_1","_id":7}}
{"line_number":"1.3.5","speaker":"OPHELIA","text_entry":"Do you doubt that?"}
{"index":{"_index":"hamlet_1","_id":8}}
{"line_number":"1.4.1","speaker":"HAMLET","text_entry":"The air bites shrewdly; it is very cold."}
{"index":{"_index":"hamlet_1","_id":9}}
{"line_number":"1.4.2","speaker":"HORATIO","text_entry":"It is a nipping and an eager air."}
{"index":{"_index":"hamlet_1","_id":10}}
{"line_number":"1.4.3","speaker":"HAMLET","text_entry":"What hour now?"}
{"index":{"_index":"hamlet_1","_id":11}}
{"line_number":"1.5.2","speaker":"Ghost","text_entry":"Mark me."}
{"index":{"_index":"hamlet_1","_id":12}}
{"line_number":"1.5.3","speaker":"HAMLET","text_entry":"I will."}
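The bulk body above is newline-delimited JSON: one action line, then one source line, for each document. It is sent with `POST _bulk`. A hypothetical helper that builds such a body is sketched below. The API expects a trailing newline after the last line.

```python
import json

def bulk_body(index, docs_by_id):
    """Build an NDJSON _bulk body: one action line and one source line per document."""
    compact = (",", ":")  # same compact form as the bodies in this exercise
    lines = []
    for doc_id, source in docs_by_id.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}},
                                separators=compact))
        lines.append(json.dumps(source, separators=compact))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

body = bulk_body("hamlet_1", {
    0: {"line_number": "1.1.1", "speaker": "BERNARDO", "text_entry": "Whos there?"},
    1: {"line_number": "1.1.2", "speaker": "FRANCISCO",
        "text_entry": "Nay, answer me: stand, and unfold yourself."},
})
print(body, end="")
```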
# Create the index `hamlet_2` and populate it by running the `_bulk` command with the request body below
{"index":{"_index":"hamlet_2","_id":14}}
{"line_number":"2.1.1","speaker":"LORD POLONIUS","text_entry":"Give him this money and these notes, Reynaldo."}
{"index":{"_index":"hamlet_2","_id":15}}
{"line_number":"2.1.2","speaker":"REYNALDO","text_entry":"I will, my lord."}
{"index":{"_index":"hamlet_2","_id":16}}
{"line_number":"2.1.3","speaker":"LORD POLONIUS","text_entry":"You shall do marvellous wisely, good Reynaldo,"}
{"index":{"_index":"hamlet_2","_id":17}}
{"line_number":"2.1.4","speaker":"LORD POLONIUS","text_entry":"Before you visit him, to make inquire"}
{"index":{"_index":"hamlet_2","_id":18}}
{"line_number":"2.2.1","speaker":"KING CLAUDIUS","text_entry":"Welcome, dear Rosencrantz and Guildenstern!"}
{"index":{"_index":"hamlet_2","_id":19}}
{"line_number":"2.2.2","speaker":"KING CLAUDIUS","text_entry":"Moreover that we much did long to see you,"}
{"index":{"_index":"hamlet_2","_id":20}}
{"line_number":"2.2.3","speaker":"KING CLAUDIUS","text_entry":"The need we have to use you did provoke"}
# Create an alias named `hamlet` that points to both `hamlet_1` and `hamlet_2`
# Verify that the `hamlet` alias contains 20 documents
# Configure `hamlet_1` to be the write index of the `hamlet` alias
# Index a document in `hamlet` with id "13", the default type, and the following fields: (i) `text_entry` with value "My hour is almost come,"; (ii) `line_number` with value "1.5.4"; (iii) `speaker` with value "Ghost"
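A sketch of the alias steps follows. It creates the alias, derives the expected count from the two bulk bodies, sets the write index, and indexes through the alias. The create and write-index steps are combined into one `_aliases` request, although they can also be sent separately. `is_write_index` requires Elasticsearch 6.4 or later. Without a write index, indexing through an alias that points to several indices is rejected.

```python
import json

# One _aliases request adds both indices and marks hamlet_1 as the write index.
update_aliases = {
    "method": "POST",
    "path": "/_aliases",
    "body": {
        "actions": [
            {"add": {"index": "hamlet_1", "alias": "hamlet", "is_write_index": True}},
            {"add": {"index": "hamlet_2", "alias": "hamlet"}},
        ]
    },
}

# GET /hamlet/_count should report the ids loaded by the two bulk bodies.
hamlet_1_ids = range(0, 13)   # ids 0..12
hamlet_2_ids = range(14, 21)  # ids 14..20
expected_count = len(hamlet_1_ids) + len(hamlet_2_ids)

# Writes through the alias land in its write index, hamlet_1.
index_doc_13 = {
    "method": "PUT",
    "path": "/hamlet/_doc/13",
    "body": {"text_entry": "My hour is almost come,",
             "line_number": "1.5.4", "speaker": "Ghost"},
}

print(json.dumps(update_aliases["body"]))  # Python True is serialised as true
print(expected_count)
```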
# Update the mapping of `hamlet_template` so that it satisfies the following criteria: (i) remove the definitions of the `line_number` and `speaker` fields; (ii) disable aggregations for `text_entry`; (iii) dynamically assign an integer type to any field starting with "number_"; (iv) dynamically map strings to unanalysed text by default
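A sketch of the updated template follows, assuming 6.x. Dynamic templates are evaluated in order and the first match wins. The `number_*` rule therefore has to precede the generic string rule, or a string value such as "3" in `number_act` would become a keyword.

For (ii), one assumed reading is that `text_entry` stays a `text` field with `fielddata` disabled. That is already the default, and it makes aggregations on the field fail. `dynamic_type` is a hypothetical, simplified mirror of the first-match selection.

```python
from fnmatch import fnmatchcase

hamlet_template = {
    "method": "PUT",
    "path": "/_template/hamlet_template",
    "body": {
        "index_patterns": ["hamlet_*", "hamlet-*"],
        "settings": {"number_of_shards": 1, "number_of_replicas": 0},
        "mappings": {
            "_doc": {
                # Checked in order; the first matching template wins.
                "dynamic_templates": [
                    {"number_fields": {
                        "match": "number_*",               # (iii)
                        "mapping": {"type": "integer"}}},
                    {"strings_as_keywords": {
                        "match_mapping_type": "string",    # (iv)
                        "mapping": {"type": "keyword"}}},
                ],
                "properties": {
                    # (i) speaker and line_number are no longer declared.
                    # (ii) assumption: explicit fielddata=false on the text field.
                    "text_entry": {"type": "text", "analyzer": "english",
                                   "fielddata": False},
                },
            }
        },
    },
}

def dynamic_type(field, json_type, templates):
    """Rough mirror of first-match-wins selection; None means ES defaults apply."""
    for entry in templates:
        (rule,) = entry.values()
        if "match" in rule and not fnmatchcase(field, rule["match"]):
            continue
        if "match_mapping_type" in rule and rule["match_mapping_type"] != json_type:
            continue
        return rule["mapping"]["type"]
    return None

templates = hamlet_template["body"]["mappings"]["_doc"]["dynamic_templates"]
for field, json_type in [("number_act", "string"), ("speaker", "string"),
                         ("reindexBatch", "long")]:
    print(field, dynamic_type(field, json_type, templates))
```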
# Create the index `hamlet_3` and populate it by running the `_bulk` command with the request body below
{"index":{"_index":"hamlet_3","_id":21}}
{"line_number":"3.1.4","speaker":"KING CLAUDIUS","text_entry":"With turbulent and dangerous lunacy?"}
{"index":{"_index":"hamlet_3","_id":22}}
{"line_number":"3.1.5","speaker":"ROSENCRANTZ","text_entry":"He does confess he feels himself distracted;"}
{"index":{"_index":"hamlet_3","_id":23}}
{"line_number":"3.1.64","speaker":"HAMLET","text_entry":"To be, or not to be: that is the question:"}
{"index":{"_index":"hamlet_3","_id":24}}
{"line_number":"3.1.65","speaker":"HAMLET","text_entry":"Whether tis nobler in the mind to suffer"}
{"index":{"_index":"hamlet_3","_id":25}}
{"line_number":"3.1.66","speaker":"HAMLET","text_entry":"The slings and arrows of outrageous fortune,"}
{"index":{"_index":"hamlet_3","_id":26}}
{"line_number":"3.1.67","speaker":"HAMLET","text_entry":"Or to take arms against a sea of troubles,"}
{"index":{"_index":"hamlet_3","_id":27}}
{"line_number":"3.1.68","speaker":"HAMLET","text_entry":"And by opposing end them? To die: to sleep;"}
{"index":{"_index":"hamlet_3","_id":28}}
{"line_number":"3.1.69","speaker":"HAMLET","text_entry":"No more; and by a sleep to say we end"}
# Store in the cluster state a new script named `control_reindex_batch`, which checks whether the `reindexBatch` field exists in a document. If it does, the script increments the field value by a parameter named `increment`; otherwise, it sets the field value to 1
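A sketch of the stored script follows, using the `_scripts` API with a Painless source. `control_reindex_batch_py` is a hypothetical Python mirror of the same branch logic, included only so the behaviour can be checked locally.

```python
CONTROL_REINDEX_BATCH = (
    "if (ctx._source.containsKey('reindexBatch')) {"
    " ctx._source.reindexBatch += params.increment; "
    "} else { ctx._source.reindexBatch = 1; }"
)

store_script = {
    "method": "POST",
    "path": "/_scripts/control_reindex_batch",
    "body": {"script": {"lang": "painless", "source": CONTROL_REINDEX_BATCH}},
}

def control_reindex_batch_py(source, increment):
    """Python mirror of the Painless logic, for illustration only."""
    if "reindexBatch" in source:
        source["reindexBatch"] += increment
    else:
        source["reindexBatch"] = 1
    return source

print(control_reindex_batch_py({}, 1))
print(control_reindex_batch_py({"reindexBatch": 1}, 10))
```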
# Reindex `hamlet` into `hamlet_3`, satisfying the following criteria: (i) disable refreshes of `hamlet_3` during the operation; (ii) apply the `control_reindex_batch` script with the `increment` parameter set to 1; (iii) reindex in two parallel slices
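A sketch of the three requests follows. It assumes your Elasticsearch version accepts a stored-script `id` inside the reindex `script` object; if it does not, inline the Painless source instead. `slices` is a URL parameter. The final request restores the default refresh interval of `1s` after the reindex.

```python
# (i) stop periodic refreshes of the destination while reindexing
disable_refresh = ("PUT", "/hamlet_3/_settings",
                   {"index": {"refresh_interval": "-1"}})

# (iii) two parallel slices, (ii) stored script with increment = 1
reindex = ("POST", "/_reindex?slices=2",
           {"source": {"index": "hamlet"},
            "dest": {"index": "hamlet_3"},
            "script": {"id": "control_reindex_batch",
                       "params": {"increment": 1}}})

# Afterwards, restore the default refresh interval.
restore_refresh = ("PUT", "/hamlet_3/_settings",
                   {"index": {"refresh_interval": "1s"}})

for method, path, body in (disable_refresh, reindex, restore_refresh):
    print(method, path, body)
```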
# Update all documents in `hamlet_3` by initialising the `reindexBatch` field to 1, if not present
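One way to touch only the documents that lack the field is sketched below. It combines a `must_not` / `exists` query with a one-line script. Update by query runs a search, so if refreshes of `hamlet_3` are still disabled, run `POST hamlet_3/_refresh` first. `init_batch` is a hypothetical Python mirror of query plus script.

```python
init_reindex_batch = {
    "method": "POST",
    "path": "/hamlet_3/_update_by_query",
    "body": {
        # Only documents that do not have the field yet...
        "query": {"bool": {"must_not": {"exists": {"field": "reindexBatch"}}}},
        # ...receive reindexBatch = 1.
        "script": {"lang": "painless", "source": "ctx._source.reindexBatch = 1"},
    },
}

def init_batch(sources):
    """Python mirror of query + script, for illustration only."""
    for source in sources:
        if "reindexBatch" not in source:
            source["reindexBatch"] = 1
    return sources

docs = init_batch([{"line_number": "3.1.4"},
                   {"line_number": "1.1.1", "reindexBatch": 1}])
print(docs)
```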
# (in one request) Add `hamlet_3` to the alias `hamlet`, and delete the `hamlet_1` and `hamlet_2` indices
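The `_aliases` API accepts a `remove_index` action alongside `add`, so all three changes can go in one atomic request. A sketch follows.

```python
swap_alias = {
    "method": "POST",
    "path": "/_aliases",
    "body": {
        "actions": [
            {"add": {"index": "hamlet_3", "alias": "hamlet"}},
            # remove_index deletes the index itself, not just the alias link
            {"remove_index": {"index": "hamlet_1"}},
            {"remove_index": {"index": "hamlet_2"}},
        ]
    },
}

for action in swap_alias["body"]["actions"]:
    print(action)
```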
# Update all documents in `hamlet_3` by running the `control_reindex_batch` script with an `increment` of 10
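A sketch of the request follows, calling the stored script by `id` and passing the new parameter. The previous step gave every document in `hamlet_3` a `reindexBatch` of 1, so afterwards each document should hold 1 + 10 = 11.

```python
run_stored_script = {
    "method": "POST",
    "path": "/hamlet_3/_update_by_query",
    "body": {
        "script": {
            "id": "control_reindex_batch",  # stored script, referenced by id
            "params": {"increment": 10},
        }
    },
}

expected_batch = 1 + run_stored_script["body"]["script"]["params"]["increment"]
print(expected_batch)
```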
# Remove from `hamlet_3` the documents that have "KING CLAUDIUS" as `speaker`
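A sketch using delete by query follows. A `term` query matches the exact value here because the template's dynamic string rule maps `speaker` to an unanalysed `keyword` in `hamlet_3`.

```python
delete_claudius = {
    "method": "POST",
    "path": "/hamlet_3/_delete_by_query",
    "body": {
        # Exact match on the keyword value, including the space and case
        "query": {"term": {"speaker": "KING CLAUDIUS"}},
    },
}

print(delete_claudius["path"], delete_claudius["body"])
```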
# Store in the cluster state a new ingest pipeline named `split_act_scene_line`, which satisfies the following criteria: (i) it splits the value of `line_number` by using dots as the separator; (ii) it stores the split values into three new numeric fields, named `number_act`, `number_scene`, and `number_line`, respectively
# Update all documents in `hamlet_3` using the `split_act_scene_line` pipeline
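A sketch of the pipeline and the request that applies it follows. The `split` processor's `separator` is a regular expression, so the dot is escaped. A script processor then parses the three parts into integers, and a `remove` processor drops the temporary array. `line_number_parts` is a hypothetical temporary field name, and `split_act_scene_line_py` is a Python mirror for local checking only.

```python
split_act_scene_line = {
    "method": "PUT",
    "path": "/_ingest/pipeline/split_act_scene_line",
    "body": {
        "description": "Split line_number (e.g. 3.1.64) into act, scene and line",
        "processors": [
            # separator is a regex: "\\." in Python is the JSON string "\\."
            {"split": {"field": "line_number", "separator": "\\.",
                       "target_field": "line_number_parts"}},
            {"script": {"lang": "painless", "source": (
                "ctx.number_act = Integer.parseInt(ctx.line_number_parts[0]);"
                " ctx.number_scene = Integer.parseInt(ctx.line_number_parts[1]);"
                " ctx.number_line = Integer.parseInt(ctx.line_number_parts[2]);")}},
            {"remove": {"field": "line_number_parts"}},
        ],
    },
}

# Update by query without a script re-indexes each document through the pipeline.
apply_pipeline = ("POST", "/hamlet_3/_update_by_query?pipeline=split_act_scene_line")

def split_act_scene_line_py(source):
    """Python mirror of the pipeline, for illustration only."""
    act, scene, line = (int(part) for part in source["line_number"].split("."))
    source.update(number_act=act, number_scene=scene, number_line=line)
    return source

print(split_act_scene_line_py({"line_number": "3.1.64"}))
```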