@glenacota
Last active February 22, 2019 09:13
An extended exercise that covers the "Mapping and Text Analysis" objective of the Elastic exam.
# ** EXAM OBJECTIVES: MAPPINGS AND TEXT ANALYSIS **
# (remove, if present, any `hamlet*` index and index template)
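# A possible way to do the clean-up, assuming nothing else on the cluster matches `hamlet*` and that wildcard deletes are allowed (i.e., `action.destructive_requires_name` is not enabled). A 404 on the second request only means that no such template exists
DELETE hamlet*
DELETE _template/hamlet*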
# Create the index `hamlet_1`, with one primary shard and no replicas
# Define the mapping for `hamlet_1` so that: (i) it has a type "_doc" with three string fields named `speaker`, `line_number`, and `text_entry`; (ii) only `text_entry` is analysed; (iii) `text_entry` has a multi-field named `english` that uses the built-in "english" analyzer; (iv) `line_number` does not support aggregations
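# One possible solution sketch for the two steps above, assuming Elasticsearch 6.x (where the mapping sits under the `_doc` type; on 7.x and later, drop the `_doc` level). Mapping `speaker` and `line_number` as `keyword` keeps them unanalysed, and disabling `doc_values` is one way to rule out aggregations on `line_number`
PUT hamlet_1
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_doc": {
      "properties": {
        "speaker": { "type": "keyword" },
        "line_number": { "type": "keyword", "doc_values": false },
        "text_entry": {
          "type": "text",
          "fields": {
            "english": { "type": "text", "analyzer": "english" }
          }
        }
      }
    }
  }
}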
# Populate `hamlet_1` by running the _bulk command with the request body below
{"index":{"_index":"hamlet_1","_id":0}}
{"line_number":"1.1.1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_index":"hamlet_1","_id":1}}
{"line_number":"1.1.2","speaker":"FRANCISCO","text_entry":"Nay, answer me: stand, and unfold yourself."}
{"index":{"_index":"hamlet_1","_id":2}}
{"line_number":"1.1.3","speaker":"BERNARDO","text_entry":"Long live the king!"}
{"index":{"_index":"hamlet_1","_id":3}}
{"line_number":"1.2.1","speaker":"KING CLAUDIUS","text_entry":"Though yet of Hamlet our dear brothers death"}
{"index":{"_index":"hamlet_1","_id":4}}
{"line_number":"1.2.2","speaker":"KING CLAUDIUS","text_entry":"The memory be green, and that it us befitted"}
{"index":{"_index":"hamlet_1","_id":5}}
{"line_number":"1.3.1","speaker":"LAERTES","text_entry":"My necessaries are embarkd: farewell:"}
{"index":{"_index":"hamlet_1","_id":6}}
{"line_number":"1.3.4","speaker":"LAERTES","text_entry":"But let me hear from you."}
{"index":{"_index":"hamlet_1","_id":7}}
{"line_number":"1.3.5","speaker":"OPHELIA","text_entry":"Do you doubt that?"}
{"index":{"_index":"hamlet_1","_id":8}}
{"line_number":"1.4.1","speaker":"HAMLET","text_entry":"The air bites shrewdly; it is very cold."}
# Create the index `hamlet_2`, which updates the mapping of `hamlet_1` by defining a multi-field for `speaker`. The multi-field is named `token` and is a `text` field analysed with the default analyzer
# Reindex `hamlet_1` into `hamlet_2`
# Verify that full-text queries on "speaker.token" are enabled on `hamlet_2`
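# One possible solution sketch for the three steps above, assuming Elasticsearch 6.x syntax (on 7.x and later, drop the `_doc` level) and default index settings, since the exercise sets no shard requirements here
PUT hamlet_2
{
  "mappings": {
    "_doc": {
      "properties": {
        "speaker": {
          "type": "keyword",
          "fields": {
            "token": { "type": "text" }
          }
        },
        "line_number": { "type": "keyword", "doc_values": false },
        "text_entry": {
          "type": "text",
          "fields": {
            "english": { "type": "text", "analyzer": "english" }
          }
        }
      }
    }
  }
}
POST _reindex
{
  "source": { "index": "hamlet_1" },
  "dest": { "index": "hamlet_2" }
}
# A lowercase, partial value can match only on the analysed sub-field; the same query on `speaker` (a keyword) would need the exact value "KING CLAUDIUS"
GET hamlet_2/_search
{
  "query": {
    "match": { "speaker.token": "claudius" }
  }
}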
# Index more documents in `hamlet_2` by running the _bulk command with the request body below
{"index":{"_index":"hamlet_2","_id":"p1"}}
{"name":"HAMLET","relationship":[{"name":"HORATIO","type":"friend"},{"name":"GERTRUDE","type":"mother"}]}
{"index":{"_index":"hamlet_2","_id":"p2"}}
{"name":"KING CLAUDIUS","relationship":[{"name":"HAMLET","type":"nephew"}]}
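# The items of the `relationship` array cannot be searched independently, because inner objects are flattened at index time. For example, the query below returns 1 hit (document "p1"), although no single item has both the name "GERTRUDE" and the type "friend"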
GET hamlet_2/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "relationship.name": "gertrude" } },
        { "match": { "relationship.type": "friend" } }
      ]
    }
  }
}
# Create the index `hamlet_3`, which updates the mapping of `hamlet_2` by satisfying the following criteria: (i) the inner objects of `relationship` can be searched independently; (ii) the fields of the inner objects of `relationship` are all of type `keyword`
# Reindex `hamlet_2` into `hamlet_3`
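# One possible solution sketch for the two steps above, assuming Elasticsearch 6.x syntax (on 7.x and later, drop the `_doc` level). Declaring `relationship` as `nested` indexes each array item as its own hidden document, which is what allows items to be matched independently
PUT hamlet_3
{
  "mappings": {
    "_doc": {
      "properties": {
        "speaker": {
          "type": "keyword",
          "fields": {
            "token": { "type": "text" }
          }
        },
        "line_number": { "type": "keyword", "doc_values": false },
        "text_entry": {
          "type": "text",
          "fields": {
            "english": { "type": "text", "analyzer": "english" }
          }
        },
        "relationship": {
          "type": "nested",
          "properties": {
            "name": { "type": "keyword" },
            "type": { "type": "keyword" }
          }
        }
      }
    }
  }
}
POST _reindex
{
  "source": { "index": "hamlet_2" },
  "dest": { "index": "hamlet_3" }
}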
# Verify that the items in `relationship` can be searched independently of each other. For example, the query below should return 0 hits
GET hamlet_3/_search
{
  "query": {
    "nested": {
      "path": "relationship",
      "query": {
        "bool": {
          "must": [
            { "match": { "relationship.name": "GERTRUDE" } },
            { "match": { "relationship.type": "friend" } }
          ]
        }
      }
    }
  }
}
# Change the value of `relationship.type` in the query above to get 1 hit
# So far, we have indexed two kinds of documents: character profiles and lines of dialogue. Notice that one profile document can be linked to many dialogue documents. We will model this one-to-many relation in the next step
# Create the index `hamlet_4`, which updates the mapping of `hamlet_3` by satisfying the following criteria: (i) has a join field named `profile_or_dialogue`; (ii) the join field defines a parent/child relation in which `profile` is the parent and `dialogue` is the child
# Reindex `hamlet_3` into `hamlet_4`
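# One possible solution sketch for the two steps above, assuming Elasticsearch 6.x syntax (on 7.x and later, drop the `_doc` level). In the `relations` object of a join field, the key is the parent name and the value is the child name
PUT hamlet_4
{
  "mappings": {
    "_doc": {
      "properties": {
        "speaker": {
          "type": "keyword",
          "fields": {
            "token": { "type": "text" }
          }
        },
        "line_number": { "type": "keyword", "doc_values": false },
        "text_entry": {
          "type": "text",
          "fields": {
            "english": { "type": "text", "analyzer": "english" }
          }
        },
        "relationship": {
          "type": "nested",
          "properties": {
            "name": { "type": "keyword" },
            "type": { "type": "keyword" }
          }
        },
        "profile_or_dialogue": {
          "type": "join",
          "relations": { "profile": "dialogue" }
        }
      }
    }
  }
}
POST _reindex
{
  "source": { "index": "hamlet_3" },
  "dest": { "index": "hamlet_4" }
}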
# Update the document with id "p2" (i.e., the profile document of King Claudius) by adding the field `profile_or_dialogue` and setting its property `name` to "profile"
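# A possible way to do this with a partial update, assuming the Elasticsearch 6.x endpoint (on 7.x and later, the equivalent is `POST hamlet_4/_update/p2`)
POST hamlet_4/_doc/p2/_update
{
  "doc": {
    "profile_or_dialogue": { "name": "profile" }
  }
}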
# To be continued