Docker containers are really convenient for this.
Pull a recent Elasticsearch Docker image:
docker pull docker.elastic.co/elasticsearch/elasticsearch:6.2.1
6.2.1: Pulling from elasticsearch/elasticsearch
Digest: sha256:cc5fe435b81c0b586363ca33573e802a060468ae93a6cb4702caa1e336f4d1c3
Status: Image is up to date for docker.elastic.co/elasticsearch/elasticsearch:6.2.1
Run ES as a single-node cluster, exposing port 9200 for HTTP queries (9300 is the transport port):
docker run --name es -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.2.1
0ea334fdb6e7f4d9542a2f6768290c0743fbf464eef390328aeb99fd065bf041
Query the container to test aliveness:
curl localhost:9200
{
"name" : "5G3WnUv",
"cluster_name" : "docker-cluster",
"cluster_uuid" : "gwX4DRPSStiP38PBzqiJ_Q",
"version" : {
"number" : "6.2.1",
"build_hash" : "7299dc3",
"build_date" : "2018-02-07T19:34:26.990113Z",
"build_snapshot" : false,
"lucene_version" : "7.2.1",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
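Elasticsearch can take several seconds to start inside the container, so a curl issued right after docker run may be refused. A minimal sketch of a polling helper, assuming bash and the port mapping above; the function name wait_for_es and its arguments are illustrative, not part of Docker or Elasticsearch:

```shell
# Hypothetical helper: poll an Elasticsearch URL until it answers, or give up.
# Arguments: URL (default http://localhost:9200), max attempts (default 30),
# and seconds to sleep between attempts (default 2).
wait_for_es() {
  local url="${1:-http://localhost:9200}"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -s hides progress output; -f turns HTTP error statuses into a failure
    if curl -sf "$url" > /dev/null; then
      return 0
    fi
    sleep "$delay"
  done
  # Every attempt failed
  return 1
}
```

For example, run wait_for_es && curl localhost:9200 right after docker run.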
Dynamic mapping is great for exploratory work. You don’t need to plan your data structure; just index it and query away.
Create an imaginatively named new index:
curl -XPUT 'http://localhost:9200/new_index?pretty' -H "content-type: application/json" -d @- <<EOF
{}
EOF
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "new_index"
}
Add three documents into the index, each with a different structure but some overlapping keys:
curl -XPUT http://localhost:9200/new_index/_doc/001 -H "content-type: application/json" -d @- <<EOF
{
"user": "thecambian",
"when": "yesterday",
"message": "trying out Elasticsearch's essential stuff",
"source": "elasticsearch documentation"
}
EOF
echo
curl -XPUT http://localhost:9200/new_index/_doc/002 -H "content-type: application/json" -d @- <<EOF
{
"when": "2017-01-01T12:00:33Z",
"url": "https://www.google.com/search?q=elasticsearch",
"source_ip": "127.0.0.1"
}
EOF
echo
curl -XPUT http://localhost:9200/new_index/_doc/003 -H "content-type: application/json" -d @- <<EOF
{
"start": "2017-02-01T14:41:23Z",
"author": "cambium",
"article": {
"topic": "great features for elasticsearch",
"tags": [ "elasticsearch", "databases", "technology" ]
}
}
EOF
{"_index":"new_index","_type":"_doc","_id":"001","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
{"_index":"new_index","_type":"_doc","_id":"002","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1}
{"_index":"new_index","_type":"_doc","_id":"003","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
Run a quick search for elastic* on this new data, using jq to filter the result down to just the hit count:
curl 'http://localhost:9200/new_index/_search?q=elastic*' | jq .hits.total
3
All three documents match!
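The q parameter is parsed as a query_string query. When no field is named, Elasticsearch 6.x searches the fields listed in the index.query.default_field setting, which defaults to all eligible fields, so elastic* is matched against every text field of each document. A sketch of the equivalent request-body search, assuming bash and jq; the ES_URL variable and the function names are hypothetical:

```shell
# Hypothetical base URL; defaults to the container started above
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the request body: a query_string query with no field named
elastic_star_body() {
  cat <<'EOF'
{
  "query": {
    "query_string": { "query": "elastic*" }
  }
}
EOF
}

# Send the body to new_index and print only the hit count
search_elastic_star() {
  elastic_star_body | curl -s -XGET "$ES_URL/new_index/_search" \
    -H "content-type: application/json" -d @- | jq .hits.total
}
```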
How did Elasticsearch map the source_ip field?
curl 'http://localhost:9200/new_index/_mapping/_doc/field/source_ip?pretty'
{
"new_index" : {
"mappings" : {
"_doc" : {
"source_ip" : {
"full_name" : "source_ip",
"mapping" : {
/* here is the source_ip mapping: */
"source_ip" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
It’s mapped as a text field with a keyword subfield, not as the ip datatype: dynamic mapping doesn’t recognize IP addresses in strings.
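If you want to keep dynamic mapping but have IP fields typed correctly, one option is a dynamic template keyed on the field name. A sketch, assuming the naming convention that IP fields end in _ip; the ES_URL variable, the index name ip_template_index, and the function names are hypothetical:

```shell
# Hypothetical base URL; defaults to the container started above
ES_URL="${ES_URL:-http://localhost:9200}"

# Mapping body: any new string field whose name ends in _ip becomes type ip
ip_template_body() {
  cat <<'EOF'
{
  "mappings": {
    "_doc": {
      "dynamic_templates": [
        {
          "strings_ending_in_ip": {
            "match_mapping_type": "string",
            "match": "*_ip",
            "mapping": { "type": "ip" }
          }
        }
      ]
    }
  }
}
EOF
}

# Create the index with that template
create_ip_template_index() {
  ip_template_body | curl -s -XPUT "$ES_URL/ip_template_index?pretty" \
    -H "content-type: application/json" -d @-
}
```

With this template, a new source_ip field should be mapped as ip, and a string that isn’t a valid IP address in such a field would be rejected rather than indexed as text.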
The start field was first mapped when document 003 was indexed:
curl 'http://localhost:9200/new_index/_doc/003?pretty&_source_include=start' | jq ._source
{
"start": "2017-02-01T14:41:23Z"
}
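It was mapped as a date field, because date detection is on by default and the value matches one of the default dynamic date formats: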
curl 'http://localhost:9200/new_index/_mapping/_doc/field/start?pretty'
{
"new_index" : {
"mappings" : {
"_doc" : {
"start" : {
"full_name" : "start",
"mapping" : {
"start" : {
"type" : "date"
}
}
}
}
}
}
}
What happens when you try to index a value that is incompatible with an existing field mapping? In this case, we try to index a lat/lon pair into the start field, which is mapped as a date.
curl -XPUT 'http://localhost:9200/new_index/_doc/004?pretty' -H "content-type: application/json" -d @- <<EOF
{
"route_name": "bike ride boulder to denver"
"start": "40.01,-105.27",
"end": "39.73N,-104.99"
}
EOF
{
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "failed to parse"
}
],
"type" : "mapper_parsing_exception",
"reason" : "failed to parse",
"caused_by" : {
"type" : "json_parse_exception",
"reason" : "Unexpected character ('\"' (code 34)): was expecting comma to separate Object entries\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@61370f12; line: 1, column: 50]"
}
},
"status" : 400
}
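Elasticsearch refuses to index the document. Note the caused_by, though: this particular request fails with a json_parse_exception because the body is missing a comma after the route_name value. Even with the comma restored, the document would be rejected, because there is no reasonable way to coerce the start field’s value 40.01,-105.27 into a date datatype.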
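For comparison, here is the same request with the missing comma after the route_name value restored, sketched as bash functions so the body can be inspected before it is sent (the ES_URL variable and the function names are hypothetical). Because start is already mapped as a date, this version should get past the JSON parser and be rejected by the mapper instead:

```shell
# Hypothetical base URL; defaults to the container started above
ES_URL="${ES_URL:-http://localhost:9200}"

# The bike-ride document, now valid JSON
bike_ride_body() {
  cat <<'EOF'
{
  "route_name": "bike ride boulder to denver",
  "start": "40.01,-105.27",
  "end": "39.73N,-104.99"
}
EOF
}

# Index it as document 004 in new_index
index_bike_ride() {
  bike_ride_body | curl -s -XPUT "$ES_URL/new_index/_doc/004?pretty" \
    -H "content-type: application/json" -d @-
}
```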
Let’s look at how the when field mapping is set up:
curl 'http://localhost:9200/new_index/_mapping/_doc/field/when?pretty'
{
"new_index" : {
"mappings" : {
"_doc" : {
"when" : {
"full_name" : "when",
"mapping" : {
"when" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
It’s a text field, because the first document indexed with a when field had the value yesterday, which isn’t a date:
curl 'http://localhost:9200/new_index/_doc/001?pretty&_source_include=when' | jq ._source
{
"when": "yesterday"
}
The second document indexed with this field had a date value:
curl 'http://localhost:9200/new_index/_doc/002?pretty&_source_include=when' | jq ._source
{
"when": "2017-01-01T12:00:33Z"
}
Unfortunately, this value was also indexed into the existing text field rather than as a date: once a field is mapped, later documents can’t change its type.
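If you know when will hold timestamps, declare it before indexing anything. A sketch that creates a separate index with an explicit date mapping; the ES_URL variable, the index name when_as_date, and the function names are hypothetical:

```shell
# Hypothetical base URL; defaults to the container started above
ES_URL="${ES_URL:-http://localhost:9200}"

# Mapping body: "when" is a date from the start
when_as_date_body() {
  cat <<'EOF'
{
  "mappings": {
    "_doc": {
      "properties": {
        "when": { "type": "date" }
      }
    }
  }
}
EOF
}

# Create the index with that mapping
create_when_as_date_index() {
  when_as_date_body | curl -s -XPUT "$ES_URL/when_as_date?pretty" \
    -H "content-type: application/json" -d @-
}
```

The trade-off: with this mapping, a document like 001, with its value yesterday, would be rejected instead of silently turning the field into text. Setting ignore_malformed to true on the field would keep such documents but leave the bad value unindexed.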
A common error is to get a field name subtly wrong. Document 002 might represent an activity log event, with a source_ip field. If we index a new document with a source-ip field instead, we can see the impact of this mistake.
Index a new document with a subtle typo in the source_ip field:
curl -XPUT 'http://localhost:9200/new_index/_doc/005?pretty' -H "content-type: application/json" -d @- <<EOF
{
"when": "2017-02-10T08:10:10Z",
"url": "https://www.google.com/search?q=json",
"source-ip": "127.0.0.1"
}
EOF
{
"_index" : "new_index",
"_type" : "_doc",
"_id" : "005",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
Now we search for all events coming from “127.0.0.1”:
curl 'http://localhost:9200/new_index/_search?q=source_ip:"127.0.0.1"&pretty' | jq .hits
{
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "new_index",
"_type": "_doc",
"_id": "002",
"_score": 0.2876821,
"_source": {
"when": "2017-01-01T12:00:33Z",
"url": "https://www.google.com/search?q=elasticsearch",
"source_ip": "127.0.0.1"
}
}
]
}
Only document 002 is returned. Document 005 is not in the result set because it doesn’t have a source_ip field.
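Typos like this are easiest to spot by listing the field names that dynamic mapping has created. A sketch, assuming jq and the Elasticsearch 6.x mapping response shape ({index: {mappings: {_doc: {properties: ...}}}}); the ES_URL variable and list_fields are hypothetical:

```shell
# Hypothetical base URL; defaults to the container started above
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the top-level field names of an index's _doc mapping, one per line
list_fields() {
  local index="$1"
  curl -s "$ES_URL/$index/_mapping/_doc" \
    | jq -r --arg idx "$index" '.[$idx].mappings._doc.properties | keys[]'
}
```

Running list_fields new_index should print both source-ip and source_ip among the other field names, which makes the near-duplicate hard to miss.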
Create a new index with dynamic mapping set to “strict”:
curl -XPUT 'http://localhost:9200/dynamic_strict_visitor_log?pretty' -H "content-type: application/json" -d @- <<EOF
{
"mappings": {
"_doc": {
"dynamic": "strict",
"properties": {
"user-id": { "type": "keyword" },
"ip": { "type": "text" },
"session-id": { "type": "keyword" },
"ts": { "type": "date" },
"url": { "type": "text" },
"method": { "type": "keyword" }
}
}
}
}
EOF
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "dynamic_strict_visitor_log"
}
Index a fully conforming document:
curl -XPUT http://localhost:9200/dynamic_strict_visitor_log/_doc/c01 -H "content-type: application/json" -d @- <<EOF
{
"user-id": "6ae92f19",
"ip": "10.76.54.12",
"session-id": "3d86c3ed",
"ts": "2018-02-01T12:13:14Z",
"url": "https://www.example.com/api/reports/827362",
"method": "PUT"
}
EOF
{"_index":"dynamic_strict_visitor_log","_type":"_doc","_id":"c01","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1}
Try to index a document with an extra field:
curl -XPUT 'http://localhost:9200/dynamic_strict_visitor_log/_doc/c02?pretty' -H "content-type: application/json" -d @- <<EOF
{
"user-id": "6ae92f19",
"ip": "10.76.54.12",
"session-id": "3d86c3ed",
"ts": "2018-02-01T12:13:14Z",
"url": "https://www.example.com/api/reports/827362",
"parameters": [ { "attr": "foo", "value": "bar" } ],
"method": "PUT"
}
EOF
{
"error" : {
"root_cause" : [
{
"type" : "strict_dynamic_mapping_exception",
"reason" : "mapping set to strict, dynamic introduction of [parameters] within [_doc] is not allowed"
}
],
"type" : "strict_dynamic_mapping_exception",
"reason" : "mapping set to strict, dynamic introduction of [parameters] within [_doc] is not allowed"
},
"status" : 400
}
This document wasn’t accepted: with dynamic set to strict, any field not already in the mapping causes an error.
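Under strict, a new field has to be added to the mapping explicitly before documents may use it. A sketch of doing that for parameters with the put mapping API; mapping attr and value as keyword is an assumption about how you would want them indexed, and the ES_URL variable and function names are hypothetical:

```shell
# Hypothetical base URL; defaults to the container started above
ES_URL="${ES_URL:-http://localhost:9200}"

# Mapping body adding "parameters" as an object with two keyword subfields
parameters_mapping_body() {
  cat <<'EOF'
{
  "properties": {
    "parameters": {
      "properties": {
        "attr": { "type": "keyword" },
        "value": { "type": "keyword" }
      }
    }
  }
}
EOF
}

# Merge the new field into the existing _doc mapping
add_parameters_field() {
  parameters_mapping_body | curl -s -XPUT "$ES_URL/dynamic_strict_visitor_log/_mapping/_doc?pretty" \
    -H "content-type: application/json" -d @-
}
```

After add_parameters_field succeeds, retrying the c02 request should be accepted. Inner objects inherit dynamic: strict, so any key other than attr or value inside parameters would still be rejected.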
Create a new index with dynamic mapping set to “false”:
curl -XPUT http://localhost:9200/dynamic_false_visitor_log -H "content-type: application/json" -d @- <<EOF
{
"mappings": {
"_doc": {
"dynamic": false,
"properties": {
"user-id": { "type": "keyword" },
"ip": { "type": "text" },
"session-id": { "type": "keyword" },
"ts": { "type": "date" },
"url": { "type": "text" },
"method": { "type": "keyword" }
}
}
}
}
EOF
{"acknowledged":true,"shards_acknowledged":true,"index":"dynamic_false_visitor_log"}
Index a fully conforming document:
curl -XPUT http://localhost:9200/dynamic_false_visitor_log/_doc/b01 -H "content-type: application/json" -d @- <<EOF
{
"user-id": "6ae92f19",
"ip": "10.76.54.12",
"session-id": "3d86c3ed",
"ts": "2018-02-01T12:13:14Z",
"url": "https://www.example.com/api/reports/827362",
"method": "PUT"
}
EOF
{"_index":"dynamic_false_visitor_log","_type":"_doc","_id":"b01","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
Index a document with an extra field:
curl -XPUT http://localhost:9200/dynamic_false_visitor_log/_doc/b02 -H "content-type: application/json" -d @- <<EOF
{
"user-id": "6ae92f19",
"ip": "10.76.54.12",
"session-id": "3d86c3ed",
"ts": "2018-02-01T12:13:14Z",
"url": "https://www.example.com/api/reports/827362",
"parameters": [ { "attr": "foo", "value": "bar" } ],
"method": "PUT"
}
EOF
{"_index":"dynamic_false_visitor_log","_type":"_doc","_id":"b02","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
Retrieve the second document:
curl 'http://localhost:9200/dynamic_false_visitor_log/_doc/b02?pretty'
{
"_index" : "dynamic_false_visitor_log",
"_type" : "_doc",
"_id" : "b02",
"_version" : 1,
"found" : true,
"_source" : {
"user-id" : "6ae92f19",
"ip" : "10.76.54.12",
"session-id" : "3d86c3ed",
"ts" : "2018-02-01T12:13:14Z",
"url" : "https://www.example.com/api/reports/827362",
"parameters" : [
{
"attr" : "foo",
"value" : "bar"
}
],
"method" : "PUT"
}
}
The extra “parameters” field is stored in the _source, but it is not indexed, so its values won’t produce search hits. If we search for “foo”, we expect to match no documents:
curl 'http://localhost:9200/dynamic_false_visitor_log/_doc/_search?q=foo' | jq .
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
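You can also confirm from the mapping itself that dynamic: false left it untouched. A sketch, assuming jq and the Elasticsearch 6.x mapping response shape; the ES_URL variable and has_mapped_field are hypothetical:

```shell
# Hypothetical base URL; defaults to the container started above
ES_URL="${ES_URL:-http://localhost:9200}"

# Print true or false depending on whether FIELD is a top-level mapped field of INDEX
has_mapped_field() {
  local index="$1" field="$2"
  curl -s "$ES_URL/$index/_mapping/_doc" \
    | jq --arg idx "$index" --arg f "$field" '.[$idx].mappings._doc.properties | has($f)'
}
```

has_mapped_field dynamic_false_visitor_log parameters should print false, while has_mapped_field dynamic_false_visitor_log url should print true.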
Create a new index with dynamic mapping set to “strict”, and an object field, parameters, with dynamic mapping enabled for arbitrary attributes:
curl -XPUT http://localhost:9200/dynamic_object_visitor_log -H "content-type: application/json" -d @- <<EOF
{
"mappings": {
"_doc": {
"dynamic": "strict",
"properties": {
"user-id": { "type": "keyword" },
"ip": { "type": "text" },
"session-id": { "type": "keyword" },
"ts": { "type": "date" },
"url": { "type": "text" },
"method": { "type": "keyword" },
"parameters": {
"dynamic": true,
"properties": {}
}
}
}
}
}
EOF
{"acknowledged":true,"shards_acknowledged":true,"index":"dynamic_object_visitor_log"}
Index two documents with entirely different data under parameters:
curl -XPUT http://localhost:9200/dynamic_object_visitor_log/_doc/d01 -H "content-type: application/json" -d @- <<EOF
{
"user-id": "6ae92f19",
"ip": "10.76.54.12",
"session-id": "3d86c3ed",
"ts": "2018-02-01T12:13:14Z",
"url": "https://www.example.com/api/reports/827362",
"parameters": [ { "attr": "foo", "value": "bar" } ],
"method": "PUT"
}
EOF
{"_index":"dynamic_object_visitor_log","_type":"_doc","_id":"d01","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
curl -XPUT http://localhost:9200/dynamic_object_visitor_log/_doc/d02 -H "content-type: application/json" -d @- <<EOF
{
"user-id": "6ae92f19",
"ip": "10.76.54.12",
"session-id": "3d86c3ed",
"ts": "2018-02-01T12:13:14Z",
"url": "https://www.example.com/api/reports/241826",
"parameters": { "account_created_on": "2016-01-02", "extra_info": "PDF format please" },
"method": "PUT"
}
EOF
{"_index":"dynamic_object_visitor_log","_type":"_doc","_id":"d02","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
Inspect the mapping, and note all the additional fields within parameters generated by dynamic mapping:
curl 'http://localhost:9200/dynamic_object_visitor_log/_doc/_mapping?pretty'
{
"dynamic_object_visitor_log" : {
"mappings" : {
"_doc" : {
"dynamic" : "strict",
"properties" : {
"ip" : {
"type" : "text"
},
"method" : {
"type" : "keyword"
},
"parameters" : {
"dynamic" : "true",
"properties" : {
/* NOTE: everything in properties is a result of dynamic mapping */
"account_created_on" : {
"type" : "date"
},
"attr" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"extra_info" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"value" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"session-id" : {
"type" : "keyword"
},
"ts" : {
"type" : "date"
},
"url" : {
"type" : "text"
},
"user-id" : {
"type" : "keyword"
}
}
}
}
}
}
Search for a document based on a match in one of those dynamically mapped fields:
curl 'http://localhost:9200/dynamic_object_visitor_log/_doc/_search?q=PDF' | jq .hits.hits
[
{
"_index": "dynamic_object_visitor_log",
"_type": "_doc",
"_id": "d02",
"_score": 0.2876821,
"_source": {
"user-id": "6ae92f19",
"ip": "10.76.54.12",
"session-id": "3d86c3ed",
"ts": "2018-02-01T12:13:14Z",
"url": "https://www.example.com/api/reports/241826",
"parameters": {
"account_created_on": "2016-01-02",
"extra_info": "PDF format please"
},
"method": "PUT"
}
}
]
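Because parameters is an object field, its dynamically mapped subfields can also be queried by their full dotted names. A sketch, assuming bash; the ES_URL variable and search_visitor_log are hypothetical, and the helper simply URL-encodes a query-string query with curl -G:

```shell
# Hypothetical base URL; defaults to the container started above
ES_URL="${ES_URL:-http://localhost:9200}"

# Run a URI search against dynamic_object_visitor_log;
# $1 is a query-string query such as parameters.attr:foo
search_visitor_log() {
  curl -s -G "$ES_URL/dynamic_object_visitor_log/_doc/_search" --data-urlencode "q=$1"
}
```

For example, search_visitor_log 'parameters.attr:foo' | jq .hits.total should report one hit, document d01, in contrast to the dynamic_false_visitor_log index, where the same value was never indexed.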