@thecambian
Created February 21, 2018 14:46
Part 1: Explicit Mapping

Start an Elasticsearch Docker container

A Docker container is a convenient way to run a throwaway Elasticsearch node for these examples.

Get a recent Elasticsearch Docker container:

docker pull docker.elastic.co/elasticsearch/elasticsearch:6.2.1
6.2.1: Pulling from elasticsearch/elasticsearch
Digest: sha256:cc5fe435b81c0b586363ca33573e802a060468ae93a6cb4702caa1e336f4d1c3
Status: Image is up to date for docker.elastic.co/elasticsearch/elasticsearch:6.2.1

Run Elasticsearch as a single node, exposing port 9200 for HTTP queries and port 9300 for the transport protocol:

docker run --name es -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.2.1
0ea334fdb6e7f4d9542a2f6768290c0743fbf464eef390328aeb99fd065bf041

Query the container to check that it is up:

curl localhost:9200
{
  "name" : "5G3WnUv",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "gwX4DRPSStiP38PBzqiJ_Q",
  "version" : {
    "number" : "6.2.1",
    "build_hash" : "7299dc3",
    "build_date" : "2018-02-07T19:34:26.990113Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

Dynamic mapping example

Create a new index

Dynamic mapping is great for exploratory work. You don’t need to plan your data structure; just index the data and query away.

Create an imaginatively-named new index:

curl -XPUT http://localhost:9200/new_index?pretty -H "content-type: application/json" -d @- <<EOF
{}
EOF
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "new_index"
}

Add three initial documents

Add three documents into the index, each with a different structure but some overlapping keys:

curl -XPUT http://localhost:9200/new_index/_doc/001 -H "content-type: application/json" -d @- <<EOF
{
  "user": "thecambian",
  "when": "yesterday",
  "message": "trying out Elasticsearch's essential stuff",
  "source": "elasticsearch documentation"
}
EOF

echo

curl -XPUT http://localhost:9200/new_index/_doc/002 -H "content-type: application/json" -d @- <<EOF
{
  "when": "2017-01-01T12:00:33Z",
  "url": "https://www.google.com/search?q=elasticsearch",
  "source_ip": "127.0.0.1"
}
EOF

echo

curl -XPUT http://localhost:9200/new_index/_doc/003 -H "content-type: application/json" -d @- <<EOF
{
  "start": "2017-02-01T14:41:23Z",
  "author": "cambium",
  "article": {
    "topic": "great features for elasticsearch",
    "tags": [ "elasticsearch", "databases", "technology" ]
  }
}
EOF
{"_index":"new_index","_type":"_doc","_id":"001","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
{"_index":"new_index","_type":"_doc","_id":"002","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1}
{"_index":"new_index","_type":"_doc","_id":"003","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

Query the new index

Run a quick search for elastic* across the new documents, using jq to filter the result down to just the hit count:

curl 'http://localhost:9200/new_index/_search?q=elastic*' | jq .hits.total
3

All three documents match!

Dynamic mapping problems

Not all datatypes are dynamically mapped

How did Elasticsearch map the source_ip field?

curl http://localhost:9200/new_index/_mapping/_doc/field/source_ip?pretty
{
  "new_index" : {
    "mappings" : {
      "_doc" : {
        "source_ip" : {
          "full_name" : "source_ip",
          "mapping" : {
            /* here is the source_ip mapping: */
            "source_ip" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        }
      }
    }
  }
}

It’s mapped as a text field with a keyword subfield, not as the ip data type, because dynamic mapping does not detect IP addresses in strings.
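One way around this is to declare the field type before any documents arrive. The following is a minimal sketch, not part of the original walkthrough: the index name typed_index and the function names are hypothetical, and the requests are wrapped in functions so nothing is sent until you call them.

```shell
# Hypothetical names for illustration; not part of the original walkthrough.
ES_URL="http://localhost:9200"
TYPED_INDEX="typed_index"

# Print a mapping that declares source_ip as an ip field up front.
typed_index_body() {
  cat <<'EOF'
{
  "mappings": {
    "_doc": {
      "properties": {
        "source_ip": { "type": "ip" }
      }
    }
  }
}
EOF
}

# Create the index with that mapping (assumes the container started above is running).
create_typed_index() {
  typed_index_body | curl -s -XPUT "$ES_URL/$TYPED_INDEX?pretty" \
    -H "content-type: application/json" -d @-
}
```

After calling create_typed_index, indexing document 002 into typed_index should leave source_ip mapped as ip rather than text, which you can check with the same field-mapping request used above.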

Indexing into incompatible fields

The start field was first mapped when document 003 was indexed:

curl 'http://localhost:9200/new_index/_doc/003?pretty&_source_include=start' | jq ._source
{
  "start": "2017-02-01T14:41:23Z"
}

It is a date field:

curl http://localhost:9200/new_index/_mapping/_doc/field/start?pretty
{
  "new_index" : {
    "mappings" : {
      "_doc" : {
        "start" : {
          "full_name" : "start",
          "mapping" : {
            "start" : {
              "type" : "date"
            }
          }
        }
      }
    }
  }
}

What happens when you try to index an incompatible value into an existing field? In this case, we try to index a lat/lon pair into a date field.

curl -XPUT http://localhost:9200/new_index/_doc/004?pretty -H "content-type: application/json" -d @- <<EOF
{
  "route_name": "bike ride boulder to denver"
  "start": "40.01,-105.27",
  "end": "39.73N,-104.99"
}
EOF
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "failed to parse"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse",
    "caused_by" : {
      "type" : "json_parse_exception",
      "reason" : "Unexpected character ('\"' (code 34)): was expecting comma to separate Object entries\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@61370f12; line: 1, column: 50]"
    }
  },
  "status" : 400
}

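Elasticsearch refuses to index the document, but not for the reason intended. The error shown is a json_parse_exception: the request body is missing a comma after the route_name line, so the request failed before the mapping was consulted. With well-formed JSON the document would still be rejected, because there is no reasonable way to coerce the start field’s value 40.01,-105.27 into a date datatype.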
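A well-formed version of the request, with a comma after the route_name line, can be sketched as follows. The function names are hypothetical, and no response is shown because the original walkthrough did not capture one; the expectation is a mapper_parsing_exception that names the start field.

```shell
ES_URL="http://localhost:9200"

# Body of document 004 with the comma after route_name restored.
doc_004_body() {
  cat <<'EOF'
{
  "route_name": "bike ride boulder to denver",
  "start": "40.01,-105.27",
  "end": "39.73N,-104.99"
}
EOF
}

# Send the request; start is already mapped as date, so this should be rejected.
index_doc_004() {
  doc_004_body | curl -s -XPUT "$ES_URL/new_index/_doc/004?pretty" \
    -H "content-type: application/json" -d @-
}
```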

“First-in wins”

Let’s look at how the when field mapping is set up:

curl http://localhost:9200/new_index/_mapping/_doc/field/when?pretty
{
  "new_index" : {
    "mappings" : {
      "_doc" : {
        "when" : {
          "full_name" : "when",
          "mapping" : {
            "when" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        }
      }
    }
  }
}

It’s a text field, because the first document indexed with a when field had a value of yesterday:

curl 'http://localhost:9200/new_index/_doc/001?pretty&_source_include=when' | jq ._source
{
  "when": "yesterday"
}

The second document indexed with this field had a date value:

curl 'http://localhost:9200/new_index/_doc/002?pretty&_source_include=when' | jq ._source
{
  "when": "2017-01-01T12:00:33Z"
}

Unfortunately, this value was also indexed as text, not as a date, because the field’s type was fixed by the first document.
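If the when field is declared as a date before any data is indexed, the order of arrival stops mattering. This is a sketch under assumptions: the index name when_as_date and the helper names are made up for illustration.

```shell
ES_URL="http://localhost:9200"
DATED_INDEX="when_as_date"   # hypothetical index name

# Mapping that fixes the type of when before the first document arrives.
dated_index_body() {
  cat <<'EOF'
{
  "mappings": {
    "_doc": {
      "properties": {
        "when": { "type": "date" }
      }
    }
  }
}
EOF
}

create_dated_index() {
  dated_index_body | curl -s -XPUT "$ES_URL/$DATED_INDEX?pretty" \
    -H "content-type: application/json" -d @-
}

# Build a one-field document. $1 = value for the when field.
when_doc() {
  printf '{ "when": "%s" }\n' "$1"
}

# Index a document. $1 = document id, $2 = value for the when field.
index_when() {
  when_doc "$2" | curl -s -XPUT "$ES_URL/$DATED_INDEX/_doc/$1?pretty" \
    -H "content-type: application/json" -d @-
}
```

With this mapping in place, index_when 001 yesterday should be rejected with a parsing error, while index_when 002 2017-01-01T12:00:33Z should be accepted as a date.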

Silent errors

A common error is to get a field name subtly wrong. Document 002 might represent an activity log event with a source_ip field. If we index a new document with a source-ip field (a hyphen instead of an underscore), we can see the impact of this mistake.

Index a new document with a subtle typo in the source_ip field:

curl -XPUT http://localhost:9200/new_index/_doc/005?pretty -H "content-type: application/json" -d @- <<EOF
{
  "when": "2017-02-10T08:10:10Z",
  "url": "https://www.google.com/search?q=json",
  "source-ip": "127.0.0.1"
}
EOF
{
  "_index" : "new_index",
  "_type" : "_doc",
  "_id" : "005",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

Now we search for all events coming from “127.0.0.1”:

curl 'http://localhost:9200/new_index/_search?q=source_ip:"127.0.0.1"&pretty' | jq .hits
{
  "total": 1,
  "max_score": 0.2876821,
  "hits": [
    {
      "_index": "new_index",
      "_type": "_doc",
      "_id": "002",
      "_score": 0.2876821,
      "_source": {
        "when": "2017-01-01T12:00:33Z",
        "url": "https://www.google.com/search?q=elasticsearch",
        "source_ip": "127.0.0.1"
      }
    }
  ]
}

Only document 002 is returned. Document 005 is not in the result set because it doesn’t have a source_ip field: dynamic mapping quietly added a new source-ip field instead of reporting an error.
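Listing the field names in the mapping is one way to spot this kind of drift. The sketch below assumes the 6.x mapping response shape seen earlier (index name, then mappings, then _doc, then properties); the function names are hypothetical.

```shell
ES_URL="http://localhost:9200"

# Read a mapping response on stdin and print the top-level field names, one per line.
# Assumes the shape <index>.mappings._doc.properties used by the 6.x responses above.
list_fields() {
  jq -r '.[].mappings._doc.properties | keys[]'
}

# Fetch the mapping of new_index and list its fields.
show_new_index_fields() {
  curl -s "$ES_URL/new_index/_mapping" | list_fields
}
```

Running show_new_index_fields against the index built so far should list both source_ip and source-ip, which makes the typo visible.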

Explicit mapping example

Index with dynamic mapping = strict

Create a new index with dynamic mapping set to “strict”:

curl -XPUT http://localhost:9200/dynamic_strict_visitor_log?pretty -H "content-type: application/json" -d @- <<EOF
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "user-id":    { "type": "keyword" },
        "ip":         { "type": "text" },
        "session-id": { "type": "keyword" },
        "ts":         { "type": "date" },
        "url":        { "type": "text" },
        "method":     { "type": "keyword" }
      }
    }
  }
}
EOF
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "dynamic_strict_visitor_log"
}

Index a fully conforming document:

curl -XPUT http://localhost:9200/dynamic_strict_visitor_log/_doc/c01 -H "content-type: application/json" -d @- <<EOF
{
  "user-id": "6ae92f19",
  "ip": "10.76.54.12",
  "session-id": "3d86c3ed",
  "ts": "2018-02-01T12:13:14Z",
  "url": "https://www.example.com/api/reports/827362",
  "method": "PUT"
}
EOF
{"_index":"dynamic_strict_visitor_log","_type":"_doc","_id":"c01","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1}

Try to index a document with an extra field:

curl -XPUT http://localhost:9200/dynamic_strict_visitor_log/_doc/c02?pretty -H "content-type: application/json" -d @- <<EOF
{
  "user-id": "6ae92f19",
  "ip": "10.76.54.12",
  "session-id": "3d86c3ed",
  "ts": "2018-02-01T12:13:14Z",
  "url": "https://www.example.com/api/reports/827362",
  "parameters": [ { "attr": "foo", "value": "bar" } ],
  "method": "PUT"
}
EOF
{
  "error" : {
    "root_cause" : [
      {
        "type" : "strict_dynamic_mapping_exception",
        "reason" : "mapping set to strict, dynamic introduction of [parameters] within [_doc] is not allowed"
      }
    ],
    "type" : "strict_dynamic_mapping_exception",
    "reason" : "mapping set to strict, dynamic introduction of [parameters] within [_doc] is not allowed"
  },
  "status" : 400
}

The document was rejected: with dynamic set to strict, any field that is not declared in the mapping causes an error.

Index with dynamic mapping = false

Create a new index with dynamic mapping set to “false”:

curl -XPUT http://localhost:9200/dynamic_false_visitor_log -H "content-type: application/json" -d @- <<EOF
{
  "mappings": {
    "_doc": {
      "dynamic": false,
      "properties": {
        "user-id":    { "type": "keyword" },
        "ip":         { "type": "text" },
        "session-id": { "type": "keyword" },
        "ts":         { "type": "date" },
        "url":        { "type": "text" },
        "method":     { "type": "keyword" }
      }
    }
  }
}
EOF
{"acknowledged":true,"shards_acknowledged":true,"index":"dynamic_false_visitor_log"}

Index a fully conforming document:

curl -XPUT http://localhost:9200/dynamic_false_visitor_log/_doc/b01 -H "content-type: application/json" -d @- <<EOF
{
  "user-id": "6ae92f19",
  "ip": "10.76.54.12",
  "session-id": "3d86c3ed",
  "ts": "2018-02-01T12:13:14Z",
  "url": "https://www.example.com/api/reports/827362",
  "method": "PUT"
}
EOF
{"_index":"dynamic_false_visitor_log","_type":"_doc","_id":"b01","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

Index a document with an extra field:

curl -XPUT http://localhost:9200/dynamic_false_visitor_log/_doc/b02 -H "content-type: application/json" -d @- <<EOF
{
  "user-id": "6ae92f19",
  "ip": "10.76.54.12",
  "session-id": "3d86c3ed",
  "ts": "2018-02-01T12:13:14Z",
  "url": "https://www.example.com/api/reports/827362",
  "parameters": [ { "attr": "foo", "value": "bar" } ],
  "method": "PUT"
}
EOF
{"_index":"dynamic_false_visitor_log","_type":"_doc","_id":"b02","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

Retrieve the second document:

curl http://localhost:9200/dynamic_false_visitor_log/_doc/b02?pretty
{
  "_index" : "dynamic_false_visitor_log",
  "_type" : "_doc",
  "_id" : "b02",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "user-id" : "6ae92f19",
    "ip" : "10.76.54.12",
    "session-id" : "3d86c3ed",
    "ts" : "2018-02-01T12:13:14Z",
    "url" : "https://www.example.com/api/reports/827362",
    "parameters" : [
      {
        "attr" : "foo",
        "value" : "bar"
      }
    ],
    "method" : "PUT"
  }
}

The extra “parameters” field is kept in _source, but it is not indexed, so its contents cannot produce search hits. A search for “foo” should therefore match no documents:

curl http://localhost:9200/dynamic_false_visitor_log/_doc/_search?q=foo | jq .
{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

Index with dynamic mapping in a nested field

Create a new index with dynamic mapping set to “strict” at the top level, plus a parameters object field that has dynamic mapping enabled for arbitrary attributes. (This is a plain object field, not the nested datatype.)

curl -XPUT http://localhost:9200/dynamic_object_visitor_log -H "content-type: application/json" -d @- <<EOF
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "user-id":    { "type": "keyword" },
        "ip":         { "type": "text" },
        "session-id": { "type": "keyword" },
        "ts":         { "type": "date" },
        "url":        { "type": "text" },
        "method":     { "type": "keyword" },
        "parameters": {
          "dynamic": true,
          "properties": {}
        }
      }
    }
  }
}
EOF
{"acknowledged":true,"shards_acknowledged":true,"index":"dynamic_object_visitor_log"}

Index two documents with entirely different nested data:

curl -XPUT http://localhost:9200/dynamic_object_visitor_log/_doc/d01 -H "content-type: application/json" -d @- <<EOF
{
  "user-id": "6ae92f19",
  "ip": "10.76.54.12",
  "session-id": "3d86c3ed",
  "ts": "2018-02-01T12:13:14Z",
  "url": "https://www.example.com/api/reports/827362",
  "parameters": [ { "attr": "foo", "value": "bar" } ],
  "method": "PUT"
}
EOF
{"_index":"dynamic_object_visitor_log","_type":"_doc","_id":"d01","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
curl -XPUT http://localhost:9200/dynamic_object_visitor_log/_doc/d02 -H "content-type: application/json" -d @- <<EOF
{
  "user-id": "6ae92f19",
  "ip": "10.76.54.12",
  "session-id": "3d86c3ed",
  "ts": "2018-02-01T12:13:14Z",
  "url": "https://www.example.com/api/reports/241826",
  "parameters": { "account_created_on": "2016-01-02", "extra_info": "PDF format please" },
  "method": "PUT"
}
EOF
{"_index":"dynamic_object_visitor_log","_type":"_doc","_id":"d02","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

Inspect the mapping, and note all the additional mapping fields within parameters generated by dynamic mapping:

curl http://localhost:9200/dynamic_object_visitor_log/_doc/_mapping?pretty
{
  "dynamic_object_visitor_log" : {
    "mappings" : {
      "_doc" : {
        "dynamic" : "strict",
        "properties" : {
          "ip" : {
            "type" : "text"
          },
          "method" : {
            "type" : "keyword"
          },
          "parameters" : {
            "dynamic" : "true",
            "properties" : {
              /* NOTE: everything in properties is a result of dynamic mapping */
              "account_created_on" : {
                "type" : "date"
              },
              "attr" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "extra_info" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "value" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          },
          "session-id" : {
            "type" : "keyword"
          },
          "ts" : {
            "type" : "date"
          },
          "url" : {
            "type" : "text"
          },
          "user-id" : {
            "type" : "keyword"
          }
        }
      }
    }
  }
}

Search for a document based on a match in one of those dynamically mapped fields:

curl http://localhost:9200/dynamic_object_visitor_log/_doc/_search?q=PDF | jq .hits.hits
[
  {
    "_index": "dynamic_object_visitor_log",
    "_type": "_doc",
    "_id": "d02",
    "_score": 0.2876821,
    "_source": {
      "user-id": "6ae92f19",
      "ip": "10.76.54.12",
      "session-id": "3d86c3ed",
      "ts": "2018-02-01T12:13:14Z",
      "url": "https://www.example.com/api/reports/241826",
      "parameters": {
        "account_created_on": "2016-01-02",
        "extra_info": "PDF format please"
      },
      "method": "PUT"
    }
  }
]
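The search above goes through every searchable field. To target a single dynamically mapped field, the query can name its full path. This is a small sketch with hypothetical helper names, using the same URI search syntax as the earlier source_ip query.

```shell
ES_URL="http://localhost:9200"

# Build a URI-search URL restricted to one field.
# $1 = index name, $2 = field path, $3 = search term
field_search_url() {
  printf '%s/%s/_search?q=%s:%s' "$ES_URL" "$1" "$2" "$3"
}

# Count hits for PDF in parameters.extra_info only.
search_extra_info() {
  curl -s "$(field_search_url dynamic_object_visitor_log parameters.extra_info PDF)" \
    | jq .hits.total
}
```

Calling search_extra_info should count only document d02, since it is the only one with an extra_info value.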