@spaghetticode
Last active October 18, 2018 08:40
Elasticsearch Introduction

Introduction

Installation on the development machine

We're going to use our development machine to build up our knowledge of Elasticsearch and its usage. If you're on OS X like us, you can install Elasticsearch with a simple brew install elasticsearch. Follow the instructions to start the server and to autostart the service at boot. If you're on a different platform, go to http://www.elasticsearch.org/download/ and choose your preferred installation method.

Now we need to install the Elasticsearch development console. Change directory to the Elasticsearch installation (you can find it with which elasticsearch... in my case it's /usr/local) and install the Marvel plugin: bin/plugin -i elasticsearch/marvel/latest

The developer console is then available at http://localhost:9200/_plugin/marvel/sense/ in your browser. That's what we're going to use to interact with Elasticsearch.

Remember to restart Elasticsearch to enable the console.
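To verify that the server is up, you can issue a root request from the console (or with curl against http://localhost:9200); Elasticsearch answers with a small JSON document containing its node name and version:

```
GET /
```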

Create the schema mapping

You need to create an index, and optionally one or more types. Let's create the "docs" index with a "doc" type:

PUT /docs
{
  "mappings": {
    "doc" : {
      "properties" : {
        "title" : {
          "type" : "string",
          "fields": {
            "en": {
              "type": "string",
              "analyzer": "english"
            }
          }
        },
        "body" : {
          "type" : "string",
          "fields": {
            "en": {
              "type": "string",
              "analyzer": "english"
            }
          }
        },
        "keywords" : {
          "type":   "string",
          "analyzer": "standard"
        },
        "location" : {
          "type" : "geo_point"
        }
      }
    }
  }
}

Whenever you need to change the mapping of existing fields you have to reindex all the data, because the old records were analyzed according to the old mapping. On the other hand, if you only add new fields you can keep your old data. You can drop the index with DELETE /indexname
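For example, a brand new field can be added to the existing type without reindexing, through the put mapping API. A minimal sketch, assuming we want a hypothetical "author" field:

```
PUT /docs/_mapping/doc
{
  "properties": {
    "author": { "type": "string" }
  }
}
```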

CRUD

Let's add some dummy data to our index.

Use POST if you want automatically generated ids:

POST /docs/doc
{
  "title": "The adventures of Robin Hood",
  "body": "Marian warms up to Robin's fight against injustice (and to Robin himself), eventually becoming a trusted ally.",
  "keywords":"oldies, action, Adventure"
}
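Elasticsearch answers with the generated id (the exact value will differ on your machine); the response looks roughly like this:

```
{
  "_index": "docs",
  "_type": "doc",
  "_id": "<generated id>",
  "_version": 1,
  "created": true
}
```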

Alternatively, use PUT with an explicit id:

PUT /docs/doc/1
{
  "title": "Transformers",
  "body": "a movie about big robots saving the planet",
  "keywords":"scifi, action, robots"
}
PUT /docs/doc/2
{
  "title": "Finding Nemo",
  "body": "A beautiful cartoon about a little fish named Nemo",
  "keywords":"kids, cartoons, pixar"
}
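When you have many documents to load, the bulk API is faster than one request per document: each action line is followed by the corresponding source line. A minimal sketch with made-up data:

```
POST /docs/doc/_bulk
{ "index": { "_id": "3" } }
{ "title": "Up", "body": "An old man flies away with his house", "keywords": "kids, cartoons, pixar" }
{ "index": { "_id": "4" } }
{ "title": "Alien", "body": "A scifi horror classic", "keywords": "scifi, horror, oldies" }
```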

Update a document: use PUT with the id of the document you want to update (note that this replaces the whole document):

PUT /docs/doc/1
{
  "title": "Transformers",
  "body": "a movie about big robots saving the planet",
  "keywords":"scifi, action, robots, autorobot, cars"
}

You can have partial updates as well:

POST /docs/doc/1/_update
{
   "doc" : {
      "keywords" : "scifi, action, robots, autorobot, cars, aliens"
   }
}
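You can check the result of the update by fetching the document back; the _source now contains the merged fields, and _version has been incremented:

```
GET /docs/doc/1
```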

Delete a document: use DELETE with the id of the document you want to remove:

DELETE /docs/doc/1

A search overview to whet your appetite

Basic search, using the "body.en" field, which is analyzed by the english analyzer:

GET /docs/doc/_search 
{
  "query": {
    "match": {
      "body.en": "save"
    }
  }
}

Note how "save" also matches "saving": the english analyzer stems both words to the same root. If we had used the regular body field, no result would have been found.

If we decide to search on "keywords" we need to provide the exact word, because this field is analyzed with the "standard" analyzer, which only tokenizes and lowercases words.

GET /docs/doc/_search 
{
  "query": {
    "match": {
      "keywords": "robots"
    }
  }
}

If we searched for "robot" no result would be found, because the standard analyzer does not stem "robots" down to "robot".
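To search several fields at once you can use a multi_match query, which runs the same text against each listed field. A sketch against our two analyzed subfields:

```
GET /docs/doc/_search
{
  "query": {
    "multi_match": {
      "query": "saving robots",
      "fields": ["title.en", "body.en"]
    }
  }
}
```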

Analyzers

Want to know how a string is analyzed? Let's try with the title field:

GET /docs/_analyze?field=title&text=Fishing+Nemo

Now use the title.en field and see the difference:

GET /docs/_analyze?field=title.en&text=Fishing+Nemo

In the first case "Fishing" is indexed as "fishing" (just lowercased), while in the second it's indexed as "fish": the original word is stemmed so it can match a broader range of queries.
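You can also pass an analyzer name directly, instead of deriving it from a field mapping; for example, with the english analyzer (the one our mapping uses for the .en subfields):

```
GET /docs/_analyze?analyzer=english&text=Fishing+Nemo
```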
