viztastic/Elastic Search Notes Part 1.md

## Elastic Search Notes Part 1.md

      
Elasticsearch 101 - Inverted Index, Normalisation and Analyzers.

Examples

"The quick brown fox jumped over the lazy dog"
"Quick brown foxes leap over lazy dogs in summer"

Inverted index:

Separate words and terms (Tokenization)
Sort unique terms
List documents containing terms.
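
The three steps above can be sketched in a few lines of Python. This is a toy illustration under simple assumptions, not how Lucene actually stores its index: tokenization is a plain whitespace split, and the names `docs` and `build_inverted_index` are made up for this example.

```python
from collections import defaultdict

# The two example documents, keyed by a made-up document id.
docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

def build_inverted_index(documents):
    """Map each unique term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():        # step 1: tokenization (whitespace only)
            postings[term].add(doc_id)   # step 3: record which documents hold the term
    # step 2: sort the unique terms
    return {term: sorted(ids) for term, ids in sorted(postings.items())}

index = build_inverted_index(docs)
for term, ids in index.items():
    print(f"{term:8} {ids}")
```

In this raw index, `Quick` and `quick` are separate terms, as are `fox` and `foxes`, so a search for one would miss the other.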

Normalised Index:

Reduce everything to lowercase.
Remove stop words (e.g. "the")
Stem words to their root form (e.g. foxes and fox have the same stem)
Draw on synonyms (e.g. jump and leap can be merged into jump)
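
As a rough sketch of these four steps, assuming a tiny hand-written stop list, a one-entry synonym map and a deliberately naive suffix stripper (all hypothetical, and far simpler than the real filters Elasticsearch ships):

```python
STOP_WORDS = {"the", "in"}      # tiny, made-up stop list
SYNONYMS = {"leap": "jump"}     # made-up synonym map: merge "leap" into "jump"

def stem(term):
    """Naive suffix stripper for illustration only, not a real English stemmer."""
    for suffix in ("es", "ed", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term

def normalise(tokens):
    """Lowercase, drop stop words, stem, then apply synonyms."""
    terms = []
    for token in tokens:
        term = token.lower()
        if term in STOP_WORDS:
            continue
        term = stem(term)
        terms.append(SYNONYMS.get(term, term))
    return terms

print(normalise("The quick brown fox jumped over the lazy dog".split()))
print(normalise("Quick brown foxes leap over lazy dogs in summer".split()))
```

After normalisation both documents share the terms `quick`, `brown`, `fox`, `jump`, `over`, `lazy` and `dog`, so a search for any of them can match either document.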

Analysis
Is: Tokenization + Normalisation
Analyzers
Are: Tokenizer + Token Filters

E.g. the Standard Analyzer is composed of:

Standard tokenizer (The, quick, brown, foxes...)
Lowercase filter (the, quick, brown, foxes...)
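Stopwords filter (quick, brown, foxes...); note that in the standard analyzer this filter is disabled by default unless stop words are configured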

E.g. the English Analyzer has everything the Standard Analyzer has, plus:

"English stopwords" (quick, brown, foxes...; "the" is removed)
"English stemmer" (quick, brown, fox...)
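
The "tokenizer + token filters" idea can be sketched as plain function composition. This is only an approximation under stated assumptions: `standard_tokenizer`, `lowercase_filter`, `stop_filter` and `make_analyzer` are hypothetical names for this example, the tokenizer is a simple regex split rather than the real Unicode-aware one, and the stop list holds a single word.

```python
import re

def standard_tokenizer(text):
    """Rough stand-in for a word tokenizer: split on runs of non-word characters."""
    return [token for token in re.split(r"\W+", text) if token]

def lowercase_filter(tokens):
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]

def stop_filter(tokens, stop_words=frozenset({"the"})):
    """Token filter: drop tokens found in the (tiny, made-up) stop list."""
    return [token for token in tokens if token not in stop_words]

def make_analyzer(tokenizer, *token_filters):
    """Build an analyzer: run the tokenizer, then each token filter in order."""
    def analyze(text):
        tokens = tokenizer(text)
        for token_filter in token_filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze

standard_like = make_analyzer(standard_tokenizer, lowercase_filter, stop_filter)
print(standard_like("The quick brown foxes"))  # ['quick', 'brown', 'foxes']
```

The order of the filters matters: the stop filter here only knows the lowercase form `the`, so it has to run after the lowercase filter. To see what a real analyzer produces, Elasticsearch exposes an `_analyze` endpoint.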

Testing this...

If we search the normalised index with raw terms (e.g. GET /_search?q=+Quick +foxes), we still don't get anything. This is because we also need to normalise our search query (and not just our index). Once we search for GET /_search?q=+quick +foxes we get what we need.
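
A small simulation of why the query has to go through the same analysis as the index. It assumes a toy analyzer that only lowercases and splits on whitespace; `analyze`, `index` and `search_all` are hypothetical names for this sketch.

```python
def analyze(text):
    """Toy analyzer (an assumption for this sketch): lowercase, split on whitespace."""
    return text.lower().split()

docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

# Build the index from analyzed (normalised) terms.
index = {}
for doc_id, text in docs.items():
    for term in analyze(text):
        index.setdefault(term, set()).add(doc_id)

def search_all(terms):
    """Return ids of documents containing every term, like +term1 +term2."""
    result = None
    for term in terms:
        ids = index.get(term, set())
        result = ids if result is None else result & ids
    return sorted(result or [])

print(search_all(["Quick", "foxes"]))        # raw query terms
print(search_all(analyze("Quick foxes")))    # query terms analyzed like the index
```

The first call finds nothing because the index only contains `quick`, never `Quick`; the second call returns document 2.
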
Exact vs Full Text

To Analyze or not to Analyze

If we want a field to be matched exactly, we should set it to be 'not_analyzed':
{ "nickname": {"type": "string", "index": "not_analyzed" } }
If we want the perks of full text search, we can set it to be analyzed:
{ "tweet": {"type": "string", "index": "analyzed" } }
If we want the information stored but not indexed (i.e. not searchable), we can set 'index' to 'no':
{ "type": "string", "index": "no" }
Types of Analyzers

If we know a certain string field will contain English text, we can set the analyzer for it. This analyzer is used both at index time and at search time:
{ "tweet": {"type": "string", "analyzer": "english" } }
Setting an analyzer implies that the field is analyzed.
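
The mappings in these notes use the `string` type, which matches older Elasticsearch releases. From 5.0 onwards, `string` was split into `text` (analyzed) and `keyword` (not analyzed). As a rough sketch of how the settings correspond, the hypothetical helper `to_modern_mapping` below translates one field definition. Mapping `"index": "no"` to a non-indexed `text` field is an assumption made for this example, not the only valid choice.

```python
def to_modern_mapping(field):
    """Translate a legacy `string` field definition to `text`/`keyword` style."""
    if field.get("type") != "string":
        return dict(field)                      # not a legacy string: copy as-is

    index = field.get("index", "analyzed")      # legacy default was "analyzed"
    if index == "not_analyzed":
        return {"type": "keyword"}              # exact-match field

    modern = {"type": "text"}                   # full text field
    if index == "no":
        modern["index"] = False                 # assumption: keep it as non-indexed text
    if "analyzer" in field:
        modern["analyzer"] = field["analyzer"]  # analyzers only apply to text fields
    return modern

print(to_modern_mapping({"type": "string", "index": "not_analyzed"}))
print(to_modern_mapping({"type": "string", "analyzer": "english"}))
```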

  
## Elastic Search Notes Part 2.md

      
Elasticsearch 101 - Querying

Building Queries

GET /_search?q=STRING is not the recommended way to search.
Instead, we should pass a full request body, for example:

Find all documents:

GET /_search
    {
      "query": {
        "match_all": {}
      },
      "from": 0,
      "size": 10
    }


More realistically, something like this: find all documents containing "car" in the "tweet" field:

GET /_search
    {
      "query": {
        "match": { "tweet" : "car" }
      },
      "from": 0,
      "size": 10
    }

Filters vs. Queries

Filters


Exact matching
Binary yes/no
Fast
Cacheable

Queries


Full text search
Relevance scoring
Heavier (i.e. more taxing performance wise)
Not cacheable

You can use either, or both.

Querying and Filtering

To combine a query and a filter, wrap both in a "filtered" property within the query, as per below:
GET /_search
    {
      "query": {
        "filtered": {
          "query": {
            "match": { "tweet": "search" }
          },
          "filter": {
            "term": { "nick": "@mary" }
          }
        }
      }
    }
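
The `filtered` query belongs to older Elasticsearch releases: it was deprecated in 2.0 and removed in 5.0, where the same intent is expressed with a `bool` query (`must` for the scoring part, `filter` for the yes/no part). As a sketch of the correspondence, the hypothetical helper `filtered_to_bool` below rewrites a request body of the shape used in these notes. It only handles a top-level `filtered` query, which is an assumption of this example.

```python
import json

def filtered_to_bool(body):
    """Rewrite a top-level `filtered` query into the equivalent `bool` query."""
    query = body.get("query", {})
    if "filtered" not in query:
        return body                               # nothing to rewrite

    filtered = query["filtered"]
    bool_query = {}
    if "query" in filtered:
        bool_query["must"] = filtered["query"]    # scored, full text part
    if "filter" in filtered:
        bool_query["filter"] = filtered["filter"] # unscored, yes/no part
    return {**body, "query": {"bool": bool_query}}

legacy = {
    "query": {
        "filtered": {
            "query": {"match": {"tweet": "search"}},
            "filter": {"term": {"nick": "@mary"}},
        }
    }
}
print(json.dumps(filtered_to_bool(legacy), indent=2))
```

Other top-level keys such as `sort`, `from` and `size` are passed through unchanged.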

Just Filtering

GET /_search
    {
      "query": {
        "filtered": {
          "query": {
            "match_all": {}
          },
          "filter": {
            "term": { "nick": "@mary" }
          }
        }
      }
    }

which is the same as:
GET /_search
    {
      "query": {
        "filtered": {
          "filter": {
            "term": { "nick": "@mary" }
          }
        }
      }
    }

You could also specify a sorting mechanism:
GET /_search
    {
      "query": {
        "filtered": {
          "filter": {
            "term": { "nick": "@mary" }
          }
        }
      },
      "sort": {"date":"desc"}
    }

There are different types of filters. For example, the range filter can return results within the month of May:
GET /_search
    {
      "query": {
        "filtered": {
          "filter": {
            "range": {
              "date": {
                "gte": "2016-05-01",
                "lte": "2016-05-31"
              }
            }
          }
        }
      },
      "sort": {"date":"desc"}
    }
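
When the month varies, the bounds of such a range filter can be computed rather than typed by hand. A minimal sketch, assuming a date field called `date` as in the example; `month_range` is a hypothetical helper name, not part of any Elasticsearch client.

```python
import calendar
from datetime import date

def month_range(year, month, field="date"):
    """Build a `range` filter clause covering one calendar month, bounds inclusive."""
    last_day = calendar.monthrange(year, month)[1]   # number of days in the month
    return {
        "range": {
            field: {
                "gte": date(year, month, 1).isoformat(),
                "lte": date(year, month, last_day).isoformat(),
            }
        }
    }

print(month_range(2016, 5))
```

For May 2016 this produces the same `gte` and `lte` values as the hand-written request.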