viztastic/Elastic Search Notes Part 3.md

## Elastic Search Notes Part 3.md

      
Aggregations

The Elastic aggregations documentation is great and well worth checking out.
In short, the format is something like this:
  "aggregations" : {
      "<aggregation_name>" : {
          "<aggregation_type>" : {
              <aggregation_body>
          }
          [,"meta" : {  [<meta_data_body>] } ]?
          [,"aggregations" : { [<sub_aggregation>]+ } ]?
      }
      [,"<aggregation_name_2>" : { ... } ]*
  }
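As a rough illustration of that shape, here is a minimal Python sketch that builds one aggregation with a nested sub-aggregation as a plain dict. The aggregation name `by_user` and the field names `user` and `retweets` are made up for the example; `terms` and `avg` are just two of the available aggregation types.

```python
import json


def terms_with_avg(name, group_field, avg_field, size=10):
    """Build a terms aggregation with a nested avg sub-aggregation."""
    return {
        "aggregations": {
            name: {
                # <aggregation_type> : { <aggregation_body> }
                "terms": {"field": group_field, "size": size},
                # optional sub-aggregations, computed per bucket
                "aggregations": {
                    "avg_" + avg_field: {"avg": {"field": avg_field}}
                },
            }
        }
    }


# Hypothetical field names, purely for illustration.
body = terms_with_avg("by_user", "user", "retweets")
print(json.dumps(body, indent=2))
```

The printed JSON can be used as (part of) a search request body.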
Autocomplete

N-grams

N-grams give us a sliding window over each word.
E.g. for "john smith", N-grams of length 1: j, o, h, n, s, m, i, t, h
N-grams of length 2: jo, oh, hn, sm, mi, it, th
N-grams of length 3: joh, ohn, smi, mit, ith
N-grams of length 4: john, smit, mith
Great for partial matching.
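The sliding window can be sketched in a few lines of Python. This is only an approximation of what an n-gram token filter does: it splits on whitespace and lowercases, and the function name `ngrams` is made up for the example.

```python
def ngrams(text, n):
    """Return the n-grams of each whitespace-separated word, in order."""
    grams = []
    for word in text.lower().split():
        # Slide a window of width n across the word; words shorter
        # than n produce no grams at all.
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


print(ngrams("john smith", 2))  # ['jo', 'oh', 'hn', 'sm', 'mi', 'it', 'th']
print(ngrams("john smith", 4))  # ['john', 'smit', 'mith']
```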
Edge N-grams (aka anchored N-grams)

But plain N-grams are not so great for autocomplete. For that we need "Edge N-grams" (anchored N-grams), which always start at the beginning of the word:
j,
jo,
joh,
john,
s,
sm,
smi,
smit,
smith
Perfect for the purpose of autocompletion.
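A minimal Python sketch of the same idea, assuming whitespace splitting and lowercasing. The function name `edge_ngrams` and its `min_gram`/`max_gram` parameters are illustrative, not an Elasticsearch API.

```python
def edge_ngrams(text, min_gram=1, max_gram=20):
    """Return the prefixes (edge n-grams) of each word in text, in order."""
    grams = []
    for word in text.lower().split():
        # Every gram is anchored at the start of the word.
        longest = min(len(word), max_gram)
        grams.extend(word[:i] for i in range(min_gram, longest + 1))
    return grams


print(edge_ngrams("john smith"))
# ['j', 'jo', 'joh', 'john', 's', 'sm', 'smi', 'smit', 'smith']
```

Because every gram is a prefix, whatever the user has typed so far is exactly one of the stored tokens.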
N-grams in action:

Step 1: Set up the edge N-gram token filter
  {
    "filter" : {
      "autocomplete" : {
        "type" : "edge_ngram",
        "min_gram" : 1,
        "max_gram" : 20
      }
    }
  }
Step 2: Set up analyzers
    {
      "analyzer" : {
        "name" : {
            "type" : "standard",
            "stopwords" : []   // no stopwords, so we don't end up removing 'A' from the likes of 'A A Miller'
        },
        "name_autocomplete" : {
            "type" : "custom",
            "tokenizer" : "standard",
            "filter" : ["lowercase","autocomplete"]       // we're referencing the autocomplete filter we created in the first step
        }
      }
    }
Step 3: Now that we've defined the analyzers, we need to apply them to the name field.
Previously, the name field was {"name": {"type" : "string"} }, but now that we want to use the name field in two different ways, we need to declare it as a multi-field:
{
   "name" : {
      "type" : "multi_field",
      "fields" : {
          "name" : { "type" : "string", "analyzer" : "name"   },      // the name analyzer is the analyzer we created in the previous step
          "autocomplete" : {
             "type" : "string",
             "index_analyzer" : "name_autocomplete",    // in the index we want to store j, jo, joh, john
             "search_analyzer": "name"                  // but when searching, we only want to match on what the user last typed in (i.e. just joh, not j, jo, joh)
          }
       }
   }
}
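To see why the index and search analyzers differ, here is a small Python simulation. It is only a rough stand-in for the real analyzers (a regex tokenizer plus lowercasing instead of the standard tokenizer), and the helper names are made up for the example.

```python
import re


def tokenize(text):
    """Rough stand-in for the standard tokenizer plus lowercasing."""
    return re.findall(r"\w+", text.lower())


def index_tokens(text, min_gram=1, max_gram=20):
    """Tokens stored for name.autocomplete: edge n-grams of every word."""
    tokens = set()
    for word in tokenize(text):
        for i in range(min_gram, min(len(word), max_gram) + 1):
            tokens.add(word[:i])
    return tokens


def matching_tokens(query, stored_name):
    """Query tokens (search analyzer: no n-grams) found among the stored grams."""
    stored = index_tokens(stored_name)
    return [tok for tok in tokenize(query) if tok in stored]


# With the plain search analyzer, "joh" only matches names starting with "joh".
print(matching_tokens("joh", "John Smith"))  # ['joh']
print(matching_tokens("joh", "Jane Doe"))    # []

# If the query were n-grammed too, "joh" would become j, jo, joh and
# would match "Jane Doe" through the shared token "j".
print(index_tokens("joh") & index_tokens("Jane Doe"))  # {'j'}
```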

Step 4: Delete the existing index: DELETE /indexName
Step 5: Recreate the index with our new settings and mappings:
{
  "settings" : {
     "analysis" : {
        "analyzer" : { ... },
        "filter" : { ... }
     }
  },
  "mappings" : {
     "tweet" : {
        "properties" : { ... }
     }
  }
}
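As a sketch of how the pieces from steps 1 to 3 slot into that skeleton, the Python below assembles one request body and prints it as JSON. It assumes the older `multi_field` / `string` / `index_analyzer` mapping syntax used in these notes; the constant and function names are made up for the example.

```python
import json

# Step 1: the edge n-gram token filter.
FILTERS = {
    "autocomplete": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20}
}

# Step 2: the two analyzers.
ANALYZERS = {
    "name": {"type": "standard", "stopwords": []},
    "name_autocomplete": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "autocomplete"],
    },
}

# Step 3: the name field mapping (older multi_field syntax, an assumption).
PROPERTIES = {
    "name": {
        "type": "multi_field",
        "fields": {
            "name": {"type": "string", "analyzer": "name"},
            "autocomplete": {
                "type": "string",
                "index_analyzer": "name_autocomplete",
                "search_analyzer": "name",
            },
        },
    }
}


def build_index_body(type_name="tweet"):
    """Combine analysis settings and mappings into one create-index body."""
    return {
        "settings": {"analysis": {"analyzer": ANALYZERS, "filter": FILTERS}},
        "mappings": {type_name: {"properties": PROPERTIES}},
    }


body = build_index_body()
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent when recreating the index.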
Now we need to write the autocomplete query itself.
Creating the Autocomplete Query

Rudimentary Implementation

At its simplest, we can do this:
{
  "match" : {
      "name.autocomplete" : "john smi"
  }
}
But we should find a way of boosting results that contain full-word matches (john) in addition to the n-gram matches (smi).
Nicer implementation

{
  "bool": {
    "must": {
      "match": {
        "name.autocomplete": "john smi"
      }
    },
    "should": {               // documents that also match this clause score higher; here they will, because john matches
      "match": {
        "name": "john smi"    // note we're checking against the name field, not name.autocomplete, so only john will match
      }
    }
  }
}
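In application code, the query above might be generated from whatever the user has typed so far. A minimal Python sketch, where the function name `autocomplete_query` and the `field` parameter are assumptions for illustration:

```python
import json


def autocomplete_query(text, field="name"):
    """Build the bool query: must match the edge n-grams, should match whole words."""
    return {
        "bool": {
            # Required: every hit has to match the autocomplete sub-field.
            "must": {"match": {field + ".autocomplete": text}},
            # Optional: hits that also match full words score higher.
            "should": {"match": {field: text}},
        }
    }


# In a search request body the clause goes under the top-level "query" key.
search_body = {"query": autocomplete_query("john smi")}
print(json.dumps(search_body, indent=2))
```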