@viztastic
Last active September 29, 2016 14:02

Aggregations

You really should check out the Elastic Aggregations documentation; it's great.

In short, the format is something like this:

  "aggregations" : {
      "<aggregation_name>" : {
          "<aggregation_type>" : {
              <aggregation_body>
          }
          [,"meta" : {  [<meta_data_body>] } ]?
          [,"aggregations" : { [<sub_aggregation>]+ } ]?
      }
      [,"<aggregation_name_2>" : { ... } ]*
  }
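As a concrete instance of that template, here is a minimal sketch of a request body with one `terms` aggregation and a nested `avg` sub-aggregation. It is written in Python only so the structure can be checked and serialized. The aggregation names (`by_user`, `avg_retweets`) and field names (`user`, `retweets`) are hypothetical and not taken from any real index.

```python
import json

# Hypothetical aggregation and field names, chosen only for illustration.
body = {
    "size": 0,  # ask for aggregation results only, no search hits
    "aggregations": {
        "by_user": {                        # <aggregation_name>
            "terms": {"field": "user"},     # <aggregation_type> : <aggregation_body>
            "aggregations": {               # optional sub-aggregation, run per bucket
                "avg_retweets": {"avg": {"field": "retweets"}}
            },
        }
    },
}

request_json = json.dumps(body, indent=2)
print(request_json)
```

The printed JSON is what would be sent as the body of a search request.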

Autocomplete

N-grams

N-grams give us a sliding window on a word. For example, for "john smith":

  • N-grams of length 1: j, o, h, n, s, m, i, t, h
  • N-grams of length 2: jo, oh, hn, sm, mi, it, th
  • N-grams of length 3: joh, ohn, smi, mit, ith
  • N-grams of length 4: john, smit, mith

Great for partial matching.
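The windows listed above can be reproduced with a few lines of Python. This is only a sketch of the idea, not how Elasticsearch implements its tokenizer; the function name `ngrams` is made up for this example, and words are split on whitespace for simplicity.

```python
def ngrams(text, n):
    """Return every n-character window of each whitespace-separated word."""
    grams = []
    for word in text.lower().split():
        # Slide a window of width n across the word, one character at a time.
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


print(ngrams("john smith", 2))  # ['jo', 'oh', 'hn', 'sm', 'mi', 'it', 'th']
print(ngrams("john smith", 4))  # ['john', 'smit', 'mith']
```

Because every interior window is produced, a query such as "mit" can match "smith", which is what makes plain N-grams useful for partial matching.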

Edge N-grams (aka anchored N-grams)

But they are not so great for autocomplete. For that we need "edge N-grams" (anchored N-grams), which always start at the beginning of a word:

j, jo, joh, john, s, sm, smi, smit, smith

Perfect for the purpose of autocompletion.
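A rough Python sketch of the same idea follows. The function name `edge_ngrams` is hypothetical; its `min_gram` and `max_gram` arguments are chosen to mirror the shortest and longest prefix lengths that an edge N-gram filter is configured with.

```python
def edge_ngrams(text, min_gram=1, max_gram=20):
    """Return the prefixes of each word, from min_gram up to max_gram characters."""
    grams = []
    for word in text.lower().split():
        longest = min(max_gram, len(word))
        # Every gram is anchored at the start of the word, so it is a prefix.
        grams.extend(word[:n] for n in range(min_gram, longest + 1))
    return grams


print(edge_ngrams("john smith"))
# ['j', 'jo', 'joh', 'john', 's', 'sm', 'smi', 'smit', 'smith']
```

Since only prefixes are produced, "mit" no longer matches "smith", but anything the user has typed from the start of a word does.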

N-grams in action:

Step 1: Set up the edge N-gram filter

  {
    "filter" : {
      "autocomplete" : {
        "type" : "edge_ngram",
        "min_gram" : 1,      // shortest prefix to emit
        "max_gram" : 20      // longest prefix to emit
      }
    }
  }

Step 2: Set up the analyzers

    {
      "analyzer" : {
        "name" : {
            "type" : "standard",
            "stopwords" : []   // no stopwords, so we don't end up removing 'A' from the likes of 'A A Miller'
        },
        "name_autocomplete" : {
            "type" : "custom",
            "tokenizer" : "standard",
            "filter" : ["lowercase","autocomplete"]       // "autocomplete" references the filter we created in the first step
        }
      }
    }

Step 3: Now that we've defined the analyzers, we need to apply them to the name field:

Previously, the name field was {"name": {"type" : "string"} }, but now that we want to use the name field in two different ways, we need to declare it as a multi_field:

{
   "name" : {
      "type" : "multi_field",
      "fields" : {
          "name" : { "type" : "string", "analyzer" : "name" },      // the name analyzer is the one we created in the previous step
          "autocomplete" : {
             "type" : "string",
             "index_analyzer" : "name_autocomplete",    // in the index we want to store j, jo, joh, john
             "search_analyzer": "name"                  // but when searching, we only want to look up what the user actually typed (i.e. just joh, not j, jo, joh)
          }
       }
   }
}
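To see why the index side and the search side use different analyzers, here is a toy simulation in Python. It is a sketch under simplifying assumptions: analysis is reduced to lowercasing and whitespace splitting, and the helper names (`index_terms`, `plain_terms`, `matches`) are invented for this example.

```python
def edge_ngrams(word, min_gram=1, max_gram=20):
    """All prefixes of a single word between min_gram and max_gram characters."""
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


def index_terms(name):
    """Roughly what name_autocomplete emits: lowercase words, then edge n-grams."""
    terms = set()
    for word in name.lower().split():
        terms.update(edge_ngrams(word))
    return terms


def plain_terms(text):
    """Roughly what the name analyzer emits: lowercase whole words."""
    return set(text.lower().split())


def matches(query_terms, doc_terms):
    # A match query with the default "or" operator needs one term in common.
    return bool(query_terms & doc_terms)


jane = index_terms("Jane Doe")
john = index_terms("John Smith")

# Search text analyzed as whole words: "joh" only finds John.
print(matches(plain_terms("joh"), john))  # True
print(matches(plain_terms("joh"), jane))  # False

# Search text analyzed with edge n-grams as well: the gram "j" now drags in Jane.
print(matches(index_terms("joh"), jane))  # True
```

The last line shows the failure mode that the separate search analyzer avoids: expanding the query into j, jo, joh makes every name starting with "j" a hit.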

Step 4: Delete the existing index: DELETE /indexName

Step 5: Recreate the index with our new settings and mappings:

{
  "settings" : {
     "analysis" : {
        "analyzer" : { ... },
        "filter" : { ... }
     }
  },
  "mappings" : {
     "tweet" : {
        "properties" : { ... }
     }
  }
}
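For reference, the pieces from the earlier steps can be assembled into one body like this. It is a sketch in Python so that the cross-references can be checked before anything is sent to the cluster. It assumes the filter parameters are spelled `min_gram` and `max_gram` and that the multi-field type is spelled `multi_field`, and it keeps the older mapping keywords (`string`, `index_analyzer`) that these steps use; newer Elasticsearch releases spell mappings differently. The `BUILT_IN_FILTERS` set is a hypothetical helper that lists only the one built-in filter used here.

```python
import json

filters = {"autocomplete": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20}}

analyzers = {
    "name": {"type": "standard", "stopwords": []},
    "name_autocomplete": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "autocomplete"],
    },
}

name_fields = {
    "name": {"type": "string", "analyzer": "name"},
    "autocomplete": {
        "type": "string",
        "index_analyzer": "name_autocomplete",
        "search_analyzer": "name",
    },
}

index_body = {
    "settings": {"analysis": {"analyzer": analyzers, "filter": filters}},
    "mappings": {
        "tweet": {
            "properties": {"name": {"type": "multi_field", "fields": name_fields}}
        }
    },
}

# Sanity checks: a misspelled analyzer or filter name should fail here,
# not at index-creation time.
BUILT_IN_FILTERS = {"lowercase"}  # hypothetical helper, not an exhaustive list
for analyzer in analyzers.values():
    for name in analyzer.get("filter", []):
        assert name in filters or name in BUILT_IN_FILTERS, name
for field in name_fields.values():
    for key in ("analyzer", "index_analyzer", "search_analyzer"):
        if key in field:
            assert field[key] in analyzers, field[key]

print(json.dumps(index_body, indent=2))
```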

Now we need to write the autocomplete query itself.

Creating the Autocomplete Query

Rudimentary Implementation

At its simplest, we can do this:

{
  "match" : {
      "name.autocomplete" : "john smi"
  }
}

But we should find a way to boost results that contain full-word matches (john) in addition to the n-gram matches (smi).

Nicer implementation

{
  "bool": {
    "must": {
      "match": {
        "name.autocomplete": "john smi"
      }
    },
    "should": {                 // documents that also match this clause score higher; here they do, because john matches
      "match": {
        "name": "john smi"      // note we're checking against the name field, not name.autocomplete, so only john will match
      }
    }
  }
}
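The effect of the extra should clause can be illustrated with a toy model in Python. This is a sketch only: `autocomplete_query` and `toy_score` are hypothetical helpers, and the score simply counts matching terms, which is far cruder than Elasticsearch's real relevance scoring.

```python
def autocomplete_query(text):
    """Build the bool query above for whatever the user has typed so far."""
    return {
        "bool": {
            "must": {"match": {"name.autocomplete": text}},
            "should": {"match": {"name": text}},
        }
    }


def edge_ngrams(word, max_gram=20):
    """All prefixes of a word, up to max_gram characters."""
    return {word[:n] for n in range(1, min(max_gram, len(word)) + 1)}


def toy_score(query, name):
    """Count prefix hits (must clause) plus whole-word hits (should clause)."""
    words = name.lower().split()
    prefixes = set().union(*(edge_ngrams(w) for w in words))
    terms = query.lower().split()
    must_hits = sum(t in prefixes for t in terms)
    if must_hits == 0:
        return 0  # the must clause failed, so the document is excluded
    should_hits = sum(t in words for t in terms)
    return must_hits + should_hits


print(toy_score("john smi", "John Smith"))     # 3
print(toy_score("john smi", "Johnny Smiley"))  # 2
print(toy_score("john smi", "Jane Doe"))       # 0
```

Both "John Smith" and "Johnny Smiley" satisfy the must clause through their prefixes, but only "John Smith" contains john as a whole word, so it ranks first.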