Really should check out the Elasticsearch aggregations documentation, it's great.
In short, the format is something like this:
"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"meta" : { [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [,"<aggregation_name_2>" : { ... } ]*
}
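To make the grammar concrete, here is a minimal Python sketch that builds one such request body as a plain dict. The field names `tag` and `price` and the aggregation names `by_tag`, `avg_price` and `max_price` are hypothetical; the result would be sent as (part of) the JSON body of a search request.

```python
import json


def build_aggregation_request() -> dict:
    """Build a body following the "aggregations" -> <name> -> <type> -> <body> pattern.

    Field names ("tag", "price") and aggregation names are hypothetical placeholders.
    """
    return {
        "aggregations": {
            "by_tag": {                          # <aggregation_name>
                "terms": {"field": "tag"},       # <aggregation_type> : <aggregation_body>
                "meta": {"note": "top tags"},    # optional meta block
                "aggregations": {                # optional sub-aggregations
                    "avg_price": {"avg": {"field": "price"}}
                },
            },
            # a second top-level aggregation (<aggregation_name_2>)
            "max_price": {"max": {"field": "price"}},
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_aggregation_request(), indent=2))
```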
N-grams give us a sliding window over a word. Taking the name "John Smith" as an example:
N-grams of length 1: j, o, h, n, s, m, i, t, h
N-grams of length 2: jo, oh, hn, sm, mi, it, th
N-grams of length 3: joh, ohn, smi, mit, ith
N-grams of length 4: john, smit, mith
Great for partial matching.
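A small Python sketch that reproduces the lists above, assuming the text is simply lowercased and split on whitespace before the n-grams are taken (roughly what a tokenizer followed by an n-gram filter does; the function name `char_ngrams` is hypothetical):

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Character n-grams of length n, computed separately for each whitespace-split word."""
    grams: list[str] = []
    for word in text.lower().split():
        # slide a window of width n across the word; words shorter than n yield nothing
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, char_ngrams("John Smith", n))
```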
But plain N-grams are not so great for autocomplete. For that we need "edge N-grams" (N-grams anchored to the start of each word):
j, jo, joh, john, s, sm, smi, smit, smith
Perfect for the purpose of autocompletion.
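The edge variant keeps only prefixes. This Python sketch takes `min_gram`/`max_gram` bounds like the filter configured in Step 1; its defaults of 1 and 20 simply mirror that step and are not necessarily Elasticsearch's own defaults. Whitespace splitting stands in for a real tokenizer.

```python
def edge_ngrams(text: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    """Prefixes of each word, from min_gram up to max_gram characters long."""
    grams: list[str] = []
    for word in text.lower().split():
        # only prefixes anchored at position 0, capped at the word length and max_gram;
        # a word shorter than min_gram produces no grams at all
        upper = min(len(word), max_gram)
        grams.extend(word[:n] for n in range(min_gram, upper + 1))
    return grams


if __name__ == "__main__":
    print(edge_ngrams("John Smith"))
```

Capping with `max_gram` matters for long words: anything beyond that length is simply never indexed as a prefix.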
Step 1: Set up the edge N-gram filter
{
    "filter" : {
        "autocomplete" : {
            "type" : "edge_ngram",
            "min_gram" : 1,
            "max_gram" : 20
        }
    }
}
Step 2: Set up the analyzers
{
    "analyzer" : {
        "name" : {
            "type" : "standard",
            "stopwords" : [] // no stopwords, so we don't end up removing 'A' from the likes of 'A A Miller'
        },
        "name_autocomplete" : {
            "type" : "custom",
            "tokenizer" : "standard",
            "filter" : ["lowercase", "autocomplete"] // references the autocomplete filter we created in Step 1
        }
    }
}
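To see what the two analyzers emit, here is a rough Python emulation. The regex is only an approximation of the standard tokenizer (which follows Unicode word-boundary rules), and all function names are hypothetical.

```python
import re


def standard_tokenize(text: str) -> list[str]:
    # approximation: runs of letters/digits; the real standard tokenizer uses UAX #29 rules
    return re.findall(r"[^\W_]+", text)


def lowercase(tokens: list[str]) -> list[str]:
    return [t.lower() for t in tokens]


def autocomplete_filter(tokens: list[str], min_gram: int = 1, max_gram: int = 20) -> list[str]:
    # emulates the edge_ngram filter from Step 1 (min_gram 1, max_gram 20)
    out: list[str] = []
    for t in tokens:
        out.extend(t[:n] for n in range(min_gram, min(len(t), max_gram) + 1))
    return out


def name_analyzer(text: str) -> list[str]:
    # "standard" analyzer with stopwords: [] -> tokenize + lowercase, nothing removed
    return lowercase(standard_tokenize(text))


def name_autocomplete_analyzer(text: str) -> list[str]:
    # custom analyzer: standard tokenizer -> lowercase -> autocomplete (edge n-grams)
    return autocomplete_filter(lowercase(standard_tokenize(text)))


if __name__ == "__main__":
    print(name_analyzer("A A Miller"))
    print(name_autocomplete_analyzer("A A Miller"))
```

Because the stopword list is empty, the two "a" tokens survive in both outputs.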
Step 3: Now that we've defined the analyzers, we need to apply them to the name field.
Previously, the name field was mapped as {"name": {"type" : "string"} }, but now that we want to use it in two different ways, we need to declare it as a multi_field:
{
    "name" : {
        "type" : "multi_field",
        "fields" : {
            "name" : { "type" : "string", "analyzer" : "name" }, // "name" is the analyzer we created in Step 2
            "autocomplete" : {
                "type" : "string",
                "index_analyzer" : "name_autocomplete", // at index time we store j, jo, joh, john
                "search_analyzer" : "name" // at search time we only search for what the user actually typed (i.e. just joh, not j, jo, joh)
            }
        }
    }
}
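Why a separate search_analyzer? A toy Python comparison, assuming whitespace tokenization and the match query's default OR behavior (function names are hypothetical): if the query "joh" were also run through the edge n-gram analyzer, it would expand to j, jo, joh, and the lone "j" would pull in "Jane Doe".

```python
def edge_ngrams(word: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    return [word[:n] for n in range(min_gram, min(len(word), max_gram) + 1)]


def analyze_name(text: str) -> list[str]:
    # approximates the "name" analyzer: lowercase + whitespace split
    return text.lower().split()


def analyze_autocomplete(text: str) -> list[str]:
    # approximates "name_autocomplete": lowercase + split + edge n-grams
    return [g for w in analyze_name(text) for g in edge_ngrams(w)]


def matches(doc: str, query: str, search_analyzer) -> bool:
    """True if any query term equals any indexed term (OR semantics)."""
    indexed = set(analyze_autocomplete(doc))  # index_analyzer side
    return any(term in indexed for term in search_analyzer(query))


if __name__ == "__main__":
    docs = ["John Smith", "Jane Doe"]
    for analyzer in (analyze_name, analyze_autocomplete):
        hits = [d for d in docs if matches(d, "joh", analyzer)]
        print(analyzer.__name__, hits)
```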
Step 4: Delete the old index: DELETE /indexName
Step 5: Recreate the index with the new settings and mappings (PUT /indexName):
{
    "settings" : {
        "analysis" : {
            "analyzer" : { ... },
            "filter" : { ... }
        }
    },
    "mappings" : {
        "tweet" : {
            "properties" : { ... }
        }
    }
}
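Putting Steps 1 to 3 together, here is a Python sketch that assembles a full create-index body from the fragments above. It keeps the legacy syntax used in these notes ("string", "multi_field", "index_analyzer"; newer releases use "text" with "fields", a plain "analyzer", and no mapping types) and "tweet" as the type name. It only prints the JSON; the host in the comment is an assumption.

```python
import json

# Fragments from Steps 1 and 2 (legacy Elasticsearch syntax, as in these notes)
ANALYSIS = {
    "filter": {
        "autocomplete": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20}
    },
    "analyzer": {
        "name": {"type": "standard", "stopwords": []},
        "name_autocomplete": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "autocomplete"],
        },
    },
}

# Mapping from Step 3
NAME_MAPPING = {
    "name": {
        "type": "multi_field",
        "fields": {
            "name": {"type": "string", "analyzer": "name"},
            "autocomplete": {
                "type": "string",
                "index_analyzer": "name_autocomplete",
                "search_analyzer": "name",
            },
        },
    }
}


def create_index_body(doc_type: str = "tweet") -> dict:
    return {
        "settings": {"analysis": ANALYSIS},
        "mappings": {doc_type: {"properties": NAME_MAPPING}},
    }


if __name__ == "__main__":
    # e.g. save to body.json and send with: curl -XPUT localhost:9200/indexName -d @body.json
    print(json.dumps(create_index_body(), indent=2))
```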
Now we can run the autocomplete query.
At its simplest, we can do this:
{
    "match" : {
        "name.autocomplete" : "john smi"
    }
}
But we should also boost results that contain full-word matches (john) on top of the partial n-gram matches (smi):
{
    "bool": {
        "must": {
            "match": {
                "name.autocomplete": "john smi"
            }
        },
        "should": { // matching this clause earns extra score; a "John Smith" document does, because john matches
            "match": {
                "name": "john smi" // note we're checking against the name field, not name.autocomplete, so only john will match
            }
        }
    }
}
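A toy Python model of the bool query above. It does not reproduce Elasticsearch's relevance scoring; it just counts matched terms, which is enough to show the should clause lifting exact word matches to the top. Whitespace tokenization is assumed for both fields, and the function names are hypothetical.

```python
def edge_ngrams(word: str, max_gram: int = 20) -> list[str]:
    return [word[:n] for n in range(1, min(len(word), max_gram) + 1)]


def tokens(text: str) -> list[str]:
    # approximates the "name" analyzer, used at search time for both fields
    return text.lower().split()


def toy_score(doc: str, query: str):
    """None if the must clause fails, else a crude count-based score:
    one point per query term found in name.autocomplete (must)
    plus one per query term found in the plain name field (should)."""
    q = tokens(query)
    autocomplete = {g for w in tokens(doc) for g in edge_ngrams(w)}
    full_words = set(tokens(doc))
    must_hits = sum(t in autocomplete for t in q)
    if must_hits == 0:  # OR semantics: at least one term has to match
        return None
    should_hits = sum(t in full_words for t in q)
    return must_hits + should_hits


def rank(docs: list[str], query: str) -> list[str]:
    scored = [(toy_score(d, query), d) for d in docs]
    hits = [(s, d) for s, d in scored if s is not None]
    return [d for s, d in sorted(hits, key=lambda p: -p[0])]


if __name__ == "__main__":
    print(rank(["Johnny Smithers", "John Smith", "Jane Doe"], "john smi"))
```

Both "John Smith" and "Johnny Smithers" satisfy the must clause, but only "John Smith" also contains the whole word john, so it ranks first.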