@eliasah
Last active July 7, 2016 21:33
[elasticsearch] compute K-nearest neighbors for training a classifier
##########################################################################################
# use case: training a classifier
#
# Many systems classify documents by assigning “tag” or “category” fields. Classifying
# documents can be a tedious manual process and so in this example we will train a classifier
# to automatically spot keywords in new documents that suggest a suitable category.
curl -XGET "http://localhost:9200/products_fr/_search" -d'
{
  "query": {
    "function_score": {
      "query": {
        "query_string": {
          "query": "samsung",
          "default_operator": "AND",
          "fields": [
            "title^3",
            "description"
          ]
        }
      },
      "functions": [
        {
          "script_score": {
            "script": "_score * log1p(doc[\"hits\"].value) + log1p(doc[\"hits\"].value)"
          }
        }
      ]
    }
  },
  "aggregations": {
    "knn": {
      "significant_terms": {
        "field": "title",
        "size": 3
      }
    }
  },
  "size": 0
}'
##########################################################################################
# same use case, simpler variant: a plain query_string query without function_score
curl -XGET "http://localhost:9200/products_fr/_search" -d'
{
  "query": {
    "query_string": {
      "fields": ["title", "description"],
      "query": "galaxy"
    }
  },
  "aggregations": {
    "knn": {
      "significant_terms": {
        "field": "title",
        "size": 3
      }
    }
  },
  "size": 0
}'
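
The two requests above differ mainly in the search term, so collecting keywords for several terms means repeating the same body. A minimal sketch of one way to parametrize it follows. The `build_body` helper and its argument order are hypothetical and not part of the original gist, and it assumes the term contains no double quotes or backslashes, because the value is pasted into the JSON unescaped.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original gist): print the request body
# for one search term. Assumes the term contains no double quotes or
# backslashes, since it is pasted into the JSON without escaping.
build_body() {
  term="$1"            # search term, e.g. galaxy
  field="${2:-title}"  # field to pull significant terms from
  size="${3:-3}"       # number of significant terms to return
  cat <<EOF
{
  "query": {
    "query_string": {
      "fields": ["title", "description"],
      "query": "$term"
    }
  },
  "aggregations": {
    "knn": {
      "significant_terms": {
        "field": "$field",
        "size": $size
      }
    }
  },
  "size": 0
}
EOF
}

# Print the body for two example terms; nothing is sent to a server here.
for term in samsung galaxy; do
  build_body "$term" title 3
done
```

To actually run a query, the output can be piped into curl, which reads the request body from standard input when given `-d @-`, for example `build_body galaxy | curl -XGET "http://localhost:9200/products_fr/_search" -d @-`.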
@archit12

Can you please explain how this can be used for classifying new documents? I have something similar to do, and this might work.

@FuadEfendi

Nothing here is related to K-Nearest-Neighbour. The provided queries are aggregations; they are not even of the "More-Like-This" type.