@rajivmehtaflex
Last active December 27, 2020 13:41
ElasticSearch
Boost is useful when a query combines multiple query clauses, for example:
{
"bool":{
"should":[
{
"match":{
"clause1":{
"query":"query1",
"boost":3
}
}
},
{
"match":{
"clause2":{
"query":"query2",
"boost":2
}
}
},
{
"match":{
"clause3":{
"query":"query1",
"boost":1
}
}
}
]
}
}
In the query above, clause1 is weighted three times as heavily as clause3, and clause2 twice as heavily as clause3. The final scores are not simply multiplied by 3 and 2, because scores are normalized during calculation.
Also, a boost on a query with only one clause is not useful: it scales every score by the same factor, so the ranking does not change.
A usage scenario for boost:
A set of page documents with title and content fields.
You want to search both title and content for some terms, and you consider a title match more important than a content match. Set a higher boost on the title query clause than on the content query clause. Then, if one document matches on its title field and another matches only on its content field, the title match is ranked ahead of the content match.
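The scenario above can be sketched in Python as a small helper that builds the request body. The function name title_content_query and the boost values 3.0 and 1.0 are illustrative assumptions, not something the notes prescribe; the field names title and content come from the scenario.

```python
import json

def title_content_query(text, title_boost=3.0, content_boost=1.0):
    """Build a bool/should query that weights title matches above content matches.

    The boost values are illustrative assumptions; tune them for real data.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"title": {"query": text, "boost": title_boost}}},
                    {"match": {"content": {"query": text, "boost": content_boost}}},
                ]
            }
        }
    }

body = title_content_query("elasticsearch guide")
print(json.dumps(body, indent=2))
```

The resulting dict can be passed as the search body to the Python client, or pasted into the console after a GET /<index>/_search line.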
Bool Query Clause
Query clauses that are built from other query clauses are called
compound query clauses. Note that compound query clauses can also be
composed of other compound query clauses, allowing for multi-layer
nesting.
The bool query clause is an example of a compound query clause, as it is
used to combine multiple query clauses using boolean operators. The
three boolean operators "must", "must_not", and "should" correspond to AND, NOT, and OR, respectively. A fourth clause type, "filter", behaves like "must" but does not contribute to the score.
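A minimal sketch of the nesting described above, written as a Python helper that assembles bool queries from other clauses. The helper name bool_query and its parameter names are hypothetical; the field names are borrowed from the book_index example further down.

```python
def bool_query(must=None, should=None, must_not=None, filters=None):
    """Return a bool query dict, leaving out occurrence types with no clauses."""
    occurrences = {
        "must": must,
        "filter": filters,
        "should": should,
        "must_not": must_not,
    }
    return {
        "bool": {name: list(clauses) for name, clauses in occurrences.items() if clauses}
    }

# Inner compound clause: title mentions solr OR elasticsearch.
either_title = bool_query(
    should=[{"match": {"title": "solr"}}, {"match": {"title": "elasticsearch"}}]
)

# Outer compound clause: (inner clause) AND NOT publisher == oreilly.
outer = bool_query(must=[either_title], must_not=[{"match": {"publisher": "oreilly"}}])
print(outer)
```

Because the inner bool query is just another clause, it can be placed inside must, should, must_not, or filter of the outer one.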
# install es server
!apt install default-jdk > /dev/null
!wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.tar.gz -q --show-progress
!tar -xzf elasticsearch-6.5.4.tar.gz
!chown -R daemon:daemon elasticsearch-6.5.4
# start server
import os
from subprocess import Popen, PIPE, STDOUT
es_server = Popen(['elasticsearch-6.5.4/bin/elasticsearch'],
stdout=PIPE, stderr=STDOUT,
preexec_fn=lambda: os.setuid(1) # as daemon
)
# client-side
!pip install elasticsearch -q
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.ping() # got True
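The server started above usually needs some time before it answers, so a ping issued right away may return False. A possible way to wait for it is sketched below. The helper wait_until_ready and its defaults (30 attempts, 2 seconds apart) are assumptions; the demo uses a stand-in ping function so the snippet runs without a server.

```python
import time

def wait_until_ready(ping, attempts=30, delay=2.0, sleep=time.sleep):
    """Call ping() until it returns True; report whether the server came up."""
    for _ in range(attempts):
        try:
            if ping():
                return True
        except Exception:
            # Depending on the client version a failed connection may raise
            # instead of returning False; treat both as "not ready yet".
            pass
        sleep(delay)
    return False

# Stand-in for es.ping that succeeds on the third call (demo only).
calls = {"n": 0}

def fake_ping():
    calls["n"] += 1
    return calls["n"] >= 3

ready = wait_until_ready(fake_ping, attempts=5, delay=0, sleep=lambda seconds: None)
print(ready, calls["n"])
```

With the real client this would be called as wait_until_ready(es.ping).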
docker run -d -p 9200:9200 -p 9300:9300 -h elasticsearch --name mm420 -e "discovery.type=single-node" --restart always elasticsearch:7.4.2
docker run -d --link mm420:elasticsearch -p 5601:5601 --restart always kibana:7.4.2
PUT /book_index
{
"settings": {
"number_of_shards": 1
}
}
PUT /book_index/_bulk
{"index":{"_id":1}}
{"title":"Elasticsearch: The Definitive Guide","authors":["clinton gormley","zachary tong"],"summary":"A distributed real-time search and analytics engine","publish_date":"2015-02-07","num_reviews":20,"publisher":"oreilly"}
{"index":{"_id":2}}
{"title":"Taming Text: How to Find, Organize, and Manipulate It","authors":["grant ingersoll","thomas morton","drew farris"],"summary":"organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization","publish_date":"2013-01-24","num_reviews":12,"publisher":"manning"}
{"index":{"_id":3}}
{"title":"Elasticsearch in Action","authors":["radu gheorge","matthew lee hinman","roy russo"],"summary":"build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms","publish_date":"2015-12-03","num_reviews":18,"publisher":"manning"}
{"index":{"_id":4}}
{"title":"Solr in Action","authors":["trey grainger","timothy potter"],"summary":"Comprehensive guide to implementing a scalable search engine using Apache Solr","publish_date":"2014-04-05","num_reviews":23,"publisher":"manning"}
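The bulk body above is newline-delimited JSON: an action line followed by a source line for each document, with a newline at the very end. A small sketch of how such a payload could be generated from Python follows; to_bulk_body is a hypothetical helper and the two shortened documents are only for illustration.

```python
import json

def to_bulk_body(docs):
    """Serialize (doc_id, source) pairs into newline-delimited JSON for _bulk."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                      # document line
    # The bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"

books = [
    (1, {"title": "Elasticsearch: The Definitive Guide", "num_reviews": 20}),
    (4, {"title": "Solr in Action", "num_reviews": 23}),
]
payload = to_bulk_body(books)
print(payload)
```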
GET /book_index/_search
{
"query": {
"term": {
"summary": "build"
}
}
}
GET /book_index/_search
{
"_source": [
"title",
"num_reviews"
],
"size": 10,
"query": {
"match": {
"title": "in"
}
}
}
PUT my-index
{
"settings": {
"index.knn": true
},
"mappings": {
"properties": {
"my_vector1": {
"type": "knn_vector",
"dimension": 2
},
"my_vector2": {
"type": "knn_vector",
"dimension": 4
}
}
}
}
POST _bulk
{ "index": { "_index": "my-index", "_id": "1" } }
{ "my_vector1": [1.5, 2.5], "price": 12.2 }
{ "index": { "_index": "my-index", "_id": "2" } }
{ "my_vector1": [2.5, 3.5], "price": 7.1 }
{ "index": { "_index": "my-index", "_id": "3" } }
{ "my_vector1": [3.5, 4.5], "price": 12.9 }
{ "index": { "_index": "my-index", "_id": "4" } }
{ "my_vector1": [5.5, 6.5], "price": 1.2 }
{ "index": { "_index": "my-index", "_id": "5" } }
{ "my_vector1": [4.5, 5.5], "price": 3.7 }
{ "index": { "_index": "my-index", "_id": "6" } }
{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3 }
{ "index": { "_index": "my-index", "_id": "7" } }
{ "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5 }
{ "index": { "_index": "my-index", "_id": "8" } }
{ "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 }
{ "index": { "_index": "my-index", "_id": "9" } }
{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 }
GET my-index/_search
{
"size": 2,
"query": {
"knn": {
"my_vector2": {
"vector": [2, 3, 5, 6],
"k": 2
}
}
}
}
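The knn_vector type and the knn query come, as far as I know, from the Open Distro k-NN plugin rather than from stock Elasticsearch. The plugin performs an approximate search; the sketch below computes the exact nearest neighbours in plain Python as a sanity check, assuming Euclidean (l2) distance. The function exact_knn is hypothetical, and the vectors are the my_vector2 values indexed above.

```python
import math

# my_vector2 values from the bulk request above, keyed by document id.
vectors = {
    "6": [1.5, 5.5, 4.5, 6.4],
    "7": [2.5, 3.5, 5.6, 6.7],
    "8": [4.5, 5.5, 6.7, 3.7],
    "9": [1.5, 5.5, 4.5, 6.4],
}

def exact_knn(query, candidates, k):
    """Return the ids of the k candidates closest to query by Euclidean distance."""
    ranked = sorted(candidates, key=lambda doc_id: math.dist(query, candidates[doc_id]))
    return ranked[:k]

nearest = exact_knn([2, 3, 5, 6], vectors, k=2)
print(nearest)
```

Document 7 is the closest match. Documents 6 and 9 hold identical vectors, so either of them can legitimately appear as the second hit.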
https://hackernoon.com/building-a-k-nn-similarity-search-engine-using-amazon-elasticsearch-and-sagemaker-zx583yr7
https://blog.logrocket.com/exploring-sql-elasticsearch-open-distro/
https://towardsdatascience.com/building-a-k-nn-similarity-search-engine-using-amazon-elasticsearch-and-sagemaker-98df18d883bd
https://towardsdatascience.com/elasticsearch-meets-bert-building-search-engine-with-elasticsearch-and-bert-9e74bf5b4cf2
https://medium.com/analytics-vidhya/elasticbert-information-retrieval-using-bert-and-elasticsearch-51fef465b9ae
Match Query Clause
The match query clause is the standard query for full-text, number, date, and boolean fields. Text passed to it is analyzed before matching.
{ "match": { "description": "Fourier analysis signals processing" }}
{ "match": { "date": "2014-09-01" }}
{ "match": { "visible": true }}
Multi Match Query Clause
The multi match query clause is a match query that is run across multiple
fields instead of just one.
{
"multi_match": {
"query": "probability theory",
"fields": ["title", "body"]
}
}
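As I understand the documented behaviour, a multi_match query of the default best_fields type is run as one match query per field wrapped in a dis_max query. The sketch below spells out that expansion; expand_multi_match is a hypothetical helper, and the tie_breaker default of 0.0 is my assumption of the default.

```python
def expand_multi_match(query, fields, tie_breaker=0.0):
    """Rewrite a best_fields style multi_match as an explicit dis_max query."""
    return {
        "dis_max": {
            "queries": [{"match": {field: query}} for field in fields],
            "tie_breaker": tie_breaker,
        }
    }

expanded = expand_multi_match("probability theory", ["title", "body"])
print(expanded)
```

With dis_max, a document's score is taken from its best-matching field, which is why one strong title match can outrank weak matches spread over both fields.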
Range Filter Query Clause
The range filter query clause is used to filter number and date fields in
ranges, using the operators gt, gte, lt, and lte, short for greater_than,
greater_than_or_equal, less_than, and less_than_or_equal, respectively.
{ "range" : { "age" : { "gt" : 30 } } }
{
"range": {
"born" : {
"gte": "01/01/2012",
"lte": "2013",
"format": "dd/MM/yyyy||yyyy"
}
}
}
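To make the four operators concrete, here is a small Python sketch that applies the same bounds to a plain number. The function in_range is hypothetical and only mirrors the comparison semantics; it does not cover date parsing or the format option.

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Check a value against optional bounds, named like the range operators."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True

ages = [25, 30, 31, 47]
# Same condition as { "range" : { "age" : { "gt" : 30 } } }
older_than_30 = [age for age in ages if in_range(age, gt=30)]
print(older_than_30)
```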
GET /gmmv2/_search
{
"_source": [],
"size": 1,
"from": 11,
"query": {
"bool": {
"filter": [],
"must": [],
"must_not": [],
"should": []
}
}
}
GET /gmmv2/_search
{
"_source": [
"title",
"director"
],
"size": 10,
"query": {
"bool": {
"must": [],
"filter": [
{
"exists": {
"field": "rating"
}
}
],
"should": [
{
"match_phrase": {
"director": "Richard"
}
}
],
"must_not": []
}
},
"aggs": {
"sample": {
"terms": {
"field": "rating",
"order": {
"_count": "desc"
},
"size": 1500
}
}
},
"highlight": {
"pre_tags": [
"<b>"
],
"post_tags": [
"</b>"
],
"tags_schema": "styled",
"fields": {
"director": {}
}
}
}
GET /gmmv2/_search
{
"query": {
"bool": {
"filter": [],
"must": [],
"must_not": [],
"should": []
}
},
"suggest": {
"foo": {
"text": "Richard f",
"term": {
"field": "director"
}
}
}
}
GET gmmv2/_search
{
"_source": "title",
"size": 20,
"query": {
"bool": {
"should": [
{
"match_phrase": {
"director": "Richard"
}
}
]
}
},
"highlight": {
"pre_tags": [
"<b>"
],
"post_tags": [
"</b>"
],
"tags_schema": "styled",
"fields": {
"director": {}
}
}
}
PUT gmmv3
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"myloc": {
"type": "geo_point"
}
}
}
}
GET gmmv3/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": "100km",
"myloc": {
"lat": 48.75,
"lon": -121.48
}
}
}
}
}
}
PUT gmmv3/_doc/1
{
"myloc": {
"lat": 48.75,
"lon": -121.48
},
"skills": [
"vb.net",
"c++"
],
"names": [
{
"firstname": "ray",
"lastname": "marker",
"locality": "bellingham",
"region": "washington",
"subregion": "whatcom county",
"country": "united states",
"continent": "north america",
"type": "locality",
"postal_code": "98225",
"most_recent": false,
"is_primary": true
}
]
}
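The geo_distance filter above keeps documents whose myloc lies within 100km of the given point. A rough Python sketch of that check follows, using the haversine formula on a spherical Earth. The function names, the 6371 km radius, and the second test point are assumptions; Elasticsearch's own distance calculation may differ slightly.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_distance(center, point, distance_km):
    """Mimic a geo_distance filter for a single point."""
    return haversine_km(center["lat"], center["lon"], point["lat"], point["lon"]) <= distance_km

center = {"lat": 48.75, "lon": -121.48}      # point used in the query above
doc_point = {"lat": 48.75, "lon": -121.48}   # myloc of document 1
far_point = {"lat": 47.61, "lon": -122.33}   # illustrative point more than 1 degree of latitude away

print(within_distance(center, doc_point, 100))
print(within_distance(center, far_point, 100))
```

Document 1 sits exactly on the query point, so it passes the 100km filter; the illustrative second point is more than 100km away and would be filtered out.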
GET /gmmv2/_search
{
"_source": [
"title",
"description"
],
"size": 2000,
"min_score": 0.7,
"query": {
"query_string": {
"fields": [
"title",
"description"
],
"query": "(iron man OR hulk)",
"minimum_should_match": "50%"
}
}
}
GET /gmmv2/_search
{
"_source": ["title","description"],
"min_score":0.7,
"query": {
"bool": {
"must": [
{
"query_string": {"query":"title:(Iron man)"}
},
{
"query_string": {"query":"description:(Iron man)"}
}
]
}
}
}
Term/Terms Query Clause
The term and terms query clauses filter on the exact value of a field,
by a single value or by multiple values, respectively. In the case of multiple
values, the logical connection is OR.
For example, the first query finds all documents with the tag “math”. The
second query finds all documents with the tags “math” or “statistics”.
{ "term": { "tag": "math" }}
{ "terms": { "tag": ["math", "statistics"] }}
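The exact-match and OR semantics of term and terms can be mimicked in a few lines of Python. The helpers matches_term and matches_terms and the sample documents are hypothetical; real term queries compare against indexed tokens, which this sketch ignores.

```python
def matches_term(doc, field, value):
    """True if the field holds exactly this value (single value or list of values)."""
    stored = doc.get(field, [])
    values = stored if isinstance(stored, list) else [stored]
    return value in values

def matches_terms(doc, field, values):
    """True if the field holds at least one of the given values (logical OR)."""
    return any(matches_term(doc, field, value) for value in values)

docs = [
    {"id": 1, "tag": ["math", "physics"]},
    {"id": 2, "tag": "statistics"},
    {"id": 3, "tag": ["history"]},
]

math_ids = [d["id"] for d in docs if matches_term(d, "tag", "math")]
math_or_stats_ids = [d["id"] for d in docs if matches_terms(d, "tag", ["math", "statistics"])]
print(math_ids, math_or_stats_ids)
```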