Last active
December 27, 2020 13:41
ElasticSearch
Boost matters when a query combines several query clauses. For example:
{ | |
"bool":{ | |
"should":[ | |
{ | |
"match":{ | |
"clause1":{ | |
"query":"query1", | |
"boost":3 | |
} | |
} | |
}, | |
{ | |
"match":{ | |
"clause2":{ | |
"query":"query2", | |
"boost":2 | |
} | |
} | |
}, | |
{ | |
"match":{ | |
"clause3":{ | |
"query":"query1", | |
"boost":1 | |
} | |
} | |
} | |
] | |
} | |
} | |
In the query above, clause1 is weighted three times as heavily as clause3, and clause2 twice as heavily as clause3. The final scores are not simply multiplied by 3 and 2, because scores are normalized when they are calculated.
Boosting is also of no use if the query has only a single clause, since there is nothing to weigh it against.
A usage scenario for boost:
A set of page documents, each with a title field and a content field.
You want to search both title and content for some terms, and you consider a match in the title more important than a match in the content. Give the title clause a higher boost than the content clause. Then, if one document matches on its title and another matches only on its content, the title match tends to rank ahead of the content match.
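The title-over-content scenario could be sketched in Python roughly as follows. This is only an illustration: the function name, the default boost values, and the "pages" index name are assumptions, not part of the original notes.

```python
def build_boosted_query(terms, title_boost=3.0, content_boost=1.0):
    """Build a bool/should query that weights title matches above content matches."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"title": {"query": terms, "boost": title_boost}}},
                    {"match": {"content": {"query": terms, "boost": content_boost}}},
                ]
            }
        }
    }


body = build_boosted_query("elasticsearch guide")

# With a running cluster and the official Python client, the body could be sent
# along these lines ("pages" is a hypothetical index name):
# es.search(index="pages", body=body)
```

Only the ratio between the two boosts matters here; 3 and 1 are arbitrary starting points to tune against real results.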
Bool Query Clause
Query clauses that are built from other query clauses are called compound query clauses. Compound query clauses can themselves contain other compound query clauses, allowing for multi-layer nesting.
The bool query clause is an example of a compound query clause: it combines multiple query clauses using boolean operators. The three boolean operators covered here are "must", "must_not", and "should", which correspond to AND, NOT, and OR, respectively. A fourth section, "filter", behaves like "must" but does not contribute to the score.
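As a rough sketch of the nesting described above, a small Python helper can assemble bool clauses and place one inside another. The helper name and the "tag" and "visible" fields are illustrative assumptions.

```python
def bool_query(must=None, must_not=None, should=None):
    """Return a bool clause, leaving out any section that has no clauses."""
    sections = {"must": must, "must_not": must_not, "should": should}
    return {"bool": {name: clauses for name, clauses in sections.items() if clauses}}


# (tag is "math" OR tag is "statistics") AND NOT (visible is false)
inner = bool_query(should=[{"term": {"tag": "math"}},
                           {"term": {"tag": "statistics"}}])
outer = bool_query(must=[inner],
                   must_not=[{"term": {"visible": False}}])
```

The inner bool is itself a compound clause, so it can be dropped into the outer "must" list like any other clause.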
# install es server
!apt install default-jdk > /dev/null
!wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.tar.gz -q --show-progress
!tar -xzf elasticsearch-6.5.4.tar.gz
!chown -R daemon:daemon elasticsearch-6.5.4
# start server
import os
from subprocess import Popen, PIPE, STDOUT
es_server = Popen(['elasticsearch-6.5.4/bin/elasticsearch'],
                  stdout=PIPE, stderr=STDOUT,
                  preexec_fn=lambda: os.setuid(1)  # run as the daemon user (uid 1)
                  )
# client-side (pin the client to the server's major version)
!pip install "elasticsearch>=6,<7" -q
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.ping()  # returns True once the server has finished starting
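The server needs some time to start, so a ping issued right after launching it may return False. A small polling helper, sketched below, can wait for it. The helper name, timeout, and interval are assumptions.

```python
import time


def wait_until(check, timeout=60.0, interval=1.0):
    """Call check() repeatedly until it returns a truthy value or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False


# Intended use with the client created above (needs the running server):
# wait_until(es.ping)
```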
docker run -d -p 9200:9200 -p 9300:9300 -h elasticsearch --name mm420 -e "discovery.type=single-node" --restart always elasticsearch:7.4.2 | |
docker run -d --link mm420:elasticsearch -p 5601:5601 --restart always kibana:7.4.2
PUT /book_index | |
{ | |
"settings": { | |
"number_of_shards": 1 | |
} | |
} | |
PUT /book_index/_bulk | |
{"index":{"_id":1}} | |
{"title":"Elasticsearch: The Definitive Guide","authors":["clinton gormley","zachary tong"],"summary":"A distributed real-time search and analytics engine","publish_date":"2015-02-07","num_reviews":20,"publisher":"oreilly"}
{"index":{"_id":2}} | |
{"title":"Taming Text: How to Find, Organize, and Manipulate It","authors":["grant ingersoll","thomas morton","drew farris"],"summary":"organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization","publish_date":"2013-01-24","num_reviews":12,"publisher":"manning"} | |
{"index":{"_id":3}} | |
{"title":"Elasticsearch in Action","authors":["radu gheorge","matthew lee hinman","roy russo"],"summary":"build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms","publish_date":"2015-12-03","num_reviews":18,"publisher":"manning"} | |
{"index":{"_id":4}} | |
{"title":"Solr in Action","authors":["trey grainger","timothy potter"],"summary":"Comprehensive guide to implementing a scalable search engine using Apache Solr","publish_date":"2014-04-05","num_reviews":23,"publisher":"manning"} | |
GET /book_index/_search | |
{ | |
"query": { | |
"term": { | |
"summary": "build" | |
} | |
} | |
} | |
GET /book_index/_search | |
{ | |
"_source": [ | |
"title", | |
"num_reviews" | |
], | |
"size": 10, | |
"query": { | |
"match": { | |
"title": "in" | |
} | |
} | |
} |
PUT my-index | |
{ | |
"settings": { | |
"index.knn": true | |
}, | |
"mappings": { | |
"properties": { | |
"my_vector1": { | |
"type": "knn_vector", | |
"dimension": 2 | |
}, | |
"my_vector2": { | |
"type": "knn_vector", | |
"dimension": 4 | |
} | |
} | |
} | |
} | |
POST _bulk | |
{ "index": { "_index": "my-index", "_id": "1" } } | |
{ "my_vector1": [1.5, 2.5], "price": 12.2 } | |
{ "index": { "_index": "my-index", "_id": "2" } } | |
{ "my_vector1": [2.5, 3.5], "price": 7.1 } | |
{ "index": { "_index": "my-index", "_id": "3" } } | |
{ "my_vector1": [3.5, 4.5], "price": 12.9 } | |
{ "index": { "_index": "my-index", "_id": "4" } } | |
{ "my_vector1": [5.5, 6.5], "price": 1.2 } | |
{ "index": { "_index": "my-index", "_id": "5" } } | |
{ "my_vector1": [4.5, 5.5], "price": 3.7 } | |
{ "index": { "_index": "my-index", "_id": "6" } } | |
{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3 } | |
{ "index": { "_index": "my-index", "_id": "7" } } | |
{ "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5 } | |
{ "index": { "_index": "my-index", "_id": "8" } } | |
{ "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 } | |
{ "index": { "_index": "my-index", "_id": "9" } } | |
{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 } | |
GET my-index/_search | |
{ | |
"size": 2, | |
"query": { | |
"knn": { | |
"my_vector2": { | |
"vector": [2, 3, 5, 6], | |
"k": 2 | |
} | |
} | |
} | |
} |
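To check what the k-NN query above should return, the same search can be done by brute force in Python over the four my_vector2 documents. This sketch assumes Euclidean distance. The plugin's search is approximate, so its results may differ on larger data.

```python
import math

# my_vector2 values of documents 6-9 from the bulk request above
vectors = {
    "6": [1.5, 5.5, 4.5, 6.4],
    "7": [2.5, 3.5, 5.6, 6.7],
    "8": [4.5, 5.5, 6.7, 3.7],
    "9": [1.5, 5.5, 4.5, 6.4],
}


def nearest(query, k):
    """Return the ids of the k vectors closest to query (exact, Euclidean)."""
    ranked = sorted(vectors.items(), key=lambda item: math.dist(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


top2 = nearest([2, 3, 5, 6], k=2)
```

Document 7 is the closest. Documents 6 and 9 hold identical vectors, so they tie for second place and either may be returned by the index.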
https://hackernoon.com/building-a-k-nn-similarity-search-engine-using-amazon-elasticsearch-and-sagemaker-zx583yr7 | |
https://blog.logrocket.com/exploring-sql-elasticsearch-open-distro/ | |
https://towardsdatascience.com/building-a-k-nn-similarity-search-engine-using-amazon-elasticsearch-and-sagemaker-98df18d883bd | |
https://towardsdatascience.com/elasticsearch-meets-bert-building-search-engine-with-elasticsearch-and-bert-9e74bf5b4cf2 | |
https://medium.com/analytics-vidhya/elasticbert-information-retrieval-using-bert-and-elasticsearch-51fef465b9ae |
{ "match": { "description": "Fourier analysis signals processing" }} | |
{ "match": { "date": "2014-09-01" }} | |
{ "match": { "visible": true }} |
Multi Match Query Clause | |
The multi match query clause is a match query that is run across multiple | |
fields instead of just one. | |
{ | |
"multi_match": { | |
"query": "probability theory", | |
"fields": ["title", "body"] | |
} | |
} |
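A multi match clause can also weight individual fields with the caret syntax (for example "title^3"). The Python sketch below builds such a clause. The helper name and the boost value are assumptions.

```python
def multi_match(query, fields, boosts=None):
    """Build a multi_match clause, appending ^boost to any field listed in boosts."""
    boosts = boosts or {}
    weighted = [f"{field}^{boosts[field]}" if field in boosts else field
                for field in fields]
    return {"multi_match": {"query": query, "fields": weighted}}


clause = multi_match("probability theory", ["title", "body"], boosts={"title": 3})
plain = multi_match("probability theory", ["title", "body"])
```

With boosts={"title": 3}, the fields list becomes ["title^3", "body"]. Without boosts, it is passed through unchanged.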
Range Filter Query Clause
The range filter query clause is used to filter number and date fields in ranges, using the operators gt, gte, lt, and lte, short for greater_than, greater_than_or_equal, less_than, and less_than_or_equal, respectively.
{ "range" : { "age" : { "gt" : 30 } } } | |
{ | |
"range": { | |
"born" : { | |
"gte": "01/01/2012", | |
"lte": "2013", | |
"format": "dd/MM/yyyy||yyyy" | |
} | |
} | |
} |
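The meaning of the four operators can be shown with a plain Python check over a list of numbers. This only mirrors the comparison semantics and does not cover date parsing or formats. The function name and sample ages are assumptions.

```python
import operator

_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, **bounds):
    """Return True if value satisfies every given bound, e.g. in_range(31, gt=30)."""
    unknown = set(bounds) - set(_OPS)
    if unknown:
        raise ValueError(f"unsupported operators: {sorted(unknown)}")
    return all(_OPS[name](value, limit) for name, limit in bounds.items())


ages = [25, 30, 31, 40]
older_than_30 = [a for a in ages if in_range(a, gt=30)]
between_30_and_31 = [a for a in ages if in_range(a, gte=30, lte=31)]
```

Note that gt excludes the boundary value 30, while gte includes it.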
GET /gmmv2/_search | |
{ | |
"_source": [], | |
"size": 1, | |
"from": 11, | |
"query": { | |
"bool": { | |
"filter": [], | |
"must": [], | |
"must_not": [], | |
"should": [] | |
} | |
} | |
} | |
GET /gmmv2/_search | |
{ | |
"_source": [ | |
"title", | |
"director" | |
], | |
"size": "10", | |
"query": { | |
"bool": { | |
"must": [], | |
"filter": [ | |
{ | |
"exists": { | |
"field": "rating" | |
} | |
} | |
], | |
"should": [ | |
{ | |
"match_phrase": { | |
"director": "Richard" | |
} | |
} | |
], | |
"must_not": [] | |
} | |
}, | |
"aggs": { | |
"sample": { | |
"terms": { | |
"field": "rating", | |
"order": { | |
"_count": "desc" | |
}, | |
"size": "1500" | |
} | |
} | |
}, | |
"highlight": { | |
"pre_tags": [ | |
"<b>" | |
], | |
"post_tags": [ | |
"</b>" | |
], | |
"tags_schema": "styled", | |
"fields": { | |
"director": {} | |
} | |
} | |
} | |
GET /gmmv2/_search | |
{ | |
"query": { | |
"bool": { | |
"filter": [], | |
"must": [], | |
"must_not": [], | |
"should": [] | |
} | |
}, | |
"suggest": { | |
"foo": { | |
"text": "Richard f", | |
"term": { | |
"field": "director" | |
} | |
} | |
} | |
} | |
GET gmmv2/_search | |
{ | |
"_source": "title", | |
"size": 20, | |
"query": { | |
"bool": { | |
"should": [ | |
{ | |
"match_phrase": { | |
"director": "Richard" | |
} | |
} | |
] | |
} | |
}, | |
"highlight": { | |
"pre_tags": [ | |
"<b>" | |
], | |
"post_tags": [ | |
"</b>" | |
], | |
"tags_schema": "styled", | |
"fields": { | |
"director": {} | |
} | |
} | |
} | |
PUT gmmv3 | |
{ | |
"settings": { | |
"number_of_shards": 1, | |
"number_of_replicas": 0 | |
}, | |
"mappings": { | |
"properties": { | |
"myloc": { | |
"type": "geo_point" | |
} | |
} | |
} | |
} | |
GET gmmv3/_search | |
{ | |
"query": { | |
"bool": { | |
"must": { | |
"match_all": {} | |
}, | |
"filter": { | |
"geo_distance": { | |
"distance": "100km", | |
"myloc": { | |
"lat": 48.75, | |
"lon": -121.48 | |
} | |
} | |
} | |
} | |
} | |
} | |
PUT gmmv3/_doc/1 | |
{ | |
"myloc": { | |
"lat": 48.75, | |
"lon": -121.48 | |
}, | |
"skills": [ | |
"vb.net", | |
"c++" | |
], | |
"names": [ | |
{ | |
"firstname": "ray", | |
"lastname": "marker", | |
"locality": "bellingham", | |
"region": "washington", | |
"subregion": "whatcom county", | |
"country": "united states", | |
"continent": "north america", | |
"type": "locality", | |
"postal_code": "98225", | |
"most_recent": false, | |
"is_primary": true | |
} | |
] | |
} | |
GET /gmmv2/_search | |
{
  "_source": [
    "title",
    "description"
  ],
  "size": 2000,
  "min_score": 0.7,
  "query": {
    "query_string": {
      "fields": [
        "title",
        "description"
      ],
      "query": "(iron man OR hulk)",
      "minimum_should_match": "50%"
    }
  }
}
GET /gmmv2/_search | |
{
  "_source": ["title","description"],
  "min_score":0.7,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {"query":"title:(Iron man)"}
        },
        {
          "query_string": {"query":"description:(Iron man)"}
        }
      ]
    }
  }
}
Term/Terms Query Clause
The term and terms query clauses filter a field by exact value: term matches a single value, and terms matches any of several values. In the case of multiple values, the logical connection is OR.
For example, the first query finds all documents with the tag “math”. The | |
second query finds all documents with the tags “math” or “statistics”. | |
{ "term": { "tag": "math" }} | |
{ "terms": { "tag": ["math", "statistics"] }} |
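The difference between the two clauses can be mimicked in plain Python over a few in-memory documents. This is a sketch of the matching logic only. The sample documents and helper names are made up for illustration.

```python
docs = [
    {"id": 1, "tag": ["math"]},
    {"id": 2, "tag": ["statistics", "physics"]},
    {"id": 3, "tag": ["history"]},
]


def term(field, value):
    """Match documents whose field contains exactly this value."""
    return lambda doc: value in doc.get(field, [])


def terms(field, values):
    """Match documents whose field contains at least one of the values (OR)."""
    return lambda doc: any(v in doc.get(field, []) for v in values)


math_ids = [d["id"] for d in docs if term("tag", "math")(d)]
math_or_stats_ids = [d["id"] for d in docs if terms("tag", ["math", "statistics"])(d)]
```

Document 2 has no "math" tag but still matches the terms filter, because a single matching value is enough.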