-
Popularly known as the ELK stack (Elasticsearch, Logstash, Kibana)
-
Who uses ELK?
- To be Added
-
Cloud Providers (Managed Infrastructure for Search Implementations)
- Azure: https://azuremarketplace.microsoft.com/en-us/marketplace/apps/elastic.elasticsearch?tab=Reviews
- Google Cloud: https://console.cloud.google.com/marketplace/details/click-to-deploy-images/elasticsearch
- AWS Elasticsearch Service: https://aws.amazon.com/elasticsearch-service/
- Elasticsearch Cloud: https://www.elastic.co/cloud
- Amazon CloudSearch (it uses Apache Solr internally)
-
Other options
- Elasticsearch Docker containers: https://hub.docker.com/_/elasticsearch
- Apache Solr
-
Is it open source?
- Yes (ELK is open source)
- X-Pack is a paid feature
- Spring framework's official support
- Faster Time to market
- Fuzzy search
- Feature rich search support
- Auto complete
- Aggregation
- Sorting, paging, selective field retrieval
- Regex based search
- Ability to tweak scoring and ranking algorithms
- Fully managed Cloud implementations are available
- Community support (Q&A on Stack Overflow, blogs, documentation)
- Plenty of ways to learn this skill
- Pluralsight, Lynda, Udemy, YouTube
- New infrastructure cost
- It could take some time to tune the cluster for our search needs.
- There is some learning curve
- Staff training
- ES-specific, JSON-based query language.
- We still have to write Elasticsearch-specific code for:
- Indexing data
- Background job that upserts documents in the index, due to user activity
PROS:
- When people leave, finding Java and SQL skills in the market is easier than finding Elasticsearch skills (just a guess)
- Existing infrastructure is enough
CONS:
- We will have to code features that ES provides out of the box:
- Auto completion, paging, sorting, regex support, aggregation, etc.
- Change existing logic each time we have to support new requirements.
- Search could get slow.
- We will have to tune our code to make sure we don't breach the SLA or degrade the user experience
- Development & maintenance of the growing search related codebase
- Think through a search use case
- Indexing: Getting data into ES
- Each item in the index is a document, so decide the shape of the JSON that represents a document
- Do the appropriate mapping to ES data types
- Create an indexing job and an incremental indexing job that adds, updates, and deletes entries
- Mapping: Deciding the shape of the JSON and mapping each field to an ES data type
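As a rough illustration of the mapping step, the sketch below builds a create-index body for the submission index as a Python dict. The field names are taken from the queries later in these notes; the data types, the "submission" type name, and the choice to map evaluations as nested are assumptions, not the actual EMA mapping.

```python
import json

# Hypothetical mapping for the "submission" index (ES 6.x still needs a type name).
# Field names come from the queries in these notes; every data type is a guess.
SUBMISSION_MAPPING = {
    "mappings": {
        "submission": {
            "properties": {
                "submissionId": {"type": "keyword"},
                "studentId": {"type": "keyword"},
                "taskId": {"type": "keyword"},
                "status": {"type": "keyword"},
                "dateCreated": {"type": "date"},
                "dateUpdated": {"type": "date"},
                # "nested" keeps each evaluation's fields together when querying
                "evaluations": {
                    "type": "nested",
                    "properties": {
                        "evaluationId": {"type": "long"},
                        "status": {"type": "keyword"},
                        "evaluator": {
                            "properties": {
                                "employeeId": {"type": "keyword"},
                                "firstName": {"type": "text"},
                                "lastName": {"type": "text"},
                            }
                        },
                    },
                },
            }
        }
    }
}


def create_index_body(mapping):
    """Serialize the mapping as the JSON body for PUT /submission."""
    return json.dumps(mapping, indent=2)


if __name__ == "__main__":
    print(create_index_body(SUBMISSION_MAPPING))
```

The printed JSON could be pasted after PUT /submission in the Kibana console in place of the empty {} body.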
EMA: Evaluation management platform that enables evaluators to grade submissions easily. The following items are searchable in EMA:
- Assessment, task properties
- EMA User properties such as Roles, Permissions, Assigned Tasks
- Submission Search
- Evaluation Search
Getting Set Up Locally
- Install Elasticsearch 6.2.2
- Install Kibana 6.2.2
- As an alternative, you can get a Docker image that packs everything you need
Top features in Elasticsearch
- Aggregation
- Auto completion
- Full text search
- Paging, sorting
- Scoring and ranking of search results
Infrastructural action items:
- Run it locally
- Understand shard, index, cluster management and challenges with it
- Understand resource requirements
- Memory
- Compute
- Disk space
- Cost comparison
- Predict or project the cost of using ES as data size increases.
- Backup policy
- Time to reindex everything
- How to run it without downtime (availability)
- Running the managed service in AWS vs. running Elasticsearch on EC2 instances
The only dev tasks we need to do are to create:
- A job that builds the index.
- A job that scans the database tables to selectively insert, update or delete documents in the index.
- Options to continuously feed data to index as new data arrives.
- A service that feeds data to index?
- Create a scheduled service that checks a database to see if there are new entries or updated entries.
- Customize the document structure as per use case
- Logstash
- Data ingestion tool provided by Elastic
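The incremental job described above could hand its changes to the Elasticsearch _bulk API. The sketch below only builds the newline-delimited JSON payload and makes no network calls. The (operation, id, document) change tuples and the "submission" type name are assumptions made for illustration; how changes are detected in the database is left out.

```python
import json

INDEX = "submission"     # index name used throughout these notes
DOC_TYPE = "submission"  # ES 6.x type name; an assumption


def to_bulk_body(changes):
    """Turn (operation, id, document) tuples into a _bulk NDJSON payload.

    operation is "upsert" or "delete"; document is ignored for deletes.
    """
    lines = []
    for op, doc_id, doc in changes:
        meta = {"_index": INDEX, "_type": DOC_TYPE, "_id": str(doc_id)}
        if op == "delete":
            lines.append(json.dumps({"delete": meta}))
        elif op == "upsert":
            # "update" with doc_as_upsert inserts the document if it is missing
            lines.append(json.dumps({"update": meta}))
            lines.append(json.dumps({"doc": doc, "doc_as_upsert": True}))
        else:
            raise ValueError("unknown operation: %s" % op)
    # The bulk API expects the payload to end with a newline
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [("upsert", 4818, {"status": "64"}), ("delete", 4819, None)]
    print(to_bulk_body(sample), end="")
```

The resulting text would be sent as the body of POST /_bulk with Content-Type: application/x-ndjson.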
Some practical examples for querying data:
-
Query Use cases
- Nested query
- find submission by evaluator name, employee id, full name
- Search Term
- Starting with search term
- Ending with search term
- Containing search term
- Find by id ( number)
- Find by date range
- Find by regex search pattern
- Fuzzy Searches
- Aggregation
- Auto completion
- Nested query
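The search-term use cases above (starting with, ending with, containing, regex, fuzzy) map onto standard Elasticsearch query clauses. The sketch below builds those clauses as Python dicts. The helper names are made up for illustration, and whether a given field is mapped in a way that suits each clause is an assumption.

```python
def starts_with(field, term):
    """Prefix clause: values that begin with term."""
    return {"prefix": {field: term}}


def ends_with(field, term):
    """Wildcard clause with a leading *; leading wildcards can be slow."""
    return {"wildcard": {field: "*" + term}}


def contains(field, term):
    """Wildcard clause matching term anywhere in the value."""
    return {"wildcard": {field: "*" + term + "*"}}


def matches_regex(field, pattern):
    """Regexp clause; the pattern must match the whole term."""
    return {"regexp": {field: pattern}}


def fuzzy(field, text, fuzziness="AUTO"):
    """Match clause that tolerates small typos in text."""
    return {"match": {field: {"query": text, "fuzziness": fuzziness}}}


if __name__ == "__main__":
    # Each clause goes under "query", or inside a bool or nested query
    print({"query": starts_with("studentId", "0009")})
    print({"query": fuzzy("evaluations.evaluator.firstName", "Jenifer")})
```

Clauses on fields under evaluations would still need to be wrapped in a nested query, as in the nested examples in these notes.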
-
Boolean Query
- field1 and field2
- Field1 OR field2
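In the bool query, AND corresponds to must and OR corresponds to should. A minimal sketch of both, with hypothetical helper names and field values borrowed from the examples in these notes:

```python
def match(field, value):
    """A single match clause."""
    return {"match": {field: value}}


def all_of(*clauses):
    """field1 AND field2: every clause must match."""
    return {"query": {"bool": {"must": list(clauses)}}}


def any_of(*clauses):
    """field1 OR field2: at least one clause must match."""
    return {
        "query": {
            "bool": {"should": list(clauses), "minimum_should_match": 1}
        }
    }


if __name__ == "__main__":
    print(all_of(match("studentId", "000971426"), match("taskId", "263")))
    print(any_of(match("taskId", "107"), match("taskId", "183")))
```

Setting minimum_should_match explicitly keeps the OR behaviour the same even if the should clauses are later combined with must or filter clauses.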
-
Paging related
- Sort asc desc
- Size of result
- Specific page in the result set
- Get only certain columns in the result set
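The paging items above come down to four request fields: from, size, sort, and _source. The sketch below computes them from a 1-based page number. The function name and its defaults are assumptions; sorting on dateCreated is only an example.

```python
def page_request(query, page, size=10, sort_field=None, order="asc", fields=None):
    """Build a search body for one page of results (page numbers start at 1)."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    body = {
        "from": (page - 1) * size,  # offset of the first hit on this page
        "size": size,               # number of hits per page
        "query": query,
    }
    if sort_field:
        body["sort"] = [{sort_field: {"order": order}}]
    if fields is not None:
        body["_source"] = list(fields)  # return only these fields
    return body


if __name__ == "__main__":
    print(page_request({"match_all": {}}, page=3, size=10,
                       sort_field="dateCreated", order="desc",
                       fields=["studentId", "status"]))
```

By default Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting), so very deep paging needs a different approach such as search_after.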
Time is saved in UTC. For example, 2018-10-15 12:59:23 Mountain time is saved as 1539633563000 (epoch milliseconds), which is 2018-10-15 19:59:23 UTC.
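The conversion in the note above can be checked with a small sketch. It assumes a fixed UTC-7 offset for Mountain time, which is what the 7-hour shift in the note implies; a production job would more likely use a named time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-7 offset, matching the 7-hour shift in the note
LOCAL_OFFSET = timezone(timedelta(hours=-7))


def to_epoch_millis(local_text):
    """Convert 'YYYY-MM-DD HH:MM:SS' local time to epoch milliseconds."""
    local = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S")
    local = local.replace(tzinfo=LOCAL_OFFSET)
    return int(local.timestamp() * 1000)


def to_utc_text(epoch_millis):
    """Format epoch milliseconds as UTC in the MM/dd/yyyy HH:mm:ss query format."""
    utc = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return utc.strftime("%m/%d/%Y %H:%M:%S")


if __name__ == "__main__":
    millis = to_epoch_millis("2018-10-15 12:59:23")
    print(millis)               # 1539633563000
    print(to_utc_text(millis))  # 10/15/2018 19:59:23
```

The second value is the form used for gte and lte in the date range queries in these notes.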
Search in Date range
GET submission/_search
{
  "query": {
    "range": {
      "dateCreated": {
        "gte": "10/15/2018 19:59:23",
        "lte": "10/15/2018 19:59:24",
        "format": "MM/dd/yyyy HH:mm:ss"
      }
    }
  }
}
Get all the documents
GET /submission/_search?pretty
{
  "query": {
    "match_all": {}
  }
}
Get the mapping
GET /submission/submission/_mapping
GET /submission/
Delete the index
DELETE /submission
curl -XDELETE "http://localhost:9200/submission"
Create the index
curl -XPUT "http://localhost:9200/submission/" -H 'Content-Type: application/json' -d'
{}'
PUT /submission/
{}
Nested Query Example
curl -XGET "http://localhost:9200/submission/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "_source": false,
  "query": {
    "nested": {
      "path": "evaluations",
      "inner_hits": {},
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "evaluations.evaluationId": "56818"
              }
            },
            {
              "term": {
                "evaluations.evaluator.firstName": {
                  "value": "Jennifer"
                }
              }
            }
          ]
        }
      }
    }
  }
}'
Example: Combining Nested Query with other query
curl -XGET "http://localhost:9200/submission/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "evaluations",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "evaluations.evaluationId": "56818"
                    }
                  },
                  {
                    "term": {
                      "evaluations.evaluator.firstName": {
                        "value": "Jennifer"
                      }
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "range": {
            "dateCreated": {
              "gte": "10/15/2018 19:59:22",
              "lte": "10/15/2018 19:59:24",
              "format": "MM/dd/yyyy HH:mm:ss"
            }
          }
        }
      ]
    }
  }
}'
Aggregations
curl -XGET "http://localhost:9200/submission/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "user_defined": {
      "nested": {
        "path": "evaluations"
      },
      "aggs": {
        "user_defined_string": {
          "stats": {
            "field": "evaluations.evaluationId"
          }
        }
      }
    }
  }
}'
# of submissions by taskId grouped by submission-status
GET /submission/_search
{
  "size": 0,
  "query": {
    "match": {
      "taskId": "107"
    }
  },
  "aggs": {
    "number_of_submission": {
      "terms": {
        "field": "status"
      }
    }
  }
}
# of submissions by evaluator Id grouped by submission-status
GET /submission/_search
{
  "size": 0,
  "query": {
    "match": {
      "evaluations.evaluatorId": "E00104876"
    }
  },
  "aggs": {
    "number_of_submission": {
      "terms": {
        "field": "status"
      }
    }
  }
}
# of submissions by student and task Id grouped by submission-status
GET submission/_search?pretty
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "studentId": "000067181"
          }
        },
        {
          "match": {
            "taskId": "183"
          }
        }
      ]
    }
  },
  "aggs": {
    "submission_status": {
      "terms": {
        "field": "status",
        "size": 10
      }
    }
  }
}
Specifying fields in the result, size of result and the offset
GET submission/_search?pretty
{
  "_source": [
    "evaluations.evaluationId",
    "evaluations.evaluator.employeeId",
    "evaluations.status"
  ],
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "studentId": "000971426"
          }
        },
        {
          "match": {
            "taskId": "263"
          }
        }
      ]
    }
  }
}
EMA existing search features support
- Find submissions by Student Id
- Find submissions by Submission Id
- Find submissions by evaluator first name
- Find submissions by evaluator last name
- Find submissions by submission status
- < 24 hrs: 1 day ago
- < 72 hrs: 3 days ago
- < 7 days: 7 days ago
- < 30 days: 30 days ago
GET submission/_search?pretty
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "studentId": "000971426"
          }
        },
        {
          "match": {
            "submissionId": "4818"
          }
        },
        {
          "match": {
            "evaluations.evaluator.lastName": "Widick"
          }
        },
        {
          "match": {
            "evaluations.evaluator.firstName": "Lariann"
          }
        },
        {
          "match": {
            "taskId": "263"
          }
        },
        {
          "match": {
            "status": "64"
          }
        },
        {
          "range": {
            "dateUpdated": {
              "gte": "17/09/2018 05:54:37",
              "lte": "17/09/2018 05:54:37",
              "format": "dd/MM/yyyy HH:mm:ss"
            }
          }
        }
      ]
    }
  }
}
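The relative time windows in the feature list (< 24 hrs, < 72 hrs, < 7 days, < 30 days) could be expressed with Elasticsearch date math instead of fixed timestamps. The sketch below assumes the windows filter on dateUpdated, which the notes do not state, and the window labels are made up for illustration.

```python
# Hypothetical labels mapped to Elasticsearch date-math expressions
WINDOWS = {
    "<24hrs": "now-1d",
    "<72hrs": "now-3d",
    "<7days": "now-7d",
    "<30days": "now-30d",
}


def updated_within(label, field="dateUpdated"):
    """Range clause matching documents updated inside the given window."""
    return {"range": {field: {"gte": WINDOWS[label], "lte": "now"}}}


if __name__ == "__main__":
    # The clause can replace the fixed-date range inside the bool/must list
    print(updated_within("<72hrs"))
```

Because now is resolved by Elasticsearch at query time, the same request body keeps working without recomputing timestamps in application code.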
https://stackoverflow.com/questions/54754790/how-to-use-high-level-rest-client-in-spring-data-es-3-2-0-m1
https://github.com/spring-projects/spring-data-elasticsearch/blob/master/src/main/java/org/springframework/data/elasticsearch/core/ElasticsearchRestTemplate.java
https://docs.spring.io/spring-data/elasticsearch/docs/3.2.0.M1/reference/html/#reference