This lab is to be done on Linux.
- Installing Elasticsearch and Kibana
- Go to: https://www.elastic.co/start
- Extract the two archives
- Go to the ElasticSearch/config directory
- Open the jvm.options file
- Set the JVM heap sizes (example with 300 megabytes)
-Xms300m
-Xmx300m
- Starting the services
To launch Kibana, Elasticsearch must be running first. Open two terminals (one for Elasticsearch, one for Kibana) and start Elasticsearch with:
$ bin/elasticsearch
Check that you can reach localhost:9200. Then do the same for Kibana:
$ bin/kibana
Check that you can reach localhost:5601.
The goal of this part is to insert the following three documents:
Document 1:
- Title: Reactive Streams in Java
- Year: 2019
- Author: Adam L. Davis
- Publisher: Apress
- Language: English
Document 2:
- Title: Scala Machine Learning Projects
- Year: 2018
- Author: Md. Rezaul Karim
- Publisher: Packt
- Language: English
Document 3:
- Title: A Beginner's Guide to Scala, Object Orientation and Functional Programming
- Year: 2018
- Author: John Hunt
- Publisher: Springer
- Language: English
The code for this is:
# Create the ebook index
PUT ebook
# Insert the documents
POST ebook/_doc/1
{
"Title": "Reactive Streams in Java",
"Year": 2019,
"Author": "Adam L. Davis",
"Publisher": "Apress",
"Language": "English"
}
POST ebook/_doc/2
{
"Title": "Scala Machine Learning Projects",
"Year": 2018,
"Author": "Md. Rezaul Karim",
"Publisher": "Packt",
"Language": "English"
}
POST ebook/_doc/3
{
"Title": "A Beginner's Guide to Scala, Object Orientation and Functional Programming",
"Year": 2018,
"Author": "John Hunt",
"Publisher": "Springer",
"Language": "English"
}
# Count the documents in the ebook index
GET ebook/_count
Find the query that returns only the second document (Title: Scala Machine Learning Projects)
GET ebook/_search
{
"query": {
"match": {
"Title": "Scala Machine Learning Projects"
}
},
"size": 1
}
# Alternative
GET ebook/_search
{
"query": {
"term": {
"Title.keyword": "Scala Machine Learning Projects"
}
}
}
Delete the ebook index
# Delete the index
DELETE ebook
# Verify the deletion
GET ebook
Verifying the deletion of ebook produces the following output:
{
"error" : {
"root_cause" : [
{
"type" : "index_not_found_exception",
"reason" : "no such index [ebook]",
"resource.type" : "index_or_alias",
"resource.id" : "ebook",
"index_uuid" : "_na_",
"index" : "ebook"
}
],
"type" : "index_not_found_exception",
"reason" : "no such index [ebook]",
"resource.type" : "index_or_alias",
"resource.id" : "ebook",
"index_uuid" : "_na_",
"index" : "ebook"
},
"status" : 404
}
Re-create the ebook index with a better mapping. It must support both aggregation and full-text search on the authors and the title.
PUT ebook
{
"mappings": {
"properties": {
"Title": {
"type": "text"
},
"Year": {
"index": false,
"type": "date",
"format": "yyyy"
},
"Author": {
"type": "keyword"
},
"Publisher": {
"index": false,
"type": "text"
}
}
}
}
Reindex the three documents
# Reindex the documents
POST ebook/_doc/1
{
"Title": "Reactive Streams in Java",
"Year": 2019,
"Author": "Adam L. Davis",
"Publisher": "Apress",
"Language": "English"
}
POST ebook/_doc/2
{
"Title": "Scala Machine Learning Projects",
"Year": 2018,
"Author": "Md. Rezaul Karim",
"Publisher": "Packt",
"Language": "English"
}
POST ebook/_doc/3
{
"Title": "A Beginner's Guide to Scala, Object Orientation and Functional Programming",
"Year": 2018,
"Author": "John Hunt",
"Publisher": "Springer",
"Language": "English"
}
Update the ebook mapping with a new field named Date using the following format: "yyyy-MM-dd'T'HH:mm:ss"
PUT ebook/_mapping
{
"properties": {
"Date": {
"type" : "date",
"format": "yyyy-MM-dd'T'HH:mm:ss"
}
}
}
Add the current date to the Date field of each of the three documents.
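One way to do this (a sketch using the Update API; the timestamp below is a placeholder standing in for the current date and must follow the format declared in the mapping) is a partial update per document:

```
# Placeholder timestamp standing in for the current date
POST ebook/_update/1
{
  "doc": { "Date": "2019-11-25T10:30:00" }
}
# Repeat with the same body for documents 2 and 3
POST ebook/_update/2
{
  "doc": { "Date": "2019-11-25T10:30:00" }
}
POST ebook/_update/3
{
  "doc": { "Date": "2019-11-25T10:30:00" }
}
```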
Run a query on the kibana_sample_data_logs index that finds the documents whose host value is exactly www.elastic.co
GET kibana_sample_data_logs/_search
{
"query": {
"match": {
"host": "www.elastic.co"
}
}
}
# Ou
GET kibana_sample_data_logs/_search
{
"query": {
"term": {
"host": "www.elastic.co"
}
}
}
Run a query on the kibana_sample_data_logs index that finds the documents matching at least two of the terms chrome, linux, and mozilla at the same time
GET kibana_sample_data_logs/_search
{
"query": {
"bool": {
"should": [
{"match": {
"agent": "chrome"
}},
{"match": {
"agent": "linux"
}},
{"match": {
"agent": "mozilla"
}}
],
"minimum_should_match": 2
}
}
}
or alternatively
POST kibana_sample_data_logs/_search
{
"query": {
"match": {
"agent": {
"query": "chrome linux mozilla",
"minimum_should_match": 2
}
}
}
}
Execute a match query on the kibana_sample_data_logs index that hits documents matching one of the terms "chrome" and "safari" in the agent field.
- Find two other queries that hit exactly the same result (same documents and same scores)
POST kibana_sample_data_logs/_search
{
"query": {
"match": {
"agent": "chrome safari"
}
}
}
POST kibana_sample_data_logs/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"agent": "chrome"
}
},
{
"match": {
"agent": "safari"
}
}
]
}
}
}
- Find another query that hits the same result, ignoring the score
POST kibana_sample_data_logs/_search
{
"query": {
"constant_score": {
"filter": {
"terms": {
"agent": [
"safari",
"chrome"
]
}
},
"boost": 1
}
}
}
All aggregations should be executed on the kibana_sample_data_logs index.
- Find the average memory of all requests
GET kibana_sample_data_logs/_search?size=0
{
"aggs": {
"avg_mem": {
"avg": {
"field": "memory"
}
}
}
}
- Find the number of distinct client IPs (cardinality of clientip)
GET kibana_sample_data_logs/_search?size=0
{
"aggs": {
"nbr_clientip": {
"cardinality": {
"field": "clientip"
}
}
}
}
The expected result is 1001:
{
"took" : 29,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"nbr_clientip" : {
"value" : 1001
}
}
}
- Find the 3 most used client IPs per month
GET kibana_sample_data_logs/_search?size=0
{
"aggs": {
"monthly_agg": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "month"
},
"aggs": {
"occurence_ip": {
"terms": {
"field": "clientip",
"size": 3
}
}
}
}
}
}
- Add a pipeline aggregation that computes the month with the most requests (documents) (not finished)
GET kibana_sample_data_logs/_search?size=0
{
"aggs": {
"monthly_aggs": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "month"
},
"aggs": {
"requests_count": {
"value_count": {
"field": "timestamp"
}
}
}
},
"most_requests_month" :{
"max_bucket": {
"buckets_path": "monthly_aggs>requests_count"
}
}
}
}
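A shorter variant (a sketch; it relies on the built-in _count value of each date_histogram bucket instead of an explicit value_count metric) should point to the same month:

```
GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "monthly_aggs": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      }
    },
    "most_requests_month": {
      "max_bucket": {
        "buckets_path": "monthly_aggs>_count"
      }
    }
  }
}
```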
- Add a pipeline aggregation named "monthly_max_daily_avg" that computes the month with the highest daily memory average
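A possible sketch for this exercise (not verified against the sample data): nest a daily date_histogram with an avg on memory inside the monthly buckets, average those daily values with avg_bucket, then select the best month with max_bucket:

```
GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "monthly": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "daily": {
          "date_histogram": {
            "field": "timestamp",
            "calendar_interval": "day"
          },
          "aggs": {
            "daily_avg_mem": { "avg": { "field": "memory" } }
          }
        },
        "avg_daily_mem": {
          "avg_bucket": { "buckets_path": "daily>daily_avg_mem" }
        }
      }
    },
    "monthly_max_daily_avg": {
      "max_bucket": { "buckets_path": "monthly>avg_daily_mem" }
    }
  }
}
```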
The text fields were changed to keyword to make the aggregations work; it remains to figure out how to run the full-text search.
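One standard way to get both on the same field (a sketch; the sub-field name raw is an arbitrary choice, not part of the original lab solution) is a multi-field mapping: the top-level field stays text for full-text search, while a keyword sub-field serves exact matches and aggregations:

```
PUT ebook
{
  "mappings": {
    "properties": {
      "Title": {
        "type": "text",
        "fields": { "raw": { "type": "keyword" } }
      },
      "Author": {
        "type": "text",
        "fields": { "raw": { "type": "keyword" } }
      }
    }
  }
}
# Full-text search on the text field
GET ebook/_search
{
  "query": { "match": { "Author": "Hunt" } }
}
# Aggregation on the keyword sub-field
GET ebook/_search?size=0
{
  "aggs": {
    "by_author": { "terms": { "field": "Author.raw" } }
  }
}
```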