TP1_Elastic Search.md

This lab is to be done on Linux.

0. Setup the environment

  1. Install Elasticsearch and Kibana
  • Go to: https://www.elastic.co/start
  • Extract both archives
  • Go to the ElasticSearch/config directory
  • Open the jvm.options file
  • Set the JVM heap size (example with 300 megabytes)
-Xms300m
-Xmx300m
  2. Start the services

To launch Kibana you first need to start Elasticsearch. Open two terminals (one for Elasticsearch, the other for Kibana) and start Elasticsearch with:

$ bin/elasticsearch  

Check that you can reach localhost:9200, then do the same for Kibana:

$ bin/kibana  

Check that you can reach localhost:5601.
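
As a quick sanity check (a sketch, assuming the default ports and no security enabled), both services can also be probed from a terminal:

$ curl http://localhost:9200      # should print the node and cluster information as JSON
$ curl -I http://localhost:5601   # should print the HTTP response headers from Kibana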


1. First manipulations with the APIs

The goal of this part is to insert the following three documents:

Document 1:

  • Title: Reactive Streams in Java
  • Year: 2019
  • Author: Adam L. Davis
  • Publisher: Apress
  • Language: English

Document 2:

  • Title: Scala Machine Learning Projects
  • Year: 2018
  • Author: Md. Rezaul Karim
  • Publisher: Packt
  • Language: English

Document 3:

  • Title: A Beginner's Guide to Scala, Object Orientation and Functional Programming
  • Year: 2018
  • Author: John Hunt
  • Publisher: Springer
  • Language: English

The code for this is:


# Create the ebook index

PUT ebook

# Insert the documents

POST ebook/_doc/1
{
  "Title": "Reactive Streams in Java",
  "Year": 2019,
  "Author": "Adam L. Davis",
  "Publisher": "Apress",
  "Language": "English"
}


POST ebook/_doc/2
{
  "Title": "Scala Machine Learning Projects",
  "Year": 2018,
  "Author": "Md. Rezaul Karm",
  "Publisher": "Packt",
  "Language": "English"
}

POST ebook/_doc/3
{
  "Title": "A Beginner's guide to Scala, Object Orientation and Functionnal Programming",
  "Year": 2018,
  "Author": "John Hunt",
  "Publisher": "Springer",
  "Language": "English"
}

# Count the documents in the ebook index

GET ebook/_count
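
The same three documents can also be indexed in a single request with the bulk API (a sketch, using the same field values and explicit ids):

POST ebook/_bulk
{ "index": { "_id": 1 } }
{ "Title": "Reactive Streams in Java", "Year": 2019, "Author": "Adam L. Davis", "Publisher": "Apress", "Language": "English" }
{ "index": { "_id": 2 } }
{ "Title": "Scala Machine Learning Projects", "Year": 2018, "Author": "Md. Rezaul Karim", "Publisher": "Packt", "Language": "English" }
{ "index": { "_id": 3 } }
{ "Title": "A Beginner's Guide to Scala, Object Orientation and Functional Programming", "Year": 2018, "Author": "John Hunt", "Publisher": "Springer", "Language": "English" }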

Find the query that returns only the second document (Title: Scala Machine Learning Projects)

GET ebook/_search
{
  "query": {
    "match": {
      "Title": "Scala Machine Learning Projects"
    }
  },
  "size": 1
}

# Alternative

GET ebook/_search
{
  "query": {
    "term": {
      "Title.keyword": "Scala Machine Learning Projects"
    }
  }
}
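
Another option (a sketch): a match_phrase query only matches documents containing the whole phrase, so the result should be the second document without needing a size limit:

GET ebook/_search
{
  "query": {
    "match_phrase": {
      "Title": "Scala Machine Learning Projects"
    }
  }
}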

2. Hands-on Exercise: Mapping and Analysis

Delete the ebook index

# Delete the index
DELETE ebook
# Check that the deletion succeeded
GET ebook

Checking the deletion of ebook gives the following output:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "index_not_found_exception",
        "reason" : "no such index [ebook]",
        "resource.type" : "index_or_alias",
        "resource.id" : "ebook",
        "index_uuid" : "_na_",
        "index" : "ebook"
      }
    ],
    "type" : "index_not_found_exception",
    "reason" : "no such index [ebook]",
    "resource.type" : "index_or_alias",
    "resource.id" : "ebook",
    "index_uuid" : "_na_",
    "index" : "ebook"
  },
  "status" : 404
}

Re-create the ebook index with a better mapping. It must allow both an aggregation and a full-text search on the authors and the title.

PUT ebook
{
  "mappings": {
    "properties": { 
      "Title": {
        "type": "text"
      },
      "Year": {
        "index": false,
        "type": "date",
        "format": "yyyy"
      },
      "Author": {
        "type": "keyword"
      },
      "Publisher": {
        "index": false,
        "type": "text"
      }
    }
  }
}
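
To confirm the mapping was applied as intended (optional check):

GET ebook/_mapping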

Re-index the three documents:

# Re-indexing the documents
POST ebook/_doc/1
{
  "Title": "Reactive Streams in Java",
  "Year": 2019,
  "Author": "Adam L. Davis",
  "Publisher": "Apress",
  "Language": "English"
}


POST ebook/_doc/2
{
  "Title": "Scala Machine Learning Projects",
  "Year": 2018,
  "Author": "Md. Rezaul Karm",
  "Publisher": "Packt",
  "Language": "English"
}

POST ebook/_doc/3
{
  "Title": "A Beginner's guide to Scala, Object Orientation and Functionnal Programming",
  "Year": 2018,
  "Author": "John Hunt",
  "Publisher": "Springer",
  "Language": "English"
}

Update the ebook mapping with a new field named Date using the following format: "yyyy-MM-dd'T'HH:mm:ss"

PUT ebook/_mapping
{
  "properties": {
    "Date": {
      "type" : "date",
      "format": "yyyy-MM-dd'T'HH:mm:ss"
    }
  }
}

Add the current date to the three documents in the Date field.
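
One way to do this (a sketch, assuming the documents keep the ids 1, 2 and 3, and using a literal timestamp standing in for the current date):

# Partial update of the Date field, document by document
POST ebook/_update/1
{
  "doc": {
    "Date": "2021-03-19T17:00:00"
  }
}

# Repeat the same request for documents 2 and 3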


3. Hands-On Exercise: Queries

Run a query on the kibana_sample_data_logs index that finds the documents whose host value is exactly www.elastic.co.

GET kibana_sample_data_logs/_search
{
  "query": {
    "match": {
      "host": "www.elastic.co"
    }
  }
}

# Or

GET kibana_sample_data_logs/_search
{
  "query": {
    "term": {
      "host": "www.elastic.co"
    }
  }
}

Run a query on the kibana_sample_data_logs index that finds the documents matching at least two of the values chrome, linux and mozilla at the same time.

GET kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {
          "agent": "chrome"
        }},
        {"match": {
            "agent": "linux"
          }},
        {"match": {
            "agent": "mozilla"
          }}
      ],
      "minimum_should_match": 2
    }
  }
}

Or alternatively:

POST kibana_sample_data_logs/_search
{
  "query": {
    "match": {
      "agent": {
        "query": "chrome linux mozilla",
        "minimum_should_match": 2
      }
    }
  }
}

Execute a match query on the kibana_sample_data_logs index that hits documents matching at least one of the terms "chrome" and "safari" in the agent field.

  • Find two other queries that hit exactly the same result (same documents and same scores)
POST kibana_sample_data_logs/_search
{
  "query": {
    "match": {
      "agent": "chrome safari"
    }
  }
}

POST kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "agent": "chrome"
          }
        },
        {
          "match": {
            "agent": "safari"
          }
        }
      ]
    }
  }
}
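
A third candidate (a sketch): a query_string query on the same field analyzes "chrome safari" into the same two optional terms, so it should return the same documents with the same scores:

POST kibana_sample_data_logs/_search
{
  "query": {
    "query_string": {
      "default_field": "agent",
      "query": "chrome safari"
    }
  }
}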
  • Find another query that hits the same result, ignoring the scores
POST kibana_sample_data_logs/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "agent": [
            "safari",
            "chrome"
          ]
        }
      },
      "boost": 1
    }
  }
}

4. Aggregations

All aggregations should be executed on the kibana_sample_data_logs index.

  1. Find the average of the memory field for all requests
GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "avg_mem": {
      "avg": {
        "field": "memory"
      }
    }
  }
}
  2. Find the number of distinct client IPs
GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "nbr_clientip": {
      "cardinality": {
        "field": "clientip"
      }
    }
  }
}

1001 is the expected result:

{
  "took" : 29,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "nbr_clientip" : {
      "value" : 1001
    }
  }
}
  3. Find the 3 most used IPs per month
GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "monthly_agg": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "occurence_ip": {
          "terms": {
            "field": "clientip",
            "size": 3
          }
        }
      }
    }
  }
}
  4. Add a pipeline aggregation that computes the month with the most requests (documents) (Not Finished)
GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "monthly_aggs": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "requests_count": {
          "value_count": {
            "field": "timestamp"
          }
        }
      }
    },
    "most_requests_month" :{
      "max_bucket": {
        "buckets_path": "monthly_aggs>requests_count"
      }
    }
  }
}
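
A slightly simpler variant of the same pipeline (a sketch) relies on the built-in _count of each date_histogram bucket instead of a value_count sub-aggregation:

GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "monthly_aggs": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      }
    },
    "most_requests_month": {
      "max_bucket": {
        "buckets_path": "monthly_aggs>_count"
      }
    }
  }
}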
  5. Add a pipeline aggregation named "monthly_max_daily_avg" that computes the month with the most daily memory average
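
One possible sketch, reading the task as "for each month, take the maximum of the per-day memory averages" (the per-month maxima can then be compared to find the month with the highest value):

GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "monthly": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "daily": {
          "date_histogram": {
            "field": "timestamp",
            "calendar_interval": "day"
          },
          "aggs": {
            "daily_avg_memory": {
              "avg": {
                "field": "memory"
              }
            }
          }
        },
        "monthly_max_daily_avg": {
          "max_bucket": {
            "buckets_path": "daily>daily_avg_memory"
          }
        }
      }
    }
  }
}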

Noobzik commented Mar 19, 2021

GET ebook/_search
{
  "aggs": {
    "Aggregate-Title": {
      "terms": {
        "field": "Title"
      }
    },
    "Aggregate-Author": {
      "term": {
        "field": "Author"
      }
    }
  }
}

The text fields were switched to keyword to make the aggregations work; it remains to be seen how to run the full-text search at the same time.
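
A possible way to get both behaviors (a sketch): map Title and Author as text fields with a keyword sub-field, run the full-text match query against the text field, and point the terms aggregation at the .keyword sub-field:

PUT ebook
{
  "mappings": {
    "properties": {
      "Title": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "Author": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      }
    }
  }
}

GET ebook/_search
{
  "query": {
    "match": { "Title": "Scala" }
  },
  "aggs": {
    "Aggregate-Author": {
      "terms": { "field": "Author.keyword" }
    }
  }
}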
