TP1_Elastic Search.md

This lab is to be done on Linux.

0. Setup the environment

  1. Install Elasticsearch and Kibana
  • Go to: https://www.elastic.co/start
  • Extract both archives
  • Go to the ElasticSearch/config directory
  • Open the jvm.options file
  • Set the JVM heap size (example with 300 megabytes)
-Xms300m
-Xmx300m
  2. Start the services

To launch Kibana you first need to start Elasticsearch. Open two terminals (one for Elasticsearch, the other for Kibana) and start Elasticsearch with:

$ bin/elasticsearch  

Check that you can reach localhost:9200, then do the same for Kibana:

$ bin/kibana  

Check that you can reach localhost:5601.
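
As a quick sanity check (a sketch, assuming the default ports and no security enabled), both services can also be probed from a terminal:

$ curl http://localhost:9200      # should print the node and cluster information as JSON
$ curl -I http://localhost:5601   # should print the HTTP response headers from Kibana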


1. First manipulations with the APIs

The goal of this part is to insert the following three documents:

Document 1:

  • Title: Reactive Streams in Java
  • Year: 2019
  • Author: Adam L. Davis
  • Publisher: Apress
  • Language: English

Document 2:

  • Title: Scala Machine Learning Projects
  • Year: 2018
  • Author: Md. Rezaul Karim
  • Publisher: Packt
  • Language: English

Document 3:

  • Title: A Beginner's Guide to Scala, Object Orientation and Functional Programming
  • Year: 2018
  • Author: John Hunt
  • Publisher: Springer
  • Language: English

The code for this is:


# Create the ebook index

PUT ebook

# Insert the documents

POST ebook/_doc/1
{
  "Title": "Reactive Streams in Java",
  "Year": 2019,
  "Author": "Adam L. Davis",
  "Publisher": "Apress",
  "Language": "English"
}


POST ebook/_doc/2
{
  "Title": "Scala Machine Learning Projects",
  "Year": 2018,
  "Author": "Md. Rezaul Karm",
  "Publisher": "Packt",
  "Language": "English"
}

POST ebook/_doc/3
{
  "Title": "A Beginner's guide to Scala, Object Orientation and Functionnal Programming",
  "Year": 2018,
  "Author": "John Hunt",
  "Publisher": "Springer",
  "Language": "English"
}

# Count the documents in the ebook index

GET ebook/_count
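
The same three documents can also be indexed in a single request with the bulk API (a sketch, using the same field values and explicit ids):

POST ebook/_bulk
{ "index": { "_id": 1 } }
{ "Title": "Reactive Streams in Java", "Year": 2019, "Author": "Adam L. Davis", "Publisher": "Apress", "Language": "English" }
{ "index": { "_id": 2 } }
{ "Title": "Scala Machine Learning Projects", "Year": 2018, "Author": "Md. Rezaul Karim", "Publisher": "Packt", "Language": "English" }
{ "index": { "_id": 3 } }
{ "Title": "A Beginner's Guide to Scala, Object Orientation and Functional Programming", "Year": 2018, "Author": "John Hunt", "Publisher": "Springer", "Language": "English" }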

Find the query that returns only the second document (Title: Scala Machine Learning Projects)

GET ebook/_search
{
  "query": {
    "match": {
      "Title": "Scala Machine Learning Projects"
    }
  },
  "size": 1
}

# Alternative

GET ebook/_search
{
  "query": {
    "term": {
      "Title.keyword": "Scala Machine Learning Projects"
    }
  }
}
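
Another option (a sketch): a match_phrase query only matches documents containing the whole phrase, so the result should be the second document without needing a size limit:

GET ebook/_search
{
  "query": {
    "match_phrase": {
      "Title": "Scala Machine Learning Projects"
    }
  }
}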

2. Hands-on Exercise: Mapping and Analysis

Delete the ebook index

# Delete the index
DELETE ebook
# Check that the deletion succeeded
GET ebook

Checking the deletion of ebook gives the following output:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "index_not_found_exception",
        "reason" : "no such index [ebook]",
        "resource.type" : "index_or_alias",
        "resource.id" : "ebook",
        "index_uuid" : "_na_",
        "index" : "ebook"
      }
    ],
    "type" : "index_not_found_exception",
    "reason" : "no such index [ebook]",
    "resource.type" : "index_or_alias",
    "resource.id" : "ebook",
    "index_uuid" : "_na_",
    "index" : "ebook"
  },
  "status" : 404
}

Re-create the ebook index with a better mapping. It must allow both an aggregation and a full-text search on the authors and the title.

PUT ebook
{
  "mappings": {
    "properties": { 
      "Title": {
        "type": "text"
      },
      "Year": {
        "index": false,
        "type": "date",
        "format": "yyyy"
      },
      "Author": {
        "type": "keyword"
      },
      "Publisher": {
        "index": false,
        "type": "text"
      }
    }
  }
}
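
To confirm the mapping was applied as intended (optional check):

GET ebook/_mapping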

Re-index the three documents:

# Re-indexing the documents
POST ebook/_doc/1
{
  "Title": "Reactive Streams in Java",
  "Year": 2019,
  "Author": "Adam L. Davis",
  "Publisher": "Apress",
  "Language": "English"
}


POST ebook/_doc/2
{
  "Title": "Scala Machine Learning Projects",
  "Year": 2018,
  "Author": "Md. Rezaul Karm",
  "Publisher": "Packt",
  "Language": "English"
}

POST ebook/_doc/3
{
  "Title": "A Beginner's guide to Scala, Object Orientation and Functionnal Programming",
  "Year": 2018,
  "Author": "John Hunt",
  "Publisher": "Springer",
  "Language": "English"
}

Update the ebook mapping with a new field named Date using the following format: "yyyy-MM-dd'T'HH:mm:ss"

PUT ebook/_mapping
{
  "properties": {
    "Date": {
      "type" : "date",
      "format": "yyyy-MM-dd'T'HH:mm:ss"
    }
  }
}

Add the current date to the three documents in the Date field.
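
One way to do this (a sketch, assuming the documents keep the ids 1, 2 and 3, and using a literal timestamp standing in for the current date):

# Partial update of the Date field, document by document
POST ebook/_update/1
{
  "doc": {
    "Date": "2021-03-19T17:00:00"
  }
}

# Repeat the same request for documents 2 and 3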


3. Hands-On Exercise: Queries

Run a query on the kibana_sample_data_logs index that finds the documents whose host value is exactly www.elastic.co.

GET kibana_sample_data_logs/_search
{
  "query": {
    "match": {
      "host": "www.elastic.co"
    }
  }
}

# Or

GET kibana_sample_data_logs/_search
{
  "query": {
    "term": {
      "host": "www.elastic.co"
    }
  }
}

Run a query on the kibana_sample_data_logs index that finds the documents matching at least two of the values chrome, linux and mozilla at the same time.

GET kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {
          "agent": "chrome"
        }},
        {"match": {
            "agent": "linux"
          }},
        {"match": {
            "agent": "mozilla"
          }}
      ],
      "minimum_should_match": 2
    }
  }
}

Or alternatively:

POST kibana_sample_data_logs/_search
{
  "query": {
    "match": {
      "agent": {
        "query": "chrome linux mozilla",
        "minimum_should_match": 2
      }
    }
  }
}

Execute a match query on the kibana_sample_data_logs index that hits documents matching at least one of the terms "chrome" and "safari" in the agent field.

  • Find two other queries that hit exactly the same result (same documents and same scores)
POST kibana_sample_data_logs/_search
{
  "query": {
    "match": {
      "agent": "chrome safari"
    }
  }
}

POST kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "agent": "chrome"
          }
        },
        {
          "match": {
            "agent": "safari"
          }
        }
      ]
    }
  }
}
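
A third candidate (a sketch): a query_string query on the same field analyzes "chrome safari" into the same two optional terms, so it should return the same documents with the same scores:

POST kibana_sample_data_logs/_search
{
  "query": {
    "query_string": {
      "default_field": "agent",
      "query": "chrome safari"
    }
  }
}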
  • Find another query that hits the same result, ignoring the scores
POST kibana_sample_data_logs/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "agent": [
            "safari",
            "chrome"
          ]
        }
      },
      "boost": 1
    }
  }
}

4. Aggregations

All aggregations should be executed on the kibana_sample_data_logs index.

  1. Find the average of the memory field for all requests
GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "avg_mem": {
      "avg": {
        "field": "memory"
      }
    }
  }
}
  2. Find the number of distinct client IPs
GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "nbr_clientip": {
      "cardinality": {
        "field": "clientip"
      }
    }
  }
}

1001 is the expected result:

{
  "took" : 29,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "nbr_clientip" : {
      "value" : 1001
    }
  }
}
  3. Find the 3 most used IPs per month
GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "monthly_agg": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "occurence_ip": {
          "terms": {
            "field": "clientip",
            "size": 3
          }
        }
      }
    }
  }
}
  4. Add a pipeline aggregation that computes the month with the most requests (documents) (Not Finished)
GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "monthly_aggs": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "requests_count": {
          "value_count": {
            "field": "timestamp"
          }
        }
      }
    },
    "most_requests_month" :{
      "max_bucket": {
        "buckets_path": "monthly_aggs>requests_count"
      }
    }
  }
}
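
A slightly simpler variant of the same pipeline (a sketch) relies on the built-in _count of each date_histogram bucket instead of a value_count sub-aggregation:

GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "monthly_aggs": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      }
    },
    "most_requests_month": {
      "max_bucket": {
        "buckets_path": "monthly_aggs>_count"
      }
    }
  }
}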
  5. Add a pipeline aggregation named "monthly_max_daily_avg" that computes the month with the most daily memory average
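
One possible sketch, reading the task as "for each month, take the maximum of the per-day memory averages" (the per-month maxima can then be compared to find the month with the highest value):

GET kibana_sample_data_logs/_search?size=0
{
  "aggs": {
    "monthly": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "daily": {
          "date_histogram": {
            "field": "timestamp",
            "calendar_interval": "day"
          },
          "aggs": {
            "daily_avg_memory": {
              "avg": {
                "field": "memory"
              }
            }
          }
        },
        "monthly_max_daily_avg": {
          "max_bucket": {
            "buckets_path": "daily>daily_avg_memory"
          }
        }
      }
    }
  }
}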

Noobzik commented Mar 19, 2021

GET ebook/_search
{
  "aggs": {
    "Aggregate-Title": {
      "terms": {
        "field": "Title"
      }
    },
    "Aggregate-Author": {
      "term": {
        "field": "Author"
      }
    }
  }
}

The text fields were switched to keyword to make the aggregations work; it remains to be seen how to run the full-text search at the same time.
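
A possible way to get both behaviors (a sketch): map Title and Author as text fields with a keyword sub-field, run the full-text match query against the text field, and point the terms aggregation at the .keyword sub-field:

PUT ebook
{
  "mappings": {
    "properties": {
      "Title": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "Author": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      }
    }
  }
}

GET ebook/_search
{
  "query": {
    "match": { "Title": "Scala" }
  },
  "aggs": {
    "Aggregate-Author": {
      "terms": { "field": "Author.keyword" }
    }
  }
}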
