ChenLiZhan/Elasticsearch.md

## Elasticsearch.md

      
Terminology

Relational DB -> Server -> Databases -> Schema -> Tables -> Rows -> Columns
Elasticsearch -> Node -> Indices -> Mapping -> Types  -> Documents -> Fields


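In Elasticsearch, the act of storing a document is called indexing.
Shard: the basis of Elasticsearch's distributed search. A complete index is split into several parts, stored on the same node or on different nodes; each of these parts is a shard.
Replica: a copy of a shard, much like replication in a database. The total number of shards in an index, counting copies, is therefore primary shards × (1 + replicas).
Mapping: determines field types, assigning each field a definite data type (string, number, boolean, date, and so on).
Analysis: tokenizes full-text content in order to build the inverted index used for searching.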
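
To make the analysis step more concrete, here is a minimal Python sketch of an inverted index. It assumes a naive analyzer that lowercases text and splits on non-alphanumeric characters, which is far simpler than Elasticsearch's real analyzers. The names `analyze` and `build_inverted_index` are hypothetical and are not part of any Elasticsearch API.

```python
import re
from collections import defaultdict


def analyze(text):
    """Naive analyzer (an assumption): lowercase, then split on non-alphanumerics."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def build_inverted_index(documents):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


documents = {
    "1": "I love to go rock climbing",
    "2": "I like to collect rock albums",
}
inverted = build_inverted_index(documents)
print(inverted["rock"])      # ['1', '2']
print(inverted["climbing"])  # ['1']
```

A search for a term then becomes a dictionary lookup instead of a scan over every document.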

Elasticsearch API examples (RESTful)

Suppose we want to store a new employee record (a document) under the type employee in the index megacorp:
PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

Breaking down the path of the request (/megacorp/employee/1) gives the following table:


| Path | Description |
| --- | --- |
| megacorp | Index |
| employee | Type |
| 1 | The ID of the document |
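
As a rough illustration of how the three path segments fit together, the following Python sketch builds the same PUT request with the standard library. It only prepares the request and does not send it. `BASE_URL` assumes a node listening locally on port 9200, and the helper names are hypothetical.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def document_path(index, doc_type, doc_id):
    """Join the three path segments: /<index>/<type>/<id>."""
    return f"/{index}/{doc_type}/{doc_id}"


def build_index_request(index, doc_type, doc_id, document):
    """Prepare (but do not send) the PUT request that indexes one document."""
    return Request(
        BASE_URL + document_path(index, doc_type, doc_id),
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


employee = {"first_name": "John", "last_name": "Smith", "age": 25}
request = build_index_request("megacorp", "employee", 1, employee)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it to a running node.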


Elasticsearch search examples

GET /megacorp/employee/_search

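Using _search in place of an employee ID returns a response like the one below. The hits section contains the search results (by default, Elasticsearch returns the first ten documents).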
{
   "took":      6,
   "timed_out": false,
   "_shards": { ... },
   "hits": {
      "total":      3,
      "max_score":  1,
      "hits": [
         {
            "_index":         "megacorp",
            "_type":          "employee",
            "_id":            "3",
            "_score":         1,
            "_source": {
               "first_name":  "Douglas",
               "last_name":   "Fir",
               "age":         35,
               "about":       "I like to build cabinets",
               "interests": [ "forestry" ]
            }
         },
         {
            "_index":         "megacorp",
            "_type":          "employee",
            "_id":            "1",
            "_score":         1,
            "_source": {
               "first_name":  "John",
               "last_name":   "Smith",
               "age":         25,
               "about":       "I love to go rock climbing",
               "interests": [ "sports", "music" ]
            }
         },
         {
            "_index":         "megacorp",
            "_type":          "employee",
            "_id":            "2",
            "_score":         1,
            "_source": {
               "first_name":  "Jane",
               "last_name":   "Smith",
               "age":         32,
               "about":       "I like to collect rock albums",
               "interests": [ "music" ]
            }
         }
      ]
   }
}
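
Once the response has been parsed from JSON, the results can be read out of hits.hits. The sketch below is one possible way to do that in Python; `summarize_hits` is a hypothetical helper, and the sample dictionary is a trimmed copy of the response above.

```python
def summarize_hits(response):
    """Return (document ID, full name) for every hit in a search response."""
    summary = []
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        summary.append((hit["_id"], f"{source['first_name']} {source['last_name']}"))
    return summary


# A trimmed copy of the response above, keeping only the fields used here.
response = {
    "hits": {
        "total": 3,
        "hits": [
            {"_id": "3", "_source": {"first_name": "Douglas", "last_name": "Fir"}},
            {"_id": "1", "_source": {"first_name": "John", "last_name": "Smith"}},
            {"_id": "2", "_source": {"first_name": "Jane", "last_name": "Smith"}},
        ],
    }
}
print(summarize_hits(response))
```
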
To search by one of a document's fields, we can send a request like the following:
GET /megacorp/employee/_search?q=last_name:Smith


We still use the _search endpoint, and pass the query itself in the URL parameter q.
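
When the URL is built in code, the value of q should be URL-encoded. A small Python sketch, with `search_url` as a hypothetical helper name:

```python
from urllib.parse import urlencode


def search_url(index, doc_type, field, value):
    """Build the _search path with a field:value query in the q parameter."""
    query_string = urlencode({"q": f"{field}:{value}"})
    return f"/{index}/{doc_type}/_search?{query_string}"


url = search_url("megacorp", "employee", "last_name", "Smith")
print(url)  # /megacorp/employee/_search?q=last_name%3ASmith
```

The colon is percent-encoded as %3A by `urlencode`; the server decodes it back to `last_name:Smith`.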

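Searching with the DSL

Elasticsearch provides a flexible query language (the Query DSL) for building more complex queries.

The DSL (Domain Specific Language) is expressed as JSON.
The previous search can be rewritten in the DSL as follows: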

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}
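
Because the DSL is plain JSON, a request body like this can be assembled from ordinary dictionaries. A minimal Python sketch, where `match_query` is a hypothetical helper:

```python
import json


def match_query(field, text):
    """Build the body of a search request containing a single match query."""
    return {"query": {"match": {field: text}}}


body = json.dumps(match_query("last_name", "Smith"), indent=4)
print(body)
```
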

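Next, let's use the DSL to define a more complex query: find employees whose last name is Smith and whose age is greater than 30.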
GET /megacorp/employee/_search
{
    "query" : {
        "filtered" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 }
                }
            },
            "query" : {
                "match" : {
                    "last_name" : "smith" 
                }
            }
        }
    }
}
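
The filtered query is specific to older releases: it was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions the same search is usually written as a bool query with a filter clause. The Python sketch below builds both bodies side by side. The function names are hypothetical, and the bool form should be checked against the documentation for the version in use.

```python
def filtered_query(match, range_filter):
    """Body in the style used above (the Elasticsearch 1.x 'filtered' query)."""
    return {
        "query": {
            "filtered": {
                "filter": {"range": range_filter},
                "query": {"match": match},
            }
        }
    }


def bool_query(match, range_filter):
    """Equivalent body using a bool query with a non-scoring filter clause."""
    return {
        "query": {
            "bool": {
                "must": {"match": match},
                "filter": {"range": range_filter},
            }
        }
    }


match = {"last_name": "smith"}
older_than_30 = {"age": {"gt": 30}}
print(filtered_query(match, older_than_30))
print(bool_query(match, older_than_30))
```

In both forms the range condition only includes or excludes documents, while the match clause is what contributes to the relevance score.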

Full-text search

Suppose we want to find employees whose about field mentions "rock climbing":
GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}

We get a response like the following:
{
   ...
   "hits": {
      "total":      2,
      "max_score":  0.16273327,
      "hits": [
         {
            ...
            "_score":         0.16273327,
            "_source": {
               "first_name":  "John",
               "last_name":   "Smith",
               "age":         25,
               "about":       "I love to go rock climbing",
               "interests": [ "sports", "music" ]
            }
         },
         {
            ...
            "_score":         0.016878016,
            "_source": {
               "first_name":  "Jane",
               "last_name":   "Smith",
               "age":         32,
               "about":       "I like to collect rock albums",
               "interests": [ "music" ]
            }
         }
      ]
   }
}
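By default, Elasticsearch scores each result for relevance and sorts the results by that score, highest first. Jane Smith is returned with a lower score because her about field contains rock but not climbing.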
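
To show why the first result ranks above the second, here is a deliberately simplified Python sketch. It is not the formula Elasticsearch uses: `toy_score` is a hypothetical function that merely counts how many query terms appear in the field, which is enough to reproduce the ordering in the response above.

```python
def toy_score(query, text):
    """Count how many distinct query terms occur in the text (not Lucene's formula)."""
    text_terms = set(text.lower().split())
    return sum(1 for term in set(query.lower().split()) if term in text_terms)


abouts = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
    "Douglas": "I like to build cabinets",
}
query = "rock climbing"
scores = {name: toy_score(query, about) for name, about in abouts.items()}
# Keep only matching documents and sort by score, highest first.
ranked = sorted(
    ((name, score) for name, score in scores.items() if score > 0),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)  # [('John', 2), ('Jane', 1)]
```
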
Phrase search

To match an exact sequence of words (a phrase), for example to find employees whose about field contains rock and climbing next to each other, we only need to change match to match_phrase:
GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}
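
The difference from match is that the words must be adjacent and in order. The Python sketch below mimics that rule on whitespace-split words. It is an illustration only, and `contains_phrase` is a hypothetical helper rather than how Elasticsearch implements phrase matching.

```python
def contains_phrase(text, phrase):
    """Return True if the words of `phrase` appear consecutively and in order."""
    words = text.lower().split()
    target = phrase.lower().split()
    if not target:
        return False
    return any(
        words[start:start + len(target)] == target
        for start in range(len(words) - len(target) + 1)
    )


print(contains_phrase("I love to go rock climbing", "rock climbing"))     # True
print(contains_phrase("I like to collect rock albums", "rock climbing"))  # False
```
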

Highlighting search results

We only need to add the highlight parameter to the previous query:
GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}

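The response contains a new section called highlight, which holds the matching fragment of the about field with the matched words wrapped in <em></em> tags: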
{
   ...
   "hits": {
      "total":      1,
      "max_score":  0.23013961,
      "hits": [
         {
            ...
            "_score":         0.23013961,
            "_source": {
               "first_name":  "John",
               "last_name":   "Smith",
               "age":         25,
               "about":       "I love to go rock climbing",
               "interests": [ "sports", "music" ]
            },
            "highlight": {
               "about": [
                  "I love to go <em>rock</em> <em>climbing</em>" 
               ]
            }
         }
      ]
   }
}
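
The effect of highlighting can be imitated locally with a regular expression. This Python sketch is only an approximation of the output format shown above; the function `highlight` and its `pre_tag` and `post_tag` parameters are hypothetical.

```python
import re


def highlight(text, terms, pre_tag="<em>", post_tag="</em>"):
    """Wrap every whole-word, case-insensitive occurrence of a term in tags."""
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


print(highlight("I love to go rock climbing", ["rock", "climbing"]))
# I love to go <em>rock</em> <em>climbing</em>
```
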
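Analytics

Elasticsearch also supports analytics through aggregations, which work much like GROUP BY in SQL.
Suppose we want to find the most popular interests among our employees; we can send the following request: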
GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}

Elasticsearch returns the following result:
{
   ...
   "hits": { ... },
   "aggregations": {
      "all_interests": {
         "buckets": [
            {
               "key":       "music",
               "doc_count": 2
            },
            {
               "key":       "forestry",
               "doc_count": 1
            },
            {
               "key":       "sports",
               "doc_count": 1
            }
         ]
      }
   }
}
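
Conceptually, the terms aggregation groups documents by each distinct value of the field and counts them. The Python sketch below reproduces the buckets above from the three sample employees. `terms_aggregation` is a hypothetical helper, and the tie-breaking by key is chosen here simply to match the order shown in the response.

```python
from collections import Counter


def terms_aggregation(documents, field):
    """Count how many documents contain each distinct value of `field`."""
    counts = Counter()
    for doc in documents:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        counts.update(set(values))  # each document counts once per value
    # Highest count first; ties ordered by key to match the response above.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


employees = [
    {"first_name": "John", "interests": ["sports", "music"]},
    {"first_name": "Jane", "interests": ["music"]},
    {"first_name": "Douglas", "interests": ["forestry"]},
]
print(terms_aggregation(employees, "interests"))
```
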
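Distributed characteristic: replicas do not overlap (a replica shard is not placed on the same node as its primary)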
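
As a small worked example of how primaries and replicas add up, the formula from the terminology section, primary shards × (1 + replicas), can be written in Python. The function name and the example values are chosen for illustration only.

```python
def total_shard_copies(primary_shards, replicas):
    """Total shards in an index: each primary plus `replicas` copies of it."""
    if primary_shards < 1 or replicas < 0:
        raise ValueError("need at least one primary shard and no negative replicas")
    return primary_shards * (1 + replicas)


# Example values chosen for illustration: 5 primary shards, 1 replica of each.
print(total_shard_copies(5, 1))  # 10
print(total_shard_copies(3, 2))  # 9
```
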


An Introduction to Elasticsearch, a Distributed Search System
Elasticsearch: The Definitive Guide (Chinese edition)