Relational DB -> Server -> Databases -> Schema -> Tables -> Rows -> Columns
Elasticsearch -> Node -> Indices -> Mapping -> Types -> Documents -> Fields
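- In Elasticsearch, the act of storing a document is called indexing.
- Shard: the basis of Elasticsearch's distributed search. A complete index is split into several parts that are stored on the same node or on different nodes; each of these parts is called a shard.
- Replica: a copy of a shard, much like replication in other systems. The total number of shards in an index is therefore primary shards × (1 + replicas); for example, 5 primary shards with 1 replica each give 10 shards in total.
- Mapping: determines field types by assigning each field a specific data type (string, number, boolean, date, and so on).
- Analysis: tokenizes full-text content to build the inverted index used for searching.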
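To make the idea of analysis and the inverted index more concrete, here is a minimal Python sketch. It assumes a toy analyzer that only lowercases and splits on whitespace; real Elasticsearch analyzers do considerably more, and the names `analyze` and `build_inverted_index` are hypothetical helpers, not part of any Elasticsearch API.

```python
from collections import defaultdict

def analyze(text):
    """Toy analyzer (an assumption): lowercase the text and split on whitespace."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Three short sample sentences keyed by document id.
docs = {
    1: "I love to go rock climbing",
    2: "I like to collect rock albums",
    3: "I like to build cabinets",
}
index = build_inverted_index(docs)
print(index["rock"])      # ids of documents containing "rock"
print(index["climbing"])  # ids of documents containing "climbing"
```

A search for a term then becomes a dictionary lookup instead of a scan over every document, which is the point of building the index up front.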
Suppose we want to store a new employee record (a document) under the type employee in the index megacorp:
PUT /megacorp/employee/1
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
Looking at the path of the request (/megacorp/employee/1), we can break it down as follows:
Path | Description |
---|---|
megacorp | Index |
employee | Type |
1 | the id of the document |
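The breakdown in the table can be sketched in a few lines of Python. The function name `parse_document_path` is hypothetical and only illustrates how the three path segments map to index, type, and id.

```python
def parse_document_path(path):
    """Split '/index/type/id' into its three components."""
    index, doc_type, doc_id = path.strip("/").split("/")
    return {"index": index, "type": doc_type, "id": doc_id}

parts = parse_document_path("/megacorp/employee/1")
print(parts)
```

Note that the id stays a string here, which matches how it appears as "_id" in search responses.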
GET /megacorp/employee/_search
Using _search instead of an employee ID returns a response in the format below; the hits section contains the search results (by default, Elasticsearch returns the first ten documents).
{
"took": 6,
"timed_out": false,
"_shards": { ... },
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "3",
"_score": 1,
"_source": {
"first_name": "Douglas",
"last_name": "Fir",
"age": 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_score": 1,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 1,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
}
}
To search by one of a document's fields, we can send a request like the following:
GET /megacorp/employee/_search?q=last_name:Smith
We still use the _search endpoint, this time passing the query in the URL parameter q.
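When such a URL is built from code, the value of q should be URL-encoded. The following sketch uses only Python's standard library; `search_url` is a hypothetical helper, and the index, type, and field names are the ones used in this example.

```python
from urllib.parse import urlencode

def search_url(index, doc_type, field, value):
    """Build the path and query string for a query-string search."""
    query = urlencode({"q": f"{field}:{value}"})
    return f"/{index}/{doc_type}/_search?{query}"

url = search_url("megacorp", "employee", "last_name", "Smith")
print(url)  # /megacorp/employee/_search?q=last_name%3ASmith
```

The colon is percent-encoded as %3A; the server decodes it back to last_name:Smith.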
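Elasticsearch provides a flexible query language, the Query DSL, for building more complex queries.
The DSL (Domain Specific Language) is expressed as JSON. The previous search can be rewritten in the DSL as follows: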
GET /megacorp/employee/_search
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}
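Because the DSL is plain JSON, a request body like the one above can be assembled from ordinary data structures. A minimal sketch, where `match_query` is a hypothetical helper and not a client-library function:

```python
import json

def match_query(field, text):
    """Return a request body containing a single match query."""
    return {"query": {"match": {field: text}}}

body = match_query("last_name", "Smith")
print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` yields the same structure as the hand-written request body.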
Next, let's use the DSL to define a more complex query: employees whose last name is Smith and who are older than 30.
GET /megacorp/employee/_search
{
"query" : {
"filtered" : {
"filter" : {
"range" : {
"age" : { "gt" : 30 }
}
},
"query" : {
"match" : {
"last_name" : "smith"
}
}
}
}
}
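To see what this query selects, the same logic can be simulated in plain Python over the three sample employees. This is only a sketch of the semantics (a range filter on age combined with a case-insensitive comparison on last_name), not how Elasticsearch executes the query; `filtered_search` is a hypothetical name.

```python
employees = [
    {"first_name": "John", "last_name": "Smith", "age": 25},
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35},
]

def filtered_search(docs, last_name, older_than):
    """Keep docs with age > older_than, then match last_name ignoring case."""
    passed_filter = [d for d in docs if d["age"] > older_than]
    return [d for d in passed_filter
            if d["last_name"].lower() == last_name.lower()]

result = filtered_search(employees, "smith", 30)
print([d["first_name"] for d in result])  # ['Jane']
```

John is excluded by the age filter and Douglas by the last-name match, so only Jane remains.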
Suppose we want to find employees whose about field mentions "rock climbing":
GET /megacorp/employee/_search
{
"query" : {
"match" : {
"about" : "rock climbing"
}
}
}
We get a response like the following:
{
...
"hits": {
"total": 2,
"max_score": 0.16273327,
"hits": [
{
...
"_score": 0.16273327,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
},
{
...
"_score": 0.016878016,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
}
}
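By default, Elasticsearch scores each result by relevance and sorts the results by that score. John Smith's document, which contains both words, ranks above Jane Smith's, which contains only rock.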
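The ordering can be illustrated with a deliberately simplified score: the number of distinct query terms a document contains. This is an assumption made for illustration and is not Elasticsearch's actual relevance formula, which also weighs term frequency and field length; `rank` and `tokens` are hypothetical helpers.

```python
def tokens(text):
    """Lowercase and split on whitespace (toy analysis)."""
    return text.lower().split()

def rank(docs, query):
    """Return names of matching docs, best match first (simplified score)."""
    query_terms = set(tokens(query))
    scored = []
    for name, about in docs.items():
        score = len(query_terms & set(tokens(about)))
        if score > 0:  # documents matching no term are not returned
            scored.append((score, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]

abouts = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
    "Douglas": "I like to build cabinets",
}
ranking = rank(abouts, "rock climbing")
print(ranking)  # ['John', 'Jane']
```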
To match an exact sequence of words, that is, a phrase (for example, employees whose about field contains rock and climbing next to each other), we only need to change match to match_phrase:
GET /megacorp/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
}
}
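The difference from match is that the terms must appear adjacent and in order. A small Python sketch of that check, assuming whitespace tokenization; `contains_phrase` is a hypothetical helper:

```python
def contains_phrase(text, phrase):
    """True if the words of phrase occur consecutively, in order, in text."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

print(contains_phrase("I love to go rock climbing", "rock climbing"))     # True
print(contains_phrase("I like to collect rock albums", "rock climbing"))  # False
```

Jane's document matched the earlier match query because it contains rock, but it fails the phrase check because climbing does not follow.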
To highlight the matched text, we only need to add a highlight parameter to the previous query:
GET /megacorp/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {
"fields" : {
"about" : {}
}
}
}
The response contains a new section named highlight, in which the matched text from the about field is wrapped in <em></em> tags:
{
...
"hits": {
"total": 1,
"max_score": 0.23013961,
"hits": [
{
...
"_score": 0.23013961,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
},
"highlight": {
"about": [
"I love to go <em>rock</em> <em>climbing</em>"
]
}
}
]
}
}
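The shape of the highlight fragment can be reproduced with a short Python sketch that wraps every matched word in <em> tags. It assumes whitespace tokenization and is not how Elasticsearch's highlighters are implemented; `highlight` is a hypothetical helper.

```python
def highlight(text, query):
    """Wrap each word of text that matches a query term in <em></em>."""
    terms = set(query.lower().split())
    return " ".join(
        f"<em>{word}</em>" if word.lower() in terms else word
        for word in text.split()
    )

fragment = highlight("I love to go rock climbing", "rock climbing")
print(fragment)  # I love to go <em>rock</em> <em>climbing</em>
```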
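Elasticsearch also offers functionality similar to SQL GROUP BY for analytics, called aggregations.
Suppose we want to find the most popular interests among the employees; we can send the following request: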
GET /megacorp/employee/_search
{
"aggs": {
"all_interests": {
"terms": { "field": "interests" }
}
}
}
Elasticsearch returns the following result:
{
...
"hits": { ... },
"aggregations": {
"all_interests": {
"buckets": [
{
"key": "music",
"doc_count": 2
},
{
"key": "forestry",
"doc_count": 1
},
{
"key": "sports",
"doc_count": 1
}
]
}
}
}
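The buckets above can be reproduced with a small Python sketch that counts, for each interest, how many documents contain it. The tie-breaking rule (alphabetical by key) is chosen here to keep the output deterministic, and `terms_aggregation` is a hypothetical helper rather than an Elasticsearch API.

```python
from collections import Counter

def terms_aggregation(docs, field):
    """Count documents per distinct value of a list-valued field."""
    # set() ensures a document is counted once per value, like doc_count.
    counts = Counter(value for doc in docs for value in set(doc.get(field, [])))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]

employees = [
    {"first_name": "John", "interests": ["sports", "music"]},
    {"first_name": "Jane", "interests": ["music"]},
    {"first_name": "Douglas", "interests": ["forestry"]},
]
buckets = terms_aggregation(employees, "interests")
for bucket in buckets:
    print(bucket)
```

This is the same grouping a SQL GROUP BY with COUNT would perform, applied to each element of the interests array.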