yangl/0-elasticsearch

## 0-elasticsearch
Elasticsearch notes

1. Installation
http://www.cnblogs.com/jstarseven/p/6803054.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/zip-targz.html

Important settings: https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html

2. Common plugins

https://www.elastic.co/guide/en/elasticsearch/plugins/5.4/intro.html

head plugin:
https://github.com/mobz/elasticsearch-head

ik Chinese analyzer:
https://github.com/medcl/elasticsearch-analysis-ik/releases

SQL support:
https://github.com/NLPchina/elasticsearch-sql/


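## 1-Chinese word segmentation
Elasticsearch: The Definitive Guide (Chinese edition)
https://www.elastic.co/guide/cn/elasticsearch/guide/current/index.html
https://github.com/elasticsearch-cn/elasticsearch-definitive-guide

Plugin:
https://github.com/medcl/elasticsearch-analysis-ik

Test:
http://101.200.196.99:9200/uxinlive/_analyze?text=我是测试一下中文分词的呢MN&tokenizer=ik_max_word

The plugin supports two modes: ik_smart (coarsest-grained segmentation) and ik_max_word (finest-grained segmentation, emitting as many word combinations as possible).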
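The test URL above puts raw Chinese text in the query string, which some HTTP clients send unencoded or mangle. A minimal Java sketch (Java 10+, for `URLEncoder.encode(String, Charset)`) that percent-encodes the text as UTF-8 before building the same `_analyze` URL; the class name `AnalyzeUrl` and the `localhost` host are illustrative only. Newer Elasticsearch versions (6.0+) expect these parameters in a JSON request body rather than the query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative helper: builds an _analyze test URL with the text percent-encoded as UTF-8.
public class AnalyzeUrl {
    static String build(String host, String index, String text, String tokenizer) {
        return host + "/" + index + "/_analyze?text="
                + URLEncoder.encode(text, StandardCharsets.UTF_8)
                + "&tokenizer=" + URLEncoder.encode(tokenizer, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same test sentence as above, against a hypothetical local node
        System.out.println(build("http://localhost:9200", "uxinlive", "我是测试一下中文分词的呢MN", "ik_max_word"));
        // Compare with the coarse-grained mode
        System.out.println(build("http://localhost:9200", "uxinlive", "我是测试一下中文分词的呢MN", "ik_smart"));
    }
}
```

Running both URLs side by side is a quick way to see how ik_smart and ik_max_word segment the same sentence.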

https://www.elastic.co/guide/cn/elasticsearch/guide/current/mapping-intro.html

Exact-value string fields (keyword)

For a string field, the index attribute defaults to analyzed. To map the field as an exact value, set it to not_analyzed:

curl -XPUT 'http://101.200.196.99:9200/uxinlive/_mapping/room_info?pretty' -H 'Content-Type: application/json' -d'
{
  "properties" : {
    "city" : {
      "type" :    "string",
      "index":    "not_analyzed"
    }
  }
}
'

Note: Elasticsearch 5.x splits string into two types, text and keyword: text is analyzed, keyword is not.

Check the mapping with:
curl -XGET 'http://101.200.196.99:9200/uxinlive/_mapping/room_info?pretty'
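On 5.x, the exact-value mapping above is written with `"type": "keyword"` instead of `string` plus `not_analyzed`. A small Java sketch that builds that request body for one field; the class and method names are illustrative, and the field name is assumed to need no JSON escaping.

```java
// Illustrative helper: builds a 5.x-style mapping body for a single field.
// "keyword" = exact value (the old string + not_analyzed); "text" = analyzed full text.
public class FieldMapping {
    static String body(String field, String type) {
        // Reject the pre-5.x "string" type and anything else unexpected
        if (!type.equals("keyword") && !type.equals("text")) {
            throw new IllegalArgumentException("expected keyword or text, got " + type);
        }
        // Assumes field needs no JSON escaping (plain ASCII name without quotes)
        return "{\"properties\":{\"" + field + "\":{\"type\":\"" + type + "\"}}}";
    }

    public static void main(String[] args) {
        // Body to send with PUT /uxinlive/_mapping/room_info on a 5.x cluster
        System.out.println(body("city", "keyword"));
    }
}
```

The printed body can be passed to the same `curl -XPUT ... -d` command in place of the `string`/`not_analyzed` version.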


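## 2-Speeding up plugin installs in China
You can use Taobao's customized cnpm command-line tool (with gzip compression support) in place of the default npm, for example when installing elasticsearch-head's npm dependencies:

$ npm install -g cnpm --registry=https://registry.npm.taobao.org

Details: https://npm.taobao.org/ (the Taobao mirror has since moved to registry.npmmirror.com)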

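## 3-People nearby
// Elasticsearch 5.x Java API imports
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.common.unit.DistanceUnit;
import org.elasticsearch.index.query.GeoDistanceQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.sort.GeoDistanceSortBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;

// srb must come from a client; leaving it null throws a NullPointerException on the next call.
// client (an existing TransportClient) and "your_index" are placeholders.
SearchRequestBuilder srb = client.prepareSearch("your_index");

// Keep only documents whose geo_point field "location" lies within 1000 m of (lat, lon)
GeoDistanceQueryBuilder location = QueryBuilders.geoDistanceQuery("location").point(lat, lon).distance(1000, DistanceUnit.METERS);
srb.setPostFilter(location);
// Sort by distance from (lat, lon), nearest first; this sort is what yields each hit's distance
GeoDistanceSortBuilder sort = SortBuilders.geoDistanceSort("location", lat, lon);
sort.unit(DistanceUnit.METERS);
sort.order(SortOrder.ASC);
srb.addSort(sort);

The distance (in meters, because of sort.unit) is the first element of this array, since it is the only sort:
hit.getSortValues()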
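To sanity-check the values returned in `hit.getSortValues()`, or to re-apply the 1000 m cut-off on the client side, here is a standalone haversine (great-circle) distance in Java. `GeoDistance` is a hypothetical helper, not part of the Elasticsearch client; the Earth radius constant is the mean radius, and results may differ slightly from Elasticsearch's own calculation depending on its `distance_type`.

```java
// Hypothetical helper: great-circle distance between two lat/lon points using the haversine formula.
public class GeoDistance {
    // Mean Earth radius in meters (assumed; close to what Elasticsearch uses)
    static final double EARTH_RADIUS_M = 6_371_008.7714;

    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // True when (pLat, pLon) lies within radiusMeters of (lat, lon), like the geo_distance filter
    static boolean withinRadius(double lat, double lon, double pLat, double pLon, double radiusMeters) {
        return haversineMeters(lat, lon, pLat, pLon) <= radiusMeters;
    }

    public static void main(String[] args) {
        // One degree of latitude is roughly 111 km
        System.out.printf("%.0f m%n", haversineMeters(39.9, 116.4, 40.9, 116.4));
        // About 556 m north of the origin, so inside a 1000 m radius
        System.out.println(withinRadius(39.9, 116.4, 39.905, 116.4, 1000));
    }
}
```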

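## 4-Qingcloud configuration recommendations
Notes
Details: https://docs.qingcloud.com/guide/elasticsearch.html
Things to keep in mind when using Elasticsearch

Security

Elasticsearch's own API provides no security mechanism, and the API is very powerful, so exposing Elasticsearch directly to the public Internet is strongly discouraged; it should sit behind an application or an API gateway. Attacks against Elasticsearch are frequent, so access cluster nodes over a VPN rather than through port forwarding. See the user guide for how to configure a Qingcloud VPN.

Scripting

Elasticsearch supports scripts at query time and at indexing time; the feature is powerful but also fairly dangerous. The default script settings of Qingcloud's Elasticsearch service are: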

script.inline: sandbox
script.indexed: sandbox
script.file: false
script.update: false
script.mapping: false
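By default only sandboxed scripting languages such as Lucene expressions and mustache are allowed; groovy is not supported and updates via scripts are disabled. You can change these defaults in the configuration group.

Shards and replicas

The default shard and replica settings of Qingcloud's Elasticsearch service are: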

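index.number_of_shards: 3
index.number_of_replicas: 1
This means each newly created index has 3 shards with 1 replica each by default. You can change the defaults in the configuration group, or specify the values through the API when creating each index. The shard count largely determines how an index can scale across nodes and cannot be changed once the index is created, so estimate it before creating the index. For example, with 3 primary shards, scaling to 3 nodes gives each node one primary, which is a good fit; scaling further adds little for that index, since extra nodes can at most host replica copies (this refers to the current index; an Elasticsearch cluster may contain several indices). See the shard-scale section of the official Elasticsearch documentation for details.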
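The scaling limit above is simple arithmetic. A back-of-the-envelope Java sketch for one index, assuming copies are spread as evenly as possible (the goal of the default allocator, not a guarantee) and using the fact that Elasticsearch never places two copies of the same shard on one node; `ShardMath` and its methods are illustrative names.

```java
// Illustrative shard arithmetic for a single index.
public class ShardMath {
    // Copies that can actually be assigned: each shard has at most one copy per node
    static int assignedCopies(int primaries, int replicas, int nodes) {
        return primaries * Math.min(replicas + 1, nodes);
    }

    // Most copies any node holds under an even spread (ceiling division)
    static int maxCopiesPerNode(int primaries, int replicas, int nodes) {
        int copies = assignedCopies(primaries, replicas, nodes);
        return (copies + nodes - 1) / nodes;
    }

    // Nodes left holding nothing from this index
    static int idleNodes(int primaries, int replicas, int nodes) {
        return Math.max(0, nodes - assignedCopies(primaries, replicas, nodes));
    }

    public static void main(String[] args) {
        // Qingcloud defaults: 3 primaries, 1 replica
        for (int nodes = 1; nodes <= 8; nodes++) {
            System.out.printf("nodes=%d assigned=%d maxPerNode=%d idle=%d%n",
                    nodes, assignedCopies(3, 1, nodes), maxCopiesPerNode(3, 1, nodes), idleNodes(3, 1, nodes));
        }
    }
}
```

With the defaults, 3 nodes hold 2 copies each, 6 nodes hold 1 each, and from the 7th node on the extra nodes hold nothing from this index.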

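Other settings

The following settings can be changed in the Qingcloud Elasticsearch configuration group. If you need other settings, submit a ticket in the console.

action.auto_create_index: whether to create indices automatically; defaults to true, so submitting data to an index that does not exist creates it. Disable it to guard against mistakes.
action.destructive_requires_name: whether deleting an index requires its explicit name; defaults to true, i.e. indices cannot be deleted with wildcards. If you create indices by date and clean them up periodically with wildcards, disable this option.
http.cors.enabled: whether CORS is enabled; defaults to false. Enable it if you deploy your own Elasticsearch web management UI (kopf is already built in).
index.mapper.dynamic: whether mappings are created dynamically for new types; defaults to true. See the dynamic-mapping section of the official Elasticsearch documentation.
Discovery and recovery settings

The discovery and recovery settings are computed automatically from the number of nodes, so there is no need to set them by hand.

discovery.zen.minimum_master_nodes is (len(nodes) / 2) + 1 with integer division, e.g. 2 for 3 nodes; when there are fewer than 3 nodes it is 1.
discovery.zen.ping.unicast.hosts is set to every node in the cluster except the local one.
gateway.recover_after_nodes is max(discovery.zen.minimum_master_nodes, len(nodes)-2).
gateway.expected_nodes is len(nodes), i.e. equal to the number of nodes in the cluster.
gateway.recover_after_time keeps the Elasticsearch default of 5m.
These settings also affect availability. In a 5-node cluster, minimum_master_nodes is 3: if 2 nodes go down the cluster keeps working, but if 3 go down, the surviving nodes are fewer than minimum_master_nodes, no master can be elected, and the cluster stops working.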

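Recovery settings only matter when the cluster starts (for example, after the whole cluster has been stopped and started again). Take a 10-node cluster configured by the rules above: after a restart, the system first waits for minimum_master_nodes (6) nodes to join before electing a master. Recovery runs on the master node, and because recover_after_nodes is 8, the system keeps waiting until 8 nodes have joined before starting recovery. If, when recovery is due to start, the node count is still below expected_nodes (some nodes have not joined yet), the recover_after_time countdown begins; if the node count reaches expected_nodes, recovery starts immediately. If the remaining 2 nodes still have not joined after 5 minutes, data recovery proceeds anyway.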
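The discovery and recovery rules above can be recomputed for any cluster size. A Java sketch of those formulas, assuming every node is master-eligible as the formulas imply; `ClusterSettings` and `toleratedFailures` are illustrative names, not Qingcloud or Elasticsearch APIs.

```java
// Recomputes the discovery/recovery rules described above for an n-node cluster.
public class ClusterSettings {
    // discovery.zen.minimum_master_nodes: (n / 2) + 1 with integer division, but 1 when n < 3
    static int minimumMasterNodes(int n) {
        return n < 3 ? 1 : n / 2 + 1;
    }

    // gateway.recover_after_nodes: max(minimum_master_nodes, n - 2)
    static int recoverAfterNodes(int n) {
        return Math.max(minimumMasterNodes(n), n - 2);
    }

    // gateway.expected_nodes: equal to the cluster size
    static int expectedNodes(int n) {
        return n;
    }

    // Nodes that can fail while the survivors can still elect a master
    static int toleratedFailures(int n) {
        return n - minimumMasterNodes(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 3, 5, 10}) {
            System.out.printf("n=%d minimum_master_nodes=%d recover_after_nodes=%d expected_nodes=%d tolerated_failures=%d%n",
                    n, minimumMasterNodes(n), recoverAfterNodes(n), expectedNodes(n), toleratedFailures(n));
        }
    }
}
```

For 10 nodes this gives 6, 8 and 10, matching the restart walkthrough, and for 5 nodes it tolerates 2 failures, matching the availability example. For 2 nodes, minimum_master_nodes of 1 keeps the cluster running after one failure but leaves it exposed to split brain.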