elasticsearch session memo
@naokiainoya from Recruit Technologies
tech stack: AWS, Node.js, elasticsearch, DynamoDB
they built their own push notification service.
a very detailed talk…
summary:
Elasticsearch is used for flexible, rule-based notification routing.
requirements:
- can handle a large volume of notifications (tens of millions)
- scan feature (they need the full search result, i.e. the complete set of candidates to notify; see the sketch below)
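The scan they mention maps to elasticsearch's scan/scroll API, which streams the full result set instead of just the top-N hits. A minimal sketch with the official Python client (elasticsearch-py); the index, field, and handler names are made up for illustration:

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import scan

    es = Elasticsearch(["localhost:9200"])

    # Stream *every* matching document (all notification candidates),
    # not just the first page of hits.
    for hit in scan(es,
                    index="notifications",                     # hypothetical
                    query={"query": {"term": {"rule_id": 42}}}):
        handle_candidate(hit["_source"])                       # hypothetical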
Q/A:
- how do you autoscale an elasticsearch cluster?
- a node failure during rebalancing can be problematic.
Kentaro Yoshida from Livesense
elasticsearch experience: from 2013 summer
theme: switching to elasticsearch with minimum effort.
- use elasticsearch-river-jdbc
- caveat: unstable
So they created yamabiko (their own river).
- it lets you reliably sync MySQL/Amazon RDS/MariaDB/Percona Server to elasticsearch.
yamabiko's unique feature:
- detect (MySQL) record deletion.
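For context, a river such as elasticsearch-river-jdbc is registered by indexing a _meta document into the _river index. A rough sketch with the Python client, where the connection details and SQL are placeholders:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    es.index(index="_river", doc_type="my_jdbc_river", id="_meta", body={
        "type": "jdbc",
        "jdbc": {
            "url": "jdbc:mysql://localhost:3306/app",    # placeholder DSN
            "user": "reader",
            "password": "secret",
            "sql": "SELECT id AS _id, title, body FROM articles",
        },
    })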
demo: yamabiko and head-plugin
also covered geo search, mapping, and kuromoji.
caution: dynamic mapping can be problematic (see the mapping sketch below).
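One way to sidestep dynamic-mapping surprises is to define the mapping explicitly at index creation. A minimal sketch, assuming the analysis-kuromoji plugin is installed; index and field names are examples:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Explicit mapping instead of letting elasticsearch guess field types.
    es.indices.create(index="restaurants", body={
        "mappings": {
            "restaurant": {
                "properties": {
                    "name":     {"type": "string", "analyzer": "kuromoji"},
                    "location": {"type": "geo_point"},
                }
            }
        }
    })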
why they love elasticsearch:
- an official RPM is provided
- REST API
- facets
- array fields are handy for tag search (sketch after this list)
- flexible indexing
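Tag search with an array field, sketched with the Python client (names are made up): any field can hold an array of values, and a term query matches if any element matches.

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    es.index(index="posts", doc_type="post", id=1,
             body={"title": "hello", "tags": ["ruby", "elasticsearch"]})
    es.indices.refresh(index="posts")

    # Matches because the tags array contains "ruby".
    res = es.search(index="posts",
                    body={"query": {"term": {"tags": "ruby"}}})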
caveat:
- the Query DSL can be confusing
- less flexible than SQL (e.g. GROUP BY, JOIN; see the facet sketch below)
- etc…
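For the GROUP BY point: the closest Query DSL analogue is a terms facet (or, since 1.0, an aggregation). A sketch continuing the hypothetical posts index from above:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Roughly "SELECT tag, COUNT(*) ... GROUP BY tag" in Query DSL form.
    res = es.search(index="posts", body={
        "query": {"match_all": {}},
        "facets": {"by_tag": {"terms": {"field": "tags"}}},
    })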
@tady_jp from zigen
they use it as a search engine, not for Kibana.
elasticsearch benefits:
- well automated sharding/replication
- easy to scale.
- active plugin dev community.
elasticsearch information sources:
- documentation
- books in english
- stack overflow
what a modern search engine includes:
- full-text search
- multi-column indexes
- boosting
- facet
- grouping
- suggester
- autocomplete
- geo search
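As one concrete example from that list, a geo search sketch with the Python client (ES 1.x filtered-query syntax; the index, field, and coordinates are made up):

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Restaurants matching "ramen" within 1 km of a given point.
    res = es.search(index="restaurants", body={
        "query": {
            "filtered": {
                "query":  {"match": {"name": "ramen"}},
                "filter": {"geo_distance": {
                    "distance": "1km",
                    "location": {"lat": 35.681, "lon": 139.767},
                }},
            }
        }
    })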
search is often treated as an optional feature on a website,
but once you add it, it can become problematic:
- poor ranking
- poor matching
- etc...
So they test and ensure search result quality with specs:
tool: elasticsearch-ruby gem, rails
demo: how to build full-featured restaurant search.
including
- mapping
- analyzer
- kuromoji
tips:
- set doc['id'] as the _id field
- use multi_field
- a stored field is required by the highlighter (mapping sketch below)
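A sketch pulling those three tips into one mapping, using the Python client for illustration (the demo itself used elasticsearch-ruby); field names are examples:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    es.indices.create(index="restaurants", body={
        "mappings": {
            "restaurant": {
                "_id": {"path": "id"},          # use doc['id'] as _id
                "properties": {
                    "name": {
                        "type": "multi_field",
                        "fields": {
                            # analyzed and stored, so the highlighter works
                            "name": {"type": "string",
                                     "analyzer": "kuromoji",
                                     "store": True},
                            # raw variant for exact match / sorting
                            "raw": {"type": "string",
                                    "index": "not_analyzed"},
                        },
                    },
                },
            }
        }
    })

    # Highlighting reads from the stored field.
    res = es.search(index="restaurants", body={
        "query": {"match": {"name": "ラーメン"}},
        "highlight": {"fields": {"name": {}}},
    })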
demo: testing against a real elasticsearch instead of stubbing it.
- 東京都 (Tokyo prefecture) should not match 京都 (Kyoto prefecture)
- prepare example doc in test db
- querying
- check result.
- it passes if you use kuromoji instead of ngram.
- ensure ranking(boosting)
- prepare example doc in test db
- querying
- check result doc *order*.
- it passes once you tweak the boost values.
etc… (suggester)
=> several requirements can be tested with specs! (see the sketch below)
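A rough Python rendering of the first spec (the talk used RSpec with elasticsearch-ruby); index and field names are made up, and a kuromoji-analyzed address field is assumed:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    def test_tokyo_does_not_match_kyoto():
        # With an ngram analyzer, "東京都" contains the bigram "京都" and this
        # spec fails; kuromoji tokenizes it as 東京 + 都, so it passes.
        es.index(index="shops", doc_type="shop", id=1,
                 body={"address": "東京都千代田区"})
        es.indices.refresh(index="shops")
        res = es.search(index="shops",
                        body={"query": {"match": {"address": "京都"}}})
        assert "1" not in [hit["_id"] for hit in res["hits"]["hits"]]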
Toyama-san from iPROS
- elasticsearch in production
- performance tuning
- routing is important
- how to grab logs from fluentd
intro:
environment: AWS, git, elasticsearch
using the elasticsearch-cloud-aws plugin
demonstrated how easy the plugin is to use.
note: they distinguish clusters by EC2 instance tag (not by security group).
explained that multicast is disabled in AWS, which is why the plugin is needed.
discovery.zen.ping.multicast.enabled: false (recommended)
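A sketch of the related elasticsearch.yml settings for the cloud-aws plugin (the tag name/value and credentials are placeholders):

    discovery.type: ec2
    discovery.zen.ping.multicast.enabled: false
    discovery.ec2.tag.cluster: my-cluster   # match nodes by EC2 instance tag
    cloud.aws.access_key: <your access key>
    cloud.aws.secret_key: <your secret key>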
routing:
very important for performance.
balance docs via routing (they route by user_id).
querying with a routing value reduces shard access (avoids hitting every shard).
routing is applied across types (the routing strategy is the same for all types).
this works well because consecutive queries from one user all share the same user_id.
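A routing sketch with the Python client; indexing and querying must use the same routing value (here a user_id) so a query only touches the shard holding that user's docs. Names are made up:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Route the document to a shard chosen by user_id rather than by _id.
    es.index(index="events", doc_type="event", id=1, routing="user-42",
             body={"user_id": "user-42", "action": "login"})
    es.indices.refresh(index="events")

    # The same routing value lets the query skip all other shards.
    res = es.search(index="events", routing="user-42",
                    body={"query": {"term": {"user_id": "user-42"}}})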
logstash:
fits time-series data
- easy to back up
- short-term logging (weeks or months)
does not fit year-scale logging
pros/cons of splitting at the index level (sketch below)
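The usual index-level split is one index per day, logstash-style; retention then becomes cheap because dropping a whole index is far lighter than deleting documents. A sketch (index names are examples):

    import datetime

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Write today's logs into a date-suffixed index.
    today = datetime.date.today().strftime("%Y.%m.%d")
    es.index(index="logs-" + today, doc_type="log",
             body={"message": "hello"})

    # Expire an old day by deleting its index outright.
    es.indices.delete(index="logs-2014.01.01")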
…(I couldn't keep up with this part… the talk was too fast…)
request: improve the backup functionality.
fluentd -> elasticsearch
- use td-logger (Ruby gem)
- all logs go through fluentd
fluentd benefits:
- built-in retry (elasticsearch can go down, become unreachable, or pause for GC)
note: use record_reformer (a fluentd plugin) to convert the time field
O/R mapping:
- use tire (Ruby gem)
- same interface as ActiveRecord
- if you are starting now, use a different gem, because tire has been renamed to retire.
they handle 100 GB of searchable logs.
Q/A
- they use 2x m1.large
- how do you delete old data?
- add only; they just add machines.