Last active August 29, 2015 13:56
elasticsearch session memo
@naokiainoya from Recruit Technologies (リクルートテクノロジーズ)
tech stack: AWS, node.js, elasticsearch, DynamoDB
built their own push notification service.
the talk was too detailed to note down in full…
Elasticsearch is used for (flexible) rule-based notification routing.
- can handle many notifications (tens of millions).
- scan feature (they need the full search result, i.e. every candidate to be notified)
- how to autoscale an elasticsearch cluster?
- a node failure during rebalancing is problematic.
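A minimal sketch of why the scan feature matters here: every matching candidate has to be visited, not just the top N hits. The two callables stand in for the HTTP round trips (opening a scan, then fetching page after page by scroll id). All names below, including the in-memory backend, are hypothetical and not taken from the talk.

```python
from typing import Callable, Iterator, List, Tuple

def scan_all(
    start_scan: Callable[[], str],
    next_page: Callable[[str], Tuple[List[dict], str]],
) -> Iterator[dict]:
    """Yield every hit by following scroll ids until a page comes back empty."""
    scroll_id = start_scan()  # opening the scan returns only a scroll id
    while True:
        hits, scroll_id = next_page(scroll_id)
        if not hits:
            break
        yield from hits

def make_fake_backend(docs: List[dict], page_size: int):
    """In-memory stand-in for a cluster; the scroll id is just an offset."""
    def start_scan() -> str:
        return "0"

    def next_page(scroll_id: str) -> Tuple[List[dict], str]:
        offset = int(scroll_id)
        page = docs[offset:offset + page_size]
        return page, str(offset + len(page))

    return start_scan, next_page

# Usage: walk all candidates in pages of 3.
candidates = [{"user": i} for i in range(7)]
start, fetch = make_fake_backend(candidates, page_size=3)
notified = list(scan_all(start, fetch))
```

The generator keeps memory flat regardless of how many candidates match, which is the point when the result set is in the tens of millions.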
Kentaro Yoshida from Livesense (リブセンス)
elasticsearch experience: since summer 2013
theme: switching to elasticsearch with minimum effort.
- use elasticsearch-river-jdbc
- caveat: unstable
So they created yamabiko (their own river).
- it lets you reliably sync MySQL / Amazon RDS / MariaDB / Percona Server to elasticsearch.
yamabiko's unique feature:
- detect (MySQL) record deletion.
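Deletions are the hard part of syncing, because a deleted row no longer shows up in a SELECT. One generic way to catch them is to compare the primary keys in the database with the ids in the index. The sketch below shows only that idea; it is an assumption for illustration, not necessarily how yamabiko does it, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List

def diff_ids(db_ids: Iterable[int], indexed_ids: Iterable[int]) -> Dict[str, List[int]]:
    """Compare primary keys in the database with document ids in the index."""
    db, idx = set(db_ids), set(indexed_ids)
    return {
        "to_index": sorted(db - idx),   # rows that exist but are not indexed yet
        "to_delete": sorted(idx - db),  # docs whose source row has been deleted
    }

# Usage: row 3 was deleted in MySQL, row 4 was added.
plan = diff_ids(db_ids=[1, 2, 4], indexed_ids=[1, 2, 3])
```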
demo: yamabiko and head-plugin
also described: geo search, mapping, kuromoji.
caution: dynamic mapping can be problematic.
why they love elasticsearch:
- an official rpm is provided.
- rest-api
- facet
- array field is useful for tag search
- flexible indexing
- Query DSL can be confusing.
- less flexible than SQL (e.g. GROUP BY, JOIN)
- etc…
@tady_jp from zigen
they use elasticsearch as a search engine, not for kibana.
elasticsearch benefits:
- well-automated sharding/replication
- easy to scale.
- active plugin dev community.
elasticsearch information sources:
- documentation
- books in English
- stack overflow
what a modern search engine includes:
- full text search
- multi column index
- boosting
- facet
- grouping
- suggester
- autocomplete
- geo search
search is often treated as an optional feature of a website.
but once you add search, it becomes problematic:
- poor ranking
- poor matching
- etc...
So they test and ensure search result quality with specs:
tool: elasticsearch-ruby gem, rails
demo: how to build a full-featured restaurant search.
- mapping
- analyzer
- kuromoji
- use doc['id'] as the _id field
- use multi_field
- the highlighter requires a stored field
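A rough sketch of what such a mapping could look like, written as a Python dict that serializes to the JSON body. It follows the 0.90/1.x-era mapping syntax as I understand it (multi_field, _id path, string type) and should be checked against the docs for the version in use. The type name "restaurant", the field name "name" and the sub-field "raw" are hypothetical, and the kuromoji analyzer assumes the analysis-kuromoji plugin is installed.

```python
import json

# Assumed 0.90/1.x-era mapping; all type and field names are illustrative.
RESTAURANT_MAPPING = {
    "restaurant": {
        # take the document id from the "id" field of the source doc
        "_id": {"path": "id"},
        "properties": {
            "name": {
                "type": "multi_field",
                "fields": {
                    # analyzed with kuromoji and stored, for search and highlighting
                    "name": {"type": "string", "analyzer": "kuromoji", "store": True},
                    # exact, unanalyzed copy, e.g. for sorting or facets
                    "raw": {"type": "string", "index": "not_analyzed"},
                },
            },
        },
    }
}

# JSON body that would be sent when creating the mapping.
mapping_body = json.dumps(RESTAURANT_MAPPING, indent=2)
```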
demo: testing against a real elasticsearch instead of stubbing it.
- 東京都 (Tokyo prefecture) should not match a search for 京都 (Kyoto prefecture)
  - prepare example docs in the test db
  - run the query
  - check the result.
  - it passes if you use kuromoji instead of ngram.
- ensure ranking (boosting)
  - prepare example docs in the test db
  - run the query
  - check the result doc *order*.
  - it passes once you tweak the boost value.
=> several requirements can be tested with specs!
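The 東京都 / 京都 case can be reproduced without a cluster. With bigrams, 東京都 produces the token 京都, so a search for 京都 matches it. A dictionary-based tokenizer splits on word boundaries and avoids the false match. The sketch below uses a toy three-word dictionary as a stand-in for a morphological analyzer such as kuromoji; it is an illustration only and its output is not claimed to equal kuromoji's.

```python
from typing import Callable, List

def bigrams(text: str) -> List[str]:
    """Character 2-grams, the way an ngram tokenizer would cut the text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

# Hypothetical toy dictionary; a real analyzer uses a full dictionary.
TOY_DICTIONARY = ["東京", "京都", "都"]

def dictionary_tokens(text: str) -> List[str]:
    """Greedy longest-match segmentation from the left."""
    words = sorted(TOY_DICTIONARY, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for word in words:
            if text.startswith(word, i):
                tokens.append(word)
                i += len(word)
                break
        else:
            tokens.append(text[i])  # unknown character becomes its own token
            i += 1
    return tokens

def matches(query: str, doc: str, tokenize: Callable[[str], List[str]]) -> bool:
    """A doc matches when it contains every token of the query."""
    doc_tokens = set(tokenize(doc))
    return all(token in doc_tokens for token in tokenize(query))

ngram_hit = matches("京都", "東京都", bigrams)                # the unwanted match
dictionary_hit = matches("京都", "東京都", dictionary_tokens)  # no match
```

A spec then only has to assert that the second case stays false and that a real 京都 document still matches.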
Toyama-san from ipros
- elasticsearch in production
- performance tuning
- routing is important
- how to grab logs from fluentd
environment: AWS, git, Elasticsearch
using the elasticsearch-cloud-aws plugin
demonstrated how easy the plugin is to use.
note: they distinguish clusters by EC2 instance tag (not by security group).
explained that multicast is not available on AWS, which is why they use the plugin; setting multicast to false is recommended.
routing is very important for performance.
balance docs across shards by routing (they route by user_id).
querying with a routing value reduces the number of shards accessed (= avoids scanning every shard).
routing across types: the routing strategy is the same for all types.
this works well because consecutive queries from one user all relate to the same user_id.
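A small simulation of the routing idea: the routing value is hashed to pick a shard, so all docs for one user_id land on the same shard and a query that carries the same routing value only has to visit that shard. The hash (crc32) and the shard count are stand-ins chosen for this sketch; Elasticsearch uses its own hash function internally, and all names here are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List

NUM_SHARDS = 5  # assumed shard count for the sketch

def shard_for(routing, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing value to a shard number (stand-in hash, not the real one)."""
    return zlib.crc32(str(routing).encode("utf-8")) % num_shards

def index_docs(docs: List[dict], num_shards: int = NUM_SHARDS) -> Dict[int, List[dict]]:
    """Place each doc on the shard chosen by its user_id."""
    shards: Dict[int, List[dict]] = defaultdict(list)
    for doc in docs:
        shards[shard_for(doc["user_id"], num_shards)].append(doc)
    return shards

def search_user(shards: Dict[int, List[dict]], user_id, num_shards: int = NUM_SHARDS) -> List[dict]:
    """With a routing value, only one shard is consulted instead of all of them."""
    shard = shards.get(shard_for(user_id, num_shards), [])
    return [doc for doc in shard if doc["user_id"] == user_id]

# Usage: three users, three docs each.
all_docs = [{"user_id": u, "n": n} for u in (1, 2, 42) for n in range(3)]
cluster = index_docs(all_docs)
user_42_docs = search_user(cluster, 42)
```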
good fit: time series data
- easy to back up
- short-term logging (weeks or months)
poor fit: logging at the scale of years
pros/cons of splitting at the index level
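One common way to split time series at the index level is one index per day, so that backing up or dropping a period means handling whole indices. The sketch below only computes index names and picks the ones outside a retention window. The "logs" prefix, the date format and the 3-day retention are assumptions for illustration, not details from the talk.

```python
from datetime import date, timedelta
from typing import List

def index_name(day: date, prefix: str = "logs") -> str:
    """Name of the daily index for a given day, e.g. logs-2014.02.07."""
    return f"{prefix}-{day:%Y.%m.%d}"

def expired_indices(names: List[str], today: date, keep_days: int, prefix: str = "logs") -> List[str]:
    """Return the index names older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        year, month, day = name[len(prefix) + 1:].split(".")
        if date(int(year), int(month), int(day)) < cutoff:
            expired.append(name)
    return expired

# Usage: one index per day for a week, keep the last 3 days.
week = [index_name(date(2014, 2, d)) for d in range(1, 8)]
old = expired_indices(week, today=date(2014, 2, 7), keep_days=3)
```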
…(I couldn't keep up with this part… the speaker was too fast…)
request: improve backup functions.
fluentd -> elasticsearch
- use td-logger(ruby gem)
- all logs come through fluentd
fluentd benefits:
- built-in retry (elasticsearch can be down, unreachable, or in a GC pause).
note: use record_reformer (fluentd plugin) to convert the time field
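The value of a built-in retry can be sketched as a send loop with exponential backoff. This is a simplified stand-in for illustration: fluentd's real buffering and retry logic is more involved, and the function name, retry count and wait times here are hypothetical.

```python
import time
from typing import Any, Callable

def send_with_retry(
    send: Callable[[dict], Any],
    record: dict,
    max_retries: int = 5,
    base_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send(record); on ConnectionError wait base_wait * 2**attempt and retry."""
    for attempt in range(max_retries + 1):
        try:
            return send(record)
        except ConnectionError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            sleep(base_wait * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.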
O/R mapping:
- use tire(ruby gem)
- same interface as ActiveRecord
- if you are starting now, you should use another gem, because tire has been renamed to retire.
they handle 100GB of logs for search.
- they use m1.large x2
- how do they delete old data?
- they don't: data is add-only, and they add machines.