elasticsearch session memo
@naokiainoya from Recruit Technologies
tech stack: AWS, Node.js, elasticsearch, DynamoDB
they built their own push notification service.
a very detailed talk…
summary:
Elasticsearch is used for flexible, rule-based notification routing.
requirements:
- can handle a large volume of notifications (tens of millions)
- scan feature (they need the full search result, i.e. the complete set of candidates to notify; see the sketch below)
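The scan they mention maps to elasticsearch's scan/scroll API, which streams the full result set instead of just the top-N hits. A minimal sketch with the official Python client (elasticsearch-py); the index, field, and handler names are made up for illustration:

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import scan

    es = Elasticsearch(["localhost:9200"])

    # Stream *every* matching document (all notification candidates),
    # not just the first page of hits.
    for hit in scan(es,
                    index="notifications",                     # hypothetical
                    query={"query": {"term": {"rule_id": 42}}}):
        handle_candidate(hit["_source"])                       # hypothetical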
Q/A:
- how do you autoscale an elasticsearch cluster?
- a node failure during rebalancing can be problematic.
Kentaro Yoshida from Livesense
elasticsearch experience: from 2013 summer
theme: switching to elasticsearch with minimum effort.
- use elasticsearch-river-jdbc
- caveat: unstable
So they created yamabiko (their own river).
- it lets you reliably sync MySQL/Amazon RDS/MariaDB/Percona Server to elasticsearch.
yamabiko's unique feature:
- detect (MySQL) record deletion.
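For context, a river such as elasticsearch-river-jdbc is registered by indexing a _meta document into the _river index. A rough sketch with the Python client, where the connection details and SQL are placeholders:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    es.index(index="_river", doc_type="my_jdbc_river", id="_meta", body={
        "type": "jdbc",
        "jdbc": {
            "url": "jdbc:mysql://localhost:3306/app",    # placeholder DSN
            "user": "reader",
            "password": "secret",
            "sql": "SELECT id AS _id, title, body FROM articles",
        },
    })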
demo: yamabiko and head-plugin
also covered geo search, mapping, and kuromoji.
caution: dynamic mapping can be problematic (see the mapping sketch below).
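One way to sidestep dynamic-mapping surprises is to define the mapping explicitly at index creation. A minimal sketch, assuming the analysis-kuromoji plugin is installed; index and field names are examples:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Explicit mapping instead of letting elasticsearch guess field types.
    es.indices.create(index="restaurants", body={
        "mappings": {
            "restaurant": {
                "properties": {
                    "name":     {"type": "string", "analyzer": "kuromoji"},
                    "location": {"type": "geo_point"},
                }
            }
        }
    })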
why they love elasticsearch:
- an official RPM is provided
- REST API
- facets
- array fields are handy for tag search (sketch after this list)
- flexible indexing
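Tag search with an array field, sketched with the Python client (names are made up): any field can hold an array of values, and a term query matches if any element matches.

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    es.index(index="posts", doc_type="post", id=1,
             body={"title": "hello", "tags": ["ruby", "elasticsearch"]})
    es.indices.refresh(index="posts")

    # Matches because the tags array contains "ruby".
    res = es.search(index="posts",
                    body={"query": {"term": {"tags": "ruby"}}})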
caveat:
- the Query DSL can be confusing
- less flexible than SQL (e.g. GROUP BY, JOIN; see the facet sketch below)
- etc…
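For the GROUP BY point: the closest Query DSL analogue is a terms facet (or, since 1.0, an aggregation). A sketch continuing the hypothetical posts index from above:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Roughly "SELECT tag, COUNT(*) ... GROUP BY tag" in Query DSL form.
    res = es.search(index="posts", body={
        "query": {"match_all": {}},
        "facets": {"by_tag": {"terms": {"field": "tags"}}},
    })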
@tady_jp from zigen
they use it as a search engine, not for Kibana.
elasticsearch benefits:
- well automated sharding/replication
- easy to scale.
- active plugin dev community.
elasticsearch information sources:
- documentation
- books in english
- stack overflow
what a modern search engine includes:
- full-text search
- multi-column indexes
- boosting
- facet
- grouping
- suggester
- autocomplete
- geo search
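As one concrete example from that list, a geo search sketch with the Python client (ES 1.x filtered-query syntax; the index, field, and coordinates are made up):

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Restaurants matching "ramen" within 1 km of a given point.
    res = es.search(index="restaurants", body={
        "query": {
            "filtered": {
                "query":  {"match": {"name": "ramen"}},
                "filter": {"geo_distance": {
                    "distance": "1km",
                    "location": {"lat": 35.681, "lon": 139.767},
                }},
            }
        }
    })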
search is often treated as an optional feature on a website,
but once you add it, it can become problematic:
- poor ranking
- poor matching
- etc...
So they test and ensure search result quality with specs:
tool: elasticsearch-ruby gem, rails
demo: how to build full-featured restaurant search.
including
- mapping
- analyzer
- kuromoji
tips:
- set doc['id'] as the _id field
- use multi_field
- a stored field is required by the highlighter (mapping sketch below)
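A sketch pulling those three tips into one mapping, using the Python client for illustration (the demo itself used elasticsearch-ruby); field names are examples:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    es.indices.create(index="restaurants", body={
        "mappings": {
            "restaurant": {
                "_id": {"path": "id"},          # use doc['id'] as _id
                "properties": {
                    "name": {
                        "type": "multi_field",
                        "fields": {
                            # analyzed and stored, so the highlighter works
                            "name": {"type": "string",
                                     "analyzer": "kuromoji",
                                     "store": True},
                            # raw variant for exact match / sorting
                            "raw": {"type": "string",
                                    "index": "not_analyzed"},
                        },
                    },
                },
            }
        }
    })

    # Highlighting reads from the stored field.
    res = es.search(index="restaurants", body={
        "query": {"match": {"name": "ラーメン"}},
        "highlight": {"fields": {"name": {}}},
    })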
demo: testing against a real elasticsearch instead of stubbing it.
- 東京都 (Tokyo prefecture) should not match 京都 (Kyoto prefecture)
- prepare example doc in test db
- querying
- check result.
- it passes if you use kuromoji instead of ngram.
- ensure ranking(boosting)
- prepare example doc in test db
- querying
- check result doc *order*.
- it passes once you tweak the boost values.
etc… (suggester)
=> several requirements can be tested with specs! (see the sketch below)
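A rough Python rendering of the first spec (the talk used RSpec with elasticsearch-ruby); index and field names are made up, and a kuromoji-analyzed address field is assumed:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    def test_tokyo_does_not_match_kyoto():
        # With an ngram analyzer, "東京都" contains the bigram "京都" and this
        # spec fails; kuromoji tokenizes it as 東京 + 都, so it passes.
        es.index(index="shops", doc_type="shop", id=1,
                 body={"address": "東京都千代田区"})
        es.indices.refresh(index="shops")
        res = es.search(index="shops",
                        body={"query": {"match": {"address": "京都"}}})
        assert "1" not in [hit["_id"] for hit in res["hits"]["hits"]]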
Toyama-san from iPROS
- elasticsearch in production
- performance tuning
- routing is important
- how to grab logs from fluentd
intro:
environment: AWS, git, elasticsearch
using the elasticsearch-cloud-aws plugin
demonstrated how easy the plugin is to use.
note: they distinguish clusters by EC2 instance tag (not by security group).
explained that multicast is disabled in AWS, which is why the plugin is needed.
discovery.zen.ping.multicast.enabled: false (recommended)
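A sketch of the related elasticsearch.yml settings for the cloud-aws plugin (the tag name/value and credentials are placeholders):

    discovery.type: ec2
    discovery.zen.ping.multicast.enabled: false
    discovery.ec2.tag.cluster: my-cluster   # match nodes by EC2 instance tag
    cloud.aws.access_key: <your access key>
    cloud.aws.secret_key: <your secret key>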
routing:
very important for performance.
balance docs via routing (they route by user_id).
querying with a routing value reduces shard access (avoids hitting every shard).
routing is applied across types (the routing strategy is the same for all types).
this works well because consecutive queries from one user all share the same user_id.
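A routing sketch with the Python client; indexing and querying must use the same routing value (here a user_id) so a query only touches the shard holding that user's docs. Names are made up:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Route the document to a shard chosen by user_id rather than by _id.
    es.index(index="events", doc_type="event", id=1, routing="user-42",
             body={"user_id": "user-42", "action": "login"})
    es.indices.refresh(index="events")

    # The same routing value lets the query skip all other shards.
    res = es.search(index="events", routing="user-42",
                    body={"query": {"term": {"user_id": "user-42"}}})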
logstash:
fits time-series data
- easy to back up
- short-term logging (weeks or months)
does not fit year-scale logging
pros/cons of splitting at the index level (sketch below)
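The usual index-level split is one index per day, logstash-style; retention then becomes cheap because dropping a whole index is far lighter than deleting documents. A sketch (index names are examples):

    import datetime

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Write today's logs into a date-suffixed index.
    today = datetime.date.today().strftime("%Y.%m.%d")
    es.index(index="logs-" + today, doc_type="log",
             body={"message": "hello"})

    # Expire an old day by deleting its index outright.
    es.indices.delete(index="logs-2014.01.01")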
…(I couldn't keep up with this part… the talk was too fast…)
request: improve the backup functionality.
fluentd -> elasticsearch
- use td-logger (Ruby gem)
- all logs go through fluentd
fluentd benefits:
- built-in retry (elasticsearch can go down, become unreachable, or pause for GC)
note: use record_reformer (a fluentd plugin) to convert the time field
O/R mapping:
- use tire (Ruby gem)
- same interface as ActiveRecord
- if you are starting now, use a different gem, because tire has been renamed to retire.
they handle 100 GB of searchable logs.
Q/A
- they use 2x m1.large
- how do you delete old data?
- add only; they just add machines.