Peter Dyson geekpete

Next Steps

  • Measure time spent on index, flush, refresh, merge, query, etc. (TD - done)
  • Take hot threads snapshots under read+write, read-only, write-only (TD - done)
  • Adjust refresh time to 10s (from 1s) and see how load changes (TD)
  • Measure time of a rolling restart doing disable_flush and disable_recovery (TD)
  • Specify routing on query -- make it choose same node for each shard each time (MD)
  • GC new generation size (TD)
  • Warmers
  • Measure client query time before and after enabling warmers (MD)
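
A before/after comparison of client query time could be taken with a small timing harness like the sketch below. It is only an illustration: `time_calls` and `fake_query` are hypothetical names, and `fake_query` stands in for whatever client call actually issues the search. Run it once without warmers and once with them, then compare the medians.

```python
import statistics
import time
from typing import Callable, Dict


def time_calls(fn: Callable[[], object], repeats: int = 20) -> Dict[str, float]:
    """Call fn `repeats` times and summarize wall-clock latency in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


def fake_query() -> int:
    # Hypothetical stand-in for the real client query; replace with the actual call.
    return sum(range(1000))


if __name__ == "__main__":
    print(time_calls(fake_query))
```

The median is reported alongside min and max because a single slow request (a GC pause, for example) can distort an average.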
curl -XDELETE localhost:9200/nested_aggs
curl -XPUT localhost:9200/nested_aggs
curl -XPUT localhost:9200/nested_aggs/user/_mapping -d '
{
"_id" : {"index": "not_analyzed", "path" : "userId"},
"properties": {
"userId": {"type": "string", "index": "not_analyzed"},
"groups": {
"type": "nested",
"properties": {

If you want, I can try to help with pointers on how to improve the indexing speed you get. It's quite easy to increase it significantly by following some simple guidelines, for example:
- Use create in the index API (assuming you can).
- Relax the real-time aspect from 1 second to something a bit higher (index.engine.robin.refresh_interval).
- Increase the indexing buffer size (indices.memory.index_buffer_size); it defaults to 10%, meaning 10% of the heap.
- Increase the number of dirty operations that trigger an automatic flush (so the translog won't get really big, even though it's FS based) by setting index.translog.flush_threshold (defaults to 5000).
- Increase the memory allocated to the elasticsearch node. By default it's 1g.
- Start with a lower replica count (even 0), and then once the bulk loading is done, increase it to the value you want using the update_settings API. This will improve things, since possibly fewer shards will be allocated to each machine.
- Increase the number of machines you have so
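
The replica and refresh guidelines above can be sketched as two request bodies for the update settings API: one applied before a bulk load and one applied afterwards. This is a minimal sketch under assumptions: it uses the setting names `index.number_of_replicas` and `index.refresh_interval` rather than the older `index.engine.robin.refresh_interval` name quoted above, the helper `index_settings` is hypothetical, and the values `30s` and `1` are illustrative only. It builds the JSON and sends nothing.

```python
import json


def index_settings(replicas: int, refresh_interval: str) -> str:
    """Build a JSON body for an index settings update (assumed setting names)."""
    body = {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": refresh_interval,
        }
    }
    return json.dumps(body, sort_keys=True)


# Before bulk loading: no replicas, relaxed refresh (illustrative values).
BULK_LOAD = index_settings(0, "30s")
# After bulk loading: restore the replica count and refresh interval you want.
AFTER_LOAD = index_settings(1, "1s")

if __name__ == "__main__":
    print(BULK_LOAD)
    print(AFTER_LOAD)
```

Each string would be sent as the body of a settings update on the target index. Check the setting names against the Elasticsearch version in use before relying on them.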
@geekpete
geekpete / gist:dbbf6e89035f3b41d93c
Created October 29, 2015 06:25 — forked from garnaat/gist:1443559
Using get_all_instance_status method in boto
>>> import boto
>>> ec2 = boto.connect_ec2()
>>> stats = ec2.get_all_instance_status()
>>> stats
[InstanceStatus:i-67c81e0c]
>>> stat = stats[0]
>>> stat
InstanceStatus:i-67c81e0c
>>> stat.id
u'i-67c81e0c'
@geekpete
geekpete / install-comodo-ssl-cert-for-nginx.rst
Created February 12, 2016 04:04 — forked from bradmontgomery/install-comodo-ssl-cert-for-nginx.rst
Steps to install a Comodo PositiveSSL certificate with Nginx.

Setting up an SSL Cert from Comodo

I use Namecheap.com as a registrar, and they resell SSL certs from a number of other companies, including Comodo.

These are the steps I went through to set up an SSL cert.

Purchase the cert

@geekpete
geekpete / es-metrics.md
Created May 5, 2016 00:54
Metrics to Monitor

Elasticsearch has many metrics that can be used to determine if a cluster is healthy. Listed below are the metrics that are currently a good idea to monitor with the reason(s) why they should be monitored and any possible recourse for issues.

Version

Unless otherwise noted, all of the API requests work starting with 1.0.0. If a newer version is required for a given metric, then it is noted by the metric's name.

Metrics

Metrics are an easy way to monitor the health of a cluster, and they can be easily accessed from the HTTP API. The metrics tables are broken down by source.
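
As a rough illustration of turning an HTTP API response into a health check, the sketch below inspects a cluster health document that has already been fetched and parsed. The function name `health_problems` and the `SAMPLE` payload are hypothetical; the sketch assumes the response carries `status` and `unassigned_shards` fields, and it is not taken from the metrics tables in this gist.

```python
from typing import Any, Dict, List


def health_problems(health: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a parsed cluster health response."""
    problems = []
    status = health.get("status")
    if status != "green":
        problems.append(f"cluster status is {status}")
    unassigned = int(health.get("unassigned_shards", 0))
    if unassigned > 0:
        problems.append(f"{unassigned} unassigned shards")
    return problems


# Illustrative payload only, not real cluster output.
SAMPLE = {"status": "yellow", "unassigned_shards": 5, "number_of_nodes": 3}

if __name__ == "__main__":
    for problem in health_problems(SAMPLE):
        print(problem)
```

Keeping the check separate from the HTTP request makes it easy to test against saved responses.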

@geekpete
geekpete / gist:570c99949991a29a44c86798e487b423
Created May 10, 2016 02:39 — forked from athoune/gist:5777474
Pushing mails to Elastic Search for a Kibana analysis.
#!/usr/bin/env python
import sys
# Lamson is an application, but also the best way to read email without
# struggling with "batteries included" libraries.
from lamson.encoding import from_string as parse_mail
from pyelasticsearch import ElasticSearch
from pyelasticsearch.exceptions import ElasticHttpNotFoundError
@geekpete
geekpete / backblaze b2 backup script
Created June 9, 2016 23:32 — forked from scottlinux/backblaze b2 backup script
Backup script for backblaze b2
#!/usr/bin/env bash
#
# Backup selected directories to a Backblaze B2 bucket
#
# Example daily cron:
# @daily /usr/local/bin/b2backup >/dev/null
#
# Account creds
id=xxxxxxxxxx
@geekpete
geekpete / README.md
Created June 24, 2016 02:12 — forked from miguelmota/README.md
Multiple accounts with Mutt E-Mail Client
How to set up multiple accounts with Mutt E-mail Client

Thanks to this article by Christoph Berg

Instructions

Directories and files

~/