Craig Lebowitz lebowitz

## install-elasticsearch-centos.sh
# install missing libraries (if any)
cd ~
sudo yum update
sudo yum install java-1.7.0-openjdk.x86_64
sudo yum install unzip
sudo yum install mc
sudo yum install wget
sudo yum install curl

# get and unpack elasticsearch zip file

## robocopy-backup
robocopy /b /e /xa:s /xjd /sl /a-:hs /mt /v /fp /eta /log:"D:\To\Directory\transfer.log" /tee "C:\From\Directory" "D:\To\Directory"

(Note that the paths don't have a trailing backslash.)

/b -- backup mode (there's a /zb option that uses restartable mode and falls back to backup mode when access is denied, but it's a whole lot slower)
/e -- copies subdirectories (including empty directories) in addition to files
/xa:s -- exclude system files
/xjd -- exclude junction points for directories
/sl -- copy symbolic links as links
/a-:hs -- remove hidden/system attributes from files

## 20130416-todo.md

      
mrflip / 20130416-todo.md
Last active January 21, 2024 21:06

Elasticsearch Tuning Plan

Next Steps

- Measure time spent on index, flush, refresh, merge, query, etc. (TD - done)
- Take hot threads snapshots under read+write, read-only, write-only (TD - done)
- Adjust refresh time to 10s (from 1s) and see how load changes (TD)
- Measure time of a rolling restart doing disable_flush and disable_recovery (TD)
- Specify routing on query -- make it choose same node for each shard each time (MD)
- GC new generation size (TD)
- Warmers
  - Measure before/after of client query time with and without warmers (MD)
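
The refresh-time step in the plan could be tried with a small helper like the one below. It is a minimal sketch, not part of the original plan: the host `localhost:9200`, the index name `myindex`, the setting name `index.refresh_interval`, and the `ES_APPLY` switch are all assumptions for illustration. By default it only prints the request so it can be reviewed first.

```shell
#!/bin/bash
# Sketch only: ES_URL, the index name, and the setting name are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the JSON body for a refresh-interval change, e.g. "10s".
refresh_body() {
  printf '{"index":{"refresh_interval":"%s"}}' "$1"
}

# Print the request by default; send it only when ES_APPLY=1 is set.
set_refresh_interval() {
  local index="$1" interval="$2"
  local body
  body="$(refresh_body "$interval")"
  if [ "${ES_APPLY:-0}" = "1" ]; then
    curl -s -XPUT "$ES_URL/$index/_settings" \
      -H 'Content-Type: application/json' -d "$body"
  else
    echo "curl -XPUT $ES_URL/$index/_settings -d '$body'"
  fi
}

set_refresh_interval myindex 10s
```

Running it once with `10s` and once with `1s` around a load test gives the before/after comparison the plan asks for.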


## DSL Examples
# simple match all query with term facet
ejs.Request()
    .indices("myindex")
    .types("mytype")
    .query(ejs.MatchAllQuery())
    .facet(
        ejs.TermsFacet('url')
            .field('url')
            .size(20))

## backup.sh
# Script to be placed in elasticsearch/bin
# Launch it from elasticsearch dir
# bin/backup indexname
# We suppose that data are under elasticsearch/data
# It will create a backup file under elasticsearch/backup

if [ -z "$1" ]; then
  INDEX_NAME="dummy"
else
  INDEX_NAME="$1"
fi

## elasticsearch.sh
#!/bin/bash

NAME=elasticsearch
PREFIX=/usr/local
ES_HOME=$PREFIX/$NAME

install() {
	v=$1;
	echo "Downloading $NAME $v...";
	file="$NAME-$v.tar.gz";

## elasticsearch_best_practices.txt
If you want, I can try and help with pointers as to how to improve the indexing speed you get. It's quite easy to really increase it by using some simple guidelines, for example:

- Use create in the index API (assuming you can).
- Relax the real time aspect from 1 second to something a bit higher (index.engine.robin.refresh_interval).
- Increase the indexing buffer size (indices.memory.index_buffer_size); it defaults to 10%, meaning 10% of the heap.
- Increase the number of dirty operations that trigger automatic flush (so the translog won't get really big, even though it's FS based) by setting index.translog.flush_threshold (defaults to 5000).
- Increase the memory allocated to the elasticsearch node. By default it's 1g.
- Start with a lower replica count (even 0), and then once the bulk loading is done, increase it to the value you want using the update_settings API. This will improve things, as possibly fewer shards will be allocated to each machine.
- Increase the number of machines you have so
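
The replica-count guideline above can be sketched as two calls to the update settings API, one before and one after the bulk load. This is an illustration only: the host `localhost:9200`, the index name `myindex`, the target of 2 replicas, and the helper names are assumptions, and the script prints the requests rather than sending them.

```shell
#!/bin/bash
# Sketch only: ES_URL, the index name, and the target count are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"

# JSON body for the update settings API that sets the replica count.
replicas_body() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

# Print the settings call made before the bulk load (0 replicas) and the
# one made afterwards (the target count). Nothing is sent to the cluster.
bulk_load_plan() {
  local index="$1" target="$2"
  echo "before: curl -XPUT $ES_URL/$index/_settings -d '$(replicas_body 0)'"
  echo "after:  curl -XPUT $ES_URL/$index/_settings -d '$(replicas_body "$target")'"
}

bulk_load_plan myindex 2
```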

## ios-8-web-app.html
<!doctype html>

<!-- http://taylor.fausak.me/2015/01/27/ios-8-web-apps/ -->

<html>
  <head>
    <title>iOS 8 web app</title>

    <!-- CONFIGURATION -->

## .gitignore
.DS_Store
Gemfile.lock
*.pem
node.json
tmp/*
!tmp/.gitignore

## backup.sh
# TO_FOLDER=/something
# FROM=/your-es-installation

DATE=$(date +%Y-%m-%d_%H-%M)
TO="$TO_FOLDER/$DATE/"
echo "rsync from $FROM to $TO"
# the first rsync run can take a while, so do not disable flushing yet
rsync -a "$FROM" "$TO"

# now disable flushing and do one manual flushing
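
The snippet stops at this comment, so the following is a hedged sketch of one way the remaining steps might look, not the author's script. It assumes an older Elasticsearch release that supports the `index.translog.disable_flush` setting and the `_flush` endpoint, a node at `localhost:9200`, and hypothetical helper names. It prints the steps in order instead of running them.

```shell
#!/bin/bash
# Sketch only: ES_URL and the disable_flush setting name are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"

# JSON body that turns automatic translog flushing off ("true") or on ("false").
disable_flush_body() {
  printf '{"index":{"translog.disable_flush":"%s"}}' "$1"
}

# Print the steps in order: disable flushing, flush once, sync again,
# re-enable flushing. Nothing is executed.
backup_plan() {
  local from="$1" to="$2"
  echo "curl -XPUT $ES_URL/_settings -d '$(disable_flush_body true)'"
  echo "curl -XPOST $ES_URL/_flush"
  echo "rsync -a $from $to"
  echo "curl -XPUT $ES_URL/_settings -d '$(disable_flush_body false)'"
}

backup_plan /your-es-installation /something/backup/
```

The second rsync pass should be short, since only files changed after the first pass need copying while flushing is off.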